Lecture 20: Correlation and simple regression Flashcards

Question

What do you look at when you want to do hypothesis testing?

Answer 1

You look at the standardized estimate and the t value.

Answer 2

R = correlation between the model's predictions and the observed values

Answer 3

R^2 = proportion of variance explained by the model (= explained variance, always 0 under H0)

Answer 4

takes into account all the predictors that you added

Answer 5

because we want parsimony! we want to punish too complex models, we want our results to have theoretical implications.

Answer 6

(correlation between predicted and observed)^2

Answer 7

are the same thing!

Answer 8

one predictor variable

Answer 9

1. always plot your data 2. total variability = predicted variability + error. more explained than unexplained is success! 3. keep models simple! we want as few predictors as possible

Answer 10

Because we want to translate it into psychological theories.

Answer 11

Pearson product-moment correlation coefficient (PPMCC)

Answer 12

Always standardized, between -1 and +1.

Answer 13

On the covariance: how much variation do the variables share?

Answer 14

covxy / (sx * sy)

Answer 15

Not standardized.

Answer 16

In statistics, the Pearson correlation coefficient, also referred to as the Pearson’s r, Pearson product-moment correlation coefficient (PPMCC) or bivariate correlation, is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

Answer 17

Plot (xi - x̄)(yi - ȳ): for each data point, take x minus the mean of x and y minus the mean of y, and multiply the two. This gives the graph with points and lines running from each point to the mean lines (see notebook).

Answer 18

Compute the z-scores for x and y: zx = (xi - x̄)/sdx and zy = (yi - ȳ)/sdy

Answer 19

distance between each data point and the mean

Answer 20

Make a plot and look at the difference. With a perfect correlation, all data points would overlap, and the explained variance is very large. Where they do not overlap, there is unexplained variance.

Answer 21

have an SD of 1! So if we compute the covariance of the z-scores, the correlation is simply that covariance (because you then divide by 1).

Answer 22

1. z-scores 2. covariance formula: rxy = covxy / (sx * sy). Both lead to the same result!
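
A minimal sketch of the two routes to r, using only the Python standard library. The data and the helper names (`mean`, `sample_sd`, `covariance`) are made up for illustration, and sample statistics with n - 1 in the denominator are assumed.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (n - 1 in the denominator).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def covariance(x, y):
    # Sample covariance: average cross-product of deviations from the means.
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def r_from_covariance(x, y):
    # Route 2: standardize the covariance by both standard deviations.
    return covariance(x, y) / (sample_sd(x) * sample_sd(y))

def r_from_z_scores(x, y):
    # Route 1: convert to z-scores first; their covariance is already r,
    # because both z-score variables have an SD of 1.
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    return covariance(zx, zy)

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_cov = r_from_covariance(x, y)
r_z = r_from_z_scores(x, y)
print(r_cov, r_z)
```

Both values should agree up to floating-point rounding.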

Answer 23

Values in red contribute to a negative correlation, and values in green contribute to a positive correlation. A lot of red = strongly negative r; a lot of green = strongly positive r.

Answer 24

A t distribution.

Answer 25

r * sqrt(N-2) / sqrt(1-r^2)
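
The formula on this card can be sketched as a small function. The function name and the example values (r = 0.5, N = 27) are made up; the resulting t is compared against a t distribution with N - 2 degrees of freedom.

```python
import math

def t_for_correlation(r, n):
    # t = r * sqrt(N - 2) / sqrt(1 - r^2)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = 0.5 in a sample of 27 (df = 25).
t_value = t_for_correlation(0.5, 27)
print(round(t_value, 3))
```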

Answer 26

multiple regression

Answer 27

partial correlation: we control for the presence of a third variable = rxy.z

Answer 28

Look at the correlation between x and y whilst controlling for z.
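
A minimal sketch of a first-order partial correlation, using the standard formula based on the three pairwise correlations. The formula is not stated on the cards, and the function name and input values are made up for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    # Correlation between x and y after removing what each shares with z.
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations.
r_partial = partial_correlation(r_xy=0.5, r_xz=0.4, r_yz=0.3)
print(round(r_partial, 3))
```

If z is unrelated to both x and y (r_xz = r_yz = 0), the partial correlation equals the plain correlation r_xy.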

Answer 29

The covariance does, the correlation does not.

Answer 30

outcome = prediction + error

Answer 31

yi = B0 + B1 * xi + ei

Answer 32

intercept (grand mean)

Answer 33

The higher, the stronger the relationship.

Answer 34

whether B1 differs significantly from 0.

Answer 35

Correlation: you want to know whether there is an association between the variables. Regression: you want to predict how one variable can influence another. So a regression is a prediction!

Answer 36

- Sensitivity - Homoscedasticity

Answer 37

- extreme residuals: the error is high for this case - Cook's distance (> 1) - Q-Q plots or residual plots
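
A rough sketch of Cook's distance for a simple regression, written out by hand with the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, where p = 2 parameters (intercept and slope) and h_i is the leverage. The formula is not given on the cards, and the data (with one deliberately extreme last point) are made up.

```python
def cooks_distances(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of estimated parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mx) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data: the last observation is far off the trend.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 20]
distances = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d > 1]
print(flagged)
```

Only cases whose distance exceeds the rule-of-thumb cut-off of 1 end up in `flagged`.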

Answer 38

Looks at the impact on the mean: the larger the impact on the mean, the more likely the point is an outlier and worth a closer look.

Answer 39

An outlier can really affect your results, so outliers warrant a follow-up. However, if your significance depends on one single outlier, maybe your conclusions weren't so strong to begin with. This matters most in smaller samples.

Answer 40

A kind of linear Levene's test. The variance of the residuals should be equal across all expected values. Your prediction error should not differ across levels of the predicted values, because then there is systematic error!

Answer 41

Look at a scatterplot of the standardized predicted values against the standardized residuals. A roughly round shape is needed: you want a kind of cloud in the plot (a circle full of dots).

Answer 42

After the analysis, because it concerns the residuals.

Answer 43

Just happily make predictions and calculate fun things.

Answer 44

rxy * sy/sx

Answer 45

has the minimal distance between all the points and the line.

Answer 46

every unit increase in the IV leads to an ... increase in the DV

Answer 47

ȳ - b1 * x̄
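
A small sketch that combines the slope card (rxy * sy/sx) and the intercept card (ȳ - b1 * x̄). The data and helper names are made up, and sample standard deviations (n - 1) are assumed.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (sample_sd(x) * sample_sd(y))

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1 = pearson_r(x, y) * sample_sd(y) / sample_sd(x)  # slope
b0 = mean(y) - b1 * mean(x)                         # intercept
predicted = [b0 + b1 * xi for xi in x]              # predicted DV for each IV value
print(round(b1, 3), round(b0, 3))
```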

Answer 48

DV (with a hat) = b0 + b1 * IV

Answer 49

prediction error!!! (=residuals)
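
A minimal sketch of residuals as observed minus predicted values, together with the split of total variability into a model part and an error part. The data are made up, and the line is fitted with the usual least-squares slope and intercept.

```python
# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # prediction error per case

ss_total = sum((yi - mean_y) ** 2 for yi in y)     # total variability
ss_model = sum((yh - mean_y) ** 2 for yh in y_hat) # variability explained by the model
ss_residual = sum(e ** 2 for e in residuals)       # unexplained variability
print(ss_total, ss_model, ss_residual)
```

Up to rounding, `ss_total` equals `ss_model + ss_residual`.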

Answer 50

The fit of the model can be viewed in terms of the correlation (r) between the predictions and the observed values: if the predictions are perfect, the correlation will be 1.

Answer 51

The explained variance.

Answer 52

For simple regression, this is equal to the correlation between x and y. For multiple regression (next lecture), these will differ.

Answer 53

correlation between predicted and observed, ^2
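
A short sketch, on made-up data, of R^2 as the squared correlation between predicted and observed values. For simple regression it should match the squared correlation between x and y.

```python
import math

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

r_squared = pearson_r(y_hat, y) ** 2  # fit of the model
r_xy_squared = pearson_r(x, y) ** 2   # same value in simple regression
print(round(r_squared, 3), round(r_xy_squared, 3))
```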

Answer 54

Really done, otherwise you are not allowed to click.

Answer 55

Compare the model to the mean: F = ((n - p - 1) * r^2) / (p * (1 - r^2))
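
A minimal sketch of the F formula on this card. The function name and the example values (r^2 = 0.6, n = 5, p = 1) are made up. With a single predictor, F should equal the square of the t statistic for the correlation.

```python
import math

def f_statistic(r_squared, n, p):
    # F = ((n - p - 1) * r^2) / (p * (1 - r^2)), with p predictors and n cases.
    return ((n - p - 1) * r_squared) / (p * (1 - r_squared))

# Hypothetical example: one predictor, five cases, r^2 = 0.6.
r_squared, n, p = 0.6, 5, 1
f_value = f_statistic(r_squared, n, p)

# t statistic for the same correlation: r * sqrt(N - 2) / sqrt(1 - r^2).
r = math.sqrt(r_squared)
t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
print(round(f_value, 3), round(t_value ** 2, 3))
```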

Answer 56

df = n - p - 1, or N - k - 1

Answer 57

signal/noise

Answer 58

1. t statistic 2. F value

Answer 59

1. standardized correlation r: between -1 and +1 2. covariance between x and y, not standardized 3. regression coefficient in linear regression (standardized but not bounded, generalizes easily to settings with multiple predictors) 4. t statistic: standardized difference between b1 and 0 5. overall model performance: F statistic or squared correlation to get the proportion of explained variance

Answer 60

r does NOT change; the covariance and the slope DO change.
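
A small sketch of this card, assuming it refers to rescaling a variable (for example, changing its unit of measurement). The data are made up: the same heights expressed in centimetres and in metres, paired with weights.

```python
import math

def summary(x, y):
    # Returns (r, covariance, slope) for predicting y from x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    covariance = sxy / (n - 1)
    slope = sxy / sxx
    return r, covariance, slope

# Hypothetical data.
height_cm = [150, 160, 170, 180, 190]
height_m = [h / 100 for h in height_cm]
weight_kg = [55, 60, 72, 70, 85]

r_cm, cov_cm, slope_cm = summary(height_cm, weight_kg)
r_m, cov_m, slope_m = summary(height_m, weight_kg)
print(r_cm, r_m)          # same correlation
print(cov_cm, cov_m)      # covariance differs by a factor of 100
print(slope_cm, slope_m)  # slope differs by a factor of 100
```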

Answer 61

standardized but not bounded, generalises easily to settings with multiple predictors

(89 cards)