correlation and regression Flashcards
what may be related to each other? - give an example
- two datasets may be related
e.g., height, weight
when can you see the relationship of the datasets?
- when you look at them on a graph
what were the first statistics invented for?
- for analysing co-relationships
when is there probably a mistake in data?
- if your data shows a perfect straight line
- if there’s more than one data point a long way away from all the others
when might data be worth checking for mistakes?
- if there’s no relationship at all between things you really expect to be related
what is the definition of correlation?
- finds the best-fit line by minimising the difference between the data and the line
what does a correlation report about a relationship?
- strength and direction of a relationship
what is a residual?
- difference between an observed value and a predicted value in regression analysis
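A minimal sketch of residuals, assuming a hypothetical line y = 2x + 1 and made-up data (neither comes from the cards). Each residual is the observed value minus the value the line predicts.

```python
def residuals(xs, ys, slope, intercept):
    """Return observed minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up data for illustration only.
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]

# Hypothetical line y = 2x + 1 predicts 3, 5, 7, 9.
res = residuals(xs, ys, slope=2.0, intercept=1.0)
print(res)
```

A positive residual means the point sits above the line; a negative one means it sits below.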
what is a zero correlation?
- no relationship between the variables
- cluster of data points
what is a positive correlation?
- relationship between two variables that tend to move in the same direction
what is a negative correlation?
- two individual variables generally move in opposite directions
what would you do to get the line of best fit?
- could try adjusting the line manually, but it wouldn’t be the best fit
- need to use maths instead
what equation allows you to work out the line of best fit?
r = Sxy / (Sx × Sy)
what does Sxy stand for?
- how much x and y change together
what is Sx × Sy?
- how much x and y change separately
what is the equation to work out r?
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
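The equation for r can be sketched in plain Python as below. The function name `pearson_r` and the height/weight numbers are illustrative assumptions, not part of the cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y over their separate variation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how much x and y change together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much x and y change separately.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Made-up heights (cm) and weights (kg) for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]
r = pearson_r(heights, weights)
print(round(r, 3))
```

Swapping the two lists gives the same r, which is why X and Y are interchangeable in correlation.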
what two aspects does an r value tell you?
- direction
- strength
what value is r when the correlation is positive?
- if r is above 0
0 < r ≤ 1
what value is r when the correlation is negative?
- r is below zero
-1 ≤ r < 0
what is the value of r when the correlation is strong?
- if r is close to +1 or -1
r ≈ ±1
what is the r value when the correlation is weak?
- r is close to zero
r ≈ 0
when are r values especially useful?
- useful for values in the middle, e.g., -0.4 to 0.4
what does the r-squared value tell you?
- how much of the variance is explained by your correlation
what is the r-squared value when correlation explains a lot of variance?
- if r² is close to one
r² ≈ 1
what is the r-squared value when correlation explains only a little variance?
- if r² is close to zero
r² ≈ 0
what other name is r-squared given?
- coefficient of determination
what is 1 - r²?
- amount of variance not explained
- random noise
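A small worked example of splitting variance into explained and unexplained parts. The helper name `variance_split` and the value r = 0.8 are assumptions chosen for illustration.

```python
def variance_split(r):
    """Return (explained, unexplained) proportions of variance for a given r."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared

# Illustrative value only: r = 0.8 gives roughly 64% explained, 36% unexplained.
explained, unexplained = variance_split(0.8)
print(round(explained, 2), round(unexplained, 2))
```

Because r is squared, r = -0.8 gives the same split: r-squared carries strength but not direction.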
what is regression?
- gives you the strength, direction and equation of a relationship
what is the regression equation?
y = mx + c
what is m in the equation?
- slope
what is c in the equation?
- intercept
what happens when x= 0 ?
y= intercept
what happens to y-axis when x increases by 1?
- y increases by the slope
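A minimal least-squares sketch of y = mx + c, using made-up data that lie exactly on y = 2x + 1. The names `fit_line` and `predict` are illustrative, and the slope formula used here (co-variation of x and y divided by the variation of x) is the standard one rather than something stated on the cards.

```python
def fit_line(xs, ys):
    """Least-squares slope (m) and intercept (c) for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, c = fit_line(xs, ys)

def predict(x):
    """Predicted y from the fitted line."""
    return m * x + c

print(m, c)                      # slope and intercept
print(predict(0))                # at x = 0, y equals the intercept
print(predict(3) - predict(2))   # one step in x changes y by the slope
```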
what do both correlation and regression involve?
- both involve linear relationships between one or more input (predictor) variables and a single output (outcome) variable
what data can both correlation and regression deal with?
- categorical, ordinal, and non-linear predictors
what relationship does correlation describe?
- single relationship
what relationship does regression describe?
- multiple relationships
what is the difference between X and Y in correlation compared to regression?
- in correlation, X and Y are interchangeable, whereas in regression they are not
do correlation and regression allow prediction?
- correlation doesn’t allow prediction
- regression allows prediction
what symbols are used for correlations?
- r and r²
what symbols are used for regression?
- R
- R²
- F
- t
- SE
- B1-n
what does jamovi allow us to explore? what do you use?
- multiple relationships in one go
- use a correlation matrix
what does correlation matrix include? what do we calculate?
- includes all information we need but we must calculate df ourselves
how do you calculate df?
df = n - 2
how do you report correlations?
r([df]) = [Pearson’s r], p = [p-value]
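The df rule and the reporting template can be combined in a short sketch. The function name `report_correlation` and the numbers are hypothetical, and the p-value is passed in rather than computed, on the assumption that it is read off the correlation matrix.

```python
def report_correlation(r, n, p):
    """Format a correlation as r(df) = ..., p = ... with df = n - 2."""
    df = n - 2  # degrees of freedom for a correlation
    return f"r({df}) = {r:.2f}, p = {p:.3f}"

# Hypothetical result: 30 participants, r = 0.45, p = 0.012.
line = report_correlation(0.45, 30, 0.012)
print(line)
```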
what is overall regression?
- R² = [R² value]
what is the model fit of regression?
F([df1], [df2]) = [F-value], p = [p-value]
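As a hedged illustration of where the two df values in the F report come from, the sketch below uses the standard relationship between R², the number of predictors k and the sample size n. This formula is not on the cards and is included as an assumption; the function name and numbers are made up.

```python
def f_statistic(r_squared, n, k):
    """F-value and its two df for a regression with k predictors and n cases."""
    df1 = k            # model df: one per predictor
    df2 = n - k - 1    # residual df; with one predictor this is n - 2
    f_value = (r_squared / df1) / ((1 - r_squared) / df2)
    return f_value, df1, df2

# Hypothetical model: R² = 0.5, 22 cases, 1 predictor.
f_value, df1, df2 = f_statistic(0.5, 22, 1)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

With a single predictor, df2 matches the df = n - 2 used for a correlation.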
what is multiple linear regression?
- single outcome variable (y) but multiple predictor variables (x1, x2)
what do you find in multiple linear regression?
- find the best-fitting surface
where are residuals in multiple linear regression?
- residuals are distance from the surface
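A minimal sketch of fitting a surface with two predictors, assuming ordinary least squares solved through the normal equations (a standard approach, not necessarily what jamovi does internally). The data are made up to lie exactly on y = 1 + 2·x1 + 3·x2, and all function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_plane(x1, x2, y):
    """Least-squares fit of y = c + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Made-up data lying exactly on the surface y = 1 + 2*x1 + 3*x2.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

c, b1, b2 = fit_plane(x1, x2, y)
# Residuals: distance of each observed y from the fitted surface.
residuals = [yi - (c + b1 * a + b2 * b) for a, b, yi in zip(x1, x2, y)]
print(round(c, 6), round(b1, 6), round(b2, 6))
```

Because the example data sit exactly on the surface, every residual is essentially zero; real data would leave non-zero residuals.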
what can the predictors be in multiple linear regression?
- predictors can be almost anything:
continuous, ordinal, discrete
normally distributed or not
linear or non-linear
what is multiple linear regression said to be?
- flexible
e.g., ChatGPT, fMRI, COVID, elections
what does each predictor result in?
- results in an estimate, a standard error, a t-score and a p-value
what is the problem with correlation and regression?
- extrapolation
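To illustrate why extrapolation is a problem, the sketch below fits a straight line to made-up data that actually follow y = x² and then predicts far outside the observed range. The data and the function name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data observed only for x = 1..5; the true relationship is y = x**2.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
m, c = fit_line(xs, ys)

inside = m * 3 + c     # prediction within the observed range (true value 9)
outside = m * 10 + c   # prediction far beyond it (true value 100)
print(inside, outside)
```

Inside the observed range the line is off by a little; at x = 10 it misses the true value by a wide margin, because nothing in the data supports the line that far out.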
what do non-linear relationships cause?
- causes problems
what are the solutions to the problems?
- look at the data
- check for mistakes
- perhaps transform the data: quadratic, cubic, logarithmic
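A hedged sketch of the logarithmic option: made-up data that grow exponentially give a weaker linear correlation until y is log-transformed. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-products formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Made-up non-linear data: y grows exponentially with x.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]

r_raw = pearson_r(xs, ys)                          # straight line fits poorly
r_log = pearson_r(xs, [math.log(y) for y in ys])   # log(y) is linear in x
print(round(r_raw, 3), round(r_log, 3))
```

Which transformation suits a dataset depends on the shape seen when looking at the data, so this is one possibility rather than a general recipe.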
does correlation equal causation?
- no