Linear regression and correlation Flashcards
What do we mean when we talk about bivariate data?
Data where there are two variables.
The two variables can be either categorical, or numerical.
This session we are dealing with continuous bivariate data, i.e. both variables are continuous.
When do you use correlation?
There is no distinction between the two variables. No causation is implied, simply association.
When do you use regression?
One variable, y, is a response to another variable, x. You could use the value of x to predict what y would be.
Properties of Pearson’s correlation coefficient (r)
r must be between -1 and +1
+1 = perfect positive linear association
-1 = perfect negative linear association
0 = no linear relation at all
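As a rough illustration of these properties, the sketch below computes r from its usual definition (the sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and all the data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # points on an increasing straight line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # points on a decreasing straight line
r_zero = pearson_r(x, [2, 1, 0, 1, 2])   # symmetric V shape: related, but not linearly
```

The three cases should give r close to +1, -1 and 0 respectively. The V-shaped case is a reminder that r = 0 only rules out a linear relation, not every kind of relation.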
Assumptions for the hypothesis test and confidence interval for ρ (the population correlation coefficient)
Both variables are plausibly Normally distributed.
There is a linear relationship between them.
The null hypothesis is that there is no association between them.
Check the assumptions with a scatter diagram of the data, which should display a roughly elliptical pattern.
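The card does not give the formulas, but one common approach is sketched here: the test of no association uses t = r·sqrt((n − 2)/(1 − r²)) on n − 2 degrees of freedom, and an approximate confidence interval comes from Fisher's z transformation. The function names and the values of r and n are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, compared to a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def fisher_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z transformation."""
    z = math.atanh(r)                              # transform r to the z scale
    se = 1 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # Normal critical value for the level
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

r, n = 0.6, 28   # hypothetical sample correlation and sample size
t_stat = correlation_t_statistic(r, n)
lower, upper = fisher_confidence_interval(r, n)
```

Turning `t_stat` into a p-value needs the t distribution, which the Python standard library does not provide, so that step is left out of this sketch.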
What is the equation for estimating the best-fitting line?
y = a + bx
y = dependent variable
a = intercept: the value of y where the line meets the vertical (y) axis, i.e. when x = 0
b = slope
x = independent variable
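A minimal sketch of how a and b could be estimated by ordinary least squares, assuming that is the method intended by "best fitting" here. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: change in y per one-unit increase in x
    a = mean_y - b * mean_x    # intercept: predicted y when x = 0
    return a, b

# Made-up data scattered around the line y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(x, y)
predicted_at_5 = a + b * 5     # use the fitted line to predict y at x = 5
```

The fitted line always passes through the point (mean of x, mean of y), which is why the intercept can be computed from the slope and the two means.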
What is multiple linear regression?
Sometimes there is more than one possible explanatory variable influencing the outcome variable.
Multiple linear regression can be used to investigate the influence of several explanatory variables simultaneously on the outcome.
Why carry out a multiple regression analysis?
To identify any explanatory variables that may be associated with the y variable.
To investigate the extent to which one or more explanatory variables are linearly related to the y variable after adjusting for other variables that may be related to it.
To predict the value of the y variable from the explanatory x variables.
Multiple regression equation
Suppose we are interested in the effect of p explanatory variables, x1, x2,…, xp, on the outcome variable y.
The estimated multiple regression equation would be:
y = b0 + b1x1 + b2x2 + … + bpxp
Where xp is the pth explanatory variable.
y is the predicted value of the outcome given a particular set of values of x1, x2,…, xp.
b0 is the estimated intercept, a constant term: the value of y when all the x variables are zero.
The bp’s are the estimated regression coefficients.
That is, b1 represents the amount by which y increases on average if we increase x1 by one unit but keep all the other x variables constant (i.e. adjust or control for them).
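One way the coefficients b0, b1, …, bp could be estimated is ordinary least squares, sketched below by solving the normal equations in plain Python. The function name `fit_multiple_regression` is hypothetical, and the data are made up so that y follows 2 + 0.5·x1 − 1.5·x2 exactly, which lets the fit recover those values.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares estimates [b0, b1, ..., bp] via the normal equations."""
    # Design matrix: a leading 1 for the intercept, then the p explanatory values
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented system (X'X | X'y)
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for i in range(k):
            if i != col:
                factor = A[i][col]
                A[i] = [vi - factor * vc for vi, vc in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up data generated exactly from y = 2 + 0.5*x1 - 1.5*x2
rows = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2), (6, 5)]
ys = [2 + 0.5 * x1 - 1.5 * x2 for x1, x2 in rows]
b0, b1, b2 = fit_multiple_regression(rows, ys)
```

Here b1 is the change in predicted y for a one-unit increase in x1 with x2 held constant. In practice a statistics package would normally be used rather than hand-written elimination.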
For example, what is the predicted birthweight for a baby girl of 30 weeks gestation, born by normal delivery to a mother aged 40?
The equation would be:
predicted birthweight = b0 + b1(age, in this case 40) + b2(gestation, in this case 30) + b3(sex) + b4(delivery)
Each b has its own estimated value, which is multiplied by the value of its variable. For example, if b1 were 1.3, the age term would be 1.3 × 40.
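The birthweight prediction on this card can be sketched as below. Every coefficient value is invented purely for illustration and is not an estimate from any data. The 0/1 coding of sex and delivery is also an assumption, since the card does not say how these variables are coded.

```python
# Hypothetical coefficients, invented for illustration only (not real estimates)
coefficients = {
    "intercept": -1000.0,   # b0
    "age": 5.0,             # b1, per year of maternal age
    "gestation": 120.0,     # b2, per week of gestation
    "sex": -100.0,          # b3, assumed coding: 1 = girl, 0 = boy
    "delivery": 50.0,       # b4, assumed coding: 1 = normal, 0 = other
}

def predict(coefs, values):
    """b0 plus the sum of each coefficient times its explanatory value."""
    return coefs["intercept"] + sum(coefs[name] * x for name, x in values.items())

# Baby girl, 30 weeks gestation, normal delivery, mother aged 40
baby = {"age": 40, "gestation": 30, "sex": 1, "delivery": 1}
predicted_birthweight = predict(coefficients, baby)
```

Each term is one coefficient multiplied by the value of its variable, and the prediction is the intercept plus the sum of those terms.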
What is correlation?
Correlation is used to denote association between two quantitative variables. The degree of association is estimated using the correlation coefficient. It measures the level of linear association between the two variables.
What is regression?
Regression quantifies the relationship between two quantitative variables. It involves estimating the best straight line with which to summarise the association. The relationship is represented by an equation, the regression equation. It is useful when we want to describe the relationship between the variables, or even predict a value of one variable for a given value of the other.