Unit 3 - Bivariate Data Flashcards
What is bivariate data and what associations do we identify and describe?
Bivariate data is data that has two variables and is therefore collected in pairs.
We focus on associations between two numerical variables or between two categorical variables.
What are explanatory vs response variables?
- Explanatory variable: the independent variable, used to explain or predict changes in the response variable.
- Response variable: the dependent variable, whose values are explained by the explanatory variable.
What are scatterplots and what is their purpose?
A scatterplot displays each pair of values as a point, with the explanatory variable on the x-axis and the response variable on the y-axis. This lets us see whether there is a clear association.
Associations are described by:
- direction: positive or negative
- form: linear or non-linear
- strength: weak, moderate or strong
Explain Pearson’s correlation coefficient.
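This is denoted r.
- measures the strength of a linear relationship.
- only valid for linear relationships.
- the sign (+ve or -ve) shows the direction.
- the closer r is to 1 or -1, the stronger the relationship; r = 1 or r = -1 is a perfect linear relationship.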
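The calculator gives r directly, but it can help to see what it computes. This is a minimal sketch, assuming plain Python 3 and made-up data: r is found from the deviations of each value from its mean, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² × Σ(y - ȳ)²). The helper name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of the deviations from each mean
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up example data (explanatory x, response y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

Swapping the sign of the slope in the data (e.g. y decreasing as x increases) flips the sign of r but not its size, which is why the sign only tells you the direction.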
What are the strength categories of Pearson’s correlation coefficient?
(apply to the size of r, whether +ve or -ve)
0.75-1 strong
0.5-0.75 moderate
0.25-0.5 weak
less than 0.25 no association
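The categories above can be turned into a small lookup. This is a sketch only: the notes do not say which category a boundary value such as exactly 0.75 belongs to, so the code assumes it goes in the stronger category. The function name describe_strength is hypothetical.

```python
def describe_strength(r):
    """Describe r using the cut-offs in these notes (hypothetical helper).

    Assumption: a value exactly on a boundary (0.25, 0.5, 0.75)
    is placed in the stronger category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        return "no association"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_strength(0.775))  # strong positive
print(describe_strength(-0.6))   # moderate negative
print(describe_strength(0.1))    # no association
```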
Explain the coefficient of determination.
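This is denoted R^2 (for a least-squares line, R^2 = r^2).
- the proportion of the total variation in the response variable that can be explained by the linear relationship with the explanatory variable.
- e.g. an R^2 value of 0.94 means 94% of the variation in the response variable is explained by the variation in the explanatory variable.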
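For a least-squares line, R^2 is simply r squared, and it also equals 1 minus the share of the variation in y that is left over in the residuals. The sketch below (made-up data, hypothetical function name) computes it both ways to show they agree, which is what "proportion of variation explained" means in practice.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SS_res/SS_tot) for a least-squares line (hypothetical helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)  # total variation in y (SS_tot)

    r = s_xy / math.sqrt(s_xx * s_yy)

    # Least-squares line y = a + bx
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    # Variation NOT explained by the line (sum of squared residuals)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    return r ** 2, 1 - ss_res / s_yy


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
from_r, from_residuals = r_squared_two_ways(x, y)
print(f"{from_r:.3f} {from_residuals:.3f}")  # 0.600 0.600
print(f"{from_r:.0%} of the variation in y is explained")  # 60% of the variation in y is explained
```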
Explain the least-squares regression line.
(Line of best fit.)
given by y = a + bx
y = response variable
a = y-intercept
b = gradient (slope)
x = explanatory variable
to find b: b = r × (s(y) / s(x))
r = correlation coefficient
s(y) = sample standard deviation of the y values
s(x) = sample standard deviation of the x values
to find a: a = y(mean) - b × x(mean)
y(mean) = mean of the y values
b = slope
x(mean) = mean of the x values
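The two formulas can be applied directly to summary statistics, which is how exam questions often present them. A minimal sketch; the function name and the example statistics are made up for illustration.

```python
def least_squares_line(r, x_mean, y_mean, s_x, s_y):
    """Return (a, b) for y = a + bx from summary statistics (hypothetical helper).

    s_x and s_y are the SAMPLE standard deviations of x and y.
    """
    b = r * (s_y / s_x)      # gradient: b = r x (s(y) / s(x))
    a = y_mean - b * x_mean  # y-intercept: the line passes through (x_mean, y_mean)
    return a, b


# Made-up summary statistics
a, b = least_squares_line(r=0.8, x_mean=10, y_mean=50, s_x=2, s_y=5)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 30.00 + 2.00x
```

Computing b first matters, because a depends on it.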
Explain residual plots.
These check whether it is appropriate to fit a linear model to bivariate data.
- this is a graph of the residuals against the explanatory variable.
- in a good residual plot the points are randomly scattered above and below zero with no pattern evident, which supports a linear model.
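To see why a pattern in the residuals is a warning sign, here is a sketch that fits a straight line to deliberately curved, made-up data (y = x²). The residuals come out positive, then negative, then positive again: a U-shape that a residual plot would make obvious, so a linear model is not appropriate here. The helper names fit_line and get_residuals are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and gradient b for y = a + bx (hypothetical helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


def get_residuals(xs, ys, a, b):
    """Residual = actual y - predicted y (hypothetical helper)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # made-up data that is clearly NOT linear
a, b = fit_line(xs, ys)
res = get_residuals(xs, ys, a, b)
print(a, b)  # -2.0 4.0
print(res)   # [2.0, -1.0, -2.0, -1.0, 2.0]  (+, -, -, -, +: a curved pattern)
```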
Explain Association vs Causation.
Correlation does not imply causation.
- just because two variables display an association does not mean there is a cause-and-effect relationship.
- both variables may be responding to a third variable
- or the association may simply be a coincidence.
What are the equations for a residual plot?
(a residual plot has the explanatory variable on the x-axis and the residuals on the y-axis)
First find the predicted y values: substitute each actual x value into the regression equation and calculate y.
Residual = actual y value - predicted y value
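The two steps can be written out for a line that has already been found. This sketch uses made-up data and an assumed regression equation y = 2.2 + 0.6x (which happens to be the least-squares line for these particular points). For a least-squares line the residuals add up to zero, apart from rounding, which makes a handy arithmetic check. The helper name residuals is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Return (predicted y values, residuals) for y = a + bx (hypothetical helper)."""
    predicted = [a + b * x for x in xs]                  # step 1: predicted y values
    resid = [y - p for y, p in zip(ys, predicted)]       # step 2: actual - predicted
    return predicted, resid


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
predicted, res = residuals(x, y, a=2.2, b=0.6)

print("x  y  predicted  residual")
for xi, yi, p, e in zip(x, y, predicted, res):
    print(f"{xi}  {yi}  {p:.1f}  {e:+.1f}")
# residuals: -0.8, +0.6, +1.0, -0.6, -0.2

# Check: residuals from a least-squares line sum to zero (up to rounding)
print(abs(sum(res)) < 1e-9)  # True
```

Plotting res against x gives the residual plot itself.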
What is extrapolation vs interpolation?
Extrapolation is making a prediction outside the range of the data. Interpolation is making a prediction within the range of the data.
In an exam, try to do both. If the value is inside the range of the data, you can read it off your graph; otherwise substitute it into the regression equation.
If the question says to use the graph, read the value off the diagram. Otherwise just use algebra.
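The "inside or outside the data" rule can be checked mechanically by comparing the new x value with the smallest and largest x values in the data. A sketch, assuming a made-up regression line y = 2.2 + 0.6x fitted to data whose x values run from 1 to 5; the function name predict is hypothetical.

```python
def predict(x_new, a, b, x_data):
    """Predict y from y = a + bx and label the prediction (hypothetical helper)."""
    y_hat = a + b * x_new
    if min(x_data) <= x_new <= max(x_data):
        kind = "interpolation"   # inside the range of the x values
    else:
        kind = "extrapolation"   # outside the range of the x values
    return y_hat, kind


x_data = [1, 2, 3, 4, 5]
for x_new in (3.5, 10):
    y_hat, kind = predict(x_new, a=2.2, b=0.6, x_data=x_data)
    print(f"x = {x_new}: y = {y_hat:.2f} ({kind})")
# x = 3.5: y = 4.30 (interpolation)
# x = 10: y = 8.20 (extrapolation)
```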
How do we check whether our predictions are reasonable?
Make each prediction both from the graph and from the regression equation, and check that the two answers agree.
How do you calculate r and R^2 on a scientific calculator?
Mode- [2] stat- [2] A+BX (insert data into table)
[AC]- [1] shift- [3] r
To find R^2, square this value.