Week 18 Flashcards
Correlation tells us about the strength of association between variables but… ?
Limitations of correlation
We cannot make statements about cause and effect from correlation
Correlation tells us about linear relationships between variables but…?
Limitations of correlation
Many variables can be strongly related but the nature of this relationship is nonlinear
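As an illustration of this limitation, here is a minimal sketch in which y is completely determined by x, yet the Pearson coefficient comes out as zero because the relationship is a curve rather than a straight line. The data and the helper name `pearson_r` are made up for this example and are not taken from the course slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: x runs symmetrically from -5 to 5
xs = list(range(-5, 6))

# y depends perfectly on x, but through a U-shaped curve
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# For comparison: an exact straight-line relationship
r_line = pearson_r(xs, [2 * x + 1 for x in xs])

print(f"r for y = x^2:    {r_curve:.3f}")
print(f"r for y = 2x + 1: {r_line:.3f}")
```

A coefficient near zero therefore only rules out a straight-line trend, not a relationship in general, which is why plotting the data first is worthwhile.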
Limitations of correlation: are correlation and significance the same thing?
No, correlation and significance are not the same thing. A sufficiently large sample/study can find even a small correlation to be statistically significant, although an association that weak is probably of little real importance
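A rough sketch of this point: the same weak correlation is tested at two sample sizes. It assumes the usual test statistic t = r * sqrt((n - 2) / (1 - r^2)) and, to keep the example dependency-free, approximates the t distribution with a normal distribution (reasonable for large n, only rough for small n). The function name `approx_p_value` and the numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for the null hypothesis of no correlation.

    Assumption: the t statistic is compared against a standard normal
    distribution instead of a t distribution with n - 2 degrees of freedom.
    """
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = 0.05  # a weak correlation: r^2 = 0.0025, i.e. 0.25% of variance shared

p_small = approx_p_value(r, 30)      # small study
p_large = approx_p_value(r, 10_000)  # very large study

print(f"n = 30:     p is roughly {p_small:.3g}")
print(f"n = 10000:  p is roughly {p_large:.3g}")
```

The correlation is equally weak in both cases; only the sample size changes, and with it whether the result crosses a conventional significance threshold.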
When is there a correlation?
What are the two types?
When there is a strong relationship between two variables
Linear, curvilinear
What is correlation measured by?
Using a correlation coefficient.
“ρ” (rho) is used to describe correlations in population data, and “r” is used when describing sample data
Interpreting the size of the correlation coefficient is almost always…?
Subjective and dependent on context
What is the most common measure of correlation?
Pearson Correlation Coefficient
(Equation on L8 BIOL143 slide 6)
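For reference, the sample Pearson coefficient is commonly written in the form below. This is the standard textbook expression and its notation may differ from the slide; here n is the number of paired observations and x̄, ȳ are the sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```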
What is a model?
A cartoon (simplification) of reality.
It can be useful to simplify things so that you can focus on interesting/important details while removing some of the complexity in real systems
One type of exploratory data analysis tool we could use is: Empirical modelling. What is this?
“looking at your data and then thinking about what it looks like”
e.g. does a straight line fit the data? What kind of correlation is it? etc.
Regression terminology: Explanatory variables?
Variables determined by the experimenter
- Synonyms: Covariables; independent variable; regressor; predictor…
Regression terminology: Response variables?
Variables that change as a result of changes in the explanatory variable(s)
- Synonyms: outcome variable; dependent variable; measured variable, etc…
What is the ordinary least squares model?
The model we are using here: it finds the line of best fit by minimising the sum of the squared errors.
Line of best fit: y = mx + c
What is the regression equation?
Y = α + βX
Y = dependent variable
α = population Y-intercept
β = population slope (gradient)
X = independent variable
The regression equation in the least squares model?
- We want to choose values for α and β that minimise the sum of the squared errors (black vertical lines)
- Each error is the difference between an observed value and the value predicted by the model. These black lines/errors are also known as residuals
- Building a model this way finds the line with the smallest total squared divergence (error) between the observations and the model/trendline.
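The least squares idea above can be sketched in a few lines. This is a minimal illustration on made-up data, assuming a single explanatory variable and the standard closed-form estimates (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The names `fit_ols` and `sum_squared_errors` are hypothetical helpers for this example.

```python
def fit_ols(xs, ys):
    """Return (alpha, beta) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                 # estimated slope
    alpha = mean_y - beta * mean_x   # estimated intercept
    return alpha, beta

def sum_squared_errors(xs, ys, alpha, beta):
    """Sum of squared residuals (observed minus predicted) for a given line."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations that lie close to a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_ols(xs, ys)
best_sse = sum_squared_errors(xs, ys, alpha, beta)

# Moving either coefficient away from the fitted value increases the error
worse_slope_sse = sum_squared_errors(xs, ys, alpha, beta + 0.1)
worse_intercept_sse = sum_squared_errors(xs, ys, alpha + 0.1, beta)

print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, SSE = {best_sse:.3f}")
print(f"SSE with slope nudged:     {worse_slope_sse:.3f}")
print(f"SSE with intercept nudged: {worse_intercept_sse:.3f}")
```

Comparing the fitted line against the nudged lines shows what "least squares" means in practice: any other choice of α or β gives a larger sum of squared residuals.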
What is regression analysis?
A method to assess and quantify the relationship between one variable (the dependent variable) and one (or more) independent variables.
What is linear regression used for?
Used when there is a linear (straight-line) relationship between the dependent and independent variables
- Typically to calculate an R-Squared (goodness-of-fit) measure
What does an R squared measure do (regression)?
Tells us the proportion of variability in the response that is accounted for by the model
R^2 = 1 –> line perfectly explains data (rarely, if ever, achieved with real data)
R^2 = 0 –> model explains no variation
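To make these two extremes concrete, here is a short sketch that fits a least squares line and computes R^2 as 1 - SS_residual / SS_total. The data sets and the function name `r_squared` are invented for the example; only the formula itself is standard.

```python
def r_squared(xs, ys):
    """Fit a least squares line to (xs, ys) and return its R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    alpha = mean_y - beta * mean_x
    # Variation left over after fitting the line
    ss_residual = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the response around its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

xs = [-2, -1, 0, 1, 2]

# Points exactly on a line: the model accounts for all of the variation
r2_line = r_squared(xs, [3 + 2 * x for x in xs])

# Hypothetical noisy measurements scattered around a line
r2_noisy = r_squared(xs, [-1.2, 1.1, 2.7, 5.3, 6.9])

# Symmetric curve: the best straight line is flat and explains nothing
r2_curve = r_squared(xs, [x ** 2 for x in xs])

print(f"exact line: R^2 = {r2_line:.3f}")
print(f"noisy line: R^2 = {r2_noisy:.3f}")
print(f"curve:      R^2 = {r2_curve:.3f}")
```

The curved case is a reminder that a low R^2 for a straight-line model does not mean the variables are unrelated, only that a line is a poor description of the relationship.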