Lecture 8 Flashcards
Correlation, Regression and Modelling
2 Types of Correlation:
Linear and Curvilinear
What is Correlation?
A measure of the strength and direction of the relationship between two variables
How is strength of correlation measured?
With a correlation coefficient. “ρ” (rho) describes the correlation in population data; “r” describes the correlation in sample data
Most common measure of correlation
Pearson Correlation Coefficient
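As a rough illustration of how the sample coefficient r can be computed, here is a minimal sketch in plain Python. The function name `pearson_r` and the hours/marks data are made up for this example and are not from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: hours studied vs. exam mark
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]
r = pearson_r(hours, marks)
print(round(r, 3))  # close to +1: strong positive linear relationship
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak linear relationship.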
Limitations of Correlation
Cannot make statements about cause and effect from correlation alone. Correlation and statistical significance are not the same thing. Correlation only describes linear relationships between variables, BUT many variables are strongly related in a nonlinear way (e.g. the effort to go from 0% to 20% on an exam is not the same as the effort to go from 60% to 80%)
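The nonlinear limitation can be seen with a small sketch. Here y is completely determined by x (y = x squared), yet the Pearson coefficient comes out as zero because the relationship is not linear. The helper `pearson_r` and the data are illustrative assumptions, not part of the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (illustrative helper)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]      # y depends entirely on x, but not linearly
r_curved = pearson_r(xs, ys)
print(r_curved)                # zero: no linear relationship detected
```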
Type of modelling used to display data
Empirical modelling
Explanatory variables
Determined by the experimenter. Also known as: independent variable, regressor, predictor. Plotted on the horizontal axis.
Response variables
Changes as a result of changes to the explanatory variable. Also known as: outcome variable, dependent variable, measured variable. Plotted on the vertical axis.
Regression Equation:
Y = α̂ + β̂X
In the Least Squares model, we want to choose values for α and β that minimise the sum of the squared errors
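For a single explanatory variable, the least squares estimates have a standard closed form: the slope is the sum of products of deviations divided by the sum of squared x deviations, and the intercept follows from the means. A minimal sketch, with function names and data invented for illustration:

```python
def fit_least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta_hat = s_xy / s_xx                  # slope
    alpha_hat = mean_y - beta_hat * mean_x  # intercept
    return alpha_hat, beta_hat

def sum_squared_errors(xs, ys, alpha, beta):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical measurements
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha_hat, beta_hat = fit_least_squares(xs, ys)
best = sum_squared_errors(xs, ys, alpha_hat, beta_hat)
# Nudging the intercept away from the estimate makes the fit worse
worse = sum_squared_errors(xs, ys, alpha_hat + 0.1, beta_hat)
print(best < worse)
```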
What are residuals?
Represent the difference between the observed values and the predicted values
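A small sketch of the calculation, assuming a straight-line model whose coefficients are already known. The coefficient values and data below are hypothetical and chosen only to keep the arithmetic simple; they are not claimed to be a least squares fit.

```python
def residuals(xs, ys, alpha_hat, beta_hat):
    """Observed value minus predicted value for each data point."""
    return [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]

# Hypothetical data and hypothetical coefficients
xs = [0, 1, 2, 3]
ys = [1.5, 2.5, 5.5, 6.5]
res = residuals(xs, ys, alpha_hat=1.0, beta_hat=2.0)
print(res)  # [0.5, -0.5, 0.5, -0.5]
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.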
Regression Analysis Limit
Upper and lower limits may exist; the model is usually only useful for predictions within the measured range. Take care when extrapolating!
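One way to make the extrapolation warning concrete is to have the prediction step check whether the new x value falls inside the measured range. This is only a sketch: the function `predict`, its parameters and the numbers used are assumptions made for illustration.

```python
import warnings

def predict(x_new, alpha_hat, beta_hat, x_min, x_max):
    """Predict Y from a fitted line, warning when x_new is outside the data range."""
    if not (x_min <= x_new <= x_max):
        warnings.warn(
            f"x = {x_new} is outside the measured range [{x_min}, {x_max}]; "
            "this prediction is an extrapolation",
            stacklevel=2,
        )
    return alpha_hat + beta_hat * x_new

# Hypothetical fitted line Y = 1 + 2X, measured for X between 0 and 5
inside = predict(3.0, alpha_hat=1.0, beta_hat=2.0, x_min=0.0, x_max=5.0)
print(inside)  # 7.0
```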
Regression Analysis
A method to assess and quantify the relationship between one variable (the dependent variable) and one (or more) independent variables
Linear Regression
Used when there is a linear relationship between the variables (dependent and independent). Goodness of fit is typically summarised with an R-squared measure
R-Squared Measure
Tells us the proportion of variability in the response that is accounted for by the model.
R-squared = 1: line perfectly explains the data (almost never happens with real data)
R-squared = 0: model explains no variation
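The two extremes can be checked with a short sketch that uses the usual definition, R-squared = 1 - (residual sum of squares / total sum of squares). The function name and data are illustrative assumptions.

```python
def r_squared(ys, predictions):
    """Proportion of variability in ys accounted for by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total
    return 1 - ss_res / ss_tot

ys = [1.0, 3.0, 5.0, 7.0]
perfect = r_squared(ys, [1.0, 3.0, 5.0, 7.0])    # predictions hit every point
mean_only = r_squared(ys, [4.0, 4.0, 4.0, 4.0])  # always predict the mean of ys
print(perfect, mean_only)  # 1.0 0.0
```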