Week 12 Flashcards
Regression is?
- More Fiddly than other methods
- Has more assumptions
Why do Linear Regression
- Not looking at differences
- Looking at relationships
- Regression goes further than correlation - Allows us to make predictions
- Produces a model that allows for sophisticated exploration of relationships in variables
In Second Year Stats
- Looked at relationships - Correlation
- Differences Between Groups and Within-Groups
- Used t-tests and ANOVAs
- Variation in Dependent Variable
Correlation
Allows us to estimate direction and strength of a linear relationship
Why do Linear Regression
- How well will a set of variables predict an outcome?
- Which variable in a set of variables is the best predictor of an outcome?
- Does a particular predictor variable predict an outcome if another variable is controlled for?
Predictor Variable
Same as Independent Variable in Regression
Outcome Variable
The same as the Dependent Variable in Regression
What is a Model?
- An approximation to the actual data
- simple summary of data
- Makes data easier to interpret, communicate
- Allows us to predict data
What is a Regression Model
Mathematically Describes the linear relationship
* Y = Beta X + C
* Y = Predicted values of the DV
* Beta = The slope of the line
* X = Scores on the Predictor (IV)
* C = The Intercept
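The formula above can be sketched numerically. A minimal Python illustration (not from the lecture) using made-up X and Y scores; NumPy's `polyfit` with degree 1 recovers the slope (Beta) and intercept (C):

```python
import numpy as np

# Hypothetical data: X = predictor (IV) scores, Y = outcome (DV) scores
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Degree-1 polyfit returns the least-squares slope (Beta) and intercept (C)
beta, c = np.polyfit(X, Y, 1)
print(f"Y = {beta:.2f}(X) + {c:.2f}")
```

A prediction for a new score is then just `beta * x_new + c`.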
The Intercept
- Point where the function crosses the y-axis.
- Sometimes the regression model only becomes significant when we remove the intercept, and the regression line reduces to:
- Y = b(X) + error
Standardized beta (β)
- Compares the strength of the effect of each IV on the DV
- The higher the absolute value of the beta coefficient, the stronger the effect
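One way to see where the standardized β comes from: z-score both variables and refit. A hedged sketch with invented data; in simple regression the standardized β equals Pearson's r:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# z-score both variables; the slope of the standardized fit is beta (β)
zx = (X - X.mean()) / X.std(ddof=1)
zy = (Y - Y.mean()) / Y.std(ddof=1)
beta_std = np.polyfit(zx, zy, 1)[0]

# With a single predictor, β equals Pearson's correlation r
r = np.corrcoef(X, Y)[0, 1]
```

Because both variables are on the same (z-score) scale, βs from different predictors become comparable.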
How Does Regression Work?
- The predictors in the linear combination don't always have to be continuous
- Can have a combination of several variables
- Need to find the Line of Best Fit
Line of Best Fit
- Many lines produced with Regression Formula
- How do we know what line is best?
- Minimises the difference between observed values and the values predicted by the line
- This is called error
- In regression also called residuals
* Y = b(X) + C + error
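The "best" claim above can be checked numerically: the least-squares line gives a smaller sum of squared errors (residuals) than any other line. A small sketch with made-up data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])
b, c = np.polyfit(X, Y, 1)

# residuals (error) = observed - predicted
residuals = Y - (b * X + c)
sse = np.sum(residuals ** 2)

# Any other line (e.g. nudging the slope) gives a larger sum of squared errors
sse_other = np.sum((Y - ((b + 0.5) * X + c)) ** 2)
```

With an intercept in the model, the residuals also sum to (essentially) zero.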
N (Cases): k (Predictors) Ratio
- Assumption about sample size
- Need a certain number of participants to trust validity
- Simple Linear Regression Assumption
- Based on the ratio of the number of cases (N) to the number of predictors (k)
- The more Predictors we have the more cases we need for the study
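The card doesn't give an exact ratio. One common rule of thumb (from Tabachnick & Fidell; an assumption added here, not stated in the lecture) is N ≥ 50 + 8k for testing the overall model:

```python
def min_cases(k: int) -> int:
    """Rule-of-thumb minimum N for k predictors: N >= 50 + 8k.
    This specific formula is an assumption, not from the flashcards."""
    return 50 + 8 * k

print(min_cases(1), min_cases(3), min_cases(10))  # more predictors -> more cases
```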
Checking Linearity
- Checking for linearity requires scatter plots
- Need scatterplots between the DV and each IV
- Looking for evidence of a non-linear relationship
Check for Normality
- Kolmogorov-Smirnov/Shapiro-Wilk: p > .05
- Skewness & Kurtosis: z scores within ±1.96 indicate normality
- Histogram follows a bell curve.
- Detrended Q-Q Plots: Equal amounts of dots above and below the line.
- Normal QQ Plots: Normal if dots hugging the line.
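These checks can be run in Python as well as SPSS. A sketch using SciPy on simulated (made-up) scores: Shapiro-Wilk gives the p-value, and an approximate z-score for skewness uses SE ≈ √(6/N):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=60, scale=10, size=100)  # simulated, roughly normal data

w_stat, p = stats.shapiro(scores)             # p > .05 suggests normality
skewness = stats.skew(scores)
z_skew = skewness / np.sqrt(6 / len(scores))  # approx z; |z| < 1.96 suggests normal
```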
Check for Univariate Outliers
Week 12 Part 2 - 10:00
- Identified on Box & Whisker Plots
- Dots indicate outliers
- Asterisk indicates extreme cases
- Number tells you which case is the issue
Reason Univariate Outliers are Problematic
- Regression Analysis gives formula for a straight line
- A data point that stands outside other data points can change the slope of your straight line
- This makes the line a poor predictor of the value of other data points
How to deal with Outliers
- Check if Outlier is a data entry error and fix it
- Check if outlier is from different population - Justifies removing their data
- Separate outliers and run different analysis
- Run Analysis with and without outliers and report both models
- Winsorization - Change values so they’re not Outliers anymore
- Use transformations or Bootstrapping
Winsorization
- Change the score of outlier to value of 5th percentile for minimum values
- Change the score of outlier to value of 95th percentile for maximum values
- Slightly problematic because it changes the data
- But this retains extremeness without removing outlier data
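Winsorizing at the 5th/95th percentiles is essentially one line with NumPy's `clip`. A sketch on invented scores with one low and one high outlier:

```python
import numpy as np

scores = np.array([3.0, 52.0, 55.0, 58.0, 60.0, 61.0, 63.0, 65.0, 67.0, 120.0])

lo, hi = np.percentile(scores, [5, 95])  # 5th and 95th percentile cut-offs
winsorized = np.clip(scores, lo, hi)     # outliers pulled in, order preserved
```

The extreme cases remain the most extreme scores in the data set, just less extreme, which is the "retains extremeness" point above.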
Bootstrapping
- An alternative to transformations for dealing with outliers
- Creates samples from your sample
- Resamples cases, with replacement, from your observed data set
- Does this repeatedly (often thousands of times)
- This builds a distribution of estimates in which extreme values carry less weight
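A minimal sketch of a non-parametric bootstrap (resampling cases with replacement) applied to the regression slope, using invented data with one outlier:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([50.0, 54.0, 57.0, 61.0, 63.0, 68.0, 70.0, 95.0])  # 95 is an outlier

slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(X), size=len(X))  # resample cases with replacement
    if np.ptp(X[idx]) == 0:                     # skip degenerate resamples
        continue
    slopes.append(np.polyfit(X[idx], Y[idx], 1)[0])

# Percentile confidence interval for the slope from the bootstrap distribution
ci_lo, ci_hi = np.percentile(slopes, [2.5, 97.5])
```

Resamples that happen to include the outlier several times, or not at all, spread the slope estimates out, so the interval reflects the outlier's influence rather than hiding it.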
Homoscedasticity
- Means Same Scatter or Same Variance
- The variance of the residuals is roughly equal across all predicted scores on the Outcome Variable
Check for Normality, Linearity and Homoscedasticity
- We need the residuals to behave in a certain way
- Residuals are the difference between the observed scores on the outcome variable and the scores predicted by the model
- SPSS generates a histogram and Q-Q plots of the residuals
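The quantities behind those SPSS plots can be computed directly. A sketch (invented data) of the residuals and standardized residuals that the histogram and Q-Q plot are built from:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])
b, c = np.polyfit(X, Y, 1)

predicted = b * X + c
residuals = Y - predicted                      # observed minus predicted
std_resid = residuals / residuals.std(ddof=1)  # standardized residuals
# The histogram is of std_resid; the Q-Q plot compares its quantiles to
# normal quantiles. Here we just confirm the residuals centre on zero.
```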