Week 12 Flashcards

1
Q

Regression is?

A
  • More fiddly than other methods
  • Has more assumptions
2
Q

Why do Linear Regression?

A
  • Not looking at differences
  • Looking at relationships
  • Regression goes further than correlation - it allows us to make predictions
  • Produces a model that allows for sophisticated exploration of relationships between variables
3
Q

In Second Year Stats

A
  • Looked at relationships - Correlation
  • Differences between groups and within groups
  • Used t-tests and ANOVAs
  • Variation in the Dependent Variable
4
Q

Correlation

A

Allows us to estimate direction and strength of a linear relationship

5
Q

What questions can Linear Regression answer?

A
  • How well will a set of variables predict an outcome?
  • Which variable in a set of variables is the best predictor of an outcome?
  • Does a particular predictor variable predict an outcome if another variable is controlled for?
6
Q

Predictor Variable

A

The same as the Independent Variable in Regression

7
Q

Outcome Variable

A

The same as the Dependent Variable in Regression

8
Q

What is a Model?

A
  • An approximation to the actual data
  • A simple summary of the data
  • Makes the data easier to interpret and communicate
  • Allows us to predict data
9
Q

What is a Regression Model

A

Mathematically describes the linear relationship
* Y = b(X) + C
* Y = predicted values of the DV
* b (beta) = the slope of the line
* X = scores on the Predictor (IV)
* C = the intercept
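
A minimal sketch of fitting this model outside SPSS, in Python (the attendance/grade numbers are hypothetical, not from the lecture):

    import numpy as np
    from scipy import stats

    # Hypothetical data: classes attended (X) and grades (Y)
    attendance = np.array([2, 5, 6, 8, 10, 12])
    grades = np.array([40, 50, 55, 62, 70, 78])

    # Ordinary least squares fit of Y = b(X) + C
    fit = stats.linregress(attendance, grades)
    print(f"b = {fit.slope:.2f}, C = {fit.intercept:.2f}")

    # Predicted values of the DV for each case
    predicted = fit.slope * attendance + fit.intercept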

10
Q

The Intercept

A
  • The point where the function crosses the y-axis
  • Sometimes the regression model only becomes significant when we remove the intercept, and the regression line reduces to:
  • Y = b(X) + error
11
Q

Standardized beta (β)

A
  • Compares the strength of the effect of each IV on the DV
  • The higher the absolute value of the beta coefficient, the stronger the effect
12
Q

How Does Regression Work?

A
  • The outcome is modelled as a linear combination of other variables
  • The predictors don't always have to be continuous
  • Can use a combination of variables
  • Need to find the Line of Best Fit
13
Q

Line of Best Fit

A
  • Many possible lines can be produced by the regression formula
  • How do we know which line is best?
  • The best line minimises the difference between the observed values and the values predicted by the line
  • This difference is called error
  • In regression it is also called the residuals
    * Y = b(X) + C + error
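
A small sketch (Python, hypothetical numbers) of the idea: the line of best fit has a smaller sum of squared residuals than any other candidate line:

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    def sse(slope, intercept):
        # Sum of squared residuals: observed minus predicted, squared
        residuals = y - (slope * x + intercept)
        return np.sum(residuals ** 2)

    fit = stats.linregress(x, y)           # the line of best fit
    print(sse(fit.slope, fit.intercept))   # smallest possible value
    print(sse(2.5, 0.0))                   # an arbitrary line does worse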
14
Q

N (Cases): k (Predictors) Ratio

A
  • Assumption about sample size
  • Need a certain number of participants to trust the results
  • A simple linear regression assumption
  • The ratio of the number of cases (N) to the number of predictors (k)
  • The more predictors we have, the more cases we need (a common rule of thumb is N ≥ 50 + 8k)
15
Q

Checking Linearity

A
  • Checking for linearity requires scatterplots
  • Need scatterplots between the DV and each IV
  • Looking for evidence of non-linearity
16
Q

Check for Normality

A
  • Kolmogorov-Smirnov/Shapiro-Wilk: p > .05
  • Skewness & Kurtosis: normal if the z score is within ±1.96
  • Histogram: follows a bell curve
  • Detrended Q-Q plots: equal numbers of dots above and below the line
  • Normal Q-Q plots: normal if the dots hug the line
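
A hedged sketch of the first two checks in Python with scipy (simulated data; SPSS reports equivalent statistics via Explore):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    scores = rng.normal(loc=50, scale=10, size=100)  # hypothetical sample

    # Shapiro-Wilk: p > .05 is consistent with normality
    print(stats.shapiro(scores))

    # z-tests of skewness and kurtosis: |z| < 1.96 suggests normality
    print(stats.skewtest(scores))
    print(stats.kurtosistest(scores))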
17
Q

Check for Univariate Outliers

A
  • Identified on Box & Whisker Plots
  • Dots indicate outliers
  • Asterisk indicates extreme cases
  • Number tells you which case is the issue
18
Q

Reason Univariate Outliers are Problematic

A
  • Regression analysis gives the formula for a straight line
  • A data point that stands outside the other data points can change the slope of that line
  • This makes the line a poor predictor of the values of the other data points
19
Q

How to deal with Outliers

A
  1. Check whether the outlier is a data entry error and fix it
  2. Check whether the outlier is from a different population - this justifies removing their data
  3. Separate the outliers and run a different analysis on them
  4. Run the analysis with and without the outliers and report both models
  5. Winsorization - change the values so they're no longer outliers
  6. Use transformations or bootstrapping
20
Q

Winsorization

A
  • Change the score of a low outlier to the value of the 5th percentile
  • Change the score of a high outlier to the value of the 95th percentile
  • Slightly problematic because it changes the data
  • But it retains a case's relative extremeness without removing its data
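
A minimal sketch using scipy's winsorize (hypothetical scores; with only ten cases the limits are set to 10% so one value at each end actually changes - the 5th/95th-percentile rule above corresponds to limits=[0.05, 0.05] on larger samples):

    import numpy as np
    from scipy.stats.mstats import winsorize

    scores = np.array([3, 52, 54, 55, 57, 60, 61, 63, 64, 140])

    # Replace the lowest and highest 10% of scores with the nearest
    # retained values (3 becomes 52, 140 becomes 64)
    print(winsorize(scores, limits=[0.1, 0.1]))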
21
Q

Bootstrapping

A
  • An alternative to transformations for dealing with outliers
  • Creates new samples from your sample
  • Each new sample is drawn from your own data, with replacement
  • Does this repeatedly (typically thousands of times)
  • This builds a large set of estimates in which extreme values have much less influence
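
A minimal sketch of the resampling idea in Python (hypothetical data containing one outlier):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = np.array([4, 5, 5, 6, 7, 7, 8, 9, 10, 42])  # 42 is an outlier

    # Draw 10,000 bootstrap samples by resampling with replacement,
    # recording the mean of each one
    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(10_000)]

    # Empirical 95% confidence interval for the mean
    print(np.percentile(boot_means, [2.5, 97.5]))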
22
Q

Homoscedasticity

A
  • Means "same scatter" or "same variance"
  • The variance of the residuals is equal across all predicted scores on the Outcome Variable
23
Q

Check for Normality, Linearity and Homoscedasticity

A
  • We need the residuals to behave in a certain way
  • Residuals are the differences between the predicted scores and the actual scores on the outcome variable
  • SPSS generates a histogram and normal probability (P-P/Q-Q) plots of the residuals
24
Q

Dealing with Heteroscedasticity

A
  • Check the residual scatterplot in SPSS
  • Check the plot for any patterns
  • If the dots are scattered randomly, then we are all good (see the sketch below)
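
A hedged sketch of this check outside SPSS (simulated homoscedastic data; assumes matplotlib and numpy are available):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=100)
    y = 2 * x + rng.normal(scale=2, size=100)  # errors with constant spread

    # Fit a line, then plot residuals against predicted values
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    plt.scatter(predicted, y - predicted)
    plt.axhline(0)
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")  # a patternless cloud = homoscedastic
    plt.show()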
25
Q

If Regression Assumptions are violated

A
  • Check the normality of the predictors; if you fix these, heteroscedasticity can disappear
  • Use a transformation on the Outcome Variable
  • Consider using a different method, like Weighted Least Squares Regression
  • Use some kind of Non-Linear Regression
26
Q

Null Hypothesis for Regression

A
  • The slope of the regression line will be equal to 0
  • β = 0
27
Q

Alternative Hypothesis

A
  • The slope of the regression line will not be zero
  • β ≠ 0
28
Q

Running Linear Regression

A

1. Analyse
2. Regression
3. Linear
4. Move your DV into the Dependent box
5. Move your IV into the Independent box
6. OK
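
For comparison, a sketch of the same analysis in Python with statsmodels (grade and attendance are hypothetical variable names):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "attendance": [2, 5, 6, 8, 10, 12],
        "grade": [40, 50, 55, 62, 70, 78],
    })

    # DV on the left of ~, IV(s) on the right
    model = smf.ols("grade ~ attendance", data=df).fit()
    print(model.summary())  # R-squared, ANOVA F test, coefficients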

29
Q

Linear Regression - R value

A
  • In simple regression, the same as Pearson's correlation (the r value)
  • Tells us the strength and direction of the relationship
30
Q

Linear Regression R Square Value

A
  • Tells us the amount of variance in the DV explained by the IV
  • The proportion of variance that can be explained by the variable
  • In the lecture example, 23% of the variability in grades was explained by attendance
  • Known to overestimate the explained variance
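
An arithmetic aside (the correlation value here is inferred from the 23% figure, not quoted from the lecture): since R² is the square of R, 23% explained variance corresponds to R = √.23 ≈ .48.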
31
Q

Linear Regression - R Square Adjusted

A
  • Adjusted R² is slightly smaller than R²
  • Corrects the bias of the overestimated explained variance
  • Useful as a Goodness of Fit statistic
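
The standard correction (the general OLS formula, not something specific to this deck) is Adjusted R² = 1 - (1 - R²)(N - 1) / (N - k - 1), where N = cases and k = predictors. With hypothetical values N = 50, k = 1 and R² = .23, this gives 1 - (.77 × 49/48) ≈ .21.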
32
Q

Goodness of Fit Statistic

A
  • Determines how well sample data fit a distribution from a normal population
  • Helps determine whether a sample is skewed or normal relative to the actual population
33
Q

Regression ANOVA

A
  • Reports the df, the F value and the p value
  • Compares the error around the line of best fit with the error around the baseline model (a slope of 0)
  • The ANOVA is significant if the model is "better" than the baseline
34
Q

Unstandardised Coefficient

A
  • The slope of the regression equation
  • The amount of change in the Dependent Variable for a one-unit change in the Independent Variable
  • This is the b coefficient (B in SPSS output)
    e.g. each unit of attendance is associated with a 1.88-unit increase in grades (see the worked example below)
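
A quick worked example with the slope quoted above (b = 1.88): a student attending 10 more classes is predicted to score 10 × 1.88 = 18.8 more grade units.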
35
Q

Coefficient t-tests

A
  • Check if IV is a significant predictor of the DV
  • Become more relevant when we start adding more predictors
36
Q

Standardised Coefficients

A
  • A measure of the effect size
  • Useful for Multiple Regression
  • Important when we have more than one Predictor
  • Predictors are often measured on different scales
  • e.g. IQ points, classes attended, additional study time
37
Q

Dealing with Multiple Predictors

A
  • Most commonly found in Research Projects
  • Allows us to predict the outcome variable from more than one predictor
  • Answers how well a combination of predictors predicts the outcome (see the sketch below)
    Y = b1(X1) + b2(X2) + C + error
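
A minimal sketch in Python with statsmodels (y, x1 and x2 are hypothetical variables):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "y":  [10, 14, 15, 18, 22, 24, 27, 30],
        "x1": [1, 2, 2, 3, 4, 4, 5, 6],
        "x2": [3, 2, 4, 4, 5, 6, 6, 8],
    })

    # Y = b1(X1) + b2(X2) + C + error
    model = smf.ols("y ~ x1 + x2", data=df).fit()
    print(model.params)  # b1, b2 and the intercept C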
38
Q

Univariate Outliers

A

Outlier on one variable

39
Q

Multivariate Outlier

A

Outlier on a combination of variables

40
Q

Assumptions with Regression

A
  • Normality
  • Univariate Outliers
  • Multivariate Outliers
  • Multicollinearity
  • Normality, Linearity & Homoscedasticity of residuals
41
Q

Multicollinearity

A
  • Two or more IVs are highly correlated in a regression
  • One IV can be predicted from another IV in the regression model
42
Q

How to check for Multivariate Outliers

A

Mahalanobis Distance

43
Q

Mahalanobis Distance

A
  • The largest value should not be greater than the critical χ² value for df = k at α = .001
  • Where k = the number of predictors
  • Use the same table as for Cook's Distance
  • For simplicity, use a critical χ² table (or compute the critical value directly, as in the sketch below)
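
A hedged sketch of the check in Python (hypothetical predictor scores; SPSS reports the Mahalanobis maximum for you in the Residuals Statistics table):

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical predictor matrix: rows = cases, columns = k predictors
    X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.5],
                  [4.0, 3.0], [5.0, 5.5], [6.0, 5.0]])
    k = X.shape[1]

    # Squared Mahalanobis distance of each case from the centroid
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.sum(diff @ inv_cov * diff, axis=1)

    critical = chi2.ppf(1 - 0.001, df=k)  # critical chi-square at alpha = .001
    print(d2.max(), critical, d2.max() > critical)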
44
Q

Cooks Distance

A
  • Tells you whether there are cases that influence the regression line
  • Use the same table as for Mahalanobis Distance
  • The rule of thumb is that if Cook's D is > 1, you have influential cases
  • Dealt with in the same way as Univariate Outliers
45
Q

Check for Multicollinearity

A
  • Pearson's correlations between the IVs
  • If r > .85 then there is multicollinearity
  • Tolerance: values < .1 are multicollinear; < .2 warrant a closer look
  • VIF: values > 10 are clearly multicollinear; > 5 warrant a closer look
  • If you find a problem, remove the offending variable
  • If two IVs are that closely related, they are basically measuring the same thing - treat them as one variable
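
A sketch of computing VIF and tolerance in Python with statsmodels (simulated predictors, with x2 deliberately built as a near-copy of x1):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    df = pd.DataFrame({"x1": x1,
                       "x2": x1 + rng.normal(scale=0.1, size=100),
                       "x3": rng.normal(size=100)})

    X = sm.add_constant(df)  # include the intercept, as the regression does
    for i, name in enumerate(df.columns, start=1):
        vif = variance_inflation_factor(X.values, i)
        print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
    # x1 and x2 should come out with VIF >> 10 (tolerance < .1)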
46
Q

Check for Multivariate Outliers

A
  • Use the Residuals Statistics table
  • First, find the critical χ² for a model with 4 predictors: χ² = 18.467 (from the Mahalanobis Distance table)
  • Compare it with the Mahal. Distance Maximum (13.803 here)
  • 13.803 < 18.467, therefore there are no multivariate outliers
  • Cook's D is < 1, so there are no influential cases
47
Q

Interpreting Multiple Regression

A
  • Use the Variables Entered/Removed table - it tells you how many predictors are in the model (4 here)
  • Then the Model Summary table
  • R is not just Pearson's r anymore
  • It is the correlation between the actual scores and the predictions from the regression equation
  • R² = the proportion of variance in the DV accounted for by the combined predictors
  • Again, Adjusted R² is a corrected version of R² that accounts for the positive bias
48
Q

Interpreting Multiple Regression ANOVA

A
  • Now tests the combination of predictors
  • In the lecture example, the combination was a significant predictor of GHQ
  • The table has the df, the F value, and the p value
49
Q

Interpreting Multiple Regression Coefficients

A
  • The unstandardised coefficient is the slope of the regression
  • Shows that each unit increase in one of the independent variables is associated with a b-unit increase in GHQ
  • All other IVs are held constant
  • Beta values = standardised regression coefficients
  • These allow direct comparison of regression coefficients
  • Displayed in units of standard deviation
50
Q

Interpreting Multiple Regression Standardised Coefficients

A
  • t-values and p-values test the significance of the unique contribution of each predictor
  • Changes depending on predictors included in the model.
51
Q

Multiple Regression Tolerance & VIF

A
  • Tolerance: values < .1 are multicollinear; < .2 warrant closer inspection.
  • VIF: values > 10 are clearly multicollinear; > 5 warrant closer inspection.
52
Q

Remove Non-Significant Predictors

A
  • A predictor that is not explaining anything makes the model worse
  • Removing it changes the numbers slightly
  • Keep only significant predictors in the model
53
Q

Applied look at Regression Equation

A
  • Our general form for the regression is:
    Y = b1(X1) + b2(X2) + b3(X3) + C + error
  • And if we take this equation and substitute in our variables we get:
    GHQ = b1(neuroticism) + b2(state-anxiety) + b3(trait-anxiety) + C + error

GHQ = .555(Neuroticism) + .318(state-anxiety) + .471(trait-anxiety) + 13.552 + error
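
A worked plug-in of this equation (the three predictor scores below are hypothetical; the coefficients are from the card above):

    # Hypothetical scores: neuroticism = 20, state anxiety = 30, trait anxiety = 25
    ghq = 0.555 * 20 + 0.318 * 30 + 0.471 * 25 + 13.552
    print(ghq)  # 11.1 + 9.54 + 11.775 + 13.552 = 45.967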

54
Q

What is the value for R?

A
  • The correlation between the DV & IV
  • A value greater than 0.4 is typically taken as worth further analysis
55
Q

What does R tell us?

A

The strength & direction of the relationship

56
Q

What does the value of R² Adjusted tell the researcher?

A
  • Tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable.