Week 7 Regression Flashcards

To provide an overview of the lecture on Regression

1
Q

What is the correlation coefficient?

A

The correlation coefficient is an index of the degree of association between two variables, typically Pearson's r or a related product-moment correlation.
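
As a minimal sketch of how Pearson's r can be computed by hand, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squares. The function name and the data (hours studied and exam score) are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data, not taken from the lecture
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 3))
```

A value close to +1 indicates a strong positive association; reversing the order of one variable flips the sign.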

2
Q

What do Bivariate regression and Multiple Regression do, and how do they differ?

A
  • Bivariate regression allows for the prediction of one variable from another variable
  • Multiple regression is an extension of bivariate regression, where the relationship is determined between a single DV (criterion) and multiple predictors
3
Q

Why would you use correlational research?

A

Some variables of interest to behavioural scientists, such as personality traits and sex, do not lend themselves to an experimental design because they cannot be manipulated. However, most variables that cannot be studied experimentally can be studied correlationally.

4
Q

Remind me; what is a positive, negative and neutral relationship?

A

  • Positive relationship: the two variables move in the same direction, e.g., as height increases, weight generally increases too.
  • Negative relationship: the two variables move in opposite directions, e.g., alcohol intake & driving skill
  • Neutral relationship: no relationship (a flat line on a scatterplot), e.g., driver’s shoe size & the number of kilometres travelled.
5
Q

What exactly does Bivariate or Linear Regression achieve?

A
  • Bivariate regression enables an equation to be developed to predict one variable (Y) from the other (X)
  • The regression coefficient is the value by which the score on the predictor variable is multiplied to predict the score on the criterion variable
6
Q

A correlation coefficient identifies the strength of the association between two variables. However, different levels of measurement (nominal, ordinal, and interval/ratio scale) require different analytical techniques.
What choices do I have?

A
  • Continuous measures (ratio or interval) generally use Pearson’s r, the most common parametric correlational analysis, which measures the direction and strength of a linear relationship.
  • Spearman’s rho and Kendall’s tau are non-parametric correlational analyses for ordinal (ranked) data
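
To illustrate the difference between the two families of coefficient, here is a rough sketch in which Spearman's rho is obtained by applying the Pearson formula to ranks (ties receive the average rank). The helper names and the dose/response data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from deviations around the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the ranked scores."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: the relationship is monotonic but not linear
dose = [10, 20, 30, 40, 50]
response = [1, 4, 9, 16, 100]
r = pearson_r(dose, response)
rho = spearman_rho(dose, response)
print(round(r, 3), round(rho, 3))
```

Because the scores rise together in a perfectly monotonic but non-linear way, rho reaches 1 while Pearson's r falls short of it.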
7
Q

Why is bivariate regression easier to interpret than multiple regression?

A
  • Bivariate regression is easier to interpret because it involves only 2 variables
  • In Multiple Regression, any degree of inter-correlation among the IVs can lead to ambiguous output
8
Q

How are Regression relationships best represented?

A
  • Relationships are best shown by a scatterplot
  • Only when the relationship is linear is it appropriate to run further correlational &/or regression analysis in SPSS
  • Venn diagrams best visually represent the strength of the prediction in a relationship.
  • The Coefficient of Determination is r², which indicates the proportion of variance in one variable predicted by the other
9
Q

My IVs are not inter-correlated, how do I interpret my output?

A
  • When IVs are uncorrelated, simply add their individual contributions
  • To find R², one simply adds the r² values (each IV's squared correlation with the DV)
10
Q

So when my IV’s are not intercorrelated I can simply add their individual coefficients; what happens when they are intercorrelated?

A
  • When IVs are correlated, the variance they explain overlaps, so it is no longer just an additive process
  • In Multiple Regression we look only at the unique variance explained by each IV.
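
The contrast between uncorrelated and intercorrelated IVs can be sketched with a tiny made-up data set. The code below fits a two-predictor least-squares model directly from sums of squares and cross-products, then compares the model R² with the simple sum of the two r² values. All names and numbers are illustrative assumptions.

```python
def centred(values):
    """Deviations of each score from the variable's mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def r_squared_single(x, y):
    """Squared bivariate correlation between one IV and the DV."""
    dx, dy = centred(x), centred(y)
    return dot(dx, dy) ** 2 / (dot(dx, dx) * dot(dy, dy))

def r_squared_two(x1, x2, y):
    """R-squared for a least-squares model with two IVs (normal equations)."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2  # a zero determinant would mean singularity
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / dot(dy, dy)

# Hypothetical scores
y = [1, 2, 4, 7]
x1 = [-1, -1, 1, 1]
x2_uncorrelated = [-1, 1, -1, 1]  # correlation with x1 is exactly 0
x2_correlated = [-1, 0, 0, 1]     # overlaps with x1

additive_uncorr = r_squared_single(x1, y) + r_squared_single(x2_uncorrelated, y)
model_uncorr = r_squared_two(x1, x2_uncorrelated, y)
additive_corr = r_squared_single(x1, y) + r_squared_single(x2_correlated, y)
model_corr = r_squared_two(x1, x2_correlated, y)

print(round(additive_uncorr, 3), round(model_uncorr, 3))
print(round(additive_corr, 3), round(model_corr, 3))
```

With uncorrelated IVs the two figures agree. With correlated IVs the simple sum counts the shared variance twice and can even exceed 1, which is why only unique variance is credited to each IV.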
11
Q

If I have inter-correlated IVs in multiple regression, is there another way to interpret the output?

A
  • Venn diagrams best visually represent the strength of the prediction in a relationship
  • Look at the residuals, which are the differences between the actual Y scores & the predicted Y scores.
  • The larger the R, the smaller the combined residuals
12
Q

When do we use inferential statistics?

A
  • When we want to draw inferences about populations from information available in samples.
  • We can describe the relationship (association) between 2 variables through the direction and strength of those 2 variables.
  • Inferential statistics are used when we want to test a hypothesis about that relationship: the degree of association found in the sample is compared to a critical value in a table, and the result is generalised (inferred) to the population.
  • The hypothesis is tested to assess whether a significant relationship is identified between 2 given variables.
13
Q

I have heard there are different terms instead of DV, IV, and so on; what are some of these terms?

A
  • IVs are called predictor variables
  • The DV is known as the criterion variable

*Remember: Multiple Regression does NOT establish causation

14
Q

What are r, R, and R²?

A
  • Bivariate correlation coefficient = r
  • In bivariate regression output, R equals the absolute value of r
  • Multiple correlation is big R & represents the correlation between the DV and the linear combination of predictors that forms the line of best fit
  • SMC (Squared Multiple Correlation) = R², the proportion of variance explained
  • R² gives the predictive value of the analysis (between 0 and 1)
15
Q

What are the assumptions of Regression & other correlational designs?

A
  • Curvilinear relationships – check prior to analyses
  • Sample size
  • Outliers – univariate, multivariate (Mahalanobis distance, & Casewise Diagnostics)
  • Normality, linearity, homoscedasticity, homogeneity of variance
  • Homogeneity of variance-covariance
  • Multicollinearity and Singularity
  • Care needs to be taken with interaction terms (Centre your data in this case)
16
Q

How do I know what a strong correlation is?

A
.00 to .09	Very weak, negligible
.10 to .29	Weak, low, small
.30 to .49	Moderate, medium
.50 to .69	Strong, high, large
.70 to 1.0	Very strong, very high
+/- 1	Perfect correlation
17
Q

What do a regression line & line of best fit show?

A
  • A regression line predicts the value of the criterion variable from the predictor variable.
  • When a line of best fit is drawn, we can ascertain the Coefficient of Determination (proportion of explained variance) as well as the Coefficient of Alienation (proportion of unexplained variance).
  • If all points in the scatterplot fell exactly on the regression line then there would be a perfect correlation (either + or - depending on the direction of the slope of the line).
18
Q

How do we calculate the line of best fit?

A

The line of best fit is given by Ŷ = a + bX; each actual score is then Y = a + bX + ε.
“a” (known as the constant or intercept) is the point where the line intercepts the Y axis (i.e., the value of Ŷ when X is zero).
“b” (known as the b weight or regression coefficient) is the slope of the line (the amount by which Ŷ changes for every single unit increase in X).
ε = errors in prediction (residuals), the difference between actual Y scores and predicted Y scores (Y - Ŷ).
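
A short sketch of the calculation, assuming the usual least-squares estimates (b is the sum of cross-products divided by the sum of squares of X, and a is the mean of Y minus b times the mean of X). The function name and the data are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for predicting Y from X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x              # slope: change in predicted Y per unit of X
    a = mean_y - b * mean_x    # intercept: predicted Y when X is zero
    return a, b

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 6, 7]

a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
# Residuals: actual Y minus predicted Y
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(round(a, 2), round(b, 2))
print([round(e, 2) for e in residuals])
```

The residuals around a least-squares line sum to zero, which is a quick sanity check on the fitted values.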

19
Q

What is standard (or simultaneous) multiple regression?

A

Standard (or simultaneous) multiple regression asks:

  • What is the size of the overall relationship between the DV & the set of IVs?
  • How much of the relationship is contributed by each IV?
  • All predictors are entered simultaneously into the model
20
Q

What does Sequential or Hierarchical Multiple Regression fundamentally want to determine?

A

Sequential or Hierarchical MR fundamentally asks:

*After the 1st set of IVs is entered, does the 2nd set of IVs add to the prediction of the DV?

21
Q

What does Stepwise or statistical MR fundamentally wish to find out?

A

Stepwise or statistical MR wishes to determine:

*What is the best linear combination of IVs to predict the DV in the given sample?

22
Q

What is the difference between regression and correlation?

A

Correlation and regression are often used interchangeably to label the statistical analyses that allow the assessment of the relationship between one DV & several IVs.

  • Regression is most often used when the intent of the analysis is prediction
  • Correlation is used when the intent is simply to assess the relationship between the DV & the IVs
23
Q

Why would you report adjusted R squared rather than regular R squared?

A
  • R (and therefore R²) tends to be overestimated in a sample because chance fluctuations are larger with smaller sample sizes, so the smaller the sample, the greater the overestimation.
  • Adjusted R square is preferable because it corrects for this expected inflation & accounts for shrinkage.
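
As an illustration of the shrinkage correction, the sketch below uses the commonly reported adjustment, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The function name and the example values are assumptions chosen to show the effect of sample size.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k predictors and n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Same hypothetical R-squared and number of predictors, different sample sizes
small_sample = adjusted_r_squared(0.50, n=20, k=4)
large_sample = adjusted_r_squared(0.50, n=200, k=4)

print(round(small_sample, 3), round(large_sample, 3))
```

The adjustment is much larger in the small sample, matching the point that overestimation grows as the sample shrinks.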
24
Q

What does significance in the regression coefficients (B & Beta) indicate?

A

If a coefficient is significant, it suggests that the predictor makes a significant unique contribution to the regression (i.e., to predicting the criterion).

25
Q

How does Hierarchical (or Sequential) Multiple Regression work?

A

Sequential (or Hierarchical) Multiple Regression (HMR) is where IVs are entered either individually or in blocks in an order specified by the researcher, with earlier entries acting as a control on the subsequent variables in the model. The order of entry must be based on logical and/or theoretical considerations. Each variable (or set of variables in a block) is assessed at its step of entry on how much unique variance it adds to the model and whether a significant effect occurs at that step.
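
A rough two-step sketch of this logic: a control variable is entered at step 1, a second predictor at step 2, and the change in R² is tested with the usual F-change ratio, (ΔR²/m) / ((1 - R² full)/(n - k - 1)), where m is the number of predictors added and k the number in the full model. Variable names and scores are hypothetical.

```python
def centred(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def r_squared_one(x, y):
    """R-squared with a single predictor (the squared correlation)."""
    dx, dy = centred(x), centred(y)
    return dot(dx, dy) ** 2 / (dot(dx, dx) * dot(dy, dy))

def r_squared_two(x1, x2, y):
    """R-squared with two predictors, via the normal equations."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / dot(dy, dy)

# Hypothetical scores
control = [1, 2, 3, 4, 5, 6]      # entered at step 1
predictor = [2, 1, 4, 3, 6, 5]    # added at step 2
criterion = [3, 4, 8, 8, 13, 12]

n = len(criterion)
r2_step1 = r_squared_one(control, criterion)
r2_step2 = r_squared_two(control, predictor, criterion)
r2_change = r2_step2 - r2_step1   # unique variance added at step 2

added, k_full = 1, 2
f_change = (r2_change / added) / ((1 - r2_step2) / (n - k_full - 1))

print(round(r2_step1, 3), round(r2_step2, 3), round(r2_change, 3))
print(round(f_change, 2))
```

The F-change value would be compared against a critical F with (m, n - k - 1) degrees of freedom to decide whether the step adds significantly to prediction.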

26
Q

What is important to remember with HMR?

A
  • In Sequential Multiple Regression, the first variables entered may act as a control on the effect of variables added in subsequent steps in the model
  • This is because at each step in HMR only the variance added is identified. So where a later variable is correlated with variables entered in the first block (or with the first variable entered), only its unique added variance will be identified
27
Q

Why should caution be taken when using Stepwise (or Statistical) Multiple Regression (StMR), and why is its use controversial?

A

Stepwise (or Statistical) Multiple Regression (StMR) is where the order of entry is based purely on statistical criteria. The order is set by the statistical package rather than by the researcher. For example, a sample may have 2 IVs that are closely correlated, with one having a slightly lower coefficient. In stepwise analysis, the variable with the lower coefficient may be dropped. However, in subsequent research with a similar sample, the opposite may be true.
*Remember, the stepwise model takes control away from the researcher over the entry or importance of each variable in the prediction of the outcome.

28
Q

What are some of the more advanced forms of analysis, over and above Multiple regression analysis?

A
  • Canonical Correlation – Latent constructs are formed
  • Logistic Regression – Binomial – Categorical outcome variable
  • Reliability analysis, for example, Cronbach’s Alpha
  • Structural Equation Modelling
  • Path Analysis - Mediation and Moderation
29
Q

What are some of the assumption checks for SMR (standard multiple regression)?

A
  • Sample Size – with 4 predictors our sample should be at least 60 according to Stevens’ criterion (about 15 cases per predictor)
  • Outliers – Univariate and Multivariate (M/V) – check with z-scores for univariate outliers. Note that when a case is flagged by both Casewise Diagnostics and its Mahalanobis Distance score, this suggests an unusual pattern across the predictors and on the DV for that case. Mahalanobis Distance looks only at the predictors when evaluating M/V outliers.
  • Normality, linearity (restricted range) & homoscedasticity – satisfactory
  • Multicollinearity and Singularity – check correlations & know your data
  • Levels of Measurement – Continuous or dummy coded
  • Interaction terms and curvilinear relationships
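
The outlier checks in the list above can be sketched for two predictors as follows. The code computes z-scores for the univariate check and the squared Mahalanobis distance of each case from the predictor centroid for the multivariate check. The data are invented so that the last case has an unusual combination of scores without being extreme on either predictor alone.

```python
from math import sqrt

def z_scores(values):
    """Standardise scores using the sample standard deviation (n - 1)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def mahalanobis_squared(x1, x2):
    """Squared Mahalanobis distance of each case, using two predictors only."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    # Sample variances and covariance of the predictors
    s11 = sum(a * a for a in d1) / (n - 1)
    s22 = sum(b * b for b in d2) / (n - 1)
    s12 = sum(a * b for a, b in zip(d1, d2)) / (n - 1)
    det = s11 * s22 - s12 ** 2  # zero would indicate singularity
    return [(s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det
            for a, b in zip(d1, d2)]

# Hypothetical predictor scores; the last case breaks the usual pattern
pred_a = [1, 2, 3, 4, 5, 6, 7, 8]
pred_b = [1, 2, 3, 4, 5, 6, 7, 2]

z_a = z_scores(pred_a)
z_b = z_scores(pred_b)
d_squared = mahalanobis_squared(pred_a, pred_b)
most_unusual_case = d_squared.index(max(d_squared))

print([round(z, 2) for z in z_b])
print([round(d, 2) for d in d_squared])
print(most_unusual_case)
```

In practice each squared distance is compared with a chi-square critical value whose degrees of freedom equal the number of predictors; the DV plays no part in this calculation.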
30
Q

How can Path Analysis be run in SPSS?

A
  • Path analysis is generally done using SEM, but a simplified version can be done via SPSS. It is used when you want to model observed variables through the direct & indirect effects of a mediating variable (mediation) or, alternatively, when you want to assess the impact (change) of an interaction of 2 variables on the relationship between 2 other variables (moderation).
  • SPSS performs a series of multiple regression analyses where variables are regressed on prior variables in the model
  • Alternatively, there is the ModGraph program by Paul Jose, Victoria University of Wellington. This program will assist you by mapping the simple slopes of each group in a 2-level analysis with categorical data
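
The "series of regressions" idea can be sketched for a simple mediation model with one predictor (X), one mediator (M), and one criterion (Y). This is only an illustration in Python of the arithmetic, not SPSS syntax; the variable names and scores are hypothetical. Path a comes from regressing M on X, paths b and c' from regressing Y on both X and M, and the total effect c from regressing Y on X alone.

```python
def centred(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_one(x, y):
    """Unstandardised slope from regressing y on a single predictor x."""
    dx, dy = centred(x), centred(y)
    return dot(dx, dy) / dot(dx, dx)

def slopes_two(x1, x2, y):
    """Unstandardised slopes from regressing y on x1 and x2 together."""
    d1, d2, dy = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical scores
x = [1, 2, 3, 4, 5, 6]       # predictor
m = [2, 1, 4, 3, 6, 5]       # mediator
y = [3, 4, 8, 8, 13, 12]     # criterion

a = slope_one(x, m)                # path a: X -> M
c_prime, b = slopes_two(x, m, y)   # direct effect c' and path b: M -> Y
c_total = slope_one(x, y)          # total effect of X on Y
indirect = a * b                   # indirect effect through the mediator

print(round(a, 3), round(b, 3), round(c_prime, 3))
print(round(indirect, 3), round(c_total, 3))
```

With least-squares estimates the total effect equals the direct effect plus the indirect effect, so the printed values can be checked against each other. Significance testing of the indirect effect is a separate step not shown here.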