Week 3 Correlation and Regression Flashcards

1
Q

What is correlation

A

-Correlation deals with the association between two variables and is one of the most important data analytic techniques in psychology
•Correlation is a form of bivariate analysis
-Correlation quantifies a linear relationship between two variables X and Y in terms of
•Direction
•Degree

2
Q

Why is correlation important

A
  • One of the most frequently used statistics, especially in social psychology; also used in other fields such as medical research
  • Building block for more sophisticated methods
3
Q

Understanding correlational information

A

•Correlation can be positive or negative
•An association between two variables can be linear or non-linear
•Correlation coefficients (r) range from -1 to 1
–Correlation of zero indicates no association between the variables

4
Q

Direction of correlational information

A

-Positive relationship: increases in X are accompanied by increases in Y
-Negative relationship: increases in X are accompanied by decreases in Y
-No relationship: knowing something about one variable tells you nothing about the other variable

5
Q

Form of the relationship in correlation

A
  • Correlation measures the linear relationship between two variables.
  • If there is a nonlinear relationship, the correlation value may be deceptive.
  • If the two variables are independent of one another, the correlation will be approximately zero.
6
Q

Degree of relationship

A

-Perfect linear relation: every change in the X variable is accompanied by a corresponding change in the Y variable
•Rough rules of thumb for how big/small correlations are:
–Small effect: .1 < r < .3 or -.1 > r > -.3
–Medium effect: .3 < r < .5 or -.3 > r > -.5
–Large effect: .5 < r < .7 or -.5 > r > -.7
However, these rules are not formal and context does matter
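These rules of thumb can be sketched as a small helper function (a hypothetical illustration in Python; the cutoffs follow the rough guidelines above, treating anything below .1 as negligible and anything from .5 up as large):

```python
def effect_size_label(r):
    """Classify a correlation coefficient using the rough rules of thumb above."""
    size = abs(r)  # direction does not matter for effect size
    if size < 0.1:
        return "negligible"
    elif size < 0.3:
        return "small"
    elif size < 0.5:
        return "medium"
    else:
        return "large"

print(effect_size_label(0.25))   # small
print(effect_size_label(-0.62))  # large
```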

7
Q

Pearson correlation coefficient

A

The Pearson correlation coefficient (r) is most commonly used in psychology and measures the linear association between two continuous variables.
•It compares how much the two variables vary together to how much they vary separately.
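That idea ("vary together vs. vary separately") can be computed directly in Python with NumPy (the data values here are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson r = covariance of X and Y, divided by the product of their SDs
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r = cov_xy / (x.std() * y.std())

# Matches NumPy's built-in correlation
print(round(r, 3), round(np.corrcoef(x, y)[0, 1], 3))
```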

8
Q

Variability vs. Covariability

A

Variability: how much a given variable varies from observation to observation
Covariability: how much two variables vary together
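In NumPy terms (illustrative numbers): variance captures the variability of one variable, covariance captures the covariability of two:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

var_x = np.var(x, ddof=1)             # variability: X on its own
cov_xy = np.cov(x, y, ddof=1)[0, 1]   # covariability: X and Y together

print(var_x, cov_xy)
```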

9
Q

Correlation and Causality

A

Most important lesson of the day: correlation does not imply causation!
–A significant correlation does not mean that one variable causes the other

11
Q

Extreme scores' effect on correlation

A

Extreme scores or outliers can greatly influence the value of a correlation
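A quick sketch of this (hypothetical data): adding one extreme point can inflate r dramatically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 3.0, 2.0, 3.0])
r_before = np.corrcoef(x, y)[0, 1]    # weak association

# One extreme score added to both variables
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
r_after = np.corrcoef(x_out, y_out)[0, 1]  # now looks very strong

print(round(r_before, 2), round(r_after, 2))
```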

12
Q

Regression Toward the Mean

A
  • With imperfect correlation, an extreme score on one measure tends to be followed by a less extreme score on the other measure
  • Extreme scores are often (but not always) due to chance
  • If it’s due to chance, it’s extremely unlikely that the other value will also be extreme
13
Q

Null hypothesis testing for correlation

A

• The null hypothesis for correlation is: the correlation in the population is zero.
• If the probability associated with this null hypothesis is small (p < .05), then we reject the null hypothesis.
• So we infer that the correlation in the population is NOT zero:
• there is a significant association between the two variables.
For the correlation test, df = n − 2.
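A sketch of this test in Python (assuming SciPy is available; the data are simulated with a genuine association built in):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = x + rng.normal(size=30)   # a real positive association plus noise

r, p = stats.pearsonr(x, y)
df = len(x) - 2               # df = n - 2 for the correlation test

print(f"r = {r:.3f}, p = {p:.4f}, df = {df}")
if p < 0.05:
    print("Reject the null hypothesis: significant association")
```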

14
Q

Null hypothesis significance testing

A
  • Measure the variables for participants in the sample, calculate the correlation r
  • What is the probability of finding an r this big, if the real association in the population (ρ) is zero?
  • If this probability is small (p < .05), our initial assumption is in doubt.
  • We REJECT the null hypothesis.
15
Q

Spearman correlation coefficient

A
  • The Spearman correlation coefficient (rs) may be used when the data are ordinal (ranked),
  • and when the relationship is monotonic (consistently one-directional) but not linear.
  • Convert the data to ranks before calculating the correlation; ranking can linearize monotonic nonlinear data
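A sketch of the difference (assuming SciPy; y = x³ is perfectly monotonic but not linear, so Spearman reaches 1 while Pearson does not):

```python
import numpy as np
from scipy import stats

# A perfectly monotonic but nonlinear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

r_pearson = stats.pearsonr(x, y)[0]    # understates the association
r_spearman = stats.spearmanr(x, y)[0]  # ranks line up perfectly

print(round(r_pearson, 3), r_spearman)
```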
16
Q

How Spearman handles outliers

A

After ranking, an outlier becomes just one rank above the next highest value, so it cannot distort the coefficient much

17
Q

Cronbach’s Alpha

A

•Measure of reliability
•Requires at least three items or scales
•Calculated by the average covariance of item pairs divided by the total variance
•Ranges from 0 to 1.0
–0 = completely unreliable
•Average covariance is zero
–1.0 = completely reliable
•Perfect correlations among the items, i.e.: covariance = variance
–Value directly represents the proportion of reliable variance
•Greatly influenced by the number of items!
–Increasing the number of items can produce high reliability even if correlations among items are not large
–This is why scales often have very large numbers of items/questions
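The standard computational formula, alpha = (k/(k−1)) · (1 − sum of item variances / variance of the total score), is algebraically equivalent to the average-covariance description above. A sketch with a hypothetical 3-item scale:

```python
import numpy as np

# Rows = respondents, columns = items (hypothetical ratings)
items = np.array([
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```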

18
Q

Basic role of regression

A

To predict one variable (the outcome) from another (the predictor)

19
Q

Regression model

A

• Data in the population is dispersed randomly around a population regression line.
• Y = a + bX + e
-a is the intercept parameter (sometimes called the constant)
-b is the slope parameter
-e is an error or residual term.
•Errors are assumed to be:
– Independent
– Normally distributed with a mean of zero

20
Q

Least squares solution

A

Observed Y score: Y = a + bX + e
Predicted Y score: Ŷ = a + bX
Error or residual: Y − Ŷ
Total squared error: Σ(Y − Ŷ)²
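These quantities can be computed directly (illustrative data; b = cov(X, Y)/var(X) and a = mean of Y − b · mean of X are the values that minimize the total squared error):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope
a = y.mean() - b * x.mean()                          # intercept

y_hat = a + b * x                # predicted Y scores
residuals = y - y_hat            # errors
sse = np.sum(residuals ** 2)     # total squared error (minimized)

print(round(a, 3), round(b, 3), round(sse, 4))
```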

21
Q

Explained variation

A

When there is a perfect correlation, all Y scores fall exactly on the regression line.
In real data, however, the Y scores (e.g., job satisfaction) do not line up exactly, but they do tend to follow the regression line
•so some of the variation in Y is explained by the regression
•But the Y scores also vary around the regression line.

22
Q

The ANOVA (F test) for regression

A

• The F test tells us whether the variance explained is significantly different from zero.
• If the F test is not significant, the regression is worthless: the predictor does not explain the outcome variable at all.
• Null hypothesis: R² = 0 in the population.
• The F test has two degrees-of-freedom values. For simple regression, the first is always 1; the second df is the same as for the t-test of the regression coefficient
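The F ratio can be built from the explained and unexplained sums of squares (illustrative data; note that SS_regression + SS_error = SS_total):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.1])
n = len(x)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_regression = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ss_error = np.sum((y - y_hat) ** 2)              # unexplained variation

df1, df2 = 1, n - 2   # df for simple regression
F = (ss_regression / df1) / (ss_error / df2)
print(round(F, 1), df1, df2)
```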

23
Q

Regression parameters (a, b):

A

• The intercept (a) is the estimated value of Y when X = 0.
• The slope (b, the gradient; the unstandardized regression coefficient in JASP) indicates whether there is a relationship between X & Y, whether that relationship is positive or negative, and the estimated change in Y when X increases by 1. H0: b = 0
• The slope can be transformed into standardized form (convert X & Y into z-scores, then do the regression). This is called a standardized regression coefficient (beta in SPSS/JASP), and is the same as the correlation coefficient for bivariate data
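The claim that the standardized slope equals r for bivariate data is easy to verify numerically (hypothetical data):

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 5.0, 8.0])

# Standardize both variables, then regress: the slope is now beta
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

r = np.corrcoef(x, y)[0, 1]
print(round(beta, 6) == round(r, 6))
```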

24
Q

Regression diagnostics

A

a) Histogram of residuals – normality: we want the errors to be approximately normally distributed
b) Residuals plot – homoscedasticity: a scatterplot of residuals against predicted values to check for heteroscedasticity. Absence of any systematic pattern supports the assumption of homoscedasticity (the variance of the residual/error term is constant across predicted values)
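Without plotting, the same checks can be approximated numerically (a sketch with simulated data; a formal test such as Levene's or Breusch–Pagan would be more rigorous):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 200)  # constant error variance

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x
residuals = y - y_hat

# Crude homoscedasticity check: residual spread should be similar in the
# lower and upper halves of the predicted values
low = residuals[y_hat < np.median(y_hat)]
high = residuals[y_hat >= np.median(y_hat)]
print(round(low.std(ddof=1) / high.std(ddof=1), 2))  # close to 1 if homoscedastic
```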

25
Q

Multiple regression

A

• Used to predict the value of a variable based on the values of two or more other variables.
• Standardization tells us which of the variables is a larger influence.
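A sketch with simulated data (two predictors; np.linalg.lstsq fits the model, and standardizing the slopes makes their influence comparable):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.1, n)

# Design matrix: intercept column plus both predictors
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef

# Standardized coefficients: compare predictor influence on a common scale
beta1 = b1 * x1.std(ddof=1) / y.std(ddof=1)
beta2 = b2 * x2.std(ddof=1) / y.std(ddof=1)
print(round(b1, 2), round(b2, 2), abs(beta1) > abs(beta2))
```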