Week 3 Correlation and Regression Flashcards by tran tinchi

What is correlation

-Correlation deals with the association between two variables and is one of the most important data analytic techniques in psychology
•Correlation is a form of bivariate analysis
-Correlation quantifies a linear relationship between two variables X and Y in terms of
•Direction
•Degree

Why is correlation important

One of the most frequently used statistics, especially in social psychology, and also in other fields such as medical studies
Building block for more sophisticated methods

Understanding correlational information

•Correlation can be positive or negative
•An association between two variables can be linear or non-linear
•Correlation coefficients (r) range from -1 to 1
–A correlation of zero indicates no linear association between the variables

Direction of correlational information

-Positive relationship: increases in X accompanied by increases in Y
-Negative relationship: increases in X accompanied by decreases in Y
-No relationship: knowing something about one variable tells you nothing about the other variable

Form of the relationship

Correlation measures the linear relationship between two variables.
If there is a nonlinear relationship, the correlation value may be deceptive.
If the two variables are independent of one another, the correlation will be approximately zero.
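The warning that r can be deceptive for nonlinear relationships can be seen with a small made-up dataset: below, Y is completely determined by X (Y = X²), yet Pearson's r comes out as zero because the pattern is U-shaped rather than linear. This also shows that a correlation of zero does not by itself mean the variables are independent. A minimal sketch; `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-products relative to the spread of each variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y depends perfectly on X, but the relationship is U-shaped.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

print(pearson_r(x, y))  # 0.0: no *linear* association despite perfect dependence
```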

Degree of relationship

-Perfect linear relationship: every change in the X variable is accompanied by a corresponding change in the Y variable (r = 1 or r = -1)
•Rough rules of thumb for how big or small a correlation is
–Small effect: .1 < r < .3 or -.3 < r < -.1
–Medium effect: .3 < r < .5 or -.5 < r < -.3
–Large effect: .5 < r < .7 or -.7 < r < -.5
However, these cutoffs are informal, and context matters

Pearson correlation coefficient

The Pearson correlation coefficient (r) is most commonly used in psychology and measures the linear association between two continuous variables.
-It compares how much the two variables vary together (covariability) to how much they vary separately (variability).
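Comparing joint variation to separate variation can be written as r = cov(X, Y) / (s_X × s_Y). A minimal standard-library sketch of that ratio; the function name and the study-hours data are hypothetical, for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson's r: how much X and Y vary together, relative to how much they vary separately."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance (n - 1 denominator): joint variation of X and Y
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Standard deviations: separate variation of each variable
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Hypothetical data
hours_studied = [2, 4, 5, 7, 9]
exam_score = [55, 60, 70, 72, 85]
print(round(pearson_r(hours_studied, exam_score), 3))
```

On Python 3.10 or newer, the standard library's `statistics.correlation(hours_studied, exam_score)` should return the same value and can serve as a cross-check.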

Variability vs. covariability

Variability: how much a given variable varies from observation to observation
Covariability: how much two variables vary together
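The two ideas map onto two familiar statistics: the sample variance (variability of one variable) and the sample covariance (covariability of two). A small sketch with made-up numbers; it also checks that a variable's covariance with itself is simply its variance.

```python
import math

def variance(values):
    """Sample variance: how much one variable varies from observation to observation."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def covariance(x, y):
    """Sample covariance: how much two variables vary together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(variance(x), variance(y), covariance(x, y))

# Covariance of a variable with itself equals its variance
print(math.isclose(covariance(x, x), variance(x)))
```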

Correlation and Causality

Most important lesson of the day: correlation does not imply causation!
–A significant correlation does not mean that one variable causes the other

Effect of extreme scores on correlation

Extreme scores or outliers can greatly influence the value of a correlation.
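To see how much a single extreme case can move r, the sketch below computes Pearson's r for a small made-up dataset with only a weak trend, then again after adding one point far from the rest. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from cross-products and squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with only a weak linear trend
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 5]
r_without = pearson_r(x, y)

# The same data plus one extreme case
r_with = pearson_r(x + [15], y + [15])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Here one added point turns a weak correlation (about .35) into a very strong one (above .9), which is why scatterplots should be checked before interpreting r.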

Regression Toward the Mean

With imperfect correlation, an extreme score on one measure tends to be followed by a less extreme score on the other measure
Extreme scores are often (but not always) due to chance
If the extreme score is due to chance, the other score is unlikely to be equally extreme
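A small simulation can make regression toward the mean concrete. It assumes a hypothetical model in which each observed score is a stable true ability plus independent chance noise, so the two measurements correlate imperfectly (about .5 in this setup). People selected for extreme scores on the first measurement are, on average, less extreme on the second.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical model: observed score = true ability + independent chance noise
n_people = 10_000
ability = [random.gauss(0, 1) for _ in range(n_people)]
test1 = [a + random.gauss(0, 1) for a in ability]
test2 = [a + random.gauss(0, 1) for a in ability]

# Select the people with the most extreme (top 5%) scores on the first measurement
cutoff = sorted(test1)[int(0.95 * n_people)]
top = [i for i in range(n_people) if test1[i] >= cutoff]

mean_first = statistics.mean(test1[i] for i in top)
mean_second = statistics.mean(test2[i] for i in top)
print(f"top group, test 1: {mean_first:.2f}")
print(f"top group, test 2: {mean_second:.2f}")  # still above average, but less extreme
```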

Null hypothesis testing for correlation

• The null hypothesis for correlation is: The correlation in the population is zero.
• If the probability of obtaining a correlation at least this large when the null hypothesis is true is small (p < .05), then we reject the null hypothesis.
• So, we infer that the correlation in the population is NOT zero.
• There is a significant association between the two variables.
• For correlation, df = n - 2.
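The significance test for r is usually carried out as a t-test with df = n - 2, using t = r√(n - 2) / √(1 - r²). The sketch below computes t and df for a hypothetical result (r = .50, n = 18). The standard library has no t distribution; if SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-tailed p-value, and `scipy.stats.pearsonr` runs the whole test from raw data.

```python
import math

def correlation_t_test(r, n):
    """t statistic and df for testing H0: the population correlation (rho) is zero."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical result: r = .50 in a sample of 18 participants
t, df = correlation_t_test(0.5, 18)
print(f"t({df}) = {t:.2f}")
```

For df = 16 the two-tailed .05 critical value is about 2.12, so t of about 2.31 would lead us to reject the null hypothesis.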

Null hypothesis significance testing

Measure the variables for participants in the sample and calculate the correlation r.
What is the probability of finding an r at least this big if the real association in the population (ρ) is zero?
If this probability is small (p<.05), our initial assumption is in doubt.
We REJECT the null hypothesis.
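The question "how likely is an r this big if ρ is zero?" can also be answered by simulation. The sketch below uses a permutation test, which is a different technique from the t-test reported by JASP/SPSS: shuffling Y breaks any real pairing with X, so the shuffled correlations show what r looks like when the null hypothesis is true. The data, the number of shuffles, and the function names are illustrative assumptions.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Two-tailed p: share of shuffled datasets with |r| at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)  # random pairing: a world where rho is zero
        if abs(pearson_r(x, y_shuffled)) >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
p = permutation_p_value(x, y)
print(f"r = {pearson_r(x, y):.2f}, p = {p:.4f}")
print("reject H0" if p < .05 else "retain H0")
```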

Spearman correlation coefficient

The Spearman correlation coefficient (rs) may be used when the data are ordinal (ranked),
and when the relationship is monotonic (consistently in one direction) but not linear.
The data are converted to ranks before the correlation is calculated, which can linearize a monotonic nonlinear relationship
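Spearman's rs is Pearson's r computed on ranks. The sketch below ranks each variable (tied values get the average of their ranks, a common convention) and then applies the usual Pearson formula. The exponential data are made up to show a relationship that always increases but is strongly curved.

```python
import math

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # monotonic (always increasing) but strongly curved
print(round(pearson_r(x, y), 3))    # below 1: the relationship is not linear
print(round(spearman_rs(x, y), 3))  # 1.0: the ranks line up perfectly
```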

Spearman and the problem of outliers

After ranking, an outlier is just one rank above the next highest value, so it has much less influence on rs than on Pearson's r.
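The same outlier can be run through both coefficients to compare them. In this made-up example, adding one extreme case pushes Pearson's r from .50 to about .96, while Spearman's rs only rises from .50 to about .71, because the outlier becomes just rank 6. The simple ranking helper assumes there are no tied values, which holds for these data.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..n; assumes no tied values (true for the data below)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
x_out, y_out = x + [15], y + [15]  # add one extreme case

for label, xs, ys in [("no outlier", x, y), ("with outlier", x_out, y_out)]:
    r = pearson_r(xs, ys)
    rs = pearson_r(simple_ranks(xs), simple_ranks(ys))
    print(f"{label}: Pearson r = {r:.2f}, Spearman rs = {rs:.2f}")
```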

Cronbach’s Alpha

•Measure of reliability
•Requires at least three items or scales
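•Calculated from the average covariance of item pairs relative to the total variance: α = k² × (average inter-item covariance) / (variance of the total score), where k is the number of items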
•Ranges from 0 to 1.0
–0 = completely unreliable
•Average covariance is zero
–1.0 = completely reliable
•Perfect correlations among the items, i.e.: covariance = variance
–Value directly represents the proportion of reliable variance
•Greatly influenced by the number of items!
–Increasing the number of items can produce high reliability even if correlations among items are not large
–This is why scales often have very large numbers of items/questions
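Cronbach's alpha is often computed with the equivalent formula α = k / (k - 1) × (1 - sum of item variances / variance of the total score). The sketch below implements that, plus the standardized form α = k·r̄ / (1 + (k - 1)·r̄) (r̄ = average inter-item correlation) to show how adding items raises alpha even when the items correlate only modestly. The response data and function names are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of items, each a list of scores (one per respondent).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    item_var_sum = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))

def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items and the average inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hypothetical responses: 3 items (rows) answered by 5 respondents (columns)
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(round(cronbach_alpha(items), 3))

# Same modest average correlation (.3), more items -> higher alpha
for k in (3, 10, 20):
    print(k, round(standardized_alpha(k, 0.3), 3))
```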

Role of regression

Regression is used to predict one variable from another.

Regression model

• Data in the population is dispersed randomly around a population regression line.
• Y = a + bX + e
-a is the intercept parameter (sometimes called the constant)
-b is the slope parameter
-e is an error or residual term.
•Errors are assumed to be:
– Independent
– Normally distributed with a mean of zero

Least squares solution

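Observed Y score: Y = a + bX + e
Predicted Y score: Ŷ = a + bX
Error or residual: Y - Ŷ
Total squared error: Σ(Y - Ŷ)²
The least squares solution chooses a and b so that the total squared error is as small as possible.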
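For one predictor, the least squares values have a closed form: b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)², and a = Ȳ - bX̄, so the line passes through the point of means. A minimal sketch with hypothetical sleep and mood data, plus a check that nudging the slope increases the total squared error.

```python
def least_squares(x, y):
    """Return (a, b) minimizing the total squared error sum((Y - Y_hat) ** 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (mean X, mean Y)
    return a, b

def total_squared_error(x, y, a, b):
    """Sum of squared residuals Y - Y_hat for the line Y_hat = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: hours of sleep (X) and a mood rating (Y)
x = [5, 6, 7, 8, 9]
y = [3, 5, 4, 6, 7]
a, b = least_squares(x, y)
sse = total_squared_error(x, y, a, b)
print(f"Y_hat = {a:.2f} + {b:.2f}X, total squared error = {sse:.2f}")

# Moving away from the least squares slope makes the error larger
print(total_squared_error(x, y, a, b + 0.1) > sse)  # True
```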

Explained variation

When there is a perfect correlation, all Y scores fall exactly on the regression line.
However, in real data the Y scores (e.g., job satisfaction) do not line up exactly, but they do tend to follow the regression line
•so some of the variation in Y is explained by the regression
•But the Y scores also vary around the regression line.
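The split between explained and unexplained variation can be computed directly: total variation in Y (SS_total) equals the variation of the predicted scores around the mean (SS_regression) plus the scatter around the line (SS_residual), and R² = SS_regression / SS_total. With one predictor, R² equals r². The data are the same hypothetical numbers used for illustration elsewhere; names like `fit_line` are illustrative.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical data
x = [5, 6, 7, 8, 9]
y = [3, 5, 4, 6, 7]
a, b = fit_line(x, y)
mean_y = sum(y) / len(y)
predicted = [a + b * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)                   # all variation in Y
ss_regression = sum((p - mean_y) ** 2 for p in predicted)        # explained by the line
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # scatter around the line

r_squared = ss_regression / ss_total
print(f"SS_total = {ss_total:.2f}, SS_regression = {ss_regression:.2f}, SS_residual = {ss_residual:.2f}")
print(f"R^2 = {r_squared:.2f}")  # proportion of variance in Y explained by X
```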

The ANOVA (F test) for regression

• The F test tells us whether the variance explained is significantly different from zero.
• If the F test is not significant, there is no evidence that the predictor explains any of the variance in the outcome variable, so the regression is of no use.
• Null hypothesis: R² = 0 (in the population).
• The F test has two degrees of freedom values. For simple regression, the first is always 1, and the second is the same as the df for the t-test of the regression coefficient (n - 2).
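A sketch of the F test for simple regression, using hypothetical data: F = (SS_regression / 1) / (SS_residual / (n - 2)). It also computes the t statistic of the slope, t = b / SE(b) with SE(b) = √(MS_residual / Σ(X - X̄)²), to show why the df match: with one predictor, F equals t².

```python
import math

# Hypothetical data
x = [5, 6, 7, 8, 9]
y = [3, 5, 4, 6, 7]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx
predicted = [a + b * xi for xi in x]

ss_regression = sum((p - my) ** 2 for p in predicted)
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))

df_regression = 1    # one predictor
df_residual = n - 2  # same df as the t-test of the slope
f = (ss_regression / df_regression) / (ss_residual / df_residual)

# t-test of the slope: t = b / SE(b), where SE(b) = sqrt(MS_residual / sxx)
se_b = math.sqrt((ss_residual / df_residual) / sxx)
t = b / se_b
print(f"F({df_regression}, {df_residual}) = {f:.2f}; t^2 = {t ** 2:.2f}")
```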

Regression parameters (a, b):

• The intercept (a) is the estimated value of Y when X = 0.
• Slope (b) (gradient) (the unstandardized coefficient in JASP) indicates whether there is a relationship between X and Y, whether that relationship is positive or negative, and the estimated change in Y when X increases by 1. The t-test of the slope tests H0: b = 0.
• Slope can be transformed into standardized form (convert X and Y into z-scores, and then run the regression). This is called a standardized regression coefficient (labelled Beta in SPSS and Standardized in JASP), and in bivariate (one-predictor) regression it equals the correlation coefficient r.
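The claim that the standardized slope equals r in one-predictor regression can be checked numerically: convert X and Y to z-scores, refit the line, and compare the slope with Pearson's r. The data are hypothetical (Y is on a wider scale than X so that b and beta differ).

```python
import math
import statistics

def slope(x, y):
    """Least squares slope of Y on X."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def z_scores(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of sleep (X) and a 0-100 wellbeing score (Y)
x = [5, 6, 7, 8, 9]
y = [30, 50, 40, 60, 70]

b = slope(x, y)                         # unstandardized: change in Y per 1-unit increase in X
beta = slope(z_scores(x), z_scores(y))  # standardized: change in SDs of Y per 1 SD of X
r = pearson_r(x, y)
print(f"b = {b:.2f}, beta = {beta:.2f}, r = {r:.2f}")
```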

Regression diagnostics

a) Histogram of residuals – normality: we want the errors to be approximately normally distributed
b) Residuals plot – homoscedasticity: a scatterplot of residuals against predicted values, used to check for heteroscedasticity. The absence of any systematic pattern supports the assumption of homoscedasticity (the variance of the residual/error term in the regression model is constant)
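The residuals plot is a visual check, but the same idea can be sketched numerically: sort the residuals by predicted value and compare their spread in the lower and upper halves (similar in spirit to the Goldfeld–Quandt test). This is a rough illustration, not a replacement for the plot; the data are simulated so that the error spread grows with X, and the function name is hypothetical.

```python
import random

def mean_square(values):
    """Average squared residual, a simple measure of spread around zero."""
    return sum(v * v for v in values) / len(values)

random.seed(3)  # reproducible run

# Hypothetical data in which the error spread grows with X (heteroscedastic by construction)
x = [random.uniform(1, 10) for _ in range(200)]
y = [1 + 2 * xi + random.gauss(0, xi) for xi in x]

# Least squares fit, then (predicted, residual) pairs sorted by predicted value
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
pairs = sorted((a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y))

# Compare residual spread for the lower and upper halves of the predicted values
half = n // 2
low = [res for _, res in pairs[:half]]
high = [res for _, res in pairs[half:]]
ratio = mean_square(high) / mean_square(low)
print(f"residual spread, upper/lower half: {ratio:.1f}")  # values near 1 fit homoscedasticity
```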

Multiple regression

• Used to predict the value of a variable based on the values of two or more other variables.
• Standardized coefficients put the predictors on the same scale, which shows which of the variables has a larger influence on the outcome.
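With two predictors, least squares solves a pair of normal equations for b1 and b2. The sketch below does this by hand for the two-predictor case only (software such as JASP or SPSS handles any number of predictors), then converts each slope to a standardized coefficient with beta = b × SD(X) / SD(Y). The data are hypothetical and built exactly as Y = 1 + 2·X1 + 3·X2, so the fit should recover those values.

```python
import statistics

def two_predictor_regression(x1, x2, y):
    """Least squares fit of Y = a + b1*X1 + b2*X2 via the normal equations (centered form)."""
    m1, m2, my = statistics.mean(x1), statistics.mean(x2), statistics.mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

def standardized(b, x, y):
    """Convert an unstandardized slope to a standardized (beta) coefficient."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical predictors and an outcome built exactly as Y = 1 + 2*X1 + 3*X2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = two_predictor_regression(x1, x2, y)
print(f"Y_hat = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
print(f"beta1 = {standardized(b1, x1, y):.2f}, beta2 = {standardized(b2, x2, y):.2f}")
```

Because X1 and X2 have the same spread here, the larger beta belongs to X2, matching its larger raw slope; when predictors are strongly correlated with each other, comparing betas is only a rough guide.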