Chapter 14: Correlation and Regression Flashcards
correlation
A statistical method used to measure and describe the relationship between two variables
when does a correlation exist?
when changes in one variable tend to be accompanied by consistent and predictable changes in the other variable
what three aspects of a relationship does a correlation measure?
the direction, form, and strength
positive correlation
two variables change in the same direction
negative correlation
two variables change in opposite directions
correlation coefficient
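Measured by a value ranging from −1.00 to +1.00: the sign indicates the direction of the relationship and the numerical value indicates its strength, where 0 is no correlation and ±1.00 is a perfect correlation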
what types of data use the Pearson correlation?
data having linear relationships
data from interval or ratio measurement scales
what type of data uses the Spearman correlation?
used with data from an ordinal scale (ranks)
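The Spearman correlation is the Pearson formula applied to ranks. A minimal sketch, assuming tied scores receive the average of the ranks they occupy (a common convention); the helper names ranks, pearson, and spearman are hypothetical:

```python
import math

def ranks(values):
    """Assign ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

def spearman(x, y):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # consistently increasing, but not a straight line
print(spearman(x, y), pearson(x, y))
```

With these hypothetical scores the ranks agree perfectly, so the Spearman value is 1.0 while the Pearson value on the raw scores is slightly below 1.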
what type of data uses the point-biserial correlation?
data where one variable is dichotomous and the other consists of regular numerical scores (interval or ratio scale)
when is phi-coefficient used?
when you have two dichotomous variables
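Both the point-biserial correlation and the phi-coefficient can be computed by coding each dichotomous variable as 0 and 1 and then applying the ordinary Pearson formula. A minimal sketch with hypothetical data; which category is coded 1 is an arbitrary choice that only affects the sign:

```python
import math

def pearson(x, y):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

# Point-biserial: one dichotomous variable coded 0/1, one numerical variable
group = [0, 0, 0, 1, 1, 1]   # hypothetical coding: 0 = group A, 1 = group B
score = [4, 5, 6, 7, 8, 9]
r_pb = pearson(group, score)

# Phi-coefficient: both variables dichotomous, each coded 0/1
var_a = [0, 0, 1, 1, 0, 1]
var_b = [0, 0, 1, 1, 1, 0]
phi = pearson(var_a, var_b)

print(round(r_pb, 3), round(phi, 3))
```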
pearson correlation formula
r = SP / √(SSX × SSY): the covariability of X and Y (SP) divided by the variability of X and Y measured separately
sum of products (SP), the covariability of X and Y
SP = Σ(X − MX)(Y − MY)
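A minimal sketch of the definitional formulas above: SP is the sum of the products of paired deviations, and r divides SP by the square root of SSX × SSY. The data and function names are hypothetical:

```python
import math

def sum_of_products(x, y):
    """SP = Σ(X − MX)(Y − MY), the covariability of X and Y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

def sum_of_squares(values):
    """SS = Σ(X − M)², the variability of one variable."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def pearson_r(x, y):
    """r = SP / sqrt(SSX * SSY)."""
    return sum_of_products(x, y) / math.sqrt(sum_of_squares(x) * sum_of_squares(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sp = sum_of_products(x, y)
r = pearson_r(x, y)
print(sp, round(r, 3))  # 6.0 0.775
```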
pearson correlation z-score formula (sample)
r = Σ zX zY / (n − 1)
pearson correlation z-score formula (population)
ρ = Σ zX zY / N
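The sample and population z-score formulas give the same value for the same set of scores, because the divisor used in the standard deviation (n − 1 or N) cancels the divisor applied to Σ zX zY. A minimal sketch using the standard library's statistics.stdev (sample) and statistics.pstdev (population) on hypothetical data:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)

# Sample version: z-scores use s (SS divided by n - 1); divide the sum by n - 1
sx, sy = statistics.stdev(x), statistics.stdev(y)
r_sample = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

# Population version: z-scores use sigma (SS divided by N); divide the sum by N
px, py = statistics.pstdev(x), statistics.pstdev(y)
rho = sum(((a - mx) / px) * ((b - my) / py) for a, b in zip(x, y)) / n

print(round(r_sample, 4), round(rho, 4))
```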
uses of the pearson correlation
Used for prediction, validity, reliability, and theory verification
correlation doesn’t equal
causation
correlation and restricted range of scores
A severely restricted range of scores may produce a very different correlation than a broader range of scores would
Usually a smaller correlation
outlier
an individual whose scores are extremely different (deviant) from those of the rest of the sample
correlations and outliers
Outliers have a disproportionately large impact on the correlation coefficient
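A minimal sketch of how a single outlier can change a correlation. The five-pair data set and the added point (X = 6, Y = −5) are hypothetical, chosen only to make the effect visible:

```python
import math

def pearson(x, y):
    """Pearson r = SP / sqrt(SSX * SSY)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_without = pearson(x, y)

# Add one hypothetical deviant individual
x_out = x + [6]
y_out = y + [-5]
r_with = pearson(x_out, y_out)

print(round(r_without, 3), round(r_with, 3))
```

Here one extra individual out of six is enough to turn a fairly strong positive correlation into a negative one.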
coefficient of determination
Measures the proportion of variability in one variable that can be determined from the relationship with another variable (r²)
non-directional Pearson’s correlation hypotheses
H0: ρ = 0 and H1: ρ ≠ 0
positive correlation hypotheses
H0: ρ ≤ 0 and H1: ρ > 0
negative correlation hypotheses
H0: ρ ≥ 0 and H1: ρ < 0
degrees of freedom for correlation hypothesis tests
df = n − 2
Correlation Hypothesis Test formula
t = (r − ρ) / √[(1 − r²) / (n − 2)]
How to Test if the Correlation is Different from 0
Convert your correlation to a t-value and use the t-table
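A minimal sketch of the conversion, assuming the usual null hypothesis ρ = 0 so the numerator is simply r. The function name and the example values r = 0.5, n = 30 are hypothetical; the resulting t is then compared with the critical value from a t-table at df = n − 2:

```python
import math

def correlation_t(r, n, rho0=0.0):
    """t = (r - rho0) / sqrt((1 - r**2) / (n - 2)), with df = n - 2."""
    df = n - 2
    t = (r - rho0) / math.sqrt((1 - r ** 2) / df)
    return t, df

t, df = correlation_t(r=0.5, n=30)
print(round(t, 3), df)  # 3.055 28
```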
regression
a method of finding an equation describing the best-fitting line for a set of data that aren’t perfectly related
reasons for regression
Make the relationship easier to see
Show the central tendency of the relationship
Predict y-values for given x-values
general equation for regression
Y = bX + a, where X and Y are the variables, a is the Y-intercept, and b is the slope
best regression line
the one that minimizes the prediction error
Ŷ
the value of Y predicted by the regression line for each value of X
(Y-Ŷ)
the distance of each data point from the regression line (the error of prediction, or residual)
Least-squared error solution
a line that minimizes the total squared error of prediction
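A minimal sketch of the least-squared-error idea: fit the line with the slope and intercept formulas from this set (b = SP / SSX, a = MY − bMX), compute Σ(Y − Ŷ)², and confirm that nudging the slope or intercept makes the total squared error larger. The data and function names are hypothetical:

```python
def fit_line(x, y):
    """Least-squares slope b = SP / SSX and intercept a = MY - b * MX."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssx = sum((xi - mx) ** 2 for xi in x)
    b = sp / ssx
    a = my - b * mx
    return b, a

def total_squared_error(x, y, b, a):
    """Sum of squared residuals, Σ(Y − Ŷ)², for the line Ŷ = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = fit_line(x, y)
best = total_squared_error(x, y, b, a)

# Shift the slope or intercept slightly; each alternative line has more error
alternatives = [(b + 0.1, a), (b - 0.1, a), (b, a + 0.1), (b, a - 0.1)]
alt_errors = [total_squared_error(x, y, bb, aa) for bb, aa in alternatives]
print(round(best, 3), [round(e, 3) for e in alt_errors])
```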
how is the regression line calculated?
can be calculated with the components of the correlation coefficient
slope formula
b = SP / SSX or b = r(sY / sX)
y-intercept formula
a = MY − bMX
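A minimal sketch checking that the two slope formulas agree and then computing the Y-intercept from the slope. The data are hypothetical; sX and sY are sample standard deviations from the standard library's statistics.stdev:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = statistics.mean(x), statistics.mean(y)

sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ssx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)

# Slope, version 1: b = SP / SSX
b_from_sp = sp / ssx

# Slope, version 2: b = r * (sY / sX)
r = sp / (ssx * ssy) ** 0.5
b_from_r = r * statistics.stdev(y) / statistics.stdev(x)

# Y-intercept: a = MY - b * MX
a = my - b_from_sp * mx
print(round(b_from_sp, 3), round(b_from_r, 3), round(a, 3))
```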
what does a perfect correlation mean?
there is no residual/error
standard error of estimate
a measure of the standard distance between the regression line and the actual data points, i.e., how far off, on average, we expect our predictions to be
what happens to the SEoE as r becomes stronger (moves from 0 toward +1 or −1)?
the SEoE decreases, reaching 0 when r = ±1
effect of stronger correlations on the standard error of estimate
results in smaller errors of prediction
predicted variability in y scores
SSregression = r² SSY
unpredicted variability in y scores
SSresidual = (1 − r²) SSY
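A minimal sketch confirming the two formulas above on hypothetical data: r²SSY matches the variability of the predicted Ŷ values around MY, (1 − r²)SSY matches Σ(Y − Ŷ)², and together they add up to SSY:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ssx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)
r = sp / (ssx * ssy) ** 0.5

# Regression line Ŷ = bX + a
b = sp / ssx
a = my - b * mx
y_hat = [b * xi + a for xi in x]

# Formulas based on r
ss_regression = r ** 2 * ssy
ss_residual = (1 - r ** 2) * ssy

# Direct computations from the predictions
ss_regression_direct = sum((yh - my) ** 2 for yh in y_hat)
ss_residual_direct = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
print(round(ss_regression, 3), round(ss_residual, 3), ssy)
```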
Standard Error of Estimate based on r formula
SEoE = √[(1 − r²)SSY / (n − 2)]
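A minimal sketch showing that the r-based formula gives the same standard error of estimate as computing √[SSresidual / (n − 2)] directly from the residuals. The data are hypothetical:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ssx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)
r = sp / math.sqrt(ssx * ssy)
b = sp / ssx
a = my - b * mx

# Direct: square root of the residual SS divided by df = n - 2
ss_residual = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
seoe_direct = math.sqrt(ss_residual / (n - 2))

# Based on r: sqrt((1 - r^2) * SSY / (n - 2))
seoe_from_r = math.sqrt((1 - r ** 2) * ssy / (n - 2))
print(round(seoe_direct, 4), round(seoe_from_r, 4))
```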
what happens to the regression slope if you standardize scores into z-scores
it is equal to the correlation coefficient
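A minimal sketch of this result on hypothetical data: after converting X and Y to z-scores (using sample standard deviations), the least-squares slope b = SP / SSX computed on the z-scores equals Pearson r, and the intercept is 0:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)

zx = [(xi - mx) / sx for xi in x]
zy = [(yi - my) / sy for yi in y]

# Regression of zY on zX using b = SP / SSX and a = MY - b * MX
mzx, mzy = sum(zx) / len(zx), sum(zy) / len(zy)
sp_z = sum((u - mzx) * (v - mzy) for u, v in zip(zx, zy))
ss_zx = sum((u - mzx) ** 2 for u in zx)
b_z = sp_z / ss_zx
a_z = mzy - b_z * mzx

# Pearson r from the raw scores, for comparison
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ssx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)
r = sp / (ssx * ssy) ** 0.5
print(round(b_z, 4), round(r, 4), round(a_z, 4))
```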
calculating a regression equation from a correlation
Calculate b (slope)
Use b value to calculate a (y-intercept)
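The two steps above can be sketched from summary statistics alone, using b = r(sY / sX) and then a = MY − bMX. A minimal sketch; the function name and the values r = 0.80, MX = 10, MY = 50, sX = 2, sY = 5 are hypothetical:

```python
def regression_from_summary(r, mean_x, mean_y, sd_x, sd_y):
    """Step 1: slope b = r * (sY / sX). Step 2: intercept a = MY - b * MX."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return b, a

b, a = regression_from_summary(r=0.8, mean_x=10, mean_y=50, sd_x=2, sd_y=5)
predicted = b * 12 + a  # predicted Y for a hypothetical X = 12
print(b, a, predicted)  # 2.0 30.0 54.0
```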