Correlation Flashcards
Why would we compute a partial correlation?
.
Why would we compute a semi-partial correlation?
.
What’s the main difference between semi and partial correlation?
.
Which correlation is larger (further away from zero), and why?
.
What kind of variables are X and Y in correlations?
X and Y are both random variables: they are beyond the experimenter’s control, and both are subject to sampling error.
What is the goal of correlations?
To express the degree of relationship between X and Y.
Define univariate information.
Provide an example.
Univariate information describes how 1 variable varies with itself. We are not looking at the relationship between 2 variables yet.
Example: the general sums of squares information (i.e., x varies with x; y varies with y).
Explain the direction and strength of correlation.
The sign gives the direction (positive or negative). The value is bounded by -1 and +1. Zero indicates no linear relationship. The relationship gets stronger as r moves from 0 toward +1 or -1.
Conceptually define SSCP - what does SSCP tell us?
SSCP is the sum of the cross products of 2 deviations, (x-xbar)(y-ybar), added up over all cases. It tells us how X and Y vary together.
Conceptually define SS
It is the raw measure of variability.
The deviation of x times the deviation of x (i.e., the squared deviation), summed over all cases.
Conceptually define covariance
Covariance is the SSCP over df (n-1): the unstandardized measure of how X and Y vary together.
Conceptually define variance and SD
Variance is the SS over df.
SD is the square root of the variance: roughly the typical deviation of scores from the mean.
What is the conceptual formula for Pearson r?
r = degree to which X and Y vary together / degree to which X and Y vary individually
or
r = covariability of X and Y / variability of X and Y separately
r = SSCPxy / sqrt(SSx × SSy)
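A minimal Python sketch of the SSCP over sqrt(SSx × SSy) formula. The scores and the helper name pearson_r are made up for illustration and do not come from the cards.

```python
import math

def pearson_r(x, y):
    """Pearson r computed as SSCP_xy / sqrt(SS_x * SS_y)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Numerator: how X and Y vary together (sum of cross products).
    sscp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: how X and Y vary separately (sums of squares).
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sscp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3))  # 0.775  (SSCP = 6, SSx = 10, SSy = 6)
```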
Why would we assess scatterplots before we assess the numbers?
Because Pearson r only captures linear relationships, it can miss curvilinear ones. A scatterplot shows us the shape of the trend and any outliers.
What does the Pearson correlation measure specifically?
The degree and direction of the linear relationship between 2 variables.
X is to predictor as Y is to what?
Y is to outcome.
Define bivariate
How variables vary with each other rather than separately.
Looking at bivariate information, what indicates the positive or negative direction?
Why can’t the denominator of the correlation statistic be negative?
Looking at the bivariate information, the SSCP, the sum of (x-xbar)(y-ybar), indicates the + or - direction.
The denominator is never negative because SSx and SSy are sums of squared deviations, so neither can be negative, and we take the positive square root of their product.
In the Pearson r formula, what indicates the direction?
The SSCP. A positive SSCP = positive correlation, and negative SSCP = negative correlation.
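Why is correlation considered the standardized relationship?
Because the covariability of X and Y is divided by their separate variability, which removes the original units and bounds r between -1 and +1.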
Define covariance in regards to correlations.
Covariance is the stepping stone to correlation: it is the unstandardized relationship between our 2 variables, so it still carries the variables’ variance (their original scale) along with the relationship.
Where in a matrix do we see covariance?
The off-diagonal values will be covariances and the diagonal values are variances.
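To make the matrix layout concrete, here is a small sketch that builds a 2 × 2 variance-covariance matrix by hand. The data and the helper name covariance are hypothetical.

```python
def covariance(a, b):
    """Unstandardized covariability: SSCP over df (n - 1)."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sscp = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    return sscp / (len(a) - 1)

# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 9]

# Diagonal cells: each variable with itself (variances).
# Off-diagonal cells: X with Y (covariances), so the matrix is symmetric.
cov_matrix = [
    [covariance(x, x), covariance(x, y)],
    [covariance(y, x), covariance(y, y)],
]

for row in cov_matrix:
    print(row)
# [2.5, 4.5]
# [4.5, 10.0]
```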
If the scatterplot looks roughly like a football, what might the correlation be?
About .50.
If the scatterplot looks like a wider football, what might the correlation be?
About .30. A wider football means a weaker correlation.
What does the correlation effect size indicate?
What is the notation?
How is it interpreted?
Little r squared (r²) tells us the proportion of variance in Y that can be explained by X.
Ex: If r squared is .26, we would say that 26% of the variance in the number of doctor visits can be explained by attitudes toward drug use.
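As a quick arithmetic check, squaring a correlation gives the effect size. The value r = .51 is an assumed input, chosen only because it produces an r squared close to the .26 in the example.

```python
# Hypothetical correlation between attitudes and doctor visits.
r = 0.51

# Effect size: proportion of variance explained.
r_squared = r ** 2

print(f"r squared = {r_squared:.2f}")            # r squared = 0.26
print(f"{r_squared:.0%} of the variance explained")  # 26% of the variance explained
```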
Correlations are standardized metrics: they tell us how 2 variables are related to each other in a standardized way. We can also calculate a correlation from previously standardized information. What is this previously standardized information, and how do we get the scores?
Z-scores.
We convert all of our x and y values into z-scores, then take the covariance of the z-scores: the sum of the ZxZy cross products (their SSCP) over n-1.
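A sketch of the z-score route, assuming the z-scores are built with the sample SD (n-1 in the denominator). The data and function names are illustrative only.

```python
import math

def z_scores(values):
    """Convert raw scores to z-scores using the sample SD (n - 1)."""
    mean = sum(values) / len(values)
    ss = sum((v - mean) ** 2 for v in values)
    sd = math.sqrt(ss / (len(values) - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

zx = z_scores(x)
zy = z_scores(y)

# Covariance of the z-scores: sum of ZxZy cross products over n - 1.
r_from_z = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

print(round(r_from_z, 3))  # 0.775, the same r as SSCP / sqrt(SSx * SSy)
```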
Why is it important to calculate covariances (like z-scores)?
It is important to calculate unstandardized covariances because estimation procedures work with covariances.
Just as we start with the variance and take its square root to get the SD, we start with the covariance and standardize it to get the correlation.
What is the hypothesis testing notation for the Pearson correlation?
H0: ρ = 0; no linear relationship
H1: ρ ≠ 0; there is a linear relationship
What is the df for a correlation test? Why is this the case?
The df for r is n-2 because we estimate 2 means (a mean for x and a mean for y) and lose one degree of freedom for each.
What is n in a correlation test?
The number of individuals (rows), i.e., the number of paired X and Y scores.
If the critical value is .632 and our obtained r is .51, do we reject or fail to reject the null?
We fail to reject the null, because .51 is smaller than the critical value of .632.
Why would sample size affect whether a correlation is significant?
With a small n, the test may not be powerful enough to detect the correlation. The larger the n, the smaller the critical value that r has to exceed.
What is the statistical sentence for a Pearson correlation if n = 10, df = 8, r = .51, CV = .632?
r(df) = .51, p > .05
r(8) = .51, p > .05
We fail to reject the null.
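The decision above can be sketched in a few lines. Two things here are assumptions that do not appear in the cards: the t value of 2.306 (read from a standard t table for df = 8, two-tailed alpha = .05) and the usual conversion from a critical t to a critical r, r = t / sqrt(t² + df).

```python
import math

n = 10
df = n - 2            # one df lost for each mean (x and y)
r_obtained = 0.51

# Assumption: critical t for df = 8, two-tailed alpha = .05, from a t table.
t_critical = 2.306

# Convert the critical t into a critical r.
r_critical = t_critical / math.sqrt(t_critical ** 2 + df)

reject_null = abs(r_obtained) > r_critical

print(round(r_critical, 3))  # 0.632
print(reject_null)           # False, so we fail to reject the null
print(f"r({df}) = {r_obtained:.2f}".replace("0.", "."), "p > .05")
```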
In a covariance, does a large number indicate the magnitude of the relationship?
No. The size of a covariance doesn’t indicate strength, only direction (positive or negative), because covariances are unbounded and depend on the variables’ units (there’s nothing to compare the magnitude to).
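One way to see this is to change the units of X and watch the covariance change while the correlation stays put. This is a sketch with hypothetical study-time data; the variable and function names are made up.

```python
import math

def sscp(a, b):
    """Sum of cross products of deviations."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def covariance(a, b):
    return sscp(a, b) / (len(a) - 1)

def pearson_r(a, b):
    return sscp(a, b) / math.sqrt(sscp(a, a) * sscp(b, b))

# Hypothetical data: the same study time expressed in hours and in minutes.
hours = [1, 2, 3, 4, 5]
minutes = [h * 60 for h in hours]
score = [2, 4, 5, 4, 5]

cov_hours = covariance(hours, score)      # 1.5
cov_minutes = covariance(minutes, score)  # 90.0, which is 60 times larger
r_hours = pearson_r(hours, score)
r_minutes = pearson_r(minutes, score)     # same r either way

print(cov_hours, cov_minutes)
print(round(r_hours, 3), round(r_minutes, 3))
```

The relationship is identical in both versions; only the covariance moved, which is why its size cannot be read as strength.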
What does a covariance near zero indicate?
What does a correlation near zero indicate?
They both indicate there is no linear relationship.
However, a correlation near zero doesn’t mean there isn’t a relationship at all… just no LINEAR relationship.
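A small illustration of that last point, using made-up data where Y is completely determined by X (y = x squared) and yet the linear correlation is zero.

```python
import math

def pearson_r(x, y):
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sscp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sscp / math.sqrt(ss_x * ss_y)

# Hypothetical U-shaped (curvilinear) data: y depends perfectly on x.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)  # 0.0, because the positive and negative cross products cancel out
```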
Restricting the range of data (removing extremes) will do what to the correlation in a scatterplot?
In real-life applications, when would this happen?
The correlation (the degree of relationship) gets weaker.
It happens when we restrict age, for example: sampling only college students instead of the full range of 25- to 70-year-olds.
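A rough sketch of range restriction with simulated data. Everything here is an assumption for illustration: the ages, the scores, the fixed ±6 "noise" pattern, and the cutoff of 34.

```python
import math

def pearson_r(x, y):
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sscp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sscp / math.sqrt(ss_x * ss_y)

# Hypothetical full sample: ages 25 to 70, score = age plus alternating noise.
ages = list(range(25, 71))
scores = [age + (6 if i % 2 == 0 else -6) for i, age in enumerate(ages)]

# Restricted sample: keep only the youngest slice of the age range.
restricted = [(a, s) for a, s in zip(ages, scores) if a <= 34]
ages_restricted = [a for a, _ in restricted]
scores_restricted = [s for _, s in restricted]

r_full = pearson_r(ages, scores)
r_restricted = pearson_r(ages_restricted, scores_restricted)

# Same underlying relationship, but less spread on X leaves a weaker r.
print(round(r_full, 2), round(r_restricted, 2))
```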
What’s wrong with selecting for extremes in a correlation?
Researchers split a continuous variable at the midpoint, call anything above it high and anything below it low, and so dichotomize the variable. Comparing only the low group and the high group artificially inflates the relationship. They then run t-tests on the two groups, which are easier for a general audience to interpret, but really the goal is just to get a significant result.
What happens when a value is extreme on X and extreme on Y?
The value is an outlier that artificially inflates the correlation and makes it look much stronger.
What happens when a value is extreme on Y but in the middle of X?
The correlation decreases: the outlier weakens the relationship.
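The two outlier cards above can be checked with a small experiment. The base data and both outlier points are hypothetical, picked only to show the direction of the effect.

```python
import math

def pearson_r(x, y):
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sscp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sscp / math.sqrt(ss_x * ss_y)

# Hypothetical base data with a moderate positive trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]

r_base = pearson_r(x, y)

# Outlier that is extreme on X and extreme on Y.
r_extreme_both = pearson_r(x + [30], y + [30])

# Outlier that is extreme on Y but sits in the middle of X.
r_extreme_y_only = pearson_r(x + [5.5], y + [30])

print(round(r_base, 2))            # base correlation
print(round(r_extreme_both, 2))    # larger than the base correlation
print(round(r_extreme_y_only, 2))  # smaller than the base correlation
```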
How does the trend of the base model change the effect of an outlier on the relationship?
If an outlier falls in line with the trend of the base model (i.e., my base model is headed in a positive direction and the outlier on x continues in that positive direction), it will barely affect the model. However, if the trend is positive but the outlier on x goes against it, the outlier brings the base model down by a lot. (pg 11).
What are the diagonals of a correlation matrix?
1s. A variable correlated with itself has a correlation of 1.
If I’m asked to compute the variance of x, what will I do?
I am essentially solving for the sum of (x-xbar) squared over n-1, so I will compute the mean of the observations, then the sum of squared deviations (SS), and divide it by n-1.
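Those steps can be sketched as follows, with made-up scores. The check against statistics.variance from the Python standard library is only a sanity check.

```python
import statistics

# Hypothetical observations.
x = [2, 4, 4, 5, 7, 8]

# Step 1: mean of the observations.
x_bar = sum(x) / len(x)                      # 5.0

# Step 2: sum of squares (squared deviations from the mean).
ss_x = sum((xi - x_bar) ** 2 for xi in x)    # 24.0

# Step 3: divide by df = n - 1.
variance_x = ss_x / (len(x) - 1)             # 4.8

print(x_bar, ss_x, variance_x)
print(statistics.variance(x))  # sample variance from the standard library
```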