quiz 3 Flashcards
what is correlation
- Each individual is measured on two variables (X, Y)
- We are interested in exploring the relationship between scores on X and scores on Y
what is a bivariate scatterplot
a plot of each individual's pair of scores as a single point, with X on one axis and Y on the other; at this stage we do not yet know whether there is a correlation
explain 0 correlation, positive and negative
- If there is no relationship between X and Y, the correlation is 0
- If higher scores on X are associated with higher scores on Y, the correlation is positive
- If higher scores on X are associated with lower scores on Y, the correlation is negative
what is Pearson Correlation
- Statistic that allows us to express the relationship between X and Y (r)
- May take on values ONLY between -1 and +1
- If there is no correlation between X and Y, then r = 0
- If there is a positive correlation between X and Y, then r will be between 0 and +1
- If there is a negative correlation between X and Y, then r will be between -1 and 0.
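To make the sign and the -1 to +1 range concrete, here is a minimal sketch that computes r from its definition (the sum of cross-products of deviations, divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the scores are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products of deviations from the means (the numerator of r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]            # made-up X scores
y_rising = [2, 4, 5, 4, 5]     # tends to go up as X goes up
y_falling = [5, 4, 5, 4, 2]    # tends to go down as X goes up

r_pos = pearson_r(x, y_rising)
r_neg = pearson_r(x, y_falling)
print(round(r_pos, 2), round(r_neg, 2))
```

With these scores, `r_pos` falls between 0 and +1 and `r_neg` falls between -1 and 0.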
what are the two parts of a Pearson Correlation and what do they tell us
- The sign (+/-) tells us whether the correlation is positive or negative
- The magnitude (absolute value) tells us the strength of the relationship
what does it mean when the magnitude is closer to 1?
the stronger the relationship.
explain Perfect Positive Correlation
- in practice this does not happen with real data (it is mathematically possible, but measurement and sampling error prevent it)
- r = +1.00
- Perfect Negative Correlation (r = -1.00)
what is significance testing looking at
- We want to make inferences to the whole population based on a sample selected from the population.
- Sampling error will always be involved.
- We might find a positive correlation in our sample, but how do we know that the variables are actually correlated in the population?
- How likely is it that I will make an error by claiming that the two variables are correlated in the population?
what does significance testing use
p-value
what does p-value tell us
- Tells us how likely it is that we would see a sample correlation at least this strong if there were actually NO correlation between the two variables in the population
- p = .04 means that, if the two variables were uncorrelated in the population, a correlation this strong would turn up in only 4% of samples
- Convention: p ≤ .05 is considered “statistically significant”
- Loosely: if sampling error alone were at work, a result like this would occur 5% of the time or less
what are three important considerations for correlations
- Shape of the relationship
- Homoscedasticity
- Restriction of range
what does the shape of the relationship mean
- Pearson r applies only if the relationship between the variables is presumed to be linear.
- Whatever the connection is, we are assuming that it follows a straight line (this says nothing about whether one variable causes the other)
- Curvilinear relationships cannot be described by Pearson r
explain homoscedasticity and heteroscedasticity
Homoscedasticity = all data points fall within a (more or less) elliptical/oval shape; the range of values on Y is the same for each value of X
Heteroscedasticity = shape of data points deviates from an ellipse (e.g., fan shaped); the range of values on Y is NOT the same for each value of X
- Pearson r is not appropriate for heteroscedastic data: it can come out near 0 or understate the relationship even though one exists (the same is true for curvilinear relationships)
visual difference between homo and heteroscedasticity
see ppt slides
explain restricted range
- Common reason why population correlation coefficients can be underestimated by sample r’s
- If your sample covers only a restricted range, you may conclude there is no relationship when there actually might be one
- The effect of a restricted range is to reduce the magnitude of the calculated r.
what two variables does Pearson r measure
variable 1: interval or ratio
variable 2: interval or ratio
what two variables does Spearman rho measure
variable 1: ordinal (ranks)
Variable 2: ordinal (ranks)
what two variables does Phi measure
variable 1: true dichotomy (nominal, two categories)
Variable 2: true dichotomy (nominal, two categories)
what two variables does tetrachoric measure
variable 1: artificial dichotomy (ex. pass or fail on a math test)
Variable 2: artificial dichotomy
what two variables does contingency coefficient measure
variable 1: nominal, two or more categories
Variable 2: nominal, two or more categories
what two variables does point biserial measure
variable 1: true dichotomy (nominal)
Variable 2: interval or ratio
what two variables does biserial measure
variable 1: artificial dichotomy
Variable 2: interval or ratio
what two variables does eta (curvilinear) measure
variable 1: interval or ratio
Variable 2: interval or ratio
what is artificial dichotomy
Artificial dichotomy is when a variable is not dichotomous but you are making it dichotomous
what is the coefficient of determination
- Obtained by squaring the correlation coefficient (r²)
- Interpreted as the percentage of variance in one variable that is predictable from (explained by or shared with) the other variable
For example, if the correlation (r) between IQ and reading test scores is .7
- The Coefficient of Determination is .7² = .49 = 49%
- This means that 49% of the variance in reading test scores is predictable from (or explained by) IQ scores
- And 51% of the variance in reading test scores is due to other factors
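The arithmetic in the IQ and reading example, written out as a short sketch (the variable names are made up):

```python
r = 0.7                                    # correlation from the example
r_squared = r ** 2                         # coefficient of determination
explained_pct = round(r_squared * 100)     # variance explained by IQ
unexplained_pct = 100 - explained_pct      # variance due to other factors
print(explained_pct, unexplained_pct)
```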
why does correlation not imply causality
A strong correlation (positive or negative) between X and Y could mean any one of three things:
- X causes Y
- Y causes X
- A third, unmeasured variable influences both X and Y
what is factor analysis
- Expanding to the possibility that we have multiple measurements on each individual
- For example, each individual has five test scores
- We can take each pair of tests and calculate a correlation coefficient on that pair
what is a correlation matrix
visual representation (a table) of the correlation of each variable with all the other variables
- Entries in diagonal are +1.00 (perfect positive correlation)
- Section above is a mirror image of the section below
how to determine the number of unique entries in a correlation matrix involving n variables
n(n-1) / 2
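The formula can be checked by listing every distinct pair of variables. A sketch using five hypothetical test names (the function name `unique_correlations` is made up):

```python
from itertools import combinations

def unique_correlations(n):
    """Number of distinct off-diagonal entries for n variables."""
    return n * (n - 1) // 2

tests = ["T1", "T2", "T3", "T4", "T5"]   # hypothetical test names
pairs = list(combinations(tests, 2))     # every distinct pair, order ignored

print(len(pairs), unique_correlations(len(tests)))
```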
what is the purpose of factor analysis
- Goal = simplification
- Discovery of underlying dimensions or constructs that can account for the pattern of correlations among our variables
- Reduces the number of variables we have to work with
what are the 6 steps in factor analysis
- Step 1. Deciding on the number of factors
- Step 2. Extracting the factors
- Perform the factor analysis by directing the software to extract the number of factors identified in Step 1
- Step 3. Examining the factor loadings
- Step 4. Performing a rotation
- Step 5. Examining the rotated factor loadings
- Step 6. Interpreting and naming the factors
what is the maximum number of factors that can be extracted
number of variables
what is an Eigenvalue
- amount of variance associated with each factor
what is a scree plot
- Scree Plot is a graphical representation
- eigenvalues (vertical axis) vs.
- the number of factors (horizontal axis)
- Locate the place where there is a large drop
- Number of factors to extract is at the top of the drop
what is a factor loading
- correlation of each of the original variables with each factor
explain factor rotation
- Plot the factor loadings on a graph
- Rotate the axes until they pass through the greatest number of data points
- Recalculate the factor loadings
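The three steps above can be sketched for the two-factor case. The loadings are hypothetical, and the rotation angle is picked by hand for this example; real software chooses the angle with an optimization criterion (varimax is a common one), which is not shown here. The function name `rotate_loadings` is made up.

```python
import math

def rotate_loadings(loadings, degrees):
    """Re-express two-factor loadings on axes rotated by `degrees`
    (positive = counterclockwise)."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of each point relative to the rotated axes
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

# Hypothetical unrotated loadings: (factor 1, factor 2) for four variables
unrotated = [(0.6, 0.6), (0.5, 0.5), (0.6, -0.6), (0.5, -0.5)]

# Rotate the axes 45 degrees clockwise so they pass through the points
rotated = rotate_loadings(unrotated, -45)

for before, after in zip(unrotated, rotated):
    print(before, "->", tuple(round(value, 2) for value in after))
```

Before rotation every variable loads on both factors; after rotation each variable loads on only one. The sum of squared loadings for each variable is unchanged by the rotation.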
what is Thurstone’s Criteria
1. Eliminate negative factor loadings
2. Each variable has a high loading on only one factor
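A rough sketch of checking a loading matrix against the two criteria as stated on this card. The cutoff of 0.40 for a "high" loading is an assumption made for illustration, as are the function names and both loading matrices.

```python
def high_loading_factors(row, cutoff=0.40):
    """Indexes of the factors on which this variable loads highly.
    The 0.40 cutoff is an assumed value, not taken from the cards."""
    return [i for i, loading in enumerate(row) if abs(loading) >= cutoff]

def meets_criteria(loadings, cutoff=0.40):
    # Criterion 1: no negative loadings anywhere in the matrix
    no_negatives = all(value >= 0 for row in loadings for value in row)
    # Criterion 2: each variable loads highly on exactly one factor
    one_high_each = all(
        len(high_loading_factors(row, cutoff)) == 1 for row in loadings
    )
    return no_negatives and one_high_each

# Hypothetical rotated loadings: rows are variables, columns are factors
clean = [(0.85, 0.05), (0.80, 0.10), (0.10, 0.75), (0.02, 0.90)]
messy = [(0.60, 0.60), (0.70, -0.20)]

print(meets_criteria(clean), meets_criteria(messy))
```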
what is a rotated factor matrix
- Shows the correlation between each original variable and the new, rotated factors
- Step 6. Interpret and name the factors
explain unidimensional vs. multidimensional
1) Unidimensional test =
- all items load on a single factor
- test is measuring a single construct
- Beck Depression Inventory might look like a unidimensional test but it is not
2) Multidimensional test =
- items group into two or more separate factors
- test is measuring more than one construct
- most tests are typically multidimensional
what are orthogonal factors
extracted factors are not correlated with one another (orthogonal)
- The factor axes form right angles with each other, so the factors do not overlap (they share no variance)
explain oblique factors
permit our factors to themselves be correlated with one another
what can you do if the factors are oblique
- If we extract many factors and the factors are oblique (correlated), we can repeat the process and factor analyze the factors themselves (Second-Order Factors)
define factor
a mathematical concept that is utilized to determine if things have a relationship to one another
- Refers to the underlying relationship among variables; it cannot be known until you have done a factor analysis