Test 2 (pg 147-170) Flashcards
As a research method, __________ allow you to describe the relationship between two measured variables.
correlational designs
A ___________ helps by assigning a numerical value to the observed relationship.
correlation coefficient (descriptive statistics)
In addition to describing a relationship, correlations allow us to ________ from one variable to another.
make predictions
Correlations vary in their _______.
magnitude
_________ is an indication of the strength of the relationship between two variables.
Magnitude
What are the different types of correlational relationships?
positive, negative, none, or curvilinear
The magnitude or strength of a relationship is determined by the ________ describing the relationship.
correlation coefficient
A __________ is a measure of the degree of relationship between two variables.
correlation coefficient
A correlation coefficient can vary between ____ and _____.
-1.00 and +1.00
The _______ the relationship is between the variables, the closer the coefficient is to either -1.00 or +1.00.
stronger
The _______ the relationship is between the variables, the closer the coefficient is to 0.
weaker
We typically discuss correlation coefficients as assessing a ______, ______ or ______ relationship, or no relationship at all.
strong, moderate, or weak
±.70 to ±1.00 =
a strong relationship
±.30 to ±.69 =
a moderate relationship
±.00 to ±.29 =
no relationship (.00) to a weak relationship
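The cutoffs above can be expressed as a small lookup. This is a minimal Python sketch; the function name describe_strength is hypothetical, and it assumes that a value falling between two listed bands (for example .695) belongs to the lower band.

```python
def describe_strength(r: float) -> str:
    """Label r using the strong/moderate/weak cutoffs on these cards (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)  # magnitude ignores the sign
    if size >= 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size > 0.0:
        strength = "weak"
    else:
        return "no relationship"
    direction = "positive" if r > 0 else "negative"  # the sign gives the type
    return f"{strength} {direction}"


print(describe_strength(-0.59))  # moderate negative
print(describe_strength(0.76))   # strong positive
```

Note that the sign only decides "positive" versus "negative"; the strength label depends on the absolute value alone.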
A correlation coefficient of either -1.00 or +1.00 indicates a __________.
perfect correlation, the strongest relationship possible.
What would it mean if the correlation between height and weight in a group of 20 people were perfect (+1.00)?
This would mean that the person with the highest weight would also be the tallest person, the person with the second-highest weight would be the second-tallest person, and so on down the line. In addition, height and weight would change by a set amount relative to each other, so every data point would fall on a straight line.
If there were a perfect negative correlation in the 20-person height/weight study, what would that mean?
This would mean that the person with the highest weight was the shortest, and so on down the line, and that as height increased, weight decreased by a set amount for each individual.
A correlation coefficient of _______ represents a perfect relationship, while a coefficient of ______ indicates no relationship.
+ or - 1.00 = perfect
0.00 = none
A ________ is a figure that graphically represents the relationship between two variables.
scatterplot
In a ______, two measurements are represented for each participant by the placement of a marker.
scatterplot
The stronger the correlation, the _______ the data points cluster around an imaginary line through their center.
more tightly
When there is a perfect correlation, where do the data points fall on a scatterplot?
the data points all fall on a straight line.
Describe what a positive relationship would look like on a scatterplot.
The majority of the data points fall along an upward angle, from lower left corner to upper right corner. The relationship is linear, and the stronger the correlation, the closer to the imaginary straight line the points will be.
Describe what a negative relationship would look like on a scatterplot.
The majority of the data points fall along a downward angle, from the upper left corner to the lower right corner. The relationship is linear, and the stronger the correlation, the closer the points will be to the imaginary straight line.
A negative correlation indicates that an ______ in one variable is associated with an _______ in the other variable.
increase, decrease, or decrease, increase
A positive correlation indicates that an ______ in one variable is associated with an _______ in the other variable.
increase, increase, or decrease, decrease
Describe what no relationship would look like on a scatterplot.
the data points are scattered in a random fashion. The correlation coefficient would be 0 or very close to 0 (the example in the book was -.09)
A correlation coefficient of 0 usually indicates no meaningful relationship. However, it is also possible for a correlation coefficient of 0 to indicate a _________.
curvilinear relationship
A graph represents the relationship between psychological arousal (the x-axis) and performance (y-axis). The individuals perform better when they are moderately aroused than when arousal is either very low or very high. The correlation coefficient for these data is also very close to 0, being -.05. What type of relationship is this? What does the scatterplot look like?
- curvilinear relationship (so the correlation coefficient does not reflect the strong relationship seen in the scatterplot)
- the points are tightly clustered in an inverted U shape
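To see why a curvilinear pattern can produce an r near 0, here is a Python sketch using made-up, perfectly symmetric inverted-U data (the scores are hypothetical, not the book's data). Pearson's r is computed as the mean of z-score cross-products, using population standard deviations.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson's r as the mean of z-score cross-products (population SDs)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


# Hypothetical arousal levels (1-9); performance peaks at moderate arousal (5)
# and drops off symmetrically at low and high arousal: an inverted U.
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [10 - (a - 5) ** 2 for a in arousal]

print(performance)  # [-6, 1, 6, 9, 10, 9, 6, 1, -6]
print(f"r = {pearson_r(arousal, performance):.3f}")  # essentially zero
```

Because the rise on the left half exactly mirrors the fall on the right half, the positive and negative cross-products cancel, even though performance is perfectly predictable from arousal.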
Correlation coefficients tell us only about _______ relationships.
linear
List the type of relationship for each of the following examples: smoking and cancer; intelligence and weight; mountain elevation and temperature; memory and age.
smoking and cancer = positive
Intelligence and weight = none
mountain elevation and temperature = negative
memory and age = curvilinear
Which of the following correlation coefficients represents the weakest relationship between the two variables? -.59, +.10, -1.00, +.76
+.10
Explain why a correlation coefficient of 0.00 or close to 0.00 may not mean that there is no relationship between the variables.
A correlation coefficient of .00 or close to .00 may indicate no relationship or a weak relationship. However, if the relationship is curvilinear, the correlation coefficient could also be .00 or close to this. In this case, there is a relationship between the two variables, but because the relationship is curvilinear, the correlation coefficient does not truly represent the strength of the relationship.
The most common error made when interpreting correlations is assuming that the relationship observed is ______ in nature - that a change in variable A causes a change in variable B.
causal (correlations simply identify relationships; they do not indicate causality)
What is wrong with the statement “Let’s stop drug use in schools by making sure all students can read”?
Even if there is a strong positive correlation between drug use and illiteracy, that does not mean illiteracy causes drug use.
(The statement assumes causality and directionality.)
______ is the assumption that a correlation indicates a causal relationship between the two variables.
causality
_____ is the inference made with respect to the direction of a causal relationship between two variables.
directionality
The _______ is the problem of a correlation between two variables being dependent on another (third) variable.
third-variable problem
When we interpret a correlation, it is also important to remember that although the correlation between the variables may be very strong, it may also be that the relationship is the result of some ________ that influences both of the measured variables.
third variable
What is the well-known third-variable problem study conducted by social scientists and physicians in Taiwan?
The researchers attempted to identify the variables that best predicted the use of birth control, a question of interest to them because of overpopulation problems in Taiwan. They collected data on various behavioral and environmental variables and found that the variable most strongly correlated with contraceptive use was the number of electrical appliances in the home: people with more electrical appliances tended to use contraceptives more, whereas those with fewer electrical appliances tended to use contraceptives less. The third variable was education. The more educated a person was, the more likely they were to use contraceptives, and the more likely they were to have a better job and earn more money, and thus to be able to afford more appliances.
It is possible statistically to determine the effects of a third variable by using a correlational procedure known as ________.
partial correlation
________ is a correlational technique that involves measuring three variables and then statistically removing the effect of the third variable from the correlation of the remaining two variables.
Partial correlation
With__________, if the third variable is responsible for the relationship between two variables that appear to correlate, then the correlation should disappear when the effect of the third variable is removed, or partialed out.
partial correlation
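The cards do not give the formula, but the standard first-order partial correlation of X and Y with Z partialed out is r_XY·Z = (r_XY - r_XZ × r_YZ) / √((1 - r_XZ²)(1 - r_YZ²)). Below is a minimal Python sketch; the correlation values are hypothetical, loosely modeled on the Taiwan example, and are not the study's actual results.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: X = appliances, Y = contraceptive use, Z = education.
r_appliances_contraception = 0.60
r_appliances_education = 0.75
r_contraception_education = 0.80

r_partial = partial_r(
    r_appliances_contraception,
    r_appliances_education,
    r_contraception_education,
)
print(f"{r_partial:.3f}")  # about zero: Z fully accounts for the X-Y link

# If Z is only weakly related to X and Y, the correlation barely changes.
print(f"{partial_r(0.60, 0.20, 0.10):.2f}")  # 0.59
```

In the first case the X-Y correlation disappears once education is partialed out, which is the pattern that points to a third variable; in the second it stays near its original value.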
_________ is a variable that is truncated and has limited variability. (the variable does not vary enough)
restrictive range
Colleges that are very selective, such as Ivy League schools, would have a _________ on SAT scores because they accept only students with very high SAT scores. Thus, in these situations, SAT scores are not a good predictor of college GPA because of the “______” on the SAT variable. (If SAT scores have a limited range, the correlation between SAT scores and GPA appears to decrease.)
restrictive range
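The shrinking effect of a restrictive range can be shown with a small Python sketch. The ten predictor (X) and outcome (Y) scores are hypothetical, and the cutoff X >= 7 stands in for a selective admissions policy.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictor (X) and outcome (Y) scores for 10 applicants.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]

full = pearson_r(x, y)

# A "selective" cutoff: keep only applicants with X >= 7.
kept = [(a, b) for a, b in zip(x, y) if a >= 7]
restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(full, 2))        # 0.79
print(round(restricted, 2))  # 0.12
```

The underlying relationship is the same in both calculations; only the range of X changed, yet r drops sharply once the low scorers are excluded.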
________ - arguing that a well-established statistical trend is invalid because we know a “person who” went against the trend.
Person-who argument
______ is the most commonly used correlation coefficient when both variables are measured on an interval or ratio scale.
Pearson product-moment correlation coefficient (Pearson’s r)
For Pearson’s correlation coefficient, ____ is the statistical notation we use to report it.
r
The development of the Pearson’s r correlation coefficient is typically credited to ______, who published his formula for calculating r in 1895. (Edgeworth had thought of a similar formula 3 years earlier, but he published it deep within a statistical paper that was very hard to follow, so no one discovered it until years later.)
Karl Pearson
To calculate Pearson’s r, we begin by converting the raw scores on the two different variables to the _________. (one example of this would be a z-score)
same unit of measurement
A _____ represents the number of standard deviation units a raw score is above or below the mean.
z-score
High raw scores are always above the mean and have ______ z-scores. Low raw scores are always below the mean and have _______ z-scores.
positive
negative
After calculating a z-score the next step in calculating Pearson’s r is to calculate what is called a ______.
cross-product
A ________ is the z-score on one variable multiplied by the z-score on the other variable.
cross-product
If a correlation is strong and positive, how would the z-scores on the two variables be paired?
Positive z-scores on one variable would go with positive z-scores on the other variable, and negative z-scores on one variable would go with negative z-scores on the other variable.
If both z-scores used to calculate the cross-product are positive, then the cross-product will be ______.
positive
If both z-scores used to calculate the cross-product are negative, then the cross-product will be ______.
positive (two negative numbers multiplied together give a positive number)
The mean of all of the cross-products will tell you the ______.
correlation coefficient
$r = \frac{\sum z_X z_Y}{N}$
The sum of the cross-products (each participant’s z-score on X multiplied by their z-score on Y), divided by the number of participants (N)
Pearson’s r formula
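The three steps (convert to z-scores, form cross-products, average them) can be traced in a short Python sketch. The five pairs of scores are hypothetical. It uses population standard deviations (dividing by N) so that averaging over N matches the formula; with sample standard deviations you would divide the summed cross-products by N - 1 instead.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """r = (sum of z_X * z_Y) / N, with z-scores based on population SDs."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    z_x = [(a - mx) / sx for a in x]  # step 1: convert X scores to z-scores
    z_y = [(b - my) / sy for b in y]  # step 1: convert Y scores to z-scores
    cross_products = [zx * zy for zx, zy in zip(z_x, z_y)]  # step 2
    return sum(cross_products) / n    # step 3: mean of the cross-products


# Hypothetical scores for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```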
$z = \frac{X - \mu}{\sigma}$
X is the raw score to be standardized
μ is the mean of all the scores in the population
σ is the standard deviation of the entire population
z-score
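A minimal Python sketch of the z-score formula, treating the list of scores as the whole population (so it uses the population standard deviation). The scores are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize each raw score: z = (X - mu) / sigma."""
    mu = mean(scores)       # population mean
    sigma = pstdev(scores)  # population standard deviation
    return [(x - mu) / sigma for x in scores]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Here the mean is 5 and the standard deviation is 2, so a raw score of 9 is two standard deviations above the mean (z = 2.0) and a raw score of 2 is 1.5 standard deviations below it (z = -1.5).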
When calculating a correlation coefficient, we should have at least 10 participants per variable. If there are two variables, how many individuals would we need?
10 participants per variable (thus, with two variables, we need a minimum of 20 individuals)
The ________ is a measure of the proportion of the variance in one variable that is accounted for by another variable; calculated by squaring the correlation coefficient.
coefficient of determination
TRUE OR FALSE: In the height and weight example, the coefficient of determination tells us how much of the variation in weight is accounted for by the variation in height. Squaring the correlation coefficient of +.94 gives r² = .8836, so 88.36% of the variance in weight can be accounted for by the variance in height, which is a very high coefficient of determination.
TRUE
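The arithmetic behind the coefficient of determination is a single squaring step. A minimal Python sketch using the r = .94 height-weight value:

```python
r = 0.94  # height-weight correlation from the example
r_squared = r ** 2

print(f"r^2 = {r_squared:.4f}")              # r^2 = 0.8836
print(f"{r_squared:.2%} of the variance")    # 88.36% of the variance
```

Because the coefficient is squared, r = -.94 would give the same r² of .8836: the coefficient of determination reflects magnitude only, not direction.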
The type of correlation coefficient used depends on the ________ .
type of data collected in the research study.
Pearson’s correlation coefficient is used when both variables are measured on an ________.
interval or ratio scale.
_________ is the correlation coefficient used when one (or more) of the variables is measured on an ordinal (ranking) scale.
Spearman’s rank-order correlation coefficient
To use Spearman’s correlation coefficient when one of the variables is interval or ratio in nature, that variable must be _________ before the calculations are done.
ranked (converted to an ordinal scale)
______ means having only two possible values, such as gender.
dichotomous
_______ is the correlation coefficient used when one of the variables is measured on a dichotomous nominal scale and the other is measured on an interval or ratio scale.
Point-biserial correlation coefficient
_________ is the correlation coefficient used when both measured variables are dichotomous and nominal.
phi coefficient
Pearson- Both variables must be ______.
interval or ratio
Spearman - Both variables are _______.
ordinal (ranked)
Point-biserial - one variable is ______, and the other variable is ________.
interval or ratio
nominal and dichotomous
Phi - both variables are _________.
nominal and dichotomous
In a recent study, researchers were interested in determining the relationship between gender and the amount of time spent studying for a group of college students. Which correlation coefficient should be used to assess this relationship?
In this study, gender is measured on a nominal (dichotomous) scale, and the amount of time spent studying is measured on a ratio scale. Thus, a point-biserial correlation coefficient is appropriate.
If I wanted to correlate class rank with SAT scores for a group of 50 individuals, which correlation coefficient would I use?
Because class ranks are an ordinal scale of measurement and SAT scores are measured on an interval/ratio scale, you would have to convert SAT scores to an ordinal scale and use the Spearman correlation coefficient.
The correlation procedure allows us to predict from one variable to another, and the degree of accuracy with which we can predict depends on the __________.
strength of the correlation
A tool that enables us to predict an individual’s score on one variable based on knowing one or more other variables is ________.
regression analysis
__________ is a procedure that allows us to predict an individual’s score on one variable based on knowing one or more other variables.
regression analysis
If you are working in a human resources office, and you want to predict how well future employees might perform based on test scores and performance measures, you could use regression analysis to make such prediction by developing a ________.
regression equation.
The _______ is the best-fitting straight line drawn through the center of a scatterplot that indicates the relationship between the variables.
regression line
__________ involves determining the equation for the best-fitting line for a data set.
Regression analysis
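The formula for a linear regression analysis is:
$Y' = bX + a$ (the same as the slope-intercept form $y = mx + b$, just with different symbols)
Y' is the predicted value on the Y variable, b is the slope of the line, X represents an individual’s score on the X variable, and a is the y-intercept.
$b = r\left(\frac{\sigma_Y}{\sigma_X}\right)$ (the slope: the correlation coefficient multiplied by the standard deviation of Y divided by the standard deviation of X)
$a = \bar{Y} - b\bar{X}$ (the y-intercept: the mean of Y minus the slope times the mean of X)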
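The slope and intercept formulas can be applied in a short Python sketch. The five (X, Y) pairs are hypothetical, and fit_line is an illustrative name, not a library function. The ratio of standard deviations is the same whether you use population or sample SDs, so b does not depend on that choice.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Return slope b and intercept a for the regression line Y' = bX + a."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    # Pearson's r as the mean of z-score cross-products.
    r = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)
    b = r * (sy / sx)  # slope: r times SD of Y over SD of X
    a = my - b * mx    # y-intercept: mean of Y minus slope times mean of X
    return b, a


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = fit_line(x, y)
print(round(b, 2), round(a, 2))  # 0.6 2.2

# Predict Y for a new individual whose X score is 6.
print(round(b * 6 + a, 2))  # 5.8
```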
A more advanced use of regression analysis is known as ______, which involves combining several predictor variables in a single regression equation. With this type of analysis, we can assess the effects of multiple predictor variables (rather than a single predictor variable) on the dependent measure.
multiple regression analysis
The magnitude of a correlation coefficient is to ______ and the type of correlation is to _________.
absolute value; sign
The _______ should be used when both variables are measured on an ordinal scale.
Spearman
Drew is interested in assessing the degree of relationship between belonging to a Greek organization and the number of alcoholic drinks consumed per week. Drew should use the ______ correlation coefficient to assess this.
Point-biserial
Student   Test #1   Test #2
1         First     First
2         Second    Third
3         Third     Second
4         Fourth    Fourth
5         Fifth     Fifth
Spearman
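For the ranked data in the table above, Spearman's coefficient can be computed with the shortcut formula r_s = 1 - 6Σd² / (n(n² - 1)), where d is the difference between each student's two ranks. This formula assumes there are no tied ranks. The Python sketch below is illustrative; spearman_rs is a hypothetical name.

```python
# Map the ordinal labels from the table onto rank numbers.
RANK = {"First": 1, "Second": 2, "Third": 3, "Fourth": 4, "Fifth": 5}


def spearman_rs(rank_x, rank_y):
    """Spearman's r_s for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


test1 = [RANK[w] for w in ["First", "Second", "Third", "Fourth", "Fifth"]]
test2 = [RANK[w] for w in ["First", "Third", "Second", "Fourth", "Fifth"]]
print(spearman_rs(test1, test2))  # 0.9
```

Only students 2 and 3 swap places, so Σd² = 2 and r_s = 1 - 12/120 = .90, a strong positive relationship between the two sets of rankings.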
An ________ scale is used to measure things by ranking and doesn’t necessarily imply equal distances between the rankings. Medals in the Olympics (Gold, Silver, Bronze) and socioeconomic status (upper, middle, lower) are examples of “______” scales. For example, suppose you are rating which ice cream flavor out of three (chocolate, vanilla, and strawberry) you like best. You love chocolate but only like vanilla and strawberry a little bit. Using an “______” scale, your rankings would be 1) chocolate, 2) vanilla, and 3) strawberry, even though the distance between chocolate and vanilla is much larger than the distance between vanilla and strawberry.
ordinal
Student   Test #1   Test #2
1         100       Happy
2         90        Happy
3         80        Happy
4         70        Sad
5         60        Sad
Point-biserial
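The point-biserial coefficient is numerically the same as Pearson's r once the dichotomous variable is coded 0/1. The Python sketch below uses the table above and assumes Happy = 1 and Sad = 0; reversing that coding only flips the sign of the result.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


test1 = [100, 90, 80, 70, 60]
test2 = ["Happy", "Happy", "Happy", "Sad", "Sad"]
coded = [1 if mood == "Happy" else 0 for mood in test2]  # assumed: Happy = 1, Sad = 0

r_pb = pearson_r(test1, coded)
print(round(r_pb, 3))  # 0.866
```

The high Test #1 scores all belong to the "Happy" group, which is why the coefficient is strongly positive.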
Student   Test #1   Test #2
1         Pass      Pass
2         Pass      Pass
3         Pass      Fail
4         Fail      Fail
5         Fail      Pass
Phi
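For two dichotomous variables, phi can be computed from the four cells of a 2 × 2 table with the standard formula φ = (ad - bc) / √((a + b)(c + d)(a + c)(b + d)). A minimal Python sketch using the Pass/Fail table above; the cell labels a, b, c, d are the conventional ones, not from the cards.

```python
from math import sqrt

test1 = ["Pass", "Pass", "Pass", "Fail", "Fail"]
test2 = ["Pass", "Pass", "Fail", "Fail", "Pass"]
pairs = list(zip(test1, test2))

# Count each cell of the 2 x 2 table (Test #1 rows, Test #2 columns).
a = pairs.count(("Pass", "Pass"))  # passed both tests
b = pairs.count(("Pass", "Fail"))  # passed Test #1 only
c = pairs.count(("Fail", "Pass"))  # passed Test #2 only
d = pairs.count(("Fail", "Fail"))  # failed both tests

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(a, b, c, d)     # 2 1 1 1
print(round(phi, 3))  # 0.167
```

A phi of about .17 indicates only a weak positive relationship between passing the two tests. Coding Pass = 1 and Fail = 0 and computing Pearson's r gives the same value.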
A _______ is a measurement scale that identifies things using a word. Also known as a qualitative scale, it organizes items by their category or name. There is no order or ranking in this type of scale. Some examples of nominal categories are gender (male and female) and ethnicity (Caucasian, African-American, Hispanic, Asian, etc.). For example, in an experiment using student volunteers, several nominal scales could be used to create different groups. The students could be categorized by gender, age, or year in school.
Nominal Scale
IV string & DV numeric –> __________ –> Pearson r
Dummy variable coding
IV string & DV string –> _________
Cramer’s V
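Cramer's V is usually defined as V = √(χ² / (N(k - 1))), where χ² is the chi-square statistic for the contingency table and k is the smaller of the number of rows and columns. The Python sketch below assumes the table is a list of rows of counts with no empty rows or columns; cramers_v is a hypothetical name.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V from a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if unrelated
            chi_square += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of rows and columns
    return sqrt(chi_square / (n * (k - 1)))


# Perfect association between two categorical variables.
print(cramers_v([[10, 0], [0, 10]]))  # 1.0

# The Pass/Fail counts from the phi example (rows: Test #1, columns: Test #2).
print(round(cramers_v([[2, 1], [1, 1]]), 3))  # 0.167
```

For a 2 × 2 table, V equals the absolute value of phi, which is why the second result matches the phi coefficient for the same data.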
Hypothesis: as hours of violent TV watched increase, acts of aggression tend to increase. We would use ______.
Pearson r
A way of assigning numerical values to a categorical variable so that it reflects class membership
dummy variable coding
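A minimal Python sketch of dummy variable coding: each non-reference category gets its own 0/1 column, and the reference category is coded 0 on every column. The year-in-school values and the function name dummy_code are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category other than the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


years = ["freshman", "sophomore", "junior", "freshman", "senior"]
print(dummy_code(years, reference="freshman"))
# {'junior': [0, 0, 1, 0, 0], 'senior': [0, 0, 0, 0, 1], 'sophomore': [0, 1, 0, 0, 0]}
```

With only two categories this produces a single 0/1 column, and correlating that column with a numeric variable gives Pearson's r, which in that case is the point-biserial coefficient.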