Week 3: Correlation Flashcards

Question

What is the process of standardisation?

Answer 1

To overcome the problem of dependence on the measurement scale, we need to convert the covariance into a standard set of units

Answer 2

Dividing the covariance by the product of the standard deviations of both variables.

Answer 3

Same formula as the covariance, but divided by the product of the SD of x and the SD of y

Answer 4

The standard deviation for the number of adverts watched (sx) was 1.67, and the SD of the number of packets of crisps bought (sy) was 2.92. If we multiply these together we get 1.67 × 2.92 = 4.88. Now, all we need to do is take the covariance, which we calculated a few pages ago as being 4.25, and divide by these multiplied standard deviations. This gives us r = 4.25/4.88 = .87.
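The arithmetic on this card can be checked with a few lines of Python. This is a minimal sketch; the function name `pearson_r_from_cov` is made up for illustration, and the inputs are the values quoted on the card.

```python
def pearson_r_from_cov(cov_xy: float, sd_x: float, sd_y: float) -> float:
    """Standardise a covariance by dividing it by the product of the two SDs."""
    return cov_xy / (sd_x * sd_y)


# Values quoted on the card: covariance = 4.25, sx = 1.67, sy = 2.92
r = pearson_r_from_cov(4.25, 1.67, 2.92)
print(round(r, 2))
```

Because the covariance is divided by the two standard deviations, the result no longer depends on the units in which either variable was measured.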

Answer 5

correlation coefficient or Pearson's r

Answer 6

standardised

Answer 7

Describes a relationship between variables: if one variable increases, what happens to the other variable?

Answer 8

product-moment correlation

Answer 9

Pearson's r correlation coefficient

Answer 10

-1 and +1 (direction of relationship)

Answer 11

be with each other and the mean

Answer 12

there is unexplained variance in the data and results in the data points being more spread out.

Answer 13

* Example of a high negative correlation. The data points are close together and are close to the mean.
* On the other hand, the graph on the right shows a low positive correlation. The data points are more spread out and deviate more from the mean.

Answer 14

between one variable and another hence its use in calculating effect size

Answer 15

two variables are perfectly positively correlated, so as one variable increases, the other increases by a proportionate amount.

Answer 16

a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount.

Answer 17

small effect

Answer 18

medium effect

Answer 19

large effect

Answer 20

correlation coefficient is different from zero (i.e., different from 'no relationship')

Answer 21

relationship that we have observed is statistically meaningful.

Answer 22

1. z-scores
2. t-statistic

Answer 23

normally distributed

Answer 24

* This can be fixed by adjusting r so sampling distribution is normal as follows:

Answer 25

confidence intervals in r

Answer 26

likely correlation in the population

Answer 27

* LB = 1.33 - (1.96 * 0.71) = -0.062
* UB = 1.33 + (1.96 * 0.71) = 2.72
* These bounds are in the z metric, so they have to be converted back to the correlation coefficient r using the formula in the diagram
* This gives a UB of 0.991 and an LB of -0.062 (this value is so close to 0 that the transformation from z to r has no visible impact)
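The diagram with the conversion formula is not reproduced in these cards. The sketch below assumes it is the standard Fisher transformation, z_r = 0.5 * ln((1 + r) / (1 - r)), which is the inverse hyperbolic tangent of r, with standard error 1 / sqrt(N - 3). The function names are hypothetical; the inputs 1.33 and 0.71 are the rounded values from the card.

```python
import math


def r_to_z(r: float) -> float:
    """Fisher transformation: 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def z_to_r(z: float) -> float:
    """Inverse Fisher transformation, i.e. tanh(z)."""
    return math.tanh(z)


def ci_for_r(z_r: float, se: float, crit: float = 1.96) -> tuple[float, float]:
    """Build the interval in the z metric, then convert both bounds back to r."""
    lower_z = z_r - crit * se
    upper_z = z_r + crit * se
    return z_to_r(lower_z), z_to_r(upper_z)


# Rounded values used on the card: z_r = 1.33, SE = 0.71
lower, upper = ci_for_r(1.33, 0.71)
print(round(lower, 3), round(upper, 3))
```

The interval is symmetric around z_r in the z metric but not around r once converted back, because r cannot exceed 1.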

Answer 28

Imagine you're studying the relationship between hours of study and exam scores among students. You collect data from 50 students and find a correlation coefficient (r) of 0.3 between study hours and exam scores. With a sample size of 50, this correlation only just reaches significance at a typical level (p < 0.05, two-tailed; t(48) ≈ 2.18), and a weaker correlation of 0.15 would not (t(48) ≈ 1.05). Now, if you increase your sample size to 500 students, even that smaller correlation of 0.15 becomes statistically significant at the same level (t(498) ≈ 3.39). So, as you move from a smaller sample size to a larger one, weaker relationships between variables can become statistically significant because the larger sample estimates the correlation more precisely.
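The effect of sample size can be illustrated with the t-statistic for a correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2), on N - 2 degrees of freedom. This is a minimal sketch; the function name is hypothetical, and the two-tailed .05 critical values in the comments (about 2.01 for df = 48 and about 1.96 for df = 498) are taken from standard t tables rather than from the cards.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t-statistic for testing whether a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Same correlation, different sample sizes
t_small_sample = t_for_r(0.15, 50)    # compare with critical t of about 2.01 (df = 48)
t_large_sample = t_for_r(0.15, 500)   # compare with critical t of about 1.96 (df = 498)
print(round(t_small_sample, 2), round(t_large_sample, 2))
```

The correlation is identical in both calls; only N changes, and with it the size of the test statistic.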

Answer 29

Link between the age at which you die and the number of cigarettes you smoked

Answer 30

indicates no linear relationship at all so if one variable changes, the other stays the same.

Answer 31

causality, e.g., although we conclude that as the number of adverts watched increases so does the number of toffees bought, we can't say that watching adverts caused us to buy toffees

Answer 32

* Third variable problem: causality between 2 variables cannot be assumed in any correlation because there might be other measured or unmeasured variables affecting the results. This is known as the third variable problem or tertium quid.
* Direction of causality: correlation coefficients say nothing about which variable causes the other to change. Even if we ignore the third variable problem and assume the 2 correlated variables were the only important ones, the correlation coefficient does not give the direction in which causality operates, e.g., if we conclude that watching adverts causes us to buy packets of toffees, there is no statistical reason why buying packets of toffees cannot cause us to watch more adverts.

Answer 33

significant

Answer 34

covariance to a measure of variance

Answer 35

coefficient of determination

Answer 36

proportion of the variance in a dependent variable (outcome) that's explained by an independent variable (predictor)

Answer 37

19.4% of the variability in exam performance can be explained by exam anxiety ('the variance in y accounted for by x').

Answer 38

Multiply r by itself, for example 0.1 * 0.1 = 0.01
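Squaring r can be sketched in a couple of lines. The function name is hypothetical; the value -0.441 is the exam anxiety and exam performance correlation quoted on a later card in this deck.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination: the correlation coefficient squared."""
    return r ** 2


# r = -.441 between exam anxiety and exam performance (quoted elsewhere in the deck)
percent = variance_explained(-0.441) * 100
print(round(percent, 1))
```

Squaring removes the sign, which is why R² reports the size of the relationship without its direction.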

Answer 39

the correlation but without an indication of its direction.

Answer 40

1. Bivariate correlations
2. Partial correlations
3. Semi-partial or part correlations

Answer 41

correlation between 2 variables

Answer 42

looks at the relationship between two variables while ‘controlling’ the effect of one or more additional variables.

Answer 43

the effect of one or more variables on either X or Y

Answer 44

* A correlation calculates each data point's distance from the line (the residuals)
* This is the error relative to the model (unexplained variance)
* A third variable might predict some of that variation in the residuals

Answer 45

unfiltered variation of the other

Answer 46

third variable constant (but we don't manipulate these)

Answer 47

Partial correlation between IV1 and DV = D / (D + C). Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.

Answer 48

Partial correlation: Purple / (Red + Purple). If we were doing just a partial correlation, we would see how much exam anxiety is influencing both exam performance and revision time.

Answer 49

The partial correlation that we calculated took account not only of the effect of revision on exam performance, but also of the effect of revision on anxiety. If we were to calculate the semi-partial correlation for the same data, then this would control for only the effect of revision on exam performance (the effect of revision on exam anxiety is ignored).

Answer 50

control variable—a variable whose influence is statistically removed or controlled for when examining the relationship between the two primary variables (IV and DV).

Answer 51

relative to the amount of variance in the outcome that is left to explain after the contribution of other predictors has been removed from both the predictor and outcome.

Answer 52

we could look at the relationship between bladder relaxation (did the person wet themselves or not?) and the number of large tarantulas crawling up your leg controlling for fear of spiders (the first variable is dichotomous, but the second variable and ‘controlled for’ variables are continuous).

Answer 53

* First, notice that the partial correlation between exam performance and exam anxiety is −.247, which is considerably less than the correlation when the effect of revision time is not controlled for (r = −.441).
* Although this correlation is still statistically significant (its p-value is still below .05), the relationship is diminished.
* The value of R2 for the partial correlation is .06, which means that exam anxiety can now account for only 6% of the variance in exam performance.
* When the effects of revision time were not controlled for, exam anxiety shared 19.4% of the variation in exam scores, so the inclusion of revision time has severely diminished the amount of variation in exam scores shared by anxiety.
* As such, a truer measure of the role of exam anxiety has been obtained.
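A first-order partial correlation can be computed from the three bivariate correlations. The sketch below uses the zero-order r = −.441 from the card; the two correlations involving revision time (.397 with exam performance and −.709 with exam anxiety) are NOT stated in these cards and are assumed here purely for illustration. The function name is hypothetical.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z partialled out of both variables."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_anxiety_exam = -0.441     # from the card
r_anxiety_revise = -0.709   # assumed for illustration, not given in the cards
r_exam_revise = 0.397       # assumed for illustration, not given in the cards

r_partial = partial_r(r_anxiety_exam, r_anxiety_revise, r_exam_revise)
print(round(r_partial, 3), round(r_partial ** 2, 2))
```

Under these assumed inputs the partial correlation comes out close to the −.247 reported on the card, and squaring it gives roughly the 6% of variance mentioned there.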

Answer 54

other variables are ruled out

Answer 55

effect that the third variable has on only one of the variables in the correlation

Answer 56

Partials out the effect of one or more variables on either X or Y, e.g., the amount that revision explains exam performance after the contribution of anxiety has been removed from one variable (usually the predictor, e.g., revision).

Answer 57

unique variation of one variable with the unfiltered variation of the other.

Answer 58

* Semi-partial correlation between IV1 and DV = D / (D + C + F + G). Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.

Answer 59

* Purple / (Red + Purple + White + Orange)
* When we use semi-partial correlation to look at this relationship, we partial out the variance accounted for by exam anxiety (the orange bit) and look for the variance explained by revision time (the purple bit).

Answer 60

A partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on both variables in the original correlation. A semi-partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on only one of the variables in the original correlation.
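The difference between the two statistics shows up in the denominator of the standard first-order formulas: the partial correlation removes the third variable z from both x and y, while the semi-partial removes it from x only. The sketch below uses made-up correlations (0.5, 0.4, 0.3) that do not come from the cards, and the function names are hypothetical.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """z is removed from both x and y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semi_partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """z is removed from x only; y keeps all of its variance."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)


# Hypothetical correlations, chosen only to illustrate the formulas
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3
print(round(partial_r(r_xy, r_xz, r_yz), 3))
print(round(semi_partial_r(r_xy, r_xz, r_yz), 3))
```

Both share the same numerator, so the semi-partial can never be larger in absolute size than the partial: its denominator leaves the outcome's full variance in place.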

Answer 61

of bivariate correlation coefficients.

Answer 62

* Spearman's rho
* Kendall's tau test

Answer 63

ordinal scale (e.g., grades)

Answer 64

Deselect the Pearson's r tick box

Answer 65

first ranking the data (numbers converted into ranks), and then running Pearson’s r on the ranked data
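The two steps on this card (rank, then run Pearson's r on the ranks) can be sketched in plain Python. The helper names and the small data set are made up for illustration; tied scores are given the average of the ranks they span, which is the usual convention.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Convert scores to 1-based ranks, giving tied scores their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every score tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 26]
print(spearman_rho(x, y), round(pearson_r(x, y), 3))
```

Because only the ordering matters after ranking, any perfectly monotonic relationship gives a rho of 1 even when Pearson's r on the raw scores is below 1.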

Answer 66

data have violated parametric assumptions, such as non-normally distributed data

Answer 67

Spearman's rho

Answer 68

proportion of variance in the ranks that two variables share.

Answer 69

when you have a small data set with a large number of tied ranks. This means that if you rank all of the scores and many scores have the same rank, then Kendall’s tau should be used

Answer 70

For small datasets with many tied ranks. Better estimate of the correlation in the population than Spearman’s ρ.

Answer 71

proportion of variance shared by two variables (or the ranks of those two variables).

Answer 72

tau is not comparable to r and r_s

Answer 73

Kendall’s statistic is actually a better estimate of the correlation in the population; we can draw more accurate generalizations from Kendall’s statistic than from Spearman’s.

Answer 74

* What type of measurement = continuous
* How many predictor variables = one
* What type of continuous variable = continuous
* Meets assumptions of parametric tests = no

Answer 75

Pearson's correlation coefficient r output box

Answer 76

one of the two variables is dichotomous (e.g., being pregnant or not)

Answer 77

depends on whether the dichotomous variable is discrete or continuous

Answer 78

one variable is a discrete dichotomy (e.g. pregnancy),

Answer 79

one variable is a continuous dichotomy (e.g. passing or failing an exam). An example is passing or failing a statistics test: some people will only just fail while others will fail by a large margin; likewise some people will scrape a pass while others will clearly excel.

Answer 80

must calculate the point–biserial correlation coefficient and then use an equation to adjust that figure

Answer 81

* Imagine we are interested in the relationship between the gender of a cat and how much time it spent away from home
* Time spent away is measured at the interval level, so it meets the assumptions of parametric data
* Gender is a discrete dichotomous variable coded 0 for male and 1 for female

Answer 82

* The point-biserial correlation coefficient is r = 0.378 with a p-value of 0.001
* The sign of the correlation coefficient depends on which category you assign to which code, so ignore the direction of the relationship
* R^2 = 0.378 squared = 0.143
* Conclude that 14.3% of the variability in time spent away from home is explained by gender

Answer 83

biserial correlation coefficient

Answer 84

biserial correlation bigger than point biserial

Answer 85

The researchers were interested in whether the amount someone gets paid and the number of holidays they take from work are related to their productivity at work.
- Pay: Annual salary
- Holiday: Number of holiday days taken
- Productivity: Productivity rating out of 10

Answer 86

medium effect size (±.1 = small effect, ±.3 = medium effect, ±.5 = large effect)
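These benchmarks can be turned into a small lookup. The sketch assumes each benchmark is treated as a lower bound (so .435 counts as medium until it reaches .5); the cards do not state that convention, and the function name and the "below small" label are hypothetical.

```python
def effect_size_label(r: float) -> str:
    """Label a correlation using the ±.1 / ±.3 / ±.5 benchmarks as lower bounds."""
    size = abs(r)  # direction is ignored; only the size of the effect matters
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"


for r in (-0.04, 0.313, 0.435):
    print(r, effect_size_label(r))
```

A value such as .435 sits between two benchmarks, which is why it is often described in words as medium approaching large instead of being forced into one category.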

Answer 87

This indicates very little correlation between the 2 variables

Answer 88

the relationship between all possible combinations of your variables

Answer 89

- For pay and holiday, the line is very flat, which indicates that the correlation between the two variables is quite low
- For pay and productivity, the line is steeper, suggesting a fairly substantial correlation between these 2 variables; the same is true for holidays and productivity

Answer 90

* The relationship between pay and holidays is very weak: the correlation is -0.04
* Between pay and productivity, there is a medium-sized correlation of r = 0.313
* Between holidays and productivity there is a medium-to-large effect size of 0.435
* The relationships between pay and productivity and between holidays and productivity are significant, but the correlation between pay and holidays is not

Answer 91

A student was interested in the relationship between the time spent preparing an essay, the interestingness of the essay topic and the essay mark received. He got 45 of his friends and asked them to rate, using a scale from 1 to 7, how interesting they thought the essay topic was (1 - I'll kill myself of boredom, 4 - it's not too bad!, 7 - it's the most interesting thing in the world!) (interesting). He then timed how long they spent writing the essay (hours), and got their percentage score on the essay (essay).

Answer 92

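* Interval scale: equal differences mean the same thing anywhere on the scale (e.g., the difference between 10 and 20 degrees is the same as between 80 and 90 degrees), but 0 degrees does not mean an absence of temperature
* Ratio scale: has a true zero, e.g., height (0 cm means no height), weight, time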

Answer 93

one IV and one DV

Answer 94

values are ordered and ranked but the intervals between them are not uniform, e.g., a Likert scale from strongly disagree to strongly agree, education levels like elementary school and high school, rankings like 1st place to 10th place

Answer 95

D. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a large effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark.

Answer 96

in between small and medium effect

Answer 97

your own research area

Answer 98

one variable is dichotomous, but there is an underlying continuum (e.g. pass/fail on an exam)

Answer 99

When one variable is dichotomous, and it is a true dichotomy (e.g. pregnancy)

Answer 100

* Example of a true dichotomous relationship.
* We can compare the differences in height between males and females.
* Use the dichotomous predictor of gender.

(138 cards)