Week 3: Correlation Flashcards
A general approach in regression is that our outcomes can be predicted by a model, and what remains is the error.
The i in the general regression model (outcome_i = model + error_i) indexes each observation
e.g., outcome 1 is equal to the model plus error 1, outcome 2 is equal to the model plus error 2, and so on…
For correlation, the outcome is modelled by
scaling (multiplying by a constant) another variable
Equation of correlation model: outcome_i = (b1 * X_i) + error_i
What does this equation of the correlation model mean, and what does b1 represent? - (2)
‘The outcome for an entity is predicted from their score on the predictor variable plus some error.’
The model is described by a parameter, b1, which in this context represents the relationship between the predictor variable (X) and the outcome.
If you have one continuous variable which meets the assumptions of parametric tests, then you can conduct a
Pearson correlation or regression
Variance is a feature of the outcome measurements we have obtained; in correlation/regression we want to predict it with a model that…
captures the effect of the predictor variables we have manipulated or measured
Variance of a single variable represents the
average amount that the data vary from the mean
Variance is the standard deviation
squared (s squared)
Variance formula - (2) (see the sketch below)
- x_i minus the average of all participants’ scores, squared, then divided by the total number of participants minus 1
- done for each participant and summed (sigma): s^2 = Σ(x_i − x̄)^2 / (n − 1)
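A minimal sketch in Python of what this formula does (the scores are made-up data for illustration; numpy's ddof=1 applies the same n − 1 denominator):

```python
import numpy as np

# Hypothetical scores for five participants (made-up data)
scores = np.array([3.0, 5.0, 4.0, 6.0, 7.0])

# Sample variance: sum of squared deviations from the mean, divided by n - 1
n = len(scores)
manual_var = np.sum((scores - scores.mean()) ** 2) / (n - 1)

# np.var with ddof=1 uses the same n - 1 denominator
assert np.isclose(manual_var, np.var(scores, ddof=1))
print(manual_var)  # 2.5
```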
Variance is SD squared, meaning that it captures the
average of the squared differences of the outcome values from the mean of all outcomes (this is what the variance formula does)
Covariance gathers information on whether
one variable covaries with another
In covariance, if we are interested in whether 2 variables are related, then we are interested in whether changes in one variable are met with changes in the other,
therefore… - (2)
when one variable deviates from its mean we
would expect the other variable to deviate from its mean in a similar way.
So, if one variable increases then the other, related variable should also increase (or, for a negative relationship, decrease) by a corresponding amount.
The simplest way to look at whether 2 variables are associated is to look at whether they.. which means.. - (2)
covary
How much scores of two variables deviate from their respective means
If one variable covaries with another variable then it means these 2 variables are
related
To get the SD from the variance you would
take the square root of the variance
What would you do in the covariance formula, in words? - (5) (see the sketch after this list)
- Calculate the error between the mean and each subject’s score for the first variable (x).
- Calculate the error between the mean and their score for the second variable (y).
- Multiply these error values.
- Add these products to get the sum of the cross-product deviations.
- The covariance is the average of these cross-product deviations (divide by N − 1).
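A minimal Python sketch of these five steps. The raw scores are chosen so they reproduce the covariance of 4.25 quoted later in these cards (they appear to be the adverts-watched/packets-bought example):

```python
import numpy as np

x = np.array([5.0, 4.0, 4.0, 6.0, 8.0])     # adverts watched
y = np.array([8.0, 9.0, 10.0, 13.0, 15.0])  # packets bought

# Steps 1-3: deviations from each mean, multiplied pairwise
cross_products = (x - x.mean()) * (y - y.mean())

# Steps 4-5: sum the cross-product deviations and divide by N - 1
cov_xy = cross_products.sum() / (len(x) - 1)

# np.cov uses the same N - 1 denominator by default
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
print(cov_xy)  # 4.25
```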
Example of calculating covariance: what does the answer tell you?
The answer is positive: that tells us the x and y values tend to rise together.
What does each element of the covariance formula stand for? - (5)
X = the value of the ‘x’ variable
Y = the value of the ‘y’ variable
X̄ (X with a bar over it) = mean of ‘x’ - e.g., green
Ȳ (Y with a bar over it) = mean of ‘y’ - e.g., blue
n = the number of items in the data set
covariance will be large and positive when values below
the mean for one variable tend to pair with values below the mean for the other (and values above the mean pair with values above the mean)
What does a positive covariance indicate?
as one variable deviates from the mean, the other
variable deviates in the same direction.
What does this diagram show? - (5)
- Green line is the average number of packets bought
- Blue line is the average number of adverts watched; vertical lines represent deviations/residuals between the observed values and the means, and circles represent the means
- There is a similar pattern of deviations for both variables: when a person’s score is below the mean for one variable, their score on the other variable is below its mean too
- The similarity we are seeing between the two variables is quantified by calculating the covariance: divide the sum of the cross-product deviations (the multiplied deviations of the 2 variables) by the number of observations minus 1
- We divide by n − 1 because we are unsure of the true population mean; this is related to degrees of freedom.
What does negative covariance indicate?
a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
What is the problem with covariance as a measure of the relationship between 2 variables? - (5)
- It is dependent upon the units/scales of measurement used
- So covariance is not a standardised measure
- e.g., if 2 variables are measured in miles and the covariance is 4.25, then if we convert the data to kilometres we have to calculate the covariance again, and we see it increases to 11
- This dependence on the scale of measurement is a problem because we cannot compare covariances in an objective way: we cannot say whether one covariance is large or small relative to another data set unless both data sets are measured in the same units
- So we need to STANDARDISE it.
What is the process of standardisation?
To overcome the problem of dependence on the measurement scale, we need to convert
the covariance into a standard set of units
How do you standardise the covariance?
By dividing it by the product of the standard deviations of both variables.
Formula for standardising covariance
Same as the formula for covariance, but divided by the product of the SD of x and the SD of y
Formula of Pearson’s correlation coefficient, r: r = cov(x, y) / (s_x × s_y)
Example of calculating Pearson’s correlation coefficient, r - (5) (see the sketch below)
- The standard deviation for the number of adverts watched (s_x) was 1.67
- The SD of the number of packets of crisps bought (s_y) was 2.92
- If we multiply these together we get 1.67 × 2.92 = 4.88
- Now, all we need to do is take the covariance, which we calculated a few pages ago as being 4.25, and divide by these multiplied standard deviations
- This gives us r = 4.25/4.88 = .87
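The same calculation in Python, as a check (same made-up data as the covariance sketch above; scipy's stats.pearsonr gives the identical answer directly):

```python
import numpy as np
from scipy import stats

x = np.array([5.0, 4.0, 4.0, 6.0, 8.0])     # adverts watched
y = np.array([8.0, 9.0, 10.0, 13.0, 15.0])  # packets of crisps bought

cov_xy = np.cov(x, y)[0, 1]                 # 4.25
sx, sy = x.std(ddof=1), y.std(ddof=1)       # ~1.67 and ~2.92

r_manual = cov_xy / (sx * sy)               # standardised covariance
r_scipy, p_value = stats.pearsonr(x, y)

assert np.isclose(r_manual, r_scipy)
print(round(r_manual, 2))  # 0.87
```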
The standardised version of covariance is the
correlation coefficient, or Pearson’s r
Pearson’s R is … version of covariance meaning independent of units of measurement
standardised
What does correlation describe? - (2)
- Describes a relationship between variables
- Tells us, if one variable increases, what happens to the other variable
Pearson’s correlation coefficient r was also called the
product-moment correlation
A linear relationship, normally distributed data, and interval/ratio continuous data are assumed in
Pearson’s r correlation coefficient
Pearson Correlation Coefficient varies between
-1 and +1 (direction of relationship)
The larger the Pearson correlation coefficient value (r), the closer the values will
be to each other and to the mean
Smaller Pearson correlation coefficient values (r) indicate
there is unexplained variance in the data, which results in the data points being more spread out.
What do these two graphs show? - (2)
- The graph on the left shows an example of a high negative correlation. The data points are close together and are close to the mean.
- On the other hand, the graph on the right shows a low positive correlation. The data points are more spread out and deviate more from the mean.
The Pearson correlation coefficient measures the strength of a relationship
between one variable and another, hence its use in calculating effect size
A Pearson’s correlation coefficient of +1 indicates
two variablesare perfectly positively correlated, so as one variable increases, the other increases by a
proportionate amount.
A Pearson’s correlation coefficient of -1 indicates
a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount.
Pearson’s r
+/- 0.1 means
small effect
Pearson’s r
+/- 0.3 means
medium effect
Pearson’s r
+/- 0.5 means
large effect
In Pearson’s correlation, we can test the hypothesis that - (2)
correlation coefficient is different from zero
(i.e., different from ‘no relationship’)
In Pearson’s correlation, we can test the hypothesis that the correlation is different from zero (i.e. different from ‘no relationship’).
If we find that our observed coefficient was very unlikely to happen if there was no effect in the population, then we gain confidence that the relationship
we have observed is statistically meaningful.
There are 2 ways to test this hypothesis
- Z scores
- T-statistic
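A hedged sketch of the t-statistic route (the z-score route uses Fisher's transformation instead). The formula t = r√(n − 2)/√(1 − r²) with df = n − 2 is the standard test of r against zero; the r and n values below are made up:

```python
import numpy as np
from scipy import stats

def r_to_t(r, n):
    """t-statistic for H0: population correlation = 0 (df = n - 2)."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed p-value
    return t, p

# Hypothetical: the same r = .30 with 20 vs. 200 participants
print(r_to_t(0.30, 20))   # t ~ 1.33, p ~ .20 (not significant)
print(r_to_t(0.30, 200))  # t ~ 4.43, p < .001 (significant)
```

This also illustrates the later point that, as sample size increases, smaller values of r become significant.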
SPSS, for Pearson’s correlation coefficient r, does not compute
confidence intervals for r
Confidence intervals tell us something about the
likely value of the correlation in the population
Can calculate confidence intervals for Pearson’s correlation coefficient by transforming r into the z metric (Fisher’s transformation), applying the usual CI formula, and transforming back
Example of calculating CIs for Pearson’s correlation coefficient, r
If we have z_r = 1.33 and SE(z_r) = 0.71 - (4)
- LB = 1.33 − (1.96 × 0.71) = −0.062
- UB = 1.33 + (1.96 × 0.71) = 2.72
- We have to convert the LB and UB values, which are in the z metric, back to the r correlation coefficient using the formula in the diagram
- This gives a UB of 0.991 and an LB of −0.062 (since the value is so close to 0, the transformation from z to r has almost no impact)
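The same example in Python. Converting back from the z metric to r is r = tanh(z) (the inverse of Fisher's transformation); the 1.96 multiplier is the usual 95% value:

```python
import numpy as np

# Values from the flashcard example
z_r, se = 1.33, 0.71

lb_z = z_r - 1.96 * se   # -0.062
ub_z = z_r + 1.96 * se   #  2.72

# Back-transform from the z metric to the r metric
lb_r, ub_r = np.tanh(lb_z), np.tanh(ub_z)
print(round(lb_r, 3), round(ub_r, 3))  # -0.062 0.991
```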
As sample size increases, the value of r at which a significant result occurs
decreases; e.g., an r that is not p < 0.05 with n = 20 can be p < 0.05 with 200 participants
Example of a negative relationship
The link between the age at which you die and the number of cigarettes you have smoked
Pearson’s r = 0 means - (2)
indicates no linear relationship at all
so if one variable changes, the other stays the same.
Correlation coefficients give no indication of the direction of… + example - (2)
- causality
- e.g., although we might conclude that the number of adverts watched increases the number of toffees bought, we can’t say that watching the adverts caused us to buy the toffees
We have to be cautious about causality in terms of Pearson’s correlation r because - (2)
- Third-variable problem: causality between variables cannot be assumed from any correlation
- Direction of causality: correlation coefficients tell us nothing about which variable causes the other to change.
If you get a weak correlation between 2 variables (a weak effect), you need to take a lot of measurements for that relationship to be
significant
The correlation coefficient r gives the ratio of the
covariance to a measure of the separate variances (the product of the two standard deviations)
Example of correlations getting stronger
R squared is known as the
coefficient of determination
R^2 can be used to explain the
proportion of the variance in a dependent variable (outcome) that is explained by an independent variable (predictor)
Example of R^2, the coefficient of determination - (2)
X = exam anxiety
Y = exam performance
If R^2 = 0.194,
19.4% of the variability in exam performance can be explained by exam anxiety:
‘the variance in y accounted for by x’
R^2 calculates the amount of shared
variance
Example of r and R^2
If r = 0.1, then R^2 = 0.1 × 0.1 = 0.01 (1% shared variance)
R^2 gives you the true strength of
the correlation, but without an indication of its direction.
What are the three types of correlations? - (3)
- Bivariate correlations
- Partial correlations
- Semi-partial (part) correlations
What’s a bivariate correlation?
A correlation between 2 variables
What is a partial correlation?
looks at the relationship between two variables while ‘controlling’ the effect of one or more additional variables.
The partial correlation partials out
the effect of one or more variables on both X and Y
A partial correlation controls for a third variable, which works as follows - (3)
- A correlation calculates each data point’s distance from the line (the residuals)
- This is the error relative to the model (unexplained variance)
- A third variable might predict some of that variation in the residuals
The partial correlation compares the unique variation of one variable with the
unique variation of the other (the third variable is partialled out of both)
The partial correlation holds the
third variable constant (but we don’t manipulate these)
Example of partial correlation- (2)
For example, when studying the effect of a diet, the level of exercise might also influence weight loss
We want to know the unique effect of diet, so we need to partial out the effect of exercise
Example of Venn Diagram of Partial Correlation - (2)
Partial Correlation between IV1 and DV = D / D+C
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Example of Partial Correlation - (2)
Partial correlation: Purple / Red + Purple
If we were doing just a partial correlation, we would see how much exam anxiety is influencing both exam performance and revision time.
Example of partial correlation and semi-partial correlation - (2)
- The partial correlation that we calculated took account not only of the effect of revision on exam performance, but also of the effect of revision on anxiety.
- If we were to calculate the semi-partial correlation for the same data, then this would control for only the effect of revision on exam performance (the effect of revision on exam anxiety is ignored).
In partial correlation, the third variable is typically not considered as the primary independent or dependent variable. Instead, it functions as a
control variable—a variable whose influence is statistically removed or controlled for when examining the relationship between the two primary variables (IV and DV).
The partial correlation is
the amount of variance the variable explains,
relative to the amount of variance in the outcome that is left to explain after the contribution of other predictors has been removed from both the predictor and the outcome.
Partial correlations can be done when variables are dichotomous (including the third variable), e.g., - (2)
- we could look at the relationship between bladder relaxation (did the person wet themselves or not?) and the number of large tarantulas crawling up a person’s leg, controlling for fear of spiders
- (the first variable is dichotomous, but the second variable and the ‘controlled for’ variable are continuous).
What does this partial correlation output show?
Revision time = the controlled-for variable
Exam performance = DV
Exam anxiety = X - (5)
- First, notice that the partial correlation between exam performance and exam anxiety is −.247, which is considerably less than the correlation when the effect of revision time is not controlled for (r = −.441)
- Although this correlation is still statistically significant (its p-value is still below .05), the relationship is diminished
- The value of R^2 for the partial correlation is .06, which means that exam anxiety can now account for only 6% of the variance in exam performance
- When the effects of revision time were not controlled for, exam anxiety shared 19.4% of the variation in exam scores, so the inclusion of revision time has severely diminished the amount of variation in exam scores shared by anxiety
- As such, a truer measure of the role of exam anxiety has been obtained
Partial correlations are most useful for looking at the unique
relationship between two variables when
other variables are ruled out
In a semi-partial correlation we control for the
effect that
the third variable has on only one of the variables in the correlation
The semi-partial (part) correlation partials out the - (2)
- effect of one or more variables on only one of X or Y (not both)
- e.g., the amount revision explains exam performance after the contribution of anxiety has been removed from one variable only (usually the predictor, e.g. revision)
The semi-partial correlation compares the
unique variation of one variable with the unfiltered variation of the other.
Diagram of venn diagram of semi-partial correlation - (2)
- Semi-Partial Correlation between IV1 and DV = D / D+C+F+G
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Diagram of exam anxiety, exam performance and revision time for semi-partial correlation - (2)
- purple / (red + purple + white + orange)
- When we use semi-partial correlation to look at this relationship, we partial out the variance accounted for by exam anxiety (the orange bit) and look for the variance explained by revision time (the purple bit).
Summary of partial correlation and semi-partial correlation - (2)
A partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on both variables in the original correlation.
A semi-partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on only one of the variables in the original correlation.
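A hedged, residual-based sketch of both ideas with simulated data (variable names echo the exam example, but the numbers here are random; a real analysis would use a library such as pingouin, this just shows the logic):

```python
import numpy as np
from scipy import stats

def residuals(v, covariate):
    """What is left of v after regressing it on the covariate."""
    slope, intercept = np.polyfit(covariate, v, 1)
    return v - (slope * covariate + intercept)

rng = np.random.default_rng(42)
revision = rng.normal(size=100)                                # third variable
anxiety = -revision + rng.normal(size=100)                     # predictor
performance = revision - 0.5 * anxiety + rng.normal(size=100)  # outcome

# Partial: the third variable is removed from BOTH variables
partial_r, _ = stats.pearsonr(residuals(anxiety, revision),
                              residuals(performance, revision))

# Semi-partial (part): the third variable is removed from one variable only
semipartial_r, _ = stats.pearsonr(residuals(anxiety, revision), performance)

print(round(partial_r, 3), round(semipartial_r, 3))
```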
Pearson’s product-moment correlation coefficient (described earlier) and Spearman’s rho (see section 6.5.3) are examples
of bivariate correlation coefficients.
Non-parametric tests of correlation are… - (2)
- Spearman’s rho
- Kendall’s tau
In Spearman’s rho, the variables are not normally distributed and the measures are on an
ordinal scale (e.g., grades)
If your data are non-normal and not measured at the interval level, then
deselect the Pearson tick box (in SPSS)
Spearman’s rho works by
first ranking the data (the numbers are converted into ranks), and then running Pearson’s r on the ranked data
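This rank-then-correlate logic can be checked directly in Python (made-up monotonic data; scipy's spearmanr and pearsonr-on-ranks agree):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x ** 2  # monotonic but not linear

# Spearman's rho is Pearson's r computed on the ranks of the data
rho, _ = stats.spearmanr(x, y)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))

assert np.isclose(rho, r_on_ranks)
print(rho)  # 1.0 here: the relationship is perfectly monotonic
```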
Spearman’s correlation coefficient, r_s, is a non-parametric statistic and so can be used when the
data have violated parametric assumptions, such as non-normally distributed data
The Spearman correlation coefficient is sometimes called
Spearman’s rho
For Spearman’s r_s we can also get R squared, but it is interpreted slightly differently: it is the
proportion of
variance in the ranks that the two variables share.
Kendall’s tau is used rather than Spearman’s coefficient when - (2)
- you have a small data set with a large number of tied ranks
- this means that if you rank all of the scores and many scores have the same rank, then Kendall’s tau should be used
Kendall’s tau test - (2)
For small datasets, many tied ranks
Better estimate of correlation in population than Spearman’s ρ
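A small made-up illustration of tau on data with many tied ranks (scipy's kendalltau handles ties; tau typically comes out smaller than rho on the same data):

```python
from scipy import stats

# Small data set with many tied ranks
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1, 2, 1, 3, 3, 4, 4, 4]

tau, p = stats.kendalltau(x, y)
rho, _ = stats.spearmanr(x, y)

print(round(tau, 3), round(rho, 3))  # tau is typically smaller in magnitude than rho
```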
Kendall’s tau is not numerically similar to r or r_s (Spearman’s), and so tau squared does not tell us about the
proportion of
variance shared by two variables (or by the ranks of those two variables).
Kendall’s tau is 66-75% smaller than both Spearman’s r_s and Pearson’s r, so
tau is not comparable to r and r_s
There is a benefit to using Kendall’s statistic rather than Spearman’s - (2)
- Kendall’s statistic is actually a better estimate of the correlation in the population
- we can draw more accurate generalisations from Kendall’s statistic than from Spearman’s.
What’s the decision tree for Spearman’s correlation? - (4)
- What type of measurement? Continuous
- How many predictor variables? One
- What type of continuous variable? Continuous
- Meets assumptions of parametric tests? No
The output of Kendall and Spearman can be interpreted the same way as
Pearson’s correlation coefficient r output box
The biserial and point-biserial correlation coefficients are used when
one of the two variables is dichotomous (e.g., a dichotomous variable is whether or not a woman is pregnant)
What is the difference between biserial and point-biserial correlations?
It depends on whether the dichotomous variable is a discrete or continuous dichotomy
The point–biserial correlation coefficient (rpb) is used when
one variable is a
discrete dichotomy (e.g. pregnancy),
The biserial correlation coefficient (r_b) is used
when - (2)
- one variable is a continuous dichotomy (e.g. passing or failing an exam)
- e.g., some people will only just fail while others will fail by a large margin; likewise some people will scrape a pass while others will clearly excel.
The biserial correlation coefficient cannot be calculated directly in SPSS - (2)
- you must calculate the point-biserial correlation coefficient
- and then use an equation to adjust that figure
Example of when the point-biserial correlation is used - (3)
- Imagine we are interested in the relationship between the gender of a cat and how much time it spends away from home
- Time spent away from home is measured at the interval level, so it meets the assumptions of parametric data
- Gender is a discrete dichotomous variable, coded 0 for male and 1 for female
What does this point-biserial correlation output from SPSS show? - (4)
- The point-biserial correlation coefficient is r = 0.378, with a p-value of 0.001
- The sign of the correlation coefficient depends on which category you assign to which code, so ignore the direction of the relationship
- R^2 = (0.378)^2 = 0.143
- We conclude that 14.3% of the variability in time spent away from home is explained by gender
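A hedged Python sketch of the same kind of analysis with simulated cat data (the 0 = male, 1 = female coding follows the flashcard; the numbers are random, not the SPSS example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

sex = rng.integers(0, 2, size=60)                        # 0 = male, 1 = female
time_away = 20 + 5 * sex + rng.normal(scale=6, size=60)  # hours away from home

# Point-biserial r: Pearson's r with a truly dichotomous variable
r_pb, p = stats.pointbiserialr(sex, time_away)

print(round(r_pb, 3), round(p, 3), round(r_pb ** 2, 3))  # r, p, R^2 (shared variance)
```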
Can convert the point-biserial correlation coefficient into the
biserial correlation coefficient
Point-biserial and biserial correlations differ in size:
the biserial correlation is bigger than the point-biserial
Example question for conducting Pearson’s r - (4)
- The researcher was interested in whether the amount someone gets paid and the number of holidays they take from work are related to their productivity at work
- Pay: annual salary
- Holiday: number of holiday days taken
- Productivity: productivity rating out of 10
Example of Pearson’s r scatterplot :
relationship between pay and productivity
If we have r = 0.313 what effect size is it?
medium effect size
±.1 = small effect
±.3 = medium effect
±.5 = large effect
What does this scatterplot show?
This indicates very little correlation between the 2 variables
What will a matrix scatterplot show?
the relationship between all possible combinations of your variables
What does this scatterplot matrix show? - (2)
- For pay and holiday, we can see the line is very flat, which indicates the correlation between the two variables is quite low
- For pay and productivity, the line is steeper, suggesting the correlation between these 2 variables is fairly substantial; the same goes for holidays and productivity
What is degrees of freedom for correlational analysis?
N-2
What does this Pearson’s correlation r output show? - (4)
- The correlation between pay and holidays is very low: r = −0.04
- Between pay and productivity, there is a medium-sized correlation of r = 0.313
- Between holidays and productivity there is a medium-to-large effect size of 0.435
- The relationships between pay and productivity and between holidays and productivity are significant, but the correlation between pay and holidays is not
Another example of a Pearson’s correlation r question - (3)
A student was interested in the relationship between the time spent preparing an essay, the interestingness of the essay topic and the essay mark received.
He got 45 of his friends and asked them to rate, using a scale from 1 to 7, how interesting they thought the essay topic was (1 - I’ll kill myself of boredom, 4 - it’s not too bad!, 7 - it’s the most interesting thing in the world!) (interesting).
He then timed how long they spent writing the essay (hours), and got their percentage score on the essay (essay).
Example of the interval/ratio continuous data needed for Pearson’s r for the IV and DV - (2)
- Interval scale: the difference between 10°C and 20°C is the same as the difference between 80°C and 90°C, but 0°C does not mean an absence of temperature
- Ratio scale: e.g., height, weight and time, where 0 means an absence of the quantity (0 cm means no height)
Pearson’s correlation r, Spearman and Kendall require
one IV and one DV
Spearman and Kendall are typically used on ordinal or ranked data - (3)
- values are ordered and ranked, but the distances between them are not uniform
- e.g., a Likert scale from strongly disagree to strongly agree
- education levels like elementary school, high school
- rankings like 1st place to 10th place
What does this SPSS output show?
A. There was a non-significant positive correlation between interestingness of topic and the amount of time spent writing. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a significant positive correlation between interestingness of topic and essay mark, with a medium effect size.
B. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a small effect size. There was a significant positive correlation between time spent writing an essay and essay mark, with a large effect size. There was a non-significant positive correlation between interestingness of topic and essay mark.
C. There was a significant negative correlation between interestingness of topic and the amount of time spent writing, with a medium effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark.
D. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a large effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark.
Answer: D
r = 0.21 effect size is..
in between small and medium effect
Effect size is only meaningful if you evaluate it with regard to
your own research area
Biserial correlation is when
one variable is dichotomous, but there is an underlying continuum (e.g. pass/fail on an exam)
Point-biserial correlation is when
one variable is dichotomous, and it is a true dichotomy (e.g. pregnancy)
Example of a dichotomous relationship - (3)
- Comparing heights of males and females is an example of a true dichotomous relationship.
- We can compare the differences in height between males and females.
- Use the dichotomous predictor of gender