correlation analysis lecture Flashcards

1
Q

what is correlation

A

consideration of whether there is any relationship or association between two variables

2
Q

describe the correlation model

A
  • both Y and X are random variables;
    • sample observations are obtained by selecting a random sample of the units of association and taking on each a
      measurement of X and a measurement of Y
3
Q

define correlation analysis

A

a statistical tool used to study the closeness of the relationship between two or more variables.

4
Q

what is the correlation matrix

A

presents correlation coefficients among a group of variables

  • used by investigators to portray all possible bivariate combinations of a set of variables in order to determine patterns of interesting associations for further study
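A correlation matrix can be sketched in a few lines of Python. This is a minimal illustration, assuming plain lists of paired observations with no missing values; the variable names and numbers are hypothetical, and `pearson_r` / `correlation_matrix` are helper names made up for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """Return (names, matrix) holding r for every pair of variables in a dict of equal-length lists."""
    names = list(data)
    matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical data: three variables measured on the same five units
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 65, 77, 85],
    "age": [40, 25, 35, 30, 20],
}
names, m = correlation_matrix(data)
for name, row in zip(names, m):
    print(name.ljust(7), " ".join(f"{v:6.2f}" for v in row))
```

Reading across a row gives r between that variable and each of the others. The diagonal is always 1 and the matrix is symmetric, so in practice only one triangle needs to be scanned for interesting pairs.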
5
Q

what is the correlation coefficient

A

the index which defines the strength of association between two variables

can be used to predict the value of one of the variables using another if a relationship exists

to determine a relationship, random samples must be taken of both variables, measured on the same units; this paired data is known as bivariate data

6
Q

what is the basic rule for determining a relationship betw/ two variables

A
  • the two sets of data are presented as ordered pairs
  • dependent variable = y = the one whose value is being predicted
  • independent variable = x = the one used to make the prediction
    • ordered pairs are plotted on a graph and a relationship is inferred before calculations are done
7
Q

what is a scatter diagram

A

a diagram that shows the relationship between two variables by plotting the x,y pairs

independent values (x) are plotted on the x axis

dependent values (y) are plotted on the y axis

the pattern formed by the plotted (x, y) points shows the correlation on the graph

8
Q

what is the pearson correlation coefficient (ρ)

A

A population parameter that measures the degree of association betw/ 2 variables

  • natural parameter for bivariate normal data
  • requires interval or ratio measurements
  • used to assess the straight line association between X&Y
  • bivariate normal distribution is the joint probability distribution of X & Y, describing the density of the (x, y) pairs
  • this allows for both positive and negative dependence betw/ X&Y
9
Q

list the 5 correlation assumptions

A
  1. each value of X has a normally distributed subpopulation of Y values
  2. each value of Y has a normally distributed subpopulation of X values
  3. joint distribution of X&Y is a normal distribution called ‘bivariate normal distribution’
  4. subpopulations of Y values have the same variance
  5. subpopulations of X values have the same variance
10
Q

what is bivariate normal distribution

A

the joint normal distribution of X&Y

inferences can only be made when the joint x,y distribution is normal (bivariate normal)

no inferences can be made from non-normal distributions, although descriptive measures can still be used

11
Q

five parameters of BIVARIATE DISTRIBUTION

A

σx, σy: standard deviations of X and Y

µx, µy: means of X and Y

ρ: correlation coefficient; measures the strength of association betw/ X&Y

12
Q

what is the pearson coefficient

A

coefficient used to assess the straight line assoc betw/ x & y; requires interval or ratio values

symbol for the sample correlation coefficient is r

correlation varies from negative one to positive one (–1 ≤ r ≤ +1).

r = –1 is a perfect negative x,y relationship

r = +1 is a perfect positive x,y relationship

r = ±1 means all the points lie on a straight line

13
Q

what is pearson product moment correlation

A

numerical measure of the degree of association between two variables

  • provides a quantitative measure of the extent to which the two variables are associated
  • calculated from the bivariate data by a formula using the values of the data points
  • value of correlation coefficient calculated from a sample is denoted by the letter r
  • value of correlation coefficient calculated from a population is denoted by the Greek letter ρ
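The formula referred to above is usually written r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]. A minimal sketch of that calculation in Python, assuming the data arrive as two equal-length lists of paired values (the numbers are made up and `pearson_r` is a helper defined here):

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation r for paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate sample of 5 (x, y) pairs
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")  # prints r = 0.7746
```

On Python 3.10+ the standard library's `statistics.correlation(x, y)` should return the same value. Because this r comes from a sample, it is an estimate of the population ρ, not ρ itself.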
14
Q

pearson product moment correlation continued

A
  • correlation coefficients only show assoc, not causation
  • if r=1 it doesn’t mean ρ=1 (an assoc in the sample doesn’t guarantee an assoc in the pop)
  • however, a larger sample size (no. of pairs) makes r a more reliable estimate of ρ, so a high r from a large sample more strongly suggests a high correlation w/in the pop
15
Q

list the types of correlations

A
  • if r = +1, the two variables have perfect positive correlation. This means that on a scatter diagram, the points all lie on a straight line that has a positive slope
  • if r = –1, the two variables have perfect negative correlation. This means that on a scatter diagram, the points all lie on a straight line that has a negative slope
  • if r is betw/ 0 and 1, the two variables are positively correlated, but not perfectly so
  • if r is betw/ –1 and 0, the two variables are negatively correlated, but not perfectly so
  • if r is 0, the two variables have no overall upward or downward (linear) trend whatsoever
  • curvilinear relationship: a positive/negative relationship up to a certain point, after which the relationship reverses
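To see these cases numerically, the sketch below computes r for three small made-up datasets that share the same x values: a straight line with positive slope, one with negative slope, and a U-shaped (curvilinear) pattern. It is a minimal illustration assuming exact, noise-free data; `pearson_r` is a helper defined here, not a library function.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
cases = {
    "perfect positive (y = 2x + 1)": [2 * v + 1 for v in x],
    "perfect negative (y = -3x)": [-3 * v for v in x],
    "curvilinear (y = x^2)": [v ** 2 for v in x],
}
results = {name: pearson_r(x, y) for name, y in cases.items()}
for name, r in results.items():
    print(f"{name}: r = {r:+.2f}")
```

The last case matters most: y is completely determined by x, yet r = 0, because r only measures straight-line association. A curvilinear relationship can hide behind a near-zero r, which is why the scatter diagram is drawn before any calculation.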
16
Q

confidence interval for pearson’s correlation

aka

Fisher’s r-to-z transformation

A

Fisher developed a transformation of r that tends to become normal quickly as N increases.

used to conduct tests of r and calc CI

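$z = \tfrac{1}{2}\ln\left(\dfrac{1+r}{1-r}\right)$

CI limits in z units: $z \pm z_{\text{crit}} \times \dfrac{1}{\sqrt{N-3}}$, where $1/\sqrt{N-3}$ is the standard error of z

criterion z = 1.96 in case of 95% CI

the z limits are then converted back to r to give the upper and lower limits of the CI for ρ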
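A minimal sketch of the whole procedure in Python, assuming a two-sided 95% interval (criterion z = 1.96) and the standard error 1/√(N − 3) for z; `fisher_ci` is a hypothetical helper name and the r and N values are made up. Since 0.5 ln[(1 + r)/(1 − r)] is the inverse hyperbolic tangent of r, converting the limits back to r uses tanh.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for the population correlation rho via Fisher's r-to-z transformation."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher's z (equals math.atanh(r))
    se = 1 / math.sqrt(n - 3)               # standard error of z
    z_lo, z_hi = z - z_crit * se, z + z_crit * se
    # Convert the z limits back to the correlation scale
    return math.tanh(z_lo), math.tanh(z_hi)

# Hypothetical sample: r = 0.5 from N = 28 pairs
lo, hi = fisher_ci(0.5, 28)
print(f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

The resulting interval is not symmetric around r, because the back-transformation compresses values near ±1.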

17
Q

what method tests for the statistical significance of a correlation coefficient

A

based on a t-test that evaluates the H0 that ρ = 0 in the population

  • ε = error term/ noise term/ residual term = random unobserved component
    • during hypothesis tests it has a normal distro w/ a mean of 0 and unknown variance σ², independent of x
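The usual test statistic for H0: ρ = 0 is t = r√(n − 2)/√(1 − r²), compared against a t distribution with n − 2 degrees of freedom. The sketch below assumes a two-sided test at α = 0.05 and takes the critical value from a t table (2.228 for 10 df) rather than computing it, since the Python standard library has no t distribution; `correlation_t` is a hypothetical helper and the r and n values are made up.

```python
import math

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical sample: r = 0.6 from n = 12 pairs
t, df = correlation_t(0.6, 12)
t_crit = 2.228  # two-sided critical t for alpha = 0.05, df = 10 (from a t table)
reject_h0 = abs(t) > t_crit
print(f"t = {t:.3f} on {df} df; reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` returns r together with a two-sided p-value for the same null hypothesis, computed directly from the raw data.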
18
Q

what is the phi coefficient

A

it's a product-moment coefficient of correlation: a variation of Pearson’s definition of r for when the two states of each variable are given values of 0 and 1 respectively.

19
Q

purpose of the phi coefficient

A

designed for the comparison of truly dichotomous distributions (only 2 points on their scale for an unmeasurable attribute), i.e. nominal values

aka Yule’s phi (φ)

relates to 2x2 tables

often used in psychological and educational testing, due to the frequent practice of imposing a dichotomy on a continuous variable, e.g. PASS/FAIL categories based on a threshold score
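For a 2x2 table with cells a, b (top row) and c, d (bottom row), φ = (ad − bc)/√[(a+b)(c+d)(a+c)(b+d)]. The sketch below, using made-up PASS/FAIL counts, computes φ from that formula and then checks that φ is just Pearson's r applied to 0/1-coded data. The table layout and the helper names `phi_coefficient` and `pearson_r` are assumptions for this example.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] (rows: X = 1/0, columns: Y = 1/0)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical counts: X = passed test 1 (1/0), Y = passed test 2 (1/0)
a, b, c, d = 20, 5, 10, 15
phi = phi_coefficient(a, b, c, d)

# Expand the table into 0/1-coded pairs and apply Pearson's r
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d
print(f"phi = {phi:.4f}, Pearson r on 0/1 data = {pearson_r(x, y):.4f}")
```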

20
Q

what is YULE’S Q and what is it used for

A

a nominal measure of association used to determine the association betw/ variables

or

the ratio of the difference betw/ the products of the diagonal cell frequencies to the sum of those products, i.e. $Q = \dfrac{ad - bc}{ad + bc}$ for a 2x2 table with cells a, b (top row) and c, d (bottom row)

  • used to analyse the strength and direction of association between two dichotomous variables (e.g. gender, yes/no, T/F, agree/disagree)
  • developed for variables w/ only 2 values
  • uses a 2x2 table where each variable is a dichotomy
  • distribution free statistic
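A minimal sketch of Yule's Q for a 2x2 table laid out as [[a, b], [c, d]], using Q = (ad − bc)/(ad + bc); the counts are made up and `yules_q` is a hypothetical helper name. The second call shows a known property of Q: it reaches ±1 as soon as any single cell is zero.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return (ad - bc) / (ad + bc)

# Hypothetical counts, e.g. rows = male/female, columns = agree/disagree
q = yules_q(20, 5, 10, 15)
q_zero_cell = yules_q(10, 0, 5, 5)   # b = 0, so bc = 0 and Q = +1
print(f"Q = {q:.3f}; Q with an empty cell = {q_zero_cell:.3f}")
```

Like r, Q runs from –1 to +1, with the sign giving the direction of association, and only the four cell counts are needed, with no chi-squared step.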
21
Q

6 benefits of YULE’S Q

A
  1. no corrections required
  2. computed from 2x2 table w/o computing chi squared
  3. best meaningfully applied test for dichotomous data
  4. no stringent assumptions for its application
  5. quick and easy calc
  6. measures the proportional reduction in error assoc w/ predicting one variable from the other
22
Q

4 cons of YULE’S Q

A
  1. can only use 2x2 tables
  2. if data fits into larger tables yule’s q can’t be used unless the data collapses to a 2x2 table
  3. collapsing data into smaller categories causes info loss
  4. better to avoid collapsing data w/ YULE’S Q
23
Q

what is spearman’s rank order coefficient (ρsp)

A
  • alternative measure of the degree of assoc betw/ 2 variables
  • non parametric version of pearson product moment using sp to dx
  • measures the association betw/ ranks of observations
24
Q

conditions for spearman’s rho (1904) and kendall’s tau (1938)

A
  • x & y can have joint distributions other than the bivariate normal
  • the correlation betw/ x&y still has the property of positive/negative correlation
25
Q

what does spearman’s correlation coefficient determine

A

the strength and direction of the monotonic relationship between 2 variables, instead of the strength of the linear relationship (pearson)

monotonic relationship: either

  1. as one variable increases so does the other, or
  2. as the value of one variable increases the other decreases
26
Q

assumptions made in the spearman’s coefficient

A
  • requires ordinal (e.g. scale of agreement), interval or ratio (e.g. iq score) scale
  • a monotonic relationship exists betw/ the two variables
27
Q

spearman’s rho

A

each measurement is separately ranked for Xs and Ys in increasing order

then each Xi/Yi pair is replaced with their rank numbers

the formula is applied and the result is SPEARMAN’S RHO

  • rho is always betw/ -1 and 1
  • rho is pearson correlation applied to ranks
  • if y is a monotonically increasing function of x then the rank of each Yi matches the rank of Xi (rho = 1)
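The ranking steps above can be sketched in Python as follows. This assumes tied values receive the average of the ranks they occupy (a common convention the card does not specify); `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names and the data are made up. With no ties, the result equals the shortcut formula rho = 1 − 6Σd²/[n(n² − 1)], where d is the difference between each pair's ranks.

```python
import math

def average_ranks(values):
    """Rank values in increasing order (1 = smallest); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and of y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but non-linear: y = x^3
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")

# No ties, so the d-squared shortcut gives the same answer
u = [10, 20, 30, 40, 50]
v = [3, 1, 4, 5, 2]
n = len(u)
d2 = sum((ru - rv) ** 2 for ru, rv in zip(average_ranks(u), average_ranks(v)))
shortcut = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(f"rho = {spearman_rho(u, v):.3f}, shortcut = {shortcut:.3f}")
```

The first example shows why rho measures monotonic rather than linear association: y = x³ rises steadily with x, so rho = 1, while Pearson's r stays below 1.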
28
Q

extra info on spearman’s

A
  • can be used when two variables are not normally distributed
  • not very sensitive to outliers (values outside the usual pattern)
    • therefore valid results can still be obtained w/ outliers in the data