correlation analysis lecture Flashcards
what is correlation
consideration of whether there is any relationship or association between two variables
describe the correlation model
- both Y and X are random variables;
- sample observations are obtained by selecting a random sample of the units of association and taking on each a
measurement of X and a measurement of Y
define correlation analysis
a statistical tool used to study the closeness of the relationship between two or more variables.
what is the correlation matrix
presents correlation coefficients among a group of variables
- used by investigators to portray all possible bivariate combinations of a set of variables in order to identify patterns of interesting associations for further study
what is the correlation coefficient
the index which defines the strength of association between two variables
can be used to predict the value of one of the variables using another if a relationship exists
to determine whether a relationship exists, a random sample of paired observations on the two variables must be taken. this data is known as bivariate data
what is the basic rule for determining a relationship between two variables
- the two sets of data are presented as ordered pairs
- dependent variable = y = the one whose value is being predicted
- independent variable = x = the one used to make the prediction
- ordered pairs are plotted on a graph and a relationship is inferred before calculations are done
what is a scatter diagram
a diagram that shows the relationship between two variables by plotting the x,y pairs
independent values (x) are plotted on the x axis
dependent values (y) are plotted on the y axis
each x,y pair is plotted as one point, and the overall pattern of the points suggests the correlation
what is the pearson correlation coefficient (ρ)
A population parameter that measures the degree of association between 2 variables
- natural parameter for bivariate normal data
- requires interval or ratio measurements
- used to assess the straight-line association between X and Y
- the bivariate normal distribution is a joint probability distribution of X and Y, described by the density of the x,y pairs
- this allows for both positive and negative dependence between X and Y
list the 5 correlation assumptions
- each value of X has a normally distributed subpopulation of Y values
- each value of Y has a normally distributed subpopulation of X values
- joint distribution of X and Y is a normal distribution called the ‘bivariate normal distribution’
- subpopulations of Y values have the same variance
- subpopulations of X values have the same variance
what is the bivariate normal distribution
the joint normal distribution of X and Y
inferences can only be drawn when the joint distribution of x,y is normal (bivariate normal)
no inferences can be made from non-normal distributions, although descriptive measures can still be used
five parameters of the bivariate normal distribution
σx, σy: standard deviations of X and Y
µx, µy: means of X and Y
ρ: correlation coefficient = measures the strength of the straight-line relationship between X and Y
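To see how the five parameters pin down the distribution, here is a minimal sketch that draws x,y pairs from a bivariate normal using only the standard library. The function name `sample_bivariate_normal` and the parameter values are assumptions chosen for illustration; the construction (mixing two independent standard normal draws) is one standard way to get a chosen ρ, not the only one.

```python
import math
import random

def sample_bivariate_normal(mu_x, mu_y, sigma_x, sigma_y, rho, n, seed=0):
    """Draw n (x, y) pairs from a bivariate normal with the five given parameters."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)  # two independent standard normal draws
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # Mixing z1 and z2 gives Y standard deviation sigma_y and correlation rho with X.
        y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

# Hypothetical parameter values.
pairs = sample_bivariate_normal(
    mu_x=170.0, mu_y=70.0, sigma_x=10.0, sigma_y=8.0, rho=0.6, n=20000
)

# Check that the sample statistics land near the chosen parameters.
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)
sample_r = s_xy / math.sqrt(s_xx * s_yy)
print(round(mean_x, 1), round(mean_y, 1), round(sample_r, 2))
```

With a large n the sample means and the sample r should sit close to µx, µy and ρ, which is the sense in which r estimates ρ.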
what is the pearson coefficient
coefficient used to assess the straight-line association between x and y; requires interval or ratio values
symbol for the sample correlation coefficient is r
correlation varies from negative one to positive one (–1 ≤ r ≤ +1).
r = –1 is a perfect negative x,y relationship
r = +1 is a perfect positive x,y relationship
when r = +1 or r = –1 the points lie exactly on a straight line
what is pearson product moment correlation
numerical measure of the degree of association between two variables
- provides a quantitative measure of the extent to which the two variables are associated
- calculated from the bivariate data by a formula using the values of the data points
- value of correlation coefficient calculated from a sample is denoted by the letter r
- value of correlation coefficient calculated from a population is denoted by the Greek letter ρ
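The card does not give the formula itself. A commonly used form is r = Σ(x – x̄)(y – ȳ) / √[Σ(x – x̄)² · Σ(y – ȳ)²], and the sketch below follows it step by step. The function name `pearson_r` and the sample data (hours studied versus exam score) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up bivariate data: x = hours studied, y = exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

Because the deviations are divided by their own spread, r has no units and always falls between –1 and +1.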
pearson product moment correlation continued
- correlation coefficients only show association, not causation
- if r = 1 it doesn’t mean ρ = 1 (an association in the sample doesn’t mean an association in the population)
- however, with a large sample size (number of pairs) r is a more reliable estimate of ρ, so a high r is stronger evidence of a high correlation within the population
list the types of correlations
- If r = +1, the two variables have perfect positive correlation. This means that on a scatter diagram, the points all lie on a straight line that has a positive slope
- If r = –1, the two variables have perfect negative correlation. This means that on a scatter diagram, the points all lie on a straight line that has a negative slope
- If r is between 0 and 1, the two variables are positively correlated, but not perfectly so
- If r is between –1 and 0, the two variables are negatively correlated, but not perfectly so
- If r is 0, the two variables have no overall upward or downward trend whatsoever
- curvilinear relationship: positive/negative relationship up to a certain point, after which the relationship reverses