W10 - MULTIVARIATE DATA Flashcards
What kinds of variables could we use in a multivariate data analysis?
Dependent or independent variables
What defines a multivariate dataset?
p > 1 (more than one variable is measured on each object)
What is an 11 x 5 dimensional dataset? Give an example
A dataset with 11 objects and 5 variables
E.g. 11 species of plants and 5 leaf morphology variables
What do the matrix values tell us?
They define the positions of n points in p-dimensional space
(n = objects, p = variables)
What three parameters do the sum-of-squares and cross-products matrices describe?
Means (centroid)
Variance
Covariance
Define data variation and how it is quantified
Data variation describes how much the observed values of a variable differ among objects, since objects do not all have the same value
It is quantified by the sum of squares
Define data association and how it is quantified
Variables collected for the same object share patterns of deviation from the mean
It is quantified by the sum of cross-products or the covariance
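A minimal sketch of how the sum of squares and the sum of cross-products can be computed from deviations about the mean. The values of `x` and `y` are made up for illustration, and dividing by n - 1 assumes the usual sample variance and covariance.

```python
import numpy as np

# Hypothetical data: 5 objects measured on 2 variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations of each object from the variable means (the centroid).
dx = x - x.mean()
dy = y - y.mean()

# Variation: sum of squared deviations for each variable.
sum_sq_x = np.sum(dx * dx)
sum_sq_y = np.sum(dy * dy)

# Association: sum of cross-products of the paired deviations.
sum_cross = np.sum(dx * dy)

# Dividing by n - 1 turns these sums into the sample variance and covariance.
n = len(x)
var_x = sum_sq_x / (n - 1)
cov_xy = sum_cross / (n - 1)

print(sum_sq_x, sum_sq_y, sum_cross, var_x, cov_xy)
```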
How does a covariance plot differ from a regression plot?
It is not a rise-over-run plot: we are interested in the shared variation between variables.
Either variable can be plotted on the horizontal axis
Name the elements of the following covariance matrix
(x1 x2)
(x3 x4)
(var1    cov1,2)
(cov2,1  var2)
OR
(sum sqr    sum cross)
(sum cross  sum sqr)
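The two matrix layouts above can be sketched in NumPy as follows. The data matrix `X` is hypothetical (rows are objects, columns are variables), and the n - 1 divisor assumes sample rather than population estimates.

```python
import numpy as np

# Hypothetical data matrix: n = 5 objects (rows), p = 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0],
              [4.0, 4.0],
              [5.0, 5.0]])
n = X.shape[0]

# Centre each column on its mean so the centroid sits at the origin.
centered = X - X.mean(axis=0)

# Sums of squares (diagonal) and sums of cross-products (off-diagonal).
sscp = centered.T @ centered

# Variances (diagonal) and covariances (off-diagonal).
cov_matrix = sscp / (n - 1)

print(sscp)
print(cov_matrix)
```

With its default normalisation, `np.cov(X, rowvar=False)` should return the same variance-covariance matrix.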
What do the three parameters in the sum of squares and cross-products matrix tell us when a data cloud is plotted?
The mean tells us the position
The variance tells us the size
The covariance tells us the shape
What is the trace?
The sum of the variances in a variance-covariance matrix.
It is the sum of diagonal elements
Why is trace useful?
Helps us to understand the distribution of variance across multiple dimensions when dealing with high dimensional datasets.
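As a rough illustration, the trace can be read off a variance-covariance matrix with `np.trace`. The matrix values below are made up, and expressing each variance as a share of the trace is one possible way to look at how variance is spread across dimensions, not necessarily the only one intended here.

```python
import numpy as np

# Hypothetical variance-covariance matrix for 2 variables.
cov_matrix = np.array([[2.5, 1.5],
                       [1.5, 1.5]])

# Trace = sum of the diagonal elements = total variance.
total_variance = np.trace(cov_matrix)

# Share of the total variance carried by each variable.
variance_share = np.diag(cov_matrix) / total_variance

print(total_variance, variance_share)
```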
What is the numerical range of correlation coefficients?
Ranges from -1 to +1, with 0 being no correlation
What is the coefficient of determination?
Correlation squared = proportion of variation shared between variables
Interpreted the same way as R-squared in regression analysis
For example, a value of 0.83 means 83 percent of the variation in one variable is explained by the other variable
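A small sketch of the correlation coefficient and the coefficient of determination, assuming Pearson's correlation and using made-up values for `x` and `y`.

```python
import numpy as np

# Hypothetical data: 5 objects measured on 2 variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Covariance scaled by both standard deviations gives the correlation.
cov_xy = np.cov(x, y)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Coefficient of determination: proportion of shared variation.
r_squared = r ** 2

print(round(r, 3), round(r_squared, 3))
```

Here `r_squared` comes out at 0.6, so about 60 percent of the variation is shared between the two variables. `np.corrcoef(x, y)[0, 1]` should give the same value of `r`.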