Chapter 1 Flashcards
If there are p variables, how many pairwise scatterplots can be produced? What implications does this have?(1)
p*(p-1)/2
This means a scatterplot matrix is not practical for large p.
Heat maps can be used as a replacement if the data are not categorical, with yellow and white indicating higher values and reds and oranges indicating lower values.
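To get a feel for how quickly the number of plots grows, here is a minimal Python sketch. The function name count_pairwise_plots is illustrative, not from the notes; the count is cross-checked against an explicit enumeration of pairs.

```python
from itertools import combinations
from math import comb


def count_pairwise_plots(p):
    """Number of distinct unordered pairs among p variables: p(p-1)/2."""
    return p * (p - 1) // 2


# Cross-check the formula against an explicit list of variable pairs.
for p in (2, 5, 10, 50):
    pairs = list(combinations(range(p), 2))
    assert len(pairs) == count_pairwise_plots(p) == comb(p, 2)

print(count_pairwise_plots(10))  # 45
print(count_pairwise_plots(50))  # 1225
```

With p = 50 there are already over a thousand panels, which is why a heat map is the more readable summary.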
What does covariance indicate?(1)
Positive covariance indicates that when one variable is larger (or smaller) than its mean, the other variable generally tends to be larger (or smaller) than its mean too. Conversely, negative covariance indicates that values larger than the mean in one variable tend to go with values smaller than the mean in the other (and vice versa).
Note that the divisor 1/(n-1) in the formula gives an unbiased estimate.
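A small numerical illustration of the sign of the covariance, assuming NumPy is available. The data values and the helper name sample_cov are made up for this sketch.

```python
import numpy as np

# Hypothetical data: y tends to rise with x, z tends to fall with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
z = np.array([9.0, 7.0, 6.0, 4.0, 1.0])


def sample_cov(a, b):
    """Sample covariance using the unbiased 1/(n-1) divisor."""
    n = len(a)
    return np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)


cov_xy = sample_cov(x, y)  # positive: deviations from the mean share a sign
cov_xz = sample_cov(x, z)  # negative: deviations have opposite signs

print(cov_xy, cov_xz)
```

np.cov uses the same 1/(n-1) divisor by default, so np.cov(x, y)[0, 1] should agree with sample_cov(x, y).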
What is the sample mean vector of the data matrix X?(1)
(1/n) X^T 1n, where 1n is the length-n vector of ones.
What does pre- or post-multiplying by a vector of ones do?(1)
Pre-multiplying X by 1n^T gives the column sums; post-multiplying X by 1p gives the row sums.
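The last two cards can be checked numerically. This is a sketch with a made-up 3 x 2 data matrix, taking rows as observations and columns as variables (an assumption about the layout of X).

```python
import numpy as np

# Hypothetical data matrix: n = 3 observations (rows), p = 2 variables (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])
n, p = X.shape
ones_n = np.ones(n)
ones_p = np.ones(p)

col_sums = ones_n @ X        # 1n^T X: one sum per variable
row_sums = X @ ones_p        # X 1p: one sum per observation
mean_vec = X.T @ ones_n / n  # (1/n) X^T 1n: the sample mean vector

print(col_sums)  # [ 9. 15.]
print(row_sums)  # [ 3.  7. 14.]
print(mean_vec)  # [3. 5.]
```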
What is the identity matrix?
The square matrix with ones on the diagonal and zeros everywhere else.
What is the centering matrix (Hn)? What does this do?(2)
Hn = In - (1/n) 1n 1n^T.
Pre-multiplying X by the centering matrix Hn subtracts the corresponding column (variable) sample mean from each element. Therefore the centred data matrix Hn X has a sample mean vector of 0.
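A minimal check of the centering effect, taking Hn = In - (1/n) 1n 1n^T and a made-up data matrix with rows as observations:

```python
import numpy as np

# Hypothetical data matrix: n = 3 observations, p = 2 variables.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])
n = X.shape[0]

# Centering matrix Hn = In - (1/n) 1n 1n^T; np.ones((n, n)) is 1n 1n^T.
H = np.eye(n) - np.ones((n, n)) / n

X_centred = H @ X  # same as subtracting each column mean
print(X_centred)
print(X_centred.mean(axis=0))  # approximately [0, 0]
```

In practice X - X.mean(axis=0) gives the same result without forming the n x n matrix; the explicit Hn is mainly useful for the algebra.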
What is the sample covariance matrix of the data matrix X?(1)
S = (1/(n-1)) X^T Hn X
NEED TO LEARN PROOF!!
Note that S is symmetric and positive semi-definite (learn proof).
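One way to see the positive semi-definite claim (a sketch, not necessarily the proof expected in the course): because Hn is symmetric and idempotent, X^T Hn X = (Hn X)^T (Hn X), so a^T S a = |Hn X a|^2 / (n-1) >= 0 for any vector a. The code below checks the formula and both properties numerically on randomly generated, purely illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # hypothetical data: n = 20, p = 3
n = X.shape[0]

H = np.eye(n) - np.ones((n, n)) / n  # centering matrix Hn
S = X.T @ H @ X / (n - 1)            # S = (1/(n-1)) X^T Hn X

# Compare with NumPy's estimator (rowvar=False: columns are variables).
matches_numpy = np.allclose(S, np.cov(X, rowvar=False))

is_symmetric = np.allclose(S, S.T)
eigenvalues = np.linalg.eigvalsh(S)        # real eigenvalues of a symmetric matrix
is_psd = bool(eigenvalues.min() >= -1e-12)  # allow for rounding error

print(matches_numpy, is_symmetric, is_psd)
```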
What are 2 properties of the centering matrix?(2)
LEARN PROOFS
Symmetric Hn^T=Hn
and idempotent: Hn^2=Hn. (Idempotence means that multiple applications of a particular operation do not change the result. In other words, if we try to center the centering matrix we are left with the centering matrix.)
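Both properties are easy to confirm numerically before learning the proofs. This sketch assumes Hn = In - (1/n) 1n 1n^T; the helper name centering_matrix is illustrative.

```python
import numpy as np


def centering_matrix(n):
    """Hn = In - (1/n) 1n 1n^T."""
    return np.eye(n) - np.ones((n, n)) / n


H = centering_matrix(4)

# Symmetry: In and 1n 1n^T are both symmetric, so Hn is too.
is_symmetric = np.allclose(H.T, H)

# Idempotence: expanding Hn^2 and using 1n^T 1n = n gives Hn again.
is_idempotent = np.allclose(H @ H, H)

print(is_symmetric, is_idempotent)  # True True
```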
Calculate rij for the correlation matrix.(1)
rij = sij/(si sj)
where si = √sii is the sample standard deviation of the ith variable.
What does R = Ip mean?(1)
The variables are uncorrelated: every off-diagonal correlation is 0, so each variable correlates only with itself.
How would you calculate the sample correlation matrix R from sample covariance matrix S?(1)
D^-1SD^-1.
where D is the diagonal matrix with the standard deviations of the variables on its diagonal.
Thus R is positive semi-definite, because S is also positive semi-definite.
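The relation R = D^-1 S D^-1 can be sketched in a few lines. The data are randomly generated for illustration only, and np.corrcoef is used purely as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))  # hypothetical data: n = 30, p = 3

S = np.cov(X, rowvar=False)  # sample covariance matrix (1/(n-1) divisor)

# D holds the standard deviations si = sqrt(sii) on its diagonal,
# so D^-1 holds their reciprocals.
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv  # entrywise this is rij = sij / (si sj)

print(np.round(R, 3))
```

The diagonal of R is all ones, and the off-diagonal entries are the pairwise sample correlations.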
What are 2 single measures of multivariate scatter?(2)
Generalised variance = det(S)
Total variation = tr(S) (the sum of the diagonal elements of S)
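A quick numerical example of the two measures, using a made-up 2 x 2 covariance matrix:

```python
import numpy as np

# Hypothetical sample covariance matrix (symmetric, positive definite).
S = np.array([[4.0, 1.0],
              [1.0, 2.0]])

generalised_variance = np.linalg.det(S)  # 4*2 - 1*1 = 7
total_variation = np.trace(S)            # 4 + 2 = 6

print(generalised_variance, total_variation)
```

Note that the total variation ignores the off-diagonal covariances entirely, whereas the generalised variance shrinks as the variables become more strongly correlated.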
What is a linear functional?(1)
A linear transformation with q=1: f(x) = a^T x, where a is a vector of length p, so f: R^p -> R.
What is an affine transformation?(1)
A linear transformation combined with a shift in location:
f: R^p -> R^q, f(x) = Ax + b, for a q x p matrix A and a vector b of length q.
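A short sketch of both kinds of transformation, with made-up values for A, b, a and x (here p = 3 and q = 2):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # a single observation, p = 3

# Affine transformation f(x) = Ax + b with A of size q x p and b of length q.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
b = np.array([10.0, -5.0])
y = A @ x + b
print(y)  # [17. -6.]

# Linear functional: the special case q = 1 with no shift, f(x) = a^T x.
a = np.array([0.5, 0.5, 0.0])
value = a @ x
print(value)  # 1.5
```

For a whole data matrix X with observations in rows, the same map applied to every row is X @ A.T + b.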
What is a 2d projection?(1)
A linear transformation of x, y = Ax, where A = (ej, ek)^T is the 2 x p matrix with rows ej^T and ek^T, and ej is the length-p vector with 1 in element j and 0 everywhere else.
- This selects a pair of variables, as for one panel of a scatterplot matrix.
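The projection matrix can be built explicitly to see that it simply picks out two coordinates. The helper name projection_matrix is illustrative, and the indices are 0-based here, following Python rather than the 1-based indexing of the notes.

```python
import numpy as np


def projection_matrix(j, k, p):
    """2 x p matrix with rows ej^T and ek^T (j and k are 0-based)."""
    A = np.zeros((2, p))
    A[0, j] = 1.0
    A[1, k] = 1.0
    return A


x = np.array([10.0, 20.0, 30.0, 40.0])  # hypothetical observation, p = 4
A = projection_matrix(0, 2, p=4)
y = A @ x
print(y)  # [10. 30.]
```

Applied to a data matrix X with observations in rows, X @ A.T returns the two selected columns, which is exactly what one panel of a scatterplot matrix displays.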