4 Models of Correlations (CCA, Regression, Fisher) Flashcards
Pearson’s Correlation
p = Cov(x, y) / (σx · σy); p is always between −1 and 1.
1 means perfectly positively correlated, −1 means perfectly negatively correlated, and 0 means no linear correlation
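A minimal NumPy sketch (toy data assumed) checking the covariance formula against the built-in np.corrcoef:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 2.0 * x + rng.standard_normal(1000)     # linearly related, with noise

# p = Cov(x, y) / (std_x * std_y); ddof=1 keeps both estimators consistent
p = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
print(p, np.corrcoef(x, y)[0, 1])           # the two values agree
```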
Limit of Pearson’s Correlation
Can only detect linear correlations; a strong nonlinear dependence can still yield a Pearson correlation near zero (see the sketch below)
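A quick demonstration on assumed data: y = x² is fully determined by x, yet Pearson's correlation comes out near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = x ** 2                        # perfectly dependent on x, but not linearly
print(np.corrcoef(x, y)[0, 1])    # close to 0 despite total dependence
```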
Mutual Probability
The probability of two or more events happening at the same time
Is there any relation between Pearson’s correlation and mutual probability?
Yes, there is a direct relation:
min over β of E[(y − βx)²] = Var(y)·(1 − p²), where p is the Pearson correlation
Residual Prediction Error min E[(y − βx)²] meaning
Try to minimize the Residual Prediction Error (RPE). This error is the mismatch between y and βx, where we try to predict y from x using β.
x is the independent variable and y the dependent one. β is the parameter (or coefficient) that defines the linear relationship between x and y. A small RPE means β captures that relationship well.
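A numeric check of the identity above, on assumed toy data (both variables centered):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 0.7 * x + rng.standard_normal(100_000)

p = np.corrcoef(x, y)[0, 1]
beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # optimal least-squares beta
rpe = np.mean((y - beta * x) ** 2)              # residual prediction error
print(rpe, np.var(y) * (1 - p ** 2))            # the two sides agree
```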
Canonical Correlation Analysis (CCA)
CCA is a statistical method for understanding the relationship between two sets of variables: it finds projections of two multivariate datasets that maximize the correlation between them.
(Ex: correlation between text and images. Both are multivariate: composed of multiple visual features or words)
Solution of CCA
A·w = λ·B·w, where A = [0 Cxy; Cyx 0], B = [Cxx 0; 0 Cyy], w = [wx; wy] (a generalized eigenvalue problem)
The eigenvectors wx and wy are the canonical weights for the two sets of variables
The first (largest) eigenvalue represents the correlation denoted as p between the two sets in their new projected space (space defined by the canonical variables)
Cxx is the covariance matrix within set x (Cyy likewise for y); Cxy is the cross-covariance between the two sets.
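A minimal sketch of this solution, assuming toy two-view data: it builds A and B as above and solves the generalized eigenproblem with scipy.linalg.eigh:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N = 500
z = rng.standard_normal(N)                         # shared latent variable
X = np.column_stack([z, rng.standard_normal(N)]) + 0.3 * rng.standard_normal((N, 2))
Y = np.column_stack([-z, rng.standard_normal(N)]) + 0.3 * rng.standard_normal((N, 2))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy, Cxy = Xc.T @ Xc / N, Yc.T @ Yc / N, Xc.T @ Yc / N

dx, dy = Xc.shape[1], Yc.shape[1]
A = np.block([[np.zeros((dx, dx)), Cxy], [Cxy.T, np.zeros((dy, dy))]])
B = np.block([[Cxx, np.zeros((dx, dy))], [np.zeros((dy, dx)), Cyy]])

lam, W = eigh(A, B)                  # solves A w = lambda B w
wx, wy = W[:dx, -1], W[dx:, -1]      # canonical weights for the top eigenvalue
print(lam[-1])                                 # top eigenvalue = p
print(np.corrcoef(Xc @ wx, Yc @ wy)[0, 1])     # same value (up to sign)
```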
What to use to find Nonlinear Correlations
Kernel CCA: plain CCA only finds linear correlations between the two multivariate sets, but first mapping the data through a nonlinear feature map (the kernel trick) lets it capture nonlinear correlations as well.
Lagrangian Multipliers
Method to find the maximum or minimum (optimum) of a function while satisfying a constraint, by ensuring the gradients of the function and the constraint are proportional
(Ex: finding max profit but with a budget constraint)
L(θ, λ) = f(θ) + λ·g(θ); this is ‘the Lagrangian’, with the constraint written as g(θ) = 0
To find the optimum (min, max): set the gradient ∇L(θ, λ) = 0
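A small SymPy sketch on a hypothetical constrained problem (maximize x·y subject to x + y = 10), solving ∇L = 0:

```python
import sympy as sp

# Hypothetical example: maximize f(x, y) = x*y subject to x + y - 10 = 0
# (e.g., profit under a budget constraint).
x, y, lam = sp.symbols("x y lam")
f = x * y
g = x + y - 10
L = f + lam * g                       # the Lagrangian

# Optimum: set the gradient of L to zero in all variables.
grad = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(grad, [x, y, lam]))    # x = y = 5, lambda = -5
```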
In CCA which eigenvalue is chosen and how?
Eigenvalue with the largest value (max)
Eigenvalue = Pearson correlation of the projected variables:
p = max Corr(wx†x, wy†y) = max (wx†·Cxy·wy) / (√(wx†·Cxx·wx) · √(wy†·Cyy·wy))
Under the normalization wx†·Cxx·wx = 1 (and likewise for y),
max (wx†·Cxy·wy) = largest eigenvalue = p
Regularization
Fixes the instability that arises when eigenvalues of B (the within-set covariance matrix of each dataset) are near zero, by adding a small diagonal term ϵI (diagonal loading). Instability means we can't trust the learned directions [wx, wy].
B ← B + ϵI
B⁻¹·A·w = λ·w; here we'd replace B with B + ϵI to improve stability.
B: self correlation matrix (within each dataset)
A: cross-correlation matrix (between datasets)
w = [wx; wy]: the directions we want to learn for projecting the datasets (these are the eigenvectors)
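A self-contained sketch of the diagonal-loading fix on an assumed rank-deficient B:

```python
import numpy as np
from scipy.linalg import eigh

# A rank-deficient B breaks the generalized eigensolver;
# B + eps*I makes it positive definite again.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 2))
B = M @ M.T                          # rank 2: three zero eigenvalues
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                    # any symmetric A

eps = 1e-4
B_reg = B + eps * np.eye(5)          # B <- B + eps*I (diagonal loading)
lam, w = eigh(A, B_reg)              # stable now; eigh(A, B) would fail
```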
How to deal with CCA in high dimensions, where d > N (more features than instances)?
Instead of solving it as dxd (features x features), we work in NxN (instances x instances) space
The generalized eigenvalue problem is solved by writing each direction as a weighted combination of the (centered) data points:
wx = ∑ᵢ aᵢ(xᵢ − µx) = X·a
wy = ∑ᵢ bᵢ(yᵢ − µy) = Y·b
The optimal dual coefficients (a, b) give the same optimal directions (wx, wy), so solving in the N×N space models the same relationship between the datasets.
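A sketch of this dual formulation on assumed toy data with d > N, using linear Gram matrices; the small ridge term is an assumption added for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh

# With centered d x N data matrices X and Y, wx = X a and wy = Y b turn the
# problem into one over the N x N Gram matrices Kx = X^T X and Ky = Y^T Y.
rng = np.random.default_rng(0)
N, d = 50, 200                                   # more features than samples
z = rng.standard_normal((3, N))                  # shared latent signals
X = rng.standard_normal((d, 3)) @ z + 0.1 * rng.standard_normal((d, N))
Y = rng.standard_normal((d, 3)) @ z + 0.1 * rng.standard_normal((d, N))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

Kx, Ky = X.T @ X, Y.T @ Y
Z = np.zeros((N, N))
A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
B = np.block([[Kx @ Kx, Z], [Z, Ky @ Ky]]) + 1e-3 * np.eye(2 * N)  # ridge

lam, V = eigh(A, B)
a, b = V[:N, -1], V[N:, -1]          # dual coefficients for the top eigenvalue
wx, wy = X @ a, Y @ b                # recover the primal directions
print(lam[-1])                       # top canonical correlation
```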
Temporal CCA
Temporal CCA is a method that extends standard CCA by incorporating time-lagged versions of the datasets. It maximizes correlations not only between variables in the same time frame but also across different time steps, capturing temporal dependencies and dynamics in the relationships.
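A sketch of the lag-stacking step this implies, with a hypothetical add_lags helper (not from the source); the lagged matrices would then be fed into standard CCA:

```python
import numpy as np

def add_lags(X, lags):
    """Return a (T - lags) x (d * (lags + 1)) matrix of lagged copies of X."""
    T = len(X)
    return np.hstack([X[lag : T - lags + lag] for lag in range(lags + 1)])

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))            # T x d time series
print(add_lags(X, lags=2).shape)             # (98, 9): correlations can span time
```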
Least Squares Regression
Least Squares Regression is a method to find the best-fit line for predicting a target y from input x by minimizing the average squared difference between the actual and predicted values (the squared difference between f(x) and y).
f(x) = w†x + b (w is the weight vector, x the input, b the bias)
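A minimal sketch, assuming toy data, of fitting w and b with np.linalg.lstsq by appending a ones column:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                  # N x d inputs
w_true, b_true = np.array([1.0, -2.0, 0.5]), 3.0
y = X @ w_true + b_true + 0.1 * rng.standard_normal(200)

X1 = np.column_stack([X, np.ones(len(X))])         # [x, 1] handles the bias
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)      # minimizes sum of squares
w, b = coef[:-1], coef[-1]
print(w, b)                                        # close to w_true, b_true
```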
What is bias b in a linear model?
Bias b in a linear model is a constant term that lets the model’s predictions shift up or down independently of the input x