Classification: Linear Models II Flashcards
What can the Fisher discriminant be used for?
Dimensionality reduction: automatically find low-dimensional projections that preserve most of the information/structure in the data
Different projections preserve different amounts of the data's structure
What are interesting comparisons between the Fisher discriminant, least squares and the perceptron?
The most interesting things are shown in this figure
What are the basic principles of the Fisher criterion?
Principle: find a 1-dimensional projection w such that the clusters defined by the class labels are well separated (projected class means far apart) and cohesive (small variance within each projected class)
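A minimal sketch of this principle for two classes, assuming NumPy and synthetic data. The function name `fisher_direction` and the toy clusters are illustrative, not from the original notes. It uses the standard closed form in which w is proportional to the inverse within-class scatter matrix times the difference of the class means.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return a unit vector w that maximizes the two-class Fisher criterion."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of both classes.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # w is proportional to S_w^{-1} (m2 - m1); solve instead of inverting.
    w = np.linalg.solve(S_w, m2 - m1)
    return w / np.linalg.norm(w)

# Hypothetical toy data: two elongated, correlated clusters.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X1, X2)
z1, z2 = X1 @ w, X2 @ w  # 1-dimensional projections of each class
```

Projecting onto w rather than onto the plain difference of the means takes the shape of the clusters into account, which is what keeps the projected classes cohesive.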
Explain the Gaussian mixture
Quick note: what is a Gaussian?
https://gyazo.com/c20b13f2eaaeeeef7ba81aaae246e10a
- a Gaussian is a normal distribution,
- with a mean (mu),
- and a variance (a covariance matrix in more than one dimension), which determines the shape of the Gaussian.
A Gaussian mixture is then a collection of Gaussians, each with a mixing coefficient (lambda) that gives the weight (relative size) of that Gaussian in the mixture
https://gyazo.com/08d8525a3a9a663ce4ddeea0689ec73f
We can now look at the data, place clusters where we think they are, and draw boundaries between them to understand the data.
Choose a reasonable number of clusters, and use the EM algorithm for Gaussian mixtures to find the best fit for these clusters
Example:
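One way to make the EM fitting step concrete is the sketch below: a minimal 1-dimensional version, assuming NumPy. The function name `em_gmm_1d`, the quantile-based initialization, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    # Assumed initialization: means spread over quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    lam = np.full(k, 1.0 / k)  # mixing coefficients (lambda)
    for _ in range(n_iter):
        # E-step: responsibility of each Gaussian for each point.
        dens = lam * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        lam = nk / len(x)
    return lam, mu, var

# Hypothetical data: 30% of points around -2, 70% around 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
lam, mu, var = em_gmm_1d(x, k=2)
```

The number of clusters k is chosen by hand here, matching the idea of placing a reasonable number of clusters before letting EM refine them.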
What can we say if the covariance matrices of all clusters are the same in a Gaussian mixture model?
Then the decision boundaries are linear!
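A short derivation of why this holds, sketched for two clusters with means mu_1 and mu_2, a shared covariance matrix Sigma, and mixing coefficients lambda_1 and lambda_2 (the symbol a_k for the log of the weighted density is introduced here only for illustration):

```latex
\begin{aligned}
a_k(\mathbf{x}) &= \ln\bigl(\lambda_k\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma})\bigr) \\
&= -\tfrac{1}{2}\,\mathbf{x}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{x}
   + \boldsymbol{\mu}_k^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{x}
   - \tfrac{1}{2}\,\boldsymbol{\mu}_k^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k
   + \ln\lambda_k + \text{const} \\[4pt]
a_1(\mathbf{x}) - a_2(\mathbf{x}) &= (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{x}
   - \tfrac{1}{2}\,\boldsymbol{\mu}_1^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1
   + \tfrac{1}{2}\,\boldsymbol{\mu}_2^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2
   + \ln\frac{\lambda_1}{\lambda_2}
\end{aligned}
```

The quadratic term in x is identical for both clusters and cancels in the difference, so the boundary a_1(x) = a_2(x) has the form w^T x + w_0 = 0, which is a hyperplane. With different covariance matrices the quadratic terms no longer cancel and the boundary is in general quadratic.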
What are the basics and use cases of logistic regression?
https://gyazo.com/59b5e9b6a703f6c9fbfc2a9c2cf1efd7
One or more independent input variables, which can be discrete or continuous.
Always one binary output variable
The goal is to estimate the probability p of the positive outcome from a linear combination of the independent variables. To turn that linear combination into a probability, we use the sigmoid function, whose output ranges from 0 to 1
The regression coefficients are calculated using maximum likelihood estimation
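A minimal sketch of these ideas, assuming NumPy and synthetic data. Gradient ascent on the log-likelihood is used here as one simple way to carry out the maximum likelihood estimation; the function names, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Estimate coefficients by gradient ascent on the mean log-likelihood."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)                 # current estimate of p
        beta += lr * Xb.T @ (y - p) / len(y)   # gradient of the mean log-likelihood
    return beta

# Hypothetical data: one input variable, binary output drawn from a known model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
true_p = sigmoid(-0.5 + 2.0 * X[:, 0])
y = (rng.random(500) < true_p).astype(float)

beta = fit_logistic(X, y)
p_hat = sigmoid(beta[0] + beta[1] * X[:, 0])   # estimated probabilities
y_hat = (p_hat > 0.5).astype(float)            # binary prediction
```

Here `beta[0]` is the intercept and `beta[1]` the coefficient of the single input variable; thresholding `p_hat` at 0.5 gives the binary output.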
What are the properties of logistic regression?
See figure
Explain how we can use data transformations and how they are useful
By using data transformations we can change the data from not linearly separable to linearly separable;
this does, however, often come at a computational cost
An example is shown in the figure, where we map (x, y) to (|x|, y)
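The mapping can be sketched with a few hand-picked points; the data, the labels, and the threshold of 1.5 are hypothetical. Class 1 lies on both the far left and the far right of class 0, so no single line separates the classes in the original coordinates, but after the mapping a single threshold on |x| does.

```python
def transform(point):
    """Map (x, y) to (|x|, y)."""
    x, y = point
    return (abs(x), y)

# Hypothetical data: class 1 on both outer sides, class 0 in the middle.
points = [(-3.0, 1.0), (-2.0, -1.0), (2.5, 0.5), (3.0, -2.0),
          (-0.5, 1.0), (0.0, -1.0), (0.5, 2.0), (0.8, 0.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

def predict(point, threshold=1.5):
    """Linear rule in the transformed space: class 1 if |x| > threshold."""
    return 1 if transform(point)[0] > threshold else 0

predictions = [predict(p) for p in points]
```

The decision rule is a vertical line in the transformed space, which corresponds to two vertical lines in the original space.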