Final Flashcards
Quiz: PCA vs ICA

- Mutually Orthogonal
- PCA
- Mutually Independent
- ICA
- uncorrelated ends up being independent for certain distributions (so PCA could find independent features, but not as a fact)
- Maximal Variance
- PCA
- NO on ICA
- independent variables would be combined for maximal variance (central limit theorem)
- Maximal Mutual Information
- ICA
- Maximize joint mutual info between original and transformed features
- ICA
- Ordered Features
- PCA
- eigenvalues ordered by variance
- Not for ICA
- kurtosis can be used to order but generally order is not considered
- PCA
- Bag of Features
- ICA
Definition of Utility (U)
U(s) = R(s) + gamma * maxa sums’ [T(S,a,s’) U(s’)]
Long term value of being in a state
What to do when we are facing Tit For Tat?

- Always defect
- best for low gamma
- always cooperate
- best for high gamma

Way to implement feature selection for wrapping
- Hill Climbing
- Randomized Optimization
- Forward/Backward Selection
How do ICA and PCA differ?
- ICA
- solves blind source separation problem
- directional
- images
- finds noses, mouth, hair
- local “parts of”
- natural scenes
- finds edges
- independent components of the world are edges
- documents
- gives topics
- can understand fundamental features of data
- about probability, information theory
- more expensive, doesn’t always produce answer
- PCA
- not directional
- images
- finds direction of maximal variance (brightness)
- finds the average face next
- global
- natural scenes
- brightness
- average scene
- about linear algebra
What is KL Divergence
Measures the difference between any two distributions (distance measures)
Impossibility Theorem
- No clustering scheme can achieve all three of:
- richness
- scale invariance
- consistency
- Mutually contradictory properties
- Can re-define the properties and get *nearly* all 3
Relevance
Feature xi
- Xi is strongly relevant if removing it degrades BOC (Bayes Optimal Classifier, best on avg, weighted avg of hypotheses)
- Xi is weakly relevant if
- not strongly relevant
- there exists subset of features that adding Xi improves BOC
- Otherwise Xi is irrelevant
Measures effect on BOC
Relevance ~ Information
Minimax
- 2 agents A, B
- A considers worst case counter while maximizing own value
- B considers worst case counter while minimizing A value
- Finding the maximum minimum or minimum maximum
prisoners dilemma
The prisoner’s dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so
Issues with SLC
- Using the distances (closest) only can result in weird cluster shapes (stringy clusters because they are in the same cluster if they are close anywhere)

Repeated Games and the Folk Theorem
- In repeated games, the possibility of retaliation opens the door for cooperation
- Folk Theorem Definition (math):
- Results known, at least to experts in the field, and considered to have established status, but not published in complete form
- Everybody knows, no one gets credit, in the cloud of understanding
- Folk Theorem Definition (game theory):
- Describes the set of payoffs that can result from Nash strategies in repeated games
- Any feasible payoff profile that strictly dominates the minmax/security profile can be realized as a Nash equilibrium payoff profile, with sufficiently large discount factor.
KMeans as Optimization - Definition
- Configurations
- Center, P
- Scores
- Err(P, center) = sum_x ( euclidean distance(Center_P(x) - x) )
- Neighborhood
- P, center = set of pairs
- (P’, center) U (P, center’ )
- change one or the other
- (P’, center) U (P, center’ )
- P, center = set of pairs
- A lot like Hill Climbing
Desirable Clustering Properties
- Richness
- For any assignment of objects to clusters, there is some distance matrix such that partition returns that clustering
- all inputs (distance) are valid and any clustering can be valid
- Not limited in clusters that can be expressed
- Scale-invariance
- change of units does not change clustering
- inches instead of feet
- change of units does not change clustering
- Consistency
- shrinking intra-cluster distances and expanding inter-cluster distances does not change clustering
- make similar more similar and less similar even less similar, it should not change the groupings
Entropy
Tells us if a feature provides any information
Min # of yes/no questions needed to convey information
Discounted Rewards
- Sum from t = 0 to inf : gamma<em>t</em> R(St)
- Geometric series
- Rmax / (1 - gamma)
- gamma = 0
- Rmax
- gamma = 1
- prior case (no discount)
- gamma = 0
- infinite distance in finite time
- add infinite numbers -> get finite number
Markov Decision Process Framework
- States
- S
- coordinates, descriptors, represent where we are
- Model
- T(s,a,s’) ~ Pr(s’ | s, a)
- Action
- A(s), A
- Reward
- R(s), R(S,a), R(s,a,s’)
- Policy
- pi(s) -> a
- optimal policy, pi*
Sequences of Rewards: Assumptions
- Infinite Horizons
- you can live forever
- if finite
- may take risks if horizon is short
- policy can change even if state is unchanged
- lose stationarity in model
- Utility of Sequences
- 2 sequences with same start, have same preference with start state missing
- stationary preferences
- if I prefer one set of sequences today, I will tomorrow
If X and y are independent, what are their joint and conditional entropy?
Joint Entropy:
H(X, y) = H(X) + H(y)
Conditional Entropy:
H(y | X ) = H(y)
What is the utility of a sequence of states?
- Sum of the reward over the states
- Utility of infinite sequences equal infinity
- two sequences, one with larger rewards randomly, both are infinite, neither are better
- it doesn’t matter what you do if always positive rewards
Implausible Threats
- Idea that a grim trigger threat is implausible.
- Why would opposing agent threaten when the result is less reward for themselves too?
- Potential issue with Grim Trigger
*
How can achievable averages payoffs be found in Repeated Games and the Folk Theorem
- Plot strategy pairs onto a 2 player plot
- Form a convex hull between strategy pairs
- “feasible region”
- Points within the convex hull are valid average payoffs for join strategies

3 fundamental theorems of nash equilibria
- In the n-player pure strategy game, if elemination of strictly dominated strategies eliminates all but one combination, that combination is the unique N.E.
- ex: prisoners dilemma
- any N.E will survive elimination of strictly dominated strategies
- if n is finite and all states are finite, there exists a (mixed) N.E.
tf-idf
- term frequency - inverse document frequency
- nearest-neighbor in textual data







