Final Flashcards

1
Q

Quiz: PCA vs ICA

A
  • Mutually Orthogonal
    • PCA
  • Mutually Independent
    • ICA
    • for certain distributions (e.g. Gaussian), uncorrelated implies independent, so PCA may happen to find independent features, but this is not guaranteed
  • Maximal Variance
    • PCA
    • NO on ICA
      • maximizing variance would combine the independent variables, pushing them toward Gaussian (central limit theorem), which defeats independence
  • Maximal Mutual Information
    • ICA
      • Maximize joint mutual info between original and transformed features
  • Ordered Features
    • PCA
      • eigenvalues ordered by variance
    • Not for ICA
      • kurtosis can be used to order but generally order is not considered
  • Bag of Features
    • ICA
2
Q

Definition of Utility (U)

A

U(s) = R(s) + gamma * max_a sum_s' [ T(s,a,s') U(s') ]

Long term value of being in a state

3
Q

What to do when we are facing Tit For Tat?

A
  • Always defect
    • best for low gamma
  • always cooperate
    • best for high gamma
4
Q

Way to implement feature selection for wrapping

A
  • Hill Climbing
  • Randomized Optimization
  • Forward/Backward Selection
5
Q

How do ICA and PCA differ?

A
  • ICA
    • solves blind source separation problem
    • directional
    • images
      • finds noses, mouth, hair
      • local “parts of”
    • natural scenes
      • finds edges
      • independent components of the world are edges
    • documents
      • gives topics
    • can understand fundamental features of data
    • about probability, information theory
    • more expensive, doesn’t always produce answer
  • PCA
    • not directional
    • images
      • finds direction of maximal variance (brightness)
      • finds the average face next
      • global
    • natural scenes
      • brightness
      • average scene
    • about linear algebra
6
Q

What is KL Divergence

A

Measures the difference between two probability distributions (a distance-like measure, though not symmetric, so not a true metric)

7
Q

Impossibility Theorem

A
  • No clustering scheme can achieve all three of:
    • richness
    • scale invariance
    • consistency
  • Mutually contradictory properties
  • Can re-define the properties and get *nearly* all 3
8
Q

Relevance

A

Feature xi

  • Xi is strongly relevant if removing it degrades BOC (Bayes Optimal Classifier, best on avg, weighted avg of hypotheses)
  • Xi is weakly relevant if
    • not strongly relevant
    • there exists a subset of features such that adding Xi to it improves the BOC
  • Otherwise Xi is irrelevant

Measures effect on BOC

Relevance ~ Information

9
Q

Minimax

A
  • 2 agents A, B
    • A considers worst case counter while maximizing own value
    • B considers worst case counter while minimizing A value
  • Finding the maximum minimum or minimum maximum
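A minimal Python sketch of the minimax recursion described above; the toy game tree, its payoff values, and the `minimax` function name are illustrative assumptions, not from the cards:

```python
# Minimal minimax sketch on a hand-made game tree (illustrative payoffs).
# A (the maximizer) assumes B (the minimizer) plays the worst-case counter, and vice versa.

def minimax(node, maximizing):
    """Return the value of `node` assuming both players play optimally."""
    if isinstance(node, (int, float)):      # leaf: payoff for player A
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A picks a branch, then B picks within it: A chooses the branch whose worst case is best.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree, maximizing=True))   # -> 3 (maximum of the minima 3, 2, 1)
```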
10
Q

prisoners dilemma

A

The prisoner’s dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so

11
Q

Issues with SLC

A
  • Using only the closest distances can result in weird, stringy cluster shapes: points end up in the same cluster whenever some chain of nearby points connects them
12
Q

Repeated Games and the Folk Theorem

A
  • In repeated games, the possibility of retaliation opens the door for cooperation
  • Folk Theorem Definition (math):
    • Results known, at least to experts in the field, and considered to have established status, but not published in complete form
    • Everybody knows, no one gets credit, in the cloud of understanding
  • Folk Theorem Definition (game theory):
    • Describes the set of payoffs that can result from Nash strategies in repeated games
    • Any feasible payoff profile that strictly dominates the minmax/security profile can be realized as a Nash equilibrium payoff profile, with sufficiently large discount factor.
13
Q

KMeans as Optimization - Definition

A
  • Configurations
    • Center, P
  • Scores
    • Err(P, center) = sum_x || center_P(x) - x ||^2  (squared Euclidean distance to the assigned center)
  • Neighborhood
    • neighborhood of (P, center) = set of pairs
      • { (P', center) } union { (P, center') }
        • change either the partition or the centers, not both
  • A lot like Hill Climbing
14
Q

Desirable Clustering Properties

A
  • Richness
    • For any assignment of objects to clusters, there is some distance matrix such that partition returns that clustering
    • all inputs (distance) are valid and any clustering can be valid
    • Not limited in clusters that can be expressed
  • Scale-invariance
    • change of units does not change clustering
      • inches instead of feet
  • Consistency
    • shrinking intra-cluster distances and expanding inter-cluster distances does not change clustering
    • make similar more similar and less similar even less similar, it should not change the groupings
15
Q

Entropy

A

Tells us if a feature provides any information

Min # of yes/no questions needed to convey information

16
Q

Discounted Rewards

A
  • Sum from t = 0 to inf: gamma^t * R(S_t)
  • Geometric series
  • Rmax / (1 - gamma)
    • gamma = 0
      • Rmax
    • gamma = 1
      • prior case (no discount)
  • infinite sum, finite value
    • adding infinitely many (discounted) rewards -> get a finite number
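A quick numeric check of the geometric-series bound above; the gamma and Rmax values are arbitrary assumptions:

```python
# Numeric check that the discounted sum of a constant reward Rmax approaches Rmax / (1 - gamma).
gamma, r_max = 0.9, 5.0

discounted = sum(gamma**t * r_max for t in range(10_000))   # truncated "infinite" sum
closed_form = r_max / (1 - gamma)

print(discounted, closed_form)   # both ~50.0: infinitely many rewards, finite total
```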
17
Q

Markov Decision Process Framework

A
  • States
    • S
    • coordinates, descriptors, represent where we are
  • Model
    • T(s,a,s’) ~ Pr(s’ | s, a)
  • Action
    • A(s), A
  • Reward
    • R(s), R(s,a), R(s,a,s')
  • Policy
    • pi(s) -> a
    • optimal policy, pi*
18
Q

Sequences of Rewards: Assumptions

A
  • Infinite Horizons
    • you can live forever
    • if finite
      • may take risks if horizon is short
      • policy can change even if state is unchanged
      • lose stationarity in model
  • Utility of Sequences
    • if 2 sequences start with the same state, the preference between them is unchanged when that first state is removed
    • stationary preferences
      • if I prefer one set of sequences today, I will tomorrow
19
Q

If X and y are independent, what are their joint and conditional entropy?

A

Joint Entropy:

H(X, y) = H(X) + H(y)

Conditional Entropy:

H(y | X ) = H(y)

20
Q

What is the utility of a sequence of states?

A
  • Sum of the reward over the states
  • Utility of infinite sequences equals infinity
    • two infinite sequences, one with larger rewards sprinkled in at random: neither is better, both sum to infinity
    • without discounting, it doesn't matter what you do if rewards are always positive
21
Q

Implausible Threats

A
  • Idea that a grim trigger threat is implausible.
    • Why would opposing agent threaten when the result is less reward for themselves too?
  • Potential issue with Grim Trigger
22
Q

How can achievable average payoffs be found in Repeated Games and the Folk Theorem

A
  • Plot strategy pairs onto a 2 player plot
  • Form a convex hull between strategy pairs
    • “feasible region”
  • Points within the convex hull are valid average payoffs for joint strategies
23
Q

3 fundamental theorems of nash equilibria

A
  • In the n-player pure strategy game, if elimination of strictly dominated strategies eliminates all but one combination, that combination is the unique N.E.
    • ex: prisoners dilemma
  • any N.E will survive elimination of strictly dominated strategies
  • if n is finite and each player's set of pure strategies is finite, there exists at least one (possibly mixed) N.E.
24
Q

tf-idf

A
  • term frequency - inverse document frequency
    • used, e.g., as features for nearest-neighbor retrieval in textual data
25
Q

What is the Q Function?

A

Q(s, a) = R(s) + gamma * sum_s' [ T(s,a,s') max_a' Q(s',a') ]

Value for arriving in a state s, leaving via a, then proceeding optimally thereafter.

26
Q

MDP: Reward

A
  • Reward
    • R(s), R(s,a), R(s,a,s')
      • mathematically equivalent
    • encompasses domain knowledge
27
Q

Principal Components Analysis (PCA)

A
  • Eigenproblem
  • Find direction of maximal variance
  • Finds orthogonal directions
    • PC2 is orthogonal to PC1
  • Properties
    • Global algorithm
    • Best reconstruction
      • no information lost (with all PCs kept)
      • Minimize L2 (squared) error moving from n to m dimensions
    • Each dimension has eigenvalue
      • non-negative
      • monotonically non-increasing from PC1 through PCn
      • can throw away the components with the smallest eigenvalues (least variance)
      • eigenvalue of 0 is ignorable (0 entropy, meaningless)
    • well studied (fast algorithms)
  • Filter method
    • Can be difficult with classification if feature associated with label has low variance (the feature may be inadvertently thrown out)
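A minimal sketch of PCA as an eigenproblem on the covariance matrix, using assumed toy NumPy data; real uses would typically rely on a library implementation:

```python
import numpy as np

# Minimal PCA sketch: eigenproblem on the covariance of (centered) data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # toy data: 200 samples, 5 features

Xc = X - X.mean(axis=0)                            # center the data
cov = np.cov(Xc, rowvar=False)                     # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)             # symmetric eigenproblem

order = np.argsort(eigvals)[::-1]                  # order by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2                                              # keep the top-m directions of maximal variance
X_reduced = Xc @ eigvecs[:, :m]                    # project n-dim data down to m dims
print(eigvals)                                     # non-negative, non-increasing
```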
28
Q

Random Components Analysis

A
  • Generates random directions and then projects the data onto those directions
  • works well
    • At what?
      • Before classification
        • Maintains signal in lower dimension
        • Picks up some correlations
  • typically projects into a higher-dimensional space than algorithms like PCA would need
  • Advantage over PCA/ICA
    • fast
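A minimal random-projection sketch with plain NumPy; the toy data and the target dimension m are assumptions:

```python
import numpy as np

# Minimal random-projection sketch: project the data onto m random directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # toy data: 200 samples, 50 features

m = 10                                         # target dimension (often larger than PCA would need)
R = rng.normal(size=(50, m)) / np.sqrt(m)      # random Gaussian directions
X_projected = X @ R                            # fast: no eigen-decomposition required
print(X_projected.shape)                       # (200, 10)
```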
29
Q

If 2 coins are independent, what is the entropy and mutual information between them?

A

I(A,B) = H(A) - H(A|B) = 1 - 1 = 0 (no mutual info between them)

H(A , B) = H(A) + H(B) = 1 + 1 = 2

H(A) = H(B) = -0.5 log_2(0.5) - 0.5 log_2(0.5) = 1 (Maximum entropy)

30
Q

mixed strategy vs pure strategy

A
  • mixed strategy
    • some distribution over strategies
  • pure strategy
    • mixed strategy with all probability mass on a single strategy
    • probability 1 of chosen strategy
31
Q

Properties of EM

A
  • Monotonically non-decreasing likelihood
  • may fail to converge in theory (in practice it does)
    • in strange examples
  • will not diverge
  • can get stuck
    • random restart
  • works with any distribution (if E, M solvable)
    • expectation is expensive (generally)
32
Q

Forward & Backward Feature Selection

A
  • Forward
    • Start with no features
    • Repeat until error doesn't reduce
      • Try adding each remaining feature; keep the one that gives the best learner score
  • Backward
    • Start with all features
    • Repeat until error doesn't reduce
      • Try removing each feature; drop the one whose removal gives the best learner score
  • Like hill-climbing (the wrapped learner scores each candidate subset)
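A hedged sketch of forward selection as hill-climbing; `cross_val_error` is an assumed stand-in for training the wrapped learner on a feature subset and returning its validation error:

```python
# Forward-selection sketch (hill-climbing over feature subsets).
# `cross_val_error(features)` is an assumed scoring function: train the wrapped learner
# on that subset and return its validation error (lower is better).

def forward_selection(all_features, cross_val_error):
    selected = []
    best_error = float("inf")
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        # Try adding each remaining feature; keep the one giving the best score.
        scored = [(cross_val_error(selected + [f]), f) for f in candidates]
        error, feature = min(scored)
        if error >= best_error:       # stop when adding features no longer reduces error
            break
        selected.append(feature)
        best_error = error
    return selected
```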
33
Q

MDP: Model

A
  • Model
    • T(s,a,s’) ~ Pr(s’ | s, a)
      • probability of transition into s’ given that you were in state s and took action a
    • rules of the game
    • physics of the world
34
Q

Feature Selection: Why?

A
  • Knowledge Discovery
    • interpretability & insight
      • which features matter
      • allow lower dimensions for understanding
  • Curse of dimensionality
    • need exponentially more data for more features
35
Q

Value Iteration

A
  • Repeat until convergence
    • Start w/arbitrary utilities
    • Update utilities based on neighbors
      • U_{t+1}(s) = R(s) + gamma * max_a sum_s' [ T(s,a,s') U_t(s') ]
        • update the utility of s by looking at the utilities of the states it can reach
        • the R(s) term is the ground truth being mixed in
  • Why do we converge?
    • each update mixes a bit of truth (R(s)) into the wrong estimates; repeated updates shrink the wrong part until the estimates converge to the true utilities
  • Works because we propagate value across the states
    • Order matters, not absolute values
    • We don’t care about the actual value, just need utility good enough
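A minimal value-iteration sketch on a tiny, made-up MDP; the T, R, and gamma values are assumptions chosen only to make the loop runnable:

```python
import numpy as np

# Value-iteration sketch for a tiny assumed MDP: 2 states, 2 actions.
n_states, n_actions, gamma = 2, 2, 0.9
T = np.zeros((n_states, n_actions, n_states))   # T[s, a, s'] = Pr(s' | s, a)
T[0, 0] = [0.8, 0.2]
T[0, 1] = [0.1, 0.9]
T[1, :] = [0.0, 1.0]                            # state 1 is absorbing
R = np.array([0.0, 1.0])                        # R(s)

U = np.zeros(n_states)                          # start with arbitrary utilities
for _ in range(1000):
    # U_{t+1}(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U_t(s')
    U_new = R + gamma * (T @ U).max(axis=1)
    if np.max(np.abs(U_new - U)) < 1e-9:        # converged
        break
    U = U_new
print(U)
```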
36
Q

Curse of Dimensionality

A

Amount of data needed grows exponentially, on the order of 2^N for N features

Adding more features means exponentially more data is needed

37
Q

Linear Discriminant Analysis (LDA)

A
  • finds a projection that discriminates based on the label
  • like supervised learning
  • Not a filter method like the others; it cares about the label explicitly
38
Q

Important differences between MDPs and supervised learning

A
  • delayed rewards
    • take many actions prior to reward
    • ex: chess
      • temporal credit assignment
  • minor changes matter
    • reward changes affect policy
39
Q

Iterated Prisoner’s Dilemma with Uncertain Ending

A
  • With probability gamma, game continues
    • else, game over
  • Expected # of rounds
    • Finite with gamma < 1
    • 1 / (1 - gamma)
      • reminiscent of discount factor
40
Q

Tit-For-Tat

A
  • iterated prisoner’s dilemma strategy
    • cooperate on first round
    • copy the opponent's previous move thereafter
  • TFT's responses to various opponent strategies
    • always defect
      • cooperate, defect, defect, defect, …
    • always cooperate
      • cooperate forever
    • defect, cooperate, defect, cooperate, …
      • cooperate, defect, cooperate, defect, …
    • TFT (against another TFT agent)
      • cooperate forever
41
Q

What is the best response to each strategy (gamma > 1/6)

A
  • Mutual Best Response
    • Pair of strategies where each is a best response to the other: a Nash Equilibrium
    • In the referenced chart (not shown here), D - D and TFT - TFT are Nash Equilibria
      • TFT - TFT Cooperative Nash
42
Q

MDP: States

A
  • States
    • S
    • coordinates, descriptors, represent where we are
43
Q

Policy Iteration

A
  • Algorithm
    • Start with initial policy (arbitrary)
    • Evaluate policy
      • calculate U_t for the current policy (utility)
    • Improve Policy
      • pi_{t+1}(s) = argmax_a sum_s' T(s,a,s') U_t(s')
      • U_t(s) = R(s) + gamma * sum_s' T(s, pi_t(s), s') U_t(s')
        • n linear (unlike VI) equations in n unknowns
  • Fewer iterations than Value Iteration
  • guaranteed to converge
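A minimal policy-iteration sketch on the same style of tiny, assumed MDP as in the value-iteration sketch, showing the evaluate/improve loop and the linear solve for U:

```python
import numpy as np

# Policy-iteration sketch on a tiny assumed MDP (same T, R, gamma style as the VI sketch).
n_states, n_actions, gamma = 2, 2, 0.9
T = np.zeros((n_states, n_actions, n_states))
T[0, 0] = [0.8, 0.2]
T[0, 1] = [0.1, 0.9]
T[1, :] = [0.0, 1.0]
R = np.array([0.0, 1.0])

pi = np.zeros(n_states, dtype=int)               # arbitrary initial policy
while True:
    # Policy evaluation: solve the n linear equations U = R + gamma * T_pi U
    T_pi = T[np.arange(n_states), pi]            # T_pi[s, s'] = T(s, pi(s), s')
    U = np.linalg.solve(np.eye(n_states) - gamma * T_pi, R)
    # Policy improvement: pi_{t+1}(s) = argmax_a sum_s' T(s,a,s') U(s')
    pi_new = (T @ U).argmax(axis=1)
    if np.array_equal(pi_new, pi):               # converged
        break
    pi = pi_new
print(pi, U)
```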
44
Q

mechanism design

A
  • set up incentives to get particular behavior
  • economics/government
    • ex: tax breaks for mortgage
45
Q

Bellman Equation

A
  • True utility of a state
    • reward for the state plus discounted rewards
    • U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')
      • non-linear (because of the max)
46
Q

Policies

A
  • pi*
    • maximizes expected reward
      • pi* = argmax_pi E[ sum_{t=0}^inf gamma^t R(S_t) | pi ]
  • Utility of a state
    • not equal to the reward of state (immediate feedback)
    • long term feedback
    • expected reward for given state and all future states
    • U^pi(s) = E[ sum_{t=0}^inf gamma^t R(S_t) | pi, S_0 = s ]
    • accounts for all delayed rewards
47
Q

Define a Basic Clustering Problem

A

Take a set of objects and put into groups.

Given: a set of objects X and inter-object distances d(x, y) = d(y, x)

Output: a partition P_D, where P_D(x) = P_D(y) if x and y are in the same cluster

48
Q

Why use Feature Transformation? (Example)

A
  • Information Retrieval (ad-hoc) “google problem”
    • Need: retrieve subset of documents based off of relevance
    • Features
      • words
    • Problem with words as features?
      • insufficient indicators
      • lot of words
      • polysemy - words mean multiple things
        • apple (fruit or company)
        • false positive
      • synonymy - many words mean same thing
        • car vs. automobile
        • false negatives
49
Q

Possible filtering criteria for feature selection

A
  • Information Gain
    • depends on labels
  • Gini index
  • variance
  • entropy
    • doesn’t depend on labels
  • train a neural net and prune features with low weights
    • “useful”
  • non-redundant/independent
    • get rid of x2 if x2 = x1 + x3
50
Q

Feature Transformation

A
  • The problem of pre-processing a set of features to create a new (smaller? more compact?) feature set, while retaining as much (relevant? useful?) information as possible
  • usually fewer features (almost always)
  • Linear combination of original features
  • Feature selection is a subset
  • Features into a new feature space
51
Q

Q-Learning Convergence

A
  • Converges if:
    • every (s, a) pair is visited infinitely often
    • learning rates: sum_t alpha_t = inf and sum_t alpha_t^2 < inf
    • next states have to be drawn from transition probabilities
    • rewards have to be drawn from reward function
52
Q

Cascade Learning

A
  • Use a logarithmic reduction of an unbalanced dataset: remove half of the majority class in each iteration, ultimately ending up with a small, balanced dataset
    • As the dataset is halved at each step, the computation spent per example can double while the total cost stays roughly the same as in the previous step
  • In an unbalanced dataset it is often important to accurately classify the minority class
53
Q

Definition of Policy of a state

A

pi(s) = argmax_a sum_s' [ T(s,a,s') U(s') ]

54
Q

security profile

A
  • like the minmax profile, but allowing mixed strategies
55
Q

Usefulness

A

Measures effect on a particular predictor

Usefulness ~ Error | Model/Learner

56
Q

Properties of Q for general-sum games

A
  • Nash - Q
  • No longer using minimax games (not zero sum)
  • Nash Equilibrium operator instead of minimax
  • value iteration doesn’t work
  • Nash - Q doesn’t converge
  • No unique solutions
  • Policies cannot be computed independently
    • Nash Equilibrium is a joint behavior
57
Q

Three Approaches to RL

A
  • Pi
    • policy
    • maps states to actions
    • direct use, indirect learning
    • policy search algorithms
  • U
    • utility
    • maps states to value
      • value function based algorithms
  • T,R
    • Can use to solve Bellman Equations
    • maps states, actions to new states and rewards
    • direct learning, indirect use
    • model-based learners
58
Q

Von Neumann’s Theorem

A
  • In a 2-player, zero-sum (non)deterministic game of perfect information
    • minimax = maximin
    • there always exists an optimal pure strategy for each player
  • Can determine optimal joint policy with matrix (without original tree)
  • Assumption
    • rational agent trying to maximize reward
    • other agents are doing the same rational thing
59
Q

What is important about subgame perfect?

A
  • You will fall into mutual cooperation no matter what
  • Can return to a cooperative state even after punishment
    • becomes a plausible threat
60
Q

Time Complexity of Feature Selection

A
  • Given n features -> m features
    • where m <= n
  • Exponential
    • there's an exponential number of subsets, 2^n
  • NP-Hard
61
Q

Independent Components Analysis (ICA)

A
  • About finding independence (not correlation to maximize variance like PCA)
  • Transform features to be statistically independent from each other (mutual information is 0)
  • Maximize the mutual information between the transformed features and the original features, while the transformed features themselves are mutually independent
  • Underlying Assumption
    • mutually independent random variables combined linearly
    • observables are linear combinations of hidden variables
    • ex: cocktail party problem
      • microphones (observables) hear different linear combinations of voices (hidden variables) at party
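A cocktail-party-style sketch using scikit-learn's FastICA as one possible implementation; the sources and mixing matrix are toy assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Cocktail-party sketch: mix independent sources linearly, then try to unmix them with ICA.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]    # two independent "voices"
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                                 # unknown mixing (the room)
X = sources @ A.T                                          # what the microphones record

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)        # estimates of the hidden independent components
print(recovered.shape)                  # (2000, 2); order and scale of components are arbitrary
```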
62
Q

What are the general approaches to feature selection

A
  • Filtering
    • input -> set of features
    • selection algorithm
      • includes some criterion for how good a candidate feature subset is
    • output -> fewer features
    • input to learning algorithm
    • the search algorithm doesn't use the learner
  • Wrapping
    • input -> set of features
    • algorithm - selection & learning
      • assess features and get fewer
    • Search Algorithm wrapped with learner
63
Q

Grim Trigger

A
  • Mutual Benefit, as long as you cooperate
  • Stop cooperating (cross the line)
    • Deal out vengeance forever
  • Creates a Nash equilibrium: there is no incentive to “cross the line,” since doing so decreases the reward
64
Q

What is the expected message size?

A - 0 (50%)

B - 1 1 0 (12.5%)

C - 1 1 1 (12.5%)

D - 1 0 (25%)

A

= 1 bit * 0.5 + 2 bits * 0.25 + 3 bits * 0.125 * 2 letters

= 0.5 + 0.5 + 0.75

= 1.75

This is an example of variable length encoding

The computed value is called entropy

65
Q

A soft clustering algorithm

A
  • Assumption: Data generated by
    • Repeat n times
      • pick one of K Gaussians, uniformly at random
      • sample x_i from that Gaussian
  • Find the hypothesis h that maximizes the probability of the data (maximum likelihood)
  • The maximum-likelihood estimate of a Gaussian's mu is the mean of the data (for one distribution)
    • What if there are k mus?
      • Hidden variables!
66
Q

MDP: Actions

A
  • Action
    • A(s), A
    • things that you can do in a particular state
      • gridworld: up, down, left, right
    • generalized as a function of state
67
Q

Clustering Properties Quiz

A
  • A
    • Not rich (can't produce every possible clustering)
    • SI - ordered
    • Consistent
  • B
    • Rich (any clusters)
    • Not SI - depends on unit
    • Consistent
  • C
    • Rich (any clusters)
    • SI (scaling is undone by theta and max)
    • Not consistent
68
Q

Formula for Entropy

A

H(X) = - sum_x P(x) log_2 P(x)   (using log base 2)
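A small sketch of the formula; the probability lists are illustrative, and the last one reproduces the 1.75-bit A/B/C/D example from the expected-message-size card:

```python
import math

# Entropy sketch: H(X) = -sum_x P(x) log2 P(x), the expected number of yes/no questions.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                     # fair coin -> 1.0 bit
print(entropy([1.0]))                          # certain outcome -> 0.0 bits
print(entropy([0.5, 0.25, 0.125, 0.125]))      # the A/B/C/D example -> 1.75 bits
```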

69
Q

POMDPs

A
  • partially observable markov decision processes
    • more realistic
    • separation between the actual world and the decision maker's idea of the state
70
Q

central limit theorem

A

Sums (linear combinations) of many statistically independent random variables tend toward a normal (Gaussian) distribution

71
Q

inverse reinforcement learning

A

start from observations of behavior, try to guess what reward function resulted in the behaviors

72
Q

Soft Clustering

A

Data points can belong to more than one cluster

probabilistic

73
Q

Is Pavlov v. Pavlov Nash?

A
  • Yes
  • Start out cooperating (both agents), continue to cooperate, no reason to not cooperate
74
Q

Supervised Learning

A

Use labeled training data to generalize labels to new instances

“Function approximation”

75
Q

Estimating Q from transitions (without R or T)

Q-Learning update rule

A
  • Q_est(s,a) <-(alpha)- r + gamma * max_a' Q_est(s',a')
    • i.e. Q_est(s,a) = (1 - alpha) * Q_est(s,a) + alpha * [ r + gamma * max_a' Q_est(s',a') ]
    • utility of the state
      • Q_est(s,a)
    • utility of the next state
      • max_a' Q_est(s',a')
    • alpha_t: learning rate
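A minimal sketch of this update rule; the environment interface (states, actions, and the sampled transition) is assumed:

```python
from collections import defaultdict

# Q-learning update sketch: Q(s,a) <-(alpha)- r + gamma * max_a' Q(s',a').
gamma, alpha = 0.9, 0.1
Q = defaultdict(float)                       # Q_est, 0 for any unseen (s, a)

def q_update(s, a, r, s_next, actions):
    """Blend the old estimate with the newly sampled target."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target

# One sample transition: in state 0, action 'right' gave reward 1 and led to state 1.
q_update(0, "right", 1.0, 1, actions=["left", "right"])
print(Q[(0, "right")])                       # 0.1 after a single update
```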
76
Q

2-player, zero-sum, non-deterministic game of hidden information

A
  • minmax != maxmin
  • von Neumann's theorem fails
  • A's strategy depends on B's and B's depends on A's
    • with hidden information you won't know the other's strategy
  • mixed strategy
77
Q

With a fair coin (Heads/Tails) and an unfair coin (Heads both sides) what is the size of each message in bits if the coins are flipped 10 times and the results are transmitted?

A

Fair = 10 (need all bits)

Unfair = 0 (need no bits, already know result)

78
Q

What is true of a sequence that is more predictable?

A

more predictable = less information

79
Q

Nash Equilibrium

A
  • n players with strategies (s1, s2, …, sn)
  • particular strategies are in a nash equilibrium iff
    • for all strategies chosen, if you randomly choose a player to switch their strategy they would have no reason to (in equilibrium)
    • i.e., for every player i: s_i* = argmax_{s_i} utility_i(s_1*, …, s_i, …, s_n*)
80
Q

Direct Learning, Indirect Use RL

A

model-based approaches

learn the underlying T and R from transitions, then use them indirectly (by planning) to get a policy

81
Q

Minmax Profile

A
  • Pair of payoffs, one for each player, that represent the payoffs that can be achieved by a player defending itself from a malicious adversary
    • zero sum game
    • trying to lower the agents score
  • pure
82
Q

Properties of Q for Zero-sum stochastic games

A
  • Uses minimax operator
  • Value iteration works
  • minimax-Q Converges
  • Unique solution to Q*
  • Policies can be computed independently
    • 2 players can run minimax Q on their own
  • Update efficiently
    • polynomial time
    • linear programming for minimax
  • Q functions are sufficient to specify policy
83
Q

n-Step Prisoners dilemma

A

In the n-step (finitely repeated) game, prisoners will always defect in the last round (the one-shot Nash equilibrium), since each expects the other to abandon cooperation when there is no future round left; by the same backward reasoning this unravels to every earlier round, so both always defect

84
Q

Markovian Property

A
  • Only the present matters
  • Transition function in an MDP only depends on current state s
    • s’ only needs s
  • Any process can be made Markovian by folding the history into the current state (the state "remembers" where you've been)
  • Things are stationary
    • rules do not change
    • transition model doesn’t change
85
Q

What measure can be used to assess dependence between two variables?

A

Mutual Information ( I )

I(X, y) = H(y) - H(y | X)

Measures the reduction of randomness of a variable given knowledge of another variable
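A small sketch computing I(X, Y) = H(Y) - H(Y|X) from a joint distribution table; the two-independent-coins table is an assumed example (so the result is 0):

```python
import math

# Mutual-information sketch from a joint distribution table: I(X,Y) = H(Y) - H(Y|X).
joint = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginals P(x) and P(y) from the joint table.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

H_y = H(p_y.values())
# H(Y|X) = -sum_{x,y} P(x,y) log2 P(y|x)
H_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in joint.items() if p > 0)
print(H_y - H_y_given_x)    # mutual information; 0.0 for independent coins
```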

86
Q

How can the four symbols with given frequencies be sent with bits, most efficiently?

A - 50%

B - 12.5%

C - 12.5%

D - 25%

A

A - 0

B - 1 1 0

C - 1 1 1

D - 1 0

87
Q

Information Theory

A

Mathematical Framework to allow us to compare probability density functions.

Allows us to determine what input variables provide more information for a response variable

88
Q

Reinforcement Learner “API”

A
  • Planner
    • Input: Model (T,R)
    • Output: Policy (pi)
  • Learner
    • Input: Transitions
    • Output: Policy
  • Modeler
    • Input: Transitions
    • Output: model
  • Simulator
    • Input: model
    • Output: transitions
89
Q

Properties of Wrapping Algorithm for Feature Selection

A
  • Pros
    • Take into account model bias and learning
  • Cons
    • Soooo slow
90
Q

Game Theory

A
  • Mathematics of conflict
    • conflicts of interest
  • Single Agents -> multiple agents
  • Economics ( & politics)
  • Increasingly a part of AI/ML
91
Q
  • Runtime of SLC
    • given n points (objects)
    • k clusters
A

O(n^3)

  • Repeat n - k times (on the order of n)
    • each time, look at all O(n^2) pairwise distances to find the closest pair in different clusters
92
Q

Differences between supervised, unsupervised, and reinforcement learning

A
  • Supervised learning
    • function approximation
    • given y,x pairs find function f
  • unsupervised learning
    • given x’s and goal is to find f
    • get description based on features
    • clustering, description
  • reinforcement learning
    • given x’s and z’s and learn some f to generate y’s
      • z - rewards
    • mechanism for decision making
93
Q

MDP: Policy

A
  • Policy
    • pi(s) -> a
    • pi*
      • optimal policy
      • maximizes long term expected reward
  • Function that takes in a state and returns the action you should take
94
Q

In what ways is information between two variables determined?

A

Joint Entropy (Randomness contained in 2 variables together)

H(X, y) = - sum [ P(X, y) log P(X, y) ]

Conditional Entropy (Entropy of y given X, Randomness of y given X)

H(y | X) = - sum [ P(X, y) log P(y | X) ]

95
Q

Unsupervised Learning

A

Make sense out of unlabeled data

“data description”

96
Q

Single Linkage Clustering (SLC)

A

  • Hierarchical Agglomerative Clustering (HAC) algorithm

  • Algorithm:
    • start by treating each object as its own cluster (n objects)
    • repeat n - k times to end with k clusters
      • define the inter-cluster distance as the distance between the closest two points in the two clusters
      • merge the two closest clusters
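A minimal single-linkage sketch on assumed 1-D toy points; any distance function could be substituted:

```python
# Single-linkage clustering sketch: start with every point as its own cluster,
# then repeatedly merge the two clusters whose closest points are closest.

def slc(points, k):
    clusters = [[p] for p in points]                  # n singleton clusters
    while len(clusters) > k:                          # repeat n - k times
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # inter-cluster distance = distance between the closest pair of points
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))           # merge the two closest clusters
    return clusters

print(slc([1.0, 1.1, 1.2, 5.0, 5.1, 9.0], k=3))       # -> [[1.0, 1.1, 1.2], [5.0, 5.1], [9.0]]
```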
97
Q

Pavlov

A
  • Cooperate if the players agreed (made the same move) last round, defect if they disagreed
98
Q

K-Means Clustering

A
  • Algorithm:
    • Repeat until convergence
      • Pick K Centers (random)
      • Centers claim closest points
      • Recompute the centers by averaging clustered points
  • Note: a center is not necessarily one of the data points
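A minimal K-means sketch on assumed toy data with two blobs:

```python
import numpy as np

# K-means sketch: pick K random centers, let centers claim the closest points,
# recompute centers as the average of their points, repeat until nothing changes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])   # toy data, 2 blobs
k = 2

centers = X[rng.choice(len(X), size=k, replace=False)]      # random initial centers
while True:
    labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):                    # converged
        break
    centers = new_centers
print(centers)                                               # roughly the two blob means
```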
99
Q

cross-entropy

A
  • randomized optimization algorithm
  • similar to MIMIC
100
Q

Direct Use, Indirect Learning RL

A

Policy search based algorithms

101
Q

Algorithms that use filtering for feature selection

A
  • Decision trees are a filtering algorithm
    • Use information gain
      • features that provide most information given class labels
102
Q

Is Pavlov v. Pavlov Subgame perfect?

A
  • Yes
  • Synchronize to mutually cooperative state no matter the historical sequence
  • Average reward is mutual cooperation, starting from any of the possible joint states:
    • C, C
    • C, D
    • D, C
    • D, D
103
Q

Computational Folk Theorem

A
  • Setting: 2-player bimatrix game, repeated, with average reward
  • Can build a Pavlov-like machine for any such game and construct a subgame perfect Nash equilibrium in polynomial time
    • Pavlov if possible
    • Zero-sum like (solve an LP)
    • at most one player improves
104
Q

Mutual Information

A

to determine if input vectors are similar

105
Q

Different versions of Q

A
  • Q-Learning is a family of algorithms that differ:
    • How to initialize Q_est?
      • initialize all values optimistically high (“optimism in the face of uncertainty”)
        • Q will explore less visited “still awesome” states
    • How to decay learning rate?
    • how to choose actions?
      • always choose a_0 (a fixed action) - won't learn
      • always choose randomly - won't use what's been learned
      • use Q_est to choose actions - won't learn (can get stuck in a local minimum)
        • greedy
      • use Q_est with random restarts
        • slow
      • use Q_est with “simulated annealing” like approach
        • random actions sometimes
        • random action with p = epsilon, q_est action with 1 - epsilon
106
Q

What are the Utility and Policy of a state given Q?

A

U(s) = maxa Q(s,a)

pi(s) = argmaxa Q(s,a)

107
Q

Ideas for solving general-sum stochastic games

A
  • repeated stochastic games (folk theorem)
  • cheap talk -> correlated equilibria
    • nothing said is binding
    • players can coordinate
    • can compute a correlated equilibrium (more efficient to compute than nash)
  • Cognitive hierarchy -> best responses
    • instead of solving for equilibrium, assume other players have limited computational resources then take best response of what you believe other players will do
    • more easily computed, assuming other player is fixed
  • side payments (coco values)
    • give side reward to encourage actions
108
Q

Properties of Filtering algorithms for feature selection

A
  • Pros
    • Faster than wrapping
  • Cons
    • Ignores the learning problem
    • Look at features in isolation
109
Q

2 player, non zero-sum, non-deterministic game of hidden information

A
  • prisoners dilemma
    • choose option that is not best for the group
      • over fear of other agent making bad choice
  • in the standard payoff-matrix example (image not included)
    • the (-6, -6) option is chosen; both agents will never cooperate
110
Q

Expectation Maximization

A
  • 2 phases
    • Expectation
      • Likelihood that data i comes from cluster j
      • E[Zij] = P(X = xi | mu = mu_j) / normalization
    • Maximization
      • Compute the means from the cluster
      • mu_j = sum_i E[Zij] * Xi / normalization
  • K-means if cluster assignments use argmax
  • Note: With EM probabilities, even points clearly in one cluster have a non-zero probability of the other cluster
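A minimal EM sketch for a 1-D mixture of k Gaussians with fixed unit variance; the data and initialization are assumptions, and EM can still get stuck (hence random restarts):

```python
import numpy as np

# Expectation-Maximization sketch for a 1-D mixture of k Gaussians (fixed variance = 1).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)])   # toy data from 2 clusters
k = 2
mu = rng.choice(X, size=k)                          # initial guesses for the k means

def gaussian(x, mean):
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

for _ in range(100):
    # E-step: E[Z_ij] = P(x_i | mu_j) / normalization  (soft cluster responsibilities)
    resp = np.array([gaussian(X, m) for m in mu]).T           # shape (n, k)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: mu_j = sum_i E[Z_ij] x_i / sum_i E[Z_ij]
    mu = (resp * X[:, None]).sum(axis=0) / resp.sum(axis=0)
print(mu)                                           # roughly [0, 6] (order may vary)
```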
111
Q

Subgame perfect equilibrium

A
  • Always best response independent of history
    • if, given some history, an agent's response is not a best response, it is not a subgame perfect equilibrium
  • Example: Grim Trigger vs. Tit-For-Tat is not Subgame perfect
    • If TFT decides to defect first, Grim defects
      • If Grim cooperated anyway, grim would do better
      • Grim makes a threat and follows through with the threat, resulting in worse outcome than if it hadn’t followed through
        • Grim: C | D , D, D, D, …
        • TFT: D | C, D, D, D, …
112
Q

Epsilon-Greedy Exploration

A
  • If GLIE (“greedy in the limit with infinite exploration”, e.g. a decayed epsilon)
    • Q_est converges to Q
    • pi_est converges to pi*
  • Exploration - Exploitation dilemma
    • Exploration - learn
    • exploitation - use what you know
    • trade-off
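A minimal epsilon-greedy selection sketch; the Q table, state, and action names are assumed:

```python
import random

# Epsilon-greedy action selection: random action with probability epsilon (explore),
# otherwise the greedy action under the current Q estimate (exploit).
def epsilon_greedy(Q, state, actions, epsilon):
    if random.random() < epsilon:
        return random.choice(actions)                            # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))    # exploit

Q = {(0, "left"): 0.2, (0, "right"): 0.7}
print(epsilon_greedy(Q, state=0, actions=["left", "right"], epsilon=0.1))
```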
113
Q

What is true of K-Means in euclidean space?

A
  • Error can only go down
    • Move point into new partition iff the error goes down
    • Average minimizes the squared error, moving the center would never increase error
    • Monotonically non-increasing in error
  • Each iteration is polynomial ( k*n per iteration )
  • Converges in finite time (finitely many configurations: at most k^n)
    • Need some way of breaking ties
      • If a point is equally far from both partitions
      • Consistently break ties
    • Only look at configuration once
  • Can get stuck in local optima (use random restarts)