Exam 2 Flashcards
Unsupervised Learning
- Data Description
Single Linkage Cluster - SLC
- Consider each object a cluster (n objects)
- Define inter-cluster distance as the distance between the closest points in the two clusters
- Merge the two closest clusters
- Repeat n-k times to make k clusters
- How you define the inter-cluster distance is where domain knowledge comes in (a minimal sketch of the whole procedure follows below)
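A minimal sketch of the steps above, assuming a precomputed n x n distance matrix D (the function name and inputs are illustrative, not from the cards):

```python
def single_linkage(D, k):
    """SLC sketch: D is an (n, n) distance matrix, k the desired number of clusters."""
    clusters = [{i} for i in range(len(D))]   # consider each object its own cluster
    while len(clusters) > k:                  # merge n - k times to get k clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # inter-cluster distance = distance between the closest pair of points
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]            # merge the two closest clusters
        clusters.pop(b)
    return clusters
```

The naive search over all cluster pairs and point pairs is what makes the run time cubic, as the next card notes.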
Single Linkage Cluster Run Time
O(n^3)
Issues with SLC
Might end up with unintuitive ("wrong") clusters, since merges depend only on the single closest pair of points; the result is sensitive to how the distance is defined
k-means clustering
- Pick k random center points
- Each center claims its closest points
- Recompute the centers by averaging the clustered points
- Repeat until convergence (see the sketch below)
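A minimal sketch of those four steps, assuming X is an (n, d) NumPy array; the centers are initialized to k randomly chosen data points, and the function name and defaults are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # pick k random points as centers
    for _ in range(n_iter):
        # each center claims its closest points
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # recompute the centers by averaging the clustered points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):                 # repeat until convergence
            break
        centers = new_centers
    return centers, labels
```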
Configurations (k-means as an optimization problem)
The centers and the partition P (which center each point is assigned to)
Scores (k-means as an optimization problem)
How much error is introduced by representing the points by their centers (written out as a formula below)
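In symbols (not written on the card, but the standard k-means error consistent with that description), the score of a configuration is the sum of squared distances from each point to its assigned center, and k-means tries to minimize it:

$$E(P, \text{center}) = \sum_{x} \left\lVert \text{center}_{P(x)} - x \right\rVert^{2}$$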
Neighborhood (k-means as an optimization problem)
The neighborhood of a configuration (a partition paired with centers) is the set of configurations reachable by either keeping the centers the same and changing the partition, or keeping the partition the same and moving the centers
Properties of k-means clustering
- Each iteration is polynomial O(kn)
- Finite (exponential) iterations O(k^n)
- Error decreases (if ties are broken consistently)
- Can get stuck if it starts with a poor choice of centers. This is analogous to converging to a local optimum. One solution is to use random restarts.
Soft Clustering
Attaches a probability of belonging to each cluster to every point, instead of a hard assignment to a single cluster
Task is to find a hypothesis that maximizes the probability of the data (Maximum likelihood)
Maximum Likelihood Gaussian
The maximum likelihood estimate of the Gaussian mean μ is the mean of the data (see the worked equation below)
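As a short worked equation (a standard derivation, not spelled out on the card): because the normalizing constant does not depend on μ, maximizing the Gaussian likelihood of x_1, ..., x_n is the same as minimizing the sum of squared deviations, which the sample mean minimizes:

$$\hat{\mu}_{ML} = \arg\max_{\mu} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{(x_i - \mu)^{2}}{2\sigma^{2}}} = \arg\min_{\mu} \sum_{i=1}^{n} (x_i - \mu)^{2} = \frac{1}{n} \sum_{i=1}^{n} x_i$$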
Expectation Maximization
- Assigns a cluster probability to each point rather than a hard assignment
- Expectation: E[Z_ij] is the probability that element i was produced by cluster j, computed from the current means μ_j
- Maximization: μ_j is the mean of cluster j, recomputed as the E[Z_ij]-weighted average of the points
- Alternate: use the new μ_j to recompute E[Z_ij], and repeat (see the sketch below)
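A minimal sketch of the two alternating steps for k spherical Gaussians, under the simplifying assumptions of fixed unit variance and uniform priors (so only the means μ_j are learned; names and defaults are illustrative):

```python
import numpy as np

def em_gaussian_means(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]          # initial means
    for _ in range(n_iter):
        # E-step: E[Z_ij] = P(cluster j | point i) under the current means
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        resp = np.exp(-0.5 * (sq_dist - sq_dist.min(axis=1, keepdims=True)))  # shifted for stability
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mu_j = E[Z_ij]-weighted mean of the points
        mu = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return mu, resp
```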
EM properties
- Monotonically non-decreasing likelihood: It’s not getting worse
- Does not have to converge in theory (though in practice it usually does)
- Will not diverge
- Will most likely get stuck in a local optimum; use random restarts
- Works with any distribution (as long as the expectation and maximization steps are solvable)
Richness (Clustering Properties)
For any assignment C of objects to clusters, there's some distance matrix D such that the clustering scheme P_D returns that clustering C
Scale-invariance (Clustering Properties)
Scaling distances by a positive value doesn’t change the clustering
Consistency (Clustering Properties)
Shrinking intra-cluster distances (moving points towards each other) and expanding inter-cluster (moving clusters away from each other) distances doesn’t change the clustering
Impossibility Theorem (Clustering Properties)
No clustering algorithm can achieve all three properties (richness, scale-invariance, and consistency) at once
Time to select the most relevant subset of M features out of N, where M <= N
2^N
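As a quick check of that count (the arithmetic is implied rather than written on the card): each of the N features is either kept or dropped, so the number of candidate subsets is

$$\sum_{M=0}^{N} \binom{N}{M} = 2^{N},$$

which is why exhaustive feature selection is exponential.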
Approaches to Feature Selection
Filtering and Wrapping
Filtering (Feature Selection)
- Apply a search algorithm to produce fewer features to be passed to the learning algorithm
- Faster than wrapping
- Doesn’t account for the relationships between features
- Ignores the learning process (no feedback)
- We can use an algorithm (e.g., a decision tree) that selects the best features, then use another algorithm, with a different inductive bias, for learning
- We can use different criteria to evaluate the usefulness of a subset of features: information gain, variance, entropy, or eliminating dependent features (see the sketch below)
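A minimal sketch of the decision-tree-as-filter idea, assuming NumPy arrays and scikit-learn; the function name, the choice of k, and KNN as the second learner are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def tree_filter(X, y, k):
    """Rank features by a decision tree's (information-gain-based) importances
    and return the indices of the top-k features."""
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
    return np.argsort(tree.feature_importances_)[::-1][:k]

# Filter once, then learn with a different inductive bias (no feedback loop):
# keep = tree_filter(X_train, y_train, k=10)
# knn = KNeighborsClassifier().fit(X_train[:, keep], y_train)
```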
Wrapping
- Given a set of features, apply a search algorithm to produce fewer features, pass them to the learning algorithm, then use the output to update the selected set of features
- Takes into account the learning model’s bias
- Very slow
We can use different techniques to search:
- Randomized Optimization algorithms
- Forward selection: select a feature and evaluate the effect of creating different combinations of this feature with other features. Stop once you stagnate. This is similar to hill climbing
- Backward Elimination: start with all the features and evaluate the effect of eliminating each feature
Forward selection
select a feature and evaluate the effect of creating different combinations of this feature with other features. Stop once you stagnate; this is similar to hill climbing (see the sketch after the next card)
Backward Elimination
start with all the features and evaluate the effect of eliminating each feature
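A minimal sketch of forward selection as a wrapping approach, assuming scikit-learn; the learner (KNN), the cross-validated score, and the stagnation tolerance are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, learner=None, tol=1e-3):
    learner = learner if learner is not None else KNeighborsClassifier()
    selected, remaining, best_score = [], list(range(X.shape[1])), -np.inf
    while remaining:
        # try adding each remaining feature and keep the one that helps most
        score, f = max((cross_val_score(learner, X[:, selected + [f]], y).mean(), f)
                       for f in remaining)
        if score <= best_score + tol:      # stagnated: stop (hill-climbing style)
            return selected
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected
```

Backward elimination is the mirror image: start from all the features and greedily remove the one whose removal hurts the score least.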
x_i is <> relevant if removing it degrades the Bayes Optimal Classifier
strongly