general Flashcards

1
Q

Bayesian vs frequentist probability

A

Bayesian - degree of belief

Frequentist - the probability p of an outcome means that a proportion p of repeated trials produce that outcome (long-run frequency)

2
Q

Random variable

A

Description of the states that are possible, coupled with a probability distribution over them. Can be discrete or continuous.

3
Q

Probability mass function P(x)

A

Maps a state of a discrete random variable to the probability of that random variable taking on that state. Each PMF is tied to its own random variable, so P(x) and P(y) denote different functions. It is normalized: the sum of P(x) over all states x equals 1.
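
A minimal Python sketch (a hypothetical fair-die example, not from the source) of a PMF as a mapping from states to probabilities that sums to 1:

# PMF of a discrete random variable, here a hypothetical fair die
P = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

assert abs(sum(P.values()) - 1.0) < 1e-9   # normalized: probabilities sum to 1
print(P[3])                                # probability that the variable takes the state 3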

4
Q

Joint probability distribution

A

PMF that acts on many random variables at the same time.

5
Q

Expected value

A

Of some function f(x) with respect to a probability distribution P(x) is the average, or mean value, that f takes on when x is drawn from P.
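
A small Python sketch of the definition E[f(x)] = Σ_x P(x) f(x), reusing the assumed fair-die PMF from above:

# Expected value of f(x) under the PMF P
P = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}
f = lambda x: x ** 2                       # example function of the random variable

expectation = sum(p * f(x) for x, p in P.items())
print(expectation)                         # E[x^2] for a fair die = 91/6 ≈ 15.17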

6
Q

Variance

A

Gives a measure of how much the values of a function of a random variable x vary as we sample different values of x from its probability distribution: Var(f(x)) = E[(f(x) − E[f(x)])²].
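
A small Python sketch of that definition, again with the assumed fair-die PMF:

# Variance of f(x) under the PMF P: Var(f(x)) = E[(f(x) - E[f(x)])^2]
P = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}
f = lambda x: x                            # identity: variance of the die roll itself

mean = sum(p * f(x) for x, p in P.items())
variance = sum(p * (f(x) - mean) ** 2 for x, p in P.items())
print(variance, variance ** 0.5)           # ≈ 2.92; the square root is the standard deviation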

7
Q

Low variance implies

A

The values of f(x) cluster near their expected value. The square root of the variance is known as the standard deviation.

8
Q

Covariance

A

Gives some sense of how much two values are linearly related to each other, as well as the scale of the variables: Cov(f(x), g(y)) = E[(f(x) − E[f(x)])(g(y) − E[g(y)])].
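
A short Python sketch computing the covariance of two variables from a hypothetical joint PMF over pairs (x, y):

# Covariance from a joint PMF: Cov(x, y) = E[(x - E[x]) * (y - E[y])]
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # hypothetical joint distribution

Ex = sum(p * x for (x, y), p in joint.items())
Ey = sum(p * y for (x, y), p in joint.items())
cov = sum(p * (x - Ex) * (y - Ey) for (x, y), p in joint.items())
print(cov)   # 0.15 here: positive, since x and y tend to move together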

9
Q

principle of optimality

A

Has an optimal substructure: an MDP can be broken into smaller subproblems that are each solved optimally.

An optimal policy can be decomposed into doing the best thing for the next step and then acting optimally from there onwards.

10
Q

Bellman expectation equation

A

One-step lookahead.

It writes the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices.
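
A minimal Python sketch of this one-step lookahead for a small MDP, with hypothetical dictionaries pi, P and R (names assumed, not from the source):

# Bellman expectation equation as a one-step lookahead:
# v_pi(s) = Σ_a π(a|s) Σ_s' P(s'|s,a) (R(s,a,s') + γ v_pi(s'))
def bellman_backup(s, v, pi, P, R, gamma=0.9):
    # pi[s][a]: action probability; P[(s, a)][s2]: transition probability; R[(s, a, s2)]: reward
    return sum(
        prob_a * sum(prob_s2 * (R[(s, a, s2)] + gamma * v[s2])
                     for s2, prob_s2 in P[(s, a)].items())
        for a, prob_a in pi[s].items()
    )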

11
Q

acting greedy (policy)

A

Maximizing short-term reward.

Taking the max of the action-value function.

12
Q

methods to solve an MDP

A

Policy iteration and value iteration.

VI: sweep over all states and update the value function at every state using the previous iteration, building a new value function at each state.
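
A compact Python sketch of value iteration over a hypothetical MDP described by dictionaries P and R (assumed names, not from the source):

# One sweep per iteration: v_new(s) = max_a Σ_s' P(s'|s,a) (R(s,a,s') + γ v_old(s'))
def value_iteration(states, actions, P, R, gamma=0.9, iters=100):
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {
            s: max(
                sum(p * (R[(s, a, s2)] + gamma * v[s2]) for s2, p in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
    return v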

13
Q

Value function

A

The expected return starting from a state (when following a given policy from then on).

14
Q

How to calculate Incremental Mean

A

μ_k = μ_{k−1} + (1/k)(x_k − μ_{k−1}): nudge the running mean towards each new sample x_k by a step of size 1/k.
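
A tiny Python sketch of this incremental update:

# Incremental mean: mu_k = mu_{k-1} + (1/k) * (x_k - mu_{k-1})
def incremental_mean(samples):
    mu, k = 0.0, 0
    for x in samples:
        k += 1
        mu += (x - mu) / k
    return mu

print(incremental_mean([2, 4, 6]))   # 4.0, identical to the batch mean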
15
Q

On-policy vs off-policy learning

A

On-policy learning

  • “Learn on the job”
  • Learn about policy π from experience sampled from π

Off-policy learning

  • “Look over someone’s shoulder”
  • Learn about policy π from experience sampled from µ
16
Q

2 steps of Monte-Carlo Control

A

Every episode:

1) Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ
2) Policy improvement: ε-greedy policy improvement
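
A rough Python sketch of that loop (every-visit MC with a GLIE schedule ε_k = 1/k; run_episode is an assumed helper returning (state, action, reward) triples, not from the source):

from collections import defaultdict

# Monte-Carlo control: evaluate Q from complete episodes, then improve ε-greedily.
def mc_control(env, actions, episodes=1000, gamma=1.0):
    Q = defaultdict(float)
    N = defaultdict(int)
    for k in range(1, episodes + 1):
        eps = 1.0 / k                                  # GLIE schedule: ε decays to zero
        episode = run_episode(env, Q, actions, eps)    # assumed helper: ε-greedy rollout
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G                          # return from (s, a) onwards
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]   # incremental mean towards the return
    return Q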

17
Q

Greedy in the Limit with Infinite Exploration (GLIE)

A

All state-action pairs are explored infinitely many times, lim N_k(s, a) = ∞, and the policy converges to a greedy policy, lim π_k(a|s) = 1(a = argmax_{a′} Q_k(s, a′)). For example, ε-greedy with ε_k = 1/k is GLIE.
18
Q

GLIE Monte-Carlo Control theorem

A

Theorem: GLIE Monte-Carlo control converges to the optimal action-value function, Q(s, a) → q∗(s, a)

19
Q

SARSA update

Updating action-value functions (Q) with SARSA

A

Q(S, A) ← Q(S, A) + α (R + γ Q(S′, A′) − Q(S, A))
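
A short Python sketch of that update, with Q stored as a dict keyed by (state, action) pairs (an assumed representation):

# SARSA: move Q(S, A) towards the TD target R + γ Q(S', A')
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])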
20
Q

SARSA algorithm for on-policy control

A

Every time-step: policy evaluation with Sarsa, Q ≈ qπ; policy improvement with ε-greedy policy improvement.
21
Q
A
22
Q

Bootstrapping - bias vs variance trade-off

A

Bootstrapping increases bias but reduces variance, compared with using full Monte-Carlo returns.

23
Q

Q-Learning vs Sarsa

A

Sarsa is on-policy: its target R + γQ(S′, A′) uses the action A′ actually taken by the behaviour policy. Q-learning is off-policy: its target R + γ max_a Q(S′, a) bootstraps from the greedy action, regardless of the action actually taken next.
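
A side-by-side Python sketch of the two targets (Q as a dict keyed by (state, action); names assumed):

# Sarsa bootstraps from the action actually taken next; Q-learning from the greedy action.
def sarsa_target(Q, r, s2, a2, gamma=0.9):
    return r + gamma * Q[(s2, a2)]                       # on-policy target

def q_learning_target(Q, r, s2, actions, gamma=0.9):
    return r + gamma * max(Q[(s2, a)] for a in actions)  # off-policy (greedy) target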
24
Q
A