Algorithms Lecture 2 Flashcards
Reinforcement Learning
Learning from experience, through rewards and punishments
Diagram: the environment gives the agent a state; the agent has control over which actions it takes; each action returns a new state and a reward from the environment.
Markov decision process
Model for decision-making, consisting of:
a set of states S
a set of actions ACTIONS(s)
a transition model P(s' | s, a)
a reward function R(s, a, s')
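A minimal Python sketch of these four components, using an invented two-state weather example (all names and numbers below are illustrative, not from the lecture):

S = ["sunny", "rainy"]                               # set of states S

def ACTIONS(s):                                      # set of actions ACTIONS(s)
    return ["walk", "drive"]

P = {                                                # transition model P(s' | s, a)
    ("sunny", "walk"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "drive"): {"sunny": 0.9, "rainy": 0.1},
    ("rainy", "walk"):  {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "drive"): {"sunny": 0.4, "rainy": 0.6},
}

def R(s, a, s_next):                                 # reward function R(s, a, s')
    return 1 if s_next == "sunny" else -1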
reward
r can be positive (a reward) or negative (a punishment)
Q-learning
Q = the learned value function; s = state; a = action
Method for learning a function Q(s, a):
an estimate of the value of performing action a in state s
Overview of Q-learning
Start with Q(s, a) = 0 for all s, a.
When we take an action and receive a reward:
estimate the value of Q(s, a) based on the current reward and expected future rewards
update Q(s, a) to take into account both the old estimate and the new one
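A runnable toy sketch of this loop; the one-state "game", the action names, and the 0.1 learning rate are all invented just to show the mechanics (with a single state there are no future rewards, so only the blending of old and new estimates is visible):

import random

Q = {}                                        # start with Q(s, a) = 0 for all s, a
actions = ["safe", "risky"]
expected = {"safe": 1, "risky": 3}            # hypothetical average reward per action

for step in range(1000):
    a = random.choice(actions)                # take an action
    r = expected[a] + random.uniform(-2, 2)   # receive a noisy reward
    old = Q.get(("start", a), 0)              # old estimate
    Q[("start", a)] = old + 0.1 * (r - old)   # blend old estimate with the new one

print(Q)                                      # values approach roughly 1 and 3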
formula Q-learning
Start with Q(s, a) = 0 for all s, a.
Every time we take an action a in state s and observe a reward r, we update:
Q(s, a) ← Q(s, a) + α * ((r + max over a' of Q(s', a')) − Q(s, a))
where α is the learning rate and s' is the resulting state.
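One way to implement that update in Python; the dict representation of Q, the default alpha, and the toy values in the usage lines are assumptions:

def update(Q, s, a, r, s_next, actions, alpha=0.5):
    old = Q.get((s, a), 0)                                    # old estimate
    # best expected future reward achievable from the next state
    best_future = max((Q.get((s_next, a2), 0) for a2 in actions), default=0)
    Q[(s, a)] = old + alpha * ((r + best_future) - old)       # blended new estimate

Q = {}
update(Q, s="start", a="left", r=1, s_next="end", actions=["left", "right"])
print(Q)                                                      # {('start', 'left'): 0.5}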
Greedy Decision-making
When in state s, choose action a with highest Q(s, a)
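As a sketch, with Q kept as a dict defaulting to 0 (the example values are made up):

def greedy(Q, s, actions):
    # pick the action with the highest Q(s, a); ties go to the first action
    return max(actions, key=lambda a: Q.get((s, a), 0))

Q = {("start", "left"): 0.5, ("start", "right"): 0.8}
print(greedy(Q, "start", ["left", "right"]))   # right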
Explore vs. exploit
Exploit: the AI uses the knowledge it already has, taking the path it knows leads to the reward.
Explore: the AI tries other actions, since there may be better ways to get to the reward.
epsilon
ɛ: how often we want the AI to move randomly
ɛ-greedy
Set ɛ equal to how often we want to move randomly.
With probability 1 − ɛ, choose the estimated best move.
With probability ɛ, choose a random move.
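A sketch of that rule, reusing the dict-based Q from the cards above (epsilon = 0.1 is just an example value):

import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)                     # explore: random move
    return max(actions, key=lambda a: Q.get((s, a), 0))   # exploit: best known move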
Code NIM
from nim import train, play

ai = train(10000)  # number of games the AI trains on (train(0) would skip training)
play(ai)