Q-Learning Network MLM Flashcards
Q-Learning
Q-Learning is a model-free reinforcement learning algorithm. The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
- Introduction
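Q-learning is an off-policy temporal-difference (TD) control algorithm in reinforcement learning. It can be viewed as a sample-based counterpart of value iteration applied to action values. It is used to learn the optimal policy for a Markov Decision Process (MDP) when the transition and reward models are not known. The learned policy is the one that maximizes the expected (discounted) cumulative reward over all successive steps.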
- Action-Value Function
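In Q-learning, the value of a state-action pair is represented by a Q-value. In the tabular setting these values are stored in a Q-table. The Q-value Q(s, a) is the expected return (cumulative discounted reward) from taking action a in state s and then following a given policy. Q-learning estimates the optimal action-value function Q*, which gives the return obtained by acting optimally after that first action.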
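For reference, the standard definitions can be written as follows. The notation is assumed here for illustration: a discount factor γ with 0 ≤ γ < 1, and a reward r_{t+1} received after the action taken at step t.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s,\ a_t = a \right]
\qquad
Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)
```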
- Q-table
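The Q-table has one row per state and one column per action. Each cell holds the current estimate Q(s, a), and the agent finds the best action in a state by picking the largest value in that state's row. The table is initialized arbitrarily (commonly to zeros). Its values are then updated iteratively from the rewards received for the actions taken and the estimated values of the states those actions lead to.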
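A minimal sketch of a Q-table follows. It assumes a small problem whose states and actions are numbered from 0. The helper names `make_q_table` and `greedy_action` are hypothetical and chosen only for illustration.

```python
def make_q_table(n_states: int, n_actions: int, init: float = 0.0) -> list[list[float]]:
    """Return an n_states x n_actions table with every Q(s, a) set to `init`."""
    # One independent list per state (row), one entry per action (column).
    return [[init] * n_actions for _ in range(n_states)]


def greedy_action(q_table: list[list[float]], state: int) -> int:
    """Return the index of the highest-valued action in `state`'s row (first one on ties)."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


# Usage: 3 states x 2 actions, then record that action 1 looks better in state 0.
q = make_q_table(3, 2)
q[0][1] = 0.5
print(greedy_action(q, 0))  # 1
```

Building each row with a list comprehension, rather than `[[init] * n_actions] * n_states`, keeps the rows independent. This means that updating one state's values does not silently change the others.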
- Learning Process
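During the learning process, the agent interacts with the environment. After each transition (s, a, r, s'), it updates the Q-value of the pair it just tried using a sample of the Bellman optimality equation. That equation states that the optimal Q-value of a state-action pair equals the expected immediate reward plus the discounted maximum Q-value over the actions available in the next state. Q-learning does not compute this expectation directly. Instead, it moves the current estimate a step of size α (the learning rate) toward the sampled target r + γ max_a' Q(s', a'): Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)].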
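The update rule can be sketched in Python as shown below. Several details are assumptions made for this example: the function name `q_update`, the default values of `alpha` and `gamma`, and the `done` flag, which prevents any future value from being bootstrapped from a terminal state.

```python
def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Target: immediate reward plus the discounted best estimate for the next state.
    # A terminal transition has no future return, so only the reward counts.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    td_error = target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Worked example with exactly representable numbers:
# target = 1 + 0.75 * 2 = 2.5, new Q = 0 + 0.5 * (2.5 - 0) = 1.25
q = [[0.0, 0.0], [2.0, 1.0]]
print(q_update(q, state=0, action=1, reward=1.0, next_state=1,
               done=False, alpha=0.5, gamma=0.75))  # 1.25
```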
- Exploitation vs Exploration
The agent needs to balance exploration (trying actions to learn their effects) and exploitation (choosing the action with the highest current Q-value). This balance is often managed with an ε-greedy strategy. With probability ε, the agent chooses an action uniformly at random. With probability 1 − ε, it chooses the action with the highest estimated Q-value. ε is often decayed over time, so that the agent explores heavily at first and exploits more as its estimates improve.
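A minimal ε-greedy selector is sketched below. The function name `epsilon_greedy` and its `rng` parameter are illustrative choices, not a standard API. Passing a seeded `random.Random` makes runs reproducible.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one state's Q-values using an epsilon-greedy rule."""
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random (may coincide with the greedy one).
        return rng.randrange(len(q_row))
    # Exploit: the action with the highest current estimate (first one on ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])


rng = random.Random(0)  # seeded so the run is reproducible
counts = [0, 0, 0]
for _ in range(10_000):
    counts[epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.1, rng=rng)] += 1
print(counts)  # action 1 should get roughly 0.9 + 0.1 / 3 of the picks
```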
- Convergence
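Under certain conditions, tabular Q-learning is guaranteed to converge with probability 1 to the optimal action values Q*, and acting greedily with respect to Q* then gives an optimal policy. The conditions include a finite number of states and actions, bounded rewards, every state-action pair being visited infinitely often, and a learning rate that decays appropriately: it must not shrink so fast that learning stalls, but must shrink fast enough for the noise in the sampled targets to average out.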
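The learning-rate requirement is usually stated as the following pair of conditions on the step sizes α_t(s, a) used for each state-action pair, where t counts the updates to that pair. This is the standard stochastic-approximation form, and the notation here is assumed. For example, α_t = 1/(t + 1) satisfies both conditions: the harmonic series diverges, while the sum of its squares converges.

```latex
\sum_{t=0}^{\infty} \alpha_t(s, a) = \infty,
\qquad
\sum_{t=0}^{\infty} \alpha_t(s, a)^2 < \infty
```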
- Advantages
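Q-learning is a model-free approach: it learns optimal actions purely from interaction with the environment, without a model of the environment's transition probabilities or reward function. Each update uses sampled transitions and rewards, and the learning rate averages over those samples, so the algorithm handles stochastic transitions and rewards without modification. Q-learning is also off-policy: it learns the values of the greedy policy while following a different, exploratory behavior policy such as ε-greedy.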
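The sketch below trains a Q-table on a small hypothetical corridor environment to illustrate the model-free, off-policy points. The corridor has states 0 to 4 and two actions: 0 (left) and 1 (right). Reaching state 4 gives a reward of 1 and ends the episode. The environment, its `step` function, and the hyperparameters are all invented for illustration. The agent only calls `step` and never reads its transition rules.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # hypothetical corridor: states 0..4, goal at 4


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: epsilon-greedy, with ties broken at random.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy target: greedy value of the next state (0 if terminal).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = train()
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal state
```

Ties are broken at random here because the table starts at all zeros. Always taking the first maximal action would make the early behavior depend on how the actions happen to be ordered. Random tie-breaking is one common choice, not the only one.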
- Applications
Q-Learning has been used successfully in various domains including robotics, scheduling, gaming, and resource management.