RL_Interview_Qs Flashcards

1
Q

(2.01) What is Reinforcement Learning?

2
Q

(2.02) Can you explain the key components of a Reinforcement Learning problem: agent, environment, state, action, and reward?

3
Q

(2.03) What is the difference between supervised learning, unsupervised learning, and reinforcement learning?

4
Q

(2.04) What are the two main types of reinforcement learning algorithms, model-based and model-free, and how do they differ?

5
Q

(2.05) What is the Markov Decision Process (MDP) and how is it related to Reinforcement Learning?
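
A quick reference for this card: the standard MDP tuple as defined in most textbooks (notation is the conventional one, not from this deck):

```latex
% An MDP is the 5-tuple
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)
% states, actions, transition kernel P(s' \mid s, a),
% reward function R(s, a), and discount factor \gamma \in [0, 1).
% Markov property: the next state depends only on the current state
% and action:
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots) = P(s_{t+1} \mid s_t, a_t)
```

RL algorithms are typically analyzed as methods for finding an optimal policy in an (often unknown) MDP.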

6
Q

(2.06) Can you explain the concepts of exploration and exploitation in the context of RL?

7
Q

(2.07) What is the difference between policy-based and value-based reinforcement learning methods?

8
Q

(2.08) What is Q-Learning? How does it work?
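
A minimal tabular Q-learning sketch to anchor the answer, assuming a Gym-style `env` with discrete states and actions; all names and hyperparameters here are illustrative:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)
            # Off-policy update: bootstrap from the greedy next action.
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```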

9
Q

(2.09) Can you describe the concept of discount factor (gamma) in RL and its purpose?
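
For reference, the discounted return that gamma controls (standard notation):

```latex
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad \gamma \in [0, 1)
% \gamma near 0: myopic agent; \gamma near 1: far-sighted agent.
% \gamma < 1 also keeps the infinite-horizon sum finite for bounded rewards.
```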

10
Q

(2.10) What is the Bellman Equation? How is it used in reinforcement learning?
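
The two standard forms worth having at hand for this card (conventional notation):

```latex
% Bellman expectation equation for a fixed policy \pi:
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]
% Bellman optimality equation:
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```

Dynamic programming, TD learning, and Q-learning can all be read as ways of turning these fixed-point equations into iterative updates.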

11
Q

(2.11) What is the purpose of an epsilon-greedy strategy in RL?
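
A minimal sketch, assuming Q-values stored in an array (function name is mine):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best action."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action
```

In practice epsilon is often annealed from near 1.0 toward a small floor, so exploration dominates early training and exploitation dominates later.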

12
Q

(2.12) Can you briefly explain the Temporal Difference (TD) learning method?
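
The TD(0) value update, for reference (standard notation; the bracketed term is the TD error):

```latex
V(s_t) \leftarrow V(s_t) + \alpha
\underbrace{\left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]}_{\delta_t}
```

Unlike Monte Carlo methods, TD updates after every step by bootstrapping from the current estimate of V(s_{t+1}) instead of waiting for the full return.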

13
Q

(2.13) What are the main challenges of reinforcement learning?

14
Q

(2.14) What are some popular applications of reinforcement learning in real-world scenarios?

15
Q

(2.15) What is the role of the reward function in RL? Can you give an example?

16
Q

(2.16) What is the Monte Carlo method in RL and when is it used?

17
Q

(2.17) Can you explain the concept of state-value function (V) and action-value function (Q)?
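
The standard definitions, worth memorizing alongside this card:

```latex
V^{\pi}(s)    = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s \right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s,\, a_t = a \right]
% They are linked by averaging Q over the policy's action distribution:
V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)
```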

18
Q

(2.18) What is SARSA? How does it differ from Q-Learning?
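
A side-by-side sketch of the two update rules, assuming a tabular Q array (function names are illustrative):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the behavior policy actually took.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action, whatever is taken next.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

The single changed term in the target is the whole on-policy/off-policy distinction between the two algorithms.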

19
Q

(2.19) Can you provide an example of a continuous action space in RL?

20
Q

(2.20) What is deep reinforcement learning? How does it combine deep learning and reinforcement learning?

21
Q

(3.01) What are some common techniques to address the exploration-exploitation dilemma in RL?

22
Q

(3.02) How does the concept of “credit assignment” apply to RL?

23
Q

(3.03) What are the key differences between on-policy and off-policy learning in RL?

24
Q

(3.04) Can you explain the concept of the “curse of dimensionality” in the context of RL and how it affects learning?

25
Q

(3.05) What is the difference between asynchronous and synchronous RL algorithms?

26
Q

(3.06) Can you explain the idea of bootstrapping in Temporal Difference learning?

27
Q

(3.07) What is the role of eligibility traces in reinforcement learning and how do they help with learning?

28
Q

(3.08) Can you describe the concept of function approximation in RL and why it’s important?

29
Q

(3.09) What is the REINFORCE algorithm and how does it relate to policy gradient methods?
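
The core of the answer is the policy gradient estimator (standard form; b is an optional baseline):

```latex
\nabla_{\theta} J(\theta) =
\mathbb{E}_{\pi_{\theta}}\!\left[
  \sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)
  \left( G_t - b(s_t) \right)
\right]
```

REINFORCE estimates this expectation from complete sampled episodes; subtracting a baseline (often a learned value function) leaves the gradient unbiased while reducing its variance.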

30
Q

(3.10) How does experience replay work in Deep Q-Networks (DQN) and why is it important?
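
A minimal replay buffer sketch (class and method names are mine, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, which stabilizes DQN's gradient updates
        # and lets each transition be reused many times.
        return random.sample(self.buffer, batch_size)
```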

31
Q

(3.11) What is the role of target networks in DQN and how do they help stabilize learning?
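
A sketch of the two common target-update schemes, assuming PyTorch-style modules (`online` and `target` are hypothetical names):

```python
def hard_update(target, online):
    # DQN-style: copy the online weights every N steps, freezing the
    # bootstrap target in between so the regression target stops moving.
    target.load_state_dict(online.state_dict())

def soft_update(target, online, tau=0.005):
    # Polyak averaging, as used in DDPG/TD3/SAC: the target network
    # trails the online network slowly and smoothly.
    for tp, op in zip(target.parameters(), online.parameters()):
        tp.data.copy_(tau * op.data + (1.0 - tau) * tp.data)
```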

32
Q

(3.12) What is the difference between value iteration and policy iteration in dynamic programming applied to RL?

33
Q

(3.13) Can you provide an example of a partially observable Markov decision process (POMDP) and how it differs from an MDP?

34
Q

(3.14) What is the Proximal Policy Optimization (PPO) algorithm and why is it considered an improvement over traditional policy gradient methods?

35
Q

(3.15) What are some common techniques to handle continuous action spaces in RL?

36
Q

(3.16) Can you explain the concept of intrinsic motivation in RL?

37
Q

(3.17) What is inverse reinforcement learning? How does it differ from regular reinforcement learning?

38
Q

(3.18) Can you describe the idea of hierarchical reinforcement learning and its potential benefits?

39
Q

(3.19) How can transfer learning be applied to reinforcement learning, and what are its potential benefits?

40
Q

(3.20) What are the key differences between multi-agent reinforcement learning (MARL) and single-agent reinforcement learning?

41
Q

(4.01) How do actor-critic methods combine the benefits of policy-based and value-based RL methods?

42
Q

(4.02) What are the main challenges in scaling up reinforcement learning to real-world applications?

43
Q

(4.03) How does Trust Region Policy Optimization (TRPO) improve the stability of policy updates in policy gradient methods?

44
Q

(4.04) Can you explain the concept of Generalized Advantage Estimation (GAE) and its role in policy gradient methods?
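
The GAE estimator, for reference (standard notation from Schulman et al.):

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
\qquad
\hat{A}_t^{\mathrm{GAE}(\gamma, \lambda)} =
\sum_{l=0}^{\infty} (\gamma \lambda)^{l}\, \delta_{t+l}
```

Lambda = 0 recovers the low-variance, higher-bias one-step TD advantage; lambda = 1 recovers the high-variance, low-bias Monte Carlo advantage, so lambda acts as a bias-variance dial.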

45
Q

(4.05) What are some advanced exploration strategies used in RL, beyond epsilon-greedy or softmax action selection?

46
Q

(4.06) How do model-based RL algorithms utilize an internal model of the environment to improve learning?

47
Q

(4.07) What is the role of meta-learning in RL and how can it improve learning efficiency?

48
Q

(4.08) Can you explain the concept of multi-task RL and its potential advantages?

49
Q

(4.09) What are the key principles of curriculum learning in RL and how can they be used to improve agent training?

50
Q

(4.10) How can Bayesian methods be incorporated into RL to improve exploration and learning?

51
Q

(4.11) What is the role of information theory in RL, specifically in terms of exploration and representation learning?

52
Q

(4.12) Can you describe the idea of distributional RL and its potential benefits over traditional RL methods?

53
Q

(4.13) What is Soft Actor-Critic (SAC) and how does it differ from other actor-critic algorithms?

54
Q

(4.14) How do imitation learning and apprenticeship learning relate to RL?

55
Q

(4.15) Can you explain the concept of off-policy correction in multi-agent RL?

56
Q

(4.16) What are some popular benchmarks and environments used for evaluating and comparing reinforcement learning algorithms?

57
Q

(4.17) How can the success of RL algorithms be measured beyond simple cumulative rewards?

58
Q

(4.18) What are the key challenges in sample efficiency for RL algorithms and how can they be addressed?

59
Q

(4.19) How can RL be applied to partially observable environments where the agent has incomplete information about the state?

60
Q

(4.20) What are some ethical considerations and potential risks in deploying RL in real-world applications?

61
Q

(5.01) Can you discuss the trade-offs between on-policy and off-policy algorithms in terms of sample efficiency and stability?

62
Q

(5.02) How do importance sampling techniques help bridge the gap between off-policy and on-policy learning in RL?

63
Q

(5.03) What is the role of entropy regularization in policy optimization and how does it affect exploration?
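
The maximum-entropy objective behind this card (the form used by SAC; alpha is the temperature):

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[
  \sum_{t} \gamma^{t} \left( r_t + \alpha\,
  \mathcal{H}\!\left( \pi(\cdot \mid s_t) \right) \right)
\right],
\qquad
\mathcal{H}(\pi(\cdot \mid s)) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s)
```

The entropy bonus penalizes premature collapse onto a single action, keeping exploration alive and smoothing the optimization landscape.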

64
Q

(5.04) How do online planning algorithms, like Monte Carlo Tree Search (MCTS), relate to and differ from RL methods?

65
Q

(5.05) Can you discuss the challenges and strategies in designing reward functions for real-world RL applications?

66
Q

(5.06) What are some techniques to address function approximation errors in value-based RL methods and how do they affect convergence?

67
Q

(5.07) How can unsupervised and self-supervised learning methods be used to improve representation learning in RL?

68
Q

(5.08) Can you explain the concept of goal-conditioned RL and its potential advantages in sparse reward settings?

69
Q

(5.09) What are some practical challenges in training deep reinforcement learning models and how can they be addressed?

70
Q

(5.10) How can safety constraints be incorporated into RL algorithms to ensure safe exploration and deployment?

71
Q

(5.11) Can you discuss the role of counterfactual reasoning in MARL and its potential benefits?

72
Q

(5.12) How can RL be combined with symbolic reasoning to improve problem-solving capabilities?

73
Q

(5.13) What are some key challenges in extending RL to deal with non-stationary environments?

74
Q

(5.14) How do algorithms like Twin Delayed Deep Deterministic Policy Gradients (TD3) address the overestimation bias of DDPG?
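
A sketch of TD3's target computation, assuming PyTorch and hypothetical `actor_target`/`q1_target`/`q2_target` networks with actions in [-1, 1]:

```python
import torch

def td3_target(r, s_next, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        a_next = actor_target(s_next)
        # Target policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip,
                                                             noise_clip)
        a_next = (a_next + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: the min over twin critics counteracts the
        # overestimation bias a single DDPG critic accumulates.
        q_next = torch.min(q1_target(s_next, a_next),
                           q2_target(s_next, a_next))
        return r + gamma * (1.0 - done) * q_next
```

TD3 also delays actor updates relative to critic updates, which is the third of its three fixes over DDPG.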

75
Q

(5.15) What are some techniques for addressing the limitations of model-based RL, such as inaccurate environment models?

76
Q

(5.16) How do continual learning methods help RL agents adapt to changing environments or tasks over time?

77
Q

(5.17) Can you discuss some approaches to handling partial observability and information asymmetry in MARL?

78
Q

(5.18) What is the role of adversarial training in RL, specifically in terms of robustness and exploration?

79
Q

(5.19) How do algorithms like Hindsight Experience Replay (HER) help address the challenges of sparse rewards in RL?
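
A sketch of HER's simplest ("final") relabeling strategy; the transition layout and `compute_reward` are illustrative assumptions:

```python
def relabel_with_hindsight(episode, compute_reward):
    """episode: list of (state, action, achieved_goal, desired_goal)."""
    final_goal = episode[-1][2]  # whatever the agent actually achieved
    relabeled = []
    for state, action, achieved, _ in episode:
        # Pretend the achieved outcome was the goal all along, so even a
        # "failed" episode yields transitions with informative reward.
        r = compute_reward(achieved, final_goal)
        relabeled.append((state, action, final_goal, r))
    return relabeled
```

Both the original and relabeled transitions go into the replay buffer, densifying the reward signal without changing the environment.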

80
Q

(5.20) Can you discuss some recent advances in the intersection of NLP and RL?

81
Q

(6.01) Can you discuss the connections between RL and optimal control theory and how they have influenced each other?