
Reinforcement Learning Cheat Sheet (Exam Killer Version)
1. Core Idea (Write This in Any Answer Intro)
Reinforcement Learning is a learning paradigm where an agent interacts with an environment and learns to take actions that maximize cumulative reward over time.
Keywords to include:
Trial and error
Reward signal
Sequential decision making
2. RL Framework (Must Draw in Exam)
Agent → Action → Environment → Reward → New State
Write:
Agent (decision maker)
Environment (external system)
State (current situation)
Action (choice)
Reward (feedback)
👉 Example (very important for marks):
Game playing / robot navigation
3. Markov Decision Process (MDP)
Definition:
An MDP is a mathematical model of sequential decision making under uncertainty, and it is the standard way to formalize RL problems.
Tuple:
(S, A, P, R, γ)
S → States
A → Actions
P → Transition probability, P(s'|s,a)
R → Reward function
γ → Discount factor
👉 Key concept:
Markov Property → The future depends only on the present state (and the action taken), not on the history before it
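Written as a formula, the Markov property could look like the block below. The notation (S_t for the state at step t, A_t for the action) is an assumption, since the cheat sheet does not fix one.

```latex
P(S_{t+1} = s' \mid S_t = s, A_t = a, S_{t-1}, A_{t-1}, \dots, S_0, A_0)
  = P(S_{t+1} = s' \mid S_t = s, A_t = a)
```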
4. Return & Discount Factor
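Return → total discounted reward collected from the current step onward
γ ∈ [0, 1] → weight given to future rewards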
High γ → future matters
Low γ → immediate reward matters
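A minimal sketch of the return, assuming the usual definition G = r_0 + γ·r_1 + γ²·r_2 + ... over a finite episode. The function name and the reward list are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th reward is weighted by gamma**k."""
    g = 0.0
    # Work backwards so each step is just: reward + gamma * (return of the rest).
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]  # hypothetical episode
high = discounted_return(rewards, 0.9)  # future rewards still count
low = discounted_return(rewards, 0.0)   # only the immediate reward counts
print(high, low)
```

With γ = 0 only the first reward survives, which is the "immediate reward matters" case above.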
5. Value Functions (Very Important)
State Value: V(s) → how good a state is
Action Value: Q(s,a) → how good an action is in a given state
👉 Always mention:
“Expected cumulative reward”
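As formulas, "expected cumulative reward" is usually written as below. The superscript π (the policy being followed) and the symbol G_t for the discounted return are standard notation assumed here, not taken from the cheat sheet.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]
```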
6. Bellman Equation (CORE CONCEPT)
👉 Key idea:
Breaks problem into smaller subproblems
Recursive nature
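The recursion is easiest to see in the equation itself. The block below is the standard textbook form, written with the MDP symbols from section 3. Writing the reward as R(s, a, s') is an assumption, since some courses use R(s, a) or R(s) instead.

```latex
% Bellman expectation equation (value of following policy \pi)
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation (value of acting optimally)
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```

The value of a state is defined through the values of its successor states, which is the "smaller subproblems" idea.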
7. Policy
Policy = strategy of agent
Deterministic → one fixed action per state
Stochastic → action sampled from a probability distribution over actions
👉 Write:
π(a|s)
8. Q-Learning (Most Important Algorithm)
Off-policy
Uses the max Q-value of the next state in its update
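A minimal sketch of one tabular Q-Learning update, assuming the standard rule Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)]. The function name, the dictionary-based Q-table, the state and action labels, and the default α and γ are all illustrative choices. Terminal-state handling is left out.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update to the table Q in place."""
    # Off-policy: the target uses the best next action,
    # regardless of which action the agent will actually take.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0

new_value = q_learning_update(
    Q, "s0", "right", r=1.0, s_next="s1", actions=["left", "right"]
)
print(new_value)
```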
9. SARSA
On-policy
Uses the Q-value of the action actually taken next
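A minimal sketch of one tabular SARSA update, assuming the standard rule Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)], where a' is the action the agent really takes in s'. The function name, the Q-table layout, the labels, and the default α and γ are illustrative choices.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA update to the table Q in place."""
    # On-policy: the target uses the next action chosen by the
    # current behaviour policy, even if it is not the best one.
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 2.0

# Suppose the agent explored and picked "left" in s1.
new_value = sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="left")
print(new_value)
```

The name SARSA comes from the tuple the update needs: (s, a, r, s', a').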
10. Q-Learning vs SARSA (Exam Favorite)
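One compact way to show the difference is to put the two standard update rules next to each other. The symbols α (learning rate), s' (next state), and a' (next action) are assumed notation, not taken from the cheat sheet.

```latex
% Q-Learning (off-policy): bootstraps from the best next action
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% SARSA (on-policy): bootstraps from the next action actually taken
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]
```

The only difference is the term after γ: a max over actions for Q-Learning, the chosen action's value for SARSA.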
11. Exploration vs Exploitation
Exploration → try new actions
Exploitation → use best known
👉 Method:
Epsilon-greedy
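A minimal sketch of epsilon-greedy action selection, assuming the Q-values for the current state are given as a plain list indexed by action. The function name and the `rng` parameter are illustrative choices.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


q_values = [0.1, 0.5, 0.2]  # hypothetical estimates for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # never explores
explore_action = epsilon_greedy(q_values, epsilon=1.0, rng=random.Random(0))
print(greedy_action, explore_action)
```

A common design choice is to start with a large epsilon and reduce it over time, so the agent explores early and exploits later.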
12. Monte Carlo vs TD Learning
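The contrast can be sketched with the two standard value-update rules. These are textbook forms, not taken from the cheat sheet; α (learning rate) and G_t (the full return from step t) are assumed notation.

```latex
% Monte Carlo: waits until the episode ends, then uses the full return G_t
V(S_t) \leftarrow V(S_t) + \alpha \left[ G_t - V(S_t) \right]

% TD(0): updates after every step, bootstrapping from the next state's estimate
V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]
```

Monte Carlo needs complete episodes, while TD learning can update online from a single transition.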
13. Policy Iteration vs Value Iteration
Policy Iteration:
Evaluate → Improve
Value Iteration:
Directly update values
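A minimal sketch of value iteration on a tiny made-up MDP, assuming the transition model is fully known. The data layout (`P[s][a]` as a list of `(probability, next_state, reward)` triples), the state and action names, and the stopping tolerance are all illustrative choices.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality update until the values stop changing."""

    def action_value(V, outcomes):
        # Expected reward plus discounted value of the next state.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: no actions, value stays 0
                continue
            best = max(action_value(V, outcomes) for outcomes in P[s].values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off a greedy policy from the converged values.
    policy = {
        s: max(P[s], key=lambda a: action_value(V, P[s][a]))
        for s in P
        if P[s]
    }
    return V, policy


# Hypothetical chain: A -> B -> goal, with reward 1 for reaching the goal.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "goal", 1.0)]},
    "goal": {},
}
V, policy = value_iteration(P)
print(V, policy)
```

Policy iteration would instead alternate two separate phases: fully evaluate the current policy, then improve it greedily.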
14. Common Exam Mistakes (Avoid These)
Writing definitions without examples
Skipping diagrams
Not explaining formulas
No comparison tables
15. 1-Minute Revision Strategy
Before the exam, revise:
Bellman Equation
Q-Learning & SARSA
MDP
👉 These alone can cover most of the paper.
This is Part 1 of the cheat sheet. If you want Part 2, comment below.