Bellman equation (part2)

October 17, 2023

Definitions and Equations #

https://chat.openai.com/share/e71eecee-4263-4dc6-842e-a0dee3af28b0

This is part 2, following Bellman equation (part 1).

Definition of \( G_t \) #

\( G_t \) is the random variable representing the cumulative discounted reward (the return) from time \( t \). It can be expressed as:

\[\begin{aligned} G_t &= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \\ G_t &= R_{t+1} + \gamma G_{t+1} \end{aligned} \]
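The two forms above agree: the direct discounted sum and the recursion compute the same value. A minimal sketch with a made-up reward sequence and discount factor:

```python
# Compute the return G_t two ways and check they agree.
# The reward sequence and gamma are made-up values for illustration.
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 3.0]  # R_{t+1}, R_{t+2}, R_{t+3}, R_{t+4}

# Direct sum: G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
g_direct = sum(gamma**k * r for k, r in enumerate(rewards))

# Recursion: G_t = R_{t+1} + gamma * G_{t+1}, unrolled backwards
g = 0.0
for r in reversed(rewards):
    g = r + gamma * g

assert abs(g - g_direct) < 1e-12
```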

Definition of \( V(s) \) #

It is the expected return given state \( s \). \[ V(s) = \mathbb{E}[G_t | S_t = s] \] Combining the above definitions, we get: \[ V(s) = \mathbb{E}[R_{t+1} + \gamma G_{t+1} | S_t = s] \]

The \( Q \) Function #

If you further condition on the action \( A_t \), you get the \( Q \) function: \[ Q(s, a) = \mathbb{E}[R_{t+1} + \gamma G_{t+1} | S_t = s, A_t = a] \]

Law of Total Probability #

Using the law of total probability: \[ V(s) = \sum_{a} \pi(a|s) Q(s, a) \] \[ V(s) = \sum_{a} \pi(a|s) \mathbb{E}[R_{t+1} + \gamma G_{t+1} | S_t = s, A_t = a] \]
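Numerically, this is just a weighted average of \( Q \)-values under the policy. A tiny sketch for a single state, with made-up policy probabilities and \( Q \)-values:

```python
# V(s) = sum_a pi(a|s) * Q(s, a) for one state with two actions.
# All probabilities and Q-values here are invented for illustration.
pi = {"left": 0.3, "right": 0.7}   # pi(a|s)
Q = {"left": 1.0, "right": 2.0}    # Q(s, a)

# Law of total probability: average Q over the policy's action distribution
V = sum(pi[a] * Q[a] for a in pi)
# V = 0.3*1.0 + 0.7*2.0 = 1.7
```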

G’s property #

By stationarity of the infinite-horizon process, the distribution of the return does not depend on \( t \): conditioning \( G_{t+1} \) on \( S_{t+1} = s' \) gives the same expectation as conditioning \( G_t \) on \( S_t = s' \).

\[ \mathbb{E}[G_{t+1} \mid S_{t+1} = s'] = \mathbb{E}[G_t \mid S_t = s'] = V(s') \]

Therefore, the relation can be expressed as:

\[ \begin{aligned} Q(s, a) &= \mathbb{E}[R_{t+1} + \gamma G_{t+1} | S_t = s, A_t = a] \\ Q(s, a) &= \mathbb{E}[R_{t+1} + \gamma V(S_{t+1}) | S_t = s, A_t = a] \\ Q(s, a) &= \sum_{s', r} p(s', r | s, a) [r + \gamma V(s')] \end{aligned} \]

Resulting in: \[ V(s) = \sum_a \pi(a|s) \sum_{s', r} p(s', r | s, a) [r + \gamma V(s')] \]
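This final equation can be used directly for policy evaluation: treat it as a fixed-point update and iterate until \( V \) stops changing. A minimal sketch on a made-up 2-state MDP; the states, actions, transition probabilities, and rewards are all invented for illustration:

```python
# Iterative policy evaluation via the Bellman equation:
# V(s) = sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma V(s')]
gamma = 0.9
states = ["A", "B"]
actions = ["stay", "go"]

# pi[s][a] = pi(a|s): a uniform random policy
pi = {s: {a: 0.5 for a in actions} for s in states}

# p[(s, a)] = list of (probability, next_state, reward) triples
p = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}

V = {s: 0.0 for s in states}
for _ in range(1000):  # fixed-point iteration on the Bellman equation
    V = {
        s: sum(
            pi[s][a]
            * sum(prob * (r + gamma * V[s2]) for prob, s2, r in p[(s, a)])
            for a in actions
        )
        for s in states
    }
```

After convergence, plugging \( V \) back into the right-hand side reproduces \( V \) itself, which is exactly what the Bellman equation asserts.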