Reinforcement learning

Bellman equation

It seems to be one of the backbone ideas of reinforcement learning: it gives us a formula for updating the decision process.

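Specifically, the Bellman equation gives the agent an objective to optimize, called the value function. It expresses the value of a state as the immediate reward plus the discounted value of the state that follows.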
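For reference, one common way to write this down is sketched here. The symbols are assumptions that these notes do not define: π is the policy, p(s' | s, a) is the environment's transition probability, r(s, a, s') is the reward, and γ in [0, 1) is the discount factor. The first line is the value function for a fixed policy; the second is the optimality form for action values, which is the one Q-learning targets.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma V^{\pi}(s') \right]

Q^{*}(s, a) = \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]
```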

It was used in the Q-learning algorithm for the FrozenLake example. It is also used in Monte Carlo methods to update the estimates.
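As a rough illustration of how the Bellman idea shows up in Q-learning, here is a minimal sketch of a single tabular update. The function name `q_learning_update`, the learning rate `alpha`, and the discount `gamma` are assumptions made for this example, and the table is a plain list of lists sized for FrozenLake's 4x4 map (16 states, 4 actions) rather than anything taken from a specific library.

```python
def q_learning_update(Q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a list of rows, one row of action values per state (hypothetical layout).
    """
    # Bellman target: immediate reward plus the discounted best value
    # available from the next state (zero if the episode has ended).
    best_next = 0.0 if done else max(Q[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    td_error = target - Q[state][action]
    Q[state][action] += alpha * td_error
    return td_error


# 16 states x 4 actions, all estimates start at zero.
Q = [[0.0] * 4 for _ in range(16)]

# Stepping from state 14 into the goal state 15 with reward 1 ends the episode.
q_learning_update(Q, state=14, action=2, reward=1.0, next_state=15, done=True)
```

Only the entry for the state and action that were actually taken changes; repeated episodes let the reward information propagate backwards through the table one step at a time.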

Initialize the estimator (either Q or V) and the policy (P).

For each episode (one complete run of a single test):

Generate the episode using the policy:
- the agent in state S takes action a and gets reward r, producing (S: state, a: action, r: reward)
- update what you can expect from taking that action in that state: update Q(S, a) or V(S)
- update the policy (in state S, which action you will prefer, probabilistically)
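The loop above can be sketched as every-visit Monte Carlo control. This is only one possible reading of the steps, under several assumptions that are not in the notes: the environment is a hypothetical five-state corridor (`step`), the policy is epsilon-greedy with respect to Q, so it is never stored separately, and the names `mc_control`, `generate_episode`, `choose_action`, `gamma`, and `epsilon` are made up for the example.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Hypothetical toy environment: reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy policy: usually the best-known action, sometimes random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def generate_episode(Q, epsilon, rng, max_steps=100):
    """Run the policy from state 0 and record (state, action, reward) triples."""
    episode, state = [], 0
    for _ in range(max_steps):
        action = choose_action(Q, state, epsilon, rng)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        state = next_state
        if done:
            break
    return episode


def mc_control(n_episodes=2000, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]    # the estimator
    counts = [[0, 0] for _ in range(N_STATES)]   # visits, for running averages
    for _ in range(n_episodes):
        episode = generate_episode(Q, epsilon, rng)
        G = 0.0
        # Walk backwards so G is the discounted return from each step onward.
        for state, action, reward in reversed(episode):
            G = reward + gamma * G
            counts[state][action] += 1
            # Every-visit update: running average of the observed returns.
            Q[state][action] += (G - Q[state][action]) / counts[state][action]
        # The policy is epsilon-greedy in Q, so changing Q also changes the policy.
    return Q


Q = mc_control()
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

Here `greedy` holds the preferred action for each non-terminal state. On a corridor like this one it would be expected to favor moving right, since that is the shorter path to the reward, although the exact Q values depend on the seed and the number of episodes.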