Reinforcement learning
October 16, 2023
Bellman equation #
https://chat.openai.com/share/4def344b-c94f-4194-a428-adbeb4d12175
It seems to be one of the backbone ideas of reinforcement learning, giving us a formula for updating the decision process.
The Bellman equation specifically provides the goal function that the agent tries to optimize (it is called the value function).
It was used in the Q-learning algorithm for the FrozenLake example, and it is used in the Monte Carlo method for updating the estimates.
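The Bellman-style update as used in tabular Q-learning can be sketched as follows. The grid size and the `alpha`/`gamma` values are illustrative assumptions, not taken from the notes above:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Bellman backup for tabular Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])  # best value reachable from s'
    Q[s, a] += alpha * (td_target - Q[s, a])   # move estimate toward target
    return Q

# Tiny illustration: 4 states, 2 actions, all estimates start at zero.
Q = np.zeros((4, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

After one update, `Q[0, 1]` moves a step of size `alpha` toward the observed reward, which is how the goal function gradually propagates value backwards through the state space.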
Monte Carlo method #
https://chat.openai.com/share/2b0b9c46-4744-4e46-93cb-aca8e02716e0
algorithm #
initialize the estimator (either Q or V) and the policy (P)
For each episode (a complete run of a single test):
- generate an episode using the policy
- the agent in state S takes action a and gets a reward r, generating (S: state, a: action, r: reward) triples
- update what you can expect when taking action a in state S: update Q(S, a) or V(S)
- update the policy (in state S, which action you will probabilistically prefer)
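The loop above might look like this in code. This is an every-visit Monte Carlo control sketch with an epsilon-greedy policy; the two-action toy environment and all hyperparameters are assumptions for illustration:

```python
import random
from collections import defaultdict

def generate_episode(policy, env_step, start_state, max_steps=50):
    """Roll out one episode: a list of (state, action, reward) triples."""
    episode, s = [], start_state
    for _ in range(max_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)
        episode.append((s, a, r))
        s = s_next
        if done:
            break
    return episode

def mc_control(env_step, start_state, n_actions, episodes=500,
               gamma=0.9, epsilon=0.1):
    """Every-visit Monte Carlo control with an epsilon-greedy policy."""
    Q = defaultdict(lambda: [0.0] * n_actions)     # action-value estimates
    counts = defaultdict(lambda: [0] * n_actions)  # visit counters

    def policy(s):
        # epsilon-greedy: mostly exploit the current Q, sometimes explore
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[s][a])

    for _ in range(episodes):
        episode = generate_episode(policy, env_step, start_state)
        G = 0.0
        # walk the episode backwards, accumulating the discounted return
        for s, a, r in reversed(episode):
            G = r + gamma * G
            counts[s][a] += 1
            # incremental running mean of the observed returns
            Q[s][a] += (G - Q[s][a]) / counts[s][a]
    return Q

# Hypothetical 2-action toy chain: action 1 ends the episode with reward 1.
def toy_step(s, a):
    if a == 1:
        return 1, 1.0, True
    return 0, 0.0, False

random.seed(0)
Q = mc_control(toy_step, start_state=0, n_actions=2)
```

Note that the estimates are updated from complete episodes (the return G is only known once the episode has finished), which is the defining trait of Monte Carlo methods versus step-by-step updates like Q-learning.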
On top of the goal function (such as the Bellman equation), the Monte Carlo method adds another layer of random variables. #
- the state, action -> next state mapping is not deterministic; it is probabilistic
- the reward for taking an action can also be probabilistic
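FrozenLake's "slippery" dynamics are a concrete case of this: the same (state, action) pair can lead to different next states. A minimal sketch of sampling from such a stochastic transition model; the transition table `P` here is made up for illustration:

```python
import random

# Hypothetical transition model: P[(state, action)] is a list of
# (probability, next_state, reward) outcomes, as in slippery FrozenLake.
P = {
    (0, "right"): [(0.8, 1, 0.0), (0.2, 0, 0.0)],  # usually moves, sometimes slips
    (1, "right"): [(0.8, 2, 1.0), (0.2, 1, 0.0)],  # goal reached with reward 1
}

def sample_transition(s, a):
    """Draw (next_state, reward) from the outcome distribution for (s, a)."""
    outcomes = P[(s, a)]
    u, acc = random.random(), 0.0
    for p, s_next, r in outcomes:
        acc += p
        if u < acc:
            return s_next, r
    return outcomes[-1][1], outcomes[-1][2]  # numerical safety fallback

random.seed(0)
s_next, r = sample_transition(0, "right")
```

Because both the next state and the reward are drawn from distributions, a single observed transition is only a sample; this is why Monte Carlo methods average returns over many episodes instead of trusting any one of them.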