reinforcement learning
October 12, 2023
ChatGPT conversation #
Frozen Lake (Q-learning) #
Frozen Lake (Q-learning) implementation
Frozen Lake (policy gradient) #
heuristic aspect of RL #
definition #
There is a goal, and progress toward it can be measured numerically in intermediate states (the reward).
The task is to find a way to reach the goal using the reward as a hint. The agent often has to react to intermediate states without perfect information, so its behavior is usually modeled as a decision rule (the policy) acting on the perceived state.
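To make the terms concrete, here is a minimal sketch of the loop in which a policy acts on the perceived state and receives a reward. The `Corridor` environment, `run_episode`, and both policies are hypothetical names invented for this illustration; they are not taken from any library.

```python
import random


class Corridor:
    """Hypothetical 1-D world with cells 0..length-1; the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position.
        self.pos = max(0, min(self.length - 1, self.pos + action))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # numeric measure of reaching the goal
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy react to each perceived state and sum up the reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)  # the decision depends only on the perceived state
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([-1, 1])
```

Calling `run_episode(Corridor(), always_right)` walks straight to the goal, while `random_policy` may or may not get there within the step limit. Learning amounts to moving from the second kind of policy toward the first using only the reward.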
property #
The environment is not perfectly modeled (understood). Still, we try to come up with a model that reacts to the environment so that the goal can be reached.
The guiding data differs from that of supervised learning:
- there is no annotated data
- creating the guiding data is part of the learning process
- the data creation can be random
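As a rough illustration of these points, the sketch below runs tabular Q-learning with epsilon-greedy exploration on a small deterministic grid in the spirit of Frozen Lake. The map, the helper names (`step`, `train`, `greedy_rollout`), and all hyperparameters are assumptions chosen for the example, and the grid is hand-written rather than taken from a library environment. No labels are supplied: every training sample is produced by the agent's own, partly random, actions.

```python
import random

# Hypothetical 4x4 map: S start, F frozen, H hole, G goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)


def step(state, action):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = max(0, min(N - 1, row + d_row))
    col = max(0, min(N - 1, col + d_col))
    cell = GRID[row][col]
    return row * N + col, (1.0 if cell == "G" else 0.0), cell in "HG"


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):
            if rng.random() < epsilon:
                # Random exploration: this is where the "guiding data" comes from.
                action = rng.randrange(len(ACTIONS))
            else:
                # Greedy choice with random tie-breaking.
                best = max(q[state])
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action)
            # Q-learning update toward reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned table without exploration; return the total reward."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total
```

Calling `greedy_rollout(train())` checks whether the learned table leads from the start to the goal. The transitions here are deterministic to keep the sketch short; a slippery variant would need more episodes and a smaller learning rate.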