MDPs
Markov Decision Process
- given a state s and an action a, the resultant state is influenced by s and a, as well as by random environment effects.
- think of MDPs as graphs: states are nodes, and each action leads to a chance node whose outgoing edges (weighted by transition probabilities) lead to successor states.
- because randomness exists, a solution is not a fixed "path" but a "policy": a mapping from each state to an action.
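The graph view above can be sketched as a small toy MDP (a hypothetical example, not from the notes): each (state, action) pair points to a chance node, represented here as a list of (probability, next state, reward) outcomes, and sampling from that list simulates the environment's randomness.

```python
import random

# Hypothetical toy MDP: state -> action -> list of (probability, next_state, reward).
transitions = {
    "in": {
        "stay": [(0.9, "in", 4), (0.1, "end", 4)],
        "quit": [(1.0, "end", 10)],
    },
    "end": {},  # terminal state: no actions available
}

def sample_successor(state, action):
    """Simulate the chance node: draw a successor according to its probability."""
    r = random.random()
    cumulative = 0.0
    for prob, next_state, reward in transitions[state][action]:
        cumulative += prob
        if r < cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point round-off

next_state, reward = sample_successor("in", "stay")
```

Running `sample_successor("in", "stay")` returns `("in", 4)` about 90% of the time and `("end", 4)` otherwise, which is exactly the "random environment influence" described above.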
Difference from search problems
- In a search problem, the solution is deterministic: the same sequence of states/actions always produces the same result.
- In an MDP, the outcome of each action is partly random and partly influenced by the agent's decisions.
Model-Free Monte Carlo
Estimates the expected utility of following a particular policy pi; that is, it estimates Qpi(s, a), the expected utility of taking action a from state s and then following pi, by averaging returns observed in sampled episodes (no transition model needed, hence "model-free").
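A minimal sketch of this estimator, assuming episodes are lists of (state, action, reward) triples and gamma = 1 for simplicity (both assumptions, not from the notes): for each (s, a) pair, average the total reward observed from that step onward.

```python
from collections import defaultdict

def mc_estimate_q(episodes):
    """Model-free Monte Carlo: average observed returns for each (s, a) pair.

    `episodes` is a list of trajectories [(s0, a0, r1), (s1, a1, r2), ...].
    The return following step t is the sum of rewards from t onward
    (discount gamma = 1 here for simplicity).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Walk the episode backwards, accumulating the return G.
        G = 0.0
        for s, a, r in reversed(episode):
            G += r
            totals[(s, a)] += G
            counts[(s, a)] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}

episodes = [
    [("in", "stay", 4), ("in", "quit", 10)],
    [("in", "stay", 4), ("in", "stay", 4), ("in", "quit", 10)],
]
q = mc_estimate_q(episodes)
```

With these two made-up episodes, Q("in", "quit") averages to 10 and Q("in", "stay") averages the returns 14, 18, and 14 to 46/3.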
SARSA 
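The standard SARSA update (stated here from the usual textbook form, since this section of the notes is empty) bootstraps from the next action actually taken under the policy: Q(s, a) is moved toward the target r + gamma * Q(s', a'). A minimal sketch, where the step size eta and discount gamma are illustrative defaults:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.1, gamma=1.0):
    """One SARSA step: move Q(s, a) toward the TD target r + gamma * Q(s', a')."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
    return Q[(s, a)]

Q = defaultdict(float)  # all Q-values start at 0
sarsa_update(Q, "in", "stay", 4, "in", "quit")
```

Unlike Model-Free Monte Carlo, which waits for the whole episode's return, SARSA updates after every step using its own current estimate of the next state-action value.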