MDPs
Markov Decision Process
- given a state s and an action a, the resultant state is influenced by s and a, as well as by random environment effects.
- think of MDPs as graphs: states are nodes, and each action leads to a chance node whose outgoing edges (weighted by transition probabilities) lead to successor states.
- because randomness exists, a solution is not a fixed "path" but a "policy": a mapping from each state to an action.
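The graph view above can be sketched as a small toy MDP (a hypothetical example, not from the notes): each (state, action) pair points to a chance node, represented here as a list of (probability, next state, reward) outcomes, and sampling from that list simulates the environment's randomness.

```python
import random

# Hypothetical toy MDP: state -> action -> list of (probability, next_state, reward).
transitions = {
    "in": {
        "stay": [(0.9, "in", 4), (0.1, "end", 4)],
        "quit": [(1.0, "end", 10)],
    },
    "end": {},  # terminal state: no actions available
}

def sample_successor(state, action):
    """Simulate the chance node: draw a successor according to its probability."""
    r = random.random()
    cumulative = 0.0
    for prob, next_state, reward in transitions[state][action]:
        cumulative += prob
        if r < cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point round-off

next_state, reward = sample_successor("in", "stay")
```

Running `sample_successor("in", "stay")` returns `("in", 4)` about 90% of the time and `("end", 4)` otherwise, which is exactly the "random environment influence" described above.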
Difference from search problems
- In a search problem, the solution is deterministic: the same sequence of states/actions always produces the same result.
- In an MDP, the outcome of each action is partly random and partly influenced by the agent's decisions.
Model-Free Monte Carlo
Estimates the expected utility of following a particular policy pi; that is, it estimates Qpi(s, a), the expected utility of taking action a from state s and then following pi, by averaging returns observed in sampled episodes (no transition model needed, hence "model-free").
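A minimal sketch of this estimator, assuming episodes are lists of (state, action, reward) triples and gamma = 1 for simplicity (both assumptions, not from the notes): for each (s, a) pair, average the total reward observed from that step onward.

```python
from collections import defaultdict

def mc_estimate_q(episodes):
    """Model-free Monte Carlo: average observed returns for each (s, a) pair.

    `episodes` is a list of trajectories [(s0, a0, r1), (s1, a1, r2), ...].
    The return following step t is the sum of rewards from t onward
    (discount gamma = 1 here for simplicity).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Walk the episode backwards, accumulating the return G.
        G = 0.0
        for s, a, r in reversed(episode):
            G += r
            totals[(s, a)] += G
            counts[(s, a)] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}

episodes = [
    [("in", "stay", 4), ("in", "quit", 10)],
    [("in", "stay", 4), ("in", "stay", 4), ("in", "quit", 10)],
]
q = mc_estimate_q(episodes)
```

With these two made-up episodes, Q("in", "quit") averages to 10 and Q("in", "stay") averages the returns 14, 18, and 14 to 46/3.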
SARSA 
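The standard SARSA update (stated here from the usual textbook form, since this section of the notes is empty) bootstraps from the next action actually taken under the policy: Q(s, a) is moved toward the target r + gamma * Q(s', a'). A minimal sketch, where the step size eta and discount gamma are illustrative defaults:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.1, gamma=1.0):
    """One SARSA step: move Q(s, a) toward the TD target r + gamma * Q(s', a')."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * target
    return Q[(s, a)]

Q = defaultdict(float)  # all Q-values start at 0
sarsa_update(Q, "in", "stay", 4, "in", "quit")
```

Unlike Model-Free Monte Carlo, which waits for the whole episode's return, SARSA updates after every step using its own current estimate of the next state-action value.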