MDPs are POMDPs


A (fully observable) Markov Decision Process (MDP) is just a Partially Observable Markov Decision Process (POMDP) whose states are fully observable. So we can formulate an MDP as a POMDP whose observation space is equal to the state space; we also need to define the observation function appropriately. Let’s see exactly how.

Formally, an MDP can be defined as a tuple \(M_\text{MDP} = (\mathcal{S}, \mathcal{A}, T, r, \gamma)\), where

  • \(\mathcal{S}\) is the state space
  • \(\mathcal{A}\) is the action space
  • \(T = p(s' \mid s, a)\) is the transition function
  • \(r\) is the reward function
  • \(\gamma\) is the discount factor

A POMDP is defined as a tuple \(M_\text{POMDP} = (\mathcal{S}, \mathcal{A}, T, r, \gamma, \color{red}{\Omega}, \color{red}{O})\), where \(\mathcal{S}\), \(\mathcal{A}\), \(T\), \(r\) and \(\gamma\) are defined as above, but, in addition to those, we also have

  • \(\color{red}{\Omega}\): the observation space
  • \(\color{red}{O} = p(o \mid s', a)\): the observation function, which is the probability distribution over possible observations, given the next state \(s'\) and action \(a\)
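
To make the two tuples concrete, here is a minimal sketch in Python (the class and field names are my own and purely illustrative, not from any particular library): a POMDP carries exactly the same components as an MDP, plus the two extra ones highlighted above.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable
Obs = Hashable

@dataclass
class MDP:
    states: Set[State]                                   # S
    actions: Set[Action]                                 # A
    transition: Callable[[State, State, Action], float]  # T(s' | s, a), called as transition(s_next, s, a)
    reward: Callable[[State, Action], float]             # r(s, a)
    gamma: float                                         # discount factor

@dataclass
class POMDP(MDP):
    observations: Set[Obs]                                 # Omega: the observation space
    observation_fn: Callable[[Obs, State, Action], float]  # O(o | s', a), called as observation_fn(o, s_next, a)
```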

So, to define \(M_\text{MDP}\) as \(M_\text{POMDP}\), we have

  • The observation space is \(\color{red}{\Omega} = \mathcal{S}\)
  • The observation function is \(\color{red}{O} = p(o \mid s', a) = \begin{cases} 1, \text{ if } o = s' \\ 0, \text{ otherwise } \end{cases}\)

In other words, the probability of observing \(o = s'\), given that we end up in \(s'\), is \(1\), while the probability of observing any \(o \neq s'\) is \(0\). This has implications for how you update the belief state: in the standard update \(b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)\), the factor \(O(o \mid s', a)\) sets \(b'(s')\) to \(0\) whenever \(o \neq s'\), so the belief collapses onto the observed state.
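
Here is a self-contained sketch of this construction in Python (the function names, the two-state example and the single action "go" are hypothetical, just for illustration): the observation function is the indicator of \(o = s'\), and plugging it into the standard belief update collapses the belief onto the observed state.

```python
def identity_observation_fn(o, s_next, a):
    """O(o | s', a) for an MDP viewed as a POMDP: the agent observes the state itself."""
    return 1.0 if o == s_next else 0.0

def belief_update(belief, action, observation, states, transition, observation_fn):
    """Standard POMDP belief update: b'(s') is proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    new_belief = {}
    for s_next in states:
        predicted = sum(transition(s_next, s, action) * belief[s] for s in states)
        new_belief[s_next] = observation_fn(observation, s_next, action) * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Hypothetical two-state example: under the single action "go",
# the agent deterministically swaps between "left" and "right".
states = ["left", "right"]

def transition(s_next, s, a):
    # T(s' | s, a)
    return 1.0 if s_next != s else 0.0

belief = {"left": 0.5, "right": 0.5}   # maximally uncertain prior belief
new_belief = belief_update(belief, "go", "right", states,
                           transition, identity_observation_fn)
print(new_belief)   # all mass on "right": the belief collapses onto the observed state
```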