MDPs are Markov Games


MDPs

A Markov Decision Process (MDP) is a mathematical framework for modeling sequential stochastic decision-making problems that are solved by a single (Reinforcement Learning) agent. Formally, an MDP can be defined as a tuple (a minimal code sketch follows the list of components below)

\[\text{MDP} = (\mathcal{S}, \mathcal{A}, T, r, \gamma),\]

where

  • \(\mathcal{S}\) is the state space
  • \(\mathcal{A}\) is the action space
  • \(T = p(s' \mid s, a)\) is the transition function, i.e. the probability of moving to state \(s'\) after taking action \(a\) in state \(s\)
  • \(r(s, a)\) is the reward function
  • \(\gamma\) is the discount factor
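
To make the tuple concrete, here is a minimal sketch of an MDP as a plain Python data structure. Everything in the two-state example (state names, action names, probabilities, rewards) is made up purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list    # S: state space
    actions: list   # A: action space
    T: dict         # T[(s, a)] -> {s': p(s' | s, a)}
    r: dict         # r[(s, a)] -> reward
    gamma: float    # discount factor

# A toy two-state MDP (all names and numbers are invented for illustration).
mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    T={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "move"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s0": 0.9, "s1": 0.1},
    },
    r={
        ("s0", "stay"): 0.0,
        ("s0", "move"): 1.0,
        ("s1", "stay"): 2.0,
        ("s1", "move"): 0.0,
    },
    gamma=0.9,
)
```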

Markov Games

A Markov Game (MG), also called a Stochastic Game, is a mathematical formalism for describing problems where multiple agents (or players) interact with each other (by cooperating, competing, or both), which is the type of problem studied in Game Theory. Formally, an MG with \(n\) agents can be defined as a tuple (again, a code sketch follows the list)

\[\text{MG} = (\mathcal{S}, \mathbb{A}, \Gamma, \mathcal{R}, \gamma),\]

where

  • \(\mathcal{S}\) is the state space
  • \(\mathbb{A} = \{\mathcal{A}_1, \dots, \mathcal{A}_n\}\) is a set of action spaces, one for each of the \(n\) agents
  • \(\Gamma = p(s' \mid s, a_1, \dots, a_n)\) is the transition function
  • \(\mathcal{R} = \{ r_1(s, a_1, \dots, a_n), \dots, r_n(s, a_1, \dots, a_n) \}\) is a set of reward functions, one for each agent
  • \(\gamma\) is the discount factor
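
The same kind of sketch extends to a Markov Game: the single action space becomes a list of per-agent action spaces, and both the transition function and the per-agent reward functions are indexed by the joint action \((a_1, \dots, a_n)\). The one-state, two-player example below (a repeated prisoner's-dilemma-style matrix game with made-up payoffs) is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class MarkovGame:
    states: list         # S: state space
    action_spaces: list  # [A_1, ..., A_n], one action space per agent
    T: dict              # T[(s, joint_action)] -> {s': p(s' | s, a_1, ..., a_n)}
    rewards: list        # [r_1, ..., r_n]; r_i[(s, joint_action)] -> reward of agent i
    gamma: float         # discount factor

JOINT_ACTIONS = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

# A one-state, two-player game: the state never changes, so this is a
# repeated matrix game (payoffs are invented for illustration).
mg = MarkovGame(
    states=["s0"],
    action_spaces=[["C", "D"], ["C", "D"]],
    T={("s0", ja): {"s0": 1.0} for ja in JOINT_ACTIONS},
    rewards=[
        {("s0", ("C", "C")): 3, ("s0", ("C", "D")): 0,
         ("s0", ("D", "C")): 5, ("s0", ("D", "D")): 1},  # agent 1
        {("s0", ("C", "C")): 3, ("s0", ("C", "D")): 5,
         ("s0", ("D", "C")): 0, ("s0", ("D", "D")): 1},  # agent 2
    ],
    gamma=0.9,
)
```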

MDPs are Markov Games

So, what are the differences between an MDP and an MG?

  • In an MG, we have a set of action spaces, one for each agent, while in an MDP we have only one action space,
  • In an MG, the transition function is conditioned on the joint action of all players, and
  • In an MG, we have a set of reward functions, one for each agent, where each reward function is also conditioned on the joint action of all players.

An MDP can therefore be viewed as a Markov Game with a single player: setting \(n = 1\) collapses the set of action spaces to a single action space and the set of reward functions to a single reward function, recovering the MDP tuple exactly. The sketch below makes this reduction explicit.
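
Reusing the two hypothetical sketches above, the reduction takes only a few lines: with \(n = 1\), a joint action is just a 1-tuple holding the single player's action.

```python
def mdp_to_markov_game(mdp: MDP) -> MarkovGame:
    """View an MDP as a Markov Game with a single player (n = 1)."""
    return MarkovGame(
        states=mdp.states,
        action_spaces=[mdp.actions],  # one agent, so one action space
        # Wrap each action a into the joint action (a,) of the single player.
        T={(s, (a,)): dist for (s, a), dist in mdp.T.items()},
        rewards=[{(s, (a,)): rew for (s, a), rew in mdp.r.items()}],
        gamma=mdp.gamma,
    )

one_player_game = mdp_to_markov_game(mdp)
```

Nothing is added or lost in the conversion: the one-player game has exactly the same states, dynamics, and rewards as the original MDP.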