TR21-128 Authors: Xiaotie Deng, Yuhao Li, David Mguni, Jun Wang, Yaodong Yang

Publication: 4th September 2021 09:57

Downloads: 163

Keywords:

Similar to the role of Markov decision processes in reinforcement learning, Markov Games (also called Stochastic Games)lay down the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we introduce the solution concept, approximate Markov Perfect Equilibrium (MPE), to finite-state Stochastic Games repeated in the infinite horizon, and prove its $\mathbf{PPAD}$-completeness in computational complexity. Technically, we adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fixed-point problem, even though the stochastic game may demand an exponential number of pure strategies, in the number of states, for each agent. The completeness result follows the reduction of the fixed-point problem to \textsc{End of the Line}.

Past works on the stochastic games are mostly zero-sum MARL algorithms. A $P^{PPAD}$ algorithm for the general sum stochastic games in the finite horizon can be derived to establish an approximation algorithm for the general-sum stochastic games. That implies an approximate NE solution to the infinite-horizon setting. Such a possible extension suffers from three weaknesses: 1. the solution is time-dependent and hence not a perfect equilibrium; 2. the time-dependent solution suffers a weakness of noncredible threats; 3. the time complexity is not tight (lower bound $\mathbf{PPAD}$ and upper bound $P^{PPAD}$). Our result beats such a solution in all those three properties.