Exercises part III: Game theory#

Game theory is a framework to consider the dynamics of games where players face each other and have to choose one strategy from a defined set of strategies. In simple two-player games, the outcome of the game can therefore be represented as a matrix \(A\), where the \((i,j)\)-th element represents the outcome for player 1 of choosing strategy \(i\) when player 2 chooses strategy \(j\). In general, we consider symmetric game where the so-called payoff matrix is the same for both players and a game is therefore defined by its payoff matrix \(P\).

Prisoner’s dilemma, nash equilibria, replicator dynamics and the Moran process#

The prisoner’s dilemma is classic model in game theory. Two people have been caught for committing a crime together (allegedly!). The police needs evidence and offers them a deal: Snitch (or defect) on the other prisoner and you will go free. That is the best possible outcome for a prisoner, and all other outcomes will be marked with negative values to represent time in prison. If only one defects, they receive the temptation reward (\(T = 0\)) while the other receives the sucker reward (\(S < 0\)). If they both defect, they are both punished with \(S<P<0\). If neither defect, they end up with a light sentence due to missing evidence, a reward \(R\) in and of itself with \(S<P<R<0\).

Let strategy 1 be cooperation between prisoners and strategy 2 be defection, the payoff or award matrix \(A\) can be represented like so:

\[\begin{split} A = \begin{pmatrix} R & S\\ 0 & P \end{pmatrix} = \begin{pmatrix} -2 & -6\\ 0 & -4 \end{pmatrix} \end{split}\]

What should prisoner 1 do? If they assume a 50/50 chance that the other prisoner snitches, the values given above suggest they should defect as the average of row 2 is less negative (-2) than the average of row 1 (-4). Let’s call this strategy the rational choice. If they both act rationally, they both get four years in prison and neither of them could individually improve on this situation given additional knowledge. This fact, that no single player can improve their outcome on their own, makes the defect/defect outcome of four years in prison a Nash Equilibrium.

That being said, the obvious optimal outcome is for both of them to cooperate and only get two years in prison each. But that state could prove unstable because of the temptation that then exists to defect…

The Replicator equation is the mean-field approach (deterministic, well-mixed, continuous) used to model such games in evolutionary dynamics. It lets us track the fitness of a given strategy as a function of the distribution of strategies in a population by letting strategies replicate or decay at a rate proportional to their fitness. In general, the replicator equation for the frequency \(x_i\) of strategy \(i\) in the population is given by

\[ \dot{x}_i = x_i\left[f_i(x) - \phi(x)\right] \]

where \(f_i(x)\) is the fitness of strategy \(i\) given the distribution of strategies and \(\phi(x)\) is the average fitness across all \(n\) strategies, \(\phi(x) = \sum_{j=1}^n x_jf_j(x)\).

For the prisoner’s dilemma, the fitness of a strategy would be \(f_i(x) = \sum_{k=1}^n A_{i,k}x_k\).

A Moran process is a simple stochastic model used to describe replicator dynamics in finite populations. It is similar to a voter model. At each time step, two random individuals are chosen: One dies and one replicates, such that the population size remains fixed. Selection can be enforced by biasing the individuals chosen for death or replication. Or, as we will attempt here, perhaps by having the individuals play out a game…

  1. Assuming that the winner of a Prisoner’s dilemma gets to replicate over the loser, and that nothing happens during ties, construct a master equation model for a population of \(N\) agents randomly playing the prisoner’s dilemma.

  2. Analyze your master equation. What does the dynamic look like? Do the outcomes match what is expected from the deterministic replicator equation?

Rock-paper-scissors#

Rock-paper-scissors is a simple two-player game that lets us explore an example where three strategies are available for each player. The payoff matrix has three possible outcomes: win (1), tie (0), and loss (-1). We can therefore write the game as

\[\begin{split} A = \begin{pmatrix} 0 & -1 & 1\\ 1 & 0 & -1\\ -1 & 1 & 0\\ \end{pmatrix} \end{split}\]
  1. First, construct a master equation model for the Moran process of the rock-paper-scissors game.

  2. Second, write the replicator equations for the rock-paper-scissors game.

  3. Do the two approaches predict the same kind of behavior?