Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Understanding responses to environments for the Prisoner's Dilemma: A meta analysis, multidimensional optimisation and machine learning approach

Glynatsi, Nikoleta 2020. Understanding responses to environments for the Prisoner's Dilemma: A meta analysis, multidimensional optimisation and machine learning approach. PhD Thesis, Cardiff University.
Item availability restricted.

PDF (PhD Thesis) - Accepted Post-Print Version (10MB)
PDF (Cardiff University Electronic Publication Form) - Supplemental Material (50kB), restricted to repository staff only

Abstract

This thesis investigates the behaviour that Iterated Prisoner’s Dilemma strategies should adopt in response to different environments. The Iterated Prisoner’s Dilemma (IPD) is a topic of game theory that has attracted academic attention due to its applications in understanding the balance between cooperation and competition in social and biological settings. This thesis draws on a variety of mathematical and computational fields, such as linear algebra, research software engineering, data mining, network theory, natural language processing, data analysis, mathematical optimisation, resultant theory, Markov modelling, agent-based simulation, heuristics and machine learning. The IPD literature has explored the performance of strategies in the game for years, and the results of this thesis contribute to the discussion of successful performance using a variety of novel approaches.

First, this thesis evaluates the performance of 195 strategies in 45,600 computer tournaments. A large portion of the 195 strategies are drawn from the known and named strategies in the IPD literature, including many previous tournament winners. The 45,600 computer tournaments include variations such as tournaments with noise, tournaments with probabilistic match length, and tournaments with both noise and probabilistic match length. This diversity of strategies and tournament types has resulted in the largest and most diverse collection of computer tournaments in the field. The impact of features on the performance of the 195 strategies is evaluated using modern machine learning and statistical techniques. The results reinforce the idea that certain properties are associated with success: be nice, be provocable and generous, be a little envious, be clever, and adapt to the environment.

Second, this thesis explores well-performing behaviour within a specific set of IPD strategies, called memory-one strategies, and in particular a subset of them that are considered extortionate.
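Memory-one strategies are commonly written as four probabilities of cooperating after each possible outcome of the previous turn (CC, CD, DC, DD). The following is a minimal sketch of the standard Markov-chain view of a match between two such strategies, not code taken from the thesis. It assumes the payoffs (R, S, T, P) = (3, 0, 5, 1) and a noise model in which each intended action is flipped with probability `noise`; all function names are hypothetical. With `noise > 0` every transition has positive probability, so the power iteration converges to the unique steady state; with zero noise the chain may be periodic or have several steady states, and the result can then depend on the uniform starting distribution used here.

```python
"""Long-run payoff of two memory-one strategies in a noisy IPD (sketch)."""

# Player 1's payoff in each state, ordered (CC, CD, DC, DD) from player 1's view.
# Assumed standard values: R = 3, S = 0, T = 5, P = 1.
PAYOFFS = (3, 0, 5, 1)


def with_noise(prob, noise):
    """Probability of cooperating when an intended action may be flipped."""
    return prob * (1 - noise) + (1 - prob) * noise


def transition_matrix(p, q, noise=0.0):
    """4x4 Markov chain over the states (CC, CD, DC, DD).

    p and q are memory-one strategies: the probability of cooperating after
    CC, CD, DC and DD, each written from that player's own point of view.
    """
    # Player 2 sees CD as DC and vice versa, so swap its middle entries.
    q_view = (q[0], q[2], q[1], q[3])
    matrix = []
    for state in range(4):
        x = with_noise(p[state], noise)       # player 1 cooperates
        y = with_noise(q_view[state], noise)  # player 2 cooperates
        matrix.append([x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)])
    return matrix


def stationary_distribution(matrix, iterations=10_000):
    """Power iteration v <- v M, starting from the uniform distribution."""
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_payoff(p, q, noise=0.0):
    """Expected per-turn payoff to player 1 in the steady state."""
    v = stationary_distribution(transition_matrix(p, q, noise))
    return sum(prob * pay for prob, pay in zip(v, PAYOFFS))
```

For example, with `TFT = (1, 0, 1, 0)`, `long_run_payoff(TFT, TFT, noise=0.05)` gives approximately 2.25: under noise the transition matrix is doubly stochastic, so two Tit For Tat players spend equal time in each of the four states.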
Extortionate strategies have gained much attention in the research field and have been acclaimed for their performance against single opponents. This thesis uses mathematical modelling to explore best responses to a collection of memory-one strategies, formulated as a multidimensional non-linear optimisation problem, and to assess the benefits of extortionate or manipulative behaviour. The results contribute to the argument that extortionate behaviour is not optimal play in the IPD, and provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by longer-memory strategies.

Next, the thesis investigates best response strategies in the form of static sequences of moves. It introduces an evolutionary algorithm that can identify best response sequences, and uses a list of 192 opponents to generate a large data set of best response sequences. This data set is then used to train a type of recurrent neural network called the long short-term memory (LSTM) network, which has not gained much attention in the literature. A number of LSTM networks are trained to predict the actions of the best response sequences, and the trained networks are used to introduce a total of 24 new IPD strategies, which were shown to win standard tournaments.

From this research the following conclusions are made: there is no single best strategy in the IPD for varying environments; however, there are properties associated with the strategies’ success that are distinct to different environments. These properties both reinforce and contradict well-established results. They include being nice, opening with cooperation, being a little envious, being complex, adapting to the environment and using longer memory when possible.
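The search for best response sequences can be illustrated with a small genetic algorithm. This is a minimal sketch, not the thesis's algorithm: it assumes a single deterministic opponent (Tit For Tat), the standard payoffs (3, 0, 5, 1), truncation selection, one-point crossover and bit-flip mutation, and every name in it is hypothetical. The all-cooperate sequence is added to the initial population and the best quarter of each generation is carried over unchanged, so the best score never falls below that baseline.

```python
"""Evolutionary search for a best response move sequence (sketch)."""
import random

C, D = 1, 0  # 1 = cooperate, 0 = defect
# Payoff to the sequence player, keyed by (own move, opponent move).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(history):
    """Cooperate first, then copy the sequence player's previous move."""
    return C if not history else history[-1]


def score(sequence, opponent=tit_for_tat):
    """Total payoff of a static move sequence against `opponent`."""
    total, played = 0, []
    for move in sequence:
        total += PAYOFF[(move, opponent(played))]
        played.append(move)
    return total


def evolve(turns, opponent=tit_for_tat, pop_size=20, generations=200, seed=0):
    """Genetic algorithm over fixed-length sequences (requires turns >= 2)."""
    rng = random.Random(seed)
    mutation_rate = 1 / turns  # on average one flipped move per child
    # Random initial population plus the all-cooperate baseline.
    population = [[rng.randint(0, 1) for _ in range(turns)]
                  for _ in range(pop_size - 1)]
    population.append([C] * turns)
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: score(s, opponent),
                        reverse=True)
        elite = ranked[: pop_size // 4]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, turns)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation: each move flips with probability mutation_rate.
            child = [1 - m if rng.random() < mutation_rate else m
                     for m in child]
            children.append(child)
        population = elite + children
    return max(population, key=lambda s: score(s, opponent))
```

Against Tit For Tat over a known number of turns, the best response is to cooperate until the final turn and then defect; for six turns this scores 3 × 5 + 5 = 20, which can be confirmed by scoring all 64 sequences, and `evolve(6)` should normally recover it.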

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Mathematics
Subjects: Q Science > QA Mathematics
Funders: EPSRC
Date of First Compliant Deposit: 30 September 2020
Last Modified: 30 Sep 2020 11:34
URI: http://orca.cf.ac.uk/id/eprint/135221
