Essays on stochastic games and learning in intertemporal choice

This cumulative dissertation contains four self-contained chapters on stochastic games and learning in intertemporal choice.

Chapter 1 presents an experiment on value learning in a setting where actions have both immediate and delayed consequences. Subjects make a series of choices between abstract options whose values have to be learned by sampling. Each option is associated with two payoff components: one is revealed immediately after the choice, the other with a one-round delay. Objectively, both payoff components are equally important, but most subjects systematically underreact to the delayed consequences. The resulting behavior appears impatient or myopic, although there is no inherent reason to discount: all rewards are paid simultaneously, after the experiment. Elicited beliefs about the value of the options are consistent with choice behavior. These results demonstrate that revealed impatience may arise from frictions in learning, and that discounting does not necessarily reflect deep time preferences. In a treatment variation, subjects first learn passively from evidence generated by others before making a series of their own choices. Here, the underweighting of delayed consequences is attenuated, in particular for the earliest own decisions. Active decision making thus seems to play an important role in the emergence of the observed bias.

Chapter 2 introduces Markov quantal response equilibrium (QRE), an application of QRE to finite discounted stochastic games, and proves its existence. We then study a specific case, logit Markov QRE, which arises when players react to total discounted payoffs using the logit choice rule with precision parameter λ. We show that the set of logit Markov QRE always contains a smooth path leading from the unique QRE at λ = 0 to a stationary equilibrium of the game as λ goes to infinity. Following this path makes it possible to solve arbitrary finite discounted stochastic games numerically; an implementation of this algorithm is publicly available as part of the package sgamesolver. We further show that all logit Markov QRE are ε-equilibria, with a bound on ε that is independent of the game's payoff function and decreases hyperbolically in λ. Finally, we establish a link to reinforcement learning by characterizing logit Markov QRE as the stationary points of a game dynamic that arises when all players follow the well-established reinforcement learning algorithm expected SARSA.
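
To make the logit choice rule and the reinforcement-learning link concrete, here is a minimal tabular sketch of one expected-SARSA update combined with a softmax (logit) policy of precision λ, for a single player. The state and action encoding, step size, and discount factor are illustrative assumptions, not the implementation used in the dissertation.

```python
import numpy as np

def logit_policy(q_row, lam):
    """Softmax (logit) choice probabilities with precision lam over one state's Q-values."""
    z = lam * (q_row - q_row.max())   # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def expected_sarsa_update(Q, s, a, r, s_next, lam, alpha, delta):
    """One tabular expected-SARSA step:
    Q[s, a] += alpha * (r + delta * E_pi[Q[s_next, .]] - Q[s, a]),
    where the expectation is taken under the logit policy in the next state."""
    pi_next = logit_policy(Q[s_next], lam)
    target = r + delta * float(pi_next @ Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Illustrative numbers only (3 states, 2 actions); not data from the dissertation.
Q = np.zeros((3, 2))
Q = expected_sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, lam=5.0, alpha=0.1, delta=0.95)
print(logit_policy(Q[0], lam=5.0))   # current logit choice probabilities in state 0
```

In this sketch, a point where further updates leave the Q-values, and hence the logit choice probabilities, unchanged in expectation corresponds to the stationarity property used in Chapter 2 to characterize logit Markov QRE.
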
Chapter 3 introduces the logarithmic stochastic tracing procedure, a homotopy method to compute stationary equilibria of finite discounted stochastic games. We build on the linear stochastic tracing procedure (Herings and Peeters 2004) but introduce logarithmic penalty terms as a regularization device, which brings two major improvements. First, the scope of the method is extended: it now carries a convergence guarantee for all games of this class, rather than only for generic ones. Second, by ensuring a smooth and interior solution path, computational performance is increased significantly. A ready-to-use implementation is publicly available. As demonstrated here, its speed compares quite favorably with that of other available algorithms, and it allows games of considerable size to be solved in reasonable time. Because the method involves the gradual transformation of a prior into equilibrium strategies, it is possible to search the prior space and uncover potentially multiple equilibria together with their respective basins of attraction. This also connects the method to the established theory of equilibrium selection.

Chapter 4 introduces sgamesolver, a Python package that uses the homotopy method to compute stationary equilibria of finite discounted stochastic games. A short user guide is complemented by a discussion of the homotopy method, of the two implemented homotopy functions (logit Markov QRE and logarithmic tracing), and of the predictor-corrector procedure and its implementation in sgamesolver. Basic and advanced use cases are demonstrated using several example games. Finally, we discuss the topic of symmetries in stochastic games.
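
As a rough illustration of the basic use case described above, the following sketch follows the workflow suggested by the package description. The class and method names (SGame.random_game, homotopy.QRE, solver_setup, solve, equilibrium) are assumptions about the sgamesolver interface and should be checked against the package documentation.

```python
import sgamesolver

# Assumed interface (see above): a random test game with 64 states,
# 2 players, and 4 actions per player and state.
game = sgamesolver.SGame.random_game(64, 2, 4, seed=42)

# Pick one of the two homotopy functions discussed above:
# logit Markov QRE (Chapter 2) or logarithmic tracing (Chapter 3).
homotopy = sgamesolver.homotopy.QRE(game)

# Set up the predictor-corrector solver and trace the homotopy path.
homotopy.solver_setup()
homotopy.solve()

# The end point of the path is a stationary equilibrium of the game.
print(homotopy.equilibrium)
```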

Metadata
Author: David Poensgen
URN: urn:nbn:de:hebis:30:3-795819
DOI: https://doi.org/10.21248/gups.79581
Place of publication: Frankfurt am Main
Referees: Michael Kosfeld, Matthias Blonski
Document type: Dissertation
Language: English
Date of online publication: February 8, 2024
Year of first publication: 2023
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Degree-granting institution: Johann Wolfgang Goethe-Universität
Date of final examination: October 23, 2023
Release date: February 8, 2024
Keywords / tags: Markov perfect equilibrium; homotopy method; reinforcement learning; stationary equilibrium; stochastic game
Number of pages: 191
Note:
Cumulative dissertation; contains the author-submitted manuscript version of the following article:

Eibelshäuser, Steffen; Klockmann, Victor; Poensgen, David; Schenk, Alicia von (2023): The Logarithmic Stochastic Tracing Procedure: A Homotopy Method to Compute Stationary Equilibria of Stochastic Games. INFORMS Journal on Computing 36(6), pp. 1215-1532, eISSN 1526-5528. DOI: 10.1287/ijoc.2022.0360
HeBIS-PPN: 515353558
Institutes: Wirtschaftswissenschaften (Economics and Business)
DDC classification: 3 Social sciences / 33 Economics / 330 Economics
Collections: University publications
License: Deutsches Urheberrecht (German copyright law)