Essays on stochastic games and learning in intertemporal choice
This cumulative dissertation contains four self-contained chapters on stochastic games and learning in intertemporal choice.

Chapter 1 presents an experiment on value learning in a setting where actions have both immediate and delayed consequences. Subjects make a series of choices between abstract options whose values have to be learned by sampling. Each option is associated with two payoff components: one is revealed immediately after the choice, the other with a delay of one round. Objectively, both payoff components are equally important, but most subjects systematically underreact to the delayed consequences. The resulting behavior appears impatient or myopic. However, there is no inherent reason to discount: all rewards are paid simultaneously, after the experiment. Elicited beliefs about the value of the options are consistent with choice behavior. These results demonstrate that revealed impatience may arise from frictions in learning, and that discounting does not necessarily reflect deep time preferences. In a treatment variation, subjects first learn passively from evidence generated by others before making a series of their own choices. Here, the underweighting of delayed consequences is attenuated, particularly in the earliest own decisions. Active decision making thus seems to play an important role in the emergence of the observed bias.

Chapter 2 introduces Markov quantal response equilibrium (QRE), an application of QRE to finite discounted stochastic games, and proves its existence. We then study a specific case, logit Markov QRE, which arises when players react to total discounted payoffs using the logit choice rule with precision parameter λ. We show that the set of logit Markov QRE always contains a smooth path leading from the unique QRE at λ = 0 to a stationary equilibrium of the game as λ goes to infinity. Following this path makes it possible to solve arbitrary finite discounted stochastic games numerically; an implementation of this algorithm is publicly available as part of the package sgamesolver. We further show that all logit Markov QRE are ε-equilibria, with a bound on ε that is independent of the game's payoff function and decreases hyperbolically in λ. Finally, we establish a link to reinforcement learning by characterizing logit Markov QRE as the stationary points of a game dynamic that arises when all players follow the well-established reinforcement learning algorithm expected SARSA.

Chapter 3 introduces the logarithmic stochastic tracing procedure, a homotopy method to compute stationary equilibria of finite discounted stochastic games. We build on the linear stochastic tracing procedure (Herings and Peeters 2004) but introduce logarithmic penalty terms as a regularization device, which brings two major improvements. First, the scope of the method is extended: it now has a convergence guarantee for all games of this class, rather than only for generic ones. Second, by ensuring a smooth and interior solution path, computational performance is increased significantly. A ready-to-use implementation is publicly available. As demonstrated here, its speed compares quite favorably to other available algorithms, and it can solve games of considerable size in reasonable time. Because the method involves the gradual transformation of a prior into equilibrium strategies, it is possible to search the prior space and uncover potentially multiple equilibria and their respective basins of attraction. This also connects the method to established theory of equilibrium selection.

Chapter 4 introduces sgamesolver, a Python package that uses the homotopy method to compute stationary equilibria of finite discounted stochastic games. A short user guide is complemented by a discussion of the homotopy method, the two implemented homotopy functions (logit Markov QRE and logarithmic tracing), and the predictor-corrector procedure and its implementation in sgamesolver. Basic and advanced use cases are demonstrated using several example games. Finally, we discuss the topic of symmetries in stochastic games.
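For reference, the logit choice rule mentioned in the summary of Chapter 2 maps a player's (estimated) total discounted payoffs in a state to choice probabilities, with λ = 0 giving uniform randomization and λ → ∞ approaching a best response. The following is a minimal illustrative sketch in plain NumPy; it is not taken from sgamesolver, and the function name logit_choice and the example payoffs are hypothetical.

```python
import numpy as np

def logit_choice(q_values, lam):
    """Logit choice rule with precision parameter lam.

    q_values: array of (estimated) total discounted payoffs, one per action.
    Returns a probability vector over actions.
    """
    z = lam * np.asarray(q_values, dtype=float)
    z -= z.max()                      # subtract the max for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

# Example: three actions in some state, with discounted payoffs 1.0, 1.5, 0.5.
print(logit_choice([1.0, 1.5, 0.5], lam=0.0))   # uniform: [1/3, 1/3, 1/3]
print(logit_choice([1.0, 1.5, 0.5], lam=10.0))  # concentrates on the second action
```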
Author: | David Poensgen |
---|---|
URN: | urn:nbn:de:hebis:30:3-795819 |
DOI: | https://doi.org/10.21248/gups.79581 |
Place of publication: | Frankfurt am Main |
Referee: | Michael Kosfeld, Matthias Blonski |
Document Type: | Doctoral Thesis |
Language: | English |
Date of Publication (online): | 2024/02/08 |
Year of first Publication: | 2023 |
Publishing Institution: | Universitätsbibliothek Johann Christian Senckenberg |
Granting Institution: | Johann Wolfgang Goethe-Universität |
Date of final exam: | 2023/10/23 |
Release Date: | 2024/02/08 |
Tag: | Markov perfect equilibrium; homotopy method; reinforcement learning; stationary equilibrium; stochastic game |
Page Number: | 191 |
Note: | Cumulative dissertation - contains the submitted manuscript version (author submitted manuscript) of the following article: Eibelshäuser, Steffen; Klockmann, Victor; Poensgen, David; Schenk, Alicia von (2023): The Logarithmic Stochastic Tracing Procedure: A Homotopy Method to Compute Stationary Equilibria of Stochastic Games. INFORMS Journal on Computing 2023, 36(6), pages 1215-1532, eISSN 1526-5528. DOI 10.1287/ijoc.2022.0360 |
HeBIS-PPN: | 515353558 |
Institutes: | Wirtschaftswissenschaften |
Dewey Decimal Classification: | 3 Social sciences / 33 Economics / 330 Economics |
Collections: | University publications |
Licence (German): | German copyright law (Deutsches Urheberrecht) |