Abstract. Institutions in economics are commonly modeled as repeated games, and strategies in repeated games are modeled as algorithms. Algorithms are explicit sequences of instructions that map from an input to an output. But in a world of open-ended affordances, Goodhart’s law implies no finite-length algorithm can maintain cooperation in large groups over time, posing a fundamental limit to algorithmic models of human cooperation. Instead, this paper suggests strategies be modeled as interpretive, a nonsymbolic mapping using holistic distance to inductively learned category exemplars. This has long been understood to be true of the human brain, but recent implementation of large-scale interpretive systems in silico holds forth the possibility of tractable general equilibrium models of human cooperation in open-ended worlds.
Human social institutions – at every scale from the drug deal to international governance, and every point in time from the paleolithic tribe to the present day – have been fruitfully modeled as repeated games. In this conception, an institution is an equilibrium complex of strategies where each individual plays his best response given the strategies of others that he interacts with, in light of the fact that the interaction will, in some way, persist into the future.
To consider an institution as the scaffolding of a repeated game allows a vast theoretical arsenal to be brought to bear. However, as compared to one-shot games, the space of possible strategies is far too vast for agents to apply, or for modelers to attribute, straightforward optimizing behavior. Therefore strategies are modeled as algorithms mapping from a finite set of observables (including the history of the game) to a finite set of responses. These algorithms, to the extent they are shared, constitute the “rules” in the canonical definition of institutions as “rules of the game” (North 1990).
This orthodox conceptual stack, however – grounded in an algorithmic model of behavior on the micro level and culminating in the institution on the macro level – inevitably leads to pessimism when considering the problem broadly enough: if the set of observables is open-ended, there exists no possible fixed and finite-length algorithm under which self-enforcing cooperation on the pattern of actual human institutions is evolutionarily stable (Bowles & Gintis 2011; Harwick 2020). Goodhart’s Law tells us that given enough time for costly signaling equilibria to break down under selection for cost-reduction, no finite-length algorithm can reliably infer a true state of the world from a signal that adversaries have an incentive to falsify (Harwick 2022; Harwick & Caton 2022). Thus, cooperative agents executing a fixed algorithmic strategy cannot identify themselves reliably enough over time to maintain the assortativity necessary for stable cooperation in a behaviorally open-ended world.
It would seem some element of this conceptual stack must give way to explain actual institutions. This paper suggests that algorithmic mappings from observations to responses – “rules” – are an inadequate paradigm for modeling human decisionmaking and organization. The alternative is an interpretive mapping, where open-ended observations are placed into inductively-learned classes through holistic similarity to exemplars, rather than by explicit if-then statements. Interpretive mapping characterizes both biological and artificial neural architecture, as well as other systems, and is able to deal much more robustly with edge cases – especially the adversarial edge cases selected for in repeated social dilemmas – than algorithms. We will thus think of mental models, subjective frames, institutions, and so on, not as stopgaps for the boundedness of our rationality, as they are often conceived in economics, but as something sui generis and functional on their own terms.
Taking these features of interpretive cognition seriously will entail regrounding game theory in a phenomenological mold, not merely because it bears greater verisimilitude to actual human cognition, but because it makes possible a general equilibrium theory of human cooperation that game theory has thus far not been able to offer.
After running through puzzles of human sociality and other large-scale adversarial games that cannot be explained with an algorithmic model, the paper lays out the difference between algorithmic and interpretive mappings, and shows that an interpretive model succeeds at precisely these points. It then considers the complementarities between interpretive and algorithmic systems, the implications for human epistemology and governance – especially the functional role of what has been called “tacit” knowledge – and what strategy selection in an open-ended environment might look like analytically.
North (1990) famously defined institutions as “rules of the game”, and divided them into formal and informal. Hodgson (2006) refines this definition to “systems of established and prevalent social rules that structure social interactions.” Greif (2006) assimilates “social interaction” to a game in the game-theoretic sense, which is repeated in a population, even if not among the same individuals each time. We will refer to such games as recurring games, a superset of repeated games, which are played among the same players. An institution, then, is an equilibrium complex of strategies in such a recurring game, along with supporting beliefs and expectations. The space of possible institutions is limited to those complexes of rules (strategies) which are locally stable and self-reinforcing.
“Rules,” however, must be unpacked further. They may refer to prescriptive rules or constitutive rules (Searle 1995). The former are injunctions like “do not foul a player” in basketball, and are more easily assimilated to the notion of strategy. In order for “do not foul a player” to be an equilibrium strategy, it must be in no player’s interest to foul, which in this case also requires that it be in players’ own interests to defer to a referee, in whose interest it is to penalize fouls. Prescriptive rules may be formal or informal, and are the paradigmatic case of an institution as rule.
Constitutive rules, however, define the game itself: a team is constituted by five players. A goal consists in getting the ball through the net. Wide variations from these rules are not prohibited – they do not mean one is “breaking” the rules, rather, one is simply playing a different game. In this way, the cognitive-perceptual element that institutional economists (North 2005, Hayek 1937, Lachmann 1971) have always identified as an important aspect of institutions can be assimilated to the “rules” definition. There are rules that tell you what game you are playing, and there are rules that tell you how to play the game, and both must be in the interests of players in equilibrium.
It is not obvious that these are the same sorts of things, despite both going by the name “rules”. In most formalizations, the cognitive-perceptual element is accordingly backgrounded, despite its emphasis in verbal expositions. We will argue (contra Searle) that the two concepts can indeed be unified into a broader concept of strategy, although doing so will require an alternative formalization of the problem.
A dominant tradition in game theory, and decision theory more broadly, models both strategies and decision processes as algorithms. An algorithm is a sequence of explicit instructions mapping from an input to an output, where both are represented using a language of finite countable symbols. The domain of the input and output may be countably infinite (for example an algorithm may take or produce any combination of the symbols in its language), but it may not be continuous or analog. Binmore (1987) takes “a rational decision process… [to] refer to the entire reasoning activity that intervenes between the receipt of a decision stimulus and the ultimate decision… Such an approach forces rational behavior to be thought of as essentially algorithmic,” and Friedman (1986: 12) explicitly defines a strategy as “a set of instructions”. In decision theory, this has the advantage of being precise and tractable.
There are multiple levels at which this decision process may be understood: an agent may choose an action in a game via a strategy, in which case the strategy is an algorithm mapping from inputs to a decision. Or he may make a decision to select a strategy, in which case the decision process is an algorithm mapping from a game structure to a strategy. Either way, the decision-making process is modeled as a well-defined transformation from one symbolic representation to another.
This algorithmic conception is made concrete in recurring games, where the strategy space is too vast to do the optimizing search common in models of one-shot games. In this case, a strategy itself is formalized as an algorithm mapping from (usually) a subset of the history of the game to a decision in one repetition of the game. One standard formalization is the finite automaton (Marks 1992; Rubinstein 1986), an abstract machine with a fixed number of states that maps game histories (or summaries) to actions. For example, tit-for-tat is a one-memory finite automaton. More complex strategies can be modeled by algorithms run on Turing machines (Halpern & Pass 2014), which by the Church-Turing thesis can execute any well-defined algorithm with any well-specified input and arbitrarily large memory. In this sense, computer programs are concrete implementations of such algorithms running on Turing machines.
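As a concrete illustration, tit-for-tat can be written out as exactly such a machine: a fixed set of states, a fixed output per state, and a fixed transition rule over the opponent’s last move. The sketch below is ours (class and state names are purely illustrative conventions, not drawn from the cited formalizations):

```python
from typing import Dict, Tuple

Action = str  # "C" (cooperate) or "D" (defect)
State = str   # labels for the automaton's internal states

class FiniteAutomatonStrategy:
    """A strategy as an explicit algorithm: fixed states, fixed outputs,
    fixed transitions conditioned on the opponent's last action."""
    def __init__(self, initial: State, output: Dict[State, Action],
                 transition: Dict[Tuple[State, Action], State]):
        self.state = initial
        self.output = output          # what to play in each state
        self.transition = transition  # next state, given opponent's last action

    def play(self) -> Action:
        return self.output[self.state]

    def observe(self, opponent_action: Action) -> None:
        self.state = self.transition[(self.state, opponent_action)]

# Tit-for-tat: start cooperative, then mirror the opponent's last move.
tit_for_tat = FiniteAutomatonStrategy(
    initial="nice",
    output={"nice": "C", "provoked": "D"},
    transition={("nice", "C"): "nice", ("nice", "D"): "provoked",
                ("provoked", "C"): "nice", ("provoked", "D"): "provoked"},
)
```

Everything the strategy can respond to, and everything it can do, is enumerated in advance in these two small tables – which is precisely the feature at issue in what follows.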
An algorithm must specify the domain of acceptable inputs and a rule for mapping each input to a unique output. Inputs outside this domain are either ignored or misclassified. In the case of human social life, the space of potential stimuli and signals is effectively unbounded. But based on the flexibility of the notion of the algorithm, it has been taken for granted that human strategies can, at least in principle, be modeled by suitable algorithms. Thus, both individual strategies and institutional “rules” are typically understood as algorithmic mappings – explicit procedures designed to produce determinate outputs from bounded classes of inputs. This assumption is foundational to rational choice modeling across domains.
The questions that an algorithmic model of human cooperation must answer are: (1) What are the inputs into the decision process? (2) What are the outputs or affordances of the decision process? And (3) what is the structure of the intervening transformation? All three must be explicit, enumerable, and symbolically representable.
The search for such algorithms in human institutions has been fruitful in a partial-equilibrium sense, so to speak. Greif (1994), for example, shows that, given certain cultural predispositions, various exchange institutions were at least temporary equilibria. Leeson (2012) shows that, given certain religious beliefs, ordeals were a self-enforcing system of justice. Ostrom (1990) shows that, given human propensities to moralize and punish, collective action problems can be reliably overcome in practice.
The general equilibrium problem, however, has been comparatively neglected in economics. To wit: if we endogenize culture and beliefs – and indeed stop taking cultural capacity for granted at all – how rich must we make the inputs and outputs to the decision algorithm to constitute a plausible analog to the decision processes in actual human institutions?1
There are several stylized facts that an algorithm must be able to explain:
A model will, at the very least, need more affordances than “cooperate” and “defect”, and many efforts have been made in this direction. Targeted punishment of free-riders would be a realistic addition. However – per (1) – punishing defectors is itself costly, and transforms the problem into a second-order free-rider problem, where punishment itself is a public good (Yamagishi 1986) with diffuse benefits and concentrated costs. It remains to be established that punishment itself is any more viable than cooperation in general.
Assortativity (Bergstrom 2003) would seem to be a general solution, where agents can take actions to increase or decrease the probability of entering a game with agents of other strategies. If cooperators can preferentially match with other cooperators, cooperation can be evolutionarily stable even if it is not a Nash equilibrium – that is, cooperation can proliferate even if it is costly on net for any individual. Similar thresholds for cooperation fall out of a variety of assortative structures, including kin altruism, group selection, network models, and reputation models (Bowles & Gintis 2011).
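The threshold has a simple canonical form in the additive prisoner’s dilemma (the notation below is a common convention for stating Bergstrom-style results, not a quotation of any particular model). Let $c$ be the cost of cooperating, $b$ the benefit conferred on one’s partner, and $\alpha$ the index of assortativity, the difference between a cooperator’s and a defector’s probability of being matched with a cooperator. Cooperators outperform defectors when

$$-c + b\,P(C \mid C) \;>\; b\,P(C \mid D) \quad\Longleftrightarrow\quad \alpha\, b > c, \qquad \alpha \equiv P(C \mid C) - P(C \mid D).$$

With sufficiently positive assortativity, cooperation can spread even though, holding the partner’s type fixed, defection is always the better reply.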
But like punishment models, the assortativity itself becomes a target for free riding, the staving off of which itself becomes a public good. Consider then the problem of maintaining assortativity in a dynamic system. Cooperators must at a minimum be able to identify each other with better-than-random probability (and we cannot assume recognizability without begging the question). Thus, cooperators signal their type to one another and avoid, exclude, or defect against other types.
Thus the complexity of a cooperative algorithm – or indeed of any algorithm in a signaling game – potentially depends on the entire array of affordances (i.e. signals produced) and knowledge of how they correlate with agent type. To the extent that cooperative agents can produce some combination of signals at lower cost than noncooperative agents, we get a separating equilibrium with cooperation.
However, in a dynamic setting, Goodhart’s Law erodes the success of any particular algorithm over time: when agents will cooperate on the basis of some set of observed signals, there is selective pressure for defecting agents to also produce these signals at low cost. Free-riding on informative signals – mimicry – is always possible in principle, given enough time in a dynamic setting, a process which must eventually destroy the informativeness of such signals. At the very least, punishing the deceptive production of such signals now becomes a second-order free rider problem.
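A toy simulation makes the dynamic concrete. Everything here – the Bayesian trust measure, the mimicry-adoption rule, and all parameter values – is stipulated for illustration only, not taken from any of the cited models: receivers trust a signal according to its current informativeness, and the more informative the signal, the stronger the selection pressure for defectors to adopt it.

```python
# Toy sketch: mimicry erodes a signal's informativeness over time.

def signal_informativeness(frac_coop_signalers: float,
                           frac_defect_signalers: float,
                           base_rate_coop: float = 0.5) -> float:
    """Posterior probability that a signal-bearer is a cooperator (Bayes' rule)."""
    num = frac_coop_signalers * base_rate_coop
    den = num + frac_defect_signalers * (1.0 - base_rate_coop)
    return num / den if den > 0 else 0.0

coop_signalers, defect_signalers = 0.9, 0.01  # initially the signal is honest
mimicry_rate = 0.15                           # defectors copy the signal each period

for t in range(25):
    posterior = signal_informativeness(coop_signalers, defect_signalers)
    # Selection pressure: the more receivers trust the signal, the more
    # defectors gain by producing it, so mimicry spreads faster.
    defect_signalers += mimicry_rate * posterior * (1.0 - defect_signalers)
    if t % 6 == 0:
        print(f"t={t:2d}  P(cooperator | signal) = {posterior:.2f}")
# As mimics spread, the posterior falls toward the base rate: the signal stops screening types.
```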
We are therefore caught on the horns of a dilemma. A model of cooperation with assortativity can get off the ground only by stipulating an informative signal, i.e. that there exists a set of signals that are reliably lower-cost for cooperators than for defectors. For example, basic kin altruism models typically take kin recognition for granted; reputation mechanisms (e.g. Kandori 1992) must foreclose the possibility of cheaply alienating identities (Harwick & Caton 2020); spatial and network models rely on the costliness or impossibility of travel (“viscosity”) to rule out migration of defectors into cooperative demes (e.g. Taylor 1992). But the dynamic problem is precisely that, if we do not stipulate such a signal, the informativeness of a signal of cooperation depends on the imposition of costs on false users – and we arrive right back at the second-order free rider problem. Barring that, we have only transient equilibria as algorithms tuned to particular signals become obsolete and exploitable, replaced – hopefully – by selection for new functional algorithms.
Thus for an algorithm to sustain empirically plausible cooperation, we must either solve the second order free rider problem, or content ourselves with churn and red-queen selection for longer and longer algorithms plumbing the combinatorial depths of affordances for informative signals (Harwick 2025). Learning and self-updating can be thought of as ways to increase the complexity of an algorithm, but do not solve the underlying problem if the learning and self-updating are themselves algorithmic. In short, given the open-ended nature of the affordances used in actual human life and the features of human social life necessary to explain, modeling human cooperation as an explicit algorithm must result in a dead end.
Based on the Church-Turing thesis, which implies that any well-defined mapping can be represented by an algorithm of some complexity, game theory and institutional economics have typically started with simple algorithms and enriched them as necessary to address specific aspects of human interaction. But our argument implies that we can never arrive at a general equilibrium theory of human cooperation this way, at least at finite length. It is not merely the complexity of actual human strategies; the point of models after all is to strip away extraneous complexity and understand the essentials in a tractable way. More important is that the open-ended affordance space of human behavior and observation, in conjunction with the signal-dissipating Goodhart dynamic, drive human strategies to cover the entire affordance space, in principle, and to self-update. Such a decision process could indeed in principle be modeled algorithmically, but it could hardly be less complex than actual human decisionmaking.
But even though algorithms can in principle represent all well-defined mappings, in practice there are nonsymbolic mapping processes that can be understood with considerably less complexity on their own terms rather than as algorithms. Indeed the human brain is one such, and it will be worth distinguishing the way human brains make decisions from the traditional algorithmic model.
The basic problem in decision theory is to establish a mapping between observations and actions. But the domain of sensory input – considered as audiovisual, tactile, or other sensory streams – is nonsymbolic (analog), as is the domain of human action, considered as a sequence of muscular movements. In human decisions, symbolic representation arises – and even then only sometimes – in the intervening process by which we translate nonsymbolic sensory streams into meaning, which is then translated into nonsymbolic action.2 In other words, action that we recognize as intentional is informed by semantically meaningful representations (inferences, beliefs, expectations) of unobservable but relevant states of the world that have been constructed and interpreted from sensory streams.
An algorithm can only deal with analog input after it has been interpreted. Interpretation involves discretizing analog input, encoding it into a form representable with a finite language. So most game theory, outside of the subset specifically dealing with signaling games, starts in the middle of this process, with observations already interpreted into formal set-theoretic language, and ends before the action taken is translated back into concrete action.3 A finite automaton takes as its input the history of the game, represented as a sequence of ‘cooperate’ or ‘defect’ pairs, perhaps with some stochastic error, and returns an element of the action space. In the formal representation of the game, the player does not have to ask “what did he mean by this?” or “what do I do about this?” By contrast, in actual life it is often not immediately clear whether a concrete action counts as cooperation or defection, nor is it always clear, once a formal action has been decided upon, what concrete actions actually implement it – and indeed these ambiguities may be capitalized upon strategically, both offensively and defensively. As Rubinstein (1991) notes, “it is rare that a situation involving a conflict of interests is described clearly and objectively by a set of rules.”
The analog vs symbolic distinction would not be problematic for algorithmic models of human decisionmaking to the extent the interpretive process is unambiguous. If this is the case, differences in practice can be catalogued as biases by behavioral economists and social psychologists, and might be empirically interesting, but do not bear on the basic theory. A camera for example can straightforwardly digitize analog visual input, even if individual cameras output images with differences in color balance. Interpretation always loses information compared to the original analog stream, but with a sufficiently lengthy representation, can be made arbitrarily close to the analog source (compare for example the file size of photos generated by a 12 vs a 48 megapixel camera).
But compared to the mapping from visual input to a digital image, the mapping from observed cues to semantic representations of unobserved states of the world, and a fortiori the mapping from observed cues to actions, is not – and cannot be – continuous or well-behaved. The disjunction between these spaces is known in computer science as the Semantic Gap, and due to the Goodhart dynamic described above, no fixed encoding function from cues to semantic representations or to actions can be dynamically stable without active policing (which, again, raises the second order free-rider problem). Over time, we must assume free riders learn to emulate any fixed set of cues.4
Thus, small changes in objective cues may lead to wildly different classifications, especially in an adversarial game where the other party has an incentive to falsify the cues he produces (consider, for example, the search for “tells” if one suspects someone is lying). On the other hand, wildly different objective cues may end up in the same class. A stray mark may completely change the character of a painting, but two completely different paintings – or even a painting and a song – may evoke the same feeling.
For this reason there is no “true” encoding function to approximate, as in the case of the camera. When the output of an interpretive process is beliefs and expectations that guide a player’s actions, other players will always have an incentive to misrepresent: to produce signals that lead the player to believe the state of the world is other than it is, inducing him to take actions that benefit the misrepresenting party at his own expense. The mapping from observations to meanings must therefore always be mutable.
Thus in the case of human decisionmaking, the act of interpretation is not unambiguous with respect to the “objective” set of signals: perception is mediated by mental models or frames (Goffman 1974; Devereaux & Koppl 2024). We perceive, not things as they are, but the classes we place them in, partly consciously and largely unconsciously (Hayek 1969).5 At the level of conscious decisionmaking, Nozick (1969) describes how a change in classification can change the dominant strategy, despite nothing about the underlying cues changing.6 Framing effects also bedevil experimental economics: experimenters must be sensitive to the fact that participants can construe the task in ways that are different from the experimenter’s intended construal, especially in cross-cultural research. For example, a participant might assign social valence to actions the experimenter intends to regard as purely instrumental, and behave in unexpected ways.
Economists have traditionally considered such framing effects as artifacts of “bounded rationality” (Simon 1957). An algorithm acting on unambiguously encoded interpretations would presumably be invulnerable to such framing effects, but actual humans approximate it using imperfect heuristics, although these are sometimes acknowledged to be “ecologically” rational (i.e. functional within normal contexts [Gigerenzer & Brighton 2009]). North (1990), despite emphasizing the cognitive-perceptual aspect of institutions, nevertheless treats them as necessitated by cognitive imperfections, and Geanakoplos (1992) reconciles the strong conclusions of Aumann’s (1976) agreement theorem to the reality of disagreement by assuming “mistakes in information processing”.
By contrast, the Goodhart problem suggests we should regard such departures, not as unfortunate but inevitable computational limitations, nor as kludges of an evolved system, but as necessary adaptations to an open-ended and adversarial world.
Standard formalizations of game theory, following Aumann (1976), rule out framing effects by regarding the state space as an objective fact, and requiring that it be partitioned by agents in a manner independent of their own actions (ruling out categories like “the horse I bet on”, which depends on my own actions), even if agents may partition the space differently. But from the perspective of actual human cognition, an act-independent state space is fundamentally a contradiction. States of the world cannot be classified in a manner independent of our own actions, because the very purpose of perception is as a prelude to action, and mental representations are largely encoded as potential actions (Clark 2001: 93). Claxton (2015: 64) argues that “perception’s job is scoping out the possible ‘theatre of action’ – a sense of all the things that current circumstances permit me to do – so that I can select and craft my actions appropriately.” The delineation of “the horse”, “to bet”, and other semantic components of the category as meaningfully discrete objects is not an objective feature of the world, but a perceptual artifact that marks useful potential subjects of action. This will be more true the more abstract and socially constructed the arena of action, for example voting. Much human perceptual architecture is shared, hence the appearance of relatively objective category boundaries for more concrete objects. But in general, agents with different affordances, different utility functions, and so on, will not merely disagree about the state of the world; they will potentially disagree, as in Nozick’s example, about what constitutes a class of events.
Thus, unlike the standard Aumann axiomatization, in an adversarial world with open-ended affordances we should expect any mapping with an action codomain to be: (1) discontinuous with respect to objective cues, with small changes in cues potentially producing wildly different classifications and wildly different cues the same classification; (2) mutable over time, as adversaries learn to mimic any fixed set of informative cues; and (3) dependent on the agent’s own affordances and goals, rather than a partition of an act-independent state space.
Having considered the diachronic problem of how interpretive systems are shaped by an adversarial world, consider now the synchronic problem of how such a system, shaped as it is, approaches an adversarial world.
Specifically, consider how a recurring prisoner’s dilemma would be approached, not by a finite automaton in a formal model, but by a connectionist system like the human brain in an open-ended world. Human infants exhibit prosocial behavior at 14-18 months of age (Tomasello 2009): helping and cooperating, as well as shunning or punishing antisocial behavior (even as third parties). All this appears well before the language faculty, and even before object permanence, suggesting that prosociality is deeply ingrained at a functional level, and that the early human environment was sufficiently assortative to make this pay off. In order to implement such a strategy, however, the problem becomes: (1) how to construe the game at hand (what counts as cooperation or defection)? And (2) what cues to use to determine whether the opposing player is a cooperator or a defector type? These interpretive problems must be solved through learning – hence the significance, and the variety, of human culture and institutions.
A connectionist system is structured as an activation network of neurons, in principle simple gates that transmit a signal to further neurons when the signals they receive cross an activation threshold.7 Suppose, to make the weakest possible assumption, an infant enters the world with neurons connected entirely randomly, such that there is no systematic relationship between stimulus and response (this is the starting point of Hayek 1952).8 Over time, given some minimal reward function mapping states of the world (especially those resulting from one’s own actions) to a valence, connections that result in actions that bring positive results will be reinforced, and connections that result in actions that bring negative results will be pruned.
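A minimal sketch of such a system (our own illustration; the update rule is a generic reward-modulated scheme, not a claim about Hayek’s or any specific neural model): a randomly initialized cue-to-action network whose connections strengthen after rewarded actions and weaken after punished ones, with no explicit if-then rules anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cues, n_actions = 20, 4
# Initially random weights: no systematic relationship between stimulus and response.
weights = rng.normal(0, 0.1, size=(n_actions, n_cues))

def act(cues: np.ndarray) -> int:
    """Pick an action stochastically according to current activation strengths."""
    logits = weights @ cues
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(n_actions, p=probs))

def update(cues: np.ndarray, action: int, reward: float, lr: float = 0.05) -> None:
    """Reward-modulated update: strengthen connections that were active when the
    outcome was good, weaken them when it was bad."""
    weights[action] += lr * reward * cues
```

Nothing here enumerates categories in advance; whatever classification emerges is implicit in the learned weights.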
The result is that “similar” input results in similar patterns of activation, where similarity is judged by relevance to the agent’s goals – which as we have seen must be discontinuous with respect to the input. Having been cheated, or having observed someone else being cheated, the brain learns to associate those cues with the free rider concept and respond with appropriate actions. Such a system “naturally classifies and generalizes. All initial states in the same basin converge to the same attractor and hence are classified as identical” (Kauffman 1993: 228). Thus even an agent with inbuilt prosocial tendencies must learn to classify input in order to discriminate between cooperative and noncooperative types, but we may imagine counterparties categorized as such on the basis of completely disparate cues. As Tolstoy might have said, every free rider is untrustworthy in his own way.
Furthermore, unlike an algorithm, categories of varying breadth can arise endogenously based on reinforcement rather than being prespecified. By contrast with the formal structure of the prisoner’s dilemma, various forms of free riding or defection must be dealt with in different ways, because one’s own defection or punishment maps to different concrete actions in different situations. One may stop doing business, one may initiate a lawsuit, one may shout, etc. Even if the ‘cooperator’ and ‘defector’ categories are innate enough to think of humans as broad-domain altruists, both the translation of cues into those categories, and the translation of those categories into action, must be learned inductively.
The result is, like an algorithm, a mapping between inputs and outputs. Unlike an algorithm, the intervening process consists in connections whose continual updating along Bayesian lines is built into the construct, without a clear distinction between memory and program, and without the need for constructs like self-updating algorithms.9 Also unlike an algorithm, the connectome is nonsymbolic, although (as the language faculty shows) certain structures can emulate explicit algorithms through symbol-processing capabilities.
On the one hand, the fact that this process is individualized provides additional protection at the population level. As with immune systems, to the extent there exists variance in the mapping function across individuals in a population, strategies optimized to exploit one individual can fail to generalize. There may be some people who will fall for Nigerian Prince scams, but a scammer does not know in advance who they are, which limits the scale of specialized parasitic strategies.
But on the other hand, these mental models are largely shared, partly because of shared low-level architecture (I can be reasonably certain that another human will identify discrete objects in an image in the same way I do), and partly because, despite the persistent threat in social dilemmas, human social life presents itself to members at a low level as a coordination game such that it pays to construe recurring games in the same way as the rest of the population (and indeed punishment can convert social dilemmas into coordination games). The cognitive-perceptual element of institutions pointed to by North (1990) and Greif (2006) can be assimilated to the “rules of the game” definition this way: when mental models converge across a population (Denzau & North 2004), they become constitutive rules defining a recurring game such as a credit transaction or a chess game. While mental models will never converge entirely (Devereaux & Koppl 2024), institutions, as sets of constitutive rules defining recurring games in human society, nevertheless stand as “points of orientation” (Lachmann 1971), and become intersubjective rather than merely solipsistic. These shared frames can stabilize meaning, but – as we argue below on bureaucratization – always at the risk of exploitation. Institutions must, therefore, be understood as intrinsically dynamic even if the basic perceptual architecture of the humans constituting them is stable.
A connectionist system is a feasible implementation of the requirement for a discontinuous and unstable mapping function between input and action. Thus in the case of concrete social dilemmas, the brain can learn both over the course of development and in day-to-day interactions (1) how to construe a concrete situation as a social dilemma, and (2) what sensory cues reliably indicate a cooperative partner, even in one-shot interactions. Historically, ethnic, religious, and sartorial markers were used or developed for this latter purpose (Harwick 2023) and can remain in that role if policed against free riders; in modern societies, cues of authenticity serve a similar purpose due to large communities necessitating an even more fluid association between concrete signals and trustworthiness (Greif 2006b describes the breakdown of the former regime as the cost of falsifying signals of communal membership fell). By comparison to a rigid algorithm, this constant process of Bayesian updating and category induction – though by no means invulnerable to gaming – is much better equipped to deal with the dynamical problem of maintaining sufficient assortativity to stabilize cooperation.
The classical computer and the human brain are reasonable exemplars for the algorithmic/interpretive distinction. But the human brain is not the only connectionist system, and connectionist systems are not the only interpretive systems. Likewise classical computers are not the only algorithmic systems.
The human language faculty for a long time gave the impression that symbol-manipulation was intrinsic to intelligence and cognition, an assumption central to a tradition in analytic philosophy running from the formalism of the logical positivists to later computationalists. But nonhuman biological neural systems also categorize input along the same lines, even without the recursive symbolic capacity that humans (and only humans) have. Indeed, on this view, the explicit algorithm-emulation that humans do is a culturally scaffolded skill exapted from the language faculty, not even intrinsic to the language faculty itself, much less to cognition in general. Just as humans face the problem of matching observed cues to an innate prosocial strategy, nonhuman animals face the problem of interpreting open-ended observations in the service of their own innate strategies. A robin is attuned to cues that identify her own hatchlings, a squirrel to cues of underground caches, and so on.
Artificial neural networks (ANNs) likewise work on the same connectionist principles as the brain, although the architectural specifics differ. ANNs are decades old as a concept, but it is only in the past few years that the computational capacity and architectural refinement have progressed to the point of sophistication rivaling a human brain in tasks like text or image classification. Although ANNs do operate on digitized input such as text or photographs, the distinction between interpretive and algorithmic systems is not simply a matter of taking analog versus digital input, but the interpretation of input semantically versus symbolically, with a discontinuous and unstable mapping between the two. Consider two photographs of the same object from a slightly different angle that have no pixels at all in common. It would be very difficult to write an explicit and general algorithm classifying them as similar. But considered as a representation of a visual scene – that is, a depiction of a theatre of action – an artificial neural network trained to classify visual input in the same semantic space as a human brain can identify them as depicting the same object.
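Schematically (an illustration only; embed below stands in for any trained image encoder mapping a photograph to a vector in a learned semantic space, and is hypothetical rather than a specific library’s API): pixel-space distance between the two photographs is large, but distance in semantic space is small.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in a learned embedding space, rather than in pixel space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# pixel_distance(photo_a, photo_b) would be large: the two images share no pixels.
# But a trained encoder maps both into the same region of semantic space:
# cosine_similarity(embed(photo_a), embed(photo_b))  # close to 1.0 for the same object
```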
ANNs do run on an algorithmic substrate. But it would be a mistake to regard AI decisionmaking as “algorithmic”, as if this were a synonym for computer-based. Indeed, ANNs exhibit many of the same epistemic features as humans – but not classical computers. Most significantly, unlike algorithms, whose execution can be traced step by step to determine why a given input produced a given output, this is possible with neither natural nor artificial neural networks (Sørgaard 2023), leading to a field of “interpretability” research that stands in the same relation to ANNs as psychology does to human minds (Xu & Yang 2025; Lindsey et al. 2025). In principle, the attractor state constituting the output of a perceptual event may consist in a pattern covering the entire connectome.
On the other side of the dichotomy, rule-governed symbol manipulation can be done by a variety of non-electronic systems. DNA is a significant example, whose codons can be thought of as algorithmic instructions for the construction of proteins (Deacon 2021), out of which life as we know it on earth is constructed. On the other hand, a genome at the population level over phylogenetic time might be thought of as a non-connectionist interpretive system, in the sense that selection acts as a reward function adapting a genome over evolutionary time to an open-ended space where it faces adversarial interactions with conspecifics, parasites, and predators alike (and these categories may have substantial overlap). The output space is phenotypic rather than behavioral or semantic, and the mechanism is neither computational, except in a very loose sense, nor mediated by semantic representations. But in broad terms the same adversarial considerations apply, and analogous features may be expected: the genome “learns” to adapt to its environment, but because the genome is rigid and rule-bound at any point in time, it does so via the sort of selective churn that we argued was unnecessarily pessimistic in the case of human behavior.
Thus the relationship between algorithmic and interpretive systems is not necessarily a dichotomy, but a stack, sometimes with multiple layers. Interpretive systems can be built on an algorithmic substrate, and though the symbolic content of the substrate (the firing of the neurons of a natural or artificial system) may be perfectly perspicuous, the semantic import of that content will remain opaque. Because the mappings between the perceptual, semantic, and action domains are discontinuous and unstable, as Hayek (1952: 179) argued, “we shall never be able to bridge the gap between physical and mental phenomena”. Phenomenology, the subjective “inside” view, can never be totally reconciled to the neurological “outside” view, in the sense of describing phenomenological states entirely in neurological terms.
These features of the mapping between sensory input and semantic space have been widely noted since Hayek (1952). But that these features should be necessary for strategic reasons has not, at least not in economics.
Classification in an interpretive system is based on holistic similarity to exemplars, rather than explicit if-then statements.10 It is not simply that the input space of interpretation is continuous, but that the number of dimensions is undefined in advance: the system cannot necessarily know what it is looking for.
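A bare-bones sketch of the contrast (illustrative only; the fixed-length feature vector is itself a simplification, since the text stresses that the relevant dimensions are not known in advance): classification proceeds by distance to stored exemplars rather than by explicit conditions over prespecified features.

```python
import numpy as np
from typing import List, Tuple

class ExemplarClassifier:
    """Assign inputs to the category of the nearest stored exemplar."""
    def __init__(self) -> None:
        self.exemplars: List[Tuple[np.ndarray, str]] = []

    def learn(self, features: np.ndarray, label: str) -> None:
        """Store an encountered case as an exemplar of its category."""
        self.exemplars.append((features, label))

    def classify(self, features: np.ndarray) -> str:
        """Return the label of the holistically most similar exemplar,
        with no explicit if-then conditions over particular cues."""
        distances = [np.linalg.norm(features - ex) for ex, _ in self.exemplars]
        return self.exemplars[int(np.argmin(distances))][1]
```

Contrast this with an explicit rule of the form “if cue 3 exceeds 0.5 and cue 7 is below 0.2, classify as defector”: the exemplar store can grow indefinitely, and no enumeration of which cues matter is ever written down.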
Although an algorithmic expansion of an interpretive mapping is always possible in principle, it is not always possible at finite length. This poses hard limits both to human introspection and to AI interpretability research. It will not always – or even often – be possible for a person to explain why he perceives something the way he does, or why he took the actions he did, because to do so would require the translation of a vast nonlinear classification system over an unprespecified set of cues into a linear language-based algorithm. Any such fully faithful explanation cannot come from the system itself, but only from a system of greater complexity (Hayek 1952: 189).
To the extent “consciousness” is defined as self-modeling, this poses a dilemma for philosophy of mind. If we are to define consciousness meaningfully, it must be compatible with the opacity result that a mind cannot explain itself fully. Indeed, verbalized introspection tends to be post-hoc, using the same faculties we use to explain the behavior of others (which is not to say that introspection is never, or even only rarely, accurate or useful, any more than the attribution of intention to others is inaccurate or useless). That this is a structural limitation can be seen in the fact that chain-of-thought tokens output by reasoning models are similarly post-hoc and not necessarily reflective of the model’s actual reasoning process (Chen et al. 2025; Lanham et al. 2023). Thus if we do model ourselves using the same capacity we use to model others, the metaphysical question “is it conscious?” is not separable from the epistemic question “how do we know whether it is conscious?” – meaning it is inseparably tied to the specific structural-functional homologies relied upon by the interpretive capacity of the human brain. Lacking the specific context of structural-functional homology, it is not clear that the question “can AIs (or animals, or aliens, or…) be conscious?” is meaningful.
This suggests that a theory of moral value (which we may regard as a foundational institution in any society) cannot be based on gradations of consciousness. Consider the strategic limitations placed on moral philosophies by the requirement that they be, at least, evolutionarily stable (that is, compatible with their own persistence).11 Because we limit the space of moral rules (and institutions more broadly) by evolutionary stability and not Nash equilibrium, moral behavior can be genuinely altruistic and self-sacrificing. However, it cannot be both altruistic and universalist (Choi & Bowles 2007). A minimal requirement of an institution, and by extension of a moral norm, is to maintain the assortativity within which cooperation is stable. The question of moral value – whom is it obligatory to cooperate with under what circumstances? – concerns precisely this.
It may be the case that propositionally universalistic philosophies are not functionally universalistic in practice; indeed this has likely been the case since the dawn of the Axial age due to viscosity in population movement. However, the increasing latitude that modern social organization and connectivity afford for taking propositional content seriously (Harwick 2023) and expanding one’s circle of moral concern suggests, at least, the desirability of a moral philosophy that would remain evolutionarily stable if its propositional content were taken seriously.
The fact that human decisionmaking is fundamentally interpretive and nonsymbolic rather than algorithmic has practical implications for how we think about human knowledge, and downstream of that, how we approach architectural questions about large-scale human cooperation such as the legal system or the rule of law.
If an algorithmic expansion of an interpretive rule is not feasible, the interpretive rule encodes “tacit knowledge” (Hayek 1945), that is, real knowledge that can be acted upon, but not articulated or justified. The concept was used by Hayek to make the point that the algorithmization of economic allocation (i.e. central planning) would necessarily fail to account for a great deal of load-bearing knowledge in the economy. It is not merely that the “man on the spot” with “knowledge of time and place” has more knowledge of the relevant resources, but that entrepreneurship with respect to a resource is an interpretive process, and an algorithmized economic plan will fail to account for a great deal of knowledge, even if it has access to the exact same perceptual data as the man on the spot.
The argument here, however, also implies the strategic functionality of tacit knowledge. Especially in adversarial situations, holistic and exemplar-based classification has the potential to account for a much greater set of cues than would be possible explicitly (Harwick 2025), a crucial advantage against free-riding mimics of cooperative signals. Indeed, the need to stay ahead of mimics in the Goodhart arms race is a plausible driver of human intelligence (Cosmides & Tooby 2005).
Besides the architectural stacking discussed earlier, algorithms and interpretive systems can stack institutionally as well, and indeed many processes that we are used to considering as algorithmic are only so after an act of interpretation. Law, for example, is sometimes idealized as the algorithmic application of rules to cases. But the question at hand in any given legal case is not generally “what is the appropriate remedy for this kind of case”, but “what kind of case is this?” A defendant and a plaintiff bring a case precisely because they place the situation in different categories, or at least have opposing interests as to which category the situation is placed into. A judge can deterministically apply the law after he has made the interpretive judgment placing the case into a relevant category. Suppose the rules for judging a situation to be murder versus self defense were able to be specified in advance. Then the Goodhart problem implies that, over time, murderers would be increasingly able to produce signals inducing the judge to place the case in the category of self-defense. The judge must be able both to take account of an unprespecified body of evidence, and to update the rule-in-practice should an existing rule-in-practice prove inadequate to the case (and he must have a basis for judging the existing rule-in-practice to be inadequate). Judges can, after all, make reasonable determinations in wholly unique cases (indeed every case is unique along many dimensions).
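The stack can be caricatured in a few lines (a toy illustration with invented category and remedy names, not a claim about any actual legal procedure): the remedy lookup is algorithmic and perfectly explicit, but it is only well-defined after an interpretive judgment has assigned the case to a category.

```python
# The explicit rule book: deterministic once the category is fixed.
REMEDIES = {"murder": "conviction", "self_defense": "acquittal"}

def decide(case_evidence, classify):
    """`classify` is the interpretive step: some exemplar-based judgment mapping
    an open-ended body of evidence to a legal category. Only after that judgment
    is the algorithmic step, the remedy lookup, well-defined."""
    category = classify(case_evidence)
    return REMEDIES[category]
```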
Thus it cannot be the case that (in the common crypto mantra) “code is law”, as if law could dispense with the interpretive base. But the institutional stack can be inverted, with semantic meaning attributed to algorithmic outputs. For example, the proposal to use immutable blockchain ledgers for titling of physical or intellectual property (Allen et al. 2021) provides an objective and algorithmic process for the transfer of an electronic asset. However, to regard this electronic asset as a title to, say, a piece of land, requires it to be an input into a legal interpretive process.
In both cases, interpretation is necessary in adversarial subgames of an institution, and algorithms suffice for cooperative subgames (hence their substitutability on some, but not all, margins). This can formalize Schumpeter’s (1942) worry about overbureaucratization of both private and public institutions: an overbureaucratized (that is, beholden to explicit processes) corporation will not be able to adequately respond to the novelty necessarily generated by adversarial market competition. Indeed the novelty generated by entrepreneurs in the Schumpeterian tradition can be seen as the analog to the Goodhart process within the bounds of market competition. In an open-ended space of consumer wants, it will always be possible to produce a product consumers prefer over the status quo, although it will not necessarily be predictable. Thus entrepreneurs must use judgment (i.e. an interpretive process) both to proactively seek such possibilities, and to respond to competitors doing so (Foss & Klein 2012). An overbureaucratized corporation will have difficulty at precisely this point.
In public institutions similarly, the rule of law is sometimes idealized as algorithmic, predictable, and general. Nevertheless, large polities with bureaucratic procedures for things like permits or benefit qualifications will face an unfavorable tradeoff between errors of omission and errors of commission. If denying a valid claim is politically costly, many invalid claims will be approved (for example benefit programs). If approving an invalid claim is politically costly, many valid claims will be denied (for example construction permits). The more formalized the decision procedure, the worse this tradeoff becomes – and indeed the Goodhart problem implies it will become worse over time for any level of bureaucratization, unless interpretive processes supervene to update bureaucratic rules.
There exists a long tradition in the social sciences of an interpretive or hermeneutical approach (Lavoie 1990), reaching into economics through economic sociology (Weber 1956), continental phenomenology (Bergson 1934; Mises 1966), and Verstehende or Gestalt psychology (Hayek 1952). It has, however, been almost entirely eclipsed since the first half of the twentieth century by an axiomatic-formalist approach that assumes away the interpretive process.
Game theory was developed during the rising tide of this formalist approach, and in no small part contributed to its triumph. By comparison with the obscurantist tenor of continental philosophy, formalism was precise, mechanical, and lent itself to quantitative study. But the dilemma today is no longer between logical precision and faithfulness to human interiority: we now have a well-developed theory of the mechanics of connectionist systems as interpretive devices, as well as the more recent technological ability to implement full-fledged and commercially viable connectionist systems in silico.
By contrast with both classical and evolutionary game theory, which have no role for the interpretive process, it will be worth sketching a phenomenological approach to game theory. This does not demand a rejection of standard game-theoretic tools. However, it does entail flipping the stack, so to speak, with the signaling problem as a superset rather than a subset of games. And it is worth bearing in mind that the motivation for such a regrounding is not (just) the greater verisimilitude to actual human cognition, but the prospect of a general equilibrium theory of human cooperation against the Goodhart tide that traditional game theory has yet to offer.
Game theory has traditionally been divided into classical and evolutionary branches, which share many of the same formal tools, but differ in their underlying processes and scope of application. In classical game theory, like standard economics, individuals are forward-looking and rational: they make decisions and arrive at rules for behavior on the basis of their expected utility. The knowledge assumptions for a classical game-theoretic analysis, however, are stringent and presume unrealistic sophistication. Players must have common knowledge of the game structure and of the rationality of other players,13 and be computationally unbounded optimizers.
On the other hand, in evolutionary game theory, as in behavioral ecology, individuals are backward-looking and myopic: they make decisions on the basis of hardcoded rules that may or may not be conditioned on memory or present observation, and equilibrium is achieved when one such rule (or a set of such rules) can maintain robustness against randomly introduced new strategies. By comparison with classical game theory, such automata seem substantially less sophisticated than actual humans. Evolutionary game theory is often used in accounts of animal behavior, but sometimes human institutions are given an evolutionary analysis by authors who downplay the powers of human rational faculties (e.g. Hayek 1982; Henrich 2016).
Both approaches start with a formal description of the game. In classical game theory, as noted before, the analysis starts where players’ interpretive process ends. Players must be assumed to understand the game in the same way. In evolutionary game theory, the modeler does not ascribe any interpretive powers to the players, but the modeler himself interprets the situation, removes extraneous features, and sets up a stylized game.
On top of these foundations, signaling games can be constructed. The game structure may include affordances for communication; these may be more or less costly, and they may be entirely arbitrary as to their content. Equilibria are then solved for, which can result in separating or pooling equilibria, depending again on various features of the formal setup. On this approach, signaling games are a subset of the entire space of games.
By contrast, the phenomenological approach takes the signaling problem to precede the formal specification of the game. Thus, signaling interactions involving the interpretation of open-ended signals are a superset of the set of games.
In the barest core of the phenomenological approach, an agent is conceived as some system having (1) a set of perceptual capabilities such that the outside world can affect its internal state, (2) a set of affordances such that its internal state can affect the outside world,14 and (3) some evaluative function over states of the world, which may be inside (as in classical game theory – a utility function) or outside (as in evolutionary game theory – a fitness function) the boundaries of the agent itself. The problem facing the agent is to translate perceptual input (observation) into appropriate actions to affect the state of the world in such a way as to climb the hills of its evaluative function. In contrast to classical game theory, which begins with an objective state space which agents may partition in various ways to represent uncertainty, the core epistemic primitive in phenomenological game theory must be a signal space (Harwick 2025, for example, takes this approach).15
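The setup can be stated as a minimal interface (a sketch under our own naming conventions, not a formalization from the cited work): the three components above become three methods, and nothing about the world type is prespecified as a state space.

```python
from typing import Protocol, TypeVar

World = TypeVar("World")        # open-ended signal stream, not a prespecified state space
Internal = TypeVar("Internal")  # the agent's internal state
Act = TypeVar("Act")            # the agent's affordances

class InterpretiveAgent(Protocol[World, Internal, Act]):
    def perceive(self, signal_stream: World) -> Internal:
        """(1) Perceptual capabilities: the outside world affects the internal state."""
        ...

    def act(self, internal: Internal) -> Act:
        """(2) Affordances: the internal state affects the outside world."""
        ...

    def evaluate(self, world: World) -> float:
        """(3) Evaluative function: a utility function (inside the agent) or a
        fitness function (outside it) over resulting states of the world."""
        ...
```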
In this minimal setup, strategies are not encoded as algorithms to be executed in a formal game, but as responses to observational cues, which can then be abstracted into a formal game. Of course, just as not every exercise in applied game theory needs to resort to the epistemic formalisms behind the common knowledge constructs provided something like common knowledge is empirically plausible, neither will they need to resort to an explicit signal space in many ordinary cases. Indeed, the very purpose of institutions is to harness the human social and communicative faculties to create shared mental models, to establish common knowledge as to what counts as cooperation, defection, and so on, in what contexts – precisely what makes it possible to conduct a classical game-theoretic analysis.
To the extent we are content to start an analysis from an equilibrium ruleplex like this, standard analyses suffice. To the extent we wish to discuss institutional change, behavior in the absence of an established institution, or the origin of the human proclivity to coordinate such models (including institutions and language) in the first place, then we must recur to a phenomenological approach rather than an objective state space.
The interpretation of the game may happen within or outside of the agent’s own boundaries. To encompass classical game theory, where interpretation is done by the agent himself, we note that the agent’s utility function depends on unobservable states of the world, including future states. The mapping function within the agent is sufficiently complex that we may decompose the mapping into (1) mapping from observations to inferences or expectations about these unobservable or future states, and (2) mapping from inferences and expectations to concrete actions.
By explicitly considering the phenomenological problem before the formal solution, we thus consider expectation formation, not as a background to strategy selection, but as a core component of complex strategies involving internal representation. The question of common knowledge, upon which much philosophizing and soul searching has been done in game theory (see e.g. Binmore 1987), is fundamentally a question of response to cues – and no wonder that should seem a loose end in a formal algorithmic approach.
To this point, though the principles of human neural organization have been well understood at an abstract level for decades, the formalistic turn in game theory has cemented an algorithmic model of behavior that is not just a simplifying assumption, but an important theoretical roadblock to developing a theory of cooperation in an open-ended world. By taking both human and artificial perception to be interpretive systems situated in an open-ended world, we raise dynamic problems that are entirely invisible when modeling algorithmic systems in closed worlds.
Now, with widespread artificial connectionist systems to suggest what is inherent to connectionist systems in general as opposed to specific to human or mammalian brains, we are in a position to reground the epistemology of models of strategic interaction in a manner both tractable and faithful to the way actual humans make decisions in open-ended worlds.