Abstract. If the basic purpose of moral norms is to coordinate on the conditions under which one should cooperate in social dilemmas, this paper shows that the boundaries of such conditions must be fractal. In other words, as one focuses on the border of the area in signal space where the best response flips from cooperate to defect, the adversarial nature of a social dilemma means there must always be some possible detail, otherwise irrelevant, that can flip the best decision, and some point can always be found that will be undecidable at any fixed resolution. Thus any finite-length moral code, as an approximation to that infinitely detailed boundary, faces a tradeoff between leaving gains from cooperation on the table, and vulnerability to exploitation. Implications are discussed as to the intrinsically dynamic nature of norms and institutions, the impossibility of identifying law with morality, and runaway cultural selection.
All rules, from contracts to norms, have boundaries of applicability. A rule like “the bishop may only move diagonally” applies only in the context of a game of chess. But even the most categorical moral rules, like “thou shalt not kill”, have generally recognized exceptions, such as war or self-defense.
It’s difficult to imagine a situation where two people would disagree over whether they are playing chess, and thus whether the bishop rule applies. But in adversarial games, players have opposing interests as to whether a rule applies. In a murder trial, for example, much hinges on whether a concrete action counts as self-defense. In such games, unconditional rules – fiat iustitia ruat cælum – are exploitable, and thus cannot survive.
In legal cases, the applicability of a rule is determined by a court. In interpersonal situations, applicability is a matter of judgment. One might hope for a set of necessary and sufficient conditions, announced in advance, for whether a rule applies. This may be costly to judge – hence the necessity of specialized courts and the difficulty of personal judgment – but one might nevertheless hope it will be deterministic.
This paper shows that moral or legal judgment in adversarial games cannot be deterministic, in the sense of there being an enumerable set of necessary and sufficient conditions for when a rule applies. It is not simply that such a set is too costly to enumerate, but that any finite enumeration will face undecidable situations. The basic problem is that the boundary of applicability is fractal – that is, infinitely detailed, such that the decision can hinge unpredictably on any single signal. And while that might be an academic consideration for cooperative endeavors like playing chess, where one rarely finds one’s self in ambiguous situations, in adversarial games, there are major payoffs to aiming for the undecidable boundary regions. One can, literally or figuratively, “get away with murder” – that is, defect without being punished.
The following section formalizes the fractal geometry of moral and legal rules with the idea of signal space, the total set of observable or potentially observable aspects of a situation. The boundary of a rule is defined in signal space – that is, one would like to judge the applicability of the rule based on observable aspects of the situation. But for any finite-length rule, there are an infinite number of situations where that rule will be misapplied. The paper discusses a number of applications to finance, law, and the evolution of altruism, and concludes with implications such as the intrinsic dynamism of norms and institutions, and under what circumstances ‘discretion’ can be higher-resolution than explicit ‘rules’.
We will formalize signal space as an infinite sequence of binary variables \(\{0,1\}^\mathbb{N}\), representing all observable aspects of some decision problem.1 For example, “the other party smiles when shaking my hand” may take a 0 or a 1 value, as may “the paint on the wall is green”. Clearly the entire array will not be relevant for every interaction, but the entire array consists of variables that I can observe should I judge them relevant.
By Cantor’s diagonal argument, the set of infinite sequences of binary variables is uncountable. Define a function \(f: \{0,1\}^\mathbb{N} \to [-2,2]\times[-2i,2i]\) such that adjacent sequences (i.e. sequences with one element changed) will map to adjacent points in a 4×4 complex square. We can think, therefore, of the Euclidean distance between two points in complex space as approximating the Hamming distance between two input sequences in a local region. Thus we represent signal space by the complex plane, with any point corresponding to a definite situation, that is, a concrete set of signals.
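The paper leaves the construction of f abstract. As a minimal illustrative sketch (in Python; the bit-interleaving encoding is an assumption, not the paper’s definition), one can read alternating bits of a finite prefix as binary expansions of the real and imaginary coordinates, so that flipping a single low-weight signal moves the image point only slightly:

```python
# Illustrative sketch only: one possible f mapping bit sequences to the
# [-2,2] x [-2,2] square by bit interleaving. The paper leaves f abstract;
# this construction is an assumption for demonstration.

def f(bits):
    """Map a finite prefix of a {0,1}-sequence to a point in [-2,2] x [-2,2].
    Bits at even positions build the real part, bits at odd positions the
    imaginary part, each read as a binary fraction and rescaled to [-2, 2]."""
    def frac(bs):
        return sum(b / 2 ** (i + 1) for i, b in enumerate(bs))
    return complex(4 * frac(bits[0::2]) - 2, 4 * frac(bits[1::2]) - 2)

gamma = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # a concrete situation (prefix)
nearby = list(gamma)
nearby[10] ^= 1                                # flip one low-salience signal

print(abs(f(gamma) - f(nearby)))               # a small Euclidean displacement:
# under this f, Hamming-adjacent sequences map to nearby points, at least locally.
```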
Next, we will define a decision as a function \(d: \mathbb{C} \to \mathbb{C}\) that moves one to a different point in signal space. We can restrict decisions to quadratic functions of the form \(d_c(z) = z^2+c\), all decisions being uniquely described by some \(c \in \mathbb{C}\).2 Because for any z one can choose \(c = z-z^2\) so that z is a fixed point with \(d_c(z)=z\), the null decision \(d_\emptyset(z) = z\) – that is, the decision to remain at the same point z in signal space – is always available.
Our decisionmaker is a Kantian, meaning that if he is willing to apply a decision dc at z, he must also be willing to apply it at dc(z). Let \(d^n_c(z)\) be the nth iteration of a decision dc, e.g. \(d_c^3(z)=d_c(d_c(d_c(z)))\). From a given point \(z \in \mathbb{C}\) in signal space, we will define a decision space \(\mathcal{D}(z)\) as the set of decisions that, after repeated application from that point, do not diverge (they may converge or cycle):
(1) $$\mathcal{D}(z) = \{c \in \mathbb{C}\ \ |\ \ \lim_{n \to \infty} |d^n_c(z)| \nrightarrow \infty\}$$
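Numerically, membership in \(\mathcal{D}(z)\) is typically approximated with an escape-time test: once an orbit of a quadratic map leaves the disc of radius 2, it diverges, so an orbit that stays bounded over a finite number of iterations (the cutoff below is an assumption) serves as a proxy for non-divergence. A minimal sketch:

```python
# Sketch: approximate membership of a decision c in D(z) by iterating
# d_c(z) = z^2 + c from z and checking for escape past |z| = 2, a standard
# divergence criterion for these quadratic maps. max_iter is an assumed cutoff.

def in_decision_space(z, c, max_iter=500):
    for _ in range(max_iter):
        if abs(z) > 2:           # orbit has escaped: d_c diverges from z
            return False
        z = z * z + c
    return True                  # orbit stayed bounded (converged or cycled)

z = 0.1 + 0.2j                                # a concrete situation in signal space
print(in_decision_space(z, -0.4 + 0.3j))      # a viable decision from z -> True
print(in_decision_space(z, 1.0 + 1.0j))       # a divergent decision from z -> False
```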
Let π(z, d) be the payoff of decision d from point z, with the restriction
(2) $$\pi(z,d) > 0 \text{ if } d \in \mathcal{D}(z) \\ \pi(z,d) < 0 \text{ otherwise }$$
Regardless of the specific form of the payoffs (on which more below), they will be positive for decisions within the decision space, and negative for decisions outside it. Thus we can think of \(\mathcal{D}(z)\) as the set of evolutionarily stable decisions from a given situation z.
We can also define a stability locus j with respect to a decision d as the set of points in signal space whose decision space contains d, i.e. all points in signal space where that decision will result in a satisfactory outcome for the agent. Intuitively, one might think of it as defining the situations under which a decision is viable.
(3) $$j(d) = \{z \in \mathbb{C}|d\in \mathcal{D}(z)\}$$
The boundary of j(d) will be the Julia set \(\bar{j}(d)\), and j itself can be referred to as the filled Julia set (Yang 2002). In the quadratic case, no point in any j will exceed a magnitude of 2 (i.e. \(j(d) \subset \{w\ |\ |w| \le 2\}\ \forall d\)).
The Julia set, and by extension its filled counterpart, has some interesting properties. First of all, the set of all points c defining a decision function with a stability locus of positive area will be the Mandelbrot set. Defining \(\triangle j\) as the area of j,
(4) $$\mathcal{M} = \{c \in \mathbb{C} | \triangle j(d_c) > 0\}$$
This implies that \(\mathcal{D}(z) \subset \mathcal{M}\ \forall z\). That is, the Mandelbrot set defines a space of decision functions that can ever be viably applied, and \(\triangle j \to 0\) as c approaches the boundary of \(\mathcal{M}\). These are, in a literal sense, edge cases (Figure 2b). Of course \(j(d_c) \ne \emptyset\) even for \(c \notin \mathcal{M}\), i.e. j is not empty when c lies outside the Mandelbrot set, even if it has zero area and consists entirely of disconnected points.
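Equation (4) can be probed numerically. The rough Monte Carlo sketch below (the sample size and iteration cutoff are assumptions) estimates \(\triangle j(d_c)\) by sampling the 4×4 square and testing for bounded orbits; a c well inside \(\mathcal{M}\) yields a substantial area, while a c outside it yields an estimate near zero:

```python
import random

# Sketch: Monte Carlo estimate of the area of a stability locus j(d_c), per
# equation (4). Sample points z in the 4x4 square and test whether the orbit
# of z under d_c(z) = z^2 + c stays bounded. The sample size and iteration
# cutoff are assumptions made for illustration.

def bounded(z, c, max_iter=200):
    for _ in range(max_iter):
        if abs(z) > 2:
            return False
        z = z * z + c
    return True

def locus_area(c, samples=20_000):
    hits = sum(bounded(complex(random.uniform(-2, 2), random.uniform(-2, 2)), c)
               for _ in range(samples))
    return 16 * hits / samples      # fraction of the area-16 square that is stable

print(locus_area(-0.4 + 0.3j))      # c well inside M: clearly positive area
print(locus_area(0.5 + 0.6j))       # c outside M: area estimate near zero
```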
Consider now the traditional definition of a Julia set, i.e. the boundary of j when \(\triangle j > 0\). A point z is in the Julia set if points in the neighborhood of z display chaotic behavior under \(d^n\) as \(n \to \infty\) (and we may define the neighborhood of z as that set of input sequences with a Hamming distance of 1). Indeed, iterating the neighborhood will eventually cover all points in \(\mathbb{C}\).
Julia sets are also fractal, that is, infinitely detailed, and nowhere differentiable. In other words, for any \(z \in \bar{j}(d)\), an arbitrarily small motion in any direction can lead unpredictably in or out of the set j as d iterates. At the border of a stability locus, there is always the possibility that some previously ignored signal might turn out to be decisive.
Humans, of course, are finite creatures, with finite time and attentional capacity. Let \(\gamma \in \{0,1\}^\mathbb{N}\) denote the full signal sequence describing a situation. Define a set of signal indices \(\rho \in \mathbb{N}^{|\rho|}\), and a \(\hat{\gamma}^\rho\) such that \(\hat{\gamma}^\rho_n = \gamma_n\) if \(n \in \rho\) and 0 otherwise. That is, \(\rho\) is a \(|\rho|\)-length set of signal indices that an agent can pay attention to. We will call \(|\rho|\) the resolution with which an agent samples signal space, and \(\rho\) itself an agent’s worldview.
Earlier, the projection function \(f: \{0,1\}^\mathbb{N} \to [-2,2]\times[-2i,2i]\) was defined such that Euclidean distance in the codomain corresponds to a weighted Hamming distance in the domain. Suppose there exists an ordered list \(\rho^*\), with \(\rho^*_l\) being its truncation at length l, such that \(f(\hat{\gamma}^{\rho^*_l})\) is Lipschitz-continuous in l, and \((\forall l, \gamma)\ |f(\hat{\gamma}^{\rho^*_l})- f(\gamma)| > |f(\hat{\gamma}^{\rho^*_{l+1}})- f(\gamma)|\). That is, \(\rho^*\) defines the weights by which Hamming distance maps to Euclidean distance, and \(f(\hat{\gamma}^{\rho^*_l})\to f(\gamma)\) asymptotically as \(l \to \infty\).
\(\rho^*\), however, is unknown, and is not necessarily the ordering that converges fastest over a given subset of signal space. Thus, agents traversing different areas of signal space will have learned by induction different worldviews (which we may think of as bearing some amount of “overfit” to previous experience), which may do better or worse out of sample, and lead to disagreement about the “true” location in novel areas of signal space.3 Nevertheless, from the perspective of an agent with a given worldview, any \(\hat{z}^\rho = f(\hat{\gamma}^\rho)\) in signal space will constitute an estimate approximately within some radius of the true \(z=f(\gamma)\), with the expected radius an asymptotic function \(E[r(\rho)] \to 0\) as \(|\rho| \to \infty\).
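To sketch the convergence claim concretely (reusing the illustrative bit-interleaving encoding above, with a 64-signal prefix standing in for the infinite sequence; both are assumptions), truncating a sequence at increasing lengths l yields estimates whose distance from the full-resolution point shrinks as l grows:

```python
import random

# Sketch: the estimate z-hat from a truncated worldview converges on the "true"
# point as resolution grows. The bit-interleaved form of f and the 64-signal
# stand-in for the full sequence are illustrative assumptions.

def f(bits):
    """Map a {0,1} prefix to [-2,2] x [-2,2]: even-position bits give the real
    part, odd-position bits the imaginary part, read as binary fractions."""
    def frac(bs):
        return sum(b / 2 ** (i + 1) for i, b in enumerate(bs))
    return complex(4 * frac(bits[0::2]) - 2, 4 * frac(bits[1::2]) - 2)

gamma = [random.randint(0, 1) for _ in range(64)]   # the "true" situation
z_true = f(gamma)

for l in (4, 8, 16, 32):
    gamma_hat = gamma[:l] + [0] * (64 - l)          # attend only to the first l signals
    print(l, abs(f(gamma_hat) - z_true))            # the estimate radius shrinks with l
```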
A decision rule \(\hat{j}^{\rho}(d)\), then, is an approximation to a stability locus \(j(d)\), sampled at \(2^{|\rho|}\) points. An agent will consult a decision rule, not the actual stability locus, when ruling in or out a particular decision.4 One might think of \(\hat{j}^{\rho}(d)\) as the letter of the law, and j(d) as the spirit of the law.
THEOREM. The minimum-length description of a decision rule \(\mathcal{L}(\hat{j}^\rho)\), the information necessary to draw the outline at a resolution of \(|\rho|\), will be proportional to \(|\rho|\,\phi(j)\), where \(\phi(j) \in (1,2)\) is the fractal dimension of the boundary of the stability locus j.
PROOF. Because f projects from one dimension to two, incrementing the resolution by 2 will halve the radius r. By the box-counting, or Minkowski-Bouligand, definition of fractal dimension, the number of “balls” N(r) of radius r necessary to cover the border of outline j is proportional to \(r^{-\phi(j)}\):
$$N(r) \sim r^{-\phi(j)}$$
By this definition, if r halves, N(r) increases by a factor of \(2^{\phi(j)}\). Since r halves with every two additional signals, \(N(r(|\rho|)) \sim 2^{|\rho|\phi(j)/2}\). The informational length \(\mathcal{L}(\hat{j}^\rho)\), in turn, will be proportional to \(\log_2 N(r) \propto |\rho|\,\phi(j)\). ■
In other words, the length of the code necessary to describe the applicability of a rule grows more than in proportion with the information the rule takes account of.
There is no analytic solution for \(\phi(j)\) (which we can also write as \(\phi(c)\), noting that each point \(c \in \mathcal{M}\) corresponds to a decision dc, and thus to a unique \(j(d_c)\)), but it can be approximated numerically (Figure 3).
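For concreteness, a crude box-counting sketch of how \(\phi(c)\) might be approximated (the grid sizes, corner-sampling rule, and iteration cutoff are all assumptions; a serious estimate requires far finer grids and more careful statistics):

```python
import math

# Sketch: box-counting (Minkowski-Bouligand) estimate of the fractal dimension
# phi of the boundary of a stability locus j(d_c). Coarse and illustrative only.

def bounded(z, c, max_iter=200):
    for _ in range(max_iter):
        if abs(z) > 2:
            return False
        z = z * z + c
    return True

def boundary_boxes(c, n):
    """Count n x n boxes over [-2,2]^2 whose sampled corners disagree about
    membership in j(d_c): such boxes straddle the boundary."""
    count, step = 0, 4 / n
    for i in range(n):
        for k in range(n):
            corners = [complex(-2 + (i + di) * step, -2 + (k + dk) * step)
                       for di in (0, 1) for dk in (0, 1)]
            if len({bounded(w, c) for w in corners}) == 2:   # mixed membership
                count += 1
    return count

c = -0.4 + 0.3j
sizes = [32, 64, 128]
counts = [boundary_boxes(c, n) for n in sizes]
# N(r) ~ r^(-phi) with r = 4/n, so the slope of log N against log n estimates phi.
slopes = [(math.log(counts[i + 1]) - math.log(counts[i])) /
          (math.log(sizes[i + 1]) - math.log(sizes[i])) for i in range(len(sizes) - 1)]
print(counts, slopes)
```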
This means that more “normal” actions, decisions with the widest application, will have boundaries in signal space whose description length grows less quickly with resolution. By contrast, the more exceptional the decision, the faster its boundary description grows with resolution.
So far, the perspective is entirely that of a single player. The total observable environment, whether it includes zero, one, or many other players, is all encompassed in the idea of signal space. Nevertheless, the regions of signal space involving other players will be more interesting, and as we will see, in certain types of games, the decision will be driven to points at the edge of rule boundaries where the resolution of the rule becomes decisive.
In these examples, suppose a point z in signal space corresponds to someone asking you to make decision \(d_2\). z is on the edge of \(\hat{j}(d_2)\). If z is indeed within \(j(d_2)\), the payoff is higher than if you choose a safe decision \(d_1\). If it is not, there is a negative payoff.
To take a trivial example, imagine that z corresponds to the situation of receiving a phone call from the IRS demanding urgent payment in the form of gift cards. At any reasonable resolution, the decision to comply and buy the gift cards is not in the decision space at z – that is, you can anticipate a negative payoff.
We will make an unrealistically optimistic assumption: in principle, by attending to the entirety of signal space, one can predict the outcome. There is always some relevant signal that gives away the game. Nevertheless, even under this optimistic assumption, we face an undecidability problem. Because humans can only attend to a finite number of signals, even when explicitly looking for relevant ones, there will always be points in the vicinity of the Julia set where the question of whether a given decision is within the decision space is undecidable at any arbitrary resolution. And in adversarial games – that is, games in which one party has an incentive to falsify signals to induce the other to take actions against their own interests – this undecidability becomes a matter of practical importance.
Now imagine you are a loan officer at a bank. z corresponds to the situation of receiving an application for a risky project.5 If you make the decision to approve the loan, you face the risk that the borrower defaults. You have a sense of the vicinity of z based on salient signals, and you may focus on as many signals as you like, subject to constraints of time and effort, to narrow down your exact location – for example, you may interview the borrower, ask for references, or inspect the business plan.
Define \(p(z,d,\rho)\) as the probability that decision d is within the decision space at point z given signal set \(\rho\). As a circle of radius \(r(|\rho|)\) traverses a line from inside \(\hat{j}^\rho(d)\) to outside, the probability that the point in the center is within the true boundary of j(d) begins to fall in the vicinity of the border, and forms a sigmoid-like shape centered on the estimated border. The curve will be steeper the higher the resolution (the smaller the radius), that is, the probability is estimated with higher accuracy. The curve will also be shallower the higher the fractal dimension \(\phi(j(d))\), which measures how densely the fractal border fills a two-dimensional space.
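This profile can be sketched numerically. In the illustration below (the choice of c, the path through signal space, the radii, and the sample counts are all assumptions), p is approximated at each point along a line by the fraction of a surrounding disc of radius r that lies within j(d); the fall from p ≈ 1 to p ≈ 0 is visibly sharper at the smaller radius, i.e. the higher resolution:

```python
import random

# Sketch: the sigmoid-like fall of p(z, d, rho) across the border of j(d).
# p is approximated at each point by the fraction of a disc of radius r
# (the estimate radius implied by a worldview's resolution) whose points
# have bounded orbits under d_c. All parameters are illustrative assumptions.

def bounded(z, c, max_iter=300):
    for _ in range(max_iter):
        if abs(z) > 2:
            return False
        z = z * z + c
    return True

def p_estimate(z, c, r, samples=400):
    hits, n = 0, 0
    while n < samples:
        dz = complex(random.uniform(-r, r), random.uniform(-r, r))
        if abs(dz) > r:
            continue                     # rejection sampling: stay inside the disc
        n += 1
        hits += bounded(z + dz, c)
    return hits / samples

c = -0.4 + 0.3j
for x in [i / 20 for i in range(41)]:    # walk outward along the real axis
    z = complex(x, 0.0)
    print(round(x, 2), round(p_estimate(z, c, r=0.2), 2), round(p_estimate(z, c, r=0.05), 2))
# The drop from p ~ 1 to p ~ 0 is steeper for the smaller radius (higher resolution).
```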
Any application z with p≈1 at low resolution will have any number of suppliers willing to lend. Thus π(z,d) is bid to a risk-free market rate of return for any z with p(z)≈1. In the vicinity of the border of \(\hat{j}^\rho (d)\) however, as p begins to fall, viable and non-viable projects can no longer be distinguished at low resolution. The market becomes less competitive, and lenders may specialize in certain types of loans by being attuned to certain signals that nonspecialists may miss. These signals may include aspects of the project, aspects of the broader market into which the borrower intends to enter, or aspects of the borrower himself – and indeed, the value of long-term lender-borrower relationships is that investing a great deal into increasing the resolution with respect to one borrower is a sort of sunk capital investment that improves the resolution of future applications from that borrower, and commits the two parties to cooperating over time (Williamson 1983).
Bernanke & Gertler (1995) identify the loss of such investments as a secondary monetary policy transmission channel, though one that declines in importance with the availability of external finance. But external finance poses exactly the same problem: like a loan officer, an investor cannot quickly and cheaply scale up his resolution to judge an investment viable or not. The “seizing up” of credit markets in a financial crisis can be thought of this way: when previously relied-upon signals are invalidated, investors can only evaluate an investment on the basis of low-resolution signals, which will appear much riskier than it would to a higher-resolution investor. Thus, “good” assets can be subject to fire sale.6
The fact that the risk premium – and thus the potential payoffs – rise with proximity to the border means that the fractal boundary of j(d) cannot be avoided simply by taking safe and central decisions. The problem of adverse selection means that points near the border will be deliberately selected for, and not just by opportunistic borrowers, but also by return-seeking lenders. Note that in Figure 2a, there are deep ingresses of divergent behavior into the border, and there always exist decisions near the border of the Mandelbrot (as in Figure 2b) with a sufficiently small stability locus as to have no “safe” application at arbitrarily high resolution.
Consider now the space of signals given off by a borrower or another second party. For example, suppose a borrower has previously defaulted on a loan. Should this be judged an honest failure – in which case lending is advised – or as negligence or fraud – in which case it is not? The basic problem is that anyone engaged in negligence or fraud has an incentive to give off signals that indicate honesty – and this problem is exactly why signals are more credible the more costly they are (Zahavi 1977).
Let \(c_0(\rho)\) be the cost of attending to a set of signals \(\rho\), and \(c_1(\rho)\) the cost of sending a set of signals \(\rho\) that counterindicate one’s underlying type (say, honest or defaulter). A basic result from costly signaling theory is that no signal will hold informational value in equilibrium (that is, \(p \to 0.5\)) over any range where \(\Delta c_1(\rho) < \Delta c_0(\rho)\), where \(\Delta c(\rho) \equiv c(\rho)-c(\rho-1)\), that is, where the cost of producing a signal is lower than the cost of verifying it (Lachmann et al. 2001 – and note that the cost of producing a signal can be imposed by a social punishment). That is, when \(\Delta_n c_1(\rho) < \Delta_n c_0(\rho)\), with \(\Delta_n c(\rho)\) defined as the increment in cost from adding signal \(\rho_n\) to \(\rho\), it will never be worthwhile to include \(\rho_n\) in one’s worldview so long as \(\Delta_n c_0(\rho)>0\). A low-resolution observer, that is, one attending only to easily falsifiable signals, must place no credence on “cheap talk”.
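As a toy illustration of this attention rule (the signal names and cost schedules below are invented for the example), a signal is worth adding to one’s worldview only where its marginal verification cost falls below the marginal cost of counterfeiting it:

```python
# Toy sketch of the costly-signaling attention rule. The cost schedules are
# invented; the point is only the comparison between Delta_n c_0 (marginal
# cost of verifying signal n) and Delta_n c_1 (marginal cost of faking it).

verify_cost = {   # Delta_n c_0: observer's marginal cost of attending to signal n
    "smiles_when_shaking_hands": 0.1,
    "references_check_out":      2.0,
    "business_plan_coherent":    5.0,
    "audited_financials":        8.0,
}
fake_cost = {     # Delta_n c_1: defector's marginal cost of counterfeiting signal n
    "smiles_when_shaking_hands": 0.0,    # cheap talk
    "references_check_out":      6.0,
    "business_plan_coherent":    4.0,
    "audited_financials":       30.0,
}

informative = [n for n in verify_cost if fake_cost[n] > verify_cost[n]]
print(informative)
# Signals cheaper to fake than to verify carry no informational value in
# equilibrium and are not worth including in the worldview.
```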
One may think, therefore, of any interaction subject to adverse selection as existing near the border of a stability locus. The costly signaling story can be thought of as opportunistic parties pushing their emitted signals close to the border at finer and finer scales until increased fineness becomes more costly than the benefits of exploiting the first party. Provided the first party can match the resolution at lower cost, we have a stable costly signaling equilibrium. If he cannot, the market unravels – and indeed the fixed resolution available to a smart contract via oracles is why the market for algorithmic unsecured loans must necessarily unravel (Harwick & Caton 2021).
Think of z now as representing a case under consideration by a court of law, with \(\rho\) being the body of evidence considered by it. Because courts follow precedent, the judge must be willing to apply the same rule at point d(z) as he does at point z, and so on. Courts, of course, are specifically concerned with the sorts of edge cases that arise when a moral or legal rule has been potentially misapplied.
In a legal setting, \(\mathcal{L}(\hat{j}^\rho)\) has the straightforward interpretation of being the textual length of a law or regulation pertaining to a single action, or of a contract enforceable by a court. Anderlini & Felli (1994) also think of private contracts as a finite-length mapping from signal space to decision space, and show that there exist certain classes of problems for which the mapping is not computable with a Turing machine using a finite-length contract, hence the importance of residual rights to arrive at a decision in such cases. Such decisions may, of course, be the “wrong” ones under the circumstances.
Note again that this is not a merely theoretical exercise, as undecidable or perversely-decided edge cases will be deliberately aimed for by opportunistic exploitation of low-resolution signals. No society can dispense with some mechanism for (1) deciding edge cases, and (2) revising rules in the event of a perverse application. As Alchian & Demsetz (1972) argue, “it is hard to imagine any contract [or, we might add, law], which, when taken solely in terms of its stipulations, could not be evaded by one of the parties.”
The same applies to regulators, who may have substantive goals in mind but are limited to promulgating rules of finite length. One striking observation is that the Code of Federal Regulations in the United States has increased at a reasonably constant rate of about 2,500 pages per year since 1950, although the year-to-year variance is high (Regulatory Studies Center 2024). The traditional and plausible explanation for this among economists has been rent-seeking and regulatory capture. Without diminishing the importance of that explanation, another surely significant factor is the fact that regulators play an adversarial game with the people they regulate, the latter of whom have every incentive to signal formal compliance while substantively evading regulatory goals (of course, it should not be presumed that the regulatory goal is good or noble). Because the mapping from regulations to substantive outcomes has a fractal boundary, any regulatory body with the power to promulgate regulations, therefore, will necessarily have to increase the resolution of its regulations at a relatively constant rate, depending on the fractal dimension of the boundary of the substantive goal they have in mind.
Second, courts are arenas where \(\rho\) is contestable, and parties may gain by selectively raising the salience of signals that violate the monotonicity of f in resolution. Rules of evidence, therefore, must also rise in textual length – or be a subject of judgment, on which see below (Section 3.1).
Altruism, as distinct from reciprocity, is defined as a willingness to bear a cost in a positive-sum game, where future rewards are not directly contingent on that action. We can operationalize this as a Kantian maxim: do that which, if everyone were to do it, would be better, even if you could individually do even better by defecting. By contrast with law, the moral rules that direct altruistic behavior are typically stated at very low resolution, often even categorically (“Thou shalt not kill”, “always be kind”), and the boundaries are learned and internalized implicitly through induction. Typically they enjoin cooperation in some sort of social dilemma, and thus have meaning only in a social setting with significant opportunities for defection (Curry et al. 2019).
Altruism is never a Nash equilibrium, but it can be evolutionarily stable (i.e. it has a stability locus of positive area), if altruists can preferentially sort with one another (Alger & Weibull 2013; Bergstrom 2003). The practical problem, therefore, will be using observable signals to form an expectation of another party: not whether they will in fact reciprocate or not, but whether they are the type of person who would reciprocate, if the roles were reversed. In practice, gratitude is one such signal in the modern world, but historically, overt signals of religious or ethnic affiliation have also served this role (Harwick 2023; McElreath et al. 2003).
Like the adverse selection problem, the fact that certain rules are only stable under assortativity means there will be decision rules that have positive area only when sampling signal space with a sufficiently high resolution. Defectors will try to give off signals of being cooperators, in order to receive the benefits of cooperation without in turn contributing. The extent to which they are able to do so limits the stability locus of altruistic strategies.
To take a modern example, imagine that z corresponds to the situation of driving by a young woman on the side of the road with a broken down car. Compassion dictates helping, and by Kantian moral logic, on average, the payoffs from living in a society of helpers more than outweigh the costs of helping, though of course the helped party will rarely be in a position to reciprocate directly. And yet, the breakdown may be a pretext for robbing the helper. Observable signals include: What neighborhood are we in? What time of day is it? How busy is the road? And these are used to form an expectation about the other party: will she be gratefully on her way, or is she there to lure hapless helpers?
Or, to take a broader example, imagine contributions to a public good. Although contributing is never in one’s narrow interests in groups larger than about 5 without perfect information (Harwick 2020), the provision of public goods, or more broadly some sort of collective endeavor – for example, defense, hunting, or governance – is the human evolutionary niche, and characterizes every human society. The stability locus for “contribute” consists of points in signal space where contributors can reliably identify and exclude noncontributors. This is borne out in the literature on ethnolinguistic fractionalization, which finds that limitations on assortativity also limit the viability of cooperative and trusting strategies (Dinesen et al. 2020).
If rules can only ever be finite approximations to a fractal stability locus, it is worth evaluating the utility of finite-length rules. Should such rules always be open to revision, or is there a role for immutable rules?
One common way around this problem is to bake vague concepts into the low-resolution rule, whose resolution can be augmented as necessary. As Harwick (2020) notes, many legal categories such as negligence are precisely such stand-ins: they allow a finite-length rule to stand unchanged while accommodating higher-resolution definitions of key terms over time. For a legal system to establish a fixed set of necessary and sufficient conditions for what constitutes negligence would be to render it unable to handle inevitable new edge cases. More generally, Rappaport (1971) argues that “continuing reinterpretation is likely to be assured by the cryptic nature of sacred discourse. It is not their weakness but their strength that myth and revelation are obscure.” While fixed (“sacred”) rules are desirable to reduce conflict arising from contestability, fuzzy stand-ins are necessary for continued resilience against opportunism, and can afford the necessary flexibility.
It may also be wondered whether discretion is indeed higher-resolution than explicit rules. Much bureaucratization consists in mandating the consideration of certain signals, and indeed it might be thought that this can prevent the overlooking of some significant signals.
This is not an unfounded intuition. Take the example of discrimination law. A company will have a set of implicit or explicit rules for signals that must be considered: job experience, personality, and so on. Federal law in the United States also prohibits the consideration of certain signals: race, sex, and so on. These signals are, of course, informative: in the absence of more detailed information, race or gender can indicate directly valued but hard-to-observe qualities. Nevertheless, the rationale of discrimination law is that hiring managers may be tempted to rely on low-resolution signals like race or gender instead of the higher-resolution signals one would get from a résumé or an interview, and that race or gender will not be informative conditional on these higher-resolution signals. Explicit processes, and the prohibition of very low-resolution signals, can be useful to force a lower-bound resolution on decisionmakers.
But explicit processes must never be an upper bound on resolution. While a discretionary decision may often be lower-resolution than a bureaucratized one, the maximum resolution can be much higher. Indeed, much useful knowledge has long been acknowledged in economics to be in “tacit” form: induced from long experience, and profitably actionable, but incompletely (if at all) articulable (Hayek 1945). Kornblith’s (1982) discussion of chick-sexing is a useful exemplar:
After a bit of practice, chicken sexers achieve a very high degree of accuracy in classifying young chicks by sex. Interestingly enough, not only are the chicken sexers themselves incapable of explaining what visual cues they rely on in performing this feat, but no one has yet determined what visual cues are relied upon. Nevertheless, the task is performed.
We have, therefore, two different ways of communicating rules: (1) by enumerating sets of necessary and sufficient conditions, and (2) by analogy to specific exemplars – in Justice Potter Stewart’s famous formulation, “I know it when I see it.” These correspond to different apperceptions of signal space: we might think of (1) as perceiving a situation analytically, as an enumerated list of signals \(\hat{\gamma}^\rho \in \{0,1\}^\rho\), and (2) as entailing “gestalt” perception (Hayek 1952) of a spatial coordinate \(z \in \mathbb{C}\); that is, perceiving the situation as a unified whole rather than by analysis of its constituent signals. While in principle the rule induced from analogy can always be expressed in terms of necessary and sufficient conditions (that is, f was defined as a bijection from \(\{0,1\}^\mathbb{N} \to \mathbb{C}\)), in practice the latter is a much more economical way to communicate over an indefinite number of potentially important signals, and Ellison & Holden (2014) conceive of precedent-based law in exactly this way. Concepts like “negligence” may have a fractal boundary, but communication through analogy to specific cases – provided the cases are sufficiently informative – can enable just this sort of variable-resolution decisionmaking in hard cases. And indeed, classifying observations by analogy seems to be more fundamental to neural architecture than explicit classification through language (O’Brien & Opie 2002), exactly the sort of adaptation we would expect for organisms faced with frequent adversarial games.
Thus, just as most native speakers of a language could not accurately articulate the grammatical rules they effortlessly follow in everyday speech, tacit judgment – gut feelings, bad vibes – plays an indispensable role in navigating adversarial situations. While the relevant signal may be so subtle as to fail to rise to conscious awareness, the maximum resolution of judgment trained on appropriate high-dimensional exemplars can be higher than that of an explicit rule.
A moral framework can be thought of as a function mapping a decision in a situation to Mandatory, Permissible, or Forbidden; i.e. \(M: D \times Z \to \{-1,0,1\}\), where \(d \in D\) is a decision function, and \(z \in Z\) is a point in signal space. While permissibility is not exactly the same thing as evolutionary stability, it seems reasonable to judge a moral framework by its evolutionary stability. That is, as a separate question from using a norm to judge a decision, a minimal criterion for judging the framework itself is that it should be compatible with its own persistence (Binmore 2005), and stable on iteration.7
Consider strict deontology as a set of axiomatic duties such that \((\forall z)\ M(d, z)=M(d)\), that is, permissibility is insensitive to context. As in the section on Kantian altruism, it is easy to show that any such function, insensitive to context, will not be compatible with its own persistence. Always-cooperate is never a viable strategy. Thus a strict deontology is ruled out as a universalizable moral framework.
A context-sensitive deontology, which specifies boundaries of applicability for given duties, will fare better, but runs into the problem that any finite-length description of such duties will be exploitable at the edges, and thus also not evolutionarily stable unless \(c_1\) lies strictly above \(c_0\) for each individual signal. It is possible that narrow domains can be identified where this condition is satisfied (for example one might consider policies governing employee behavior – a situation where monitoring and/or trust are sufficiently high – to be circumscribed deontologies), but it is doubtful that this can be true of an overarching moral framework.
Act utilitarianism’s well-known mapping problem follows straightforwardly from the previous analysis. The knowledge necessary to evaluate an action on an act-utilitarian basis far exceeds the observable signals in a situation, which we have already seen to be too informationally rich to feasibly attend to. Furthermore, the cases where act-utilitarian prescriptions differ from more orthodox prescriptions (for example, violating a moral rule for the greater good) are usually cases that call up an adversarial game. The person to whom one lies does not want to be lied to, however justifiable, and in equilibrium, the dissipation of the signal value is likely to result in worse outcomes. Identifying the situations where this is not the case involves threading a fractal boundary at increasing resolution. Thus, relatively context-insensitive rules like “thou shalt not bear false witness [even with a strong utilitarian justification]” can be an important way to fence off certain borders and prevent the sort of signaling arms races that can otherwise result from unbounded decision spaces.
These are precisely the considerations that demand leavening a strict act utilitarianism with a rule-orientation. And yet, the project of enumerating such rules runs headlong into the deontological dilemma, that an explicit and finite set of rules mapping actions to permissibility – while potentially useful in circumscribed domains – cannot constitute an overarching moral framework.
If deontology is insufficiently context-sensitive to be evolutionarily stable, and utilitarianism is too context-sensitive to be practically actionable, an appropriate moral framework must thread this needle with the hope of finding a resolution sufficiently high to be resistant to exploitation, but sufficiently low to be practically actionable. It is not a foregone conclusion that there exists any resolution satisfying both these conditions. But in light of the previous section’s argument on the necessity of discretion and judgment, the resolution in practice of well-cultivated virtue has a better chance of doing so than the resolution of an explicit ruleset.
The centrality of judgment to virtue ethics commends it as both as a description of actual moral cognition and as a normative goal for moral development. Virtue, understood as a disposition from which flow virtuous actions, is cultivated by training on informative moral stories, which are perceived holistically rather than analytically,8 and to which actual situations are pattern-matched by analogy. The usual criticism of virtue ethics – its resistance to enumerating sets of necessary and sufficient conditions for permissible action, and its reliance on fuzzy judgment – is precisely its strength in an adversarial world where the boundaries of evolutionary stability are fractal.
As hinted in the discussion of the Code of Federal Regulations, the fact that there will always be undecidable points in signal space at any finite resolution means no institution can be static.9 To take one adversarial game, Boyd and Lorberbaum (1987) show that no finite-length strategy in a repeated prisoner’s dilemma is uninvadable. To cast the conclusion in terms of the present model, at any finite length there is some “optimal” description \(\mathcal{L}(\hat{j}^\rho(d))\). From here, any undecidable edge case can only be addressed either by moving the border and creating space for different edge cases, or increasing the resolution.
On a short timescale, this means that any social system with a maximum length set of laws, norms, and institutions, will cycle. As certain exploitative strategies are adequately dealt with, they recede and are forgotten. Resolution is dedicated to new exploitative strategies, allowing old ones to return as they are rediscovered. Alternatively, a social system that fails to adapt to novel exploitative strategies must eventually be invaded by that strategy.
On a longer timescale, the back-and-forth between cooperative and exploitative strategies can be thought of as an arms race driving increased resolution. Over human history, there have been several such developments. Writing was perhaps the most significant: it removed the upper bound on \(\mathcal{L}\) imposed by brain size by allowing rules to be offloaded onto permanent physical records (Harwick 2018). Much more recently, the development of searchable electronic storage has removed even the requirement that the index of \(\mathcal{L}\) fit in the brain. And on an even longer timescale, there is suggestive evidence that the development of human intelligence and cultural capacity is itself the result of an arms race between cooperative language users and the opportunists that tried to exploit linguistic understanding adversarially (Knight 1998; Cosmides & Tooby 2005).
Whether that arms race continues into the future, or whether opportunistic strategies overwhelm the ability of developed societies to adapt and lead to a collapse of human intelligence, remains to be seen.