THE DYNAMICS OF REASONABLE DOUBT
by
Nicholas Scurich
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
May 2012
Copyright 2012 Nicholas Scurich
Epigraph
We should rethink the concept of reasonable doubt and make it fluid rather than static.
—Alan Dershowitz
Dedication
To my family for their love and support.
Acknowledgements
I am immensely indebted to my advisor Richard John, who has been my mentor
since my undergraduate years, for shaping my thinking on behavioral research and
methodology. I am especially grateful to him for introducing me to the fascinating
world/doctrine of Bayesianism. I certainly would not be where I am now without his
continual support and guidance.
I would like to thank Tom Lyon for an introduction to the substantive area of
psychology and law that continues to enthrall me to this day. Dan Simon has been an
excellent mentor and superb collaborator. Additionally, I learned many important lessons
from Steve Read and Jack McArdle throughout my graduate education.
Lastly, I would like to thank my peers in the Psychology Department at USC,
especially Beth Ahern who has always furnished me with helpful input.
Table of Contents
Epigraph .............................................................................................................................. ii
Dedication .......................................................................................................................... iii
Acknowledgements ............................................................................................................ iv
List of Tables ................................................................................................................... viii
List of Figures .................................................................................................................... ix
Abstract ............................................................................................................................... x
Chapter 1: Introduction ....................................................................................................... 1
Defining Reasonable Doubt ............................................................................................ 4
The Early Days ........................................................................................................... 5
More Recent Times ..................................................................................................... 7
Quantifying Reasonable Doubt ..................................................................................... 12
Conclusion .................................................................................................................... 17
Chapter 2: The Rational Juror ........................................................................................... 19
Bayes Theorem ............................................................................................................. 20
Decision Theory ............................................................................................................ 25
Accounting for Legal Norms ........................................................................................ 28
The Prior ................................................................................................................... 29
The Utility Ratio ....................................................................................................... 30
Chapter 3: Quantitative Proof and Policy Considerations ................................................ 35
Trial by Mathematics .................................................................................................... 37
Heuristic Use ................................................................................................................. 39
Conclusion .................................................................................................................... 41
Chapter 4: Threshold Variability: Virtue or Vice? ........................................................... 43
The Economic Rationality of Threshold Variability .................................................... 43
The Legal Propriety of Threshold Variability .............................................................. 45
Chapter 5: Recasting Reasonable Doubt........................................................................... 50
Biased Predecision Processing ...................................................................................... 53
Information Distortion .............................................................................................. 55
Coherence Based Reasoning ..................................................................................... 57
Differentiation and Consolidation Theory ................................................................ 60
Reasonable Doubt as a Relative Construct ................................................................... 63
Chapter 6: Empirical Studies of Threshold Shifting ......................................................... 68
Study 1: Conventional Evidence ................................................................................... 68
Participants ................................................................................................................ 68
Materials and Procedure ........................................................................................... 70
Analytic Methods ...................................................................................................... 72
Results ....................................................................................................................... 75
Discussion ................................................................................................................. 81
Study 2: Statistical Evidence ........................................................................................ 85
Introduction ............................................................................................................... 85
Materials and Procedure ........................................................................................... 86
Results ....................................................................................................................... 87
Discussion ................................................................................................................. 92
Study 3: Character Evidence ......................................................................................... 94
Introduction ............................................................................................................... 94
Materials and Procedure ........................................................................................... 97
Results ....................................................................................................................... 98
Discussion ............................................................................................................... 103
Chapter 7: General Discussion........................................................................................ 106
Implications for Cognitive Consistency Theory ......................................................... 106
Implications for Legal Doctrine .................................................................................. 108
Harmless Error Analysis ......................................................................................... 109
General Limitations and Future Directions................................................................. 113
Concluding Remarks ................................................................................................... 119
Cases ............................................................................................................................... 121
Bibliography ................................................................................................................... 122
Appendix [A]: Stimuli from the Conventional Case ...................................................... 137
Appendix [B]: Fit Indices of the Logistic Regressions from Study 1. ........................... 140
Appendix [C]: Correlation Matrix of Likelihood Ratings, Verdict Confidence, and
Threshold Disparity for Study 1. .................................................................................... 141
Appendix [D]: Fit Indices of the Logistic Regressions from Study 2. ........................... 142
Appendix [E]: Correlation Matrix of Likelihood Ratings, Verdict Confidence, and
Threshold Disparity for Study 2. .................................................................................... 143
Appendix [F]: Fit Indices of the Logistic Regressions from Study 3. ............................ 144
Appendix [G]: Correlation Matrix of Likelihood Ratings, Verdict Confidence, and
Threshold Disparity for Study 3. .................................................................................... 145
List of Tables
Table 1: Descriptive Statistics for Each Piece of Evidence 76
Table 2: Unstandardized Regression Coefficients for Verdict Confidence
and Likelihood Ratings in Predicting Threshold Disparity 80
Table 3: Descriptive Statistics for Each Piece of Evidence 88
Table 4: Unstandardized Regression Coefficients for Verdict Confidence
and Likelihood Ratings in Predicting Threshold Disparity 91
Table 5: Descriptive Statistics for Each Piece of Evidence 99
Table 6: Unstandardized Regression Coefficients of Verdict Confidence
and Likelihood Ratings in Predicting Threshold Disparity 102
List of Figures
Figure 1: Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds
(+/- 2 S.E.) over the Course of Trial 77
Figure 2: Threshold Disparity and Mean Verdict Confidence over the Course
of Trial 79
Figure 3: Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds
(+/- 2 S.E.) over the Course of Trial 89
Figure 4: Threshold Disparity and Mean Verdict Confidence over the Course
of Trial 90
Figure 5: Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds
(+/- 2 S.E.) over the Course of Trial 100
Figure 6: Threshold Disparity and Mean Verdict Confidence over the Course
of Trial 101
Abstract
The doctrine of reasonable doubt is deeply entrenched within American culture,
but the concept continues to mystify legal scholars, courts and jurors, and a coherent
definition remains elusive. Reasonable doubt (RD) can be reified with the tool of decision
theory as a tradeoff between acquitting the guilty and convicting the innocent. For
instance, Blackstone’s maxim that ten erroneous acquittals are equal in cost to one
erroneous conviction implies, roughly, that jurors ought to convict only if their
confidence in the defendant’s guilt exceeds 0.91. This dissertation proposes several
descriptive discrepancies from this normative account. First, it is argued that the
proffered evidence serves as the focal point for RD, such that jurors listen to the evidence
and then determine whether the remaining doubt is reasonable. In short, jurors know RD
when they see it. Verdicts do not depend on whether the evidence satisfies an exogenous
tradeoff. Second, three studies were conducted to test the hypothesis that mock jurors
systematically shift their operational definition or threshold for RD in order to promote
cognitive consistency, a state in which attitudes, beliefs and cognitions are congruous.
Shifting the decision threshold can theoretically attenuate “close calls”—that is, when the
evidence is close to the RD threshold—by inflating the perceived disparity between the
two. The studies revealed that mock jurors’ implicit threshold regularly shifted in the
opposite direction of the proffered evidence. When a piece of evidence increased the
likelihood of guilt, it concomitantly decreased the threshold for conviction, and when a
piece of evidence decreased the likelihood of guilt, it increased the threshold for
conviction. The degree to which the threshold shifted was positively related to decisional
confidence, one manifestation of cognitive consistency. Threshold shifting violates an
axiom of decision theory, which is that the consequences of an outcome are independent
from the chances of its occurrence. This finding has implications for both psychological
and legal theory. With respect to the latter, the findings indicate that RD is a relative
construct and suggest that the analysis of legal doctrine is more complicated than has
been previously supposed.
Chapter 1: Introduction
At the core of every criminal trial is a dispute about facts. Resolving the dispute
entails sifting through masses of ambiguous and conflicting evidence, and determining
which party ought to prevail. In American jurisprudence, this responsibility belongs to
the jury—a group of individuals that collectively represents the community—but this
responsibility is constrained. Jurors are not free to simply determine which side has the
more plausible case. Rather, their decision making is governed by a rule that requires the
highest level of certitude in order to convict a criminal defendant, and it mandates
acquittal in the absence. The rule is, of course, the reasonable doubt rule.
Reasonable doubt is sacrosanct in American culture. Considered an “unassailable
hallmark of freedom,” its symbolism as a bulwark against the governmental infringement
of personal liberty is profound (King, 2006). In a country that divides into political
schisms on most issues, not even the most ardent partisans question the legitimacy of the
reasonable doubt rule. The rule is often described as the “cornerstone of Anglo-Saxon justice” (Sundby, 1989); judicial reforms rarely if ever call for its overhaul, and there has never been any legislative attempt to change it (Newman, 1993). Many assume the
rule is part of our constitutional heritage, despite the fact that the term does not appear in
the Constitution or the Bill of Rights (King, 2006). Reasonable doubt transcends the
domain of law like no other legal term or concept. It is frequently used in ordinary
conversation and the mainstream media as a metaphor to express the highest level of
certainty (e.g., Walters, 1998; Ellis, 2010; Glynn, 2000), a testament to how inviolate and
entrenched the concept is.
But the doctrine of reasonable doubt can be the root of divisiveness. Its
polemicizing power is perhaps best illustrated by the acquittal of OJ Simpson. While the
verdict elated some, a majority of the American population wondered how, given the
immense incriminating evidence, the jury could find OJ Simpson not guilty of murdering
his estranged wife and her companion (Davies, 1995; Bugliosi, 1996). This verdict was
riddled with allegations of racial discrimination, police conspiracy, celebrity-preferential
treatment, and jury ineptitude, but at its core the verdict left society bewildered over the
meaning and purpose of reasonable doubt (see generally King, 2006; Dershowitz, 1996).
To many, the Simpson verdict was plainly jury nullification. Nullification occurs
when a jury refuses to convict a defendant whom they believe actually committed the
alleged crime. It has negative connotations, stemming mostly from the acquittal of
southern racists in the mid-20
th
century, however nullification is legal, and it can serve
important regulatory purposes, such as limiting abusive prosecutorial and governmental
conduct (Weinstein, 1992). But unlike the archetypical all-white juries of the south,
contemporary jurors rarely ever profess nullification. Instead, they claim to have found
reasonable doubt. Nullification is related to the concept of reasonable doubt in so far as
“without the reasonable doubt standard, jurors would be deprived of a safe harbor in
which to shelter their nullification verdicts” (King, 2006, p. xv).
The public’s antagonism towards the Simpson verdict could stem from jurors’
efforts to justify it in legal terms. For example, King (2006) argued that the public would
have been more accepting of the verdict if jurors candidly admitted to nullification, rather
than pre-textually declaring reasonable doubt:
[The jurors] may have even convinced themselves that they found reasonable
doubt, but if they scoured their souls, they would know that they were perhaps
delivering a much needed message with their verdict, but by justifying the verdict
by claiming reasonable doubt, the only message they delivered was that the
verdict was an outrage (p. xix).
Of course, this line of reasoning presupposes that the Simpson jurors were in fact certain
of his guilt, but engaged in nullification to “police the police with [their] verdict,” as
Johnnie Cochran—Simpson’s lead defense attorney—put it in his summation.
The correlation between nullification and reasonable doubt is not perfect. Nor is a
verdict of “not guilty” synonymous with “innocent” in American jurisprudence (Bugliosi,
2006). It does not necessarily follow that the Simpson jurors engaged in nullification by
failing to return a guilty verdict; they might simply not have been convinced of his guilt
beyond a reasonable doubt. Moreover, a not guilty verdict only means that the
prosecution failed to establish Simpson’s guilt beyond a reasonable doubt. It does not
mean that jurors believed Simpson was necessarily innocent. Indeed, Simpson was
subsequently found liable for the double-murder by a civil jury under the “preponderance
of evidence” rule. And virtually no one questions the appropriateness of that verdict.
The Simpson trial was extraordinary in many respects. However, at least one
aspect of it was common to all criminal trials: jurors were confronted with the reasonable
doubt rule and forced to make sense of it. What does it mean to have reasonable doubt?
What types of doubts are unreasonable? When is a doubt sufficiently reasonable that a
finding of legal guilt is not warranted? Jurors ponder these questions as a matter of
course. And serious consequences turn on their answers. Yet, these exact questions have
been the source of a debate that has spanned nearly three centuries, and has perplexed
intellectuals, philosophers, and jurists alike. Despite an astonishing amount of attention,
and its centrality to the criminal justice system, a consensual definition of reasonable
doubt remains elusive to this day.
Defining Reasonable Doubt
To be a meaningful safeguard, the reasonable-doubt standard must have a tangible
meaning that is capable of being understood by those who are required to apply it.
It must be stated accurately and with the precision owed to those whose liberty or
life is at risk.
–Justice Harry Blackmun
Until the middle of the twentieth century, the use of the reasonable doubt (RD)
rule in criminal trials was strictly customary. This changed in the landmark case In re
Winship (1970), in which the US Supreme Court held that the due process guarantees of
the Fifth and Fourteenth Amendments "protect the accused against conviction except
upon proof beyond a reasonable doubt of every fact necessary to constitute the crime with
which he is charged” (p. 364). Winship established that failure to instruct jurors of the RD
requirement in any criminal proceeding is a prima facie constitutional violation and
grounds for automatic reversal.
Following Winship, it became increasingly common for defendants to appeal their
conviction on the basis that the definition of RD failed to provide the protection
guaranteed by due process. These appeals brought the task of defining RD to center stage.
Through a trilogy of cases in the 1990s, the Supreme Court expressed disapproval of
the staple definition of RD and intimated that it might be constitutionally deficient.
Before turning to these cases and their specific holdings, the origins of RD are described
in order to allow a deeper appreciation of the philosophical context from whence the term
was devised and from which it has since departed.
The Early Days
Eighteenth-century jurors were sworn to deliver a “truthful verdict.” Jurors were
told that certainty was required in order to convict, and that if they were doubtful, they
should acquit (Sheppard, 2003). This highly exacting standard was used in a period when
philosophers and theologians were grappling with theories of knowledge and awareness.
Sometime before this period, Aristotle made a distinction between the notion of absolute
knowledge of the universe (which was perhaps knowable only to God) and knowledge
that is derived from sense and reason, sometimes referred to as empirical knowledge.
Though it was always acknowledged that humans do have empirical knowledge, it was
not until the end of the seventeenth century when the movement to eschew the idea of
absolute knowledge, as a matter of practical affairs, gained traction (Shapiro, 1986).
Perhaps the most influential exponent of this movement was John Wilkins, a
British philosopher and a founder of the Royal Society, who argued that absolute
certainty “demands a kind of evidence or criterion of truth that is in principle impossible
to have (cited by Waldman, 1959, p. 303).” Wilkins instead proposed the criterion of
moral certainty, which required beliefs to “be so certain as to not admit of any reasonable
doubt concerning them (cited by Waldman, 1959, p. 303).” The term “moral certainty”
was coined not because it concerned ethics or morality, but simply to contrast it with
absolute certainty. As described by Wilkins, moral certainty reflected the level of
certainty at which “no one without a prejudice would dissent from (Sheppard, 2003, p.
12).” A morally certain belief was in principle open to a determined skeptic’s doubt, but
it was understood to be a firm and settled truth based on evidence.
Around the turn of the eighteenth century Baron Geoffrey Gilbert, John Morgan
and Daniel McKinnon were generating some of the first formulations of Anglo-Saxon
jurisprudence. These scholars were heavily influenced by Wilkins’ moral certainty
criterion, which would eventually become the consensus amongst intellectuals during the
Enlightenment, and were the first to move away from the exacting standard of absolute
certainty that had previously pervaded jurisprudence (Morano, 1975). In turn, this move
would be influential to Starkie, Thayer, and Wigmore, arguably the most influential
treatise writers of American legal doctrine, who unreservedly endorsed the moral
certainty standard. Starkie (cited in Shapiro, 1986, p. 761), for example, argued:
Absolute, metaphysical and demonstrative certainty is not essential to proof by
circumstances. It is sufficient if they produce moral certainty to the exclusion of
every reasonable doubt… proof beyond a reasonable doubt does not require that
guilt be established with the absolute certainty of a mathematical demonstration,
nor does it mean a vague, speculative, or whimsical doubt, nor a possible doubt,
but such a doubt as an intelligent, reasonable, and impartial man may honestly
entertain after a careful examination and conscious consideration of all the
evidence.
For jurists influenced by Wilkins’ school of thought, moral certainty was synonymous
with RD; if one were morally certain, he or she had no RD, by definition.
More Recent Times
In the mid-eighteenth century, equating reasonable doubt with moral certainty was common, and the terminology was unremarkable. Indeed, as Sheppard (2003) notes, “[the]
use of such terms involved no more than a refinement of their own form of talk (p. 13).”
As time went on, however, the use of such words became less frequent and jurists took it
upon themselves to further elaborate on the synonym. Consider the explanation given by
Chief Justice Shaw of the Supreme Judicial Court of Massachusetts, in what became
known as the Webster Charge:
What is reasonable doubt? It is a term often used, probably pretty well
understood, but not easily defined. It is not a mere possible doubt; because
everything relating to human affairs, and depending on the moral evidence, is
open to some possible or imaginary doubt. It is that state of the case which, after the entire comparison and consideration of all the evidence, leaves the minds of
jurors in that condition that they cannot say they feel an abiding conviction, to a
moral certainty, of the truth of the charge. The evidence must establish the truth of
the fact to a reasonable and moral certainty (Victor, p. 1244).
The Webster Charge became a mainstay in Anglo-American jurisprudence, and it or some slight
variation was employed in almost every American jurisdiction for well over a century
(Morano, 1975). All of this changed in the 1990s, when, through a trilogy of cases, the
US Supreme Court came to strongly discourage the equating of moral certainty with RD.
The first in the series of cases was Cage v. Louisiana (1990) in which Thomas
Cage appealed his capital conviction on the basis that the RD instruction was
“constitutionally defective.” Cage claimed that describing RD as “an actual substantial
doubt [that] would give rise to grave uncertainty” was too demanding, and was thus
inconsistent with Winship because “it suggested a higher degree of doubt than is required
for acquittal under the reasonable doubt standard (p. 41).” The Court agreed and ruled the
instruction constitutionally invalid. Even though it held that the terms “substantial” and “grave” led jurors to require a degree of doubt that exceeded the requirement of the RD standard, the Court went on to suggest that those terms, if considered in reference to “moral certainty,” could satisfy the Winship requirements. The holding in Cage was subsequently criticized because it failed to clarify whether it was the combination of all three phrases (i.e., “substantial doubt,” “grave uncertainty,” and “moral certainty”) that rendered the instruction constitutionally invalid.
The use of the phrase “moral certainty” in defining RD was specifically addressed
in Sandoval v. California (1994) and its companion case Victor v. Nebraska (1994). In
these cases, the definition of RD provided to jurors was almost cribbed from the familiar
Webster charge:
It is not a mere possible doubt; because everything relating to human affairs, and
depending on the moral evidence, is open to some possible or imaginary doubt. It
is that state of the case which, after the entire comparison and consideration of all
the evidence, leaves the minds of jurors in that condition that they cannot say they
feel an abiding conviction, to a moral certainty, of the truth of the charge
(Sandoval, 1994, p. 1239).
The Court spent some time pondering the semantics of this instruction. First, it
found no problem with the description of RD as “not a mere possible doubt.” “A
‘reasonable doubt,’ at a minimum, is one based upon reason,” the Court held. The Court
surmised that jurors would assume the possible doubt phrase meant only a “fanciful
doubt,” which, it held, was not a sufficient basis to block a conviction.
The Court then noted that the moral evidence phrase was not part of the modern
lexicon, but, after reviewing several nineteenth century treatises including Starkie’s, held
that its historical meaning had not changed. Jurors would understand, the Court asserted,
that “moral evidence” only meant “empirical evidence,” or the evidence that is adduced
at trial to prove guilt.
But the Court had greater reservations about the moral certainty phrase. Much of
the concern is encapsulated by Justice Blackmun’s oft-cited dissent:
[There is] the real possibility that such language would lead jurors reasonably to
believe that they could base their decision to convict upon moral standards or
emotion in addition to or instead of evidentiary standards. The risk that jurors
would understand “moral certainty” to authorize convictions based in part on
value judgments regarding the defendant’s behavior is particularly high in cases
where the defendant is alleged to have committed a repugnant or brutal crime
(Victor, 1994, p. 37).
The Majority did not disagree with Blackmun’s argument or the corollary that a
conviction might be based on a standard inconsistent with Winship. The Majority
believed, however, that in this particular instance, any adverse consequences of the use of
the moral certainty phrase were offset by other phrases in the instruction, such as “an
abiding conviction.” As noted by Horowitz (1997), “the conviction in Sandoval was
upheld only because there was other ‘saving’ language in the instruction (p. 288).”
Aside from the equivocal holding in Sandoval, the Court, through dictum,
expressed disapproval of the use of the moral certainty phrase. The Court was “willing to
accept Sandoval’s premise that ‘moral certainty,’ standing alone, might not be recognized
by modern jurors as a synonym for ‘proof beyond a reasonable doubt’ (p. 1239).” Justice
O’Connor went on to note, “this Court does not condone the use of the antiquated ‘moral
certainty’ phrase,” and was joined by Justice Ginsburg, who explicitly admonished, “the
‘moral certainty’ phrase…should be avoided as unhelpful in defining reasonable doubt.”
The Court did not, however, explicitly hold that the phrase was constitutionally invalid.
Following the trilogy, lower courts became extremely reluctant to include the
moral certainty phrase in a jury instruction, and they took on the task of crafting
alternative definitions that were less likely to garner appellate attention. One set of
alternative definitions describes the requisite state of mind necessary to convict under the
RD rule. Solan (2000, p. 113) cataloged the key phrases from such instructions, which
include telling jurors: you must have “an abiding conviction of guilt;” you must be
“firmly convinced;” you must have “a belief that does not waver or vacillate;” you must
have “proof of such a convincing character that you would be willing to rely and act upon
it without hesitation in the most important of your own affairs.” Other alternatives seek to
distinguish a reasonable doubt from an unreasonable doubt. For example, jurors are told that
a reasonable doubt is: “not an unreasonable doubt;” “a doubt which is something more
than a guess or surmise;” “not a conjecture or fanciful doubt;” “not a doubt which is
raised by someone simply for the sake of raising doubts;” “a doubt that a reasonable
person hearing the same evidence would have.”
Though perhaps less likely to elicit appellate attention, these definitions are far
from enlightening (for a thorough review, see Laudan, 2003, pp. 301-310). Consider, for instance, the circularity of telling jurors “a reasonable doubt is a doubt a reasonable
person would have,” or the tautology of telling jurors “a reasonable doubt is not an
unreasonable doubt.” Similarly, a doubt that causes hesitation is not helpful because most
people hesitate before making any consequential decision.
In light of the emptiness of the available alternatives, some courts have resisted
providing any definition of RD. Trial courts in some jurisdictions are even prohibited
from providing a definition when queried by jurors. In at least ten states any judicial
explanation of RD is automatic grounds for reversal; by contrast, only fifteen states
mandate a judicial definition of the term (Laudan, 2003). The same non-uniformity exists
at much of the federal level, including the Supreme Court, which seems torn on the issue
of whether a definition should be provided. As Justice Ginsburg confusingly noted, “the
Court has never held that reasonable doubt must be defined, however it also has never
held that the term does not need to be defined (cited in Horowitz, 1997, p. 291).”
Others have a more principled aversion to defining RD. Some appellate courts
and commentators assert that no definition is necessary because the term is “self-explanatory.” “Jurors know what is ‘reasonable’ and are quite familiar with the meaning
of ‘doubt’…the term is of common use and acceptance (US v Glass, 1988, p. 386).” The
Seventh Circuit Court of Appeals even argued that providing a definition only makes “the
clear more clear, thereby confusing the jury,” and therefore, “no attempt should be made
to define reasonable doubt (US v Lawson, 1974, p. 433).”
Relatedly, it has been said that jurors possess an “original understanding” of RD
and that the responsibility for defining the term should fall upon the jury (see Diamond,
1990). As one anonymous author put it, “because reasonable doubt is an inherently
amorphous term that demands value judgment in its application, the jury is best suited, as
a representative body of the community, to determine its meaning (Note, 1995, p. 1972).”
Moreover, forcing jurors to confront the inherent vagueness of the term “provokes thought, because the phrase, standing alone, invites deliberation…it focuses juror
attention on a concept rather than on words (Note, 1995, p. 1970-1971).”
The idea that RD is “self-evident” or “self-defining” is not consistent with a spate
of empirical research, which finds that jurors lack comprehension and are confused by the
concept (Saxton, 1998; Levine, 1998; Montgomery, 1980; Ogloff, 1991). For example,
only about one third of 600 eligible jurors in a Michigan study understood that proof
beyond a reasonable doubt did not require absolute certainty (Kramer & Koenig, 1990).
In one Florida study, more than half of venirepersons failed to understand that the
prosecution had to establish guilt beyond a reasonable doubt (Strawn & Buchanan,
1976). About a quarter of these same participants believed that, when the evidence
equally favors the prosecution and the defense, the defendant should be found guilty. One
experiment found that jurors were confused by the moral certainty phraseology as well
as the other common descriptions of RD (Horowitz & Kirkpatrick, 1996; Kerr et al.,
1976). On the basis of such findings, Judge Jon Newman (1993) of the Second Circuit
concluded, “that the charges currently in use are ambiguous and open to widely disparate
interpretation by jurors (p. 985),” and Judge Patricia Wald (1993) argued that “jury
instructions [on reasonable doubt]…are like foreign movies without subtitles (p. 111).”
Quantifying Reasonable Doubt
A standard of proof represents an attempt to instruct the fact finder concerning the
degree of confidence our society thinks he should have in the correctness of
factual conclusions for a particular type of adjudication
—Justice John Harlan
Much of the lower courts’ refusal to define reasonable doubt has political
motivations. Lower courts are disinclined to provide a definition that might make them
susceptible to appellate review. As Laudan (2003) suggests, “better [for trial courts] to
say nothing…than to say something that might trigger a successful appeal (p. 316).”
Aside from this motivation, however, many trial courts resist attempting to provide a
definition because of the monumental difficulty of such an undertaking and the silliness
that often results. RD is, in all likelihood, a term that simply escapes verbal explanation.
It escapes verbal explanation because RD fundamentally is a description of uncertainty,
and words are inapposite for describing uncertainty. Thus, attempts to describe RD in
verbal terms are foredoomed because they are wrong in principle.
Probability is the language of uncertainty. Uncertainty refers to the condition of
being unsure about the truth or falsity of a proposition and it comes in different degrees.
These degrees are properly described by probabilities, which span the range of 0-1.0,
where a probability of 1.0 reflects absolute certainty. Probabilistic language is commonly
invoked to describe the uncertainty associated with even the most mundane propositions.
For instance, a forecaster might say there is a 60% chance of rain tomorrow. Even if poorly calibrated, such
statements do convey the degree of the uncertainty far more concretely than using non-numeric terms, such as describing the chances of rain tomorrow as “pretty probable” or
“overwhelmingly probable.”
RD incontrovertibly involves uncertainty. As noted by Justice Harlan (1970),
“[I]n a judicial proceeding in which there is a dispute about the facts of some earlier event, the fact finder cannot acquire unassailably accurate knowledge of what happened.
Instead, all the fact finder can acquire is a belief about what probably happened (p. 370;
emphasis in original).” Justice Harlan went on to argue that “a standard of proof
represents an attempt to instruct the fact finder concerning the degree of confidence our
society thinks he should have in the correctness of factual conclusions for a particular
type of adjudication (p. 370).” Since the RD rule—which is the standard of proof in
criminal adjudication—involves uncertainty, some argue that it must necessarily be
expressed in probabilistic, numeric terms.
Prosecutors and judges have occasionally described RD quantitatively. This was
typically accomplished by providing a numerical analogy to describe the level of
certainty required for conviction (see Tillers & Gottfried, 2006). For example, one
prosecutor explained that proof beyond a reasonable doubt is “kind of like being
somewhere between the 75 and 90 yard line on a 100 yard-long football field (State v
Casey, 1994, p. 98).” Another equated proof beyond a reasonable doubt to “a 1000 piece
puzzle with sixty pieces missing (People v Ibarra, 2001, p. 3).” And one trial judge
presented jurors with a certainty scale ranging from zero to ten and suggested that RD is
“in excess of 7.5 (McCullough v State, 1983).”
Empirical evidence suggests that quantified explanations of RD do produce their
intended effect. For example, Kagehiro & Stanton (1985; see also Kagehiro, 1990)
conducted a mock-trial experiment in which they held everything constant and
manipulated the standard of proof and whether it was described in numeric or non-numeric terms. When described in non-numeric terms, there was no difference in the rate
of conviction across the different standards. Participants were just as likely to convict
under the preponderance of evidence standard as the RD standard. However, when
described in numeric terms, the rate of convictions dropped significantly under the RD
standard compared to the preponderance of evidence or the clear and convincing
evidence standards.
Despite the apparent efficacy, appellate courts have repeatedly castigated
quantified descriptions of RD. As Tillers & Gottfried (2006) sardonically observe, “that
mathematical quantification of beyond a reasonable doubt is impermissible appears to be
established beyond a reasonable doubt (p. 3).” Although appellate courts are hostile
towards quantification for a variety of political reasons, which are developed in chapter 3,
the primary source of antagonism derives from the numbers themselves. The use of a
quantitative standard, of course, requires a number to be specified. While most assume
that RD requires certainty in excess of 0.90 in order to convict, courts and many
commentators do not understand where this number came from and some question its
appropriateness (see Kaye, 1999).
Two years prior to Winship, John Kaplan (1968), a Stanford law professor, wrote
a seminal article that applied Decision theory to the fact-finding process. Decision theory
describes how to optimize decisions in the face of uncertainty. Kaplan (1968)
demonstrated that, because trials involve uncertainty, a decision to convict fundamentally
entails a tradeoff between the possibility of convicting the defendant should he happen to
be innocent and the possibility of acquitting the defendant should he happen to be guilty.
Perhaps the most salient tradeoff in this regard is Blackstone’s (1796) maxim that ten
erroneous acquittals are equal in cost to one erroneous conviction. Kaplan (1968) showed
how Decision Theory could translate any specified tradeoff into a numeric value that
corresponds to RD.
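The arithmetic behind such a translation is straightforward (the notation here is illustrative rather than Kaplan's own): if an erroneous conviction is treated as k times as costly as an erroneous acquittal, the implied conviction threshold on the probability of guilt is k/(k + 1). Blackstone's 10:1 maxim thus yields

\[
p^{*} \;=\; \frac{10}{10 + 1} \;\approx\; 0.91,
\]

which is the figure commonly cited as the RD threshold.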
Justice Harlan was deeply influenced by Kaplan’s article. Beyond citing it in his concurrence, Justice Harlan fully endorsed its decision-theoretic logic. His
analysis in Winship started by noting that “the choice of the standard for a particular
variety of adjudication does, I think, reflect a very fundamental assessment of the
comparative social costs of erroneous factual determinations (p. 369).” Moreover, “the
choice of the standard to be applied in a particular kind of litigation should, in a rational
world, reflect an assessment of the comparative social disutility of [the two types of
erroneous outcomes].” To cement the foundation of his thesis, Justice Harlan concluded:
I view the requirement of reasonable doubt in a criminal case as bottomed on a
fundamental value determination of our society that it is far worse to convict an
innocent man than to let a guilty man go free (p. 370).
But Harlan’s commentary stopped noticeably short of specifying how much
worse. This did not preclude decision-analytic attempts to translate RD into a specific
level of uncertainty (e.g., DeKay, 1996; Connolly, 1987; Arkes & Mellers, 1987), though
most of these attempts are considered merely didactic. The Court also never commented
on the prospect of quantifying RD. Although Justice Harlan undeniably used a decision-
analytic framework for discussing the underpinnings of RD, he declined to indicate
how—if at all—this analysis should be implemented practically. It is for these reasons
that some commentators assume, perhaps ironically, that RD is “inherently qualitative” (Fortunato, 1996).
Conclusion
The time has come for American courts to take the [reasonable doubt] standard
seriously and apply it conscientiously as a rule of law.
—Judge Jon O. Newman
Jurors are universally instructed to refrain from making a judgment until they
have heard all of the evidence. Before jurors are allowed to deliberate, a defendant
can object to the case being tendered to the jury on the grounds that the prosecution failed
to produce sufficient evidence of guilt (Muller & Kirkpatrick, 2003). In this instance, the
judge must determine whether the evidence could establish guilt beyond a reasonable doubt.
If the evidence can rise to this level, the case is given to the jury; if it cannot, the case is
dismissed outright.
A similar issue arises when appellate courts must determine whether the RD rule
was appropriately applied as a matter of law. That is, whether a criminal defendant was
convicted based on a standard of proof that is congruous with Winship. It is a
constitutional violation for defendants to be convicted on evidence less than that required
by due process, and defendants do occasionally challenge their conviction on such
grounds, albeit not with much success (Newman, 1993).
Both of these issues theoretically force courts to face the question: was the
standard sufficiently stringent so as to be consistent with Due Process? The answer to this
question presupposes that there is some reification of RD such that a sufficiently stringent
standard can be distinguished from a non-stringent standard. I submit that no verbal
formulation of RD is capable of drawing such a distinction. Instead, the decision-theoretic
framework that the Winship court relied upon is necessary to even attempt to address this
question.
The following chapter provides a primer on decision theory that will elucidate the
relation between “the comparative social costs of erroneous factual determinations” and
the RD standard of proof. This analytical framework separates several important
dimensions of the decision-making process, including disentangling the RD criterion
from jurors’ beliefs about the evidence. Although it cannot definitively answer the
question of whether the standard is sufficiently stringent, because there is no normative
consensus on the appropriate value tradeoff, decision theory provides an account of how
a rational juror would make decisions. That is, how jurors ought to make decisions (given
certain assumptions) rather than how jurors do make decisions. A comprehensive
discussion of the legal considerations relevant to decision theory will also be provided.
The decision theoretic model is ultimately used as a foundation to describe a
novel hypothesis about a way in which jurors’ decisions, as a descriptive matter, might
systematically differ from how a rational juror should make decisions under the RD rule.
This hypothesis is motivated by a burgeoning body of research in cognitive psychology, and it accompanies a current debate amongst jurists and legal academicians about the principal dynamics of the RD standard. All of these issues will be discussed in due
course.
Chapter 2: The Rational Juror
This chapter describes the machinery for determining how a rational juror would
make decisions. Three qualifications are in order before proceeding any further. First, this
section is concerned with how jurors ought to behave, rather than how jurors do behave.
Whether jurors do behave in a manner consistent with how they ought to is an empirical
question that is not addressed in this chapter. Second, ‘rationality’ cannot be proven or
disproven based on empirical observation. Rationality concerns methods of thinking, not
outcomes per se. Even very poor methods of thinking can sometimes yield desirable
results. Behavior is considered rational if and only if it conforms to the theory’s
conception of rationality. Third, one is free to choose any conception of rationality they
desire. In Economic Theory, the prevailing conception of rationality is maximizing
expected utility. This conception of rationality is assumed throughout this dissertation.
Bayes Theorem
Discovered by the Reverend Thomas Bayes in the eighteenth century and
published posthumously in 1763, Bayes theorem is a logical result of probability theory
(Bolstad, 2007). It is logical in the sense that the theorem is derived from a set of coherence
axioms and it follows mathematically from fundamental assumptions of probability
theory (for a discussion see Baron, 2000). The theorem itself dictates how one ought to
update the probability of a proposition in light of new information. There are, roughly,
three conceptions of probability (frequentist, logical, and subjective), though the particular type is not important to the Theorem (von Winterfeldt & Edwards, 1986).
Theorem only prescribes what can be done by way of manipulating and combining
probabilities once they are specified; it does not specify the probabilities themselves.
The subjective conception of probability is most closely associated with Bayes
Theorem. In contrast to the other conceptions, which comport with “objective facts” that are
based on observed frequencies or logical possibilities (Baron, 2000), subjective
probability is a personal judgment (i.e., credence) about the likelihood of a proposition or
event (see Savage, 1954). A distinct advantage of this conception is that it is amendable
to unique, one-off events, such as whether a democrat will be elected in 2012, or whether
the defendant committed the alleged murder. One-off events vex the other forms of
probability, which presume repeated, identical trials. Other than card games and artificial
gambling tasks, repeated, identical trials are rare; one-off events are commonplace, and
thus make the subjective conception of probability especially important (Kemeny, 1959).
It is worth emphasizing that probability judgments pertain only to beliefs about
the likelihood of a proposition. Though such beliefs are useful for making decisions,
other considerations, such as decisional goals and options, are also relevant to the
decision-making process (McFall & Treat, 1995). These considerations are not part of
probability theory. Thus, probabilities alone are necessary but not sufficient to make
decisions. The way belief functions interact with the other decision-making
considerations is described in the next section.
Bayes theorem can be expressed in odds form as:
\[
\frac{p(X)}{p(\sim X)} \times \frac{p(E \mid X)}{p(E \mid \sim X)} \;=\; \frac{p(X \mid E)}{p(\sim X \mid E)} \tag{2.1}
\]
As is apparent, the Theorem comprises three components (from left to right): the
prior odds; the likelihood ratio; and the posterior odds. The prior odds reflect the state of
belief about the proposition prior to the reception of information. The likelihood ratio
quantifies the degree to which the received information changes the belief about the
proposition. The Theorem holds that the product of the prior odds and the likelihood ratio
are equal to the posterior odds. The posterior odds indicate the belief about the
proposition after receiving information. A numeric example is now provided for the sake
of illustration.
Let us assume that a defendant is accused of taking money from a company safe.
The relevant proposition (X) is whether the defendant took the money from the safe.¹
¹ Some have argued that such a demonstration is inapposite to fact-finding because this proposition is necessary, but not sufficient, for guilt, which also requires a demonstration of mens rea, among other specified elements (Tribe, 1971a). While this argument is true—X is not perfectly synonymous with “guilt”—the distinction is unimportant for the pedagogical purposes here.
Before receiving any evidence, the fact-finder might believe that the odds that the
defendant stole the money are 1:10. Hence, the prior probability of X, or p(X), is roughly 9%.
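Odds and probabilities are interchangeable: odds of a:b correspond to a probability of a/(a + b), so here

\[
p(X) \;=\; \frac{1/10}{1 + 1/10} \;=\; \frac{1}{11} \;\approx\; 0.09 .
\]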
During the trial, a witness testifies that she saw the defendant take the money out
of the safe (E). Although this is highly incriminating evidence, one must be cognizant of
the infirmities of human judgment. Since eyewitness identifications are not infallible, one
needs to know whether, and to what extent, the identification is more likely to be true
than false. To accomplish this, one first needs to assess the likelihood of an identification (E),
given the defendant did take the money (X), or p(E|X). This conditional probability is
referred to as the “true positive rate.” Next, one needs to assess the likelihood of an
identification (E), given the defendant did not actually take the money (~X), or p(E|~X).
This conditional probability is referred to as the “false positive rate.” The ratio of the true
and false positive rate forms the likelihood ratio for the eyewitness identification. For the
sake of illustration, assume that a true identification always occurs but that a false
identification occurs in one out of 100 identifications. The likelihood ratio is thus 100,
indicating that the identification is 100 times more likely to be true than false.
Given a prior of 1:10 and an eyewitness identification that is mistaken one time out of 100, the posterior odds that the defendant took the money are 10:1, or about
91%. This posterior might seem counter-intuitively small given that the evidence is so
highly incriminating. After all, the identification is very likely to be accurate. The
seemingly small posterior, however, is the consequence of the prior. If the prior were
10:1 in favor of guilt rather than innocence, the posterior based on the exact same
evidence would be 1,000:1, roughly a 99.9% chance that the defendant took the money.
Again, whether either of these figures is sufficient for a guilty verdict depends on other
decision-making considerations, which are discussed in the next section.
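As a check on this arithmetic, the update can be sketched in a few lines of Python (the sketch is illustrative only; the function names are mine, and the numbers simply mirror the example above):

def update_odds(prior_odds, likelihood_ratio):
    # Equation 2.1 in odds form: posterior odds = prior odds x likelihood ratio.
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    # Convert odds (e.g., 10 for 10:1) into a probability.
    return odds / (1.0 + odds)

prior_odds = 1 / 10        # fact-finder's prior: 1:10 that the defendant took the money
likelihood_ratio = 100     # the identification is 100 times more likely to be true than false

posterior_odds = update_odds(prior_odds, likelihood_ratio)      # 10.0, i.e., 10:1
print(round(odds_to_probability(posterior_odds), 3))            # 0.909

# With a prior of 10:1 in favor of guilt, the same evidence yields 1,000:1 odds.
print(round(odds_to_probability(update_odds(10, likelihood_ratio)), 3))   # 0.999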
Most trials, of course, consist of more than just a single piece of evidence. The
propriety of using Bayes theorem to marshal multiple pieces of evidence crucially depends on whether
the pieces of evidence are statistically independent. Independence is a concept that comes
in many forms. One type, known as unconditional independence (referred to simply as
“independence”), is when knowledge of one event does not affect the estimate of the
other event. For instance, flipping a head should not influence the likelihood that the next
flip of a fair coin will also result in a head; in this instance, the coin flips are considered
independent. In real-world settings, including the judicial context, such independence is rare.
Another type of independence is known as conditional independence (Schum &
Martin, 1982). Conditional independence occurs when there is no relation between
variables conditional on some third variable. For example, two pieces of evidence might
be correlated and are therefore not independent, but assuming the defendant is guilty (or
“conditional on guilt”), the evidence may no longer be correlated, thus the evidence is
conditionally independent (Edwards, 1992). It is customary for each piece of
conditionally independent evidence to constitute a separate likelihood ratio (Friedman,
1992). So long as likelihood ratios are conditionally independent, they may be multiplied
to yield the posterior (Bolstad, 2007).
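In the notation of equation 2.1, and writing E1 and E2 for two items of evidence, conditional independence given X (and given ~X) means that the joint likelihoods factor, so the combined likelihood ratio is simply the product of the individual likelihood ratios:

\[
\frac{p(E_1, E_2 \mid X)}{p(E_1, E_2 \mid \sim X)} \;=\; \frac{p(E_1 \mid X)\,p(E_2 \mid X)}{p(E_1 \mid \sim X)\,p(E_2 \mid \sim X)} \;=\; \frac{p(E_1 \mid X)}{p(E_1 \mid \sim X)} \times \frac{p(E_2 \mid X)}{p(E_2 \mid \sim X)} .
\]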
Bayes does not specify when such multiplication must take place. The most
conventional approach is to multiply, or “update” after the reception of each individual
piece of evidence. This chronological approach would take on the following form:
\[
\frac{p(X)}{p(\sim X)} \times \frac{p(E_1 \mid X)}{p(E_1 \mid \sim X)} \;=\; \frac{p(X \mid E_1)}{p(\sim X \mid E_1)} \tag{2.2}
\]
The prior (the left-most term) is modified by the likelihood ratio for the first piece of
evidence (the center term) yielding the posterior (the right-most term). This value reflects
the current state of belief after learning about the first piece of evidence.
Now a second piece of evidence is about to be introduced. Before hearing this
piece of evidence, the current state of belief is equal to the first posterior. Notice that the
right-most term in equation 2.2 is equal to the left-most term in equation 2.3. In this way,
the first posterior is the new prior because it is the belief about the proposition prior to
hearing the second piece of evidence. Hence the phrase “today’s posterior is tomorrow’s
prior (Lindley, 1970, p. 2).” Following the introduction of the second piece of evidence,
this prior is multiplied by the likelihood ratio for the second piece of evidence (the center
term of equation 2.3), which yields the posterior of the second piece of evidence (the
right-most term). This posterior reflects the impact of the original prior, the first piece of
evidence and second piece of evidence.
\[
\frac{p(X \mid E_1)}{p(\sim X \mid E_1)} \times \frac{p(E_2 \mid X)}{p(E_2 \mid \sim X)} \;=\; \frac{p(X \mid E_1, E_2)}{p(\sim X \mid E_1, E_2)} \tag{2.3}
\]
As Bayes presupposes no order effects, the order in which the information is received
does not alter its value (Edwards, Lindman, & Savage, 1963). Regardless of whether E1 is received before E2, or vice versa, the resulting posterior is numerically equivalent.
An alternative to the previously described updating approach is to reserve any
updating until all the evidence is received. This approach is referred to as ‘chunking’ and
would take on the following form:
\[
\frac{p(X)}{p(\sim X)} \times \frac{p(E_1, E_2 \mid X)}{p(E_1, E_2 \mid \sim X)} \;=\; \frac{p(X \mid E_1, E_2)}{p(\sim X \mid E_1, E_2)}
\]
Here, the prior is modified by one likelihood ratio that includes both pieces of evidence.
This approach treats both pieces of evidence as if they are one large piece of evidence
and hence requires only one likelihood ratio.
Chunking is particularly desirable when the pieces of evidence are not
conditionally independent. If likelihood ratios are not conditionally independent, the
dependencies must be modeled to avoid spuriously inflating the likelihood estimates (in
essence, double counting the value of each piece of evidence), a task with stifling
computational complexity (see Schum, 1994). The chunking approach, however, avoids
this complexity by treating the conditionally non-independent elements as constituents of
a larger part, and conducting the analysis at this combined level. Notably, this level of
analysis is not necessarily inappropriate in legal settings (Schum, 1992). It is also
noteworthy that the chunking approach renders Bayes theorem perfectly consistent (see
Friedman, 1992; 1997) with the Story Model of Juror decision making (Pennington &
Hastie, 1991; 1992), a purported alternative to Bayesian modeling, in which jurors seem
to update their beliefs in a chunking-type manner.
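A brief sketch (illustrative only; the second likelihood ratio below is hypothetical) confirms that, for conditionally independent evidence, item-by-item updating and chunking arrive at the same posterior regardless of the order of presentation:

prior_odds = 1 / 10
lr_e1 = 100      # e.g., the eyewitness identification from the earlier example
lr_e2 = 5        # a second, hypothetical item of evidence

# Sequential updating ("today's posterior is tomorrow's prior"), in either order
posterior_12 = (prior_odds * lr_e1) * lr_e2
posterior_21 = (prior_odds * lr_e2) * lr_e1

# Chunking: one combined likelihood ratio applied to the original prior
posterior_chunked = prior_odds * (lr_e1 * lr_e2)

print(posterior_12, posterior_21, posterior_chunked)   # each prints 50.0, i.e., 50:1 odds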
Decision Theory
Bayes theorem dictates how to update a specified belief in light of new
information, but it does not indicate how such beliefs should be translated into a binary
decision. Whether a belief is sufficient to undertake a given course of action depends on
the consequences of that action and the a priori preferences of the decision maker.
Fortunately, the principles of Bayes theorem have been extended to incorporate such
preferences (McFall & Treat, 1995). This extension goes by the locution Decision
Theory. In a vein similar to Bayes theorem, Decision Theory does not dictate what
preferences to hold, rather, it indicates the optimal decision given a set of preferences.
The derived decision is considered optimal for two reasons (see Baron, 2000).
First, in the long run, adherence to the decision rule will yield the specified preferences
more frequently than alternative strategies. Second, the decision rule is based on axioms
that create a type of internal consistency. For example, one axiom is the principle of
transitivity: if option A is preferred to option B, and option B is preferred to option C,
then option A must be preferred to option C. Because of this and the other axioms, the
implied decision is logically coherent and thus conforms to the operative conception of
rationality.
A subset of Decision Theory is known as Signal Detection Theory (Swets &
Pickett, 1966; Coombs, Dawes, & Tversky, 1970). Signal Detection Theory (SDT)
posits that three considerations are relevant to the decision-making process: the prior; the
likelihood ratio; and the utility ratio. The utility ratio is a means to rank-order the
preferences of the various outcomes of a decision. These include the relative benefits of
the correct decisions (i.e., true positives and true negatives), and the relative costs of the
erroneous decisions (i.e., false positives and false negatives). One should bear in mind
that preferences, much like subjective probabilities, are subjectively held, and that
utilities basically indicate the “good” or “goodness” of the various alternatives (Baron,
2000). SDT holds that when the likelihood ratio exceeds the product of the inverse prior
and the utility ratio, an affirmative decision should follow, otherwise it should not. This
relation is formally expressed as:
P(E | G) / P(E | ¬G) ≥ [P(¬G) / P(G)] × [(U_TN - U_FP) / (U_TP - U_FN)]     (2.5)
The rightmost term in equation 2.5 is the utility ratio. The subscripts indicate which
outcome the given utility represents. For example, U_TP reflects the utility of a true
positive, whereas U_FP reflects the utility of a false positive, and U_TN indicates the utility of
a true negative, while U_FN corresponds to a false negative.
In the context of criminal adjudication, it is commonly, though not
uncontroversially (see next section), assumed that the utility of the correct decisions is
equal. That is, convicting the guilty (i.e., a true positive) and acquitting the innocent (i.e.,
a true negative) are equally preferable. Such an assumption allows equation 2.5 to be
simplified to depend only on the relative utility of the incorrect decisions (i.e., convicting
the innocent – false positives; and acquitting the guilty – false negatives). The utility
associated with incorrect decisions is referred to as disutility.
P(E | G) / P(E | ¬G) ≥ [P(¬G) / P(G)] × [D_FP / D_FN]     (2.6)
One additional simplification can be made to equation 2.6. The inverse prior can
be moved to the left side of the inequality, so that the left side becomes the posterior odds
(recall, pursuant to Bayes, that the product of the prior and the likelihood ratio is the
posterior). The right side, denoted here as Ω*, then reflects the optimal decision threshold
(Swets, 1992):
P(G | E) / P(¬G | E) ≥ D_FP / D_FN = Ω*     (2.7)
Two things follow from equation 2.7. First, the threshold required for a conviction
directly depends on the relative disutility of the errors. Second, the decision itself
depends on whether the posterior odds exceed this threshold. For example, if one
assumes that the disutility of a false positive is 10 times greater than that of a false negative,
Ω* would equal 10:1 odds, or about a 91% probability. This means that the posterior
probability must exceed 91% in order to convict. If one assumes that the disutility of a
false positive is 100 times greater than that of a false negative, Ω* would equal 100:1 odds,
or about a 99% posterior probability. This means that the posterior must exceed 99% in
order to convict.
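The arithmetic behind these thresholds is simple enough to sketch in a few lines of Python (the disutility ratios below are the illustrative values used above, not prescriptions):

    # Convert a disutility ratio D_FP / D_FN into the posterior probability of guilt
    # that must be exceeded before conviction maximizes expected utility.
    def conviction_threshold(disutility_ratio):
        """The posterior-odds threshold equals the disutility ratio itself;
        the corresponding probability is odds / (1 + odds)."""
        return disutility_ratio / (1 + disutility_ratio)

    for ratio in (1, 5, 10, 100):
        print(ratio, round(conviction_threshold(ratio), 3))
    # 1 -> 0.5, 5 -> 0.833, 10 -> 0.909, 100 -> 0.99

The 10:1 and 100:1 rows reproduce the roughly 91% and 99% figures given in the text.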
Again, it is important to reiterate that Decision Theory does not provide the
utilities; probabilities and utilities are not self-defining. Rather, Decision Theory can be
used to derive a decision—given a specified set of utilities or disutilities—that conforms
to the operant conception of rationality, which, in this case, is the objective of
maximizing expected utility.
Accounting for Legal Norms
Although decision theory does not specify what the probabilities and utilities
ought to be, two of the decision components—the prior and the utilities—are
theoretically constrained by legal norms. There has been much commentary on how, if at
all, these components should be interpreted under extant doctrine. This commentary is
described below. Reconciling the components with legal doctrine is difficult but not
intractable, and the criticisms ultimately do not amount to the demise of using decision
theory to model how a rational juror would behave (cf. Allen, 1997; Callen, 1992).
The Prior
A Bayesian prior is a means to operationalize the Presumption of Innocence.
Properly understood as the point of departure from which a juror begins a consideration
of the evidence in the current case (Friedman, 2000), some courts and commentators
contend that a non-zero prior probability is inconsistent with the Presumption (State v.
Skipper, 1994; Cohen, 1977; Tribe, 1971b; Jaffee, 1988). The argument is that, until
evidence is produced to the contrary, jurors ought to be agnostic with respect to the
defendant’s guilt, and “to hold that a claim [of the defendant’s guilt] begins with more-
than-zero prior probability is to imply that the claim is brought with reason (Jaffee, 1988,
p. 1006)."
A prior of zero renders Bayes theorem unworkable because the product of zero
with any number is always zero. Hence the title of Jaffee's (1988) article: "Prior
probability – a black hole in the mathematician’s view of the sufficiency and weight of
evidence.” But a zero prior goes well beyond agnosticism with respect to guilt. It
logically indicates that there is no possibility of guilt and that no amount of evidence
could change this view (Kaye & Balding, 1995). There are virtually no real-world
situations in which such a claim can be sustained. Moreover, in Taylor v Kentucky
(1978), a case often cited in conjunction with the Presumption, the US Supreme Court
held that “guilt or innocence is to be determined solely on the basis of evidence
introduced at trial (p. 485).” This does not require jurors to begin their consideration of
the case with the belief that the defendant could not possibly be guilty; and any juror who
harbored such feelings would be excluded for cause. Rather, Taylor stipulates that jurors
cannot consider the “indictment” as evidence of guilt. As such, jurors ought to consider
the defendant no more likely than anyone else to be guilty, in which case the prior would
be quite small but not zero (Kaye & Balding, 1995).
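To give the equally likely conceptualization a concrete (and purely hypothetical) scale: if the relevant population contained one million people who could in principle have committed the crime, the prior would be on the order of

P(G) = 1 / 1,000,000, i.e., prior odds of about 1 to 999,999,

small enough to honor the Presumption yet still capable of being overwhelmed by sufficiently diagnostic evidence.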
Friedman (1995; 2000) argues that the equally likely conceptualization is
inadequate for several reasons. First, a defendant may contend that he is less likely to be
guilty than the mean member of the population. Indeed, crimes that require physical
agility or that are gender-specific (e.g., rape) might exclude broad classes of non-viable
defendants outright. Second, there is no obvious reference class that defines the relevant
population from which the defendant is a “randomly selected” member. Some argue the
relevant population should be considered “all the people in the universe (Weinstein &
Dewsbury, 2006, p. 168).” It seems illogical to include individuals that could not have
possibly committed the crime in the reference class, but narrowing the class would be
essentially arbitrary (Allen & Pardo, 2007). In sum, Friedman (2000) notes that a
Bayesian analysis is not incompatible with the Presumption of Innocence so long as the
prior is not zero. The rational juror would have a small prior, but exactly how small is an
unsettled question that is open to debate, with the caveat that it cannot be zero.
The Utility Ratio
Assertions about the proper tradeoff of juridical errors have roots dating back to
the Book of Genesis. In Anglo-Saxon jurisprudence, the utility ratio has always embodied
a preference for false negatives (acquitting a guilty defendant) relative to false positives
(convicting an innocent defendant) (Volokh, 1997). This is a social value judgment that is
grounded in the comparative costs of each type of error. A false positive is considered
more costly because it violates the social contract between the state and the individual,
erodes the integrity of the state (see Dworkin, 1977), undermines the deterrent effect of
punishment (Polinsky & Shavell, 1999; Posner, 1998), and gratuitously imposes the
monetary cost of punishment (Lillquist, 2002).
However, even if one accepts that a false positive, as a matter of social policy, is
the greater evil, it is not clear how disparate the relative costs are. In 1476, Fortescue
seminally posited that twenty false negatives are equal in cost to one false positive.
Almost 200 years later, Hale opined that the number is five, and shortly thereafter,
Blackstone held the number is ten. Reformers of the 19th century raised the number
considerably, with Starkie notably holding that the proper ratio is “ninety-nine (i.e., an
indefinite number) (Risinger, 1998, p. 443).” To date, Blackstone’s ratio of 10:1 is the
most popular and persistent, and Justice Blackmun even characterized it as “perhaps not
an unreasonable assumption (Ballew v Georgia, 1978, p. 234).”
Blackstone's ratio must be understood in the context of the 18th century criminal
justice system, in which a conviction almost surely resulted in a death sentence (Lillquist,
2004; 2008). The irreversibility of this imposition is what primarily motivated the
asymmetry in Blackstone’s ratio (Lillquist, 2005). In recent times, however, nearly all
westernized governments have eschewed capital punishment, the sole hold out being the
United States, where the imposition is limited to a very narrow set of offenses (Stuntz,
2001). Though incarceration does have serious consequences, which include a
consistent lowering of one's life expectancy (see Schnittker & John, 2007), it is unclear if
the 10:1 ratio should apply when capital punishment is not involved (Schauer, 1993;
Lillquist, 2005).
Even in the 18th century, not all commentators accepted that the ratio should favor
acquitting the guilty. Jeremy Bentham, a prominent English jurist and moral philosopher,
was critical of Blackstone’s adage and the commentary it engendered. Bentham noted, “at
first it was said to be better to save several guilty men, than to condemn a single innocent
man; others, to make the maxim more striking, fix the number ten; a third made this ten a
hundred, and a fourth made it a thousand (cited in Twining, 1986, p. 98)." But Bentham
was keenly aware of the consequences of such rhetoric, which he believed would "give crime
impunity, under the pretext of insuring the safety of innocence (cited in Twining, 1986, p.
98).”
Allen and Laudan (2008) expanded on Bentham’s thesis by noting that
minimizing false convictions entails increasing the number of acquittals, both true and
false. False acquittals leave criminals unpunished, undeterred and able to further
victimize the citizenry. Allen and Laudan (2008) argue that false acquittals are a more
pernicious risk than being falsely convicted of a crime. According to a rough calculation,
a random person in the United States is over 300 times more likely to be the victim of a
serious violent crime than to be falsely convicted. Decreasing the likelihood of false
convictions thus paradoxically increases the likelihood of being the victim of a violent
crime, and the state has an obligation to protect citizens from both harms. Allen and
Laudan (2008) contend that "the acceptable number of false convictions is the number of
false convictions that minimizes the aggregate disutility of grave risk to innocent persons,
either from false conviction or from crime victimization (p. 84).”
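This claim can be given a schematic form (the formalization is my own gloss, not Allen and Laudan's notation, and the terms are placeholders rather than measured quantities): choose the standard of proof t so as to minimize

D_FC × (expected false convictions at t) + D_V × (expected victimizations attributable to false acquittals at t),

where D_FC and D_V denote the respective disutilities of a false conviction and of a crime victimization. Raising t lowers the first term but raises the second; locating the minimum is precisely the tradeoff Allen and Laudan (2008) press.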
Others have levied an even more fundamental objection against Blackstone-type
ratios. The discussion of tradeoffs has historically focused only on the disutilities, making
the assumption that the utility of the correct decisions is equal since a “juror should feel
no regret at reaching correct decisions (Lempert, 1977, p. 1036).” Laudan and Saunders
(2009) reject this assumption, which they refer to as "the Blackstonian fantasy," and
posit that the “myopic focus on expected losses is really quite stunning….to put wholly to
one side any consideration of the respective utilities associated with correct outcomes [is]
an egregious miscalculation (p. 14).”
Consider just a few of the benefits of correctly convicting the guilty: victims get their
retribution; offenders get their just deserts; safety is promoted by incapacitation; others
might be deterred from committing a similar act. And correctly acquitting the innocent
has desirable consequences, which include both the preservation of justice and
vindication for the accused. Indeed, whether these consequences are equally desirable is
debatable and has been debated. For instance, Tribe (1971a) assumed that correctly
convicting the guilty is the more desirable outcome, Milanich (1981) opined that
correctly acquitting the innocent is the most desirable outcome, and Lillquist (2002)
argued that the utility associated with the correct decisions depends on the particular type
of crime. For example, Lillquist (2002; 2005) believes that the value of a correct
conviction is greater in cases involving terrorists rather than traffic offenses because of
specific deterrence. Similarly, the value of a correct conviction is the greatest in a capital
case because “society can be sure the defendant will never offend again (Lillquist, 2002,
p. 150).”
There is not unanimity on the appropriate utility ratio, nor is there likely to be
agreement in the future. But this does not preclude the enterprise of using decision theory
to model how a rational juror ought to behave. If anything, the use of decision theory
forces society to confront these types of value tradeoffs, rather than relegating them to the
jury’s opaque “black-box.” Decision theory can implement whatever tradeoff is
prescribed by legal norms. The fact that there is no consensus on the appropriate norm
does not undercut the value of decision theory.
Chapter 3: Quantitative Proof and Policy Considerations
Consider the following hypothetical from Nesson (1979, p. 1192-1193):
In an enclosed yard are twenty-five identically dressed prisoners and a prison
guard. The sole witness is too far away to distinguish individual features. He sees
the guard, recognizable by his uniform, trip and fall, apparently knocking himself
out. The prisoners huddle and argue. One breaks away from the others and goes to
a shed in the corner of the yard to hide. The other twenty-four set upon the fallen
guard and kill him. After the killing, the hidden prisoner emerges from the shed
and mixes with the other prisoners. When the authorities later enter the yard, they
find the dead guard and the twenty-five prisoners. Given these facts, twenty-four
of the twenty-five are guilty of murder.
There is a ninety-six percent chance that any randomly selected prisoner is guilty.
Suppose one of the prisoners is brought to trial for murder. Should a juror convict in the
absence of other evidence? The visceral answer is no.
Two reasons might explain this reaction. First, one might believe that a ninety-six
percent chance of guilt still leaves room for reasonable doubt; that is, reasonable doubt
requires the chances of guilt be closer to unity in order to convict. Second, and perhaps
more compellingly, there is something intuitively wrong with convicting on mathematical
chances alone. But what could this be? Convictions are regularly based on less certain
and less reliable evidence, such as shaky eyewitness identifications. Yet, a conviction
based on mathematical evidence alone, which is arguably almost always more precise,
offends the sensibility of fairness.
This topic has been the source of a heated debate that has spanned four decades
(Tillers & Green, 1988; Fienberg & Schervish, 1986; Callen, 1992; Allen, 1997;
Friedman, 1997), and does not appear to be slowing (Park et al., 2010). The debate
fundamentally concerns a tradeoff between the precision of mathematical evidence and
the sense of unfairness that it provokes. Some argue that the sense of fairness in the trial
process greatly outweighs any increase in precision, and that there is thus a deep
incompatibility between the law and the decision theoretic conception of a rational juror.
In short, the trial process is concerned with means not just ends.
This distinction is exemplified by the pursuit of legal rather than factual guilt
(Lawson, 1992; Hoffman, 2007). Jurors are not simply asked to determine whether the
defendant “did the alleged act” (factual guilt); they are asked to determine whether the
admissible evidence proves the defendant’s guilt beyond a reasonable doubt (legal guilt)
(Dershowitz, 1996). Factual guilt, even if it could be known, is not a judicial aspiration
because the possible means of reaching this determination, such as torture and coercion,
do not justify the ends. It is commonly acknowledged that the use of decision theory can
increase factually correct verdicts (Koehler & Shaviro, 1990). But it is also said that
decision theory cannot increase legally correct verdicts because the method is considered
an impermissible means.
One question is whether the use of decision theory offends the underlying
jurisprudential values in the same way as, say, torture or coercion. There is also a related
policy question of whether any such offensiveness does in fact sufficiently outweigh the
increase in precision, such that decision theory should not be used in the trial process.
The answer to these questions crucially depends on the capacity in which decision theory
is applied in the trial process. At the extreme, the model would eschew jurors altogether
and verdicts would be determined pursuant to raw calculations from decision theory. This
possibility is referred to as “trial by mathematics.” A more modest application is the use
of the model for pedagogical purposes. This possibility is referred to as “heuristic use.”
Although each capacity invokes a different set of considerations, many are hostile to the
application of the model in either.
Trial by Mathematics
Eschewing the judgment of jurors altogether came to the forefront in an infamous
case People v. Collins (1968). In that case, a mathematics instructor testified that, based
on six items of evidence, there was one chance in twelve million that the defendant was
innocent, according to the product rule of probability theory. Tribe (1971a) eloquently
took this estimate to task for a variety of reasons: the individual probability estimates
lacked any factual basis; the product rule required an assumption of independence that
was not met; and the expert committed the fallacy of transposition by equating the
probability of observing the evidence in the population with the probability that the
defendant was innocent. Tribe (1971a) went on to argue that, even if correct, probabilistic
proof would be a deficient basis for verdicts because RD reflects a subjective belief, not
mathematical odds. Moreover, he argued that subjective probabilities, although more apt,
inherently fail to account for essential aspects of criminal verdicts, such as intent or
motive. These issues aside, Tribe (1971a) did explicitly argue that the benefits of
quantification are vastly outweighed by the costs to the institution of justice, a decidedly
political argument. His specific arguments follow.
The Quantification of Sacrifice. Tribe (1971a) argued that a conviction based on a
probability less than unity openly acknowledges a margin of error and thus “puts an
explicit price on an innocent man’s liberty (p. 387).” For instance, if RD means that
jurors should convict when they are at least 0.95 certain of the defendant's guilt, then as
many as one convicted defendant in twenty could be expected to be innocent. For a number of reasons,
Tribe (1971a) argued that the costs of “explicitly spelling that out…would be too high (p.
1375).” First, openly acknowledging that innocents are convicted could lead to greater
social callousness and more injustice overall. Second, verdicts are more likely to be
accepted when they are declared with certainty. Finally, deterrence is likely to be
undermined when it is acknowledged that false convictions are inevitable. According to
Tribe (1971a), it is the overtness, not the actual risk, of erroneous convictions that subverts
the prestige of the legal system; an institution that openly acknowledges error cannot
legitimately be the final arbiter (cf. Shaviro, 1989).
Individualized Justice. Another concern with probabilistic proof is that it runs
counter to the notion of “individualized justice.” According to this view, judicial
decisions ought to be made on the basis of the individual’s conduct, not whether the
individual is a member of some statistical class with a given predilection (Brilmayer &
Kornhauser, 1978). This argument is especially fervent when the statistical class is based
on factors that are beyond the individual’s control, such as whether the person is African
American or has black hair, two of the factors that were used in the Collins case. The
putative incongruence between individualized judgment and probabilistic proof stems
from the fact that statistical inference assumes equality amongst the class members (i.e.,
independent and identically distributed). This conceptualizes defendants as equal members of
a class, rather than as individuals who have certain, unique characteristics (Grove &
Meehl, 1996; cf. Underwood, 1979). As stated by Judge David Bazelon (1978, p. 58),
“the greatest inequality is equal treatment of unequals—and people are unequal.”
Gambling with Guilt or Innocence. Tribe (1971b) argued that there is a
“qualitative” difference between a conviction that results when a juror is “fully
convinced” of guilt and a conviction that results when a juror has “reason to believe the
[defendant] may be innocent (p. 386).” Jurors should not gamble with the defendant’s
liberty; they must have an “actual belief” or an “emotional stake in the decision.” This
requires jurors to be certain of the defendant’s guilt, because, again, any probability less
than unity indicates some possibility of innocence and is thus a gamble (Cohen, 1977). Of
course, this view is naïve, since metaphysical certainty is an illusion. However, Tribe
(1971b) argued that it is society, not the jurors, that should take the gamble.
Jurors should be certain in their verdicts, even though society is aware that such verdicts
are not infallible. Because jurors should be prohibited from gambling, a verdict cannot be
legitimately based on a probability less than unity.
Heuristic Use
The original article to which Tribe so vigorously objected proposed a more
modest application of the model to fact-finding. In that article, Finkelstein and Fairley
(1970), a lawyer-statistician team, suggested that an expert should be permitted to instruct
jurors how to update their beliefs in light of statistical evidence, such as the rarity of a
palm print. This could be accomplished by using a chart to show jurors how to assimilate
evidence that one in one thousand people possess a palm print identical to the defendant’s
(Finkelstein, 1971). Some have argued that even this modest approach subverts
fundamental values of the justice system, and, moreover, that it creates logistical
problems that greatly outweigh any benefits (Tribe, 1971a,b; Nesson, 1985; Brilmayer
& Kornhauser, 1978).
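To see what such a chart would convey, consider a rough sketch in Python (the one-in-one-thousand match frequency comes from the example above; the priors and the assumption that a true source always matches are illustrative simplifications, not figures from the cited work):

    # Posterior probability of guilt implied by trace evidence, across a range of
    # hypothetical prior probabilities. The likelihood ratio assumes the match is
    # certain if the defendant is the source and occurs in 1 of 1,000 people otherwise.
    match_frequency = 1 / 1000
    likelihood_ratio = 1.0 / match_frequency  # = 1000

    for prior in (0.001, 0.01, 0.1, 0.25, 0.5):
        prior_odds = prior / (1 - prior)
        posterior_odds = prior_odds * likelihood_ratio
        posterior = posterior_odds / (1 + posterior_odds)
        print(f"prior {prior:.3f} -> posterior {posterior:.3f}")
    # prior 0.001 -> 0.500; prior 0.010 -> 0.910; prior 0.100 -> 0.991; ...

The posterior plainly depends on the prior: a one-in-one-thousand match frequency does not, by itself, translate into a one-in-one-thousand probability of innocence, which is the transposition fallacy noted in connection with Collins.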
Logistical Problems and the Presumption of Innocence. Tribe (1971a) expressed
concern about the logistics of the chart approach. In order for jurors to understand how to
update their prior probability, they must first introspect to determine what this value is. Tribe (1971a)
opined that attempting to acknowledge a possibility of guilt before the conclusion of trial
signified the demise of the presumption of innocence: “Jurors cannot at the same time
estimate probable guilt and suspend judgment until they have heard all the defendant has
to say (p. 1371).” Additionally, he argued, introspection will cause jurors to realize that
the defendant is not a randomly selected person but one whom the authorities strongly
suspected after extensive investigation. This realization is said to conflict with the
“complex symbolic functions of trial procedure and its associated rhetoric (Tribe, 1971,
p. 1371).”
Dwarfing Soft Evidence. In addition to the logistical problems, Tribe (1971a)
speculated that quantified evidence would "dwarf" soft evidence, on the assumption that
"if you can't count it, it doesn't exist (p. 1361)." "Readily quantifiable factors are easier to
process—and hence more likely to be recognized and then reflected in the outcome—
than are factors that resist ready quantification (p. 1362).” Moreover, the heuristic use
would either require jurors to quantify “soft” evidence, a task that increases the
susceptibility of error, or somehow intelligently combine quantified evidence with “fuzzy
imponderables.” It is noteworthy that Tribe’s behavioral speculation that jurors would be
over-persuaded by quantified evidence has not been supported by subsequent empirical
evidence. Jurors, as an empirical matter, tend to undervalue quantified evidence (Faigman
& Baglioni, 1988; Schklar & Diamond, 1999).
Conclusion
The unwavering rejection of trial by mathematics is firmly rooted in policy
considerations. The trial ritual serves a symbolic purpose designed to promote a sense of
fairness and legitimacy in the institution, even if such policies come at the expense of factually
correct verdicts. Again, the proper criterion is legally correct, not factually correct
verdicts, and, in this regard, trial by mathematics is metaphorically more akin to torture
and coercion in being an impermissible means to an end. Of course, one might question
whether the perception of fairness truly outweighs factual fairness (see Shaviro, 1989).
The ritual might increase the perception of verdict accuracy even if it actually reduces
verdict accuracy (Koehler & Shaviro, 1990).
The issue is more equivocal with respect to the heuristic use of the normative
model. It is not clear that there are major threats to the perception of fairness when used
in this capacity. The arguments against the heuristic use are either speculative or rely on
empirical assumptions that have since been discredited. Koehler (1991) notes that many
commentators conflate policy with probity (two conceptually separable issues) when
arguing against the heuristic use. Policy arguments, as discussed, relate to the pursuit of
legal rather than factual guilt out of concerns about the means. Probity arguments concern
the probative merit or diagnosticity of the evidence itself. As Koehler (1991) notes,
“[some] attack on policy grounds, then use these arguments to support a conclusion that
[the] evidence is diagnostically inferior [or vice versa] (p. 147).” Similar bootstrapping
seems apparent in arguments against the heuristic use.
Aside from the capacity in which to implement the model, there seems to be little
debate about the model’s theoretical validity. A notable exception is the claim made by
Cohen (1977) that the tenets of decision theory are logically flawed. Cohen proposes his
own conception of rationality, in which probabilities (what he calls “inductive
probabilities”) behave in a manner very different from Bayesian probability theory.
Cohen claims that inductive probabilities are more germane to legal fact-finding (but see
Kaye, 1979). Although interesting, this system of thought has some highly undesirable
consequences (see Schum, 1979; Fienberg, 1986).
Debating the merits of Cohen’s conception of rationality is far beyond the scope
of this dissertation. As stated in the last chapter, this dissertation assumes an economic
conception of rationality. If one accepts the assumptions laid out in the last chapter, one
should have no qualms with a decision theoretic account of a rational juror or the use of
the model to analyze juridical topics. Indeed, under the “new evidence scholarship”
(Lempert, 1986), law students are taught to analyze legal concepts in Bayesian terms
(e.g., Lempert, 1977; Kornstein, 1976). Even if one rejects its pragmatic value, there is no
incompatibility in using the model to describe how jurors ought to make decisions under
the RD rule.
Chapter 4: Threshold Variability: Virtue or Vice?
For me the reasonableness of the doubt required to acquit should depend on the
seriousness of the crime and the severity of the punishment. No doubt is
reasonable if the punishment is death. Very little doubt should be deemed
reasonable if the punishment is imprisonment. But if the punishment is merely a
fine or a suspended sentence, the required degree of doubt might be greater.
—Alan Dershowitz
The Economic Rationality of Threshold Variability
Discussions of Blackstone-type ratios imply that, to the extent there is social
agreement on a particular ratio, a singular threshold value corresponds to RD. Indeed,
Winship held that the RD rule applies to all criminal cases, and it did not state that the
standard applies with lesser force depending on the ‘type’ of criminal allegation (Cohen,
1995). Copious empirical evidence, however, suggests that there is not a universal
decision threshold, and that the RD standard varies based on general-case characteristics,
such as the severity of the crime or the seriousness of the potential punishment. This
phenomenon was first detected by Simon (1969) who asked a sample of judges, “What
would the probability that the defendant committed the act have to be before you declared
him guilty (p. 110)?” Simon (1969) provided a list of crimes, including murder, rape, and
embezzlement, and found systematic variation: the more serious the crime, the higher the
required probability to convict (see also Simon & Mahan, 1971). McCauliff (1982)
reported similar findings from a different sample of federal judges.
Kerr (1978) demonstrated this effect experimentally by manipulating the type of
crime (e.g., first degree murder or second degree murder) as well as the severity of the
punishment (i.e., 1-5 years in prison; 25 years to life; or capital punishment). Participants
were less likely to convict as the severity of the punishment increased, and less likely to
convict when the crime was first-degree murder as compared to second-degree murder.
To date, several studies have replicated these findings (e.g., Hester & Smith, 1975;
Kaplan & Krupa, 1986), even using more sophisticated methodology (Martin & Schum,
1987), and have detected other factors that influence the conviction rate, such as the type
of the crime (Nagel, Lamb, & Neef, 1979; Vidmar, 1997) and its heinousness (Bright &
Williams, 2001).
This type of variability (hereinafter “threshold variability”) has a rational
economic explanation. According to decision theory, rational decisions are a function of
their potential consequences. And the potential consequences do vary between criminal
cases, as the legendary Judge Weinstein (Weinstein & Dewsbury, 2006, pp. 168-69)
rhetorically asks:
Should society be willing to risk 10 guilty defendants go free rather than one
innocent person be convicted? Or is the proper ratio 100 to one? Should we be
willing to accept lower risks in a spitting on the sidewalk case than in a capital
homicide case?
Of course, as the consequences become more severe, the ratio's preference for false negatives over false
positives grows, and the level of certainty required for conviction correspondingly rises. For
instance, the consequences of an erroneous conviction for embezzlement are
unarguably less significant than the consequences of an erroneous conviction for capital
murder—plainly, "death is different" (Sand & Rose, 2003, p. 1361)—and one
should accordingly require a greater degree of certainty for conviction in the capital case.
The Legal Propriety of Threshold Variability
From a legal perspective, the appropriateness of threshold variability is unclear.
The legal process has instituted several bulwarks designed to “control[] the jury’s
rationality (Kaplan, 1968, p. 1076).” While rational decisions are a function of their
potential consequences, the law is sometimes crafted specifically to keep jurors
uninformed of the potential consequences of their decisions. The bifurcation of the guilt
and penalty phases is one example (Horowitz & Seguin, 1986). Most jury instructions
explicitly prohibit jurors from considering the punishment associated with a verdict. The
jury instructions endorsed by the Eighth Circuit are representative: “You may not
consider punishment in any way in deciding whether the Government has proved its case
beyond a reasonable doubt.” Additionally, certain inferences, such as those stemming
from prior convictions, are perhaps rationally related to utilities, but are legally
inadmissible and routinely withheld from the jury. The withholding of certain decision-
relevant information suggests that the law may not approve of a variable
threshold for reasonable doubt.
Stoffelmayr and Diamond (2000) argue that there is virtue in a "flexible"
standard because it has the potential to improve decision quality, which they define as
consistency across decisions within a case, though not necessarily across cases. They note, "the American criminal jury
usually completes its work with a verdict on guilt. Jurors do, however, speculate on
sentencing, as they speculate on many consequences of their verdicts. The legal system
operates as if ignoring those speculations or simply admonishing the jury to refrain from
considering them will result in a blindfolded jury uninfluenced by forbidden
considerations (p. 782).”
The empirical reality, Stoffelmayr and Diamond (2000) claim, is that jurors
bring different expectations and assumptions about the potential consequences of their
verdict (i.e., the severity of the punishment), and that these differences can lead to
undesirable variability in verdicts within a given case. However, to the extent jurors are
accurately aware of the potential consequences, they can tailor a “context-sensitive”
threshold, based on “a case-specific analysis of the disutilities associated with convicting
the innocent and acquitting the guilty (p. 783),” which presumably varies between cases,
not within cases. Stoffelmayr and Diamond (2000) advocate telling jurors about the
potential punishment precisely so that jurors can vary their threshold accordingly.
It is unclear how far Stoffelmayr and Diamond (2000) would be willing to take
this argument. For example, in addition to the potential punishment, ample empirical
evidence suggests that the character (Dane & Wrightsman, 1986), attractiveness (Kassin,
1983) and even race of the defendant (Sommers & Ellsworth, 2001; English & Sales,
2005) or the victim (Kerr, 1978a; Bernard, 1979; Fairchild & Cowan, 1997) influence
jurors’ propensity to convict (see generally Kassin & Wrightsman, 1985; Mazzella &
Feingold, 1994; Brewer & Williams, 2005). Certainly, these considerations are “context-
specific,” and they certainly would figure into the “case-specific analysis of the
disutilities,” but it does not follow that jurors should vary their threshold on the basis of
these considerations. The use of race in particular runs afoul of the deeply treasured
Equal Protection Clause. Stoffelmayr and Diamond's (2000) argument fails to indicate
which factors other than the potential punishment, if any, should be a legitimate basis for
threshold variability and why.
Though Stoffelmayr and Diamond's argument is a response to the empirical
reality that jurors do vary their threshold for different crimes, and, in this instance, it is in
accord with rational decision making, the question remains as to whether such variability
is legally appropriate. Empirical data, a theory of rational decision-making, or an
admixture of the two cannot answer this question. As Tribe (1971a,b) has so fervently
argued, the legal ritual is concerned with more than veridical verdicts; many legal
policies are designed to confer a sense of fairness rather than exact truth. An answer to
the question about the legal propriety of threshold variability must thus come from legal
doctrine and scrupulosity.
In this regard, Lillquist (2002) claims that threshold variability is both legally defensible
and socially approved. He offers three basic arguments to buttress this assertion. First, he
argues that society’s differential allocation of resources to solve certain crimes over
others expresses that “certain verdicts [are] more or less valuable in some cases rather
than in other cases (p. 160).” Second, he interprets the court’s refusal to provide any lucid
definition of RD as a tacit approval of threshold variability: “the existing reasonable
doubt standard, with its confusing language, is well equipped to allow [threshold
variability] to happen (p. 162).” Finally, and more generally, threshold variability is
legally appropriate because it “assures that jury decisions will generally mimic the
standard of proof that society would apply in a particular case (p. 194).”
Lillquist (2002) specifically argues that the “characteristics about the particular
crime and the characteristics about the alleged offender (p. 159)” are appropriate bases on
which to vary the legal standard of proof, and he expends a considerable amount of effort
arguing that the utilities are not uniform across cases and should therefore vary along
these dimensions. He even notes that the rise of certain substantive offenses, such as RICO
and the federal drug kingpin statute, which greatly increase the amount and type of
permissible evidence, signal a willingness to vary the standard in order to achieve a
particular verdict. Essentially, Lillquist (2002) believes that society is willing to attenuate
the requisite level of proof in order to achieve the convictions that it deems especially
important.
At some level, these arguments all inevitably raise the question: whose utilities
are appropriate to use for determining the RD standard of proof? Jurors can appraise the
utilities in the case before them, but they cannot consider the broader societal
implications of their decisions. Nor are they supposed to. Jurors, in the capacity of a
juror, are the “‘finders of fact,’ not arbiters of social utility (Saunders, 2005, p.
8)." Accordingly, some argue the proper utility ratio should be determined by the
legislature and other political bodies, not by judges or jurors (Kaye, 2002; Redmayne,
1999; Saunders, 2005). Indeed, Tribe (1971) argued against the use of decision theory in
the trial process because he claimed the underlying utilities ought to reflect the
preferences of a much broader system, not those of any individual juror.
If the responsibility for determining the underlying utilities belongs to the
legislature and other representative bodies, the legal appropriateness of threshold
variability seems dubious. No such institution has endorsed or approved threshold
variability (Risinger, 1998). If the responsibility falls to jurors, the appropriateness of
threshold variability is unclear. For an individual juror, hearing one particular case,
threshold variability is rational and it does, as an empirical matter, occur. But even then,
it seems grounded in considerations that conflict with certain fundamental legal policies,
such as speculation about the potential punishment associated with a conviction. And this
certainly neglects the broader social implications of the particular verdict. At the present,
the legal status of threshold variability is equivocal.
Chapter 5: Recasting Reasonable Doubt
The previous chapter described the way in which the threshold for reasonable
doubt does vary based on case characteristics, setting aside the legal question of whether
the threshold should vary between cases. This chapter theorizes about an additional way
in which the threshold varies: as a dynamic function of proffered evidence. This source of
variability is qualitatively different than the previously described variability, and, in order
to avoid any confusion, will be referred to as “threshold shifting.” The central feature of
threshold shifting can be illustrated by the following thought experiment:
Imagine you live several blocks away from your place of employment, and you
regularly walk to work. It is now winter and inclement weather is common.
Because your office space is small, the company you work for has a policy
governing when employees may bring an umbrella to work. The policy states that
umbrellas may be brought only when there is a “significant likelihood” of
precipitation. The policy does not define “significant likelihood” but instead
leaves it for the employees to determine.
What does the significant likelihood criterion mean to you? In other words, at
what likelihood of rain will you carry your umbrella to work? I suspect the sincere
answer is "it depends." What the criterion depends on is precisely what
differentiates threshold shifting from other types of threshold variability.
Decision theory posits that a rational decision depends on the utilities and
disutilities of the potential consequences. Hence, it is rational to vary the threshold² as the
consequences change, a phenomenon previously referred to as "threshold variability."
For example, the threshold corresponding to the “significant likelihood” criterion should
differ if a forecast called for rain and hail, because the consequences of rain and hail are
much different than simply rain. In contrast, threshold shifting implies that determining
what likelihood is “significant” depends on the likelihood estimate itself. Suppose a
perfectly calibrated weather forecast calls for a 60% chance of rain today and that no
other form of precipitation is possible. I believe the decision to carry the umbrella entails
evaluating whether 60% is “significant,” or significant enough to justify carrying the
umbrella. In short, the likelihood estimate is used as the reference point for determining
the threshold. Utilities and disutilities are exogenous to the decision to carry the umbrella.
Threshold shifting, in which the likelihood estimate is a determinant of the
criterion, is incongruent with the economic conception of rationality presupposed by
decision theory. It is axiomatic in decision theory that the threshold and likelihood
estimate are independent (Tversky, 1967). The threshold should not vary as a function of
the likelihood because the consequences of the decision are the same regardless of the
likelihood of their occurrence. For instance, the consequences of getting rained on are the
same whether there is a 1% chance or a 99% chance of rain—if it rains, you get wet all the same.³
² It is suitable to think of a threshold as reflecting an implicit subjective value that is used for making
binary decisions. Any description of this value in probabilistic terms is not meant to suggest that people
actually calculate and compare probabilities when making decisions. In fact, I generally do not believe that
people follow such a procedure. At the same time, people do use some type of threshold or cut-point to
make binary decisions, and I believe this point can (and must) be described in numeric terms.
Yet, despite the fact that the actual consequences are the same, it might seem
worse psychologically to get rained on if there was a 1% chance of rain. This implies that
the likelihood estimate does change the perceived consequences of the decision (i.e., the
underlying utilities). Such a change is not normatively defensible.
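The contrast between the two accounts can be sketched schematically (this is my own illustrative formalization of the idea, not an established model; the shift parameter and the numbers are arbitrary placeholders):

    # A decision-theoretic rule fixes the threshold from the (dis)utilities alone;
    # a "threshold shifting" rule lets the operative threshold move away from the
    # likelihood estimate, widening the apparent gap between evidence and criterion.
    def fixed_threshold_decision(p_rain, threshold=0.5):
        """Normative rule: act whenever the likelihood exceeds a threshold
        derived from the consequences, independent of the likelihood itself."""
        return p_rain >= threshold

    def shifted_threshold_decision(p_rain, base_threshold=0.5, shift=0.3):
        """Descriptive sketch: the threshold is displaced away from the estimate,
        so the eventual choice feels less ambiguous than it should."""
        threshold = base_threshold - shift * (p_rain - base_threshold)
        return p_rain >= threshold, round(threshold, 3)

    for p in (0.4, 0.6, 0.9):
        print(p, fixed_threshold_decision(p), shifted_threshold_decision(p))
    # The verdicts may coincide, but the shifted criterion manufactures a wider
    # margin between the estimate and the threshold in every case.

The point of the sketch is only that, under threshold shifting, the criterion is no longer exogenous to the likelihood estimate, which is exactly what decision theory forbids.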
There is a body of research related to this phenomenon known generally as Biased
Predecision Processing. This research finds that decision makers actively distort their
evaluation of information in order to attain cognitive consistency—a state in which
attitudes, beliefs and cognitions are congruous (Moskowitz, 2005). Along the same lines,
I want to propose that decision makers shift their threshold in order to enhance cognitive
consistency, and that anchoring the threshold on the likelihood estimate is congenial to
this basic psychological phenomenon. Roughly, the idea is that, since there is no a priori
commitment to any particular threshold, decision makers will unknowingly raise or lower
their implicit threshold in order to increase the disparity between the relevant information
and the decision threshold. This process helps to resolve complexity and ambiguity
perceived by the decision maker, and thereby promotes cognitive consistency.
Before this process can be meaningfully translated to the judicial context, an
interlude into the body of research on Biased Predecision Processing is necessary, as this
literature provides the theoretic rationale for threshold distortion.
³ It is important not to read this argument as saying the 'expected utility' does not change as a function of
the likelihood of the occurrence. The expected utility, defined as the utility weighted by the likelihood of
occurrence, does change as a function of the likelihood by implication. What is suggested here is that the
utility (not the expected utility) of the outcomes actually changes as a function of the likelihood.
Biased Predecision Processing
Biased predecision processing (BPP) occurs when a decision maker restructures
the mental representation of the decision task to favor one alternative prior to making a
decision (Brownstein, 2003; Kunda, 1990). A mental representation refers to the
variables, attributes and alternatives of the decision, and how these components are
represented in the decision maker’s mind (Kellogg, 2003). BPP posits that decision
makers consistently and systematically distort these components in order to garner
support for the preferred outcome (Simon & Holyoak, 2002). The result of such
distortion is a state in which one alternative is strongly preferred while the competing
alternative is denigrated.
It is important to note that research on BPP is relatively new, with the bulk of it
occurring only in the past four decades (Read & Simon, in press). This is largely the
result of Festinger (1957; 1964) who explicitly argued that BPP would not occur because
it required cognitive dissonance. Cognitive dissonance, he argued, occurred exclusively
following decisions; hence there could be no bias prior to the decision. However, the
empirical evidence supporting the phenomenon of BPP is now voluminous, and, after
providing an exhaustive review, Brownstein (2003) urged that “it is time to [end] the
debate over whether biased predecision processing occurs…[b]iased processing can
occur within the predecision period (p. 566).” Accordingly, it will be assumed that BPP
does in fact exist.
There are two cognitive mechanisms that lead to BPP: biased information search;
and biased evaluation of alternatives (Brownstein, 2003). Biased information search
refers to the tendency to seek evidence that is consistent with a preferred hypothesis and
overlook inconsistent evidence (Baron, 2000). This ubiquitous phenomenon is known
generally as the confirmation bias, and it has been apparent for centuries. Nickerson
(1998) notes that torture was widely used and considered legally appropriate in witchcraft
trials during the 15th, 16th, and 17th centuries in order to confirm allegations that the
accused engaged in sorcery. In more contemporary settings, the phenomenon occurs in politics, for
instance when rationalizing policy, in medicine, for instance when making diagnoses and
prescribing treatment, and even in judicial reasoning (Nickerson, 1998). A study by
Devine and Ostrom (1985) found that after forming an opinion about the verdict, jurors
attended to evidence that supported their preferred verdict and disregarded evidence that
failed to support it (see also Pennington & Hastie, 1993).
The second mechanism of BPP is biased evaluation of alternatives. This aspect is
the primary focus for this dissertation. Theories describing the psychological processes of
biased evaluation are numerous (for a review see Brownstein, 2003). Although only three
will be described here, all BPP theories postulate the same underlying goal: to reach a
state of cognitive consistency (Bond, Carlson, Meloy, Russo, & Tanner, 2007; Simon &
Holyoak, 2002; Moskowitz, 2005). Consistency theories posit that humans are
fundamentally driven to resolve discrepant cognitions, and will engage in elaborate
“mental gymnastics” to do so (Simon, Krawczyk, & Holyoak, 2004). Cognitive
consistency not only resolves psychological discomfort, it can also serve an adaptive
function by providing economic structure to the world, which some have argued is
essential for survival and other behavioral purposes (Back, 1968). The following three
theories describe the processes people use to attain cognitive consistency, as well as the
methodologies researchers use to test such theories.
Information Distortion
Information distortion can be defined as “the biased interpretation and evaluation
of new information to support whichever alternative is currently leading during a decision
process (Carlson, & Russo, 2001, p. 91).” The methodology used to test information
distortion is known as “stepwise evolution of preference” (Russo, Carlson, Meloy, &
Yong, 2008). This approach tracks the development of a preference between two
alternatives and examines what effect this preference has on the evaluation of subsequent
information. Briefly, participants are incrementally presented with information pertaining
to a decision task. After each piece of information is presented, participants are asked to
indicate which alternative is currently favored as well as the diagnostic value of the piece
of information. The reported diagnosticity values are then compared to an unbiased
evaluation of the same information, which is determined by a pretest or a control group.
Information is considered biased to the extent it is ascribed a different diagnostic value
than the unbiased value.
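As a minimal sketch of how such a distortion score might be computed (the scoring convention and the numbers are illustrative assumptions, not the exact procedure used in the cited studies):

    # Distortion of one piece of information: the signed difference between a
    # participant's diagnosticity rating and an unbiased baseline (e.g., the
    # control-group or pretest mean), credited as distortion when it favors the
    # alternative the participant was leaning toward at the time.
    def distortion(rating, baseline, leaning):
        """rating, baseline: diagnosticity on a scale where positive values favor
        alternative A and negative values favor alternative B.
        leaning: +1 if the participant currently favors A, -1 if B."""
        return leaning * (rating - baseline)

    # Hypothetical ratings: a juror leaning toward liability (+1) rates a
    # pro-liability item more favorably than the unbiased baseline.
    print(distortion(rating=2.0, baseline=0.5, leaning=+1))   # 1.5 (pro-leader distortion)
    print(distortion(rating=-1.0, baseline=0.5, leaning=+1))  # -1.5 (anti-leader)

Averaging such scores across items and participants is one way the magnitude of distortion reported in this literature can be summarized.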
Research consistently finds that information is evaluated in a biased manner when
there is a preferred alternative (e.g., Carlson, Meloy, & Russo, 2006; Russo, Medvec, &
Meloy, 1996). There is a positive correlation between the strength of a preference and the
degree to which information is biased. As one alternative becomes strongly preferred,
information supporting that alternative is accorded significantly more weight, while
information supporting the opposite alternative is denigrated. In this way, information is
actually being distorted in order to comport with the preferred alternative. The result is a
state in which the information strongly favors the preferred alterative and provides little
to no support for the competing alternative (Russo, Carlson, Meloy, & Yong, 2008).
Information distortion has been demonstrated in numerous domains, including
consumer decisions (Bond et al., 2007; Carlson, Meloy, & Russo, 2006; Russo, Meloy, &
Wilkins, 2000), medical decisions (Levy & Hershey, 2008), risky decisions (Dekay,
Patino-escheverri, & Fishbeck, 2009) incentivized decisions (Meloy, Russo, Miller,
2006), and by a variety of populations, including students (Dekay, Patino-Escheverri, &
Fischbeck, 2009), professionals (Russo et al., 2000), entrepreneurs (Boyle, Hanlon, &
Russo, 2011), sales representatives and public auditors (Wilks, 2002), and even
prospective jurors making legal decisions (Carlson & Russo, 2001).
In one study, Carlson & Russo (2001) presented participants, who were
prospective jurors waiting to be called for jury duty, with a fictional civil case that
contained six pieces of evidence. After hearing each piece of evidence, participants
indicated which side was currently favored to win, how confident they were in that
decision, and they rated the extent to which the given piece of evidence favored one side
over the other. The unbiased value was defined as the mean of this rating (bias is thus
indicated by any non-zero deviation from the mean).
The study replicated the Information Distortion Effect. Evidence that was
consistent with the emerging preferred alternative was deemed highly diagnostic, while
inconsistent evidence was deemed non-diagnostic. For instance, participants leaning
towards imposing liability rated the testimony of the victim’s grandmother to be highly
probative, while participants leaning toward not imposing liability deemed the testimony
to be non-probative. As a result, Carlson & Russo (2001) were able to predict the
direction of the distortion simply based on knowing which alternative was preferred.
Additionally, the confidence ratings were positively related to the magnitude of distortion
such that high confidence coincided with large levels of distortion. In comparison to an
undergraduate sample, which completed the exact same study, Carlson & Russo (2001)
note, “[the jury sample] showed twice as much distortion on average, and more
confidence in their tentatively leading verdicts (p. 99).”
Coherence Based Reasoning
Coherence Based Reasoning (CBR) provides a nuanced explanation of the
mechanism underlying Information Distortion. CBR postulates that inferences flow
bidirectionally. That is, information moves in a forward direction to support a particular
alternative, and, at the same time, the supported alternative radiates backward to
influence the evaluation of the information in a (biased) way that is highly supportive
(Holyoak & Simon, 1999; Simon, Snow, & Read, 2004). In other words, information
influences preference for an alternative, but the preferred alternative also influences the
evaluation of information. As a result, information that was once deemed ambiguous and
non-diagnostic is transformed into evidence that provides definitive support for the
preferred alternative, thus enabling an easy, seemingly straightforward choice between
the two alternatives (Simon & Holyoak, 2002; Simon, Pham, Le, & Holyoak, 2001).
The methodology used to demonstrate bidirectional reasoning is different from
the stepwise evolution approach used by Information Distortion researchers. Particularly
different is the manner in which CBR researchers assess the biased value of information.
This process is described by way of example.
In order to establish the unbiased diagnosticity of information, participants first
made judgments about a series of isolated vignettes. For instance, one of the vignettes
read as follows:
Wendy works as a computer programmer. One evening, after most of the
employees had left, she was walking by the accounting department. She noticed a
man rushing into the office and leaving a bouquet of flowers on the desk of
Jessica Meyers. The next day Jessica was distraught because there was no note on
the flowers and she was eager to learn who had left them. Wendy told her it was
Dale Brown, a man who works on the ground floor. At Jessica’s behest, Wendy
went to Dale’s office to confirm it was he who left the flowers. Wendy said she
was completely certain that it was Dale. (Simon, Snow, & Read, 2004, p. 818)
Participants answered several questions about this scenario, which probed factual beliefs
(e.g., “Does Wendy’s identification make it likely that it was Dale who left the flowers?”)
and general beliefs (e.g., “In general, when people identify someone whom they’ve seen
once or twice before, identifications are pretty accurate?”).
After completing several distracter tasks, participants were presented with a legal
case. Unbeknownst to participants, the evidence in the case was remarkably similar to the
isolated vignettes. For example, one piece of evidence was described as follows:
On the night of the crime, a technician was called in to repair the photocopying
machine. On his way out of the office, the technician saw a person rushing out of
the bookkeeper’s office. The next day the police asked him to identify Jason
Wells as the person he saw leaving the bookkeeper’s office the night before. He
did and stated he was completely certain that Jason Wells was the person he saw
exiting the bookkeeper’s office the night before (Simon, Snow, & Read, 2004, p.
818-9).
In all, there were seven pieces of evidence; four pieces tended to incriminate the
defendant and the rest tended to exculpate the defendant. Participants then answered the
same questions probing factual and background beliefs about the evidence, provided a
verdict, and rated their confidence in that verdict.
When the evidence was presented in the isolated vignettes, participants were ambivalent
about it and deemed it non-diagnostic. However, when nested within the context of a
legal trial, the evidence was considered highly diagnostic, and its value crucially
depended on the participant’s ultimate verdict. Participants who voted to convict deemed
the incriminating evidence highly probative and the exonerating evidence non-probative.
The same pattern, but in the opposite direction, emerged for participants who voted to
acquit: they deemed the exonerating evidence highly probative and the incriminating evidence
non-probative. The evidence was transformed to be coherent by providing strong support
for the chosen verdict and weak support for the alternative (Simon, Snow, & Read, 2004;
Simon, Pham, Le, & Holyoak, 2001).
The high diagnosticity of the evidence allowed participants to report extremely
high levels of confidence. Regardless of whether they voted to convict or acquit,
participants were very confident that they had reached the appropriate decision. In
addition, the evidence affected the ratings of the non-case specific background beliefs.
For instance, participants who voted to convict not only believed that the technician’s
identification was highly probative, they also believed that eyewitness identifications are
generally reliable.
The results support the contention of CBR that people reason bidirectionally. The
evidence affected the verdict, but, at the same time, the preference for a particular verdict
affected the evaluation of the evidence. Evidence that was considered non-diagnostic in
isolation became skewed in order to align with the preferred verdict. It should be noted
that this effect was obtained even when verdict preference was randomly assigned
(Simon, Snow, & Read, 2004, study 4). Thus, even when participants did not select the
verdict, they still engaged in bidirectional reasoning, which skewed their evaluation of
the evidence to support the assigned verdict.
Differentiation and Consolidation Theory
According to Svenson’s (1992) Differentiation and Consolidation Theory, the
decision-making process involves actively “spreading apart” the various decision
alternatives. This spreading can take place both pre- and post-decision. Before the
decision, decision makers seek to increase or differentiate the perceived differences
between the alternatives. Differentiation is achieved by changing the structure of the
decision task (known as “structural differentiation”), which involves altering the
attractiveness and importance of attributes in order to enhance the desirability of the
preferred alternative.
Svenson claims that a decision criterion is used to evaluate whether and when the
competing alternatives are sufficiently differentiated. Alternatives are considered
sufficiently differentiated only when the amount of differentiation exceeds this criterion.
Once this occurs, decision makers are in a position to make and defend their decision,
and consolidation of the preferred alternative’s advantages over the rejected alternative
occurs. Svenson (1992) notes that if a sufficient level of differentiation cannot be
achieved, “the decision-maker may…change the criterion level (p. 155-6).” In other
words, decision-makers may adjust their required level of differentiation in order to
produce the desired outcome. While plausible, Svenson provided no empirical evidence
for this claim.
Phillips (2002) conducted the only empirical test of whether differentiation
involves both reevaluation of the information as well as a modification to the criterion. In
that study, participants were asked to assume the role of an accountant who was auditing
a company to determine whether it would remain financially viable or fail within the next
year. Accountants are required to report if there is a “significant doubt” about the
company’s financial prosperity. After reading about the company’s financial viability,
participants indicated what level of doubt corresponds to the “significant doubt” criterion
(i.e., how much doubt is “significant”). They also rated how likely the company was to
fail within the next year, and made a binary decision whether or not to report this to
management. The experimental manipulation was whether this decision was made before
or after the ratings.
For participants who made the ratings before the decision, there was no difference
in either the likelihood of failure rating or significant likelihood criterion rating between
participants who would report and those who would not. However, there was a difference
for participants who made the ratings after the decision. Those who voted to report
deemed the company very likely to fail, and those who voted not to report believed the
company’s failure was unlikely. Thus, participants aligned their interpretation of the
evidence (i.e., the likelihood of the company’s failure) to be consistent with their
decision.
With respect to the “significant doubt” criterion ratings (i.e., how much doubt is
“significant”), a similar but converse pattern of results was observed. Those who voted to
report indicated that the threshold was low (and hence the failure-likelihood rating
exceeded the threshold) and those who voted not to report indicated that the threshold
was high (and hence the failure-likelihood rating did not reach the threshold for
reporting). In other words, reporters lowered their criterion to justify reporting and non-
reporters raised their criterion to justify not reporting.
A second study was conducted to specifically test whether this distortion in the
criterion occurred before or after the decision. The primary difference from the previous
study was that participants made an interim judgment about their tentative decision to
report and the ratings. After some additional time, participants then made a final
judgment and provided the ratings once again.
At the interim, there were no differences in the criterion ratings between
participants who eventually decided to report and participants who decided not to report;
both groups gave the same rating for “significant doubt.” When the final decision was
made, however, participants who voted to report indicated a criterion value that was low,
while those who voted not to report indicated a value that was high. Because this
difference was observed after the final decision was made but not at the interim, Phillips
(2002) asserted that “[criterion] distortion occurs postdecisionally and information
distortion occurs predecisionally (p. 780),” and therefore that the two forms of distortion
do “not occur concurrently (p. 782).”
The claim that distortion of the criterion occurs post hoc should be accepted with
great caution. Phillips (2002) showed only that the “decision criterion definitions” (p.
775; emphasis added) change after a decision is made. This result is completely
unsurprising because it is natural to provide a criterion definition that comports with the
decision. However, this result says nothing about the actual criterion itself; that is, the
actual criterion used to make the decision. The pertinent question is not whether the
criterion definition changed after the decision, but whether the criterion itself changed to
facilitate the decision. The latter is what Svenson speculated about. Phillips’ findings
cannot speak to whether a change in the criterion itself influenced participants’ decisions,
nor can the results sustain the claim that both forms of distortion do not interact.
Reasonable Doubt as a Relative Construct
I can’t define it, but I know it when I see it.
—Justice Potter Stewart
The thought experiment about carrying the umbrella suggested that the primary
determinant of the “significant likelihood” criterion is the likelihood estimate itself. The
decision to carry the umbrella entailed evaluating whether the likelihood estimate is
significant, rather than whether the estimate exceeds an a priori threshold and is hence
determined “significant.” Translating this process to the judicial context, jurors use the
evidence as a focal point to determine whether reasonable doubt exists. Jurors do not
come to trial with any preconceived notion of what amount of doubt is reasonable.
Similar to Justice Stewart’s elucidation of obscenity, jurors do not define reasonable
doubt a priori; they know it when they see it. In other words, jurors listen to the evidence
and then determine whether the remaining doubt is reasonable or not.
Jurors are not, however, objective evaluators of evidence. As suggested by the
research on BPP, jurors consistently and systematically distort the value of evidence to
support their preferred verdict. Evidence that was once ambiguous will come to be seen
as either strongly probative or useless depending on how it relates to the preferred
verdict. Since the evidence is determinative of the threshold, jurors are likely to have a
skewed focal point for determining the amount of doubt that is reasonable. Hence, the
level of doubt deemed reasonable might be dramatically different from the amount that would be
determined a priori or in isolation.
In addition to a skewed focal point, it is possible that a dynamic similar to
information distortion is at play with respect to the threshold itself. In crude terms, no
juror would ever say (or believe), “well, the evidence just barely surpassed my threshold
for reasonable doubt, so I convicted.” BPP postulates that jurors expend mental effort in
order to not see cases as “close calls.” Close calls are the antithesis of cognitive
consistency because by definition they are indecisive. Thus, the evidence and the
threshold should not be close to one another—it should not be a close call. In the same
way jurors distort evidence to support a preferred verdict, jurors might also shift their
threshold in order to avoid close calls.
This hypothesis suggests that the threshold and evidence would move in opposite
directions, like the like poles of two magnets, constantly repelling one another. For
example, if a piece of evidence increased the likelihood of guilt, it would correspondingly
decrease the threshold for conviction. Conversely, if a piece of evidence decreased the
likelihood of guilt, it would correspondingly increase the threshold for conviction. This
process minimizes the possibility of close calls and it is consistent with the thought
experiment demonstrating that the evidence and threshold are interrelated.
Threshold shifting might also have a more elegant, albeit non-normative,
economic explanation. The shifting pattern could be the result of the evidence influencing
the utilities. Evidence is almost always value laden in the sense that it contains
information relevant to the utilities. For example, learning that the defendant operated a
motor vehicle without a driver's license might make it more likely that he committed the
alleged crime (because he has not completed the requisite driver’s training or his driving
privileges were suspended because of malfeasance, etc.), but it might also lower the
disutility of convicting him should he happen to be innocent. After all, he apparently does
violate the law.
There is some empirical evidence that decision makers distort utilities in a manner
that is consistent with information distortion. For example, Dekay, Patino-Echeverri, and
Fishbeck (2009a) presented participants with a scenario that required a decision in
response to a dam failure warning. Participants overwhelmingly preferred costly false
positives (e.g., evacuation) to true negatives. Through written explanations it was
revealed that this preference was based in large part on a desire to evacuate. The
preference for false positives decreased considerably when participants were not required
to make a decision. As noted by Dekay and colleagues (2009b), “false positives were
viewed as better than true negatives because only the former outcome could follow from
the preferred course of action (e.g., evacuation) (p. 80).” In other words, the preference
for false positives was only so that the decision would be to evacuate; when participants
were not asked to make a decision, there was no preference for false positives.
It should be noted, however, that Dekay and colleagues did not investigate
whether the decision threshold is changed as a result of utility distortion. They only found
that the reported utilities are distorted. Purported utilities might not perfectly resemble the
de facto decision utilities. Hence the findings of Dekay and colleagues cannot definitively
support or refute the threshold shifting hypothesis, but the findings are not inconsistent
with the hypothesis.
In sum, seen through the lens of threshold shifting, reasonable doubt is a relative
construct that is context dependent. Jurors know it when they see it, implying that
verdicts depend on the evidence, rather than directly depending on an exogenous tradeoff.
Moreover, the threshold for reasonable doubt constantly ebbs and flows as a function of
the evidence. This dynamic presumably occurs in order to promote cognitive consistency,
and it is qualitatively different than the variability that is currently the source of debate
amongst legal academicians. Importantly, threshold shifting is not compatible with the
decision theoretic account of the rational juror. In addition to recasting the concept of
reasonable doubt, threshold shifting also complicates legal doctrine that requires jurists to
speculate about the impact of evidence on verdicts. Evidence serves a dual function in
that it not only affects jurors’ beliefs, but it also affects their propensity to convict or
acquit. Failing to account for both functions could lead outside observers to misjudge the
decision making of jurors. At this point, of course, threshold shifting is only a theory.
Whether or not it occurs in actuality is an empirical question taken up in the next chapter.
Chapter 6: Empirical Studies of Threshold Shifting
Study 1: Conventional Evidence
Participants
Participants were recruited to partake in a Human Intelligence Task (HIT) through
Amazon Mechanical Turk (see Mason & Suri, 2010). Basically, Turk provides a platform
through which “requesters” can post HITs that require some type of human judgment that
“workers” can complete. HITs commonly include surveys, questionnaires, and market-
research questions about products and websites. Workers can sort HITs based on the
estimated completion time and the amount of compensation. Requesters can stipulate
required qualifications of workers (e.g., at least 18-years-old) for a particular HIT.
Workers select the HITs they desire to complete, then, pending verification of the work,
requesters credit the worker’s Amazon account. Although a bit lower in socio-economic
status, it has been shown that domestic workers are fairly representative of the national
population and behave in a manner consistent with other common subject pools, such as
community samples and university undergraduates (Paolacci, Chandler, & Ipeirotis,
2010).
To be eligible to participate in the present studies, participants must have met two
requirements: they must have been at least 18 years old; and they must have been jury
eligible within the United States (i.e., no felony convictions; registered voter). Although
it was not possible to verify that participants satisfied each requirement, certain
procedures were used to ensure that each participant was within the United States. First,
the information page of the HIT clearly stated this requirement, and indicated that non-
United States citizens would be denied compensation, regardless of whether they
successfully completed the task. Second, the IP address of each participant who
successfully completed the HIT was verified using the website whatsmyipaddress.com.
An IP address is a unique identifying number that can be used to track the general
(network) location of the computer. Less than 5% (n = 18) of the total participants from
all three studies possessed an IP address that was not within the United States; these
participants were excluded from all analyses reported herein and will not be discussed
further.
Further procedures were used to ensure the quality of the data. A common
concern with the online platform is the fear that participants rapidly “click through” the
study without reading the material. Two procedures were used to countervail this
possibility. First, each webpage had a built-in timer that would not allow participants to
advance the webpage until a certain amount of time had elapsed (e.g., 10 seconds).
Second, at two different points throughout the study participants were asked to select a
particular answer. For example, one question read, “To ensure that the survey is working
properly, please select ‘strongly agree.’” Participants who did not select the appropriate
response were immediately eliminated from the study and their responses were removed
from all analyses reported herein. These individuals constituted less than 5% of the
overall total sample.
In addition to these procedures, a reading comprehension question appeared at the
end of the study to ensure that participants paid attention to the materials. This question
probed a basic fact from the vignette. Consistent with current practice (see Oppenheimer,
Meyvis, & Davidenko, 2009), the responses of participants who failed the reading
comprehension question (about 17% of the overall total) were excluded from all analyses
reported herein. Furthermore, no individual with the same IP address or Amazon worker
account was able to participate in more than one study, regardless of whether they were
previously removed for failing a reading comprehension question.
Materials and Procedure
The vignette used in this study portrayed a felony criminal trial in which the
defendant was accused of forcible rape. Participants read a summary of the case that was
prepared by a court reporter, and were told that the summary was objective and contained
all of the evidence in the case. The case itself was adapted from an actual rape trial that
took place in California. Briefly, the case involved the abduction and rape of a 16-year-
old girl from a public area in Los Angeles, California. According to the victim, the
perpetrator drove a Ford pickup truck, was in his late 20s, had a thick mustache, and
spoke a mixture of English and Spanish. Detectives canvassed businesses in the vicinity
of where the abduction took place and interviewed all employees who met the broad
description of the perpetrator. One employee seemed visibly shaken during the interview,
whereupon suspicion focused exclusively on him. (This is considered evidence # 1.)
At trial, the prosecutor presented four pieces of incriminating (hereinafter
“inculpatory”) evidence: (2.) The defendant’s boss testified that on the night the rape took
place, the defendant did not show up for work. This was corroborated by a time-punch
card that employees use to clock into work. (3.) Several of the defendant’s co-workers
testified that he regularly wore a mustache. (4.) A neighbor testified that he recalled
seeing the defendant driving a Ford truck for a short period of time, which was odd since
he knew that the defendant did not possess a driver's license. (5.) The victim testified that
she was “absolutely, 100% sure that it was the defendant who raped [her],” though she
admitted that she had initially identified someone else.
The defense introduced three pieces of exonerating (hereinafter “exculpatory”)
evidence: (6.) The defendant’s sister testified that on the night the rape took place, the
defendant was suffering from food poisoning and was at her house. She further claimed
the defendant never left her house that entire week. (7.) An expert on eyewitness
identification testified that nervousness and anxiety as well as cross-racial identification
both increase the likelihood of a mistaken identification. He noted that both factors were
present in the current case, and opined that the victim could have mistakenly identified
the defendant as the perpetrator. (8.) The defendant’s brother-in-law testified that he
occasionally let the defendant borrow his Ford pickup truck. The brother-in-law
speculated that the neighbor had probably seen the defendant driving his truck. Appendix
[A] contains the actual materials presented to participants.
After receiving each piece of evidence participants were asked the following three
questions: 1. “At this point, based on all the evidence you have heard, what is the
numerical likelihood that the defendant committed the rape in question?” which was rated
on a 0-100 point scale with higher values indicating a higher likelihood of having
committed the alleged rape. 2. “At this point, based on all the evidence you have heard,
would you convict the defendant of rape?” which was a binary yes/no decision. And 3.
“How confident are you in this decision?” which was rated on a 1-7 point likert scale,
with higher values indicating higher confidence. These questions were posed on the
same webpage as the piece of evidence and always in this specific order. After the last
piece of evidence was presented, the questions were preceded with “you have now heard
all of the evidence in the case.” Study 1 elicited a total of eight sets of judgments (one
after suspicion focused on the defendant and one after each of the seven pieces of
evidence).
Analytic Methods
Almost all of the research in this domain has utilized some type of direct
elicitation method, such as asking, “what numeric level of certainty does reasonable
doubt require?” The obvious problem with this approach is the potential discordance
between the reported value and the de facto decision threshold. To overcome this
limitation, an indirect method was utilized to infer the de facto value corresponding to the
decision threshold. This approach does not require any introspection on the part of the
participant and is therefore a potentially more accurate reflection of the threshold actually
used.
The approach uses logistic regression to assess how willing a participant is to
convict based on her subjective feeling about the evidence. There are three components
associated with this method: the binary verdict (acquit/convict); a subjective feeling
about the evidence (likelihood of guilt rating); and the willingness to convict. The first
two components are directly elicited, while the third (willingness to convict) is not. The
regression is used to infer the willingness to convict given a particular subjective feeling
about the evidence. In other words, given a participant’s subjective feeling about the
evidence, how likely is she to convict?
Willingness to convict is of course a continuous variable. However, the threshold
used to make a binary decision is inherently a single point estimate on this continuum. As
defined here, the implicit decision threshold is the point where a participant is more
willing than not to convict. Hence, the relevant question for our purposes is not how
willing she is to convict given her subjective feeling about the evidence, but rather, at
what subjective feeling level is a conviction more likely than not? This process is now
formally described.
The implicit threshold can be described as a threshold (s_t) which, if exceeded by the
subjective feeling of guilt (s), will result in a conviction (c). A conviction implies
that s > s_t, and that s_t is sufficiently large such that a conviction c indicates that s leaves
no reasonable doubt in the juror’s mind. The willingness to convict is the conditional
probability of a vote to convict (c = 1) given the subjective feeling of guilt (s), or p(c = 1 |
s). This conditional probability states the likelihood that a juror would convict conditional
on her subjective feeling of guilt.
This conditional probability can be estimated from a logistic regression. Logistic
regression is appropriate because the relation between c and s is not linear. Rather, the
relation tends to follow the logistic curve, where the proportion of convictions is quite small
when s is close to zero and quite large when s is close to one. The actual shape of the
logistic curve is based on empirical data; specifically, on the observed relation between
subjective feelings of guilt and verdicts. The curve is derived from conducting a logistic
regression, which takes on the following form:
ln(p / (1 - p)) = α + βs                                        (6.1)
In this equation, 0 ≤ s ≤ 1 and 0 < p < 1 (the log-odds are undefined at p = 0 or p = 1), and the
natural logarithm of the odds of voting to convict is linear in s.
provides a maximum likelihood estimate of α and β.
The log-odds reflect the willingness to convict as a continuous variable. Recall,
however, that we are interested in a point estimate, specifically the point where a
participant is more willing than not to convict. When p = 0.5, the log-odds are zero,
indicating that a conviction is equally likely as an acquittal. Hence p > 0.5 implies that
the participant is more willing to convict. The relevant query is determining what
subjective feeling of guilt (s) corresponds to this level of willingness. This can be
determined by substituting p = 0.5 into the previous equation, which reduces the log-odds
to zero, and solving for s:
s_t = -α / β                                        (6.2)
The implicit decision threshold indicates the subjective feeling of guilt at which a
conviction is more likely than not.
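To make the estimation concrete, the sketch below (in Python, using the statsmodels library) shows how α and β could be fit and the implicit threshold recovered via equation 6.2. It is only an illustration under assumed data; the arrays s and c and the choice of library are not drawn from the dissertation itself.

import numpy as np
import statsmodels.api as sm

# Hypothetical data: subjective feelings of guilt (0-1 scale) and binary verdicts
s = np.array([0.20, 0.40, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95])
c = np.array([0,    0,    0,    1,    0,    1,    0,    1,    1,    1])

X = sm.add_constant(s)                # adds the intercept (alpha) to the design matrix
fit = sm.Logit(c, X).fit(disp=False)  # estimates ln(p / (1 - p)) = alpha + beta * s
alpha, beta = fit.params

s_t = -alpha / beta                   # equation 6.2: the rating at which p(convict) = 0.5
print(f"implicit decision threshold: {s_t:.3f}")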
Although this approach is an improvement over direct elicitation methods, the
implicit decision threshold is a single point estimate with no measure of variability. A
measure of variance is necessary in order to test for statistical differences. Bootstrapping
(random sampling with replacement) was used to estimate the variability of the derived
threshold (s_t). The process involved bootstrapping 150 new samples from the original
dataset, and conducting the above-described logistic regression and derivation on each
sample. There were thus 150 implicit decision thresholds (s_t), one for each of the
bootstrapped samples. The standard error from this distribution was then used as the
measure of variability for s_t.
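A companion sketch of the bootstrap step is given below. It resamples the (hypothetical) data with replacement 150 times, re-derives the threshold for each resample, and takes the standard deviation of those estimates as the standard error; the helper function name and the rules for skipping degenerate resamples are illustrative assumptions, not a description of the actual analysis code.

import numpy as np
import statsmodels.api as sm

# Hypothetical data, as in the previous sketch
s = np.array([0.20, 0.40, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95])
c = np.array([0,    0,    0,    1,    0,    1,    0,    1,    1,    1])

def implicit_threshold(s, c):
    fit = sm.Logit(c, sm.add_constant(s)).fit(disp=False)
    alpha, beta = fit.params
    return -alpha / beta

rng = np.random.default_rng(0)
thresholds = []
while len(thresholds) < 150:
    idx = rng.integers(0, len(s), size=len(s))   # resample with replacement
    s_b, c_b = s[idx], c[idx]
    if c_b.min() == c_b.max():                   # all-convict or all-acquit resamples
        continue                                 # cannot identify a threshold
    try:
        thresholds.append(implicit_threshold(s_b, c_b))
    except Exception:                            # e.g., (quasi-)separation in a small resample
        continue

se = np.std(thresholds, ddof=1)                  # bootstrap standard error of the threshold
print(f"bootstrap SE: {se:.3f}")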
Results
One hundred-fifty-six participants correctly responded to the memory check
questions and were included in this study. The age range was 19-60, with a mean of 34
(SD = 10.6) and median 30 (IQR = 16). Male participants comprised 53% (n = 82) of the
sample. A plurality (47%, n = 73) of the participants identified themselves as politically
liberal, 30% (n = 47) were politically moderate and 23% (n = 36) were politically
conservative.
At the conclusion of the trial, 20% (n = 31) of participants voted to convict the
defendant. The descriptive statistics for the percentage of participants convicting, the
likelihood of guilt ratings and verdict confidence over the course of the trial are contained
in Table 1.
Table 1. Descriptive Statistics for Each Piece of Evidence
Piece of Evidence % Convicting Likelihood Rating Verdict Confidence
1 12 38.5 (4.13) 5.91 (0.28)
2 14 44.98 (4.73) 5.84 (0.27)
3 13 47.86 (1.5) 5.91 (0.25)
4 17 60.16 (2.77) 5.69 (0.30)
5 31 64.56 (2.81) 5.58 (0.32)
6 25 62.94 (2.64) 5.31 (0.35)
7 21 57.3 (2.36) 5.44 (0.30)
8 20 52.82 (2.85) 5.47 (0.30)
Note: * = p < .10; ** = p < .05; *** = p < .001; parentheses
indicate 2 standard errors.
Figure 1 contains the mean likelihood estimates and implicit thresholds for
each piece of evidence over the course of the trial. Note that the fit indices of the logistic
regression used to calculate the implicit thresholds can be found in Appendix [B]. Further
note that age and gender did not improve the fit of the regression and will not be
discussed further. The overall low conviction rate can be inferred from the fact that the
mean likelihood estimates never exceed the implicit thresholds.
Figure 1. Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds (+/- 2 S.E.) Over the Course of Trial. [Implicit threshold values plotted in the figure, by piece of evidence: 89.46, 86.21, 79.49, 75.27, 71.2, 71.87, 73.4, 78.76.]
As predicted, the mean likelihood estimates and implicit thresholds move
concomitantly in opposite directions (r = -.968, p < .001). The mean likelihood estimates
increase when the evidence is inculpatory (pieces 1-5) while the thresholds decrease.
Conversely, the likelihood estimates drop when the evidence is exculpatory (pieces 6-8)
and the thresholds increase. A repeated measures ANOVA found that the likelihood
ratings did significantly change over the course of the trial, F(7, 156) = 41.20, p < .001, η² = .395.
This pattern of shifting the threshold was hypothesized to occur in order to
promote cognitive consistency. One manifestation of cognitive consistency is high
decisional confidence. In general, participants reported extremely high levels of
confidence in their eventual verdict, with an overall mean of 5.52 (S.D. = 1.01) and
median of 6 (on a 1-7 scale with higher values indicating greater confidence). A repeated
measures ANOVA indicated that verdict confidence did significantly change over the
course of the trial, F(7, 156) = 15.09, p < .001, η² = .193.
If shifting the threshold promotes confidence by avoiding “close calls,” there
should be a positive relation between verdict confidence and the degree of separation
between the implicit thresholds and the mean likelihood estimates. The degree of
separation between the threshold and the likelihood estimate (hereinafter "threshold
disparity" or TD) was calculated by taking the absolute difference of the likelihood
rating (L) and the implicit threshold (s_t) for each piece of evidence. Formally,
TD_ij = | L_ij - s_t,ij |                                        (6.3)
The subscript i refers to the participant and j refers to the particular piece of evidence
(e.g., piece 1 or 2, etc.).
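As a small illustration with made-up numbers (not the study data), the computation reduces to an element-wise absolute difference, broadcasting each piece of evidence's threshold across participants:

import numpy as np

# Rows: participants; columns: pieces of evidence (hypothetical likelihood ratings, 0-100)
L = np.array([[40.0, 55.0, 62.0],
              [70.0, 66.0, 58.0]])
# Implicit thresholds for the same pieces of evidence (assumed common across participants)
s_t = np.array([89.0, 86.0, 79.0])

TD = np.abs(L - s_t)     # TD_ij = |L_ij - s_t,ij|, broadcasting the thresholds across rows
print(TD)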
Figure 2 is a plot of the relation between verdict confidence and threshold
disparity over the course of the trial.
Figure 2. Threshold Disparity and Mean Verdict Confidence over the Course of Trial
Notice that the highest confidence occurred when threshold disparity (i.e., the
disparity between the implicit threshold and the likelihood rating) was the greatest.
Confidence tended to decrease as threshold disparity decreased at the middle of trial, and
confidence increased as the disparity increased towards the conclusion of trial. The
correlation of verdict confidence and threshold disparity was r = .770 (p < .05). This
pattern is consistent with the hypothesis that threshold shifting occurs to increase verdict
confidence.
This correlation between threshold disparity and verdict confidence is potentially
misleading, since verdict confidence was correlated with the likelihood ratings,
r = -.136 (p < .001). In order to partition the unique variance accounted for by verdict
confidence, an OLS regression was run with the likelihood ratings and verdict confidence
as the independent variables and threshold disparity as the dependent variable for each
piece of evidence. The results are contained in Table 2.
Table 2. Unstandardized Regression Coefficients for Verdict Confidence and Likelihood
Ratings in Predicting Threshold Disparity
Verdict Confidence Likelihood Rating
Piece of Evidence B S.E. B S.E.
1    .852**    0.361    -.89***    0.017
2    1.21**    0.406    -.89***    0.018
3    2.32**    0.848    -.75***    0.037
4    1.71*     0.944    -.57***    0.055
5    1.92*     0.976    -.12       0.064
6    2.49**    0.864    -.41***    0.058
7    3.69***   0.854    -.55***    0.049
8    2.28**    0.735    -.74***    0.039
Note: * = p < .10; ** = p < .05; *** = p < .001
The regression parameters indicate that verdict confidence is related to threshold
disparity independent of any relation between verdict confidence and the likelihood
rating. With respect to the third piece of evidence, for instance, the findings indicate that
for every one-unit increase in verdict confidence, threshold disparity would increase by
2.32 units. In general, the results indicate that verdict confidence is positively related to
threshold disparity. Note that the correlations between the likelihood ratings, verdict
confidence, and threshold disparity for each individual piece of evidence can be found in
Appendix [C].
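For concreteness, a sketch of this per-evidence regression is given below; the long-format DataFrame, its column names, and the use of the statsmodels formula interface are hypothetical stand-ins rather than the actual analysis code.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant per piece of evidence
data = pd.DataFrame({
    "evidence":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "confidence": [5, 6, 7, 4, 6, 6, 5, 7, 6, 4],
    "likelihood": [40, 55, 35, 60, 45, 50, 65, 42, 58, 47],
    "td":         [30, 25, 38, 18, 28, 24, 35, 27, 20, 31],
})

# For each piece of evidence, regress threshold disparity on verdict confidence
# and the likelihood rating (cf. Table 2)
for piece, sub in data.groupby("evidence"):
    fit = smf.ols("td ~ confidence + likelihood", data=sub).fit()
    print(piece, round(fit.params["confidence"], 2), round(fit.params["likelihood"], 2))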
It is noteworthy that this analysis was conducted at the individual level, as
opposed to the previous analyses which examined the change in the means over time.
Analyzing the data at the individual level provides a more nuanced examination of the
relation between threshold disparity and the likelihood estimates. Even at this level, the
findings support the causal hypothesis that threshold distortion occurs to augment verdict
confidence.
Discussion
Previous research on cognitive consistency suggested that the evidence (or
information) associated with a decision task is distorted in order to produce a skewed
representation of the possible choices (Simon & Holyoak, 2002; Russo et al., 1996). Such
distortion occurs until the evidence strongly supports the preferred alternative and
strongly rejects the competing alternative, thereby allowing the decision to be made
easily and confidently. The primary manifestation of cognitive consistency is high
decision confidence. A natural extension of this process is what has been dubbed
“threshold shifting.” The basic idea of threshold shifting is that, in the same way the mind
shuns ambiguity by distorting information, it also shuns ambiguity by increasing the
perceived disparity between the threshold and the evidence. It avoids “close calls” by
distorting the distance between the evidence and the threshold, which would manifest
high decisional confidence.
The present results support the hypothesized phenomenon of threshold shifting.
Not only do the implicit thresholds shift, but also they shift in a systematic way: as an
inverse function of the evidence. When the evidence inculpates, the threshold is lowered,
and when the evidence exculpates, the threshold is raised. This pattern was observed
consistently throughout the study. In addition, the magnitude of the shift was positively
related to verdict confidence, where the greater the disparity in the likelihood rating and
the threshold, the greater the confidence in the verdict. This finding is consistent with the
basic premise of cognitive consistency theories, which is that the fundamental objective
of human decision making is a desire to eschew ambiguity and complexity (Russo et al.,
2008), and it suggests that shifting the decision threshold is a supplemental means to this
end.
The current results are somewhat inconsistent with Phillips’ (2002) conclusion
that threshold distortion exclusively occurs post hoc and does not interact with the
evidence. By contrast, this study found that the threshold shifts throughout the decision-
making process and directly depends on the evidence. It should be noted that threshold
shifting is slightly different than the phenomenon Phillips tested, threshold distortion,
which postulates that shifts in the threshold are unidirectional and dependent on
participants’ final decision. According to threshold distortion, a person who ultimately
voted to convict would only “lower” the threshold throughout the decision process.
Unlike threshold distortion, however, the present study conditioned on the
evidence, not the final verdict, when testing for threshold shifts. This approach was
motivated by the fact that conditioning on the final verdict ignores the reality that a
nontrivial proportion of participants switch their verdicts throughout the decision process.
Such "switchers," who have constituted anywhere from 13% to 20% of other samples
(Phillips, 2002; Holyoak & Simon, 1999), must be removed from the analysis when
conditioning on the final verdict, a decision that is hard to justify.
One possible explanation for this inconsistency is that the methodology Phillips
employed was extremely limited, relying on self-reports about the “criterion definition.”
This approach is potentially problematic because self-reports may not be well calibrated
(Nisbett & Wilson, 1977), and the criterion definition may not resemble the criterion
itself. By using an approach that derives the implicit (de facto) threshold, without having
to rely on self-reports, the methodology used in the present study overcomes both of
these limitations. Hence, the inconsistency in the findings can potentially be explained by
the different methodological approaches.
A similar criticism applies to much of the previous research on reasonable doubt.
Though voluminous, that literature typically uses self-report, direct elicitation methods to
estimate the threshold corresponding to reasonable doubt, such as asking participants,
“what probability corresponds to reasonable doubt,” or “what is the disutility associated
with the possible errors?” Both approaches are highly susceptible to social desirability
effects, in which participants provide a response they believe conforms to social
conventions (Nagel, 1979; Underwood, 1977). The latter approach, which uses disutility
estimates to impute the threshold, notably yields radically lower values corresponding to
reasonable doubt (Nagel, Lamb, & Neef, 1981; Hastie, 1993). After conducting an
empirical comparison of both approaches, Dane (1985) concluded that his results “cannot
be used to decide which approach is the ‘best’ or most accurate (p. 156).” However,
arguably neither of the approaches is appropriate because they both essentially focus on
the ‘definition’ of reasonable doubt. The more germane query concerns the de facto
decision threshold, not its purported definition.
The present findings also suggest that previous research on reasonable doubt is
largely inchoate because it attempted to elicit the threshold (definition) values in
isolation, independent of any evidentiary context. According to the present results, the
threshold for reasonable doubt directly depends on the evidence and it varies as a
function of the evidence. Without any evidence with which to contextualize the threshold, it
is unclear what the values from previous research actually signify. At most, they must
reflect a theoretical value, but they probably bear very little resemblance to the de facto
threshold. The current methodology overcomes this limitation and therefore provides a
much more realistic estimate of the values associated with reasonable doubt.
As noted in Chapter 2, there is no consensus on the proper Blackstone-type ratio,
and so it is not possible to explicate a specific quantitative value that reflects the
normative threshold for reasonable doubt. However, most commentators and courts
assume that reasonable doubt requires a high degree of certainty, typically well over 0.90
(Franklin, 2006; Newman, 2007). The implicit thresholds observed in the present study
were below 0.90, usually in the mid to upper 0.70 range. Although no pronouncements
will be made here about the compatibility of these thresholds with normative
expectations, it is noteworthy that the thresholds fall in between the values observed in
previous research. Specifically, the approach that directly asks for the threshold value
typically yields values above 0.90, while the disutility-imputation approach finds that the
value is closer to 0.50 (Hastie, 1993). The present findings thus temper the extreme
values that have been previously reported in the literature, and perhaps can be used to
rejuvenate discussion about normative expectations.
Study 2: Statistical Evidence
Introduction
The primary purpose of the present study is to see if the threshold shifting effect
would replicate. The study is also concerned with extending the effect to a new domain of
evidence; namely, statistical evidence, sometimes called naked statistical evidence (Kaye,
1980). Naked statistical evidence describes the frequency with which some event or
characteristic is present in a given reference class (Koehler, 2002). Examples include the
number of registered republicans in a given state, or the number of handgun owners in
some specified territory, etc.
Although less acknowledged, the random match probability associated with DNA
evidence is also of this type (Thompson, 1989). When a DNA match is declared, it is
typically accompanied by an estimate of the chances that such a match would occur at
random in the population, hence the locution “random match probability” or RMP. RMPs
are usually quite small, sometimes on the order of 1 in a billion, trillion or quintillion
(Kaye, 2009). Because the chance of the match occurring at random is exceedingly
unlikely, DNA is powerful, if not the most powerful, form of evidence that is available.
But do jurors recognize the power of this evidence?
Empirical research generally finds that jurors underappreciate the probative value
of RMPs (Kaye & Koehler, 1991). For instance, a RMP of 1 in one million is equivalent
to a likelihood ratio of one million. But, when presented with this evidence, mock jurors
update their prior by some amount less than one million, indicating that they undervalue
the probative value in comparison to Bayesian norms (Faigman & Baglioni, 1988).
Perhaps even more pernicious are the forms of fallacious reasoning that jurors manifest
when presented with RMPs. One example is the well-known Prosecutor’s Fallacy, which
occurs when jurors assume that the RMP provides an estimate of the probability that the
defendant is innocent (thus one minus this value is the probability he is guilty)
(Thompson & Schumann, 1987).
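To make the contrast concrete, the sketch below works through a normative Bayesian use of an RMP alongside the fallacious reading; all numbers, including the assumed prior odds, are purely illustrative.

# Minimal sketch: normative use of a random match probability (RMP) versus the
# Prosecutor's Fallacy. Numbers are purely illustrative.
rmp = 1 / 1_000_000          # random match probability
likelihood_ratio = 1 / rmp   # = 1,000,000 (assumes a match is certain if the defendant is the source)

prior_odds = 1 / 10_000      # illustrative prior odds of guilt before the DNA evidence
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)

fallacy_prob = 1 - rmp       # the Prosecutor's Fallacy: treating 1 - RMP as P(guilt)

print(f"Bayesian posterior probability of guilt: {posterior_prob:.4f}")   # about 0.990
print(f"Prosecutor's Fallacy 'probability':      {fallacy_prob:.6f}")     # 0.999999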
In light of the research on jurors’ imperfect consumption of RMPs, this study is
interested in what possible effect RMPs might have on threshold shifting. The primary
hypothesis is that threshold shifting does not occur in the case of statistical evidence. This
hypothesis is motivated by the possibility that the undervaluation of RMPs is partially the
result of a lack of threshold shifting. Evidence is most powerful when it both changes the
likelihood estimates and it shifts the threshold in the opposite direction. Perhaps RMPs
are undervalued because they do not cause the threshold to shift.
Materials and Procedure
The vignette utilized in this study was identical to that of the previous study except
in one respect: there was a DNA match that implicated the defendant. This match
arose after the detectives requested a genetic sample from the defendant during the initial
interview. The defendant complied and provided a blood sample. Participants
subsequently learned that the defendant’s genetic profile is highly similar to the sample
recovered from the victim. Indeed, participants were told, “The lab technician estimated
that, based on population frequencies, such a match would occur at random in about one
in 200 million Hispanics.” The technician further stated that his “laboratory had no
instances of committing ‘lab error’” and that he considered this RMP to be a “valid
estimate.”
The rest of the materials were exactly the same as the first study. There were thus
nine pieces of evidence in this study. The procedure was also the same as the first study,
where participants answered three questions (probing the likelihood of guilt, providing a
verdict, and indicating the confidence in that decision) after the introduction of each piece
of evidence. The analytic method was also the same.
Results
One hundred-thirteen participants responded correctly to the reading
comprehension question and were included in this study. The age range was 19-68, with a
mean of 35 (SD = 12.2) and median 32 (IQR = 20). Female participants comprised 59%
(n = 67) of the sample. A plurality (44%, n = 50) of the participants identified themselves
as politically liberal, 26% (n = 29) were politically moderate and 30% (n = 34) were
politically conservative.
At the conclusion of the trial, 72% (n = 82) of participants voted to convict the
defendant. Table 3 contains descriptive statistics of the percentage of participants
convicting, the likelihood ratings and verdict confidence over the course of the trial.
Table 3. Descriptive Statistics for Each Piece of Evidence
Piece of Evidence % Convicting Likelihood Rating Verdict Confidence
1 8 27.45 (4.60) 5.59 (0.29)
2 78 84.95 (3.52) 5.70 (0.24)
3 81 87.77 (3.20) 5.88 (0.24)
4 79 88.29 (3.16) 5.93 (0.25)
5 82 90.38 (2.94) 6.09 (0.22)
6 82 90.37 (2.96) 6.02 (0.22)
7 77 84.49 (3.82) 5.88 (0.22)
8 76 82.93 (3.52) 5.90 (0.23)
9 72 80.15 (3.85) 5.79 (0.13)
Note: * = p < .10; ** = p < .05; *** = p < .001;
parentheses indicate +/- 2 S.E.
Figure 3 contains the mean likelihood estimates and implicit thresholds for
each piece of evidence throughout the trial. Note that the fit indices of the logistic
regression used to calculate the implicit thresholds can be found in Appendix [D]. Further
note that age and gender did not improve the fit of the regression and will not be
discussed further. The elevated conviction rate can be inferred from the fact that nearly
all the likelihood ratings exceeded the threshold. A repeated measures ANOVA found
that the mean likelihood ratings did significantly change over the course of the trial, F(8, 113) = 174.42, p < .001, η² = .609.
Figure 3. Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds (+/- 2 S.E.) Over the Course of Trial. [Implicit threshold values plotted in the figure, by piece of evidence: 83.5, 70.6, 69.5, 66.3, 63.7, 59.8, 60.0, 64.4, 66.3.]
Consistent with the first study, the thresholds and likelihood ratings moved in
opposite directions (r = -.864, p < .01). Participants considered the DNA evidence (piece
# 2) highly incriminating, adjusting their likelihood estimate from about 27 to 85
following its introduction. This upward adjustment coincided with a downward
adjustment of the threshold from 84 prior to introduction of the DNA to 71 following the
evidence.
Verdict confidence levels were extremely high with an overall mean of 5.86 (S.D.
= 1.29) and median of 6 (on a 1-7 scale where higher values indicate greater confidence).
A repeated measures ANOVA indicated that verdict confidence did significantly change
over the course of the trial, F(8, 113) = 10.16, p < .01, η² = .083. Figure 4 plots the
relation between verdict confidence and threshold disparity (i.e., the absolute difference
between the implicit threshold and the likelihood rating) over the course of the trial.
Figure 4. Threshold Disparity and Mean Verdict Confidence over the Course of Trial
Setting aside the first piece of evidence, the displayed pattern is consistent with
the hypothesis that threshold shifting increases confidence, since confidence was the
highest when threshold disparity was the greatest (r = .801, p < .05).
(Including the first piece of evidence non-significantly reversed the relation between threshold disparity and verdict confidence, r = -.442, p = .23; this reversal is the result of the anomalously large disparity, relative to verdict confidence, for the first piece of evidence.)
The relation
between verdict confidence and threshold disparity could be misleading, however, since
the likelihood estimates are also positively correlated with verdict confidence (r = .753, p
< .05). Notice that verdict confidence is highest for piece of evidence # 5 and piece of
evidence #6; these pieces of evidence also yielded the highest likelihood ratings (see
Table 3.).
In order to partition the unique variance accounted for by verdict confidence, a
regression was run with the likelihood ratings and verdict confidence as the independent
variables and threshold disparity as the dependent variable for each piece of evidence.
The results are contained in Table 4.
Table 4. Unstandardized Regression Coefficients for Verdict Confidence and Likelihood
Ratings in Predicting Threshold Disparity
Verdict Confidence Likelihood Rating
Piece of Evidence B S.E. B S.E.
1    .74**     0.257    -.89***    0.016
2    2.46**    0.806    -.08       0.055
3    2.34**    0.755     .01       0.057
4    2.40***   0.699     .15**     0.055
5    3.44***   0.652     .22***    0.049
6    4.49***   0.663     .26***    0.049
7    5.39***   0.845     .17***    0.05
8    4.12***   0.757     .18***    0.048
9    4.20***   0.671     .08       0.043
Note: * = p < .10; ** = p < .05; *** = p < .001
The regression parameters indicate that verdict confidence is related to threshold
disparity independent of any relation between verdict confidence and the likelihood
rating. For instance, with respect to the DNA evidence (# 2.), for every one-unit increase
in verdict confidence, threshold disparity would increase by 2.46 units. This relation is
independent of the fact that the likelihood estimate significantly increased for the DNA
evidence as well. Similarly, verdict confidence was much more predictive of threshold
disparity than were the likelihood ratings for pieces of evidence #5 and #6, which yielded
the highest likelihood ratings and verdict confidence overall. In general, the results are
consistent with the explanatory hypothesis that threshold distortion occurs to augment
verdict confidence. Note that the correlations between the likelihood ratings, verdict
confidence, and threshold disparity for each individual piece of evidence can be found in
Appendix [E].
Discussion
The present findings replicate the threshold shifting effect.
Participants consistently and systematically shifted their threshold according to the
evidence. The effect was slightly more pronounced in the present study, a likely result of
more variability in the verdicts. Recall that only 20% of participants voted to convict in
the first study, whereas almost three-quarters of participants voted to convict in the
present study. This considerable increase in the rate of convictions speaks to the
persuasive power of DNA evidence. The present study also replicated the finding that
threshold shifting is positively related to verdict confidence, which supports the
contention that decision confidence is a byproduct of threshold shifting.
The present findings lead to many empirical questions that should be taken up
with future research. One question is whether the framing of the RMP affected the
results. Koehler (2000) found that framing a RMP in frequency (i.e., 1 in 1,000,000) is
less persuasive to mock jurors than the equivalent RMP framed in probabilistic terms
(i.e., a probability of .000001). The posited explanation for this phenomenon is that the frequency
format provides a reference class, which allows people to imagine examples of how the
match could occur at random. For instance, since there are six million residents in Los
Angeles, a 1 in 1,000,000 RMP suggests that there are six people with the DNA profile.
But the probabilistic frame does not instantiate such a reference class, because there is no
denominator, and so mock jurors ascribe relatively more weight to the match. Koehler’s
findings suggest that the threshold shifting observed in the present study, which used a
frequency format, might be a lower-bound estimate. Threshold shifting might be even
more pronounced if the RMP were framed in probabilistic terms. This possibility could
be examined in future research.
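The reference-class arithmetic behind this account can be sketched as follows; the six-million population figure is the illustrative value used above, not a claim about census data.

# The same RMP expressed as a frequency and as a probability, plus the
# reference-class count that the frequency frame invites (illustrative numbers)
rmp = 1 / 1_000_000                    # "1 in 1,000,000", i.e., a probability of .000001

population = 6_000_000                 # the illustrative Los Angeles figure used above
expected_random_matches = population * rmp   # about 6 people expected to match by chance

print(rmp, expected_random_matches)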
Another emerging question is whether the method used to produce the match
affected threshold shifting. There are two methods used to procure DNA matches. One is
the conventional approach where detectives test only one suspect whom they suspect a
priori. The second approach is when the detectives have no suspects so they aimlessly
trawl through a database. Although it should not affect the probative value of the match
(Kaye, 2009), the production method does have an enigmatic psychological effect on
jurors.
Scurich and John (2011) conducted a study in which they manipulated the
production method of the DNA match, holding everything else constant. The production
method did not affect mock jurors’ estimates of the likelihood that the defendant was
guilty. In other words, participants thought it was equally likely that the defendant
committed the crime, regardless of whether the match resulted from a conventional
search or a database trawl. However, Scurich and John (2011) found that mock jurors
were much less likely to convict when the match resulted from a database trawl. This
disjunction is consistent with the Wells effect (Niedermeier, Kerr, & Messé, 1999). Wells
(1992) found that statistical evidence sometimes influenced civil liability verdicts and
sometimes it did not, even though mock jurors considered the evidence equally probative.
It is possible that threshold shifting can explain this anomaly. Scurich and John
(2011) found that mock jurors are much more likely to convict in the conventional case
than in the database trawl case. The present study, which used a conventional case, found
that participants did shift their threshold. Perhaps the reason jurors were less likely to
convict in the trawl case is because they did not shift their threshold. That is, although the
participants in Scurich and John (2011) considered the evidence equally probative,
participants did not lower their threshold in the trawl case, hence they were less likely to
convict relative to the conventional case. This possibility should be systematically
examined in future research.
Study 3: Character Evidence
Introduction
A recurring theme is that a decision-theoretic account of juror decision making is
in tension with legal policy, in part because other values are at play besides veridical
verdicts (see Nesson, 1979; Tribe, 1971a, 1971b). A prime example is the categorical policy of
excluding character evidence, sometimes referred to as propensity evidence (Risinger,
1998). Evidence of prior conduct is generally inadmissible, except in specific instances,
not because it lacks any probative value; indeed, the aphorism that past behavior is the
best predictor of future behavior might be true. Rather, character evidence is inadmissible
because its potential prejudicial impact puts defendants (and sometimes victims) at an
unfair and impermissible disadvantage.
There are at least two ways in which the impact of such evidence can be
prejudicial (Mueller & Kirkpatrick, 2003). First, hearing of past nefarious behavior can
cause jurors to convict without regard to culpability in the current allegation(s). In short,
jurors might convict simply because the defendant is a “bad man.” Evidence proving that
the defendant is a bad man is prejudicial because it is immaterial to proving the currently
alleged crime—it only proves the defendant is a “bad man”. Second, evidence can be
prejudicial if jurors ascribe substantially more weight to it than is logically
appropriate. It could deny the defendant a fair opportunity to defend himself if jurors give
the evidence too much weight and are over-persuaded by it. Both types of prejudice
decrease the likelihood that a conviction will be based on a rational evaluation of the
evidence, and both increase the likelihood that it will be based on considerations that
arouse condemnation and passion.
Though the ban on character evidence runs deep through Anglo-American
jurisprudence, a recent Supreme Court decision has taken a considerable step in the
opposite direction. Based on a folk-psychological theory of how evidence affects jurors’
decision making, the Court delineated a more relaxed test for balancing probativeness
against prejudicial impact. As a result, the Court may have increased the likelihood that
prejudicial evidence will be deemed admissible. Before turning to this theory, and
describing how it relates to threshold shifting, a synopsis of that case follows.
Briefly, the case involved a series of felony charges, including being a felon in
possession of a firearm. The defendant offered to stipulate that he met the statutory
definition of being a convicted felon, though he contested ever possessing the firearm in
question. The prosecution refused the stipulation, and instead was permitted to describe
in great detail the events surrounding the defendant’s prior conviction, which were
strikingly similar to the facts in the present case. The defendant feared, apparently
rightfully so, that the jury would infer guilt based largely on his previous conduct. After
all, the only other evidence that the defendant possessed the firearm in question came
from witnesses with conflicting accounts and who were drunk at the time of the incident.
The defendant was convicted and sentenced to 15 years imprisonment.
In the landmark case Old Chief vs. US (1997), the Supreme Court held that the
lower court was in error by not accepting the defendant’s proffered stipulation. However,
fearing that the holding would have wide-reaching implications for other types of
evidence, such as gory photographs, the Majority went on to craft an opinion that
dramatically changed the bedrock legal concept of relevance. As defined by Federal
Rule of Evidence 401, relevant evidence is "evidence having any tendency to make the
existence of any fact that is of consequence to the determination of the action more
probable or less probable than it would be without the evidence.”
In contrast to this well-accepted standard, the Old Chief Court suggested that
evidence that does not make any fact of consequence more probable can still be
relevant, provided it “tells a colorful story with descriptive richness” (p. 653). Such
evidence, the Court said, “[contains the] power not only to support conclusions but to
sustain the willingness of jurors to draw the inferences, whatever they may be, necessary
to reach an honest verdict” (p. 654; emphasis added). The Court went on to prescribe:
“[T]he prosecution may fairly seek to place its evidence before the jurors…to convince
the jurors that a guilty verdict would be morally reasonable as much as to point to the
discrete elements of a defendant’s legal fault” (p. 654). The Court thus endorsed the
admissibility of evidence, not to increase the likelihood of the defendant’s guilt, but
rather to increase the likelihood (“willingness”) of a conviction. One can formally
interpret this “willingness” as lowering the threshold for Reasonable Doubt. Hence,
evidence can be relevant (and presumptively admissible) simply to lower the threshold.
The saga of Old Chief is hotly contested within the legal academy (see, e.g.,
Risinger, 1998). But there has yet to be any empirical study that tests the folk psychology
on which the decision is based. There is empirical evidence suggesting that character
evidence, such as gory photographs and prior convictions, does increase the rate at
which jurors convict (Douglas, Lyon, & Ogloff, 1997; Wissler & Saks, 1985).
However, the cause of this increase is unclear. It could be that the evidence causes jurors
to employ a “bad man” type of reasoning; it could be that the probative value of the
evidence is overweighed by jurors; or, as the Supreme Court assumes, such evidence
might cause jurors to lower their threshold for conviction. The present study attempts to
disentangle these explanations. By doing so, it provides the first direct test of the Old
Chief Court’s reasoning and its subsequent overhaul of the concept of legal relevance. It
also attempts to extend the previous findings on threshold shifting to a new domain of
evidence: character evidence.
Materials and Procedure
The materials utilized in this study differ from those used in the previous study in only one respect: it
was revealed that the defendant is a convicted felon. After receiving the background facts
of the case, and learning about the defendant’s apparent nervousness when questioned by
the detectives, participants were told:
Detective Smith asked [the defendant] to submit a blood sample. He refused. But
[the defendant] was no stranger to the police. He had a prior conviction for
aggravated assault and battery with a gang enhancement, and had previously been
incarcerated in a California State prison.
As part of a new state initiative, all convicted felons have a genetic sample taken
when they arrive at prison. Detective Smith was able to access [the defendant’s]
genetic profile that was collected when he was incarcerated.
The rest of the evidence presented was exactly the same as in the previous study, including
the DNA random match probability (1 in 200 million Hispanics). As with the previous
studies, the same three questions (probing the likelihood of guilt, providing a verdict,
and indicating confidence in that decision) followed the introduction of each piece of
evidence, and the same analytic methods were employed.
Results
One hundred four participants correctly responded to the reading comprehension
question and were included in this study. The age range was 18-78, with a mean of 36
(SD = 13.4) and a median of 31 (IQR = 21). Female participants comprised 61% (n = 64) of
the sample. A plurality (49%, n = 51) of the participants identified themselves as
politically liberal, 26% (n = 27) were politically moderate, and 25% (n = 26) were
politically conservative.
At the conclusion of the trial, 82% (n = 85) of participants voted to convict the
defendant. The difference between this conviction rate and the conviction rate observed
in the previous study (i.e., 72%) trended towards significance (χ² = 2.57, df = 1, p = .11).
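To make the comparison concrete, the two conviction rates can be compared with a 2 x 2 chi-square test of the kind reported above. The sketch below (in Python) is purely illustrative: the counts for the current study follow from the 85 of 104 convictions reported here, but the sample size for the previous study is a placeholder, and the file and variable names are hypothetical.

from scipy.stats import chi2_contingency

# Current study: 85 of 104 participants voted to convict.
convicted_now, acquitted_now = 85, 104 - 85

# Previous study: a 72% conviction rate is reported in the text; the sample
# size below is a placeholder and would need to be replaced with the actual n.
n_prev = 100                                   # hypothetical
convicted_prev = round(0.72 * n_prev)
acquitted_prev = n_prev - convicted_prev

table = [[convicted_now, acquitted_now],
         [convicted_prev, acquitted_prev]]

# correction=False gives the uncorrected Pearson chi-square statistic.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square({dof}) = {chi2:.2f}, p = {p:.3f}")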
This implies that, all else being equal, learning of the prior conviction tended to increase
the likelihood that jurors would convict. Table 5 contains descriptive statistics of the
percentage of participants convicting, the likelihood ratings and verdict confidence over
the course of the trial.
Table 5. Descriptive Statistics for Each Piece of Evidence
Piece of Evidence % Convicting Likelihood Rating Verdict Confidence
1 1 24.34 (4.13) 5.75 (0.29)
2 3 32.64 (4.73) 5.61 (0.27)
3 78 87.84 (1.50) 5.95 (0.21)
4 79 90.57 (2.77) 6.04 (0.23)
5 81 91.41 (2.81) 6.04 (0.25)
6 83 92.97 (2.64) 6.21 (0.23)
7 89 93.72 (2.36) 6.24 (0.21)
8 84 90.38 (2.85) 5.89 (0.25)
9 80 89.49 (3.16) 5.93 (0.29)
10 82 89.68 (3.24) 5.95 (0.22)
Note: * = p < .10; ** = p < .05; *** = p < .001;
parentheses indicate +/- 2 S.E.
Figure 5 contains the implicit thresholds and likelihood ratings for each piece of
evidence throughout the trial. Note that the fit indices of the logistic regression used to
calculate the implicit thresholds can be found in Appendix [F]. Further note that age and
gender did not improve the fit of the regression and will not be discussed further. A
repeated measures ANOVA found that the likelihood ratings did significantly change
over the course of the trial, F(9, 104) = 586.54, p < .001, η² = .851.
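As an illustration of the analysis just described (not the dissertation’s own code), a repeated measures ANOVA of this form could be run with statsmodels; the data file and column names below are hypothetical.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# long_df is assumed to hold one row per participant per piece of evidence:
#   columns: participant, piece (1-10), likelihood (0-100 rating of guilt)
long_df = pd.read_csv("study3_long.csv")       # hypothetical file name

# Within-subjects factor: piece of evidence; dependent variable: likelihood rating.
anova = AnovaRM(long_df, depvar="likelihood",
                subject="participant", within=["piece"]).fit()
print(anova)   # F statistic, degrees of freedom, and p-value for the piece factor

Note that AnovaRM reports the F test directly; an effect size such as η² would be computed separately from the model sums of squares.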
Figure 5. Mean Likelihood Estimates (+/- 2 S.E.) and Implicit Thresholds (+/- 2 S.E.)
over the Course of Trial
As in the previous studies, the thresholds and likelihood ratings consistently
moved in opposite directions (r = -.913, p < .001). It is interesting to note that by the end
of the presentation of the incriminating evidence (i.e., through piece 7), the threshold had
dropped from 94 to 64. The exculpating evidence (i.e., pieces 8-10) brought the
threshold back up to 75.
Piece of evidence # 2 pertains to the prior conviction. Participants considered this
evidence marginally diagnostic as it increased the mean likelihood rating from 24 to 33.
At the same time, this evidence lowered participants’ threshold from 94 to 86. No error
bars could be calculated for the first threshold because the number of participants voting
to convict at this point (n = 1) was too small to yield reliable estimates from the
bootstrapping procedure.
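On one reading of the procedure described above, the implicit threshold for a given piece of evidence is the likelihood-of-guilt rating at which a logistic regression of verdicts on those ratings predicts a 50% chance of a conviction vote, with error bars obtained by resampling participants. The following Python sketch illustrates that logic; it is not the dissertation’s own code, and the variable names are hypothetical.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def implicit_threshold(likelihood, verdict):
    """Rating at which P(convict) = .5 under a logistic regression of
    verdict (1 = convict, 0 = acquit) on the likelihood-of-guilt rating."""
    fit = sm.Logit(verdict, sm.add_constant(likelihood)).fit(disp=0)
    b0, b1 = fit.params
    return -b0 / b1                      # the indifference (50%) point

def bootstrap_se(likelihood, verdict, n_boot=2000):
    """Standard error of the implicit threshold from resampling participants."""
    n, estimates = len(verdict), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)      # resample participants with replacement
        try:
            estimates.append(implicit_threshold(likelihood[idx], verdict[idx]))
        except Exception:
            continue                     # skip degenerate resamples (e.g., only one conviction)
    return float(np.std(estimates))

# threshold = implicit_threshold(likelihood, verdict)
# error_bar = 2 * bootstrap_se(likelihood, verdict)   # the +/- 2 S.E. plotted in Figure 5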
[Figure 5 plots, for pieces of evidence 1 through 10, the implicit thresholds (93.9, 86.3, 75.4, 74.6, 71.4, 65.3, 63.7, 66.3, 70.6, and 75.0, respectively) alongside the mean likelihood-of-guilt ratings.]
The third piece of evidence pertains to the DNA match, which participants
considered highly incriminating, as evidenced by the change in the likelihood ratings
from about 33 to 88 following the introduction of the DNA evidence. This upward
adjustment coincided with a downward adjustment in the threshold from 86 to 75.
Verdict confidence levels were extremely high with an overall mean of 5.96 (S.D.
= 1.26) and median of 6 (on a 1-7 scale where higher values indicate greater confidence).
A repeated measures ANOVA indicated that verdict confidence did significantly change
over the course of the trial, F(9, 104) = 4.63, p < .01, η² = .043. Figure 6 plots the relation
between verdict confidence and threshold disparity over the course of the trial.
Figure 6. Threshold Disparity and Mean Verdict Confidence over the Course of Trial
Figure 6 indicates that verdict confidence is highest when threshold disparity is
the highest, the first two pieces of evidence notwithstanding. Setting aside the first two
pieces of evidence, the correlation of verdict confidence and threshold disparity was r =
.673 (p < .05),⁵ which indicates that as threshold disparity increases, verdict confidence
does as well. However, this correlation could be misleading, since verdict
confidence was highly correlated with the likelihood ratings, r = .795 (p < .01). Verdict
confidence and threshold disparity were the greatest for pieces of evidence # 6 and 7, but
the likelihood ratings were also the greatest for these pieces of evidence (see Table 5).
In order to partition the unique variance accounted for by verdict confidence, an
OLS regression was run with the likelihood ratings and verdict confidence as the
independent variables and threshold disparity as the dependent variable for each piece of
evidence separately. The results are contained in Table 6.
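Before turning to Table 6, the sketch below illustrates the per-piece regressions just described, using statsmodels with hypothetical file and column names; threshold disparity is assumed to have been computed per participant as defined earlier in the dissertation, and the sketch is illustrative rather than the analysis code actually used.

import pandas as pd
import statsmodels.formula.api as smf

long_df = pd.read_csv("study3_long.csv")       # hypothetical long-format file

# One OLS regression per piece of evidence: threshold disparity regressed on
# verdict confidence and the likelihood rating.
for piece, piece_df in long_df.groupby("piece"):
    fit = smf.ols("disparity ~ confidence + likelihood", data=piece_df).fit()
    print(f"piece {piece}: "
          f"confidence B = {fit.params['confidence']:.2f} (SE = {fit.bse['confidence']:.3f}), "
          f"likelihood B = {fit.params['likelihood']:.2f} (SE = {fit.bse['likelihood']:.3f})")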
Table 6. Unstandardized Regression Coefficients of Verdict Confidence and Likelihood
Ratings in Predicting Threshold Disparity
Piece of Evidence   Verdict Confidence B (S.E.)   Likelihood Rating B (S.E.)
1                   NA (NA)                       NA (NA)
2                   0.9 (0.059)                   -.99 (0.003)
3                   3.66*** (0.936)               -0.09 (0.065)
4                   3.18*** (0.753)               -0.06 (0.063)
5                   3.11*** (0.623)               -0.01 (0.055)
6                   3.08*** (0.589)               .12* (0.051)
7                   2.04*** (5.85)                .24*** (0.053)
8                   3.56*** (0.562)               .17*** (0.049)
9                   4.33*** (0.731)               -.42 (0.056)
10                  3.95*** (0.684)               -.17** (0.054)
Note: * = p < .10; ** = p < .05; *** = p < .001
⁵ When all pieces of evidence are included, the correlation between the likelihood estimates and threshold disparity was not significant (r = -.582, p = .078). Nor was the correlation between the likelihood estimates and threshold disparity significant when only the first piece of evidence was removed (r = -.485, p = .185).
The regression parameters indicate that verdict confidence is related to threshold
disparity independent of any relation between verdict confidence and the likelihood
rating. For instance, with respect to the DNA evidence (# 3.), for every one-unit increase
in verdict confidence, threshold disparity would increase by 3.66 units. With regard to
pieces of evidence # 6 and # 7, verdict confidence was much more predictive of threshold
disparity than were the likelihood ratings, despite the fact that pieces of evidence # 6 and
# 7 yielded the greatest overall likelihood ratings. In general, these results support the
explanatory hypothesis that threshold distortion occurs to augment verdict confidence.
Note that the correlations between the likelihood ratings, verdict confidence, and
threshold disparity for each individual piece of evidence can be found in Appendix [G].
Discussion
The present findings replicate the threshold shifting effect that was observed in
the previous two studies. Participants consistently and systematically varied their
threshold in relation to the evidence. Moreover, the magnitude of the shift was positively
related to verdict confidence, such that the greater the disparity between the threshold and
the likelihood rating, the greater the confidence in the verdict. The replication of these
findings across three different studies and three unique samples suggests that the
phenomenon is fairly robust.
The present study extends the previous findings to a new domain of evidence.
Although character evidence has been historically subject to a categorical policy of
exclusion because of its prejudicial impact, a recent Supreme Court decision has largely
lifted this ban by suggesting that evidence can be relevant simply to increase jurors’
willingness to convict. This study found that simply including character evidence, all else
being equal, increased the rate of conviction relative to the previous study, which is
consistent with previous research on the effect of character evidence (e.g., Douglas,
Lyon, & Ogloff, 1997). However, the previous findings that character evidence
increases the conviction rate are ambiguous, because it is unclear whether the increase
was due to a “bad man” type of reasoning, an over-weighting of the evidence, or a
lowering of the threshold.
The current findings permit a more nuanced explanation of the increased
conviction rate following the introduction of character evidence. First, participants did
not consider the character evidence to be highly probative. This is evidenced by the non-
significant increase in the likelihood ratings. Second, the findings do not fully support the
contention that character evidence led jurors to convict based on a “bad man” type of
reasoning. If this type of reasoning were relied upon, one would expect threshold
intransigence, in which the threshold drops to an extremely low level (hence
requiring little-to-no proof for a conviction) and stays low. While the character
evidence did lower the threshold initially, the subsequent thresholds continued to move,
even in the opposite direction. This finding is consistent with the “bad man” theory
only insofar as jurors were more apt to convict based on evidence that they deemed
probatively useless and hence immaterial. But the finding is most consistent with the
notion that character evidence makes jurors more apt to convict simply because it lowers
their threshold, as intuited by the Supreme Court in Old Chief.
Several qualifications of the present findings on character evidence must be noted.
First, the findings are generalizable only to this specific type of character evidence,
namely, evidence of a prior conviction and incarceration for aggravated assault with a
gang enhancement. Other types of character evidence, such as gory photographs, might
increase the likelihood that jurors rely on the “bad man” reasoning. Indeed, gory
photographs, which presumably arouse emotion and inflame passion, are more closely
related to this type of reasoning (Risinger, 1998). It is also likely that the specific nature
of the character evidence would have a different effect on jurors. For example, evidence
of a prior sexual offense involving children could increase the effect, or it could invoke a
different set of emotions altogether (see Wiener, Arnot, Winter, & Redmond, 2006;
Vidmar, 1997). Further experimentation and replication are necessary before more general
claims can be made about character evidence.
Chapter 7: General Discussion
This chapter first considers how the empirical findings of threshold shifting fit
within the theory of cognitive consistency. The chapter then turns to the implications for
legal doctrine by considering harmless error analysis in light of threshold shifting. Before
concluding with closing remarks, the chapter discusses general limitations of the
research.
Implications for Cognitive Consistency Theory
Threshold shifting extends the previous findings of Biased Predecision Processing
(BPP) (Brownstein, 2003). BPP includes a number of closely related theories that
postulate that cognitive consistency is a primary objective when making decisions (Russo et
al., 2008). The mind attains cognitive consistency by systematically distorting
information to make it comport with the evolving mental representation of the task at
hand (Simon, Krawczyk, & Holyoak, 2004). This comportment leads to high decisional
confidence. The present findings suggest that threshold shifting is also used to enable a
state of cognitive consistency. In the same way information is distorted to polarize the
alternatives, the threshold is shifted in order to increase the perceived disparity between
the evidence and the decision threshold. This avoids close calls, which are antithetical to
cognitive consistency, and it facilitates highly confident decisions.
Prior research on BPP largely overlooked the possibility of threshold shifting as a
mechanism to promote cognitive consistency. But this does not mean that threshold
shifting was absent in prior research. Since this phenomenon occurs throughout the
decision-making process, it is likely that the previous findings on information distortion
were overstated because they unknowingly included the effect of threshold shifting. That
is, the previously-reported effect sizes of information distortion included both a distortion
of information and a shifting of the threshold, even though the findings purport to pertain
only to the former. To be fair, the magnitude of threshold shifting does appear small
relative to the magnitude of information distortion (see generally Dekay, Stone, &
Sorenson, 2011). Nevertheless, the present findings indicate that the two forms of
distortion interact synergistically in the quest for cognitive consistency.
The present findings indicate that threshold shifting promotes cognitive consistency
in a conceptually different capacity than the distortion associated with BPP. Previous
research on BPP conditioned on participants’ ultimate choice and showed that distortion
occurs in one direction (i.e., unidirectionally) to conform the evidence to that choice. The
present study, however, conditioned on the nature of the evidence (i.e., inculpating or
exculpating) and showed that the threshold varies in both directions (i.e., bidirectionally)
as a function of the evidence. Conditioning on the evidence does not require participants
whose interim preference is different than their final decision to be eliminated, and it
allows the dynamic of distortion to be examined at a more nuanced level.
It must be stressed that threshold shifting, which is conditioned on the evidence, is
not inconsistent with the previous findings on BPP, which is conditioned on the final
decision. Despite the fact that BPP is unidirectional and threshold distortion is
bidirectional, both mechanisms operate to enhance the same underlying goal of cognitive
consistency (Russo et al., 2008). The differences in the direction of distortion can be
explained by the different mechanisms that operate to attain cognitive consistency. BPP
involves polarizing the interpretation of the evidence to shape the overall mental
representation, while threshold shifting allows evidence—even if contradictory to an
emerging preference—to be assimilated in a consistent manner. This can be inferred from
the increased confidence that threshold shifting engenders. Ultimately, the BPP
mechanism is more influential, as evidenced by the relative effect sizes of information
distortion, but threshold shifting is also driven by cognitive consistency. There is thus no
incompatibility between BPP and threshold shifting. The findings on threshold shifting
simply suggest that the Gestalt is more pliable than previously imagined.
Implications for Legal Doctrine
Threshold shifting suggests that evidence serves a dual function by affecting the
perception of guilt and moderating the threshold for conviction. A corollary of this
duality is that it potentially complicates the analysis of legal doctrine. Anglo-American
doctrine almost universally assumes that the threshold is static and contextually
independent. Because this assumption is no longer tenable, a complete analysis must also
take into account the effect of the evidence on the threshold. This section considers how
the empirical findings of threshold shifting might be used to inform the legal analysis of
one doctrine in particular—harmless error analysis—which has previously neglected the
potential effects of impermissible evidence on the threshold. This illustration can be
applied to other doctrines, such as the weighting of probative value versus prejudicial
impact.
Harmless Error Analysis
Defendants are constitutionally entitled to a fair trial, not a perfect one. Errors are an
ineluctable aspect of judicial proceedings, and virtually no verdict could be upheld if it
required the trial to be error-free (Edwards, 1995). Some trial errors, however, are so
significant that they impinge the safeguards afforded by the constitution (Stacy &
Dayton, 1988). These errors make the trial unfair and defendants have a right to a new
trial in their presence. Appellate courts are responsible for determining whether errors are
so obtrusive as to render the trial unfair. This tremendously important responsibility
allows the higher courts to regulate the behavior of trial court judges, prosecutors and
police investigators, and to deter them from engaging in similar behavior in the future
(see Traynor, 1970).
Central to the issue is determining the broad impact of the error (Mueller &
Kirkpatrick, 2003). Errors that are not consequential to the trial outcome are deemed
“harmless” and are not cause for a reversal. There are two analytical approaches for
determining whether an error is harmless (Simon, 2004). The first, known as the error
focused approach, analyzes the actual impact of the error on the trial outcome. This
approach evaluates whether and to what extent the error contributed to the outcome of the
trial, and determines harmlessness accordingly. The second approach, which is more
popular amongst jurists, is known as the guilt focused approach. This approach analyzes
whether a conviction is warranted based on the remaining evidence. If the remaining
evidence is sufficient for conviction, even formidable errors will be deemed harmless.
Under either approach, the appellate court judge must be certain beyond a reasonable
doubt that the error did not affect the outcome of the case (Newman, 1993).
Both approaches have been the subject of extensive scholarly debate (e.g.,
Mitchell, 1994; Edwards, 1995; Saltzburg, 1973). Given that courts primarily utilize the
guilt focused approach, much commentary has pointed out its weaknesses, which include
undermining the right to trial by jury and resting too heavily on judicial speculation
(Simon, 2004). The error focused approach has also been criticized, albeit to a much
lesser extent. In practice, most trial errors are considered harmless and very few verdicts
are overturned on the basis of harmful error, typically referred to as “reversible error”
(Edwards, 1995; Landes & Posner, 2001).
Simon (2004) notes several psychological explanations for the judicial reluctance
to find trial errors reversible. In particular, the predominance of the guilt focused
approach might contribute to judges regularly deeming errors harmless, because appellate
judges are privy to the impermissible evidence and thus susceptible to being influenced
by it. Judges, like any other human, cannot consciously ignore knowledge about facts
(Wegner, Schneider, Carter, & White, 1987). Moreover, coherence based reasoning
posits that evidence can indirectly influence (or contaminate) the perception of the other,
unrelated evidence. Hence, the impermissible evidence could influence the judge’s
perception of the case despite proclamations to the contrary.
Threshold shifting cannot speak to which approach to harmless error analysis is
preferable nor can it resolve the scholarly debate. However, the phenomenon of threshold
shifting does suggest that both approaches are incomplete because they ignore the possible
effect that such evidence could have on the threshold. That is, by focusing on the
evidentiary impact of the impermissible evidence, harmless error analyses ignore the
effect such evidence has on the threshold. A seemingly minor error could have a
significant effect on the threshold, even if the impact of the error—in terms of evidentiary
value—only appears to have moderately affected the verdict.
To make the discussion more concrete, consider a judge faced with determining
whether the introduction of a coerced confession is harmless error. Under an error
focused approach, the central question is whether the confession contributed significantly
to the verdict. The judge will find either that the confession had a significant impact on
the verdict and declare it reversible error, or the judge will find that it had little impact on
the verdict and deem the introduction of the confession harmless. Impact refers to the
evidentiary weight jurors ascribe to the confession.
According to threshold distortion, the weight of evidence also influences the
threshold for conviction by lowering it.⁶
A proper analysis of harmless error must take
this into account, and consider the possibility that the confession, even if jurors consider
it minimally probative, might have had an effect on the verdict by lowering the threshold.
Thus, the error focused approach could understate the true effect of the confession on the
verdict because it ignores the possibility that the confession increased jurors’ willingness
to convict (aside from any evidentiary impact). Moreover, the present findings suggest
that threshold shifting is a determinant of verdict confidence; the evidence was only
modestly related to verdict confidence. Hence, focusing on the evidential impact neglects
much of the overall effect of the confession on the verdict. It also ignores how the
increased willingness to convict engendered by the confession might have affected
subsequently introduced evidence and the corresponding threshold.

⁶ Harmless error analysis is almost always precipitated by the defendant. Thus, we can assume that the evidence in question is of an inculpatory nature (since no defendant would contest the introduction of exculpatory evidence), which would correspondingly lower the threshold.
Under the guilt focused approach, the judge must determine whether the jury
would have convicted based on the remaining evidence (i.e., without the confession).
This approach may be self-defeating for the same reasons articulated by Simon (2004);
namely, that the judge is privy to the inadmissible evidence and may be unconsciously
influenced by it. Threshold shifting predicts that the judge would lower her threshold
based on the confession, even if she does not consider the impact of the confession in
determining harmless error. Ironically, the reasonable doubt standard, by which appellate
judges determine whether the error was harmless, increases the likelihood of this
happening.
These thoughts are only preliminary. Threshold shifting raises numerous
questions for harmless error analysis, some of which are empirical in nature. For
example, further experimentation should examine whether threshold shifting occurs when
judges attempt to “ignore” a piece of evidence. This could have implications for the
vitality of the guilt focused approach. An additional study might examine whether judges
can anticipate the effect of the error on jurors’ thresholds and appropriately integrate this
effect into an evidence focused analysis. This brief discussion is not limited to harmless
error analysis; it can apply to other evidentiary determinations, such as the balancing of
probative value with prejudicial impact. Recognizing the empirical reality of threshold
shifting can produce more rigorous analysis of legal doctrine.
General Limitations and Future Directions
The reported studies are subject to the usual limitations associated with trial
simulations. Briefly, the findings are based on the responses of participants who read an
abridged description of a criminal trial, and made hypothetical judgments without any
real consequences. Although important differences between actual trials and simulations
exist (see Wiener, Krauss, & Liberman, 2011), research indicates that the effect of these
differences on mock juror behavior is not as significant as is sometimes assumed
(Bernstein, 1999). Perhaps of greater concern is the fact that actual jurors do not make
judgments after the reception of each piece of evidence. It is possible that making
incremental judgments could have affected participants’ responses.⁷ In short, the current
findings need to be replicated in a more ecologically valid environment.
Ecological validity could be enhanced in future studies by using what trial
consultants call shadow jurors. Shadow jurors are surrogate jurors who sit in the gallery
and watch an actual criminal trial. Typically, shadow jurors are chosen to mirror the
demographic composition of the actual jury. It would be interesting to outfit shadow
jurors with a portable electronic data-storing device, and have them make judgments over
the course of an actual trial. These judgments could be used to calculate the implicit
threshold in order to see if threshold shifting replicates in a more realistic environment. It
is possible that the effect would be more pronounced under realistic conditions, such as
viewing witnesses testify firsthand or being in the presence of the defendant. I am
currently applying for a National Science Foundation (NSF) grant in order to conduct this
study.

⁷ This possibility was examined in a pilot study in which participants made judgments after every other piece of evidence, rather than after each piece of evidence. The same pattern of distortion appeared even when the judgments were made with less frequency.
Another potential limitation is that the present research examined cognitive
processes at the individual level, while criminal verdicts are decided by a collective of
jurors. Research shows that the predeliberation distribution of votes is predictive of the
eventual verdict (Kalven & Zeisel, 1966; Devine et al., 2001), implying that groups do
sometimes reach different verdicts than individual jurors. However, group verdicts are
not less susceptible to bias and error (see generally Devine et al., 2001), and some
cognitive consistency researchers believe that the group dynamic would actually
“amplify” and “compound” the effects of BPP (Simon, 2004). Further research is
necessary to examine what effect, if any, group deliberation has on threshold shifting.
It would be interesting to see if threshold shifting also occurs at the group level—
a type of second-order threshold shifting—where discussion of the evidence causes jurors
to shift their threshold additionally. If jurors shift their threshold when exposed to the
evidence, it is possible that jurors might re-shift their threshold when a fellow juror
discusses the evidence during deliberation. The direction and magnitude of this additional
shifting might be moderated by certain factors, such as feelings toward the fellow juror
who is discussing the evidence. For instance, if the fellow juror is disliked or perceived to
be unintelligent, perhaps jurors will re-shift their threshold in the opposite
direction. Examining the effect of the group dynamic on threshold shifting is part of the
aforementioned NSF grant proposal.
Several caveats associated with the implicit methodology should also be
acknowledged. First is the choice to define the implicit threshold as the indifference point
between voting to convict or acquit; that is, the subjective probability at which a
conviction is equally as likely as an acquittal. Admittedly, this choice is a subjective one
and it is particularly convenient from a mathematical standpoint. One could argue that the
indifference point is not the appropriate point to define the implicit threshold that jurors
use to operationalize the reasonable doubt rule. In particular, one might be more
demanding and argue that the implicit threshold for reasonable doubt should require a
greater likelihood than simply being more likely than not to convict.
The choice to use the indifference point was inspired by Item Response Theory
(IRT) (de Ayala, 2009). IRT is used by psychometricians for evaluating and developing
test items. It is commonplace in IRT to use the indifference point for defining the
difficulty of a given item (DeVellis, 2003). That is, the difficulty level associated with the
point at which a person is equally likely to pass the item as she is to fail the item. Note
that difficultly level in IRT is synonymous with the subjective probability of guilt in the
present studies. Of course, this too is a subjective decision, but it makes conceptual sense
since it indicates the point above which people are more likely to pass the item and below
which people are more likely to fail the item. No other point possesses this characteristic.
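To state the analogy in symbols (a sketch of the logic, not a quotation of the dissertation’s own notation), suppose the probability of a conviction vote is modeled as a logistic function of the subjective probability of guilt p:

$$P(\text{convict} \mid p) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 p)}}.$$

The indifference point is then the value p* at which this probability equals one half,

$$p^{*} = -\frac{\beta_0}{\beta_1},$$

which plays the same role as the difficulty parameter of an item characteristic curve in IRT.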
Disputing the choice of the indifference point is to some extent a red herring. The
purpose of the reported studies was not to evaluate whether mock jurors’ implicit
thresholds were sufficiently stringent. The choice of a particular point would affect the
implicit threshold estimate, but this is beside the point of the present studies. The purpose
was to test whether the implicit thresholds varied in a systematic fashion. This does not
require devotion to any particular point, and any implicit threshold estimate, whether
higher or lower than that associated with the indifference point, would suffice to
demonstrate threshold shifting. In sum, it can be argued that the indifference point is an
inappropriate estimate of the implicit threshold, but the purpose of the present study was
simply to demonstrate that the implicit threshold—however it is defined—varies in a
non-normative manner.
Another potential criticism of the methodology is that it does not completely
overcome the issues that motivated its use. Recall that previous threshold research has
relied on direct elicitation methods to quantify participants’ decision thresholds. I argued
that these methods are questionable because of the potential discordance between the
purported threshold and the de facto threshold, and that this discordance could be the
result of poor introspection. In other words, people generally lack the capacity to
accurately introspect what their internal criterion is. The implicit threshold methodology
does not require introspection about the threshold; however, the methodology is not
completely free of introspection. Participants did have to make likelihood judgments,
which do require some introspection. Thus, one could argue that the methodology simply
shifts the problem from introspecting a threshold to introspecting a probability.
This criticism is fair and it illustrates the fundamental issue in examining
thresholds. Because there is no clairvoyant methodology currently available, there must
be some sort of direct elicitation, which of course requires introspection. The critical
issue is what should be introspected: the threshold or the probability? Though it is an
empirical question, it seems likely that, given their greater familiarity with probability,
people might have better introspective access to it. Probability is ubiquitous in everyday life,
from the weather forecast to political polls, etc. Thus, people may be more sensitive to
changes in probabilities because it is a concept with which they have some experience. It
is far less common for people to think about decision thresholds. As a result, people may
be less sensitive to changes in their internal criterion, and may in fact not even
notice that their threshold is shifting. Future research could examine whether threshold
shifting is apparent when the thresholds are directly elicited. It might also examine
whether directly eliciting thresholds has any effect on the implicit thresholds. Further
research could also eschew probability elicitation by having participants make judgments
on a 1-10 point Likert scale with non-numeric labels. These judgments could be
converted to probability for the purpose of calculating the implicit threshold.
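One simple way such a conversion could be done, offered purely as an illustration rather than a procedure used in these studies, is a linear mapping of the scale endpoints onto 0 and 1:

def likert_to_probability(rating, low=1, high=10):
    """Map a 1-10 Likert judgment of guilt linearly onto [0, 1]."""
    return (rating - low) / (high - low)

# e.g., a rating of 7 maps to (7 - 1) / (10 - 1), or about .67
print(likert_to_probability(7))

Because the claim at issue is whether the implicit threshold moves rather than where it sits, the particular mapping chosen would presumably matter less than its being monotonic.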
A final caveat concerns the quality of the data. Recruiting participants through
Mechanical Turk is a relatively new procedure, and there are only a limited number of
studies on the population of Turk workers (Paolacci, Chandler, & Ipeirotis, 2010). While
these studies generally indicate that Turk workers are quite similar to, and behave similarly
to, participants from other subject pools (Mason & Suri, 2010), this does not guarantee the
similarity of the participants in the present studies. Moreover, although every attempt was
made to ensure that participants were US jury-eligible citizens, it is certainly possible that
some non-eligible participants slipped through the cracks. However, it is not clear that
these participants would have systematically skewed the results.
Approximately 27% of participants who initiated the study were removed for
either failing a reading comprehension/attention check question or for having an invalid
IP address. This is less than the rate (35%) of USC subject pool participants who were
removed during pilot testing for failing a reading comprehension/attention check
question. The elimination of participants who failed to accurately answer the reading
comprehension question could also raise concerns about a potential selection bias.
Selection bias (attrition, in particular) could skew the results of a study if self-selecting
participants systematically differ from non-self-selecting participants. This could in turn
obfuscate whether the observed effect is the result of the sampling method or the
phenomenon under study.
Without denying the possibility, there is no reason to believe that a selection bias
would have materially affected the present findings. The participants in the present
studies were clearly not a random or representative sample; the degree to which the
findings generalize beyond Amazon Turk workers is an empirical question that is not
addressed in this dissertation. Selection bias is thus not at issue with respect to external
validity. The threat to internal validity that is caused by selection bias largely depends on
the reason for self-selection. For instance, if participants drop out of a medical study
because the treatment is not “working,” the resulting sample would be biased towards
participants for whom the treatment was effective. But there is no theoretical reason to
believe that participants in the present studies self-selected for a reason systematically
related to the phenomenon of threshold shifting. So far as one can tell, participants failed
the reading comprehension question because they were financially motivated to complete
studies as expeditiously and effortlessly as possible, a motivation that appears unrelated
to the phenomenon at hand. Unfortunately, the approach used in the present studies
eliminated participants before demographic information was collected. As a result,
information about the removed participants is unknowable. Future research could collect
demographic information before eliminating participants, and examine correlates of the
eliminated participants.
Regardless of why these participants failed such a straightforward question, the
inevitable dilemma must be faced: what ought to be done with participants whose
responses cannot be trusted? Data imputation/surrogate methods seem inappropriate
because there is no reason to trust any of the responses from these participants. If one is
chiefly concerned with internal validity, as is the case with experimental research, then it
seems appropriate not to retain spurious data. This is decidedly a value judgment and it is
open to disagreement.
Concluding Remarks
In a highly authoritative book on juror decision making, Joseph Kadane, an
eminent legal statistician, wrote a chapter entitled Sausages and the Law. Playing on the
aphorism that people who enjoy sausages should never watch them being made, Kadane
(1993) pondered whether “those who love the institution of jury decision making should
not also avoid watching it being done” (p. 229). In a way, threshold shifting confirms
Kadane’s suspicion. It reveals the inability of jurors to abide by normative standards. As
a result, it might call into question whether jurors should be vested with the responsibility
to make such important decisions. But important differences exist between sausages and
the law on how to react to this revelation. The legal institution cannot simply change its
diet. The concept of the jury is so deeply entrenched within American culture that any
call for its eschewal is an exercise in futility. Improvement in the institution of jury
decision making is possible, but it requires awareness of limitations and shortcomings.
The purpose of the present research is to expose one shortcoming with the hope that such
awareness will improve the quality of the sausages that nourish the criminal justice
system.
Cases
Cage v. Louisiana, 498 U.S. 392 (1990)
In re Winship, 397 U.S. 358 (1970)
McCullough v. State, 657 P.2d 1157 (1983)
People v. Ibarra, No. H021123 (Cal. Ct. App. 2001)
State v. Skipper, 228 Conn. 610, 637 A.2d 1101 (1994)
Taylor v. Kentucky, 436 U.S. 478 (1978)
United States v. Glass, 846 F.2d 386 (1988)
United States v. Lawson, 707 F.2d 433 (1974)
State v. Casey, No. 405738 (Ohio Ct. App. 1994)
Victor v. Nebraska, 511 U.S. 132 (1994)
Old Chief v. United States, 519 U.S. 172 (1997)
Sandoval v. California, 114 S. Ct. 1239 (1994)
People v. Collins, 68 Cal. 2d 319, 438 P.2d 33 (1968)
Bibliography
Allen, R.J., & Pardo, M.S. (2007). The problematic value of mathematical models of
evidence. Journal of Legal Studies, 36, 107-140.
Allen, R.J., & Laudan, L. (2008). Deadly dilemmas. Texas Tech Law Review, 41, 65-92.
Allen, R.J. (1997). Rationality, algorithms and juridical proof: a preliminary inquiry. The
International Journal of Evidence & Proof, 1, 254-275.
Arkes, H.R., & Mellers, B.A. (2002). Do juries meet our expectations? Law and Human
Behavior, 26(6), 625-639.
Back, K.W. (1968). Equilibrium as motivation: Between pleasure and enjoyment. In R. P.
Abelson, E. Aronson, W. J. McGuire, T.M. Newcomb, M. J. Rosenberg, & P. H.
Tannenbaum (Eds.), Theories of cognitive consistency: A sourcebook (pp. 311-
318). Chicago: Rand McNally.
Baron, J. (2000) Thinking and Deciding (3rd ed.). New York: Cambridge University Press
Bernstein, B. H. (1999) The ecological validity of jury simulations: Is the jury still out?
Law and Human Behavior, 75, 49-63.
Bond, S.D., Carlson, K.A., Meloy, M.G., Russo, J.E., & Tanner, R.J. (2007). Information
distortion in the evaluation of a single option. Organizational Behavior and
Human Decision Processes, 102, 240-254.
Bolstad, W.M. (2007) Introduction to Bayesian Statistics. New Jersey: John Wiley &
Sons.
Boyle, P.J., Hanlon, D., & Russo, E.J. (2011) The value of task conflict to group
decisions. Journal of Behavioral Decision Making.
Brilmayer, L. & Kornhauser, L. (1978). Review: Quantitative methods and legal
decisions. The University Of Chicago Law Review, 46, 116-140.
Brownstein, A.L. (2003). Biased predecision processing. Psychological Bulletin, 129,
545-568.
Brewer, N., & Williams, K.D. (2005) Psychology and Law: An Empirical Perspective.
New York: Guilford Press.
Bugliosi, V. (1996) Outrage: The Five Reasons Why OJ Simpson Got Away With
Murder. New York: Norton & Company.
Bugliosi, V. (2006) Beyond a reasonable doubt? In L. King (ed.) Beyond a reasonable
doubt (pp. 15-21). Phoenix Books: Los Angeles
Bright, D.A., & Williams, K.D. (2001) When emotion takes control of jury verdicts.
Special Issue on Crime, Punishment, and Incarceration, 8, 68-70.
Carlson, K.A., & Russo, J.E. (2001). Biased interpretation of evidence by mock jurors.
Journal of Experimental Psychology: Applied, 7(2), 91-103.
Carlson, K.A., Meloy, M.G., & Russo, J.E. (2006) Leader-driven primacy: Using
attribute order to affect consumer choice. Journal of Consumer Research, 32,
513-518.
Callen, C.R. (1982) Notes on a grand illusion: some limits on the use of Bayesian theory
in evidence law. Indiana law Journal, 57, 1-44.
Cohen, J.N. (1995). The reasonable doubt jury instruction: Giving meaning to a critical
concept. American Journal of Criminal Law, 22, 677-701.
Cohen, L.J. (1977). The Probable and the Provable. Oxford: Claredon Press
Coombs, C.H., Dawes, R.M., & Tversky, A. (1970) Mathematical Psychology. New
Jersey: Prentice Hall
Connolly, T. (1987). Decision theory, reasonable doubt, and the utility of erroneous
acquittals. Law and Human Behavior, 11(2), 101-112.
Dane, F.C. (1985). In search of reasonable doubt. Law and Human Behavior, 9(2), 141-
158.
Dane, F.C. & Wrightsman, L. (1982) Effects of defendants’ and victims’ characteristics
on jurors’ verdicts. In Brey, N.L., & Kerr, R.M. (Eds) The Psychology of the
Courtroom. San Diego, CA: Academic Press
Davies, H. (1995) Simpson Acquittal. Daily Telegraph, (October 25, 1995, D1).
De Ayala, R.J. (2009) The theory and practice of Item Response Theory. New York:
Guilford Press
Dekay M.L. (1996). The difference between Blackstone-like error ratios and probabilistic
standards of proof. Law and Social Inquiry, 21, 95-132.
Dekay, M.L., Patino-Echeverri, D., & Fischbeck, P.S. (2009) Distortion of probability
and outcome information in risky decisions. Organizational Behavioral and
Human Decision Processes, 109, 79-92.
Dekay, M.L., Stone, E.R., & Sorenson, C.M. (2011) Sizing up information distortion:
Quantifying its effect on the subjective values of choice option. Psychonomic
Bulletin Review.
Dershowitz, A. (1996) Reasonable Doubts. New York: Touchstone.
Dershowitz, A. (2006) When are doubts reasonable? In L. King (ed.) Beyond a
reasonable doubt (pp. 22-25). Phoenix Books: Los Angeles
Devellis, R.F. (2003) Scale Development: Theory and Applications. Thousand Oaks, CA:
Sage Publications.
Devine, D.J., Clayton, L.D., Dunford, B.B., Seying, R., & Pryce, J. (2001). Jury decision
making: 45 years of empirical research on deliberating groups. Psychology, Public
Policy, and Law, 7(3), 622-727.
Devine, P. G., & Ostrom, T. M. (1985) Cognitive mediation of inconsistency discounting.
Journal of Personality and Social Psychology, 49, 5-26.
Douglas, K., Lyon, D., & Ogloff, J. (1997). The impact of graphic photographic evidence
on mock jurors' decisions in a murder trial: Probative or prejudicial? Law and
Human Behavior, 21, 485-501.
Diamond, H.A. (1990). Reasonable doubt: to define, or not to define. Columbia Law
Review, 90(6), 1716-1736.
Dworkin, R. (1977) Taking Rights Seriously. Boston, MA: Harvard University Press
Edwards, H.T., (1995) To err is human, but not always harmless: When should legal error
be tolerated? New York University Law Review, 70, 1167-1189.
Edwards, W. (1992) Influence diagrams, Bayesian imperialism, and the Collins case: An
appeal to reason. Cardozo law Review, 13, 1025- 1074.
Edwards, W., Lindman, H., & Savage, L.J. (1954) Bayesian statistical inference for
psychological research. Psychological Review, 70(3), 193-242.
Ellis, A. (2010) Are cell phones dangerous?
(http://au.lifestyle.yahoo.com/womens-health/article//8495830/are-mobile-
phones-dangerous/)
English, P.W., & Sales, B.D. (2005) More Than the Law: Behavioral and Social Facts in
Legal Decision Making. Washington D.C.: APA
Fairchild, H.H., & Cowan, G. (1997) The OJ Simpson trial: Challenges to science and
society. Journal of Social Issues, 53(3), 583-591.
Faigman, D.L., & Baglioni, A.J. (1988). Bayes’ theorem in the trial process: instructing
jurors on the value of statistical evidence. Law and Human Behavior, 12(1), 1-17.
Festinger, L. (1957). A Theory of Cognitive Dissonance. Evanston, IL: Row, Peterson.
Festinger, L. (Ed) (1964) Conflict, Decision, and Dissonance. Stanford, CA: Stanford
University Press.
Fienberg, S.E. (1986). Comment: misunderstanding, beyond a reasonable doubt. Boston
University Law Review, 66, 651-656.
Fienberg, S.E., & Schervish, M.J. (1986) The relevance of Bayesian inference for the
presentation of statistical evidence and for legal decision making. Boston
University Law Review, 66, 771-798.
Finkelstein, M. & Fairley, W. (1970). A Bayesian approach to identification evidence.
Harvard Law Review, 83, 489-510.
Finkelstein, M. (1978). Quantitative Methods in Law. Free Press: New Jersey.
Fortunato, S.J. (1996) Instructing on reasonable doubt after Victor v. Nebraska: A trial
judge’s certain thoughts on certainty. Villanova Law Review, 41, 365-431.
Faigman, D.L., & Baglioni, A.J. (1988) Bayes’ theorem in the trial process: Instructing
jurors on the value of statistical evidence. Law & Human Behavior, 12, 1-17.
Franklin, J. (2006). Case comment – United States v. Copeland, 369 f. supp. 2d 275
(e.d.n.y.2005): quantification of the ‘proof beyond reasonable doubt’ standard.
Law, Probability and Risk, 5, 159-165.
Friedman, R.D. (1992). Infinite strands, infinitesimally thin: Storytelling, Bayesianism,
hearsay and other evidence. Cardozo Law Review, 14, 79-101.
Friedman, R.D. (1995) Probability and Proof in State v Skipper: An internet exchange.
Jurimetrics Journal, 35, 277-310.
Friedman, R.D. (1997). Answering the Bayesioskeptical challenge. The International
Journal of Evidence & Proof, 1, 276-291.
Friedman, R.D. (2000). A presumption of innocence, not of even odds. Stanford Law
Review, 52, 872-887.
Glynn, K. (2000) Tabloid Culture. North Carolina: Duke University Press
Grove, W.M., & Meehl, P.E. (1996) Comparative efficiency of informal (subjective,
impressionistic) and formal (mechanical, algorithmic) prediction procedures: The
clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.
Hastie, R. (1993) Algebraic models of juror decision research. In Hastie (Ed.) Inside the
Juror. New York: Cambridge University Press.
Hester, R.K., & Smith, R.E. (1975). Effects of a mandatory death penalty on the
decisions of simulated jurors as a function of heinousness of the crime. Journal of
Criminal Justice, 1, 319-326.
Hoffman, M.B. (2007) The myth of factual innocence. Chicago-Kent Law Review, 82,
663-690.
Holyoak, K.J., & Simon D. (1999). Bidirectional reasoning in decision making by
constraint satisfaction. Journal of Experimental Psychology: General, 128(1), 3-
31.
Horowitz, I.A. (1997). Reasonable doubt instructions: commonsense justice and standard
of proof. Psychology, Public Policy, and Law, 3(2), 285-302.
Horowitz, I.A., & Kirkpatrick, L.C. (1996). A concept in search of a definition: the
effects of reasonable doubt instructions on certainty of guilt standards and jury
verdicts. Law and Human Behavior, 20(6), 655-670.
Horowitz, I.A., & Seguin, D.G. (1986). The effects of bifurcation and death qualification
on assignment of penalty in capital crimes. Journal of Applied Social Psychology,
16(2), 165-185.
Jaffee, L.R. (1988). Prior probability – a black hole in the mathematician’s view of the
sufficiency and weight of evidence. Cardozo Law Review, 9, 967-1011.
Kadane, J.B. (1993). Sausages and the law: Juror decisions in the much larger justice
system. In R. Hastie (Ed.), Inside the juror (pp. 65-83). Cambridge, UK: Cambridge
University Press.
Kalven, H., & Zeisel, H. (1966). The American Jury. Boston, MA: Little, Brown and
Company.
Kaplan, J. (1968). Decision theory and the factfinding process. Stanford Law Review, 20,
1065-1092.
Kaplan, M.F., & Krupa, S. (1986). Severe penalties under the control of others can
reduce guilty verdicts. Law and Psychology Review, 10, 1-18.
Kassin, S.M., & Wrightsman, L.S. (1985) The Psychology of Evidence and Trial
Procedure. Beverly Hills: Sage Publications
Kassin, S.M. (1983) Deposition testimony and the surrogate witness: Evidence for a
“messenger effect” in persuasion. Journal of Social and Personality Psychology,
9, 281-288.
Kaye, D.H. (1999). Clarifying the burden of persuasion: What Bayesian decision rules do
and do not do. The International Journal of Evidence & Proof, 3, 1-28.
Kaye, D.H. (2002). Statistical decision theory and the burdens of persuasion:
Completeness, generality and utility. The International Journal of Evidence &
Proof, 1, 313-315.
Kaye, D. (1979). The laws of probability and the law of the land. The University of
Chicago Law Review, 47, 34-56
Kaye, D. H. (1980) Naked Statistical Evidence. Yale Law Journal, 89, 601-623.
Kaye, D.H., & Balding, D.J. (1995) Probability and Proof in State v Skipper: An internet
exchange. Jurimetrics Journal, 35, 277-310.
Kaye, D.H., & Koehler, J.J. (1991) Can jurors understand probabilistic evidence? Journal
of the Royal Statistical Society Association, 154, 75-81.
Kaye, D.H. (2009) Rounding up the usual suspects: A legal and logical analysis of DNA
trawling cases. North Carolina Law Review, 87(2), 1-90.
Kagehiro, D.K., & Stanton, W.C. (1985). Legal vs. quantified definitions of standards of
proof. Law and Human Behavior, 9(2), 159-178.
Kagehiro, D.K. (1990). Defining the standard of proof in jury instructions. Psychological
Science, 1(3), 194-200.
Kellogg, R.T. (2007) The fundamentals of cognitive psychology. Sage Publications:
New York
Kemeny, J.G. (1959) A philosopher looks at science. Van Nostrand Company: New York
Kerr, N.L., Atkin, R.S., Stasser, G., Meek, D., Holt, R.W., & Davis, J.H. (1976). Guilt
beyond a reasonable doubt: effects of concept definition and assigned decision
rule on the judgments of mock jurors. Journal of Personality and Social
Psychology, 34(2), 282-294.
Kerr, N.L. (1978a). Beautiful and blameless: effects of victim attractiveness and
responsibility on mock jurors’ verdicts. Personality and Social Psychology
Bulletin, 4, 479-482.
Kerr, N.L. (1978b). Severity of prescribed penalty and mock jurors’ verdicts. Journal of
Personality and Social Psychology, 36(12), 1431-1442.
King, L. (2006) Beyond a reasonable doubt. Phoenix Books: Los Angeles
Koehler, J.J., & Shaviro, D.N. (1990). Veridical verdicts: Increasing verdict accuracy
through the use of overtly probabilistic evidence and methods. Cornell Law
Review, 75, 247-279.
Koehler, J.J. (1991). The probity/policy distinction in the statistical evidence debate.
Tulane Law Review, 66, 141-150.
Koehler, J.J. (2002) When do courts think base rate statistics are relevant? Jurimetrics
Journal, 42, 373-402.
Koehler, J.J. (2001) The psychology of numbers in the courtroom: How to make DNA-
match statistics seem impressive or insufficient. Southern California Law
Review, 74, 1275-1305.
Kornstein, D. (1976). A Bayesian model of harmless error. The Journal of Legal Studies,
5, 121-150.
Kramer, G., & Koenig, D.M. (1990) Do jurors understand criminal jury instructions?
Analyzing the results of the Michigan Juror Comprehension Project. University
of Michigan Journal of Law Reform, 23, 401-443.
Kunda, Z. (1990) The case for motivated reasoning. Psychological Bulletin, 108, 480-
498.
Landes, W.M., & Posner, R.A. (2001) Harmless error. Journal of Legal Studies, 30, 161-
184.
Laudan, L., & Saunders, H.D. (2009). Re-thinking the criminal standard of proof: seeking
consensus about the utilities of trial outcomes. International Commentary on
Evidence, 7(2), 1-34.
Laudan, L. (2003). Is reasonable doubt reasonable? Legal Theory, 9, 295-331.
Lawson, G. (1992) Proving the law. Northwestern University Law Review, 86, 859-903.
Lempert, R. (1986). The new evidence scholarship: analyzing the process of proof.
Boston University Law Review, 66, 439-477.
Lempert, R.O. (1977). Modeling relevance. Michigan Law Review, 75, 1021-1057.
Levine, M. (1998). Do standards of proof affect decision making in child protection
investigations? Law and Human Behavior, 22(3), 341-348.
Levy, A.G., & Hershey, J.C. (2008). Value-induced bias in medical decision making.
Medical Decision Making, 28, 269-276.
Lillquist, E. (2002). Recasting reasonable doubt: decision theory and the virtues of
variability. University of California, Davis Law Review, 36, 85-196.
Lillquist, E. (2004). The puzzling return of jury sentencing: misgivings about apprendi.
North Carolina Law Review, 82(621), 635-636.
Lillquist, E. (2008). Balancing errors in the criminal justice system. Texas Tech Law
Review, 41, 175-185.
Lillquist, E. (2005). Absolute certainty and the death penalty. American Criminal Law
Review, 42, 45-91.
Lindley, D.V. (1970). Bayesian Statistics, A Review. Bristol, England: Arrowsmith
Martin, A.W., & Schum, D.A. (1987). Quantifying burdens of proof: a likelihood ratio
approach. Jurimetrics Journal, 27, 383-402.
Mason, W., & Suri, S. (2010) A guide to conducting behavioral research on Amazon’s
Mechanical Turk.
McCauliff, C.M.A. (1982). Burdens of proof: degrees of belief, quanta of evidence, or
constitutional guarantees? Vanderbilt Law Review, 35, 1293-1335.
McFall, R.M., & Treat, T.A (1995) Quantifying the information value of clinical
assessment with signal detection theory. Annual Review of Psychology, 50, 215-
241.
Mazzella, R., & Feingold A. (1994). The effects of physical attractiveness, race,
socioeconomic status, and gender of defendants and victims on judgments of
mock jurors: a meta-analysis. Journal of Applied Social Psychology, 24(15),
1315-1344.
Milianich, P. G. (1981) Decision theory and standards of proof. Law and Human
Behavior, 5, 87-96.
Mitchell, G. (1994) Against "overwhelming" appellate activism: Constraining harmless
error review. California Law Review, 82, 1335-1365.
Montgomery, J.W. (1998). The criminal standard of proof. New Law Journal, 148(37),
582-589.
Moskowitz, G.B. (2005) Social cognition. New York: The Guildford Press
Morano, A.A. (1975) A re-examination of the development of the reasonable doubt rule.
Boston University Law Review, 55, 507-562.
Mueller, C.B., & Kirkpatrick, L. (2003) Evidence. New York: Aspen
Nagel, S. (1979). Bringing the values of jurors in line with the law. Judicature, 63, 189-
195.
Nagel, S., Lamb, D., & Neff, M. (1981) Decision theory and juror decision-making. In
Sales, B.D. (ed.) The Trial Process. New York: Plenum.
Niedermeier, K. E., Kerr, N. L., & Messe, L. A. (1999) Jurors’ use of naked statistical
evidence: Exploring bases and implications of the Wells Effect. Journal of
Personality & Social Psychology, 73, 533-564.
Nesson, C.R. (1979) Reasonable doubt and permissible inferences: The value of
complexity. Harvard Law Review, 92, 1187-1225.
Nesson, C.R. (1985) The evidence or the event? On judicial proof and the acceptability of
verdicts, Harvard Law Review, 98, 1357-1393.
Newman, J.O. (2007). Quantifying the standard of proof beyond a reasonable doubt: a
comment on three comments. Law, Probability and Risk, 1, 3-7.
Newman, J.O. (1993). Beyond “reasonable doubt”. New York University Law Review,
68(5), 979-1002.
Nickerson, R. S. (1998) Confirmation bias: A ubiquitous phenomenon in many guises.
Review of General Psychology, 2, 175-203.
Nisbett, R.E., & Wilson, T.D. (1977) Telling more than we can know: Verbal reports on
mental processes. Psychological Review, 84, 231-259.
Note (1995) Reasonable doubt: An argument against definition. Harvard Law Review,
108, 1955-1968.
Ogloff, J.R.P. (1991). A comparison of insanity defense standards on juror decision
making. Law and Human Behavior, 15(5), 509-531.
Oppenheimer, D.M., Meyvis, T., & Davidenko, N. (2009) Instructional manipulation
checks: detecting satisficing to increase statistical power. Journal of
Experimental Social Psychology, 45, 867-872.
Parloff, R. (1997) Race and juries: If it ain’t broke….American Lawyer, 19(5), 5-7.
Paolacci, G., Chandler, J., & Ipeirotis, P.G. (2010) Running experiments on Amazon
Mechanical Turk. Judgment and Decision Making, 5, 411-419.
Park, R.C., Tillers, P., Moss, F.C., Risinger, D.M., Kaye, D.H., Allen, R.J., Gross, S.R.,
Hay, B.L., Pardo, M.S., & Kirgis, P.F. (2010) Bayes wars redivivus — an
exchange. International Commentary on Evidence, 8.
Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the story model
for juror decision making. Journal of Personality and Social Psychology, 62(2),
189-206.
Pennington, N., & Hastie, R. (1991). A cognitive theory of juror decision making: the
story model. Cardozo Law Review, 13, 518-557.
Pennington, N., & R. Hastie (1993) The Story Model for Juror Decision Making. In R.
Hastie, ed., Inside the juror: The psychology of juror decision making, pp. 192–
221. New York: Cambridge Univ. Press.
Phillips, F. (2002). The distortion of criteria after decision-making. Organizational
Behavior and Human Decision Processes, 88, 769-784.
Polinsky, M.A., & Shavell, S. (1999) On the disutility and discounting of imprisonment
and the theory of deterrence. The Journal of Legal Studies, 28(1), 1-16.
Posner, R.A. (1998) Economic analysis of law. Aspen: New York
Read, S.J., & Simon, D. (in press) Parallel constraint satisfaction as a mechanism for
cognitive consistency. In B. Gawronski, & F. Strack (eds) Cognitive consistency:
A unifying concept in social psychology. New York: Guilford Press
Redmayne, M. (1999). Standards of proof in civil litigation. The Modern Law Review,
62(2), 167-195.
Risinger, D.M. (1998). John Henry Wigmore, Johnny Lynn Old Chief, and “legitimate
moral force”: Keeping the courtroom safe for heartstrings and gore. Hastings Law
Journal, 49, 403-462.
Russo, J.E., Medvec, V.H., & Meloy, M.G. (1996). The distortion of information during
decisions. Organizational Behavior and Human Decision Processes, 66(1), 102-
110.
Russo, J.E., Meloy, M.G., & Wilks, T.J. (2000). Predecisional distortion of information
by auditors and salespersons. Management Science, 46(1), 13-27.
Russo, J.E., Carlson, K.A., Meloy, M.G., & Yong, K. (2008). The goal of consistency as
a cause of information distortion. Journal of Experimental Psychology: General,
137(3), 456-470.
Saltzburg, S.A. (1973) The harm of harmless error. Virginia Law Review, 98, 1000-1035.
Sand, L.B., & Rose, D.L. (2003) Proof beyond all possible doubt: Is there a need for a
higher burden of proof when the sentence may be death? Chicago-Kent Law
Review, 78, 1359-1376.
Saunders, H.A (2005) Quantifying reasonable doubt: A proposed solution to an equal
protection problem. Bepress Legal Series.
Savage, L.J. (1954) The Foundations of Statistics. New York: John Wiley & Sons.
Saxton, B. (1998) How well do jurors understand jury instructions? A field test using real
jurors and real jury instructions in Wyoming. Land and Water Law Review, 33,
59-190.
Schauer, F. (1993). Slightly guilty. The University of Chicago Legal Forum, 1, 83-100.
Schklar, J., & Diamond, S.S. (1999). Juror reactions to DNA evidence: errors and
expectancies. Law and Human Behavior, 23(2), 159-186.
Schum, D.A. (1979). A review of a case against Blaise Pascal and his heirs. Michigan
Law Review, 77, 446-470.
Schum, D.A. (1992) Hearsay from a layperson. Cardozo Law Review, 14, 1-42.
Schum, D.A., & Martin, A.W. (1982). Formal and empirical research on cascaded
inference in jurisprudence. Law & Society Review, 17(1), 105-151.
Schum, D.A. (1994) Evidential Foundations of Probabilistic Reasoning. New York:
Wiley & Sons.
Schnittker, J., & John, A. (2007) Enduring stigma: The long-term effects of incarceration
on health. Journal of Health and Social Behavior, 48, 115-146.
Scurich, N., & John, R.S. (2011) Trawling genetic databases: When a DNA match is just
a naked statistic. Journal of Empirical Legal Studies, 8(s1), 49-71.
Shapiro, B.J. (1986). “To a moral certainty”: theories of knowledge and Anglo-American
juries 1600-1850. The Hastings Law Journal, 38, 153-193.
Sheppard, S. (2003). The metamorphoses of reasonable doubt: how changes in the burden
of proof have weakened the presumption of innocence. The Notre Dame Law
Review, 78, 1-77.
Shaviro, D. (1989). Commentary: statistical-probability evidence and the appearance of
justice. Harvard Law Review, 103, 530-554.
Simon, D., Snow, C.J., & Read, S.J. (2004). The redux of cognitive consistency theories:
evidence judgments by constraint satisfaction. Journal of Personality and Social
Psychology, 86(6), 814-837.
Simon, D. (2004). A third view of the black box: cognitive coherence in legal decision
making. The University of Chicago Law Review, 71, 511-586.
Simon, D., & Holyoak, K.J. (2002). Structural dynamics of cognition: from consistency
theories to constraint satisfaction. Personality and Social Psychology Review,
6(6), 283-294.
Simon, D., Pham, L.B., Le, Q.A., & Holyoak, K.J. (2001). The emergence of coherence
over the course of decision making. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 27(5), 1250-1260.
Simon, D., Krawczyk, D.C., & Holyoak, K.J. (2003). Construction of preferences by
constraint satisfaction. Psychological Science, 15(5), 331-336.
Simon, R.J., & Mahan, L. (1971). Quantifying burdens of proof: a view from the bench,
the jury, and the classroom. Law and Society Review, 1, 319-330.
Simon, R.J. (1969) Judges’ translation of burdens of proof into statements of probability.
In Kennelly, J.J., & Chapman, J.P. (Eds) The trial lawyer’s guide (p. 103-114).
Illinois: Callaghan.
Solan, L.M. (1999). Refocusing the burden of proof in criminal cases: some doubt about
reasonable doubt. Texas Law Review, 78, 105-147.
Sommers, S.R., & Ellsworth, P.C. (2001) White juror bias: An investigation of prejudice
against Black defendants in the American courtroom. Psychology, Public Policy,
and Law, 7(1), 201-229.
Stacy, T., & Dayton, K. (1988) Rethinking harmless constitutional error. Columbia Law
Review, 88, 91-98.
Stoffelmayr E., & Diamond, S.S. (2000). The conflict between precision and flexibility in
explaining “beyond a reasonable doubt”. Psychology, Public Policy, and Law,
6(3), 769-787.
Strawn, D.U., & Buchanan, R.W. (1976). Jury confusion: a threat to justice. Judicature,
59(10), 478-483.
Stuntz, W.J. (2001) The pathological politics of criminal law. Michigan Law Review,
100, 505-537.
Sundby, S.E. (1989) The reasonable doubt rule and the meaning of innocence. Hastings
Law Journal, 40, 457-510.
Svenson, O. (1992). Differentiation and consolidation theory of human decision making:
a frame of reference for the study of pre- and post-decision processes. Acta
Psychologica, 80, 143–168.
Swets, J.A., & Pickett, R. (1966). Signal Detection Theory and ROC Analysis in
Psychology and Diagnostics. Mahwah, NJ: Erlbaum.
Swets, J.A. (1992) The science of choosing the right decision threshold in high-stakes
diagnostics. American Psychologist, 47, 522-532.
Thompson, W. C. (1989) Are juries competent to evaluate statistical evidence? Law and
Contemporary Problems, 54, 9-28.
Thompson, W.C., & Schumann, E.L. (1987) Interpretation of statistical evidence in
criminal trials: The prosecutor’s fallacy and the defense attorney’s fallacy. Law
and Human Behavior, 11, 167-187.
Tillers, P., & Gottfried, J. (2006). Case comment – United States v. Copeland, 369 F.
Supp.2d 275 (E.D.N.Y. 2005): a collateral attack on the legal maxim that proof
beyond a reasonable doubt is unquantifiable? Law, Probability, and Risk, 1, 315-
331.
Tillers, P., & Green, E.D. (Eds.) (1988) Probability and Inference in the Law of Evidence.
Boston, MA: Kluwer Academic Publishers
Traynor, R.J. (1970) The riddle of harmless error. Ohio: Ohio State University Press
Tribe, L.H. (1971a). Trial by mathematics: precision and ritual in the legal process.
Harvard Law Review, 84, 1329-1393.
Tribe, L.H. (1971b) A further critique of mathematical proof. Harvard Law Review,
84(8), 1810-1820.
Tversky, A. (1967) Additivity, utility, and subjective probability. Journal of
Mathematical Psychology, 4, 175-201.
Twining, W. (1986) Theories of Evidence: Bentham and Wigmore.
Underwood, B.D. (1977). The thumb on the scales of justice: Burdens of persuasion in
criminal cases. The Yale Law Journal, 86, 1299-1348.
Vidmar, N. (1997) Generic prejudice and the presumption of guilt in sex abuse trials. Law
and Human Behavior, 21, 5-25.
Volokh, A. (1997). N-guilty men. University of Pennsylvania Law Review, 146, 173-216.
von Winterfeldt, D., & Edwards, W. (1986) Decision analysis and behavioral research.
New York: Cambridge University Press.
Waldman, T. (1959). Origins of the legal doctrine of reasonable doubt. Journal of the
History of Ideas, 20, 299-316.
Wald, P.M. (1993). Guilty beyond a reasonable doubt: a norm gives way to the numbers.
The University of Chicago Legal Forum, 1, 101-126.
Walters, J. (1998) Gold: The masters. Sports Illustrated, 26.
Wegner, D. M., Schneider, D. J., Carter, S., & White, T. (1987). Paradoxical effects of
thought suppression. Journal of Personality and Social Psychology, 53, 5-13.
Weinstein, J.B., & Dewsbury I. (2006). Comment on the meaning of ‘proof beyond a
reasonable doubt’. Law, Probability and Risk, 5, 167-173.
Weinstein, J.B. (1992) Considering jury ‘nullification’: When may and should a jury
reject the law to do justice. American Criminal Law Review, 29, 239-251.
Wiener, R., Krauss, D., & Lieberman, J. (Eds.) (2011). Differences between college
student and more representative jurors. A special issue of Behavioral Sciences and
the Law, 29, 325-479.
Wiener, R.L., Arnot, L., Winter, R., & Redmond, B. (2006). Generic prejudice in the law:
Sexual assault and homicide. Basic and Applied Social Psychology, 28,145-155.
Wells, G. (1992) Naked statistical evidence of liability: Is subjective probability enough?
Journal of Personality and Social Psychology, 62(5), 739-752.
Wilks, J. (2002). Predecisional distortion of evidence as a consequence of real-time audit
review. The Accounting Review, 77, 51-71.
Wissler, R.L., & Saks, M.J. (1985). On the inefficacy of limiting instructions: When
jurors use prior conviction evidence to decide on guilt. Law and Human Behavior,
9, 37-48
Wright, D.B., & Hall M. (2007). How a “reasonable doubt” instruction affects decisions
of guilt. Basic and Applied Social Psychology, 29(1), 91-98.
Zander, M. (2000). The criminal standard of proof – how sure is sure? New Law Journal,
15(5), 1517-1522.
Appendix [A]: Stimuli from the Conventional Case
[Note that the paragraph breaks correspond to the web-page breaks as seen by
participants]
Robert Martinez has been charged with one count of felony rape.
In California, rape is defined by statute as an act of sexual intercourse accomplished
against a person's will by means of force, violence, duress, menace, or fear of immediate
and unlawful bodily injury on the person or another.
Mr. Martinez has pleaded not guilty to the charge. He claims that he was nowhere near
the crime scene on the evening of the crime.
You are about to read a case summary that was prepared by a court reporter. You should
assume the summary objectively and comprehensively reflects all the facts in the present
case.
After reading the evidence in the case, you will be asked to decide a verdict and to make
some other judgments about the case.
In order to convict the defendant of the alleged crime, you must be confident of his guilt
beyond a reasonable doubt.
The question that naturally comes up is – what is a reasonable doubt? The words almost
define themselves. It is a doubt founded in reason and arising out of the evidence in the
case, or the lack of evidence. It is doubt that a reasonable person has after carefully
weighing all the evidence. Reasonable doubt is a doubt that appeals to your reason, your
judgment, your experience, your common sense. If you do not have an abiding conviction
of the defendant’s guilt, if you have such a doubt as would cause you, as prudent persons,
to hesitate before acting in matters of importance to yourselves, then you have a
reasonable doubt, and in that circumstance it is your duty to acquit the defendant.
You are not expected to have any special legal knowledge.
Here are the facts of the case:
On a Wednesday night in 2003, a 16-year-old Caucasian girl, known as Sara P., was
leaving Starbucks in North Hollywood when a man, speaking a mixture of English and
Spanish, drew a knife and forced her into a Ford pickup truck. He drove her to a rural
area and raped her inside the truck.
The ordeal lasted approximately 45 minutes. The man then drove her back to town and
told her “you better tell nobody.” Sara immediately ran to the nearest business and called
911.
Sara was taken to Presbyterian Hospital where a rape examination kit was administered,
as is state protocol for victims of forcible intercourse. The examination involves the
collection of seminal fluid and the administration of an HIV test.
Sara was able to describe the truck in great detail to detectives. She said it was a red Ford
F-150, it had large chrome rims, and a rosary hanging from the rear-view mirror. It also
had flame decals up the side of it.
Three days later, detectives showed her a truck that had recently been impounded. Sara
positively identified the particular truck as the one in which the rape took place.
Unfortunately, two weeks prior to the incident, the truck had been reported stolen by its
owner, who is of African American descent. The truck yielded no latent fingerprints or
any other evidence.
Sara was also able to give a description of the assailant. She described the man as
Hispanic, about 5’7” and in his late 20s or early 30s. Sara did not recall seeing any
tattoos, but she did remember the man having a large and full mustache. The man also
spoke in a broken dialect, crossing back and forth between English and Spanish.
The prosecution offered the following evidence to prove the guilt of Mr. Martinez:
(1.) Detective Smith from the Los Angeles Police Department testified that there are a
series of businesses with Hispanic employees in the area surrounding where the
abduction had taken place. Detective Smith decided to question all male Hispanic
employees who were within the potential age range of the perpetrator. One of the people
questioned was Robert Martinez, who worked as a cook at a nearby restaurant. Mr.
Martinez was visibly shaken, and indicated he could not speak English. Detective Smith
had Mr. Martinez taken to the police station for questioning with a language translator.
(2.) Mr. Martinez’s boss testified that on the night back in 2003 when the rape took
place, Robert did not show up to work. He called in sick. Martinez’s boss produced a
time-stamp card, which each employee uses to clock-in at work. The card was not
stamped on this particular night.
(3.) Several of Mr. Martinez’s co-workers testified about his physical appearance. All
indicated that he was 5’9”, that he had no tattoos and that he had a mustache. According
to his co-workers, Mr. Martinez was particularly proud of his mustache, as he used to
brag that it was a sign of “machismo.”
(4.) A married couple who used to live in the same apartment complex as Mr. Martinez
also testified. The couple remembered seeing Robert drive a Ford truck for a while. This
was unusual, as Mr. Martinez typically rode a bicycle since he did not have a driver
license. Although they could not be certain, they did recall that the color was dark red,
and that it did have some sort of decal up the side.
(5.) The victim, Sara P., testified. When asked if she could identify the perpetrator of the
crime, she pointed to the defendant, Robert Martinez. She had first made this
identification after looking through mug-shots while still in the hospital. Though she
initially identified someone else, she eventually identified Mr. Martinez. She says she is
“absolutely, 100% sure that it was Robert Martinez who raped me.”
The defense offered the following evidence to refute the prosecution’s allegations:
(6.) Mr. Martinez’s sister testified that Robert was at her home on the night in question.
She stated that Robert had been suffering from food poisoning, and stayed at her house
for about a week. His sister, who is a homemaker and raising two young children, said he
never left the house during the entire week. She also noted that her house is in Sylmar,
which is approximately 30 miles from North Hollywood.
(7.) The defense called an expert on eyewitness identification to testify about the victim's
identification of Robert Martinez. The expert testified that it is known from laboratory
studies and real world simulations that certain factors increase the likelihood of mistaken
identification. Situations in which people are highly nervous or anxious, such as when
being attacked, are very likely to yield unreliable identifications. Cross-racial
identifications, such as when a Caucasian identifies a Black or Hispanic person, are less
reliable than within-race identifications. He noted that both these factors were present in
the current case.
(8.) Robert's brother-in-law testified that he owned a maroon Ford F-150. He stated that
Robert would occasionally borrow it, though he knew Robert did not have a valid driver
license. He stated that Robert was generally a responsible driver. He also speculated that
the neighbors had probably seen Robert driving his truck, not the truck in which the rape
took place, although he admitted that his truck did not have any stickers or decals.
Appendix [B]: Fit Indices of the Logistic Regressions from
Study 1.
Piece of Evidence    Chi-Square*    Nagelkerke R Square
1                    11.71          0.316
2                    18.21          0.445
3                    25.36          0.618
4                    21.15          0.469
5                    36.8           0.615
6                    27.92          0.524
7                    22.54          0.467
8                    18.87          0.402
Note: * all chi-square values p < .01, df = 1.
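For context on how entries like these are typically produced, the following is a minimal sketch in Python (using statsmodels), not the dissertation's own analysis code: it fits a logistic regression of verdict (guilty = 1) on a single simulated predictor and computes a likelihood-ratio chi-square with one degree of freedom together with Nagelkerke's R-square. The variable names and simulated data are illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60                                   # hypothetical number of mock jurors
rating = rng.uniform(0, 100, n)          # hypothetical likelihood-of-guilt rating (0-100)
p_guilty = 1 / (1 + np.exp(-(rating - 50) / 10))
verdict = rng.binomial(1, p_guilty)      # simulated guilty (1) / not guilty (0) verdicts

X = sm.add_constant(rating)              # intercept plus one predictor, so df = 1
fit = sm.Logit(verdict, X).fit(disp=False)

# Likelihood-ratio chi-square: twice the log-likelihood gain over the intercept-only model.
lr_chi2 = 2 * (fit.llf - fit.llnull)

# Cox & Snell R-square, then Nagelkerke's rescaling so the maximum attainable value is 1.
r2_cox_snell = 1 - np.exp((2 / n) * (fit.llnull - fit.llf))
r2_nagelkerke = r2_cox_snell / (1 - np.exp((2 / n) * fit.llnull))

print(f"LR chi-square (df = 1): {lr_chi2:.2f}")
print(f"Nagelkerke R-square:    {r2_nagelkerke:.3f}")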
Appendix [C]: Correlation Matrix of Likelihood Ratings, Verdict
Confidence, and Threshold Disparity for Study 1.
Note: the numbers refer to the particular piece of evidence; ‘A’ signifies the likelihood rating;
‘B’ signifies verdict confidence; ‘C’ signifies threshold disparity; * = p < .01; ** = p < .05; ***
= p < .001
[24 x 24 table of pairwise correlations among variables 1A through 8C, with significance markers as coded in the note above.]
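As a rough illustration of how a matrix of this kind can be assembled, the following is a sketch using simulated data and assumed column names such as '1A', '1B', '1C'; it is not the study's actual data or code.

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
# One column per measure: likelihood rating (A), verdict confidence (B),
# and threshold disparity (C) for each of the eight pieces of evidence.
cols = [f"{item}{measure}" for item in range(1, 9) for measure in "ABC"]
data = pd.DataFrame(rng.normal(size=(60, len(cols))), columns=cols)

r = data.corr()                          # 24 x 24 matrix of pairwise correlations

# Matching matrix of two-sided p-values, used to attach significance markers.
p = pd.DataFrame(np.ones((len(cols), len(cols))), index=cols, columns=cols)
for a in cols:
    for b in cols:
        if a != b:
            _, p_value = stats.pearsonr(data[a], data[b])
            p.loc[a, b] = p_value

print(r.iloc[:3, :3].round(3))           # e.g. the 1A/1B/1C block
print(p.iloc[:3, :3].round(3))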
Appendix [D]: Fit Indices of the Logistic Regressions from Study 2.
Piece of Evidence    Chi-Square*    Nagelkerke R Square
1                    19.44          0.371
2                    22.78          0.283
3                    23.44          0.299
4                    31.94          0.382
5                    21.48          0.285
6                    29.06          0.374
7                    24.12          0.291
8                    26.84          0.317
9                    39.24          0.424
Note: * all chi-square values p < .01, df = 1.
Appendix [E]: Correlation Matrix of Likelihood Ratings, Verdict
Confidence, and Threshold Disparity for Study 2.
Note: the numbers refer to the particular piece of evidence; ‘A’ signifies the likelihood rating;
‘B’ signifies verdict confidence; ‘C’ signifies threshold disparity; * = p < .01; ** = p < .05; ***
= p < .001
Appendix [F]: Fit Indices of the Logistic Regressions from Study 3.
Piece of Evidence    Chi-Square*    Nagelkerke R Square
1                    2.91**         0.269
2                    14.04          0.454
3                    39.24          0.488
4                    34.89          0.443
5                    28.87          0.388
6                    25.35          0.367
7                    21.99          0.373
8                    27.29          0.391
9                    43.23          0.536
10                   47.16          0.594
Note: * all chi-square values p < .01, df = 1; ** p = .08.
Appendix [G]: Correlation Matrix of Likelihood Ratings, Verdict
Confidence, and Threshold Disparity for Study 3.
Note: the numbers refer to the particular piece of evidence; ‘A’ signifies the likelihood rating;
‘B’ signifies verdict confidence; ‘C’ signifies threshold disparity; * = p < .01; ** = p < .05; ***
= p < .001
[30 x 30 table of pairwise correlations among variables 1A through 10C, with significance markers as coded in the note above.]
Abstract
The doctrine of reasonable doubt is deeply entrenched within American culture, but the concept continues to mystify legal scholars, courts and jurors, and a coherent definition remains elusive. Reasonable doubt (RD) can be reified with the tool of decision theory as a tradeoff between acquitting the guilty and convicting the innocent. For instance, Blackstone’s maxim that ten erroneous acquittals are equal in cost to one erroneous conviction implies, roughly, that jurors ought to convict only if their confidence in the defendant’s guilt exceeds 0.91. This dissertation proposes several descriptive discrepancies from this normative account. First, it is argued that the proffered evidence serves as the focal point for RD, such that jurors listen to the evidence and then determine whether the remaining doubt is reasonable. In short, jurors know RD when they see it. Verdicts do not depend on whether the evidence satisfies an exogenous tradeoff. Second, three studies were conducted to test the hypothesis that mock jurors systematically shift their operational definition or threshold for RD in order to promote cognitive consistency, a state in which attitudes, beliefs and cognitions are congruous. Shifting the decision threshold can theoretically attenuate “close calls”—that is, when the evidence is close to the RD threshold—by inflating the perceived disparity between the two. The studies revealed that mock jurors’ implicit threshold regularly shifted in the opposite direction of the proffered evidence. When a piece of evidence increased the likelihood of guilt, it concomitantly decreased the threshold for conviction, and when a piece of evidence decreased the likelihood of guilt, it increased the threshold for conviction. The degree to which the threshold shifted was positively related to decisional confidence, one manifestation of cognitive consistency. Threshold shifting violates an axiom of decision theory, which is that the consequences of an outcome are independent from the chances of its occurrence. This finding has implications for both psychological and legal theory. With respect to the latter, the findings indicate that RD is a relative construct and suggest that the analysis of legal doctrine is more complicated than has been previously supposed.
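For readers who want the arithmetic behind the 0.91 figure, the following is a minimal worked restatement under the standard decision-theoretic formulation; the disutility symbols are illustrative rather than the dissertation's own notation. A juror maximizing expected utility should convict when the probability of guilt $p$ exceeds the threshold

\[
p^{*} \;=\; \frac{D_{\mathrm{wc}}}{D_{\mathrm{wc}} + D_{\mathrm{wa}}},
\]

where $D_{\mathrm{wc}}$ is the disutility of a wrongful conviction and $D_{\mathrm{wa}}$ the disutility of a wrongful acquittal. Blackstone's 10:1 ratio sets $D_{\mathrm{wc}} = 10\,D_{\mathrm{wa}}$, so

\[
p^{*} \;=\; \frac{10}{10 + 1} \;=\; \frac{10}{11} \;\approx\; 0.91 .
\]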