A CRITICAL EXAMINATION OF THE THREE MAIN INTERPRETATIONS OF
PROBABILITY
by
Luigi A. Secchi
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PHILOSOPHY)
August 2010
Copyright 2010 Luigi A. Secchi
Table of Contents

List of Figures
Abstract
Introduction
  1. Interpreting Probability
  2. The Axioms of Probability
  3. A Brief History of the Interpretations of Probability
    3.1. The A Priori Interpretation of Probability
      3.1.1. Early Period: the Classical Interpretation
      3.1.2. Late Period: the Logical Interpretation
    3.2. The Relative Frequency Interpretation of Probability
    3.3. The Subjectivist Interpretation of Probability
  4. The Propensity Interpretation
Chapter 1: the A Priori Interpretation of Probability
  1. Introduction: the Classical Theory of Probability
  2. The Paradox of Inverse Probabilities Explained
  3. Logical Probability
  4. The Problem of Meaning of Probability-Values
  5. The Problem of Additivity of Probability-Values
  6. Probability and Induction
  7. Final Remarks
Chapter 2: the Subjectivist Interpretation of Probability
  1. Introduction: Qualitative Degrees of Belief
  2. The Dutch Book Argument
  3. The Representation Theorems Argument
  4. Remarks on the DBA and the RTA
  5. Bayesian Confirmation Theory
  6. The Problem of “Zero Priors”
  7. “Zero Priors” Again
  8. Remarks on Bayesian Confirmation Theory
Chapter 3: the Relative Frequency Interpretation of Probability
  1. Introduction: the Relative Frequency Interpretation
  2. Infinite Frequentism: a Clarification
  3. Finite or Infinite Frequentism
  4. The Epistemology of Infinite Frequentism
    4.1. The Epistemological Problem of Infinite Frequentism
    4.2. Falsificationism in Statistics
    4.3. Estimation
  5. The Single Case Question
  6. The Reference Class Problem
  7. Probability: a Guide to Life?
  8. An Alternative to Infinite Frequentism
    8.1. Is Infinite Frequentism Tenable?
    8.2. An Alternative to Infinite Frequentism
    8.3. Infinite Frequentism Compared to My Alternative
Conclusion
  1. The Problems of the A Priori and Subjectivist Interpretations
  2. Infinite Frequentism
  3. How My Work Contributes to the Philosophy of Probability
Bibliography
Appendices
  Appendix A
  Appendix B
  Appendix C
List of Figures

Figure 1: Structures in a World with Two Elements and One Predicate
Figure 2: Tosses of a Coin on a Train in Back and Forth Motion
Abstract
The three main interpretations of probability (i.e. of probability-statements) are the
classical/logical, the subjectivist and the relative-frequency. My dissertation is a critical
examination of each of these interpretations.
The core tenet of the classical interpretation is that whenever a set of outcomes of
an event is such that a) no information is available about any of the outcomes or b)
symmetrical information is available about each outcome, then each outcome must have
the same probability. This is usually known as the Principle of Indifference. The logical
interpretation –which descends directly from the classical interpretation– differs from the
classical interpretation in that: i) outcomes of the same event may be assigned unequal
probabilities and ii) probability is a logical relation between two complex propositions e
and k (k is the information relevant to the occurrence of e). I discuss the classical and the
logical interpretations in chapter 1.
Nowadays, the classical interpretation is not held in high regard. The
accepted view is that the principle of indifference, the core of this interpretation, faces, in
certain infinite domains, a devastating paradox. In my dissertation, I contend that this is
not the case. By distinguishing between a weak principle of indifference and a strong
principle of indifference, I show that it’s not at all clear that there is such a paradox.
Thus, the classical interpretation is unproblematic in this respect.
But the classical interpretation is not unproblematic in general. Moreover, the
classical interpretation is affected by the very same two problems that affect the logical
interpretation. First, both the classical and the logical interpretations presuppose
Kolmogorov’s axioms of probability (or an equivalent axiomatization). Without
Kolmogorov’s axioms, an outcome cannot be assigned a unique probability in principle
and it is impossible to establish a probabilistic hierarchy between sets of outcomes (i.e.
either two sets of outcomes have the same probabilities or their probabilities are not
commensurable). The problem is that it is not at all clear how to justify such a
presupposition. Surprisingly, only a few philosophers have appreciated the seriousness of
this issue. The second problem is that neither the classical nor the logical interpretation
can provide an adequate explication of probability. In particular my analysis shows that
any possible explication (within these two interpretations) is either unacceptable because
it is confused or contradictory, or is unfit for the job (because it doesn’t express or relate
to any of the ideas that are usually associated with probability, such as indetermination,
prediction, indecision). This is obviously a major problem, since the raison d’être of the
classical and the logical interpretations is to interpret, i.e. to explicate, probability.
The subjectivist interpretation (subjectivism) identifies probabilities with the
numerically formulated degrees of belief of individual agents. Subjectivism is the object
of chapter 2. A large part of the appeal of this interpretation is due to the alleged fact that
numerical degrees of belief must satisfy the axioms of Probability Theory
(Kolmogorov’s axioms) on pain of irrationality. This (alleged) fact is said to be the
consequence of either of two arguments, the Dutch Book Argument and an argument
from representation theorems, the Representation Theorems Argument. I show (in the
first part of the chapter) that both the Dutch Book Argument and the Representation
Theorems Argument are flawed. This entails that subjectively interpreted probabilities
cannot be proved to satisfy the axioms of probability and with that a large part of the
appeal of subjectivism disappears. The second part of the chapter is devoted to Bayesian
Confirmation Theory (BCT), a subjectivist doctrine according to which one’s (numerical)
degrees of belief should be updated strictly by way of a probabilistic equation called
Bayes’ Theorem. It is fair to say that BCT is the most important doctrine that originates
from the subjectivist interpretation. The main claim (and the whole point) of BCT is that
if certain (allegedly mild) conditions are respected then probabilistic induction is possible
and perfectly justified. I argue that this claim is in fact false. Firstly, I reconstruct
Popper’s famous criticism of BCT qua justification of probabilistic induction. Popper
pointed out that the conditions under which BCT allows probabilistic induction are
anything but mild, and in fact one of these conditions presupposes induction to begin with.
I address several objections that have been made against his argument. Secondly, I
present a new, improved version of Popper’s original argument. Thirdly, I present a novel
argument to the effect that BCT cannot provide a genuine justification of probabilistic
induction.
The relative-frequency interpretation (frequentism) asserts that the probability of
an outcome is its frequency of occurrence (relative frequency). Historically, frequentism
has come in two variants: finite frequentism –a relative frequency is computed on the
basis of a finite sequence of instances (occurrences and non-occurrences) of an outcome–
and infinite frequentism –the sequence of instances of an outcome must be infinite.
However, the vast majority of the proponents of frequentism (frequentists) have endorsed
infinite frequentism. I discuss frequentism in Chapter 3. After clarifying why infinite
frequentism is to be preferred to finite frequentism and after taking care of a few
difficulties related to the former, I discuss what I take to be the central problem of infinite
frequentism: the fact that probability-values can neither be verified nor falsified
(rejected). I show then that this problem may be mitigated by means of an “unorthodox”
infinite acceptance test of Classical Statistics that may be used as a “quasi-falsification”
procedure for infinite frequentism.
Since within infinite frequentism probability-values are not logically falsifiable
and since there are reasons why dealing with infinitely many instances of an outcome is
undesirable, I present a new (as far as I know) form of frequentism. This new version of
frequentism, I argue, has all the advantages of infinite frequentism, but lacks those two
drawbacks.
Lastly, let me briefly anticipate the conclusion I reach in my dissertation. Of the
three main interpretations of Probability Theory, frequentism (either in its classical form
or in the novel variant I present) seems to constitute a viable option; this doesn’t mean of
course that frequentism is unproblematic; it means, rather, that frequentism is the least
problematic among the available interpretations.
Introduction
1. Interpreting Probability
Probability Theory is a well-established mathematical theory and is, from a mathematical
standpoint, unproblematic. From a philosophical standpoint, however, Probability Theory
is not unproblematic. Its main problem, philosophically speaking, is that the meaning of
the generic probability-statement ‘the event X has probability y’ is indefinite: crucial
questions like ‘what does it mean to say that y is the probability of an event?’ and ‘what
does y represent?’ are unanswered within the theory. Thus, probability-statements need to
be interpreted.
The three main interpretations (or types of interpretations) of probability (i.e. of
probability-statements) are the classical/logical, the subjectivist and the relative
frequency. My dissertation is a critical examination of each of these interpretations.
The classical interpretation constitutes the earliest attempt to interpret probability.
The core tenet of this interpretation is that whenever a set of outcomes of an event is such
that a) no information is available about any of the outcomes or b) symmetrical
information is available about each outcome, then each outcome must have the same
probability. This is usually known as the Principle of Indifference. The logical
interpretation –which descends directly from the classical interpretation– differs from the
classical interpretation in that: i) outcomes of the same event may be assigned unequal
probabilities and ii) probability is a logical relation between two complex propositions e
and k (k is the information relevant to the occurrence of e). I discuss the classical and the
logical interpretations in chapter 1.
Nowadays, the classical interpretation is not held in high regard. The
accepted view is that the principle of indifference, the core of this interpretation, faces, in
certain infinite domains, a devastating paradox. In my dissertation, I contend that this is
not the case. By distinguishing between a weak principle of indifference and a strong
principle of indifference, I show that it’s not at all clear that there is such a paradox.
Thus, the classical interpretation is unproblematic in this respect.
But the classical interpretation is not unproblematic in general. Moreover, the
classical interpretation is affected by the very same two problems that affect the logical
interpretation. First, both the classical and the logical interpretations presuppose
Kolmogorov’s axioms of probability (or an equivalent axiomatization). Without
Kolmogorov’s axioms, an outcome cannot be assigned a unique probability in principle
and it is impossible to establish a probabilistic hierarchy between sets of outcomes (i.e.
either two sets of outcomes have the same probabilities or their probabilities are not
commensurable). The problem is that it is not at all clear how to justify such a
presupposition. Surprisingly, only a few philosophers have appreciated the seriousness of
this issue. The second problem is that neither the classical nor the logical interpretation
can provide an adequate explication of probability. In particular my analysis shows that
any possible explication (within these two interpretations)[1] is either unacceptable because
it is confused or contradictory, or is unfit for the job (because it doesn’t express or relate
to any of the ideas that are usually associated with probability, such as indetermination,
prediction, indecision). This is obviously a major problem, since the raison d’être of the
classical and the logical interpretations is to interpret, i.e. to explicate, probability.

[1] For example, an important explication typically associated with the Logical Interpretation is that a probability is a rational degree of belief.
The subjectivist interpretation (subjectivism) identifies probabilities with the
numerically formulated degrees of belief of individual agents. Subjectivism is the object
of chapter 2. A large part of the appeal of this interpretation is due to the alleged fact that
numerical degrees of belief must satisfy the axioms of Probability Theory
(Kolmogorov’s axioms) on pain of irrationality. This (alleged) fact is said to be the
consequence of either of two arguments, the Dutch Book Argument and an argument
from representation theorems, the Representation Theorems Argument. I show (in the
first part of the chapter) that both the Dutch Book Argument and the Representation
Theorems Argument are flawed. This entails that subjectively interpreted probabilities
cannot be proved to satisfy the axioms of probability and with that a large part of the
appeal of subjectivism disappears. The second part of the chapter is devoted to Bayesian
Confirmation Theory (BCT), a subjectivist doctrine according to which one’s (numerical)
degrees of belief should be updated strictly by way of a probabilistic equation called
Bayes’ Theorem. It is fair to say that BCT is the most important doctrine that originates
from the subjectivist interpretation. The main claim (and the whole point) of BCT is that
if certain (allegedly mild) conditions are respected then probabilistic induction is possible
and perfectly justified. I argue that this claim is in fact false. Firstly, I reconstruct
Popper’s famous criticism of BCT qua justification of probabilistic induction. Popper
pointed out that the conditions under which BCT allows probabilistic induction are
anything but mild, and in fact one of these conditions presupposes induction to begin with.
I address several objections that have been made against his argument. Secondly, I
present a new, improved version of Popper’s original argument. Thirdly, I present a novel
argument to the effect that BCT cannot provide a genuine justification of probabilistic
induction.
The relative frequency interpretation (frequentism) asserts that the probability of
an outcome is its frequency of occurrence (relative frequency). Historically, frequentism
has come in two variants: finite frequentism –a relative frequency is computed on the
basis of a finite sequence of instances (occurrences and non-occurrences) of an outcome–
and infinite frequentism –the sequence of instances of an outcome must be infinite.
However, the vast majority of the proponents of frequentism (frequentists) have endorsed
infinite frequentism. I discuss frequentism in Chapter 3. After clarifying why infinite
frequentism is to be preferred to finite frequentism and after taking care of a few
difficulties related to the former, I discuss what I take to be the central problem of infinite
frequentism: the fact that probability-values can neither be verified nor falsified
(rejected). I show then that this problem may be mitigated by means of an “unorthodox”
infinite acceptance test of Classical Statistics that may be used as a “quasi-falsification”
procedure for infinite frequentism.
Since within infinite frequentism probability-values are not logically falsifiable
and since there are reasons why dealing with infinitely many instances of an outcome is
undesirable, I present a new (as far as I know) form of frequentism. This new version of
frequentism, I argue, has all the advantages of infinite frequentism, but lacks those two
drawbacks.
Lastly, let me briefly anticipate the conclusion I reach in my dissertation. Of the
three main interpretations of Probability Theory, frequentism (either in its classical form
or in the novel variant I present) seems to constitute a viable option; this doesn’t mean of
course that frequentism is unproblematic; it means, rather, that frequentism is the least
problematic among the available interpretations.
2. The Axioms of Probability
The axioms of probability play a crucial role in my work (as well as in the philosophy of
probability), especially in chapter 1 and chapter 2. A discussion of those axioms is
therefore in order.
Probability theory, like any other mathematical theory, is axiomatic; that is, the
truth of any theorem or proposition of the theory depends either directly on the axioms
that are adopted or on simpler theorems, which in turn depend either on those axioms or
on even simpler theorems –and so on. Now, there are two approaches to the
axiomatization of Probability Theory. The first takes conditional probability, the
probability conditional on the occurrence of a specified event, as primitive. The second
considers conditional probability to be merely a derivative notion –a complex notion
based on “atomic” primitive notions. In the present work, the second approach is adopted.
Two simple reasons justify this choice. Firstly, the second approach is preferred by the
majority of philosophers interested in the foundations of Probability Theory and by the
vast majority of mathematicians. Secondly, this approach allows one to adopt a simpler
set of axioms, which in turn allows one to keep the mathematics to a minimum.
Virtually every probability theorist and philosopher of science identifies the second
approach I hinted at in the previous paragraph with the axiomatization originally due to
Kolmogorov. In his (1933), Kolmogorov observed that the following three simple axioms
(now known as Kolmogorov’s axioms) could serve as a basis for all of Probability
Theory: the probability of an outcome must be equal to or greater than zero; the
probability of the disjunction of all the possible outcomes associated with an event is one;
and the probability of the disjunction of any two mutually exclusive outcomes of an event
is the sum of the individual probabilities of those outcomes. Kolmogorov’s Axioms will play
an important role in chapter 1 and a crucial one in chapter 2.
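Stated compactly (my rendering in standard notation, with Ω the set of all possible outcomes of the event; the text above gives the axioms only in prose), the three axioms read:

    \[
      P(A) \ge 0, \qquad P(\Omega) = 1, \qquad
      P(A \cup B) = P(A) + P(B) \quad \text{whenever } A \cap B = \varnothing .
    \]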
3. A Brief History of the Interpretations of Probability
Although the aim of this dissertation is to critically discuss the three main interpretations
of probability from a philosophical point of view, it seems appropriate to briefly sketch
the history of these interpretations. The three sub-sections that follow (§3.1, §3.2, §3.3)
present a succinct historical account of each interpretation.
3.1. The A Priori Interpretation of Probability
3.1.1. Early Period: the Classical Interpretation
The so-called Principle of Indifference (PI hereafter) constitutes –we have seen– the very
core of the classical interpretation of probability. Indeed, it would not be unreasonable to
hold that the classical interpretation boils down to PI. We have also seen that this
fundamental principle prescribes that if there is no evidence or reason favoring one over
another of several mutually exclusive events then these events have the same probability.
James Bernoulli was the first to explicitly state PI –albeit by way of an example.[2]
In the game of dice for instance, the number of possible cases [or throws] is
known, since there are as many throws for each individual die as it has faces;
moreover all these cases are equally likely when each face of the die has the same
form and the weight of the die is uniformly distributed. (There is no reason why
one face should come up more readily than any other, as would happen if the
faces were of different shapes or part of the die were made of heavier material
than the rest.)

[2] James Bernoulli, ‘Ars Conjectandi’, excerpted in Newman (1956: 1452-5). Brackets added by Newman.
The pioneers of the classical interpretation, however, didn’t handle PI with sufficient
rigor. For them, PI was more or less interchangeable with the following principle: if there
is no evidence favoring one over another of n mutually exclusive events, then the
probability of occurrence of any one of m of those events is the ratio m/n. This
“variant” of PI was endorsed by Cardano, Galileo, Pascal and Fermat, the first inquirers
who held (a close enough version of) the classical interpretation (their position was later
systematized in James Bernoulli’s treatise Ars Conjectandi, published in 1713). We shall
see in Chapter 1 that this principle is a stronger version of PI, and that therefore the
former is not equivalent to the latter (we will also see that, amazingly, most of the authors
who have discussed or criticized the classical interpretation failed to make this distinction
clear enough).
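In symbols (my paraphrase of the stronger principle just stated): if E_1, …, E_n are mutually exclusive events about which the evidence is symmetrical, then for any m of them,

    \[
      P(E_{i_1} \lor E_{i_2} \lor \dots \lor E_{i_m}) = \frac{m}{n}.
    \]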
These scientists were not moved mainly by a philosophical interest. Rather, their
aim was to provide a solid ground to their mathematical investigations. Cardano, Pascal
and Fermat were interested in the mathematical aspects of games of chance (e.g. dice
games). Pascal and Fermat, for example, analyzed mathematically several puzzles
concerning games of chance that were presented to them by a nobleman, Chevalier de
Méré. Galileo, on the other hand, was working on a rudimentary theory of errors, which
he needed to analyze the astronomical data he was gathering at the time.
Besides Cardano, Pascal, Fermat and Galileo there are at least two other notable
early proponents of the classical interpretation of probability: Huygens and Laplace. They
too were interested mainly in the mathematical aspects of probability. Huygens first
defined the concept of mathematical expectation. Laplace solved a great number of
fundamental problems and developed many important mathematical tools within
probability theory. The historian of mathematics Todhunter has said of Laplace: ‘on the
whole the Theory of Probability is more indebted to him than to any other
mathematician’ (1949).
Although Cardano, Pascal, Fermat, Galileo, Huygens and Laplace were mostly
concerned with the classical approach to probability, they also took an interest in relative
frequencies. None of them, however, explicitly recognized or stated
that probability may also have to do with relative frequencies. For example, Laplace and
Daniel Bernoulli (the nephew of James Bernoulli) performed a rudimentary statistical
analysis (which presupposed relative frequencies) of smallpox morbidity and mortality to
evaluate the effectiveness of vaccination. But neither Laplace nor Bernoulli called their
analysis “probabilistic” or stated that one could think of relative frequencies as
probabilities.
Most of the pioneers of probability theory (i.e. of the mathematical theory of
probability) were also pioneers and proponents of the classical interpretation of
probability. But why were the early probability theorists so focused on the classical
interpretation? The early probability theorists were not pure mathematicians. They were
all-around scientists. And the scientific climate was, at that time, heavily influenced by
the then recent Newtonian revolution in physics and astronomy. In particular, as a
consequence of Newton’s theory of mechanics, science viewed natural phenomena as
purely deterministic: given the present state of a physical system, it is in principle
possible –assuming all the relevant physical laws are known– to predict every future state
of that system. As a consequence, scientists could only conceive probability as expressing
the lack of knowledge about an event or phenomenon. For if the outcome of an event is
already predetermined, then the probability associated with that outcome can only represent
one’s incomplete knowledge about that event. Classically interpreted probability fits this
condition quite well: recall that the main condition to apply PI to the outcomes of an
event is that one’s lack of knowledge about each outcome is the same.
Nowadays, the classical interpretation is not considered a legitimate or “viable”
interpretation of probability: as I anticipated in §1, the classical interpretation is widely
believed to be affected by a “fatal” paradox that arises in certain infinite domains.
3.1.2. Late Period: the Logical Interpretation
The logical interpretation of probability was established at the beginning of the twentieth
century, mainly thanks to John Maynard Keynes’s work A Treatise on Probability,
published in England in 1921. Keynes, who later became the most prominent economist
of his generation, began as a philosopher of science, and his main philosophical interest
was precisely the philosophy of probability. Keynes’ interpretation of probability evolves
from the classical interpretation, and, therefore, has similarities with the latter. Like the
classical interpretation, Keynes’ interpretation rests on PI. However, for Keynes PI must
be amended in the following way: PI can be applied to a set of alternatives only if these
alternatives are “atomic”, i.e. only if none of them may be split up into sub-alternatives.
This proviso constitutes a significant improvement of the “old” PI, and it allowed Keynes
to deal with a class of difficulties that the classical theorists couldn’t deal with. A simple
example will make the point clear. An urn contains three balls: a white ball, a red ball and
a green ball. The white ball is smooth while the red and the green balls are uneven. What
is the probability that a ball extracted from the urn is white? And the probability that a
ball extracted is smooth? The classical theorist is committed to answering 1/3 to the first
question (there are three possible colors) and 1/2 to the second (there are two possible
types of surface). But both questions ask what is the probability that the very same ball,
that which is both white and smooth, is extracted! Keynes’ emended version of PI, on the
other hand, does not run into this difficulty. One cannot apply Keynes’ version of PI to
the two cases ‘smooth’ ‘uneven’, since ‘uneven’ may be split up into the two sub-cases
‘red’ ‘green’. Therefore the probability that a ball is smooth must be computed indirectly
as the probability that a ball is white. Both questions have the same answer.
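The computation just described can be set out explicitly (my rendering of the example): PI applies only to the atomic alternatives white, red and green, and ‘smooth’ and ‘uneven’ inherit their probabilities from those:

    \[
      P(\text{white}) = P(\text{red}) = P(\text{green}) = \tfrac{1}{3}, \qquad
      P(\text{smooth}) = P(\text{white}) = \tfrac{1}{3}, \qquad
      P(\text{uneven}) = P(\text{red}) + P(\text{green}) = \tfrac{2}{3}.
    \]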
The second novel aspect of Keynes’ approach is its emphasis on the relational
nature of probability. According to Keynes, it is a mistake to talk about the probability of
an event tout court; instead one should always talk about the probability of an event
relative to the evidence or knowledge about that event. Consider the urn example again.
The probability of extracting a red ball from the urn is 1/3. But this probability is not
absolute; it’s relative to the available knowledge about the urn, i.e. to the fact that we
know that there are a red, a white and a green ball in the urn.
The third and, from a philosophical standpoint, perhaps most interesting element
of Keynes’ approach is the attempt to define probability, i.e. to answer the question ‘what
does it mean to say that a certain event has probability p (relative to certain evidence)?’.
According to Keynes, probability is a logical relation, and to say that an event E has,
relative to certain evidence K, probability p is to say, roughly speaking, that the
sentence ‘K is known’ logically implies the sentence ‘E will occur’ to degree p. In order
to make his definition intuitively clearer, Keynes adds that a sure event is always implied
(by any knowledge K) to degree one, whereas an impossible event is always implied to
degree zero; e.g. the sentence ‘tomorrow either it will rain or it will not’ is logically
implied by any other sentence to degree one. In chapter 1, we shall see that Keynes’
attempt at defining probability –as well as any other attempt that has been put forward so
far, as I anticipated in §1– fails.
Following Keynes, many philosophers and philosophically minded
mathematicians endorsed and defended the logical interpretation of probability, adding
their own contributions and corrections to Keynes’ original work. The first (historically
speaking) and most famous among them is certainly Rudolf Carnap, the most prominent
member of the famous Vienna Circle. Here I shall briefly discuss Carnap’s view.
Carnap’s main interests were the philosophy and the methodology of science, and
his work in the philosophy of probability was essentially a byproduct of those interests.
The main reason that led Carnap to work on an a priori theory of probability was to
provide a framework for the inductive support of scientific theories. In the eyes of
Carnap, a priori probability, when properly understood, had an important feature: when a
scientific theory is confirmed by appropriate empirical observations, the probability of
that theory increases. A priori probability seemed to Carnap a golden path to the
inductive confirmation of the laws of nature. I will argue in Chapter 1 that Carnap was
mistaken.
Probabilistic induction aside, Carnap’s analysis of the logical interpretation
follows rather closely Keynes’ pioneering work. Presently –for reasons that are not
entirely clear to me– Carnap’s approach is still pursued by a few authors, the best known
of whom is perhaps Jaakko Hintikka. At any rate, it is fair to say that nowadays the
logical interpretation is largely of historical interest for most philosophers of science.
3.2. The Relative Frequency Interpretation of Probability
The relative frequency interpretation of probability was first developed by the English
logician John Venn, in his 1866 treatise The Logic of Chance. Venn wrote under the
influence of the English intellectual climate of his time, which was dominated by the so-
called British Empiricism, a philosophical doctrine that emphasized experience as the
preeminent source of knowledge. In this respect, it’s not especially surprising that the
most empirically grounded interpretation, the frequency interpretation, was born in the
England of the nineteenth century. Venn’s definition of (relative-frequency) probability is
rather naïve and mathematically unsophisticated. According to him, the probability
associated with an event is the frequency of occurrence of that event in the long run. In
defense of Venn, it must be considered that differential calculus was, in his time, at a very
early stage, and that a rigorous notion of mathematical limit was missing. But it is not
only Venn’s definition of probability that is non-mathematical. In general, the character
of his work is more narrative than technical. At any rate, the chief achievement of Venn’s
work is simply that of having established a new framework to understand probability: the
relative frequency interpretation.
The second major figure of the frequentist tradition is the Austrian applied
mathematician Richard von Mises, who offered the first systematic exposition of
frequentism in his 1928 book Probability, Statistics and Truth. While von Mises’ work
has both a foundational and a mathematical component, here we shall only be concerned
with the former. Von Mises begins his analysis by making two observations: first, the
“meaning” of probability is to be sought in science and not in the everyday usage of the
word ‘probability’; second, probability theory is a science, just as physics and chemistry
are. His first observation is especially important. Since only the relative-frequency
approach to probability plays a crucial role in science, von Mises says, it follows that
only frequentism is a valid (or better, perhaps, useful) interpretation of probability. It is
clear that von Mises’ position was heavily influenced by the recent developments of
science, and especially by those of physics. The new statistical approach to mechanics
and the study of Brownian Motion, which both implicitly relied on a frequentist
interpretation of probability, must have certainly inspired him.
Unlike Venn, von Mises offers a precise and rigorous definition of probability, as
well as a set of necessary conditions for the outcome of a repeatable event to be assigned
a probability-value. Let’s begin with these conditions. The first condition for the
probability of an outcome O to exist is that the mathematical limit of the relative
frequency of O within the repeatable event corresponding to O exists. For example, in
order for the probability of the outcome ‘coin C lands heads’ (relative to the event ‘coin
C is tossed’) to exist, the mathematical limit of the relative frequency of heads within the
(infinitely many) tosses of C must exist. The second necessary condition –which von
Mises called ‘randomization’– for the probability of an outcome O to exist concerns the
sequence S of all the outcomes of that repeatable event of which O is a specific outcome.
This second condition requires that for every infinite subsequence of S that may be built
by way of a rule of place-selection, the limit of the relative frequency of O is the very
same as the limit of the relative frequency of O in S. Consider again the case of a coin
tossed repeatedly. S is the sequence (or record, if you will) of all of the outcomes relative
to that coin. Now let p be the limit of the relative frequency of the outcome ‘heads’ in S;
let q_1 be the limit of the relative frequency of ‘heads’ in the sequence obtained from S by
selecting all the elements that occupy in S an even position; let q_2 be the limit of the
relative frequency of ‘heads’ in the sequence obtained from S by selecting every tenth
element of S (i.e. the 10th, 20th, 30th, …, element of S); let q_3 be the limit of the relative
frequency of ‘heads’ in the sequence obtained from S by selecting every element of S
whose position is a power of 2 (i.e. the 2nd, 4th, 8th, 16th, 32nd, …, element of S); and so on
for all the infinitely many possible rules of place-selection. Randomization requires that
p = q_i, for all i. Now, if the relative frequency of an outcome O complies with these two
, for all i. Now, if the relative frequency of an outcome O complies with these two
conditions, the probability of O is the mathematical limit of the relative frequency of O
within the sequence (i.e. record) of outcomes of the repeatable event to which O
corresponds.
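Von Mises’ two conditions concern infinite sequences, so they cannot be verified directly; still, a finite simulation can illustrate what randomization demands. The following sketch is entirely my own construction (the sample size, seed and three selection rules are illustrative assumptions, and finite samples only approximate the limits involved):

    import random

    # Simulate a long sequence of fair-coin tosses (True = 'heads').
    random.seed(0)
    n = 100_000
    seq = [random.random() < 0.5 for _ in range(n)]

    def rel_freq(s):
        """Relative frequency of heads in a finite (sub)sequence."""
        s = list(s)
        return sum(s) / len(s)

    # Three place-selection rules from the text (positions are 1-based):
    even_positions = seq[1::2]                              # 2nd, 4th, 6th, ...
    every_tenth = seq[9::10]                                # 10th, 20th, 30th, ...
    powers_of_two = [seq[2**k - 1] for k in range(1, 17)]   # 2nd, 4th, ..., 65536th

    print(f"whole sequence: {rel_freq(seq):.3f}")
    print(f"even positions: {rel_freq(even_positions):.3f}")
    print(f"every tenth:    {rel_freq(every_tenth):.3f}")
    print(f"powers of two:  {rel_freq(powers_of_two):.3f}  (only 16 terms, so noisy)")

For a genuinely random sequence all four frequencies come out close to one another. The crucial feature of a rule of place-selection is that it must pick positions without looking at the outcomes occupying them, which is why a rule like ‘select all the heads’ does not count as a place-selection.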
While the first condition is straightforward and, well, trivial (if the probability of
an outcome is the limit of its relative frequency, then obviously such a limit must exist),
the legitimacy and utility of the second condition –randomization– are certainly far from
obvious. Hans Reichenbach, another prominent frequentist whose work we shall briefly
review, strongly objected to randomization on the ground that the latter is unnecessarily
restrictive (I sympathize with this criticism). All in all, however, it is fair to say that von
Mises’ work has multiple merits. He put forward, for the first time, a precise and
sophisticated relative-frequency theory. He understood the importance that a frequency-
style approach has in the understanding and the development of science. Last but not
least, he made it clear –unlike Venn– that a probability in a relative-frequency
interpretation does not and cannot concern a single event (e.g. the single toss of a coin);
rather, von Mises says, the probability of an outcome is about the relative frequency of
occurrence of that outcome; that is, it’s about the (potentially) infinitely long record of the
occurrences of that outcome (e.g. the potentially infinite sequence of outcomes of single
tosses of a coin).
Outside the German-speaking world, von Mises’s work did not attract the attention it
deserved, and it took more than twenty years before an extensive and systematic work on
the frequentist approach appeared in English. This was Hans Reichenbach’s The Theory
of Probability, published in 1951. Like von Mises,
Reichenbach had been a member of the Vienna Circle (but while von Mises was a
philosophically minded scientist, Reichenbach was a philosopher tout court). And just
like von Mises, Reichenbach was attracted to frequentism for its empirical significance. I
do not believe that Reichenbach’s contribution to frequentism is especially valuable.
However, since his work on probability has been rather influential in the philosophy of
science, I shall briefly sketch Reichenbach’s view. In addition to Reichenbach’s rejection
of von Mises’ randomization condition, there are two other major points that characterize
Reichenbach’s analysis. The first is the application of induction to relative frequencies.
According to Reichenbach, because of the observed and observable regular “behavior” of
relative frequencies, only a finite portion of the sequence of occurrences of a certain
event needs to be considered; this solves, the argument goes, the problem of observing
infinitely long sequences (Reichenbach’s definition of probability is the same as von
Mises’). Consider one more time the case of a coin that is tossed indefinitely. The
probability of, say, heads for that coin concerns, roughly speaking, the infinite sequence
(record) of tosses of that coin. However, this probability may be computed by analyzing
just a finite portion of the sequence, since the ratio of heads to total tosses for a
sufficiently long finite portion of the sequence is, by induction, the same as the ratio in
the whole (infinite) sequence. The second point peculiar to Reichenbach’s analysis is the
explicit application of probabilities to single cases. According to Reichenbach, it is
perfectly legitimate and meaningful to talk about, say, the probability that a specific toss
of a specific coin yields heads (within the relative-frequency interpretation). This feature
of Reichenbach’s account is in sharp contrast to von Mises’, and it is –in my view– a
plain and simple mistake; nonetheless, it has attracted considerable interest among
philosophers of science.
Nowadays, frequentism is held only by a minority of philosophers of science. At
the same time, however, it is fair to say that the vast majority of scientists are supporters
of, or at the very least sympathize with, the relative-frequency interpretation.
3.3. The Subjectivist Interpretation of Probability
The subjectivist interpretation of probability, we have seen, identifies the probability of a
proposition with the degree of belief an agent holds in that proposition at a given time.
Although in the past some authors (e.g. von Mises) thought that Bernoulli and Laplace
were the first subjectivists, it is now widely accepted that this is a misconstrual of the
position of both of them. It is true that Bernoulli and Laplace occasionally used the
expression ‘degree(s) of confidence’. However, it must be kept in mind that to them the
latter was merely a colorful illustration of a classical concept of probability.
It was only in the first part of the twentieth century that the subjectivist approach
was expounded and systematized by the Italian mathematician Bruno de Finetti and by
the English polymath Frank Ramsey. Although Ramsey was the first to publish an
analysis of subjectivism (1931), it seems that de Finetti had already been working on his
version of subjectivism in the early twenties. At any rate, Ramsey was not aware of the
(unpublished) work of de Finetti and the latter didn’t read Ramsey’s (1931) –at the time
de Finetti didn’t read English. Thus it is appropriate to consider Ramsey and de Finetti as
two scholars who independently arrived at similar conclusions.
Both Ramsey and de Finetti were strict empiricists. To them, the idea of a
probability established a priori, à la Laplace, was unacceptable (this is not surprising;
what is perhaps more surprising is their out-of-hand rejection of the relative-frequency
approach to probability, which is also empirically grounded, and perhaps more so than
the subjectivist approach). At the same time, however, both Ramsey and de Finetti
sympathized with a crucial aspect of the classical interpretation: the fact that the latter
naturally applies to single cases or events (in contrast to the relative frequency
interpretation). In a way, Ramsey and de Finetti saw the subjectivist approach to
probability as an improved, more empirically grounded version of the classical approach.
The central problem of subjectivism is how to measure degrees of belief, that is,
how to compute probabilities. Both Ramsey and de Finetti offer careful discussions of
this issue, and they end up proposing a similar solution. But while de Finetti emphasizes
the importance of bets and betting behavior as measuring tools for the intensity of beliefs,
Ramsey prefers what is now referred to as a representation theorem. I shall expound in
detail –and criticize– both techniques in Chapter 2. At any rate, the use of these “belief
measurement” procedures allows both de Finetti and Ramsey to kill two birds with one
stone: besides effectively measuring beliefs, these procedures also guarantee that the
measured beliefs (probabilities) comply with the axioms of probability. Or so Ramsey, de
Finetti and their supporters have held. I will discuss and criticize this position in Chapter
2.
After the pioneering work of Ramsey and de Finetti, subjectivism gained
credibility as a “legitimate” interpretation of probability. However, much of the interest
that the subjectivist interpretation generated in the second part of the twentieth century
was not due to this interpretation per se, but rather to the use that the latter lends itself to.
Let’s take a step back. Since at least Laplace, Bayes’ theorem, a simple theorem of the
probability calculus, had been used to provide probabilistic support to universally
quantified statements by way of individual observations. Suppose the universally
quantified statement ‘all crows are black’ has some initial probability p. Then, under
certain (allegedly) mild assumptions, the observation of one or more black crows allows
one, by way of Bayes’ theorem, to raise the probability of ‘all crows are black’ to p + ε
(where ε > 0). The point is not to provide probabilistic support to statements like ‘crows
are black’, but to scientific laws (whose logical form is precisely that of a universal
statement). Now, subjectivist probability offers, for the first time, a completely natural
framework for the use of Bayes’ theorem, with no major –or even moderate– difficulty
involved. Or so many philosophers of science (e.g. Richard Jeffrey) and Bayesian
statisticians (e.g. Dennis V. Lindley) have held in the second half of the twentieth
century. In Chapter 2, I will contend that all of them were –and are– deeply mistaken.
From an historical standpoint, however, the fact remains that subjectivism has attracted,
and continues to attract, much interest thanks to the belief that it is the most natural and
promising framework for Bayesian induction. Indeed, nowadays subjectivism is the most
popular approach to probability among philosophers of science (and among quite a few
statisticians).
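In outline (my formulation of the standard schema, not a quotation from any of these authors): where h is a universal hypothesis and e an observation entailed by h, Bayes’ theorem gives

    \[
      P(h \mid e) = \frac{P(e \mid h)\,P(h)}{P(e)} = \frac{P(h)}{P(e)} > P(h)
      \qquad \text{whenever } P(h) > 0 \text{ and } P(e) < 1,
    \]

so each observed black crow raises the probability of ‘all crows are black’ by some ε > 0.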
4. The Propensity Interpretation
The reader who has a background in the philosophy of science must be wondering why in
§1 and §3 I haven’t listed and discussed, among the interpretations of Probability Theory,
the so-called propensity interpretation. There are good reasons, I believe, that justify this
exclusion.
Karl Popper introduced the propensity interpretation in his (1957), primarily as a
means to fill a well-known gap of frequentism: the attribution of single-case probabilities.
According to Popper, the frequency (actual or hypothetical) of occurrence of a certain
outcome is to be looked at as a measure of the propensity of the ‘experimental setup’ to
which that frequency corresponds. For instance, if the repeated tossing of a coin yields a
frequency of heads 0.4, the latter is a measure of the propensity of the coin tossing setup
(the coin itself, the way it is tossed, etc.) to “produce” heads. Thus a probability, which is
computed as a frequency, is ‘a characteristic of the experimental arrangement rather than
a property of a sequence’.
Popper’s claim, however, is unconvincing. First, his position is ambiguous. Is
Popper saying that probabilities are propensities of experimental setups to produce
certain frequencies? Or is he saying that probabilities are numerical properties of
experimental setups? These accounts are certainly not equivalent. Suppose Popper is
saying that probabilities are propensities. What is a propensity? If anything, a propensity
is a hypothetical frequency in disguise; e.g. to say that a die has propensity 1/6 to yield a
one is to say that, were that die cast sufficiently many times, it would yield a one 1/6
of the times. I really see no other way to explicate the notion of propensity. But then the
propensity interpretation is just frequentism in disguise. Suppose, on the other hand, that
a probability-value is a characteristic, a property of an experimental setup. Even if this is
the case –and I’m sympathetic to this view– I do not see how this analysis could serve as
the core of a genuine interpretation of probability. Consider a simple analogy. The
temperature of a gas is obviously a property of the experimental apparatus that contains
the gas. But to say that the temperature of the gas is a property of the gas is not an
explication of the notion ‘temperature of the gas’. An acceptable explication would be
something like: the temperature of the gas under scrutiny is a measure of the degree of
motion of its molecules. Likewise, the statement that a probability is a property of an
experimental apparatus, of a physical system, is perhaps true, but such a statement, even if
true, does not and cannot constitute an interpretation of the notion of probability.
Chapter 1: The A Priori Interpretation of Probability
1. Introduction: the Classical Theory of Probability
It is customary to refer to the “Classical Theory of Probability” as the earliest attempt
to systematically interpret the mathematical theory of probability. The core tenet of this
approach to probability is that equal probability-values must be assigned to outcomes of
events whenever either no information about such outcomes is available, or the
information relative to each outcome is symmetrical. This is usually known as the
principle of indifference, sometimes also called the principle of insufficient reason, and it
is the main instrument within the Classical Theory of Probability. The principle of
indifference is usually justified on the ground that under absent or symmetrical
information there is no reason to assign unequal probability-values to the outcomes under
scrutiny (from equal information equal probability, so to speak).
It is not surprising that the first application of this approach to probability was the
scientific study of games of chance, since these represent the natural ground for the use of
the principle of indifference. Consider, as an example, the throw of a die: six outcomes
are possible and no information about them is available; by the principle of indifference
each outcome must receive the same probability-value, 1/6, since the individual
probability-values must add up to one.[3]

[3] We shall see in §5 that it is not possible to assign unique, well-defined probability-values by means of the principle of indifference alone. Until then I will presuppose that it is legitimate to do so, for the sake of the discussion (the criticisms that I examine in the remainder of §1 and in §2 have been put forward under this very presupposition).

Of course the games of chance represent only a
part of the problems to which the Classical Theory lends itself to be applied.
the theory can be fruitfully employed in the computation of probability-values for
uncertain outcomes, when knowledge –or uncertainty– is symmetrical with respect to
such outcomes. Here is a trivial example: it is only known that Mary will choose either A,
B or C; what is the probability that Mary will choose, say, B?
More and more complicated chance games and situations involving uncertain
outcomes can be studied within the classical approach to probability. And yet it is in
comparatively simple scenarios that the theory seems to face problems. I shall consider
here two often cited and representative cases. We’ll see that the problems relative to these
cases are due to an improper use of the principle of indifference.
Consider an urn containing three balls: one blue, one red and one white. Since
there are three possible outcomes and since we have no reason to “prefer” one over the
other, by the principle of indifference the probability that a ball is white, red or blue is the
same, 1/3. At the same time, one might think of the outcomes as the alternatives: the ball
extracted is, say, red; the ball extracted is not red. Since there is no reason for preferring
either outcome, by the principle of indifference both the probability that a ball is red and
the probability that a ball is not red is 1/2. And this is clearly unacceptable: the
probability-value that the ball extracted is red cannot be both 1/3 and 1/2.
How can one explain and solve such a contradiction? It’s sufficient to notice that
only the first probability assignment (1/3) represents a genuine application of the principle
of indifference: for the knowledge that the ball extracted will be of one among three
colors is symmetrical and “balanced” only with respect to the events ‘the ball extracted is
white’, ‘the ball extracted is red’, ‘the ball extracted is blue’. Whereas with respect to the
events ‘the ball extracted is red’ and ‘the ball extracted is not red’ such knowledge is not
symmetrical: to the first event corresponds the piece of knowledge that in one possible
outcome the ball will turn out to be red; but to the second corresponds the piece of
knowledge that in two possible outcomes the ball will turn out not to be red.
The second case is known as Bertrand’s Box, after Joseph Bertrand who first
formulated it. Consider a box or chest with three drawers. The first drawer contains two
gold coins, the second contains two silver coins and the third contains one silver and one
gold coin. A drawer – we don’t know which one – is selected and a coin is blindly drawn
from it. Given that the drawn coin is a gold one, what is the probability that the other coin
is gold as well? It seems that two different, incompatible answers are possible. The first:
since a gold coin was drawn, either the first drawer or the third drawer was selected; but
there is no reason to “prefer” either one, hence by the principle of indifference both the
probability that the first drawer was selected and the probability that the third drawer
was selected is 1/2. Since the probability that the first drawer was selected equals the
probability that a second gold coin will be drawn, the answer is 1/2. The second: the gold
coin that has been drawn was either the first gold coin in the first drawer, or the second
gold coin in the first drawer or the only gold coin in the third drawer; hence the
probability that the first drawer was selected is 2/3. So the answer is 2/3.
How to explain the contradiction in this case? What is the correct probability-
value for the drawing of another gold coin? Both the answers sketched above rely on a
somewhat liberal use of the principle of indifference; hence it seems proper to restrict the
use of the principle of indifference to clear-cut cases only. Since we are interested in the
probability that the first drawer was selected given that a gold coin was drawn, Bayes’
Theorem lends itself to the solution of our problem. Letting A be the event ‘the first
drawer was selected’ and B ‘a gold coin was extracted’, the solution to Bertrand’s Box
problem is simply the value of P(A|B), which, by means of Bayes’ theorem, we can
compute as[4]

P(A|B) = [P(B|A) × P(A)] / P(B)          (1)

Now, the probability that a gold coin is drawn given that the first drawer is selected is
obviously 1; the probability that the first drawer is selected is, by the principle of
indifference, 1/3; and the probability that a gold coin is extracted is given by the theorem
of total probabilities and is 1/2.[5] Thus P(A|B) is 2/3.
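Spelling out the substitution in (1) with the values just derived (my arithmetic):

    \[
      P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{1 \times \tfrac{1}{3}}{\tfrac{1}{2}} = \frac{2}{3}.
    \]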
I have examined two cases representative of the problematic ones, and I have shown
that, once the principle of indifference is properly applied, the two cases become
unproblematic. In a similar fashion the difficulties within all the other problematic cases
can be eliminated (it is fair to say that the vast majority of the detractors of the Classical
Theory grant that none of these problems should be considered genuine contradictions).

[4] In general, the expression P(X|Y) refers to the probability of X given that Y is the case.

[5] The probability of selecting a drawer (be it the first, the second or the third) is 1/3 (by indifference); the probability of drawing a gold coin from the first drawer is 1, from the second 0, from the third 1/2 (by indifference). By the theorem of total probabilities, P(B) = 1/2.
In the next section (§2) I will discuss a similar, but more serious, problem that
arises in certain infinite domains, and which cannot be so easily dismissed. The
remainder of the chapter deals with problematic aspects of the theory, which either have
not been fully appreciated or have been completely ignored. In particular: in §4 I will
discuss what I call the problem of meaning of probability-values (what does it mean to
say that the probability of a certain event is p?); §5 deals with the problem of justifying
the additivity of probability-values within Classical Probability; lastly, §6 is a sketch of
probabilistic induction and of its relationship with traditional (i.e. non-probabilistic)
approaches to induction.
2. The Paradox of Inverse Probabilities Explained
At the end of the nineteenth century, Joseph Bertrand described a series of paradoxes
originating from the use of the principle of indifference in certain infinite domains.[6]
Subsequent writers have referred to these paradoxes and their variants as a fatal flaw in
the classical interpretation of probability. I shall consider here a specific case as
representative of the problem.

[6] Bertrand (1889).
Suppose we know that the specific volume v of a substance is between 1 and 3.
By the principle of indifference the probability that v is between 1 and 2 is the same as
the probability that v is between 2 and 3 (these alternatives are “symmetric”). The
specific density d is the reciprocal of the specific volume; hence d is between 1 and 1/3.
By the principle of indifference the probability that d is between 1 and 2/3 is the same as
the probability that d is between 2/3 and 1/3. From the latter it follows that, since v is the
reciprocal of d, the probability that v is between 1 and 3/2 is the same as the probability
that v is between 3/2 and 3, contradicting our initial computation.
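To see the clash concretely, suppose (an illustrative assumption of mine, not part of the original example) that v is uniformly distributed over [1, 3]. Then

    \[
      P(1 \le v \le 2) = \tfrac{1}{2}, \qquad
      P(\tfrac{2}{3} \le d \le 1) = P(1 \le v \le \tfrac{3}{2}) = \tfrac{1}{4},
    \]

so uniformity over v is incompatible with uniformity over d = 1/v, and vice versa.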
This is the typical account of a famous exemplification of the paradox of inverse
probabilities. The use made in it of the principle of indifference seems correct: the
latter seems indeed applicable in the particular instances considered. But is it? The
expression ‘principle of indifference’, as it is customarily used, refers to the conflation of
a “weak” principle of indifference (weak principle of indifference hereafter): if there is
no reason to prefer a state to another then any state must be assigned equal probability;
and of a “strong” principle of indifference (strong principle of indifference hereafter): if
there is no reason to prefer any state to another, then any two disjoint and equinumerous
subsets of states must be assigned equal probabilities. In the case at hand, ‘principle of
indifference’ can only, if anything, be understood as the strong principle of indifference;
for the weak principle of indifference cannot possibly apply.
7
Indeed, the infinite set of
states that correspond to values of v between one and two, call this set S_{v:[1,2]}, and the infinite set of states that correspond to values of v between two and three, S_{v:[2,3]}, are guaranteed, whether v takes real or rational values, to be equinumerous (two infinite sets are said to be equinumerous if, and only if, there exists a bijective, i.e. one-to-one, function that maps each member of the first set into one and only one member of the second set). Unfortunately, the very same mathematical reason (which will not be discussed here) that guarantees that S_{v:[1,2]} and S_{v:[2,3]} are equinumerous also guarantees that so are S_{v:[1,x]} and S_{v:[x,3]}, for all 1 < x < 3. It follows that, by the strong principle of indifference, S_{v:[1,x]} and S_{v:[x,3]} are equiprobable for infinitely many values of x!

Thus the standard account of the paradox of inverse probabilities[8] points out an actual and substantial difficulty, but it accounts for it with the wrong explanation: S_{v:[1,2]} and S_{v:[2,3]} are equiprobable not because of their "symmetry" (two is exactly in the middle of one and three) but because S_{v:[1,x]} and S_{v:[x,3]} are all equiprobable, and accidentally one of the possible values of x is two.

[7] The information that there are, in principle at least, infinitely many alternatives or states in an interval of values, say 1 ≤ v ≤ 2, cannot be ignored, and the set of alternatives cannot be considered a single alternative or state. Each possible alternative must be considered in the application of the principle of indifference (cf. my discussion in §1). The mathematically inclined reader should not be tempted to think that the weak principle of indifference implies that, say, v qua random variable must have a uniform probability density function, for two reasons. First, all that the weak principle of indifference implies is that each state must have the same probability, which in turn implies that each state must have probability zero, since there are infinitely many states. And this is compatible with any probability density function. Second, the fact that each state must have the same probability could not imply that its representing random variable has a uniform probability density function: for a uniform probability density associated to v means a non-uniform probability density associated to d, and vice versa (both v and d represent the states of the system). At this point it might be objected that, since we already have a measure (the Euclidean metric) of the "quantity" of states corresponding to the values of v between one and two and between one and three, we might compute the probability that, say, the value v of a state is between one and two simply by observing that the latter is just the ratio of favorable to possible cases, and hence a ratio of metrical measures. This objection would be incorrect, however: each probability assignment made on the basis of the rule 'probability equals the ratio of favorable to possible outcomes' presupposes an implicit application of the principle of indifference in the first place (see section §5).

[8] As it is presented by, for example, Von Mises (1957: p. 79), Howson (2000: pp. 84-86), Van Fraassen (1989: pp. 301-316) and Salmon (1966: pp. 66-68).
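For real-valued v, one explicit witness of the equinumerosity invoked above (my illustration; the text deliberately omits the construction) is the strictly increasing linear map

    f(y) = x + (3 − x)(y − 1)/(x − 1),   y ∈ [1, x],

which carries S_{v:[1,x]} one-to-one onto S_{v:[x,3]} for every fixed 1 < x < 3. For rational x this map also pairs off the rational values of the two intervals; for irrational x a different pairing would be required, which is presumably why the result is stated to hold 'whether v takes real or rational values'.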
I have explained what the problem amounts to, but it remains to be established
what conclusion one should draw from it. There are two alternatives: i. the problem
counts as a reductio ad absurdum of the Theory of Classical Probability; ii. the problem
is merely a “symptom” of the unsoundness of the strong principle of indifference as
applied to infinite domains (and maybe of the unsoundness of the strong principle of
indifference tout court). Let's consider ii. first. We accept the weak principle of indifference on the basis of its reasonableness: it just seems right that we should be equally undecided among n (or even infinitely many) outcomes, provided we have no reason to prefer any of them over the others. And we accept the strong principle of indifference, when applied to finite domains, on a like rationale. But it is not clear that the strong principle of indifference, when applied to infinite domains, is equally reasonable. Consider the following example. A random integer is generated and displayed on a computer screen. What is the probability that the number on the screen is odd? What is the probability that the number on the screen is a multiple of one million? The probability that the number on the screen is odd should be much higher than the probability that it is a multiple of one million, or so it seems. However, since the set of odd numbers and the set of multiples of one million are equinumerous, by the strong principle of indifference these events are equally likely (which implicitly presupposes that we are equally undecided between them). This suggests, at a minimum, that the application of the strong principle of indifference to infinite domains must be restricted. But it also suggests that the strong principle of indifference is downright mistaken, and that the fact that this principle holds in finite domains is just "accidental".
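The intuition can be sharpened in terms of natural densities. A small Python sketch (my addition; it models 'a random integer' as a uniform draw from {1, …, N} with N growing, a modeling choice the classical theory itself cannot quite license):

    def density(pred, N):
        """Fraction of the integers 1..N that satisfy pred."""
        return sum(pred(n) for n in range(1, N + 1)) / N

    for N in (10**3, 10**5, 10**6):
        odd = density(lambda n: n % 2 == 1, N)
        mult = density(lambda n: n % 10**6 == 0, N)
        print(N, odd, mult)   # odd -> 0.5; mult -> 0.000001
    # Yet n -> 2n - 1 and n -> 1_000_000 * n are bijections from the
    # positive integers onto the two sets, so they are equinumerous.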
On the other hand, it's not clear on what grounds one could defend thesis i. To begin with, the Paradox of Inverse Probabilities, once properly analyzed, constitutes a difficulty that is perfectly equivalent to that constituted by the integer lottery case discussed in the previous paragraph. Hence, a supporter of thesis i. should be prepared to defend it on the basis of the integer lottery "paradox" (or of an equivalent one, of course), which clearly doesn't possess the same "force" as the Paradox of Inverse Probabilities as it is usually understood (improperly understood, that is). At the same time, a defense of thesis i. should show that the principles of weak and strong indifference possess the same plausibility. However, the former has superior plausibility in virtue of its simplicity: while it is intuitively appealing that one should be equally undecided between two alternatives, it is not so intuitive that one should be equally undecided between two "equinumerous" infinite sets of alternatives.
Most of the detractors of the Theory of Classical Probability have held that the Paradox of Inverse Probabilities constitutes an insuperable problem for this theory.[9] But the analysis of these writers is vitiated by a lack of appreciation of the nature of the difficulty that underlies the paradox (and of the role that the principle of indifference plays in it). A proper understanding of the issue shows that this difficulty cannot constitute a conclusive argument against the Theory of Classical Probability.

[9] Von Mises (1957: p. 79), Howson (2000: pp. 84-86), Van Fraassen (1989: pp. 301-316) and Salmon (1966: pp. 66-68).
3. Logical Probability

J. M. Keynes put forward his theory of Logical Probability as a refinement of the Classical Theory.[10] Logical Probability, according to Keynes, differs from classical probability in two crucial points:

i) It is incorrect to specify a probability-value as applying to an outcome or event. A probability-value must apply to two elements: an event or outcome, and the knowledge relevant to its occurrence.

ii) A probability-value is a logical relation P between two complex propositions, e and k (k is the information relevant to the occurrence of the outcome e); in Keynes' notation, P(e/k).

The observation that a probability-value cannot refer merely to an event or outcome is undoubtedly one of Keynes' main achievements within the foundations of probability. It is only via this principle that probability-values can be coherently assigned. For if a probability-value p' refers to a certain outcome O, it does so on the basis of certain information k'; yet, with respect to other information, call it k'', a different probability-value p'' might be computed for O. But it is incoherent to assign two distinct values to P(O). On the other hand, P(O/k') and P(O/k'') each have one value only.[11]

As for (ii), I believe it to be unwarranted and unnecessary, but at this point I shall not argue against it. Rudolf Carnap, the other famous exponent of the logical school, accepts (ii) and indeed re-proposes it in a bolder form:[12] according to him, P is a logical relation similar in nature to that of logical implication between two sets of propositions, the difference being that "while the first states a complete logical implication, the second states only, so to speak, a partial logical implication" (1950: pp. 30-31). This concludes our brief survey of Logical Probability. Henceforth, I'll employ the expression 'a priori probability' to refer to both the logical and the classical interpretations of probability.

[10] Keynes (1921).

[11] Of course this point doesn't hold if the assignment of probability-values doesn't rest on the principle of indifference.

[12] The following tenet is widely endorsed by contemporary accounts of Logical Probability.
4. The Problem of Meaning of Probability-Values
What does it mean to say that a certain state of affairs, given certain information, has probability-value p? For instance, it is known that an urn contains only red, blue and white balls. What do I mean when I say that the probability-value of extracting, say, a blue ball is 1/3? There is indeed a trivial way of dealing with these questions. In the context of Classical Probability, one might answer that to say that a probability-value is p means that the ratio of favorable to possible outcomes equals p. Yet, this approach is unsatisfactory. Under this interpretation, probability-values are purely descriptive properties – on par with 'being blue' and 'having two legs' – that do not convey any of the ideas or concepts which one usually associates with probability: uncertainty, indetermination, prediction, indecision, etcetera (in fact, no scholar I'm aware of holds a similar view). To say that probability-values are ratios of favorable to possible outcomes is not an explication of their meaning but, rather, a "rule of determination", a tool to measure them so to speak.[13] In the remainder of this section I shall examine other candidate meanings that have been advanced by the supporters of the a priori interpretation of probability.[14]
It is fair to say that the most widely held account of the meaning of a priori probability is that a probability-value has to be understood as a degree of rational belief.[15] Unfortunately, it is not at all clear what a degree of rational belief is. In particular, it is not clear how to interpret the predicative term 'rational'. Maybe 'rational' stands for 'guaranteed a priori'. For instance, to say of a certain belief that it is rational means that that belief is guaranteed to be true a priori, by reasoning alone (e.g. that outside either it rains or it doesn't rain is a rational belief). Thus the expression 'rational belief' has a clear, definite meaning.[16] But 'rational', when predicated of a certain degree of belief, cannot mean that that degree of belief is true a priori, for the simple reason that a degree of belief – qua degree – can be neither true nor false.[17] If anything, a degree of belief can be correct (or right, sound, etc.) or incorrect. But what does it mean to say that a certain degree of belief is correct? The only way of making such a notion meaningful is by a reference to future events. For instance, I believe to a high degree that tomorrow it will rain; in what sense may my belief be correct? At most, in the sense that it conforms to future reality, i.e. that tomorrow it will actually rain. But a priori probability cannot be predictive.[18] The conjecture that 'rational' means 'guaranteed a priori' is therefore incorrect.[19]

[13] Let me put this in another way. If the probability-value of an event meant the ratio of favorable to possible outcomes for that event, then it would be uninteresting to use and to talk about classically computed probability-values; indeed, we would not even call them probability-values to begin with.

[14] The problem of establishing a meaning for probability statements (within the classical interpretation) is not a new one. Reichenbach discusses it in (1949: pp. 366-372). However his analysis has two deficiencies: first, Reichenbach is concerned with establishing a meaning that is, to some degree at least, empirical, whereas I'm ready to accept any reasonably clear candidate meaning; second, he doesn't fully appreciate the difficulties presented by the candidate meanings he considers.

[15] See for example Keynes (ibid.: p. 4) and Skyrms (1980: pp. 20-21). Incidentally, let me notice that the expression "rational degree of belief" would be more appropriate.

[16] Here I shall assume as correct a simplified version of the so-called Correspondence Theory of Truth: an atomic sentence is true if and only if it corresponds to an actual fact. For instance, 'it rains outside my house' is true if and only if it is the case that it rains outside my house.
Another possible candidate meaning for 'degree of rational belief' is the following: to say that a certain degree of belief is rational is to say that that degree is the best possible degree associated to that belief, given the circumstances. But this is not going to work either, for what does it mean to say that a degree of belief is the best possible? The only clear, coherent meaning for this expression is that the degree of belief at issue is the fittest (given the circumstances) to formulate predictions; i.e. that that degree is the best possible on empirical grounds. But any appeal to (future) experience in order to explicate the meaning of 'degree of rational belief' is unacceptable, as we have just seen.

[17] It might be objected that it is perfectly meaningful to assign truth-values to degrees of belief, if they are understood subjectively. For instance, that person X has degree of belief y in proposition P is either true or false. Let's see how this position fares. Since to say that a degree of rational belief in P is y is to say that the degree of belief y in P is a priori true, it follows that it is a priori true that the subjective degree of belief in P of a certain person (or maybe of everybody) is y. But this statement is of course false: there is no a priori guarantee that y is the actual subjective degree of belief of anybody (determining subjective degrees of belief – if there are such things – is an empirical problem).

[18] For three rather obvious reasons: i) a priori probability is precisely a priori; how could it predict anything empirical? ii) to the same event might correspond both a low and a high probability-value; iii) the probability-value associated to the occurrence of a certain event can be neither confirmed nor disconfirmed by the occurrence of that event.

[19] Obviously this argumentation applies also against the view that a rational degree of belief is a rationally justified degree of belief (i.e. the valid conclusion of a rational process): for to say that a degree of belief d is rationally justified is to say that an a priori justification is available for the correctness of d.
We have seen in §3 that both Keynes and Carnap held that probability-values must be understood as peculiar logical relations between two (sets of) propositions: to say that an event, described by the proposition S, has probability-value p with respect to certain knowledge, expressed by the proposition K, is to say that there exists a logical relation (of the "p-type") between K and S; for the sake of brevity, I'll say that K logically implies S to a degree p. Thus, in this case the proposal is: a probability-value p stands for a logical relation (of the "p-type"), and that is precisely its meaning.[20] This position is rather weak, and for an elementary reason: logical relations too, provided they are meaningful, must be explained via appropriate semantics. To say that A logically implies B is to say that every possible state of affairs that makes A true makes B true as well. For instance, to say that the statement 'it rains and it is foggy' logically implies the statement 'it rains' is to say that whenever the first proposition is true (i.e. whenever it is raining and it is foggy) the second must be as well (i.e. it is raining). But there is no clear semantic "translation" for 'A logically implies B to a degree p' (which, remember, is supposed to be a generalization of the "regular" logical implication). Hence this logical relation is meaningless – literally so.

[20] I don't want to imply that this was Keynes' or Carnap's position.
Lastly, we shall examine two proposals due to Carnap. The first is that an a priori probability-value is an estimate of a certain relative frequency f:[21] let the a priori probability for a certain outcome O of an event E be p, and suppose that E will occur indefinitely many times in the future; then, according to Carnap, p represents the expected value, the estimate of the frequency of occurrence of O within E (if E is repeated, that is). This proposal, however, just pushes the problem a little further; it doesn't solve it: for how is the notion of estimate or expected value of a frequency of occurrence to be understood?[22]

The second proposal has been put forward by Carnap[23] and later refined by Kemeny.[24] Under certain evidential knowledge e, a person A is asked to bet on the occurrence of an event E described by h; s is the price (0 ≤ s ≤ 1) of the bet that pays 1 if E occurs and 0 otherwise; A must choose a value for s (the betting quotient). Both Carnap and Kemeny identify the a priori probability-value p of h given e with s, under the proviso that s must be rational: the a priori probability-value of h given e is the rational betting quotient of a bet on h given e. Unfortunately this proposal encounters exactly the same difficulties as the first we have examined, the explication of probabilities in terms of rational degrees of belief. Since a betting quotient cannot be a priori true or false, it must be, if anything, a priori correct (or right, appropriate, etc.) or incorrect; and the only possible meaning of the expression 'correct betting quotient' is the articulation of certain empirical conditions (that a certain prediction is somehow fulfilled); but a probability-value cannot postulate anything about experience. Rational betting quotients (if there are such things) won't do as meanings of probability-values.

All the attempts to interpret classical probability statements that have been put forward fail in one way or another. Since there is no good reason to believe that the correct meaning of classical probability statements is waiting to be discovered, I conclude that the problem of meaning is an outstanding difficulty for the Theory of Classical Probability.

[21] Carnap (1950: pp. 168-174).

[22] One might object that the notion of estimate of a frequency could be analysed in terms of rational beliefs: to say that an a priori probability-value p is the estimate of a certain frequency means that it is rational to believe that p is the value of that frequency. But this amounts to saying either that it is a priori true that p is the value of that frequency, or that p is the best guess for the value of the frequency. Neither proposal is acceptable (see the previous discussion).

[23] Carnap (ibid.: pp. 165-166).

[24] Kemeny (1955).
5. The Problem of Additivity of Probability-Values
Kolmogorov’s axioms are widely considered the axiomatic system for the Theory of
Probability. I shall briefly describe Kolmogorov’s three axioms for the reader who is not
familiar with them.
25
The first axiom states that probability-values are positive real
numbers. The second axiom states that the “universal event” (the disjunction of a
complete set of mutually exclusive event, e.g. ‘A or not A’) has probability one. The
axiom of finite additivity states that the sum of the probabilities of two mutually
exclusive events equals the probability of their disjunction.
25
Instead of employing the axiom of countable additivity, as it is customary in
mathematical probability, I’ll use the axiom of finite additivity.
It is a common and widespread misconception that probability-values, within the Classical Theory of Probability, satisfy "automatically" the axiom of additivity. This is due, I believe, to the mistaken supposition that the principle according to which probability equals the ratio of favorable to possible outcomes (from which the axiom of additivity directly follows) is immediately implied by, or even on par with, the principle of indifference.[26] This, however, is not the case, as a simple example will show. For the sake of the discussion I will presuppose the other two axioms.[27] Consider a set of three mutually exclusive outcomes, A, B and C. By the first axiom,

    P(A), P(B), P(C) ≥ 0   (2)

By the second axiom,

    P(A ∨ B ∨ C) = 1   (3)

And by the principle of indifference,

    P(A) = P(B) = P(C) = p   (4)

From these three equations it follows that A, B and C have the same positive probability-value p; but it does not follow that p has a specific, definite value: any positive real number will do. In order to assign a definite value to p, it is necessary to postulate the axiom of finite additivity. By the latter and by the second axiom,

    1 = P(A ∨ B ∨ C) = P(A ∨ B) + P(C) = P(A) + P(B) + P(C) = 3p   (5)

from which it readily follows that p is 1/3 (obviously 1/3 is also the ratio of the number of favorable to possible outcomes). Thus, the principle that a probability-value equals the ratio of the number of favorable to possible outcomes follows from the principle of indifference and from the axiom of additivity.[28]

[26] In this section, 'principle of indifference' is to be understood as the weak principle of indifference.

[27] I believe that both the first and second axioms have a somewhat conventional nature. As for the first axiom, negative values, for example, might be employed instead (and a probability theory could be developed nonetheless; it would just be more complex). As for the second, it seems natural to assign a definite probability-value to the universal event (notice that the choice of one as probability-value is conventional), which represents, loosely speaking, certainty. But be that as it may, I'll charitably presuppose them.
How about Logical Probability? This difficulty presents itself with the same force, although in an (apparently) different form. Consider Carnap's mathematical definition of probability[29] (which he calls "degree of confirmation", c, of a hypothesis h given certain evidential knowledge e):

    c(h, e) = m(h ∧ e) / m(e)   (6)

It is straightforward that c(·,·) satisfies the axiom of additivity iff m(·), which Carnap calls a measure function, is additive, that is

    m((h′ ∨ h″) ∧ e) ≡ m((h′ ∧ e) ∨ (h″ ∧ e)) = m(h′ ∧ e) + m(h″ ∧ e)   (7)

So the issue is really why the function m(·) should be additive. Within Carnap's theory m(·) assigns "weights" to the possible states of the world.[30] Given a finite set of predicates and a finite set of individuals, a state of the world is an assignment of each predicate, or its negation, to every individual constant. For instance, consider a "world" with one predicate, F, and two individuals, a and b; its states are Fa & Fb, Fa & ¬Fb, ¬Fa & Fb and ¬Fa & ¬Fb.

[28] Again, assuming both the first and the second axioms as given.

[29] Carnap (ibid.).

[30] Essentially, m(·) assigns unconditional probabilities to the states of the world.
Carnap examines two measure functions among those possible. Let's consider Carnap's favorite,[31] m*(·), which assigns identical weights to the same "structures". A structure is a set of states of the world whose elements can be made identical by a certain permutation of their individuals. For instance, Fa & ¬Fb and ¬Fa & Fb belong to the same structure (one individual is F, one individual is not F). The idea is to assign equal weight, through m*(·), to each structure, and then to distribute each structure's weight equally among its states. Presumably, the even distribution of weight among structures (and then among their states) is justified by the fact that there is no reason to prefer one structure over another, hence by an appeal to the principle of indifference. In the case at hand there are three structures (see Fig. 1).

Figure 1: Structures in a World with Two Elements and One Predicate

    I   (one element is F, the other is ¬F):  Fa & ¬Fb;  ¬Fa & Fb
    II  (both elements are F):                Fa & Fb
    III (both elements are ¬F):               ¬Fa & ¬Fb

The principle of indifference allows us to assign equal weight to each structure, via m*(·):

    m*(I) = m*(II) = m*(III) = φ   (8)

But what is the value of φ? In substance, the situation is identical to that of Classical Probability sketched above. There is just no way of having a definite, unique value for φ (and hence for c) without postulating additivity.

[31] Carnap (ibid.: pp. 562-564).
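A small Python sketch of m* for this two-individual, one-predicate world (my illustration; note that the normalization of the structure weights so that they sum to one is precisely the additivity at issue, here smuggled in by hand):

    from fractions import Fraction
    from itertools import product

    # States of a world with individuals a, b and one predicate F:
    # each state assigns F (True) or not-F (False) to each individual.
    states = list(product([True, False], repeat=2))

    # A structure collects the states that coincide up to a permutation
    # of the individuals.
    def structure(state):
        return tuple(sorted(state))

    groups = {}
    for s in states:
        groups.setdefault(structure(s), []).append(s)

    # m*: equal weight phi per structure (phi = 1/3 presupposes additivity),
    # split equally among the structure's states.
    phi = Fraction(1, len(groups))
    m_star = {s: phi / len(g) for g in groups.values() for s in g}
    # m*(Fa & Fb) = 1/3; m*(Fa & ~Fb) = m*(~Fa & Fb) = 1/6; m*(~Fa & ~Fb) = 1/3

    # Degree of confirmation c(h, e) = m*(h & e) / m*(e), with
    # propositions modeled as sets of states.
    def m(prop): return sum(m_star[s] for s in prop)
    Fa = {s for s in states if s[0]}
    Fb = {s for s in states if s[1]}
    print(m(Fa & Fb) / m(Fa))   # c(Fb, Fa) = 2/3: one F-instance raises 1/2 to 2/3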
It remains to consider another probability function proposed by Carnap in a later work:[32]

    c_λ(h_i, e_i) = (s_i + λ/k) / (s + λ)   (9)

This equation applies to a logical world in which all of the individuals possess one and only one property out of a set of k exclusive properties {P_1, P_2, …, P_k}; h_i is the event that individual s + 1 has the property P_j; e_i is the knowledge that among the first s individuals, s_i have property P_j. The parameter λ balances two antithetic components of c_λ(h_i, e_i): an empirical component, which is "maximal" for λ = 0,

    c_0(h_i, e_i) = s_i / s   (10)

and an a priori, logical component, which is "maximal" when λ → ∞,

    lim_{λ→∞} c_λ(h_i, e_i) = 1/k   (11)

The empirical component is the so-called straight rule of induction, and it amounts to positing that the past and the future are alike. The a priori component is an instance of logical probability, as it was used in Carnap (1950); but it can also be interpreted as a classical probability, of course (there are k properties; since an individual must have one and only one property, the ratio of favorable to possible outcomes is 1/k). Now, if the logical component expresses a definite numerical logical (and classical) probability, the axiom of additivity must be presupposed (see above the case of logical probability).

[32] Carnap (1952: p. 33).
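A short Python sketch of how λ interpolates between the two components (my illustration; the counts are hypothetical):

    def c_lambda(s_i, s, k, lam):
        # Carnap's continuum of inductive methods, equation (9).
        return (s_i + lam / k) / (s + lam)

    s_i, s, k = 6, 10, 4   # say: 6 of the first 10 individuals had P_j; k = 4
    for lam in (0.001, 1.0, 10.0, 1e6):
        print(lam, c_lambda(s_i, s, k, lam))
    # lam -> 0 approaches the straight rule s_i/s = 0.6 (equation 10);
    # lam -> infinity approaches the a priori value 1/k = 0.25 (equation 11).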
According to Kemeny (1955), the axiom of additivity – as well as a number of axioms unwarrantably presupposed by Carnap – can be justified within Logical Probability once probability-values are viewed as rational betting quotients (see section §4). Indeed, Kemeny is right that if probability-values within the Logical Theory of Probability are interpreted as betting quotients, then all the desired axioms follow (including the axiom of additivity). Unfortunately, we have seen in section §4 that within the Logical Theory of Probability probability-values cannot be meaningfully interpreted as betting quotients.

The axiom of additivity cannot be justified within a priori probability; yet it is a necessary ingredient, and it needs to be postulated, for a priori probability to work at all.[33] But why must probability-values of incompatible outcomes be additive?

[33] One might argue that the principle that probability equals the ratio of favorable to possible outcomes should be employed as an axiom in place of the principle of indifference. In this way additivity would hold "for free". The problem with this position is of course that whereas there seem to be good reasons to admit the principle of indifference as basic, it is not clear why the principle that probability equals the ratio of favorable to possible outcomes should be presupposed.
6. Probability and Induction
Carnap considered the construction of a probabilistic system of inductive logic the most
important outcome of his theory of Logical Probability; and so does Hintikka. In general,
philosophers of science have seen probability as the solution to the problem of induction
and of confirmation of scientific laws. It therefore seems appropriate to discuss the
possibility and legitimacy of probabilistic inductive systems, at least to the extent that a
work whose main focus is not induction allows.
It is well known that Hume first criticized induction in its "deterministic" form[34] (deterministic induction henceforth), by means of a threefold argument:[35] (i) it is logically possible that 'the course of Nature may change', hence no a priori justification can guarantee the soundness of inductive generalizations; (ii) the principle that the future will resemble the past could guarantee the correctness of inductive generalizations, but this principle is itself inductive and therefore would require the very same justification that it is supposed to provide; (iii) since an inductive generalization goes beyond what is observed, it cannot be justified empirically. Thus deterministic induction is unjustified and groundless.

The "probabilization" of induction may be seen precisely as a reaction to Hume's criticism.[36] It is true that deterministic induction is untenable, but Hume's critique seems to leave open the possibility of a weaker form of induction, of a weak inductive justification: inductive generalizations are not a priori certain, they are merely probable, and it may turn out that they are mistaken (thus accommodating the possibility of a change in natural laws). This approach to induction has been endorsed in three variants. Under the first, inductive generalizations are assigned well-defined probability-values. This is the view of C. D. Broad and Keynes.[37] Under the second, numerical probability-values are replaced by qualitative degrees of support. This is the view of Max Black and P. F. Strawson.[38] In the third variant, whose main proponents are Carnap[39] first and Hintikka later, inductive generalizations are possible only within a system of probabilistic inductive logic.

Thus probabilistic induction has been seen as a better answer to the problem of induction than deterministic induction ever was. It seems to me, however, that this stand misses the point and that, in fact, probabilistic induction is exactly as groundless and unjustified as deterministic induction is.[40]

The chief desideratum for a probabilistic inductive system is that it succeed at confirming, probabilistically, universal hypotheses on the basis of appropriate evidence. For instance, the (universal) hypothesis that 'all As are Bs' must receive increasing probability for every new instance of an A which is also a B. Roughly speaking, if the essence of deterministic induction is that the future will resemble the past, the essence of probabilistic induction is that the future will probably resemble the past. But in what way is probabilistic induction supposed to be better justified and more solid than deterministic induction?

[34] That is, the view that inductive generalizations follow with certainty from their premises. It is still debated whether Hume also meant to criticize induction in its probabilistic form; but I shall not be concerned with this scholarly issue.

[35] Hume (1748).

[36] See Lakatos' enlightening discussion (1965: pp. 321-323).

[37] Broad articulates this position explicitly in (1918: p. 391). Keynes assumes it implicitly throughout parts III and V in (1921). Also A. J. Ayer (1952: p. 72) seems to accept it, although less forcefully.

[38] Black (1981: pp. 61-88), Strawson (1952: pp. 237, 244).

[39] Probability aside, Carnap (1950: p. v) also believed in the intrinsic superiority of a logical, formal inductive system over an informal inductive system, the idea being that since inductive logic is analytic (qua logic, presumably) it allows the formulation of a priori justified inductive generalizations.

[40] I do not believe, however, that Goodman's famous criticism (the "New Riddle of Induction") poses a genuine hurdle to deterministic and probabilistic induction, as is widely accepted nowadays.
It is clear that probabilistic induction – like deterministic induction – cannot be justified empirically: the former, like the latter, goes beyond experience and therefore cannot be supported by it. It follows that probabilistic induction must be justified a priori, logically. The fact that probabilistic induction accommodates the Humean challenge of a logically possible change in the laws of nature allows one to reject Hume's argument (applied to probabilistic induction) that induction cannot be logically justified. Now, it is true that this leaves open the possibility of an a priori justification; but this hope is soon frustrated by a simple consideration. From a logical point of view the principle that the future will probably resemble the past is on par with the principle that the future will probably not resemble the past; for I do not see why, on purely a priori grounds, one should be preferred to the other. Therefore, since these principles are in contradiction, neither one can be justified a priori. Probabilistic induction can be justified neither empirically nor logically, and it is not a better answer to the problem of induction than deterministic induction is.
7. Final Remarks

There is an almost unanimous consensus in the literature that the application of the principle of indifference to certain infinite domains leads to unacceptable consequences. Indeed, it is contended that such consequences count as a reductio ad absurdum. I have shown in §2 that this criticism is mistaken: the problem that underlies the Paradox of Inverse Probabilities does not constitute a conclusive argument against the Theory of Classical Probability.

In §5 we have seen that the principle of indifference cannot serve alone as a basis for a priori probability. And this seems to me an insurmountable difficulty for the theory, for it is not clear what other additional principle or law could be legitimately postulated (granting that the adoption of the principle of indifference can be justified through its intuitive appeal, on what grounds should we accept additional principles?). The principle of indifference together with the axiom of additivity would represent a solid foundation for probability; unfortunately it is not at all clear how to justify the latter.

In §4 I have examined what I've called 'the problem of the meaning of probability-values'. The latter, we have seen, amounts to the difficulty of individuating a reasonably well-defined, non-trivial meaning for statements of the form 'p is the a priori probability-value of an event H (given evidence E)'. In my analysis, none of the candidates for a meaning/interpretation of probability-values seems acceptable. This, I believe, is a serious difficulty for the Theory of A Priori Probability.
Chapter 2: the Subjectivist Interpretation of Probability
1. Introduction: Qualitative Degrees of Belief
The Subjectivist Theory of Probability (subjectivism) equates the probability of occurrence of an event with the strength of an agent's belief that that event will occur; e.g., the probability, relative to an agent A, that tomorrow it will rain is A's degree of conviction that tomorrow it will rain. Thus probability is agent-dependent: there is no such thing as objective probability.[41]

That beliefs come in comparative degrees of strength is obvious enough. For instance, my belief that the sun will rise tomorrow is stronger than my belief that the weather will be, say, good tomorrow. However, subjectivists require that the strength of a belief be expressed quantitatively, as a number; if to a belief A corresponds a degree of belief p and to a belief B corresponds a degree of belief q, then A is believed more strongly than B just in case p > q.

All subjectivists hold that degrees of belief may be computed in such a way as to satisfy Kolmogorov's axioms of probability: if T is a tautology, one's degree of belief in T is one; if A and B are mutually exclusive events then one's degree of belief in A plus one's degree of belief in B is equal to one's degree of belief in A-or-B; lastly, one's degree of belief in any proposition is non-negative. While all subjectivists hold that degrees of belief may be computed in such a way as to satisfy Kolmogorov's axioms, the vast majority of subjectivists also believe in Probabilism, the thesis that degrees of belief must, on pain of irrationality, be computed in such a way as to satisfy Kolmogorov's axioms.

[41] That is, a probability computable through a widely accepted algorithm.
The proponents of Probabilism support their position essentially by means of either of two arguments: the Dutch Book Argument (in one of its variants) and the Representation Theorems Argument (in one of its variants). I shall argue that neither the Dutch Book Argument (examined in §2) nor the Representation Theorems Argument (examined in §3) proves that numerically formulated degrees of belief must conform to Kolmogorov's axioms.
Bayesian Confirmation Theory is a subjectivist doctrine according to which one
should update one’s degrees of belief about empirical hypotheses by means of Bayes’
Theorem. It is fair to say that Bayesian Confirmation Theory represents the most
important application or outcome of subjectively conceived probability, and that virtually
any subjectivist endorses it or one of its variants (cf. also Intro. §3). After a sketch of this
doctrine (§5) I shall discuss two criticisms (§6, §7) that are, in my view, fatal to it.
For the sake of simplicity, throughout the chapter I shall use the word
‘probability’ as meaning numerically formulated degree of belief.
2. The Dutch Book Argument
The Dutch Book Argument (DBA hereafter) relies on a pair of theorems, the so-called Dutch Book Theorem and its converse – call it the Converse Dutch Book Theorem. In the first part of this section I deal with these theorems; in the second part I turn to the DBA.
Both the Dutch Book Theorem and the Converse Dutch Book Theorem presuppose that an agent's probabilities are determined through monetary bets. Let γ be an event. An agent's probability for γ is determined in the following way. The agent is forced, somehow, to fix the price of a bet that pays a sum of money (positive or negative[42]) S if γ occurs and zero otherwise. Subsequently, by requiring that the price of such a bet equal p_γ · S, one calculates p_γ, that agent's probability for γ.[43] Hereafter, I shall refer to this way of determining probabilities as the 'standard way'.

The Dutch Book Theorem states that if an agent A's probabilities (determined in the standard way) violate Kolmogorov's Axioms, then there exists a sequence of transactions of the relevant bets (i.e. of those bets that correspond to A's probabilities) that assures a monetary loss to A. Here is a sketch[44] of the proof of the Dutch Book Theorem.

∀χ, p_χ ≥ 0 (non-negativity): suppose that for an event χ, p_χ < 0; then A is prepared to sell a bet on χ at the price p_χ · S; that is, A is prepared to pay somebody |p_χ| · S to take a bet that will pay at a minimum zero; A's loss is therefore assured.

If τ is a sure event, p_τ = 1 (normalization): suppose that p_τ < 1; then A is prepared to sell for p_τ · S a bet on τ that certainly pays S – A's loss is therefore assured; on the other hand, suppose that p_τ > 1; then A is prepared to buy for p_τ · S a bet on τ that will certainly pay S, and this is a sure loss.

If α and β are mutually exclusive events, p_α + p_β = p_{α∨β} (additivity): suppose that p_α + p_β < p_{α∨β}; then A is prepared to sell the bet on α for p_α · S, the bet on β for p_β · S, and to buy them back as the bet on α∨β for p_{α∨β} · S; in which case A incurs a sure loss, since she paid S · (p_{α∨β} − p_α − p_β) but the monetary value of the bet on α∨β equals the sum of the monetary values of the individual bets on α and β[45] (likewise it's easy to show that if p_α + p_β > p_{α∨β}, A's loss is assured). This concludes the sketch of the proof.

The Converse Dutch Book Theorem states that if an agent A's probabilities (determined in the standard way) comply with Kolmogorov's Axioms, then there doesn't exist a sequence of transactions of the relevant bets (i.e. of those bets that correspond to A's probabilities) that assures a monetary loss to A. The proof of the Converse Dutch Book Theorem is fairly elaborate and it will not be given here.[46] Combining the Dutch Book Theorem and the Converse Dutch Book Theorem one gets the following theorem – call it the Complete Dutch Book Theorem: an agent's probabilities (determined in the standard way) comply with Kolmogorov's axioms if, and only if, there doesn't exist a sequence of transactions of the relevant bets that assures a monetary loss to that agent. Notice that the Dutch Book Theorem, the Converse Dutch Book Theorem and the Complete Dutch Book Theorem are all – qua theorems – guaranteed to be valid.

[42] To be paid a negative sum of money means to be deprived of that money.

[43] Typically this procedure is considered a definition of subjective/epistemic probabilities (cf. for example De Finetti (1990: p. 62)), while its role as a computational algorithm is viewed as secondary. However, in my argument this procedure plays merely the role of a computational algorithm, and thus I shall ignore its main role as a definition.

[44] S is assumed to be positive. The reader can easily construct a similar argument for S negative.

[45] One can merge the bet on α with the bet on β to form the bet on α∨β (and vice versa).

[46] A detailed proof can be found in Kemeny (1955).
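A minimal Python rendering of the additivity step of the proof (my illustration; the quotients are hypothetical):

    S = 100.0   # payoff of each winning bet

    # Agent A's betting quotients violate additivity for the mutually
    # exclusive events alpha and beta: p_alpha + p_beta < p_or.
    p_alpha, p_beta, p_or = 0.3, 0.3, 0.8

    # A sells the bets on alpha and on beta, and buys back the bet on
    # alpha-or-beta, all at her own quoted prices.
    cash = (p_alpha + p_beta) * S - p_or * S   # = -20.0 for A

    # Settle the bets under every possible outcome (alpha, beta exclusive).
    for alpha, beta in [(True, False), (False, True), (False, False)]:
        settlement = -S * alpha - S * beta + S * (alpha or beta)   # always 0
        print(alpha, beta, cash + settlement)   # -20.0 every time: a sure loss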
I now turn to the DBA. Per se, the DBA is a concise argument: since an agent's probabilities have to be determined in the standard way, and since an agent who performs an act whose only consequence is a sure monetary loss is acting irrationally, then by the Complete Dutch Book Theorem an agent whose probabilities don't conform to Kolmogorov's Axioms is irrational, and an agent whose probabilities conform to Kolmogorov's Axioms is, insofar as her probabilities are concerned, rational. In essence, the conclusion of the DBA is that having probabilities that comply with Kolmogorov's Axioms is part of what we usually call 'being rational'.

The DBA has been criticized for a number of different reasons. However, none of the criticisms that have been advanced over time seems to be conclusive. I cannot undertake an extensive discussion of all such criticisms here, but let me briefly consider the main ones. Undoubtedly, the most famous criticism advanced against the DBA is the following: not minimizing one's losses (other things being equal) is compatible with the laws of logic; hence the DBA does not prove that it is irrational to violate Kolmogorov's Axioms. Suppose that this criticism is sound (and I believe it is). Still, the DBA effectively supports the weaker claim that an agent's probabilities should conform to Kolmogorov's Axioms on pain of "violating" human nature. Thus if one takes the DBA to merely support this claim, the DBA is unproblematic. A second important line of criticism holds that since in certain circumstances a rational agent would fix a price h for a bet on α∨β, where α and β are mutually exclusive, and prices k and l for the individual bets on α and β such that h ≠ k + l, then, in those circumstances, Kolmogorov's Axioms fail. Intuitively speaking, the problem is that sometimes money and utility are not
proportional to each other. Now I don't deny that there is something to this argument, but it is far from clear that the latter constitutes a fatal difficulty for the DBA. Indeed, it seems that it is always possible to reestablish the proportionality between money and utility. Consider the following case due to Maher (1993). A man has in his pockets only sixty cents and cannot afford to buy the one-dollar ticket for a bus that would take him home. Would this man pay sixty cents for a bet that pays one dollar just in case a coin lands heads? He would indeed, for winning the bet would allow him to get on the bus and go home. Likewise the man would also pay sixty cents for a bet that pays one dollar just in case the coin lands tails. This means that the probability that the coin lands heads or tails is 1.2, in violation of the normalization axiom. Sure, but one can readily fix the situation either by lowering the payoffs of the single bets to, say, ninety cents or by giving the man a dollar (either action will reestablish the proportionality between money and utility for the man). The third criticism I wish to briefly examine is that determining the probability of an unverifiable event through a bet is pointless, for clearly the payoff of such a bet could never be cashed. Whereas this is a problem for the DBA, it is certainly not a fatal one. All this criticism achieves is to limit the application of the DBA to verifiable events (which are the vast majority anyway). And, at any rate, it is a significant fact that in most cases the DBA guarantees that probabilities must conform to Kolmogorov's Axioms. Lastly, consider the most obvious objection that one could raise against the DBA: why should probabilities be determined by way of bets? There is surely something to this objection, but it is unclear that it is fatal. After all, monetary bets
constitute the best way to measure the strength of an agent's beliefs directly and without relying on testimony.
So far, every attempt at rejecting the DBA failed: every time, a comparatively
simple response was sufficient to meet (partially or fully) the challenge. But the DBA is
in fact unsound. In the remainder of this section I shall examine an argument to this
effect. It will be seen that my argument cannot be neutralized by way of a simple
response.
We have seen that the DBA assumes that an agent's probabilities have to be determined by way of a specific two-step procedure – which I called the 'standard way' of determining probabilities. Let χ be an event, S a sum of money and A an agent. First, A fixes the price of a bet that pays S if χ occurs and zero otherwise; call such a price S_χ. Second, p_χ, the probability of χ relative to A, is determined by imposing the condition

    S_χ = p_χ · S   (12)

But suppose that the second step of this procedure is replaced with the following: p̄_χ, the probability of χ relative to A, must satisfy the condition

    S_χ = p̄_χ^3 · S   (13)

Now let P be the probability assignment p_χ, p_γ, p_δ, p_ε, … determined by condition (12), and P̄ the probability assignment p̄_χ, p̄_γ, p̄_δ, p̄_ε, … determined by condition (13). If we determine an agent's probabilities in the standard way – as in the DBA – then by the Complete Dutch Book Theorem (and assuming that deciding to endure a monetary loss is irrational) P, which in this scenario represents the agent's probabilities, must conform to Kolmogorov's Axioms on pain of irrationality. But what if we impose condition (13) instead? It can be proven that if we determine probabilities by way of this condition, then (assuming that deciding to endure a monetary loss is irrational) P̄, which in this scenario represents the agent's probabilities, must conform to the following three axioms on pain of irrationality (here α and β are any two incompatible events, and ω is a sure event):[47]

    ∀χ, p̄_χ ≥ 0   (14)

    p̄_ω = 1   (15)

    p̄_{α∨β}^3 = p̄_α^3 + p̄_β^3   (16)

where (16) is obviously incompatible with Kolmogorov's axiom of additivity. Thus, that probabilities must be determined in the standard way is a crucial premise of the DBA, a premise without which it is perfectly legitimate for a rational agent to have non-Kolmogorovian probabilities. But this premise is false, and therefore the DBA is unsound. In the remainder of this section I shall examine, and reject, two arguments to the effect that probabilities have to be determined in the standard way, by imposing the condition that bet prices are proportional to probabilities.

[47] This theorem is a variant of the Complete Dutch Book Theorem. And, like the latter, it consists of two sub-theorems: a variant of the Dutch Book Theorem and a variant of the Converse Dutch Book Theorem. The proof of the variant of the Dutch Book Theorem is given in Appendix A.
Suppose an agent A believes equally that an event λ will occur and that λ will not occur (i.e. A believes equally in λ and in ¬λ). Then A fixes the very same price b for a bet that pays S if λ occurs (and zero otherwise) and for a bet that pays S if λ does not occur (and zero otherwise). Since the price of the bet on the sure event λ∨¬λ is S on pain of irrationality,[48] and since the price of the bet on λ∨¬λ equals the sum of the prices of the bets on λ and on ¬λ, the following equation holds:[49]

    b = 0.5 · S   (17)

Now, p̄_λ must satisfy condition (13), and because of (17), p̄_λ ≈ 0.794. On the other hand, p_λ must satisfy (12), and because of (17), p_λ = 0.5. But 0.5 seems a "better" probability-value for λ than 0.794 is. After all, shouldn't the probability of any event whose occurrence and non-occurrence one equally believes be 0.5? Therefore bet prices should be proportional to probabilities.

[48] Suppose that the price of the bet on λ∨¬λ is S′ < S; then an agent is prepared to sell that bet for S′, thus incurring the sure loss S − S′. Suppose, on the other hand, that the price of the bet on λ∨¬λ is S″ > S; then an agent is prepared to buy that bet for S″, thus incurring the sure loss S″ − S.

[49] Cf. footnote 45.
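A two-line check of the two competing probability-values (my illustration):

    S = 1.0
    b = 0.5 * S                     # the price fixed by equation (17)
    p_standard = b / S              # condition (12): b = p * S    -> 0.5
    p_cubic = (b / S) ** (1 / 3)    # condition (13): b = p**3 * S -> ~0.794
    print(p_standard, round(p_cubic, 3))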
Now, this rationale might seem intuitively plausible or appealing, but in fact it is
faulty. Neither the fact that the event λ∨¬λ is certain nor the fact that the probabilities of λ and ¬λ are equal implies that the probability of λ is 0.5. If one is led to think that 0.5 is
the appropriate probability for λ, it is because one already has inadvertently assumed that
the probability of λ and the probability of ¬λ must add up to one; that is, because one has
inadvertently, implicitly assumed Kolmogorov’s axiom of additivity in the first place.
Frank Ramsey commits precisely this mistake in his famous essay “Truth and
Probability”: within an argument supposed to prove that probabilities must conform to
Kolmogorov’s Axioms, Ramsey takes for granted that the probability of an event whose
occurrence and non-occurrence an agent equally believes must be 0.5.
The second argument is an argument from simplicity. The proportionality between bet prices and probabilities should be preferred to any other numerical relation in virtue of its simplicity. There are at least two reasons to reject this argument. The first, obvious reason is that an argument from simplicity is ipso facto extremely weak. The second reason is subtler. The degree of simplicity of a specific relation between bet prices and probabilities merely mirrors the degree of simplicity of the additivity axiom that corresponds to that relation: if for all χ the price of the bet on χ equals f(p^f_χ) · S (where f(·) is a "well-behaved" function and p^f_χ is just f⁻¹(p_χ)), the additivity axiom is (α and β are any two incompatible events):[50]

    f(p^f_{α∨β}) = f(p^f_α) + f(p^f_β)   (18)

The simplicity of the relation of proportionality between bet prices and probabilities merely mirrors the simplicity of Kolmogorov's axiom of additivity. It therefore seems inappropriate to defend the superiority of this relation on grounds of simplicity (one might as well directly defend the superiority of Kolmogorov's axiom of additivity on grounds of simplicity).

I conclude that the DBA is unsound and that, therefore, it offers no support to the thesis that probabilities must comply with Kolmogorov's Axioms.

[50] The proof of this statement is given in Appendix B.
3. The Representation Theorems Argument
The first representation theorem was proved by Savage (1954). Subsequently, variants of
Savage’s original theorem have been proved by Luce and Krantz (1971) and Maher
(1993). Although there are minor differences among representation theorems, these theorems are all fundamentally equivalent (at least from a philosopher's perspective). In this section I shall focus on a typical representation theorem. Let Γ be a set of mutually exclusive states, {S_1, S_2, …, S_k, …}, and Π a set of outcomes, {ω_1, ω_2, …, ω_k, …}. Let a lottery be a function from Γ to Π; that is, a lottery associates to each state an outcome (I designate lotteries with lowercase letters – a, b, c, etc.). Now let 'a ≼ b' stand for 'lottery a is not preferred to lottery b'.[51] Lastly, let the Expected Utility of a lottery x, EU(x), be defined in the following way:

    EU(x) ≡ Σ_i u[x(S_i)] · π(S_i)   (19)

where u[·] is a utility function[52] from a set U of linearly dependent utility functions[53] and π(·) is any probability assignment that conforms to Kolmogorov's axioms.[54] Then the following representation theorem holds.

If the preference relation '≼' complies with certain rationality constraints,[55] then there are a probability assignment π(·) and a utility function u[·] such that for any two lotteries a and b: a ≼ b if, and only if, EU(a) ≤ EU(b). Furthermore, π(·) is unique.

[51] An agent is indifferent between two lotteries a and b if, and only if, a ≼ b and b ≼ a.

[52] u[·] associates to any outcome ω_i a unique real number u[ω_i].

[53] That is, any utility function u[·] of U satisfies the equation u[·] = C · u′[·], where C ∈ ℝ and u′[·] ∈ U.

[54] Let Ω be the disjunction of all the states, i.e. Ω ≡ S_1 ∨ S_2 ∨ S_3 ∨ …; then π(Ω) = 1 (normalization). For any state S_i, π(S_i) ≥ 0 (non-negativity). For any two states S_i and S_j, π(S_i) + π(S_j) = π(S_i ∨ S_j) (additivity).

[55] Two of the rationality constraints are: for any two lotteries, either the first is not preferred to the second or the second is not preferred to the first, or both (trichotomy); the preference relation '≼' must be transitive. The other rationality constraints require an extensive presentation, which is outside the scope of this paper. For a complete and accessible discussion see Maher (1993: Ch. 8) and Jeffrey (1983: Ch. 9). Luce and Krantz (1971) is a more technical, and demanding, exposition.
Let’s briefly examine the meaning and the significance of the representation theorem I
have enunciated by way of an example. Consider a system, or state of affairs ∑ that is in
three possible (and mutually exclusive) states
!
S
1
,
!
S
2
,
!
S
3
(i.e. ∑ is either in state
!
S
1
or in
!
S
2
or in
!
S
3
). The lotteries a, b, c,… each associate to the three states of ∑ an outcome. a
associates the outcomes
!
"
2
,
!
"
3
,
!
"
7
to
!
S
1
,
!
S
2
,
!
S
3
respectively; b associates the outcomes
!
"
4
,
!
"
4
,
!
"
9
to
!
S
1
,
!
S
2
,
!
S
3
respectively; and so on for each lottery. Any agent values an
outcome according to a utility function
!
u["]; e.g. the value of
!
"
1
is
!
u["
1
], the value of
!
"
2
is
!
u["
2
], and so on. Now, the representation theorem above simply guarantees, under
certain mild rationality constraints, that i) there is a unique probability assignment to the
states
!
"(S
1
),
!
"(S
2
) ,
!
"(S
3
) and that there is at least one function
!
u["] (in fact if there is
one it’s trivial to prove that there are infinitely many) such that ii) for any two lotteries x,
y an agent prefers y to x, or is indifferent between x and y, (in symbols
!
x" y) just in case
EU(x)
!
"EU(y), that is just in case the weighted sum of the utilities associated to the three
outcomes that correspond to the states
!
S
1
,
!
S
2
,
!
S
3
through x does not exceed the weighted
sum of the utilities associated to the three outcomes that correspond to
!
S
1
,
!
S
2
,
!
S
3
through
y; the two sums are weighted by the (unique) probabilities associated to each state (i.e.
!
"(S
1
),
!
"(S
2
) ,
!
"(S
3
) ). For example, the theorem guarantees that
!
a"b just in case EU(a)
is not larger than EU(b), that is, just in case
extensive presentation, which is outside the scope of this paper. For a complete and
accessible discussion see Maher (1993: Ch. 8) and Jeffrey (1983: Ch. 9). Luce and Krantz
(1971) is a more technical, and demanding, exposition.
59
!
u["
2
]#(S
1
) +u["
3
]#(S
2
) +u["
7
]#(S
3
)$
!
u["
4
]#(S
1
) +u["
4
]#(S
2
) +u["
9
]#(S
3
) (20)
for some utility function u[·].
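A small numerical sketch of this comparison (my illustration; the probability and utility values are hypothetical, since the theorem asserts only their existence):

    # Hypothetical unique probabilities of the three states.
    pi = {"S1": 0.2, "S2": 0.5, "S3": 0.3}

    # Hypothetical utilities of the outcomes mentioned in the example.
    u = {"w2": 1.0, "w3": 4.0, "w7": 0.0, "w4": 2.0, "w9": 3.0}

    # The lotteries a and b of the example, as maps from states to outcomes.
    a = {"S1": "w2", "S2": "w3", "S3": "w7"}
    b = {"S1": "w4", "S2": "w4", "S3": "w9"}

    def EU(lottery):
        # Expected utility, equation (19).
        return sum(u[lottery[s]] * pi[s] for s in pi)

    print(EU(a), EU(b))   # 2.2 and 2.3: with these values, a is not preferred to b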
As for the significance of the representation theorem above: the theorem – just like all representation theorems – simply guarantees, under certain rather weak conditions, that an agent's qualitative preferences among lotteries may be represented numerically only by way of a unique probability assignment. That is, representation theorems guarantee that if certain mild conditions are respected, then an agent's qualitative preferences among combinations of uncertain outcomes – lotteries are just a general, abstract way of modeling combinations of uncertain outcomes – can be represented numerically only by way of a unique probability assignment. Now to the Representation Theorems Argument (RTA) proper.
The RTA is not directly based on representation theorems, but on the
contrapositive of their existence component. The contrapositive of the existence
component of the representation theorem above is the following.
If there are no probability assignment $P(\cdot)$ and utility function $u[\cdot]$ such that, for any two lotteries a and b, $a \preceq b$ if and only if $EU(a) \leq EU(b)$, then some rationality constraints have been violated.
Representation theorems and their contrapositives are guaranteed, qua theorems, to be
valid. Finally, the following is a typical statement of the RTA. Suppose that an agent’s
probabilities do not comply with Kolmogorov’s Axioms. Then (for that agent) there isn’t
a probability assignment $P(\cdot)$ such that, for any two lotteries a and b, $a \preceq b$ if and only if $EU(a) \leq EU(b)$. But then, by way of the contrapositive of the existence component of the
representation theorem, that agent’s preferences violate some rationality constraints.
Hence an agent whose probabilities do not comply with Kolmogorov’s Axioms is
irrational.
56

As it is, the RTA is flawed. The fact that (i) an agent's probabilities do not comply with Kolmogorov's Axioms does not imply that (ii) for that agent there isn't a probability assignment $P(\cdot)$ such that, for any two lotteries a and b, $a \preceq b$ if and only if $EU(a) \leq EU(b)$. This is immediate. Suppose that the preferences of an agent A comply
with the rationality constraints of a representation theorem. Then, by a representation
theorem (ii) is false (relatively to A). On the other hand, nothing forbids one to represent
A’s degrees of beliefs by way of an assignment
!
"
#
($)% f "($)
( )
that does not comply with
Kolmogorov’s axioms (indeed, nothing forbids one to represent A’s lottery values –
utilities hereafter– by way of the pair
!
"
#
($), EU
!
*
(x), where the latter is
EU
!
*
(x)" u[x(S
i
)]# f
$1
%
*
(S
i
)
( )
i
&
(21)
since from the definition of
!
"
#
($) and from (19)
EU
!
*
(x) =
!
u[x(S
i
)]" f
#1
f $(S
i
)
( ) ( )
i
%
= EU(x)) (22)
Let $P^*(\cdot)$ therefore represent A's degrees of belief. (i) is then true (relatively to A). It follows that (i) does not imply (ii).
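The move is easy to exhibit numerically. In the sketch below (reusing the example's hypothetical numbers and choosing, for illustration, $f(p) = p^2$ as the monotone transformation), $P^*(\cdot)$ violates the axioms –its values no longer sum to one– yet by (21) and (22) every expected utility, and hence every preference, is untouched:

import math

P = {"S1": 0.2, "S2": 0.5, "S3": 0.3}   # Kolmogorov-compliant assignment
u = {"w2": 1.0, "w3": 4.0, "w7": 2.5, "w4": 3.0, "w9": 0.5}
a = {"S1": "w2", "S2": "w3", "S3": "w7"}
b = {"S1": "w4", "S2": "w4", "S3": "w9"}

f = lambda p: p ** 2            # an illustrative monotone transformation
f_inv = lambda q: math.sqrt(q)  # its inverse

# P_star = f(P): its values sum to 0.38, so normalization fails.
P_star = {s: f(P[s]) for s in P}

def EU(x):
    return sum(u[x[s]] * P[s] for s in P)

def EU_star(x):
    # Equation (21): weights are f^{-1}(P_star), which undoes the transformation.
    return sum(u[x[s]] * f_inv(P_star[s]) for s in P)

print(round(sum(P_star.values()), 2))      # 0.38, not 1
print(math.isclose(EU(a), EU_star(a)),     # True: equation (22)
      math.isclose(EU(b), EU_star(b)))     # True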
Zynda (2000) and Christensen (2001) offer a reconstruction of the RTA that
differs from mine in a rather important aspect. In their reconstruction of the argument (i)
(an agent’s probabilities do not comply with Kolmogorov’s Axioms) is replaced by (i′):
56
Some critics of the RTA have argued that some of the rationality constraints of a
representation theorem are not genuine conditions of rationality: an agent may violate
some of these constraints and at the same time be perfectly rational. For the sake of the
argument I shall ignore this criticism.
an agent’s real probabilities do not comply with Kolmogorov’s Axioms. Now, if (i′)
implies (ii), Zynda’s and Christensen’s version of the Representation Theorem Argument
is sound. But that is not the case. In the context of representation theorems probabilities
are just numerical representations of qualitative degrees of belief. Hence, only qualitative
degrees of belief may or may not be real (or actual, true, etc.). The locution ‘real
probabilities’ is meaningless –it’s a category mistake. If anything, Zynda and Christensen
must mean that an agent’s probabilities are unique. Assuming that they do, the
contrapositive of ‘(i′) implies (ii)’ reads: suppose that, for a certain agent, there exists a
probability assignment $P(\cdot)$ such that for any two lotteries a and b, $a \preceq b$ if and only if $EU(a) \leq EU(b)$; then that agent's unique probabilities do comply with Kolmogorov's
"EU(b); then that agent’s unique probabilities do comply with Kolmogorov’s
Axioms. But this entailment is clearly false. The mere fact that the degrees of belief and
the utilities of an agent are representable by way of
$P(\cdot)$ and $EU(\cdot)$ neither implies that that agent's probabilities must comply with Kolmogorov's Axioms nor that that agent's probabilities are unique (for instance, the agent's degrees of belief may be represented by $P^*(\cdot)$ –and her utilities by $EU^*(\cdot)$).
Zynda (ibid.) has suggested that if one were able to show that the pair $P(\cdot)$, $EU(\cdot)$ is the best representation of degrees of belief and utilities, then the RTA would be a sound argument. Setting aside that it's not at all clear what it means that a representation of degrees of belief and utilities is the best one, Zynda's suggestion, in my view, misses the point. Consider the problematic portion of the RTA, the implication from (i) to (ii). Even assuming that $P(\cdot)$, $EU(\cdot)$ is the best representation of degrees of belief and utilities –whatever that means– the fact that an agent's degrees of belief are not represented by $P(\cdot)$ does not imply that that agent's degrees of belief cannot be represented by $P(\cdot)$. The reason is exceedingly simple. Suppose that the pair $P^*(\cdot)$, $EU^*(\cdot)$ represents an agent's degrees of belief and utilities. In the context of representation theorems the word 'best' (and 'superior', 'worst', etc.) as applied to representations is meaningless: representations are all equally good. It follows that $P^*(\cdot)$, $EU^*(\cdot)$ and $P(\cdot)$, $EU(\cdot)$ are equivalent, which means that that agent's degrees of belief could also be represented by way of $P(\cdot)$, $EU(\cdot)$.

The RTA is a sound argument only if (i) implies (ii). In turn, the implication from (i) to (ii) is true only if the possibility of representing an agent's degrees of belief and utilities by way of a representation different from $P(\cdot)$, $EU(\cdot)$ excludes the possibility of representing that agent's degrees of belief and utilities by way of $P(\cdot)$, $EU(\cdot)$. And this is not the case: there are infinitely many representations of degrees of belief and utilities which are all perfectly compatible with (indeed equivalent to) $P(\cdot)$, $EU(\cdot)$.⁵⁷ I conclude
that the RTA is a faulty rationale and that, therefore, the RTA can offer no support to the
thesis that probabilities must conform to Kolmogorov’s Axioms.
4. Remarks on the DBA and the RTA
It is fair to say that Probabilism (the view that degrees of belief must comply with
Kolmogorov’s axioms) has contributed to much of the appeal of the subjective/epistemic
57
Proof. Let $P_\lambda(\cdot) \equiv f_\lambda(P(\cdot))$, where $\lambda \in \Lambda$, an index set, and $f_\lambda(\cdot)$ is such that for any two states $S_h$ and $S_k$, $P_\lambda(S_h) \geq P_\lambda(S_k)$ if and only if $P(S_h) \geq P(S_k)$. Now let $EU_\lambda(x) \equiv \sum_i u[x(S_i)] \cdot f_\lambda^{-1}\big(P_\lambda(S_i)\big)$. It is immediate that for any lottery x and for any $\lambda \in \Lambda$, $EU(x) = EU_\lambda(x)$. Therefore, since there are infinitely many $f_\lambda(\cdot)$, the representation $P(\cdot)$, $EU(\cdot)$ is equivalent to and compatible with infinitely many representations (i.e. all of the pairs $P_\lambda(\cdot)$, $EU_\lambda(\cdot)$). QED
interpretation of probability theory: a system of subjective probabilities that
“automatically” comply with the axioms of probability theory is certainly more
interesting than a system of subjective probabilities that don’t. It is perhaps for this
reason that Probabilism is widespread among the proponents of the subjective/epistemic
interpretation.
If I am right, however, Probabilism is largely unsupported. The DBA and the
RTA are (by far) the main arguments for the thesis that probabilities must conform to
Kolmogorov’s Axioms –the main thesis of Probabilism. But both arguments are flawed.
A crucial premise of the DBA turns out to be unwarranted. And the RTA hinges on an
invalid implication.
5. Bayesian Confirmation Theory
The so-called Bayesian Confirmation Theory aims at providing a framework for the
confirmation (or disconfirmation) of uncertain hypotheses by means of empirical
evidence that is related to those hypotheses. The core tenets of this doctrine are
essentially two: first, degrees of belief must be formulated numerically in accordance with
Kolmogorov’s axioms (i.e. subjective probabilities are presupposed); second, degrees of
belief about hypotheses conditional on evidence must be calculated in accordance with
Bayes’s Theorem. Let H be a hypothesis and
$P(H)$ my degree of belief in it, and let E be evidence in agreement or disagreement with H and $P(E)$ my degree of belief in E. Then, by Bayes' Theorem, my degree of belief in H conditional on the manifestation of E, $P(H \mid E)$, must be computed as

$P(H \mid E) = \dfrac{P(E \mid H) \cdot P(H)}{P(E)}$ (23)
One of the main strengths of Bayesian Confirmation Theory (so it is claimed) lies
in a specific instance of (23). Let H be a universal hypothesis of the type ‘all As are Bs’
and let E be a confirming instance of H (i.e. that a certain A is also a B). Since E follows
logically from H, $P(E \mid H) = 1$. We can then re-write (23) as

$P(H \mid E) = \dfrac{P(H)}{P(E)}$ (24)

From the reasonable assumption that $0 < P(E) < 1$, and from assuming that $P(H) > 0$, it follows that $P(H \mid E) > P(H)$. Now suppose that H is a scientific theory; then, in general, if E is an instance of the theory H, then $P(H \mid E) > P(H)$, that is, E confirms H.
Thus Bayesian Confirmation Theory offers a simple solution to the ancient
philosophical problem of induction and of confirmation of scientific laws. But there is
more. Let A and B be two Bayesian agents⁵⁸ that are asked to evaluate the probability of
a set of mutually exclusive hypotheses and suppose that A assigns probability zero to an
hypothesis just in case B does; then it can be shown⁵⁹ that in the long run (that is, after a
sufficient number of belief-updates via Bayes’ Theorem) A and B assign nearly identical
probabilities to each hypothesis.
58
In general by ‘Bayesian agent’ I will mean an agent who formulates his degrees of
belief quantitatively, in agreement with Kolmogorov's axioms (in this particular case
countable additivity is assumed in place of the weaker finite additivity) and who updates
his beliefs by conditionalization alone.
59
See, for example, Howson and Urbach (1993).
Now let X be any Bayesian agent and let $H_i$ be any hypothesis to which X has assigned non-zero probability; then it can be shown⁶⁰ that in the long run: i) if all future evidence relevant to $H_i$ confirms $H_i$, $P(H_i)$ will tend to one; whereas ii) if all future evidence relevant to $H_i$ disconfirms $H_i$, then $P(H_i)$ tends to zero.
In virtue of my analysis in §2, §3 and §4 the theses just expounded do not hold; for if my analysis is correct, then Kolmogorov's axioms don't hold, and, therefore, $P(E \mid H) = 1$ doesn't hold (the latter can only follow from the normalization axiom). In §6
and §7, however, I will present independent arguments against Bayesian Confirmation
Theory. These arguments grant that subjective probabilities must conform to
Kolmogorov’s axioms. In §6 I will examine a much misunderstood criticism due to Karl
Popper and in §7 I will present an improved variant of Popper’s criticism as well as a
novel criticism.
6. The Problem of “Zero Priors”
We have just seen that if H is a universal hypothesis and E is confirming evidence for H,
then, by means of (24), the probability of H given that E is the case, $P(H \mid E)$, is greater than the initial probability of H, $P(H)$. This is true, however, only if $P(H) \neq 0$, as is obvious from (24): if $P(H)$ equals zero, no evidence will ever increase the probability of
H. Thus a universal hypothesis can be confirmed by appropriate evidence just in case the
initial probability of that hypothesis is nonzero. Karl Popper, in The Logic of Scientific
Discovery (appendix vii), has argued that the initial probability of a universal hypothesis
60
See ibid.
in fact must be zero and that, consequently, the probability of a universal hypothesis
cannot be increased by confirming evidence. In the remainder of this section I shall focus
on Popper’s argument and on two criticisms that have been advanced against it.
Here is the argument. Let A be a monadic predicate and U an infinite but countable collection of individuals, $\{u_1, u_2, \ldots, u_k, \ldots\}$. Let $A_i$ stand for 'A is true of $u_i$'. We want to compute the probability of the universal hypothesis $H^\dagger \equiv (i)A_i$. Now, $H^\dagger$ is just the infinite conjunction $A_1 \wedge A_2 \wedge \ldots \wedge A_k \wedge \ldots$, and since any two $A_i$, $A_j$ must be independent, the probability of this conjunction must equal the product of the probabilities of the single instances of the property A, that is

$P(A_1 \wedge A_2 \wedge \ldots \wedge A_k \wedge \ldots) = P(A_1) \cdot P(A_2) \cdot \ldots \cdot P(A_k) \cdot \ldots$ (25)
For to deny the independence of the single instances of A would amount to assuming that inductive learning about the future occurrence of $A_{k+1}$ from the occurrence of $A_i$ is possible.⁶¹ Furthermore there is no reason to suppose that $P(A_i) \neq P(A_j)$; hence set $P(A_k) = p$ for all $k \in \mathbb{N}$. Then $P(H^\dagger)$ can be easily computed in the following way
$P(H^\dagger) \equiv P\Big(\bigwedge_{i=1}^{\infty} A_i\Big) = \prod_{i=1}^{\infty} P(A_i) = \lim_{i\to\infty} p^i$ (26)

where $\lim_{i\to\infty} p^i$ is zero for all $0 \leq p < 1$. Thus $P(H^\dagger) = 0$. This concludes Popper's argument.
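The limiting behavior in (26) is easy to watch numerically (the value of p below is arbitrary; the collapse occurs for every $p < 1$, however close to one):

# Partial products of P(A_1) * P(A_2) * ... * P(A_n) with P(A_i) = p for all i,
# as in equation (26): for any p < 1 the product is driven to zero.
p = 0.999
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, p ** n)   # 0.990..., 0.904..., 0.367..., 4.5e-05, 3.5e-44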
61
In general the probability of
!
A
1
"A
2
"..."A
k
"... is calculated as
!
"(A
1
#A
2
#...#A
k
#...) ="(A
1
)$"(A
2
A
1
)$"(A
3
A
1
#A
2
)$...
However, in order to rule out “inductive biases” one needs to impose the condition that,
for all i,
!
"(A
i
A
1
#...#A
i$1
) ="(A
1
). This condition states that the probability that
!
A
i
will
be the case doesn’t depend on whether instances of the property A have or have not
occurred in the past.
The first criticism I want to examine is Earman’s. In (1992: pp. 92-94) Earman
produces a mathematical argument that he believes shows the inconsistency of (25), and,
therefore, of Popper’s rationale. This is his argument. Instead of associating a fixed
probability value to a single instance $A_i$, associate to the latter a variable probability, $\Pr_p(A_i)$. Furthermore, let a second order probability be associated to each value of $\Pr_p(A_i)$. In other words, make an agent think of $A_i$ as having any possible probability-value x between zero and one, but at the same time force the agent to associate to each x a (second order) probability that x is in fact the true probability-value of $A_i$. Then, Earman claims, the following 'consistency condition' must hold

$\Pr(A_i) = \int_0^1 \Pr_p(A_i)\,\Pr(dp)$ (27)
that is, in a less awkward notation
$P(A_i) \equiv E[x_i] = \int_0^1 y\, f_i(y)\, dy$ (28)
where $x_i$ is the (first order) variable probability associated to $A_i$, $f_i(\cdot)$ is the probability density associated to $x_i$, and $E[\cdot]$ is just the expected value function. Next Earman proceeds to apply Popper's condition (25) to $\Pr_p(\cdot)$, thus getting

$\Pr_p(A_1 \wedge A_2 \wedge \ldots \wedge A_k \wedge \ldots) = \Pr_p(A_1) \cdot \Pr_p(A_2) \cdot \ldots \cdot \Pr_p(A_k) \cdot \ldots$ (29)
So far, so good. Then Earman claims that Popper’s condition (25) also “translates” into
the following equation
$\Pr(A_l \wedge \ldots \wedge A_{k+l} \mid A_1 \wedge \ldots \wedge A_{l-1}) = \Pr(A_l \wedge \ldots \wedge A_{k+l})$ (30)
However, (30) is equivalent to
$\Pr(A_l \wedge \ldots \wedge A_{l+k}) = \prod_{i=l}^{l+k} \Pr(A_i)$ (31)
which by (27), by (28) and by assuming that
$(i)(j)\big[\Pr_p(A_i) = \Pr_p(A_j) = p\big]$ (32)
is equivalent to
$\int_0^1 p^k\,\Pr(dp) = \left(\int_0^1 p\,\Pr(dp)\right)^k$ (33)
or, in standard notation
$\int_0^1 p^k\, f(p)\, dp = \left(\int_0^1 p\, f(p)\, dp\right)^k$ (34)
Since both (32) and (33) only hold true for a probability distribution
$\Pr(dp)$, or $f(p)\,dp$,
that assigns all the weight to a single value and since (30) is equivalent to (32) and (33),
then (30) can hold only if a probability distribution assigns all the weight to a single
value. Earman concludes (actually from a variant of (32) and (33)) that Popper’s
condition (25) ‘requires the agent to be certain from the start about the value of p [i.e. of
the first order probability of an instance $A_i$], which is surely dogmatism rather than skepticism'.
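For concreteness, here is why (34) forces a degenerate distribution, checked against the simplest non-degenerate choice, the uniform density $f(p) = 1$ on [0, 1]: the left side of (34) is then $1/(k+1)$, the right side $(1/2)^k$, and the two disagree from $k = 2$ onward (a point mass at $p_0$, by contrast, gives $p_0^k$ on both sides):

# Equation (34) under a uniform second-order density f(p) = 1 on [0, 1].
for k in range(1, 6):
    lhs = 1 / (k + 1)   # integral of p**k dp from 0 to 1
    rhs = 0.5 ** k      # (integral of p dp from 0 to 1) ** k
    print(k, round(lhs, 4), round(rhs, 4), lhs == rhs)   # equal only at k = 1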
Earman is, however, mistaken. The problem in his argument is that (30) does not
follow from (25), as he thinks, and in fact is not even a meaningful condition. Consider
the following variant of (30) as applied to two instances
$A_i$, $A_{i+1}$

$\Pr_p(A_{i+1} \mid A_i) = \Pr_p(A_{i+1})$ (35)
What does (35) state? It states that the variable first-order probability associated to
$A_{i+1}$ is independent from the occurrence of $A_i$
. But this doesn’t make any sense! Since by
hypothesis the probability associated to $A_{i+1}$ a) is variable and b) takes all values between
zero and one, how can such a probability be dependent on or independent from the
occurrence of $A_i$ (or of anything else, for that matter)? (35), just like (30), is the
mathematical equivalent of a category mistake. It is not on $\Pr_p(\cdot)$, as Earman thinks, that Popper's condition must be imposed, but, rather, on the probability distributions of the variable probabilities. That is, if $f_i(\cdot)$ is the (second order) probability density function associated to the (first order) probability of $A_i$, Popper's condition simply asserts that $f_i(\cdot)$ is invariant with respect to the occurrence or non-occurrence of $A_i$, that is

$(i)\ f_i(\cdot) = f(\cdot)$ (36)
The second criticism of Popper’s argument that I wish to examine is Howson’s
(1973: pp. 154-158). The criticism is twofold. First, Howson contends that assuming, as
Popper does, independence, i.e. (25), is just as unwarranted as assuming dependence, and
that, therefore, Popper’s argument rests on an ad hoc presupposition. But Howson’s
claim is unconvincing. The point of Popper’s argument is not that independence implies
that universal generalizations must have probability zero. It’s the contrapositive of this
implication that is crucial: if a universal generalization has non-zero probability, then
independence does not hold. In other words, the point of Popper’s argument is that the
dependence among the instances of A is a necessary condition for the inequality $P\big((i)A_i\big) > 0$ to hold. Why is this a problem for the Bayesian? An increase in the probability of $(i)A_i$ can only occur if the initial probability of the latter is non-zero, which, in turn, can only be the case if the instances of A are dependent. But this means that the 'inductive confirmation' associated with an increase in the probability of $(i)A_i$ is futile, for it already presupposes induction to begin with.
A possible option for the Bayesian is to develop an argument to the effect that
postulating independence has unacceptable, paradoxical consequences –in which case
one would be perfectly justified in discarding it. This is, it seems, the strategy that Howson
pursues in the second part of his criticism. Suppose that one’s initial probability that a
certain coin lands heads is p. Then, if independence is assumed, by the central limit
theorem the probability that the relative frequency of heads lies in the interval $[p-\varepsilon,\, p+\varepsilon]$, where ε can be as small as one likes, converges to 1 as $n \to \infty$. So far, so
good. But, Howson observes, ‘if one is permitted to use a uniform prior distribution over
the physical probabilities⁶² of one given coin […] then for small ε and large n it is “almost certain” that the actual long run relative frequency of heads on that coin will lie
outside such an interval’. Thus, the argument runs, independence leads to inconsistencies.
But does it? Maybe so, but certainly not as a consequence of this contradiction.
Independence is the crucial element in the first probability assignment, but it plays a
secondary role in the second, where the crucial role is played by the principle of
indifference; hence the contradiction might be due to the latter. And indeed, it seems to
me that the contradiction is caused by a poor use of the principle of indifference in the
second probability assignment. To see why, consider first a sequence of n tosses (for the
sake of simplicity let n be even). There is only one possible sequence for which the
frequency of heads is zero (the sequence ‘TTTTTT…’). But there are n sequences for
which the frequency of heads is $1/n$. And there are $n(n-1)/2$ sequences for which the frequency of heads is $2/n$. There is an even larger number, $n(n-1)(n-2)/3!$, of sequences whose frequency is $3/n$. And so on, until the frequency $1/2$, after which the number of sequences associated to each frequency grows progressively smaller. Now let r be the ratio of the number of sequences that lie in the interval $[1/2-\delta,\, 1/2+\delta]$, where δ is as small as one likes, to the number of sequences that lie outside that interval. As n grows larger and larger, r too grows larger and larger, i.e. $\lim_{n\to\infty} r = \infty$. But this means that if we assign, following Howson, the same probability to the possibility that the frequency of heads lies in the interval $[1/2-\delta,\, 1/2+\delta]$ and to the possibility that that frequency lies in, say, the interval $[0,\, 2\delta]$, we do so not in agreement with the principle of indifference, but rather in complete violation of the latter: the number of sequences in $[1/2-\delta,\, 1/2+\delta]$ is infinitely larger (in the sense just discussed) than the number of sequences in $[0,\, 2\delta]$, and the principle of indifference prescribes equal probabilities for equinumerous alternatives.

62 'Physical probabilities' is just an alias for infinite relative frequencies.
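The growth of r is easy to verify by direct counting (δ is given an arbitrary small value below; any positive δ makes the same point):

from math import comb

delta = 0.05
for n in (20, 100, 500, 2000):
    # Count length-n sequences whose frequency of heads falls within
    # [1/2 - delta, 1/2 + delta]; there are C(n, k) sequences with k heads.
    inside = sum(comb(n, k) for k in range(n + 1)
                 if abs(k / n - 0.5) <= delta)
    outside = 2 ** n - inside
    print(n, inside / outside)   # the ratio r grows without bound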
7. “Zero Priors” Again
In §6 I have defended Popper’s independence argument from two criticisms. However,
Popper’s argument is not unproblematic. One might object, for example, that to require
that
!
"(A
i
) ="(A
j
) for all i, j –as Popper does– is unwarranted, for this condition doesn’t
follow from independence (e.g. a random numeric fluctuation in the sequence
!
"(A
1
) ,
!
"(A
2
),
!
"(A
3
), … is perfectly compatible with independence). Let us therefore investigate
more accurately whether dependence is, or is not, a necessary condition for the initial
probability of a universal generalization to be nonzero; that is, let us examine directly the
logical consequences of assuming that the initial probability of a universal generalization
is greater than zero. As usual, $A_1, \ldots, A_n$ are the first n instances of a property A. By the probability calculus we may write (cf. footnote 63 below)
$P(A_1 \wedge \ldots \wedge A_n) = P(A_1) \cdot \left[\prod_{i=2}^{k} P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1})\right] \cdot P(A_{k+1} \wedge \ldots \wedge A_n \mid A_1 \wedge \ldots \wedge A_k)$ (37)
which for $n \to \infty$ is the initial probability of the universal generalization $(j)A_j$, i.e.

$P\big((j)A_j\big) = P(A_1) \cdot \left[\prod_{i=2}^{k} P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1})\right] \cdot \lim_{n\to\infty} P(A_{k+1} \wedge \ldots \wedge A_n \mid A_1 \wedge \ldots \wedge A_k)$ (38)
Now suppose that

$P\big((j)A_j\big) = p > 0$ (39)
From (38) and (39) it immediately follows that
$(k)\ \prod_{i=2}^{k} P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1}) \geq p$ (40)
But (40) implies that⁶³

$\lim_{n\to\infty} P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) = 1$ (41)
Thus (41) is a necessary condition for the initial probability of $(i)A_i$ to be non-zero. But what does (41) say? Simply put, it says that for increasing values of n the
63 Proof. First, suppose toward a contradiction that $\lim_{n\to\infty} P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) = q < 1$. This implies that there is an integer m such that for all $n \geq m$, $P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) \leq 1 - \epsilon$, where $\epsilon > 0$. But this in turn implies that there is an s such that $\prod_{i=m}^{s} P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1}) < p$, which is to say that $\prod_{i=2}^{s} P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1}) < p$.

Now suppose toward a contradiction that there isn't a real number l such that $\lim_{n\to\infty} P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) = l$. That is, suppose that for each $l \in [0,1]$ there is a real number $\epsilon_l > 0$ such that for no integer value r is $|P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) - l| < \epsilon_l$ true for all $n \geq r$. In particular, this means that there is a positive real $\epsilon_1$ such that for no integer r, $P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) \in [1-\epsilon_1,\, 1]$ for all $n \geq r$. But this is tantamount to saying that there are infinitely many $A_i$ for which $P(A_i \mid A_1 \wedge \ldots \wedge A_{i-1}) < 1 - \epsilon_1$; and this is in contradiction with (40). QED
probability of $A_n$ must, sooner or later, grow closer and closer to one. This means that, since $A_n$ occurs after the occurrences of $A_1$ through $A_{n-1}$, (41) represents a form of
dependence as well as a form of inductive learning. But ignore for a moment this (very
reasonable) interpretation of (41). The mathematical meaning of this equation is itself
utterly problematic. For (41) directly implies that only finitely many of the infinitely
many instances of A have probability less than, say, 0.999999. That is, (41) presupposes
that the (by far) vast majority of instances of A have an extremely high probability. But
this is unacceptable considering that such probabilities are initial probabilities.
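The necessary condition (41) can also be watched at work numerically. One (hypothetical) way to give the generalization the nonzero prior $p = 1/2$ is to set $P(A_n \mid A_1 \wedge \ldots \wedge A_{n-1}) = p^{1/2^n}$, since these factors multiply out to exactly p; the sketch below shows how quickly the conditional probabilities are then forced toward one:

# Conditional probabilities p ** (1 / 2**n) have infinite product p (the
# exponents 1/2**n sum to 1), so the prior of (j)A_j is p = 0.5 > 0 --
# but only at the price demanded by (41): the conditionals rush to 1.
p = 0.5
running = 1.0
for n in range(1, 21):
    cond = p ** (1.0 / 2 ** n)
    running *= cond
    if n in (1, 5, 10, 20):
        print(n, round(cond, 8), round(running, 6))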
After all this mathematics, the reader might wonder whether there is a purely
philosophical argument for “zero priors”. I think that there is one such argument.
Assigning to a universal generalization an initial probability greater than zero works well
in practice. The reason is obvious: so far, our world has been uniform with respect to both
space and time. But suppose that sometime in the future the world will turn non-uniform,
disorderly. Assigning to a universal generalization T an initial probability greater than
zero will not work so well, for the observation of “confirming evidence” for T will
wrongly raise the initial probability of T (sooner or later T will be disproved). For
instance, the observation of instances of the property A will raise the initial probability of $(i)A_i$; but mistakenly so, since sooner or later an instance of ¬A will disprove $(i)A_i$. On
the other hand, assigning null initial probabilities to universal generalizations works well
no matter what the world is like, as it is easy to check.
My simple thought experiment, however, seems to admit an obvious objection.
For, one might observe, after spending some time in the “new” disorderly world agents
would start to assign null initial probabilities to universal generalizations. Perhaps. But if
this were the case, assigning zero versus nonzero initial probabilities would be an
empirical matter. That is, the assignment of initial probabilities to universal
generalizations would incorporate an inductive bias: nonzero initial probabilities would
be assigned upon having observed that the world is uniform, and null initial probability
upon having observed that it isn’t.
To summarize, there are two alternatives: either assigning nonzero initial
probabilities to universal generalizations won’t work in general, or it will work but at the
price that an agent must assign zero or non-zero initial probabilities empirically, i.e.
depending on whether the world appears to that agent uniform or disorderly. Either
alternative is fatal to Bayesian Confirmation Theory: the first alternative for obvious
reasons; the second because it allows an increase in the probability of universal
generalizations at too high a price – that of presupposing induction to begin with.
8. Remarks on Bayesian Confirmation Theory
Bayesian Confirmation Theory constitutes possibly the most important application
(certainly the most far reaching) of subjectively interpreted probabilities. Qua offshoot of
Subjectivism, this doctrine faces the criticism I have presented in the first part of the
chapter. However in the second part of the chapter I have discussed an additional reason
for which Bayesian Confirmation Theory is pointless: the initial probability of a universal
generalization must be zero. Popper’s argument to this effect, albeit inappropriately
criticized, is not unproblematic. In §7 I have shown how to fix (and improve on) this
argument. Finally, I have presented a third, purely philosophical, objection to Bayesian
Confirmation Theory.
Chapter 3: the Relative Frequency Interpretation of Probability
1. Introduction: the Relative Frequency Interpretation
Unlike the classical/logical and the subjectivist interpretations, the relative frequency
interpretation asserts that probability is a physical fact of the world: the probability-value
of an outcome is the actual frequency of occurrence of that outcome. For this reason,
frequentism is certainly the most popular interpretation among natural scientists.
The relative frequency interpretation comes in two flavors: finite frequentism and
infinite frequentism. Within finite frequentism the probability associated to an outcome O
of a certain indefinitely repeatable event E equals the relative frequency of O, the ratio of
the number of occurrences of O to the (pre-established) number of repetitions of E. For
instance, the probability that a coin yields heads when tossed is the ratio of the number of
heads to the number of tosses. Infinite frequentism, on the other hand, prescribes that the
probability of an outcome of an indefinitely repeatable event E is the ratio of occurrence
of that outcome within an infinite repetition of E. Since the mathematical ratio of infinite
magnitudes cannot be computed directly, the ratio of occurrence of an outcome O in a
repeatable event E – i.e. the probability of O – is defined as
$P(O) \equiv \lim_{n\to\infty} f_n$, where $f_n$ is the relative frequency of O with respect to n repetitions of E (i.e. $f_n$ is the ratio of the number of occurrences of O, when E is repeated n times, to n).⁶⁴

64 That is, for any $\varepsilon > 0$ there exists a number N such that for all $n > N$: $|f_n - P(O)| < \varepsilon$.
The content of the chapter is the following. In §2 I clarify what I take infinite
frequentism to be and I discuss, and dismiss, a criticism moved against infinite
frequentism by Alan Hajek. In §3 I explain why infinite frequentism is to be preferred to
finite frequentism. In §4 I turn to the main problem of infinite frequentism: its
epistemological status. In particular, in §4.1 I present the details of the problem; in the
first part of §4.2 I show that the traditional falsificationist approach of classical statistics
cannot deal effectively with the problem and that a novel approach is needed, and in the
second part of §4.2 I present the details of this approach; lastly, §4.3 extends this novel
approach to the case of interval estimation. §5 and §6 deal with two entrenched problems
that allegedly plague infinite frequentism, the ‘single case problem’ and the ‘reference
class problem’. §7 examines, and solves, another old problem of infinite frequentism: the
problem of using relative frequencies in decision-making. Lastly, in §8 I present and
discuss a possible alternative to infinite frequentism.
2. Infinite Frequentism: a Clarification
Before proceeding with my analysis, it is appropriate to clarify what exactly I take
infinite frequentism to consist in. In this section I shall discuss three issues on the nature
of this doctrine that have been the object of controversy or ambiguity.
The most important among these issues pertains to the fact that within infinite
frequentism, relative frequencies are to be computed on the basis of infinite sequences of
outcomes. Consider the probability of getting heads by tossing a coin. According to
infinite frequentism the probability of the outcome ‘heads’ is the relative frequency of
that outcome within an infinite sequence of tosses. But there is no such thing as an
infinite sequence of tosses of a coin in the real world! In general, events whose
probability we are interested in do not come in infinite instances. Thus, the infinite
sequences infinite frequentism talks about cannot be actual or real: they must be
hypothetical. That is, the probability-value of an outcome O of a repeatable event E is the
limiting value of the relative frequency of O, if the sequence of instances of E is extended
ad infinitum (to exemplify: the probability of 'heads' is its relative frequency, were
the coin tossed forever).
The second issue concerns a proposal advanced by Richard Von Mises, the
pioneer of infinite frequentism. Von Mises has argued (1957: pp. 23-29) that a constraint
must be imposed on infinite sequences of outcomes, if these are to serve as a basis for
probability-values. Let a place-selection be a selection of elements of an infinite
sequence, such that an element of the original sequence is chosen just in case its position
satisfies some function or fixed rule (selecting the elements that are in an even position of
the sequence and selecting elements whose position is a perfect square are examples of
place-selection).⁶⁵
Then, Von Mises argues, the limit of the relative frequency of an
outcome in an infinite subsequence obtained by place-selection from an original sequence
must be the same as that of the latter (this he called Principle of Randomization). If for
example the limit of the relative frequency of an outcome O in a subsequence obtained by
selecting the even elements of a sequence S is 0.4, but the limit of the relative frequency
of O in S is 0.5, then O doesn’t possess a probability-value. I shall not follow Von Mises
65
Church (1940) has made of the vague concept of place-selection a precise notion: a
place-selection is such if, and only if, it can be expressed as a recursive function.
on this point: the Principle of Randomization need not be satisfied by a sequence of
outcomes. The reason is pragmatic: we want to reject incorrect probability-values, and we
do this by applying a peculiar statistical test to sequences of outcomes (this will be fully
illustrated in §4); but individuating sequences of outcomes that do not satisfy the
principle of randomization would require a statistical test of enormous complexity.
The last question I want to touch upon is an apparently unproblematic aspect of
infinite frequentism: how to order the set of outcomes of a repeatable event. In general, it
seems natural that the order of the outcomes of an event should reflect the chronological
order in which they have occurred. If the outcome of tossing a coin has occurred
immediately earlier than another toss of that coin, then the former is the immediate
predecessor of the latter in the sequence of tosses/outcomes. But what if beside time there
is another equally natural way of ordering outcomes? Hajek (2007) presents a case in
which a coin is repeatedly tossed on board of a train that moves in a peculiar back and
forth motion. The outcomes of the tosses are reported in Figure 2, in which the time
increases vertically (bottom to top) and the distance traveled by the train (and hence by
the coin) increases horizontally (left to right).
Figure 2: Tosses of a Coin on a Train in Back and Forth Motion
From Fig. 2, the spatial ordering of tosses is HTHTHTHT…, to which corresponds the
relative frequency limit ½; the temporal ordering is instead HHTHHTHHT…, to which
corresponds a different relative frequency limit, $2/3$
. Since, the argument runs, there is no
more reason to order the tosses chronologically than there is reason to order them
spatially (with respect to the Earth’s surface, that is), and since the chronological ordering
of the sequence yields a different relative frequency limit of ‘heads’ than the spatial
ordering does, there is no such thing, in this case, as the frequentist probability of
‘heads’. Setting aside that this kind of difficulty only concerns very peculiar situations
(when two or more natural orderings are present and to these orderings correspond
different relative frequency limits), Hajek’s case is certainly significant. However, in my
view, Hajek draws the wrong conclusion from it: his example doesn’t show that the
notion of frequentist probability may be problematic; it shows, rather, that from a
frequentist standpoint a repeatable event comprises the specific way in which its iteration
is to be considered. In the case at hand, the probability-value
$2/3$ is associated to the event
‘tossing the coin and considering the outcomes as occurring in chronological order’;
instead, the probability-value ½ is associated to the event ‘tossing the coin and
considering the outcomes as occurring in spatial order’ (were the type of ordering –
temporal or spatial– unknown, the probability of 'heads' would be undefined).⁶⁶ It's as
simple as that (although this point goes unnoticed, since cases a-la-Hajek are virtually
nonexistent in practice: in regular cases not only the ordering of the outcomes is
unambiguous, but it would be difficult to come up with a new ordering to which a
different probability-value corresponded).
66
Thinking of the way the repetition of a certain event is to be considered as part of
that event, is not peculiar to Hajek’s case. We do that quite naturally in situations that
violate Von Mises’ Principle of Randomization. e.g. an event E has probability (limiting
relative frequency) p when the odd positions in its sequence of outcomes are considered,
but E has probability $q \neq p$ when the even positions are considered. In this situation we
say that there are two probability-values corresponding to two distinct events, one being
‘E when only its odd instances are considered’ the other being ‘E when only its even
instances are considered’.
3. Finite or Infinite Frequentism?
The majority of relative frequency theorists have favored infinite frequentism over finite
frequentism (as far as indefinitely repeatable events are concerned, of course). There are
at least two reasons why this choice is the most appropriate.
The first reason is straightforward. Suppose one wanted to know the (frequentist)
probability of getting a six by rolling a die. Within finite frequentism that probability is
just the relative frequency of sixes in a pre-established number of rolls, say ten, of the
die. Yet, if the number of rolls were, say, doubled, the relative frequency of sixes might
change. But the latter might change again were the number of rolls doubled again; and so
on, ad infinitum. Now, the only way out is to establish in advance the length n of a
sequence, and stick to that. Unfortunately any choice of n is arbitrary: why should one
choose n over, say, $2n$? Thus, in general, finite frequentism fails to associate unique
probability-values to repeatable events. Obviously infinite frequentism doesn’t present
this difficulty (if the limit of the relative frequency exists, it’s ipso facto unique).
The second reason is more sophisticated, and more important, than the first. Let
me introduce it through the usual fair coin case (any outcome of any event may be
modeled as the tossing of a coin, since an outcome either occurs or it doesn't). Suppose
we want to estimate the frequency of occurrence of ‘heads’ by means of a sequence of
two tosses. Moreover suppose – we haven’t tossed the coin yet – that the frequency of
heads is 0.5. Then, assuming the independence of subsequent tosses, the frequency of
occurrence of each possible sequence of two tosses, which are ‘HH’ ‘HT’ ‘TH’ ‘TT’, is
0.25. This means that repeating the estimation of the frequency of heads by means of
sequences of two tosses will yield a completely wrong result (‘HH’ or ‘TT’) 50% of the
times (with frequency 0.5). Indeed, it is not difficult to see that, no matter what the
frequency of heads is, the shorter the sequence of tosses, the worse the result. Since relative frequencies satisfy Kolmogorov's axioms, we may use the theorems and rules of
Probability Calculus.⁶⁷ Let φ be the frequency of heads, n the number of tosses, $f_n$ the actual frequency of heads in n tosses, and δ a real number; then the following inequality (which follows almost immediately from Chebyshev's Inequality) holds for all $\delta > 0$:

$P\big(|f_n - \varphi| < \delta\big) \geq 1 - \dfrac{\varphi(1-\varphi)}{n\delta^2}$ (42)
In plain English this means that by estimating the frequency φ of heads by means of sequences of n tosses (i.e. computing $f_n$ for a certain sequence, then for another sequence, and so on), we will obtain a reasonably good estimation (i.e. an estimation whose mistake is smaller than δ) with frequency greater than or equal to $100 \cdot \big[1 - \varphi(1-\varphi)/(n\delta^2)\big]$ percent. Thus estimating the frequency of heads by means of sequences of n tosses yields a frequency of successes (i.e. of estimations that differ from the frequency to be estimated by less than a pre-established value δ) which is larger the larger n is.
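The bound in (42) is easily tabulated (illustrative values of φ and δ below); note how the guaranteed rate of good estimations climbs with n:

phi = 0.5      # illustrative frequency of heads
delta = 0.05   # tolerated estimation error

for n in (10, 100, 1_000, 10_000):
    # Right-hand side of (42): a lower bound on the frequency with which
    # an n-toss estimate lands within delta of phi.
    bound = 1 - phi * (1 - phi) / (n * delta ** 2)
    print(n, max(bound, 0.0))   # the bound is vacuous (<= 0) for small n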
But what happens if n goes to infinity? In the case at hand, the strong law of large numbers⁶⁸ asserts that

$P\big(\lim_{n\to\infty} f_n = \varphi\big) = 1$ (43)
67
Relative frequencies satisfy simple additivity but not countable additivity. However we
will only need pieces of Probability Calculus that are not derived from the axiom of
countable additivity.
68
Chen has shown in (1976) that the Strong Law of Large Numbers holds even within
axiomatic systems that do without countable additivity.
If we use the limit of relative frequencies as estimations, our frequency of successful
estimations will be one. It must be readily noticed that the fact that the frequency of an
outcome of a repeatable event is one doesn’t imply that repeating that event will always
yield that outcome. Nonetheless, unitary frequency is an exceptionally strong condition.
We immediately see this by considering the dual, frequency zero (obviously an outcome
whose frequency of occurrence is one fails to occur with frequency zero). If an outcome
has frequency of occurrence zero then its frequency of occurrence is lower than any finite
frequency; i.e. it's lower than $10^{-1000}$, $10^{-10^{1000}}$, $10^{-10^{10^{1000}}}$, and so on. This means that
whereas strictly speaking frequency zero doesn’t amount to theoretical impossibility, it
certainly amounts to practical impossibility. In practice, then, the limit of the relative
frequency of an outcome equals the “true” frequency of occurrence of that outcome.
We have seen two reasons (the second of which is especially important) why
infinite frequentism is to be preferred over finite frequentism. In so doing, however, we
have tacitly assumed that both approaches allow the computation of probability-values on
the basis of certain empirical data (i.e. sequences of outcomes). Unfortunately this is not
the case: infinite frequentism prescribes that probability-values are limits of relative
frequencies; but the computation of the limit of a relative frequency requires an infinite
sequence of outcomes, and obviously we have no access to such a sequence. It goes
without saying that if infinite frequentism is to be the right version of frequentism, then
this difficulty must be dealt with. This is the object of the next section.
4. The Epistemology of Infinite Frequentism
4.1. The Epistemological Problem of Infinite Frequentism
If $f_n$ is the relative frequency of occurrence of a certain outcome in a sample of n instances of an indefinitely repeatable event, the probability p of that outcome is defined as the mathematical limit of $f_n$, for n that goes to infinity (cf. §1):

$p \equiv \lim_{n\to\infty} f_n$ (44)
By identifying probability with the limiting value of a relative frequency, however, one
faces two well-known epistemological difficulties: i) there is no guarantee that a limiting
value for
!
f
n
exists when n is increased indefinitely; and ii) there is no guarantee –
assuming there is a limiting value – that a number p is the correct limiting value. Even if
the relative frequency of a certain outcome O in a repeatable event E were calculated
over a large number of data – say 1,000,000 repetitions of E – this wouldn't guarantee
that that relative frequency would coincide with or would be similar to the limiting value
of the relative frequency of O (or that a limiting-value for that frequency exists): for this
value (if it exists) depends on the infinitely many instances of E and it is not affected by
any finite subset of those instances – no matter how long (the intuitive idea being that all
finite subsets are “infinitely small” compared to an infinite set – even a set of one million
elements is negligible next to an infinite set). Thus statements about probability-values of
indefinitely repeatable events are not verifiable, since no observational evidence and no
empirical test may constitute a definitive proof of their truth (the verification of one such
statement would require an infinitely long process). Unfortunately probability statements
are not falsifiable either. Consider equation (44); the latter, we have seen, is equivalent to
the following mathematical proposition

$\forall \varepsilon > 0\ \exists m : n \geq m \Rightarrow |f_n - p| < \varepsilon$ (45)

In other words, if a certain frequency has limit p, then for a sufficiently long finite sequence of data, the difference between $f_n$ and p can be made arbitrarily small. The problem is that for no value of ε ($0 < \varepsilon \leq \min\{p,\, 1-p\}$) can one know which integer m is such that

$\forall n \geq m : |f_n - p| < \varepsilon$ (46)

And this makes (46), and therefore (45), impossible to falsify.
Universal empirical laws and scientific hypotheses, which constitute one of the
most precarious forms of human knowledge, are not verifiable but are, at least in principle, falsifiable.⁶⁹ That probability statements are not even falsifiable therefore constitutes a
grave problem for the frequentist approach. How can one accept the statement that a
certain probability-value is p if this claim is not even falsifiable? For what reason should
one believe that a probability-value is p rather than q? Probability statements must be
made, somehow, falsifiable, in order for them to gain an acceptable epistemic status.
There really is no other way around the problem (and clearly they cannot be transformed
into verifiable statements).
69
Since a universal law has the logical form ‘all As are Bs’ (where A and B are empirical
properties), it is not verifiable. However, since ‘all As are Bs’ is logically equivalent to
‘there is not an A which is a non-B’, a universal law is falsifiable in principle (i.e. upon
the occurrence of an A which is a non-B). For a discussion about the verifiability and
falsifiability of scientific theories see Popper (1959b), and especially (1959b: §6).
4.2. Falsificationism in Statistics
Ronald Fisher, one of the founding fathers of Classical Statistics, was the first to understand the importance of the possibility of refuting probabilistic statements.⁷⁰ Within
his pioneering work in statistical analysis Fisher devised a series of tests for the
acceptance or the rejection of probabilistic hypotheses. Nowadays acceptance tests
constitute an important portion of Classical Statistics. In this section I shall first focus on
the simplest of these tests, the binomial test, which is especially fit for the case of the
acceptance or rejection of a candidate probability-value; then I shall extend my analysis
to statistical tests (within classical statistics) in general.
Suppose one wants to establish whether a certain coin is fair (the probability of
‘heads’ equals the probability of ‘tails’) or not. The hypothesis that the coin is fair is
called null hypothesis, and it is tested in the following fashion. The coin is tossed n times
and if the number of heads in the sequence is less than or equal to $n/2$, then t equals the number of heads; otherwise t equals n minus the number of heads. Suppose the coin is tossed ten times. Under the assumption that the coin is fair, let $a_t$ be the sum of the probability that the sequence of tosses yields a number of heads equal to or smaller than t and of the probability that the sequence of tosses yields a number of heads equal to or greater than $10 - t$.⁷¹ If $a_t$ is equal to or smaller than a pre-established value α, called the
70
See for example Fisher (1935).
71 In general the probability that a fair coin yields k heads or tails in n tosses is $P_n(k) = \binom{n}{k} \cdot 0.5^n$; therefore $a_t$, in general, is
significance level of the test, then the hypothesis that the coin is fair, the null hypothesis,
must be rejected. If for instance α equals 0.05 (a typical significance level in statistics),
then if in the sequence of tosses there are zero, one, nine, or ten heads ($t = 0 \vee 1$), $a_t$ is smaller than α and the null hypothesis must be rejected; otherwise ($t = 2 \vee 3 \vee 4 \vee 5$) the null hypothesis is accepted. If instead α equals, say, 0.01 (another typical significance level in statistics) the null hypothesis must be rejected just in case the sequence of tosses yielded only heads or only tails ($t = 0$).⁷²
Testing the fairness of a coin is a very specific application of the Binomial Test: it
is easy to extend the use of the test to the examination of any probability-value.⁷³ But
probability-values may be tested – i.e. accepted or rejected – also by means of other
acceptance tests (for example the Chi-squared Test and the Normal Test). It is therefore
more interesting to understand the ideas that underlie acceptance tests in general, rather
than focusing on a single test. Several parameters characterize the performance and the
efficiency of acceptance tests, however only two of these parameters or indicators are
important for our purposes. The first is called error of the first kind; this is the probability
!
n
i
"
#
$
%
&
'
(0.5
n
i)t
*
+
n
i
"
#
$
%
&
'
i+n,t
*
(0.5
n
-0.5
n,1
(
n
i
"
#
$
%
&
'
i)t
*
. In the case at hand since the coin is
tossed ten times
!
a
t
=0.5
9
"
10
i
#
$
%
&
'
(
i)t
*
.
72 $a_0 \approx 0.00195$, $a_1 \approx 0.0215$, $a_2 \approx 0.1$, $a_3 \approx 0.344$, $a_4 \approx 0.754$, $a_5 = 1$ (cf. previous footnote).
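These values are straightforward to reproduce; a minimal sketch of the two-sided computation of footnote 71:

from math import comb

def a(t, n=10):
    # Footnote 71: a_t = 0.5**(n-1) * sum of C(n, i) for i <= t, i.e. the
    # probability of t or fewer heads plus that of n - t or more heads.
    return 0.5 ** (n - 1) * sum(comb(n, i) for i in range(t + 1))

for t in range(5):
    print(t, round(a(t), 5))   # 0.00195, 0.02148, 0.10938, 0.34375, 0.75391

# With alpha = 0.05 the null hypothesis 'the coin is fair' is rejected
# exactly for t = 0 or t = 1, as stated in the text.
print([t for t in range(5) if a(t) <= 0.05])   # [0, 1]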
73
Insofar as probability-values are concerned, any event, no matter how complex, can be
thought as a “binary” event (like the tossing of a coin). Suppose for instance one is
interested in the probability of getting a six by rolling a die. Then one can consider the
outcomes ‘one’, ‘two’, ‘three’, ‘four’ and ‘five’ as the unique outcome ‘non-six’.
that a true null hypothesis is rejected as false by the test. For instance, in the case
examined above the error of the first kind is the probability that the null hypothesis is
rejected if the coin is in fact fair. The error of the first kind is just the significance level of
the test, α. The second indicator is called error of the second kind, and is usually
designated by the Greek letter β. This is the probability that a false null hypothesis is not
rejected by the test. In the case above the error of the second kind is the probability that
the test accepts the hypothesis that the coin is fair when in fact the coin is loaded.
Acceptance tests can be understood by means of the notion of significance level
(error of the first kind). To the significance level α of a test corresponds a set A of
sequences of outcomes which are less likely to occur than is specified by the value α; as α decreases A contains fewer and fewer sequences. If an actual sequence of outcomes is
an element of A, the test rejects the null hypothesis (compare with the discussion above
of the Binomial Test). Since the null hypothesis always implies that sequences of A are
unlikely to occur (the smaller is α the more unlikely), the fact that an actual sequence
belongs to A is either a chancy coincidence or a consequence of the fact that the null
hypothesis is mistaken. Thus an acceptance test systematically considers the occurrence
of an unlikely event (an element of A) a refutation of the initial hypothesis about the
probability of that event.⁷⁴ This is the essence of acceptance tests.
Howson and Urbach (1993) have criticized statistical tests on the ground that the
value α of a test is arbitrary and that significance tests “rule out” unlikely events which,
no matter how unlikely, may still occur. This criticism is pointless in my view. It is true
74
It seems that this analysis is originally due to Fisher (1956).
that a test with a significance level of, say, 0.01 rejects true null hypotheses in 1% of the
cases because the test misrepresents the occurrence of a sufficiently unlikely sequence of
outcomes as a refutation of the null hypothesis, but it is also true that this very fact allows
one a) to accept (provisionally at least) 99% of true null hypotheses and b) to reject
$100 \cdot (1-\beta)$% of false hypotheses. That is, by losing a small amount of information one
gains a greater amount of information. The alternative is not to gain any information at
all. A similar point also addresses the second part of the criticism, the arbitrariness of α:
the fact that the amount of information we gain and lose is arbitrary (it depends on α)
doesn’t diminish the fact that in the process we do gain information.
Thus both of these criticisms are unsound. But mainstream acceptance tests, qua
procedures to refute hypotheses, are not unproblematic. The fact that the error of the
second type, β, is a non-zero constant certainly constitutes an undesirable feature (we
view acceptance tests as falsification procedures). And iterating an acceptance test is not
a viable option: sure enough this would make β smaller and smaller, but at the price of
making α larger and larger.
But there is worse to come. Let us consider a peculiar coin tossing game. A
loaded coin, whose (limiting) relative frequency is p, is tossed n times. The n outcomes
are registered orderly on a sequence S (beginning from the first position in the sequence).
The coin is then discarded. A new loaded coin with (limiting) relative frequency q, where
$q \neq p$, is tossed ad infinitum. The first toss of the new coin is recorded on S in position $n+1$; the second toss is recorded on S in position $n+2$; and so on. Imagine now that S
represents the sequence of outcomes of a single repeatable event U. The limit of the
relative frequency of U is exactly q. Now, will a statistical test a-la-Fisher that evaluates
the null hypothesis 'the limit of the relative frequency of U is p' reject it? Since the first n outcomes correspond to an event whose relative frequency is p, the test accepts the (false) null hypothesis with probability $1-\alpha$. Since α is very small, the test presents a
large error of the second type (at least with respect to a range of probability-values⁷⁵).
The test, that is, is a poor procedure to reject false hypotheses. Now, it is tempting to
think that cases like that of U are practically nonexistent, and that we could safely ignore
them. But to do so would mean to adopt an inductive standpoint: we would be assuming
that any extension of a sequence of outcomes will resemble in relevant aspects the initial
sequence; that, so to speak, the probabilistic behavior of repeatable events is uniform.
Hereafter, for the sake of brevity, I will refer to the practice of ruling out cases analogous
to that of U as the 'inductive assumption of Classical Statistics'. In mathematical terms
the latter amounts to modeling the infinite sequence of outcomes of an event by means of a sequence of binomial random variables $X_1, X_2, \ldots$ ($X_i = 1$ if outcome i is a "success" and $X_i = 0$ otherwise) for which $P(X_1 = 1) = P(X_2 = 1) = \ldots$ Compare: for the representation of the sequence of outcomes of U the following equations hold

$p = P(X_1 = 1) = \ldots = P(X_n = 1)$ (47)

$q = P(X_{n+1} = 1) = P(X_{n+2} = 1) = \ldots$ (48)
75 The test accepts null hypotheses of the type 'the limit of the relative frequency of E is $p + \Delta p$' with probability $1 - \alpha + \varepsilon$, where ε has a small value for small values of $\Delta p$.
Because of the two problems just expounded, especially the second, traditional
statistical tests cannot be considered genuine falsification procedures.⁷⁶ Fortunately, it
turns out that there are, in fact, statistical tests that are not affected by either of these
problems: a small class of acceptance tests, within a subfield of Statistics called
Sequential Analysis, satisfies this desideratum. Here I shall present one such test⁷⁷ as
applied to the specific case of the acceptance/rejection of candidate probability-values.
Let p be a candidate probability-value for an outcome O of a repeatable event E, and let X
be a discrete random variable whose values are one if O occurs and zero otherwise;
$X_i$ designates the value of X for the i-th outcome of E. Finally let $g_p(n,B)$ be

$g_p(n,B) \equiv \sqrt{p(1-p)(n+1)\big[\log(n+1) + 2\log B\big]}$ (49)

where B is a positive real number. At the n-th iteration of E, the test (provisionally) accepts p if $|S_n - np| < g_p(n,B)$ (where $S_n \equiv \sum_{i=1}^{n} X_i$); otherwise, if $|S_n - np| \geq g_p(n,B)$, the test
rejects p. The test begins with $n = 1$ and, as long as p is accepted, it continues for larger and larger values of n (ad infinitum). If the test rejects p, it stops. The error of the first kind is the probability that the test stops at some point, given that p is the correct probability-value. It can be proven⁷⁸ that this probability is smaller than or equal to $B^{-1}$. Since
B is any number greater than zero, α can be made arbitrarily small through an appropriate
76
Fisher was therefore wrong if he thought that significance tests constitute some sort of
falsification procedure. On the other hand it is certainly one of Fisher’s merits that of
having understood the importance of devising falsification procedures within Statistics.
77
Cf. Siegmund (1985: pp. 70-71).
78
But it is too mathematically demanding and laborious to be proven here. The interested
reader can consult Siegmund (1985).
choice of B. Notice that this result is valid only under the inductive assumption of
classical statistics. That is, $\alpha \leq B^{-1}$ just in case the repeatable event under scrutiny is "well behaved" (its outcomes are representable as a sequence of binomial random variables with identical expected values). This doesn't mean, however, that the value of α is irrelevant. It is very important that our falsification procedure rejects as few probability-values associated to well-behaved events as possible, partly because well-behaved events are intrinsically more interesting and partly because the world is uniform.⁷⁹
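Here is a runnable sketch of the sequential test (parameter values are illustrative, and the boundary is the reconstruction of (49) given above), applied to a simulated version of the 'mixed coins' event U: the candidate value p is retained through the p-coin phase and rejected soon after the q-coin takes over.

import math, random

def g(p, n, B):
    # Boundary (49): sqrt(p(1-p)(n+1)[log(n+1) + 2 log B]).
    return math.sqrt(p * (1 - p) * (n + 1) * (math.log(n + 1) + 2 * math.log(B)))

def sequential_test(p, outcomes, B=100.0):
    # Reject p at the first n with |S_n - np| >= g_p(n, B); else keep accepting.
    S = 0
    for n, x in enumerate(outcomes, start=1):
        S += x
        if abs(S - n * p) >= g(p, n, B):
            return n          # step at which p is rejected
    return None               # p provisionally accepted throughout

random.seed(1)
# Event U: 2000 tosses with limiting frequency p = 0.5, then a coin with q = 0.7.
U = [random.random() < 0.5 for _ in range(2_000)] + \
    [random.random() < 0.7 for _ in range(100_000)]

print(sequential_test(0.5, U))   # rejection occurs shortly after toss 2000

A fixed-sample test run on the first 2000 outcomes alone would, with probability $1-\alpha$, accept p = 0.5 and never revisit it; the sequential test keeps monitoring and therefore catches the switch.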
Likewise, the error of the second kind is the probability that the test does not stop, given that p is not the right probability-value. Whereas proving that the error of the second kind is zero is laborious and might not even be possible,[80] proving that it is certain that the test rejects a false null hypothesis turns out to be rather simple.
[79] But this doesn’t amount to an inductive assumption on my part. The point is merely that this procedure rejects fewer correct probability-values in a uniform world than it would in a chaotic world (but it is a flawless falsification procedure both in a uniform and in a chaotic world).
[80] From a statistician’s point of view this proof is straightforward. His/her proof would proceed as follows. The weak law of large numbers states that if $\forall i$, $E[X_i] = \mu$, then (letting $S_n \equiv \sum_{i=1}^{n} X_i$) $\forall \varepsilon > 0$, $\lim_{n\to\infty} P(|S_n - n\mu| < n\varepsilon) = 1$. For an appropriate choice of the value of ε, call it $\bar\varepsilon$, it is easy to verify that for some N, since $p \ne \mu$, $\forall n \ge N$, $P(|S_n - np| \ge g_p(n,B)) \ge P(|S_n - n\mu| < n\bar\varepsilon)$. It immediately follows that $\lim_{n\to\infty} P(|S_n - np| \ge g_p(n,B)) = 1$. Now, suppose toward a contradiction that $P(\exists n : |S_n - np| \ge g_p(n,B)) = \gamma < 1$. Then for some $\bar n$, $P(|S_{\bar n} - \bar n p| \ge g_p(\bar n,B)) > \gamma$, which is absurd. Thus $\gamma = 1$ and the probability that p is not rejected is zero, i.e. $\beta = 0$. Unfortunately this proof and its equivalents are valid only under the inductive assumption of Classical Statistics, that is $\forall i$, $E[X_i] = \mu$. Hence it remains to be shown for each of the other possible scenarios that $\beta = 0$. Now, some of these cases are fairly simple to deal with (e.g. $E[X_i] = p + \varepsilon$ if $1 + 100 \cdot k \le i \le 100 \cdot (k+1)$ but $E[X_i] = p - \varepsilon$ otherwise, where $k = 0, 2, 4, 6, \ldots$ and $\varepsilon$ […] some or all of the $X_i$ are not independent, and/or the expected value of some of the $X_i$ is undefined, and/or some of the $X_i$ are not even random variables and their values depend on certain $X_i$ which are random variables, etc.).
To say that an outcome O has probability-value p is to say that the relative frequency of its sequence of occurrences converges, in the limit, to p. In mathematical terms: $\forall \varepsilon > 0$ $\exists \bar n$ such that, $\forall n \ge \bar n$, $\left| \left( \sum_{i=1}^{n} x_i \right)/n - p \right| < \varepsilon$ (where $x_i = 1$ if O occurs at the i-th repetition, $x_i = 0$ otherwise).[81] Hence, if the probability of an outcome is not p, $\exists \varepsilon > 0$ such that $\neg\exists m$ such that $\forall n \ge m$, $\left| \left( \sum_{i=1}^{n} x_i \right)/n - p \right| < \varepsilon$; which means that $\forall m$ $\exists n > m$ such that $\left| \left( \sum_{i=1}^{n} x_i \right)/n - p \right| \ge \varepsilon$, i.e. such that $\left| \sum_{i=1}^{n} x_i - pn \right| \ge \varepsilon n$. Since for some m and for all $k \in \mathbb{N}$, $\varepsilon \cdot (m+k) \ge g_p(m+k, B)$, and since for some $\bar k$, $\left| \sum_{i=1}^{m+\bar k} x_i - p(m+\bar k) \right| \ge \varepsilon \cdot (m+\bar k)$, the test must stop and reject the null hypothesis at $m + \bar k$, or earlier.
Having discussed the main parameters of the test, let us notice, and emphasize, an absolutely crucial point: this test is not inductive qua falsification procedure.[82] Consider again the “mixed coins” example on pp. 90-91. Event U is representable as a sequence of random variables that satisfy equations (47) and (48). We have seen that a statistical test à la Fisher that evaluates the null hypothesis ‘p is the correct probability-value of U’ won’t likely reject it, though it should, since the correct probability-value of U is q. Our test, on the other hand, has no difficulty whatsoever in rejecting that null hypothesis.
[81] Notice that $x_i$ is not a random variable.
[82] Actually the test is not inductive tout court. But it is exceedingly important that it is not inductive insofar as it is a falsification procedure.
We begin by assuming that p is the correct probability-value of U. Since for $k > n$ the outcomes of U behave as if they were the outcomes of a repeatable event whose probability is $q \ne p$, the relative frequency of the outcomes will at that point start shifting toward the boundaries of the acceptance zone of the test, and it will eventually cross them (indeed it is certain that the crossing of the boundary will happen at some point). Mainstream tests fail in this, as well as in similar cases, because they are inductive falsification procedures; our test doesn’t because it is not. But what feature allows our test not to be inductive? The answer is simple enough. Our test does not have a fixed length, a fixed dimension; hence it is capable of dealing effectively with any change in the “behavior” of the data. Mainstream tests, on the other hand, have a pre-determined length n, and a change in the data that occurs after n will go undetected. Therefore a mainstream test must postulate or assume that the probabilistic behavior of the repeatable event to be tested won’t change after n instances.
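A small simulation can make this concrete. The figures below ($p = 0.5$, $q = 0.7$, a switch after 1,000 repetitions) are hypothetical stand-ins for the mixed-coins scenario, not numbers from the text.

```python
import math, random

def g(p, n, B):
    # Boundary of equation (49).
    return math.sqrt(p * (1 - p) * (n + 1) * (math.log(n + 1) + 2 * math.log(B)))

random.seed(1)
p, q, B = 0.5, 0.7, 10**4
switch = 1_000   # after this step the outcomes behave with probability q != p
s, n, rejected_at = 0, 0, None
while rejected_at is None and n < 10**6:
    n += 1
    s += 1 if random.random() < (p if n <= switch else q) else 0
    if abs(s - n * p) >= g(p, n, B):
        rejected_at = n
print(rejected_at)  # a finite step somewhat after `switch`: p is rejected
```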
Lastly, I wish to address a seemingly forceful criticism of acceptance tests qua falsification procedures: whereas a scientific, universal law is logically refuted by a counterexample, the rejection of a null hypothesis by means of an acceptance test cannot constitute a logical refutation of that hypothesis.[83] At bottom, this is just another way of restating the criticism, which I have already addressed in a previous paragraph, concerning the fact that the parameter α of an acceptance test is non-zero. Nonetheless it is instructive to answer this variant of that criticism. As things stand, it is perfectly true that an acceptance test cannot (logically) refute a null hypothesis.[84]
[83] This criticism is due to Howson and Urbach (1993: p. 128).
I do not see this, however, as a major problem. Logical refutation is a desirable but not an essential component of falsification: it is crucial that false scientific hypotheses are rejected, not that true scientific hypotheses are never rejected; likewise, it is crucial that an acceptance test rejects false null hypotheses, not that true null hypotheses are never rejected.[85]
4.3. Estimation
Probability-values might be falsifiable, but how is one supposed to come up with an appropriate guess for a probability-value to begin with? In certain, specific cases one’s guess may rely on considerations of symmetry. For instance, one may conjecture that repeatedly tossing a perfectly symmetrical coin will yield the very same probability (limiting relative frequency) for both heads and tails. In general, however, the event under scrutiny might not possess any symmetry, or its symmetry might be too complex to be of any use. For instance, how is one to guess the probability of rain in New York?
[84] But in fact statements about probability-values can be made logically refutable. Suppose that instead of defining the probability-value of an outcome as the limit of a relative frequency $f_n$, we say that that outcome has probability p just in case its relative frequency satisfies $|f_n - p| < g_p(n,B)/n$ for all n, where, say, $B = 10^4$. Now, if a relative frequency is such that $|f_n - p| \ge g_p(n,B)/n$ for some n, the claim that p is the correct probability-value is logically refuted. Hence under this definition of probability, probability-values are logically refutable, exactly like scientific/universal laws are (the price to pay is the permanent loss of a very small number of statements about probability-values that would have been accepted under the usual definition – since $\alpha \le B^{-1} = 10^{-4}$, less than 0.01% of the probability-values whose corresponding sequences of occurrences satisfy the traditional frequentist definition, and the Principle of randomization, are lost).
[85] Of course assuming that only a proper subset of the true null hypotheses and only a proper subset of the true scientific laws are falsely rejected.
Obviously, there is no symmetry to be found in such an event. A simple technique within Classical Statistics – usually referred to as interval estimation – allows one to compute, from the observed occurrences of an outcome (the weather record in New York in our example), the probability that the unknown probability-value of that outcome belongs to a certain interval (confidence interval). Suppose a relatively large number n of occurrences of the outcome O whose probability-value one wishes to estimate have been observed. Let $X_i$ be the random variable that corresponds to the i-th occurrence of O ($X_i = 1$ if O occurs, $X_i = 0$ otherwise); let p be the probability of O, $f_n \equiv \left( \sum_{i=1}^{n} X_i \right)/n$, and $\sigma_x \equiv \sqrt{\mathrm{var}[X_i]}$. By the Central Limit Theorem, the distribution of $(f_n - p)/(\sigma_x/\sqrt{n})$ may be approximated (the larger n, the better the approximation) by the distribution of a normal random variable with mean zero and variance one. It follows that the probability that $|f_n - p|/(\sigma_x/\sqrt{n}) \le c$, where c is any positive real number, may be calculated (approximately) by integrating the probability density of a normal distribution with mean zero and variance one. This allows one to calculate a confidence interval for any desired probability (confidence coefficient). For instance, the value of c such that $|f_n - p|/(\sigma_x/\sqrt{n}) \le c$ has probability 0.95 is approximately 1.96; i.e. the probability that $|p - f_n| \le 1.96 \cdot \sigma_x/\sqrt{n}$ is 0.95. Observing that $\sigma_x = \sqrt{p(1-p)} \le 0.5$, the probability that $|p - f_n| \le 1.96 \cdot 0.5/\sqrt{n} = 0.98/\sqrt{n}$ is greater than or equal to 0.95. But this is just the probability that $f_n - 0.98/\sqrt{n} \le p \le f_n + 0.98/\sqrt{n}$. Therefore the probability that p belongs to the interval $[\,f_n - 0.98/\sqrt{n},\; f_n + 0.98/\sqrt{n}\,]$ is greater than or equal to 0.95.
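For concreteness, here is a minimal sketch of this computation, using the worst-case bound $\sigma_x \le 0.5$ from the text; the weather counts are invented for illustration.

```python
import math

def classical_ci_95(successes: int, n: int):
    # 95% interval f_n +/- 1.96 * 0.5 / sqrt(n) = f_n +/- 0.98 / sqrt(n),
    # using the worst-case bound sigma_x = sqrt(p(1-p)) <= 0.5.
    f = successes / n
    half = 0.98 / math.sqrt(n)
    return max(0.0, f - half), min(1.0, f + half)

# E.g. 353 rainy days out of 1,000 observed days (hypothetical figures):
print(classical_ci_95(353, 1000))  # approximately (0.322, 0.384)
```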
This seems an excellent result: the confidence interval grows smaller as the number of observed outcomes, n, grows larger. However, this result strictly depends on the inductive assumption of Classical Statistics: interval estimation allows one to compute the probability that the probability-value of a repeatable event belongs to a certain interval only under the assumption that that event has a uniform probabilistic behavior. This doesn’t mean that interval estimation is altogether useless: if one is interested in estimating probability-values only insofar as they correspond to well-behaved events, taking the chance that an estimate is in fact a false positive (i.e. that the event under scrutiny is not well-behaved), then interval estimation is of some use. Certainly it does not constitute the strong result that it seems to at first glance. But are there alternative methodologies or procedures that allow one to compute confidence intervals on the basis of observed outcomes? Yes, but we must give up associating a probability-value with confidence intervals.
Recollect the falsification procedure presented in §4.2: at the k-th iteration of a repeatable event, letting E be a possible outcome of this event, the test provisionally accepts $\Pr(E) = p$ just in case $|S_k - kp| < g_p(k,B)$, i.e. just in case $p - g_p(k,B)/k < f_k < p + g_p(k,B)/k$ (as usual $f_i \equiv S_i/i$). The idea is simple. An n-level confidence interval, call it $\Delta_n^E$, for the probability of E is the set of those candidate probability-values for E which are compatible with all of the first n iterations of our falsification procedure (that is, which are accepted by that procedure for $k = 1, 2, \ldots, n$).
That $\Delta_n^E$ is an interval is immediate. For any fixed k and B, $p + g_p(k,B)/k$ (as a function of p) grows monotonically from its minimum to its maximum and then decreases monotonically to a local minimum; hence any relative frequency $f_k \in [0,1]$ satisfies $f_k < p + g_p(k,B)/k$ for an interval $\Phi_k$ of probability-values. By analogous considerations, $f_k$ satisfies $f_k > p - g_p(k,B)/k$ for an interval $\Psi_k$. But then $f_k$ satisfies $p - g_p(k,B)/k < f_k < p + g_p(k,B)/k$ for the interval of probability-values $\Theta_k \equiv \Phi_k \cap \Psi_k$. Now, a probability-value p belongs to $\Delta_n^E$ just in case p has been accepted by the falsification procedure for $k = 1, 2, \ldots, n$, which means that $\Delta_n^E = \Theta_1 \cap \Theta_2 \cap \cdots \cap \Theta_n$. Therefore $\Delta_n^E$ is an interval.
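In practice $\Delta_n^E$ may be approximated by scanning a grid of candidate probability-values; the sketch below is mine (the grid size, the value of B, and the exclusion of the degenerate endpoints $p = 0$ and $p = 1$ are implementation choices, not part of the text).

```python
import math

def g(p, n, B):
    # Boundary of equation (49).
    return math.sqrt(p * (1 - p) * (n + 1) * (math.log(n + 1) + 2 * math.log(B)))

def n_level_interval(outcomes, B=10**4, grid=10_001):
    """Approximate the n-level confidence interval: the candidate values p
    accepted by the falsification procedure at every step k = 1, ..., n."""
    sums, s = [], 0
    for x in outcomes:            # partial sums S_1, ..., S_n
        s += x
        sums.append(s)
    surviving = [
        p for p in (i / (grid - 1) for i in range(1, grid - 1))  # skip p = 0, 1
        if all(abs(s_k - k * p) < g(p, k, B) for k, s_k in enumerate(sums, start=1))
    ]
    return (min(surviving), max(surviving)) if surviving else None

print(n_level_interval([1, 0, 1, 1, 0, 1, 0, 1]))  # a (wide) interval around 5/8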
There is, however, a problem: as it is, our estimation technique does not guarantee the coherence of confidence intervals. Suppose our technique associates with two mutually exclusive outcomes of the same event, call them A and B, the m-level confidence intervals $[a_{\min}, a_{\max}]$ and $[b_{\min}, b_{\max}]$ respectively. Since probability-values are limits of frequencies, they must comply with the axiom of additivity; hence the m-level confidence interval for the outcome A∪B must be $[a_{\min} + b_{\min},\, a_{\max} + b_{\max}]$. Unfortunately there is no guarantee that our estimation will generate this interval. We deal with this difficulty by imposing that the confidence interval of any outcome which is a disjunction of two or more mutually exclusive outcomes has as its lower bound the sum of the lower bounds of its disjuncts and as its upper bound the sum of the upper bounds of its disjuncts.[86]
[86] Obviously this “imposition” is recursive.
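A minimal sketch of the imposed rule (capping the bounds at 1 is my addition, to keep the interval within $[0,1]$):

```python
def union_interval(component_intervals):
    # Coherence rule from the text: the interval of a disjunction of mutually
    # exclusive outcomes sums the lower and upper bounds of its disjuncts.
    lo = sum(a for a, _ in component_intervals)
    hi = sum(b for _, b in component_intervals)
    return min(lo, 1.0), min(hi, 1.0)

# Hypothetical m-level intervals for two mutually exclusive outcomes A and B:
print(union_interval([(0.20, 0.30), (0.40, 0.55)]))  # (0.60, 0.85)
```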
Let us now examine the performance of our interval estimation. If $P(O) = p$ and O is an “atomic” outcome (that is, O is not the disjunction of other outcomes) then, considering that the probability that $p \in \Delta_n^O$ is the probability that p is not rejected by the falsification procedure for $k = 1, 2, \ldots, n$, the probability that $p \notin \Delta_n^O$ is less than or equal to $B^{-1}$, for all n. If instead $P(O) = p$ and O is not atomic, by means of geometrical considerations about the boundaries of the falsification procedure the probability that $p \notin \Delta_n^O$ is even smaller than in the previous case. Hence in any case the probability that the correct probability-value is excluded from an n-level confidence interval is less than or equal to $B^{-1}$, no matter what the value of n. This fact has practical importance: it guarantees that (in practice) one may go on updating (by means of new data) a confidence interval indefinitely with a very low risk that the correct probability-value is “lost” at some point.[87] Now for a downside. We have just seen that our interval estimation must be tweaked if the confidence intervals that it generates are to be logically coherent. Unfortunately, this we pay for in terms of performance. If an outcome is the disjunction of other outcomes, then, for a fixed number of data/observations, its confidence interval is poorer (i.e. larger) the larger the number of disjuncts. This means that in order to keep the confidence interval of a non-atomic outcome reasonably small, one has to increase the number of observations (and the higher the number of disjuncts, the higher must be the increase).[88]
[87] It is not at all clear that this is the case for classical interval estimation.
[88] This would not be possible if countable additivity held (it would be impossible to obtain a confidence interval – other than $[0,1]$ – for a disjunction of infinitely many outcomes, no matter how many observations were performed). Fortunately we know that frequencies do not comply with countable additivity.
This concludes our discussion concerning the epistemological status of
probability-values within infinite frequentism. The next section turns to another widely
debated issue: the so-called problem of the single case.
5. The Single Case Question
Many supporters of the frequentist interpretation of probability (Reichenbach being a
notable exception) have held that within frequentism probability-values cannot be
attached to single events. To say that tossing a coin will yield ‘heads’ with probability ½
is not to say that a certain toss T of that coin will yield ‘heads’ with probability ½; indeed
there is no such thing as the probability that T will yield ‘heads’.
It is unquestionable, in my view, that any rigorous version of frequentism should
prohibit the assignment of probability-values to individual events. And yet, there are
plenty of scenarios in which one’s intuition is that single events should have a
probability. Suppose the limit of the relative frequency of the event ‘there is sunshine in
Los Angeles’ is 0.99. Anybody would consider the single event ‘there will be sunshine in
L.A. on April 7, 2013’ highly probable.
How is one to explain this tension? Is it meaningful to associate probabilities to
single instances? Is the frequentist interpretation of probability irreparably inadequate?
Or is our intuition mistaken? A comparatively simple analysis allows us to answer these
questions. Consider a repeatable event E whose possible outcomes are A, B, C,…; let the
limit of the frequency of occurrences of A within n instances of E, $f_n^A$, be $q = \lim_{n\to\infty} f_n^A$;
obviously q is the frequentist probability-value associated with A. Now, the interesting thing is that the occurrence of A in a single instance of E has a well-defined probability within the classical interpretation. This we may show in two different ways. First, consider all the ordered sequences of outcomes of cardinality n, k of which are A (e.g. for the ordered sequence ‘A, B, B, C, A, D’, $n = 6$ and $k = 2$). The number of all such sequences is $\binom{n}{k}$. Among them, the number of sequences whose i-th element (for any fixed $i \le n$) is A is $\binom{n-1}{k-1}$. Now, the classical probability that the i-th element of the sequence will turn out to be A is the ratio of the number of favorable sequences (whose i-th element is A) to that of the possible ones, i.e. $\binom{n-1}{k-1} / \binom{n}{k}$, which is just $k/n$. If the sequences of outcomes are infinitely long, the classical probability that the i-th element is A becomes $p \equiv \lim_{n\to\infty} k/n$. Since $k/n$ is just $f_n^A$, then $p = q$. The frequentist probability of the outcome A and the classical probability that A will occur in the i-th instance of E have exactly the same value.
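The combinatorial step can be written out explicitly:

```latex
\frac{\binom{n-1}{k-1}}{\binom{n}{k}}
  = \frac{(n-1)!}{(k-1)!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{n!}
  = \frac{k}{n},
\qquad\text{hence}\qquad
p \equiv \lim_{n\to\infty}\frac{k}{n} = \lim_{n\to\infty} f_n^{A} = q.
```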
The second argument, albeit perhaps less rigorous than the first, is certainly more intuitive. Imagine that the outcomes of E are not recorded as usual in an ordered sequence; rather, imagine that the outcomes of E are “stored” in an (unordered) set – call this set S. Let l be the cardinality (i.e. the number of elements) of S (obviously l grows larger as new outcomes of E are stored in S); then the ratio of the number of outcomes in S that are A to the total number of outcomes in S is just $f_l^A$. Now, what is the classical probability that a certain outcome of E, call it Γ, is an A, given that Γ is in S? Since it is unknown which among the outcomes in S is Γ, the classical probability that Γ is a certain outcome of S is $1/l$. Hence the classical probability that Γ is one of the outcomes in S that are A is $f_l^A$: the classical probability that a certain outcome in S is an A equals $f_l^A$. It immediately follows that $\lim_{l\to\infty} f_l^A$ is the value of both the frequentist and the classical probability of A when $l \to \infty$.
In light of these two arguments (especially the second), and since one typically finds the use of classical probability appealing and “natural”, it seems appropriate to conclude that the reason why one intuitively associates, with the frequentist probability-value of a certain repeatable event, an analogous probability for each individual instance of that event is an inadvertent, implicit application of classical probability. This is a psychological point. But our analysis in the previous paragraph suggests that a second point holds. If one wants to assign a probability-value to an individual instance qua instance of a repeatable event with limiting relative frequency p, then one must do so within the classical interpretation of probability (assuming that that interpretation is tenable, and there are strong reasons which suggest that it is not – cf. Ch. 1). For I do not see in what other fashion one could assign probabilities to individual instances on the basis of relative frequencies.
6. The Reference Class Problem
Within the frequentist interpretation of probability, if one attempts to attach probability-values to a single instance of an event one faces the so-called ‘reference class problem’. Let me illustrate the point by means of an example. Suppose we want to know the probability of death within a year of a single educated fifty-year-old diabetic male. Should we look at the probability (relative frequency) of death within a year of all male diabetics? Or at the probability (relative frequency) of death within a year of all educated individuals in their fifties? Or at the probability (relative frequency) of death within a year of all educated diabetics in their fifties? And so on. This is the reference class problem. Typically, a single instance is an instance of several different repeatable events whose relative frequencies are in general different; hence it is not clear which of these events’ probabilities the probability of the single instance should equal.
Most, if not all, of those who think that a satisfactory form of frequentism must accommodate single-case probabilities believe that the solution to the problem is essentially Reichenbach’s (1949): pick the relative frequency of that repeatable event which shares most features with the single instance at hand. In our example, we should pick the relative frequency of death within a year of all educated fifty-year-old diabetic males, if this is available; if not, we should pick the next closest repeatable event.
This is a concise sketch of the reference class problem. The very first point I wish to make is that this problem is meaningful only if the classical interpretation is presupposed: with respect to frequentism alone the reference class problem is a non-problem. This follows readily from my discussion of single-case probability in §5. That said, it is natural to wonder whether – assuming the classical interpretation – the reference class problem is solvable. Well, it is.[89] Let A and C be two properties of objects; and let r be the limiting relative frequency of objects of type A that are also C (that is, the frequency of objects of type C among objects of type A). Then, the probability that, say, a single object of type A is also a C is a classical probability whose value is r. Back to our example, the probability of death within a year of a certain educated fifty-year-old diabetic male is a classical conditional probability whose value equals the limit of the relative frequency of those educated fifty-year-old diabetic males that die within a year.
[89] At the risk of pedantry, remember that we are assuming the tenability of the classical interpretation (which, as things stand, doesn’t seem to be tenable).
7. Probability: a Guide to Life?
As Joseph Butler famously put it, ‘to us, probability is the very guide to life’. Indeed that
probability should constitute the basis of one’s choices and actions is a matter of common
sense. But is frequentist probability suitable to play this important role? Those who have answered this question positively have appealed to two variations of the same
argument. Suppose that the probability-value of an outcome O of a repeatable event E is
p, i.e. that p is the limiting relative frequency of O within the sequence of instances of E.
Now it is true that p only concerns or applies to infinite sequences of outcomes and that
human beings only deal with finite sequences. However, it is also true that if a sequence
is extended enough then the relative frequency of O must necessarily belong to a small
interval “around” p. Hence, we know that the relative frequency of O is bound to be close
to p, and we may use this information to make an educated choice among the outcomes
of E. Here is the second variant of the argument: it is perfectly true that p applies only to
infinite sequences of instances of E; however in the long run (i.e. when a sufficiently
large number of instances of E are involved) the relative frequency of O must get close
enough to p for practical purposes, i.e. to make educated choices among outcomes of E.
Demolishing this line of argumentation was easy for the detractors of
frequentism: it is true that in the long run or in a sufficiently long sequence of outcomes
the relative frequency gets very close to the corresponding probability-value;
unfortunately one cannot know whether a given run or sequence is long enough for that to happen.
It is beyond doubt that the detractors of frequentism are right. However, and this
has gone unnoticed, that flawed rationale contains the seed of a possible solution to the
problem. Consider again an outcome O, of repeatable event E, whose probability is p.
Since p is the limiting relative frequency of O, it follows that there exist an ordered, countable (finite or infinite) set of real values, $\{\varepsilon_i : i \in \mathbb{N}\}$, where $\varepsilon_1 > \varepsilon_2 > \varepsilon_3 > \cdots$, and an ordered, countable (finite or infinite) set of positive integers, $\{l_i : i \in \mathbb{N}\}$, where $l_1 < l_2 < l_3 < \cdots$, such that for all $i \in \mathbb{N}$, $f_n$ (the relative frequency of O) satisfies $|f_n - p| < \varepsilon_i$ for all $n \ge l_i$. Now let $\varepsilon_0 > 1$ and $l_0 = 1$. Since it is trivially true that $|f_n - p| \le 1 < \varepsilon_0$ for all $n \ge 1 = l_0$, then for all $i \in \mathbb{N} \cup \{0\}$, $f_n$ satisfies $|f_n - p| < \varepsilon_i$ for all $n \ge l_i$. This means that $f_n$ must satisfy the set of inequalities $\{\,|f_n - p| < \varepsilon_k : 0 \le k \le j_n\,\}$, where $j_n$ is the (unique) integer such that $l_{j_n} \le n < l_{j_n + 1}$. It follows that if we assume that $p - \varepsilon_{j_n} < f_n < p + \varepsilon_{j_n}$, then our assumption satisfies $|f_n - p| < \varepsilon_k$ for all $k \le j_n$. We are now in a position to take the final step. We want to evaluate two alternative assumptions about $f_n$: $f_n = p$, and $f_n = \lambda \equiv p + \varepsilon$, where $\varepsilon \ne 0$ and λ satisfies $\varepsilon_{k+1} < |\lambda - p| < \varepsilon_k$. Obviously $f_n = p$ satisfies $|f_n - p| < \varepsilon_i$ for all $i \in \mathbb{N} \cup \{0\}$ and all n; $f_n = \lambda$, on the other hand, satisfies $|f_n - p| < \varepsilon_i$ just in case $i \le k$ and $n \le l_k$. Since both $f_n = p$ and $f_n = \lambda$ satisfy $|f_n - p| < \varepsilon_i$ in case $i \le k$ and $n \le l_k$, and since there is no reason to prefer one hypothesis to the other, the two hypotheses are equally good in the scenario $n \le l_k$.[90] For $i > k$ and $n > l_k$, since $f_n = \lambda$ does not satisfy $|f_n - p| < \varepsilon_i$ and since $f_n = p$ does, $f_n = p$ is the only acceptable hypothesis in the scenario $n > l_k$ (it is certain that the assumption $f_n = \lambda$ is mistaken in that scenario). Now, since for $n \le l_k$ the two hypotheses are equally good (or bad), and for $n > l_k$ $f_n = p$ is a better hypothesis than $f_n = \lambda$, and considering that the value of $l_k$ is unknown (i.e. it is unknown whether $n \le l_k$ or not), it seems reasonable to conclude that $f_n = p$ is the better of the two hypotheses. But then, since λ may take any value in the interval $[0,1]$, $f_n = p$ is the best conjecture on the value of $f_n$: one is better off acting as if $f_n = p$, no matter what the value of n is.
To summarize: suppose p is the limit of the relative frequency $f_n$ of an outcome O; then $f_n$ must, no matter what the value of n is, belong to at least one of infinitely many intervals $I_{g(n)} \subseteq [0,1]$, all of which contain p; at the same time, however, for any $\varepsilon \ne 0$ such that $-p < \varepsilon < 1 - p$, there are infinitely many among the $I_{g(n)}$ that fail to contain the value $p + \varepsilon$; it follows that one’s “best bet” is to act as if $f_n = p$, for this is the only value of $f_n$ that is guaranteed to belong to all of the intervals $I_{g(n)}$.
I do not believe that the argument I’ve just presented is conclusive. At the same time, however, I do not see how this argument could be conclusively refuted. As things stand, we cannot be certain that frequentist probability is a reliable guide to life, and we cannot prove that it is not. Now, in absolute terms this might be a problem for frequentism. But comparatively speaking it is not: for it is not at all clear that subjectivism and a priori probability fare better in this respect. Let’s examine subjectivism first.
[90] Here I have made use of a form of the Principle of Indifference.
Subjectivist philosophers have been especially eager to affirm that frequentist probabilities are useless in practice, that one cannot rely on them to make educated guesses (this is far from obvious, as we have seen). However, subjectivists have failed to realize that subjective probability itself does not possess any practical utility. True, subjective probability directs our choices ipso facto: it is by observing people’s choices that one measures subjective probabilities (cf. Ch. 2). But this doesn’t mean that subjective probability is a reliable guide to life. If a high subjective probability that fairies exist influences a certain choice, it doesn’t follow that that choice is good or appropriate (in fact it is likely that that choice is a bad one). The point is not that subjective probabilities cannot be reliable guides to choices in certain cases: the point, rather, is that in general there is no guarantee that acting in accordance with subjective probabilities will serve one’s advantage.
As for the a priori interpretation, there is little doubt that, were this interpretation tenable, a priori probability would represent a reliable guide to life. Unfortunately, it is not at all clear that this interpretation is tenable (cf. Ch. 1). And making it tenable basically means showing that it is a reliable guide to choices and actions!
8. An Alternative to Infinite Frequentism
8.1. Is Infinite Frequentism Tenable?
Within infinite frequentism the probability-value of an event exists and is meaningful only if the sequence of outcomes of that event is indefinitely extendable. So far, we have taken for granted that sequences of outcomes may be extended at one’s pleasure. But is this the case? David Lewis (1994) observes that if space and time are finite (as many cosmologists believe), then it is not possible to extend a sequence of outcomes indefinitely. Lewis’ criticism is a serious one and, I believe, fatal to infinite frequentism (we do not know whether space-time is finite or not, but we cannot rule out this possibility).
Infinite frequentism is untenable, then. Since we ruled out finite frequentism as a viable approach, this seems a major problem. But it is not. To begin with, the performance of the falsification procedure I have presented in §4.2 is basically[91] unaffected: the current performance of that procedure is not influenced by the fact that at some point in the future all sequences of outcomes will become non-extendable. More importantly, it is possible to devise an approach that preserves the “spirit” of infinite frequentism but at the same time renounces defining probabilities as limits at infinity of relative frequencies. I deal with this approach in the next sub-section.
[91] If a sequence of outcomes becomes, at a certain point, non-extendable, one cannot be certain of rejecting wrong probabilities/frequencies at some point in the future (cf. §4). This fact, however, doesn’t compromise the performance of our falsification procedure.
8.2. An Alternative to Infinite Frequentism
I anticipated in §8.1 that the alternative to infinite frequentism I have in mind does not share infinite frequentism’s definition of probability. Indeed my alternative approach does not attempt to define probability in any way. Remember from §3 that defining probabilities as limits at infinity of relative frequencies is just a pragmatic choice: in §3 I preferred this definition to the definition of probabilities as relative frequencies of finite sequences on the ground that the former is a better measure of frequencies than the latter. There is no intrinsic reason, in other words, to define probabilities as limits at infinity of relative frequencies besides that of providing a good measure of frequencies. But the way infinite frequentism measures frequencies is simply not admissible under the assumption that space-time is finite.
Thus my alternative approach does not define probabilities. But how, then, does it measure probabilities? My approach, quite simply, allows at any given moment all those candidate probabilities that, up to that moment, have not been rejected by the “infinite” statistical test I presented in §4.2. Since, as we have seen in §4.3, that test admits at any given moment an interval of probabilities that (until that moment) it has not rejected, my approach does indeed provide a measure of probabilities. This doesn’t mean, however, that one must necessarily rely on the interval estimation discussed in §4.3. For example, suppose one believes that for a certain coin the probability of heads is 0.5; then one doesn’t have to compute at every toss of the coin the interval of acceptable probability-values for the outcome heads; rather, as long as our “infinite” statistical test does not reject the probability-value 0.5, the latter is provisionally accepted as the probability of heads.
My approach seems to have an obvious drawback, though: the acceptance or rejection of a probability-value depends, in general, on the specific falsification procedure one chooses as well as on the choice of the free or arbitrary parameters that any such procedure contains. The “infinite” acceptance test discussed in §4.2 is a falsification procedure that provisionally accepts at step n a probability-value p as correct just in case the relative frequency that corresponds to p, $f_n$, satisfies the inequality

$$|f_n - p| \cdot n < \sqrt{p(1-p)(n+1)\,[\log(n+1) + 2\log B]} \qquad (50)$$

where the parameter B may be chosen arbitrarily (the larger B, the smaller the error of the first kind). A critic of my approach might complain that the falsification procedure associated with (50) contains a parameter, B, whose value is arbitrary. Furthermore, the critic might continue, there probably are alternatives to (50) that constitute equally legitimate falsification procedures. Let me begin with the first point. The parameter B is related to the error of the first kind, α, through the inequality $\alpha \le B^{-1}$, and it is therefore the error of the first kind that may be chosen arbitrarily, through an appropriate choice of B (likewise, as we’ll see shortly, the free/arbitrary parameters of any valid falsification procedure may only influence the error of the first kind). But the error of the first kind doesn’t really worry us: what really matters is that the error of the second kind goes asymptotically to zero. This is what matters because (50) works as a falsification procedure, and the whole point of an appropriate falsification procedure for probabilities is that the error of the second kind, i.e. the probability of not rejecting incorrect probabilities, must move closer and closer to zero as n grows (the problem of free/arbitrary parameters only affects finite acceptance tests à la Fisher, because for those tests a change in the values of the free parameters does indeed affect the error of the second kind – cf. §4.2).
The second point, too, not surprisingly, concerns the error of the first kind but not that of the second kind. Consider any valid alternative to (50), which must have the form

$$|f_n - p| \cdot n < v(p, n, \mathbf{k}) \qquad (51)$$

where $\mathbf{k}$ is a vector of free/arbitrary parameters, and for which it must be the case that $\lim_{n\to\infty} v(p,n,\mathbf{k})/n = 0$. Moreover, the error of the first kind associated with (51) must be strictly smaller than one (otherwise the procedure is guaranteed to reject, sooner or later, any probability). Now, to say that $\lim_{n\to\infty} v(p,n,\mathbf{k})/n = 0$ is to say that the error of the second kind goes to zero as n grows larger. Therefore any valid falsification procedure alternative to (50) is guaranteed to have an error of the second kind that goes (asymptotically) to zero, and the only parameter that varies among valid falsification procedures is the error of the first kind.
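For (50) itself this condition is easily verified, since the boundary grows like $\sqrt{n \log n}$:

```latex
\lim_{n\to\infty} \frac{g_p(n,B)}{n}
  = \lim_{n\to\infty} \sqrt{\frac{p(1-p)(n+1)\,[\log(n+1) + 2\log B]}{n^2}} = 0 ,
```

so (50) is itself an instance of the schema (51).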
8.3. Infinite Frequentism Compared to My Alternative
The first obvious advantage of the variant of frequentism I presented in §8.2 is that, unlike infinite frequentism, my variant doesn’t require that sequences of outcomes be infinitely extendable. This allows me to avoid Lewis’ criticism. At the same time, moreover, my approach is superior to finite frequentism, for it avoids the main problem of the latter: the need to establish a priori, and arbitrarily, the length of a sequence of outcomes. A second advantage is that my approach, unlike infinite frequentism, is a “natural” falsificationist framework. Of course, this is not surprising, since the former, unlike the latter, has at its core a falsification procedure. As for the problem of the applicability of probability to decision making, neither infinite frequentism nor my approach seems to be better than the other. In §7 I argued that while it is not clear that within infinite frequentism probabilities play a role in decision making, the opposite is also true: it is not clear that within infinite frequentism probabilities are irrelevant to decision making. My approach performs very similarly. If the argument I presented in §7, to the effect that within infinite frequentism probabilities do play a role in decision making, is correct, then probabilities must play a role in decision making within my approach as well. The reason is very simple. That argument asserts that if the relative frequency of a sequence of outcomes S converges at infinity, then the relative frequency of any finite sub-sequence of S should play a role in decision making. But also within my approach the Strong Law of Large Numbers guarantees that a sequence of outcomes must converge, if indefinitely extended[92] (and it doesn’t matter that within my approach sequences of outcomes must be finite). Thus the argument presented in §7 also guarantees that within my approach appropriate finite sequences of outcomes should play a role in decision making. On the other hand, if that argument is wrong, neither infinite frequentism nor my approach is a viable framework for decision making.
[92] If the probability of success at each repetition of a repeatable event E is p and if the repetitions of E are mutually independent, then the Strong Law of Large Numbers guarantees that $\lim_{n\to\infty} f_n = p$ almost surely.
Conclusion
In this work I have achieved, I believe, two important results. On the one hand I have
presented certain weaknesses of the a priori and of the subjective interpretations of
probability; such weaknesses are, I have argued, major ones. On the other I have
explained how the relative frequency interpretation may be “salvaged” from the
apparently insurmountable problems that (are believed to) affect it. Thus, the gist of my
work is that frequentism is the best interpretation available. Frequentism is, in other
words, our best shot at interpreting probability.
Since I have already presented a summary of each chapter in the introduction,
here I will not summarize each chapter –and therefore each interpretation. I will,
however, briefly examine the reasons why both the a priori and the subjective
interpretations are not, in my view, appropriate interpretations of probability (§1), as well
as the reasons why frequentism is a better interpretation (§2). Afterwards, I’ll discuss in
what way my work contributes to the foundations of probability (§3).
1. The Problems of the A Priori and Subjectivist Interpretations
The a priori and the subjectivist interpretations, I have shown, share a common problem:
within these interpretations, it’s not at all clear why Kolmogorov’s axioms (or any
equivalent axiomatization) must hold. For the a priori interpretation this is devastating.
Without Kolmogorov’s axioms, or at least some sort of additivity axiom, it is not possible to assign unequal probabilities to sets of outcomes relative to the same event; e.g. it is not possible to assign to the complex outcome ‘the die will yield one, two, three, four or five’ a higher probability than that assigned to the outcome ‘the die will yield six’ (cf. Ch. 1, §5). As for the subjectivist interpretation, the problem is less serious. Subjectivists
can do without Probabilism (the thesis that probabilities must comply with Kolmogorov’s
axioms –cf. Ch. 2, §1). All they need is the weaker thesis that probabilities may be
computed in such a way as to comply with Kolmogorov’s axioms; and this thesis is
indeed true. However, much of the appeal of the subjective interpretation is due precisely
to the (wrong) belief that Probabilism is true: it is not a coincidence that most
subjectivists are also proponents of Probabilism.
But we have seen that the second source of appeal for the subjective interpretation also turns out to be bogus. This second source of appeal, it will be remembered, is the belief that the subjective interpretation is the ideal framework for Bayesian Confirmation Theory, and that the latter easily allows one, within the subjective interpretation, to increase the probability of any scientific law by way of appropriate observations (cf. Ch. 2, §5). Thus, the two elements that together produce most, if not all, of the appeal of the subjectivist interpretation are problematic. Meanwhile, the main, old problem of the subjectivist interpretation remains unresolved: subjective probabilities are precisely that – subjective; that is, subjective probabilities i) need not depend on any objective verification or data, and ii) are agent-dependent, i.e. different agents can have very different probabilities for the very same event. It follows that the subjective interpretation is not a good or useful interpretation of probability.
But what about the “impure” subjective interpretations that have been proposed
over time, which impose further constraints on the values that subjective probabilities may take? I shall briefly consider here two of these interpretations. The first interpretation requires that an agent’s probabilities conform to relative frequencies, when these are available; e.g. if an agent knows that a certain coin is loaded and that its relative frequency of heads is, say, 0.7, then that agent’s (subjective) probability for heads should
be 0.7. This interpretation, which has been defended by –among others– David Lewis,
solves the problem relative to Kolmogorov’s axioms (at least for those events for which a
relative frequency is known): if subjective probabilities must equal relative-frequencies,
then, since relative frequencies are guaranteed to comply with Kolmogorov’s axioms,
subjective probabilities are guaranteed to comply with Kolmogorov’s axioms, too.
However, while solving a problem this interpretation creates a new one: since subjective
probabilities must be based on frequencies, subjective probabilities are, in fact,
frequencies in disguise; that is, this variant of the subjective interpretation collapses into
frequentism. For what is the advantage of considering probabilities degrees of belief, if
degrees of belief merely mirror frequencies? This interpretation therefore is not better
than standard subjectivism.
The second variant of subjectivism I’ll briefly examine has been advanced by de Finetti and his followers.[93] Its main idea is very simple: when one is interested in the probability of an event, one should pick the probability fixed by an expert in the field relevant to that event; e.g. if one wants to know the probability that tomorrow it will rain, one should ask a meteorologist what his/her degree of belief is that tomorrow it will rain.
[93] See for example de Finetti (1972).
Presumably, this variant of the subjective interpretation aims at mitigating two problems
that affect regular subjectivism: the first is the usual problem that different agents have
different probabilities for the very same event; the second is that most agents are
unreliable and not competent enough in most matters to have meaningful probabilities.
However, this variant of subjectivism poses more problems than it mitigates or solves. To
begin with, who, and by what means, settles the question whether an agent is or isn’t an
expert in a certain field? And what if two experts disagree about the probability of an
event? The list of problems goes on and on. Thus this second variant, too, is unfit for the job.
To summarize: the a priori interpretation is not a viable interpretation of
probability because it requires Kolmogorov’s axioms to work properly (to work at all,
actually), but the use of Kolmogorov’s axioms within the a priori interpretation is
unwarranted; the subjective interpretation has no special appeal qua interpretation of
probability; it does, however, have serious shortcomings (and the same is true for the two
variants I have examined); thus also the subjective interpretation and its variants are not
acceptable interpretations of probability.
2. Infinite Frequentism
The main problem of infinite frequentism, we have seen in Ch. 3, is epistemological:
within this interpretation it is neither possible to verify nor to falsify (reject) probability-
values. From an epistemological standpoint, a verification or a falsification procedure is
desperately needed. The second problem of infinite frequentism is “practical”: according
to the majority of cosmologists, space-time is finite (and, at any rate, most repeatable
events could not conceivably be repeated ad infinitum –e.g. a coin could not be tossed
forever without being destroyed in the process); but within infinite frequentism
probabilities are frequencies associated with infinitely long sequences of physical events.
In Ch. 3, I have presented a solution to both problems. Within infinite
frequentism, the closest thing to a falsification procedure is the so-called acceptance test
of classical statistics (e.g. the binomial test). The main problem of this test, however, is
that it is finite (it only applies to finite sequences of outcomes). For reasons that I have
explained in detail (cf. Ch. 3, §4.2) it is this very feature, finitude, that makes the
acceptance test a very poor falsification procedure. I have then presented an improved
version of the acceptance test: an “open” acceptance test that never ends, that is an
acceptance test that applies to sequences of outcomes that may be extended whenever one
wishes, ad infinitum (cf. §4.2). This improved acceptance test, I have argued, is the
closest thing to a falsification procedure that we can get. This solves (to a great extent if
not completely) the epistemological problem. It does nothing, however, for the second
problem of infinite frequentism: the fact that the latter presupposes infinitely long
sequences of physical events (outcomes). Unfortunately, there seems to be only one way
to cope with this problem. This is to abandon the very core of infinite frequentism: the
definition of probability as the limit at infinity of a relative frequency.
But how do we achieve that? How can we renounce the definition of probability as the limit at infinity of a relative frequency without at the same time giving up infinite frequentism completely? The solution I have proposed in Ch. 3 (§8) is i) to
renounce altogether any attempt to define probability and instead ii) to worry about
specifying under what condition(s) probabilities may be falsified (rejected as incorrect).
In Ch. 3, we have seen that (ii) is easily achieved by way of the abovementioned “open”
acceptance test. My variant of frequentism solves the two major problems of traditional
infinite frequentism: it largely solves the epistemological problem because it specifies a
(quasi) falsification procedure; and since it doesn’t need infinite sequences of physical
events, it solves the problem of dealing with infinite sequences. At the same time, my
variant is a close relative of infinite frequentism: just like within the latter, also within my
variant a probability is not determinable on the basis of any finite sequence of events, no
matter how long; moreover if a probability p is incorrect in the sense of infinite
frequentism, i.e. if p is not the limit at infinity of an appropriate relative frequency, then
my (quasi) falsification procedure is guaranteed to reject p sooner or later (cf. Ch. 3,
§4.2). Lastly, my variant of frequentism easily allows one to come up with candidate
probabilities (i.e. probabilities that are provisionally accepted as correct): candidate
probabilities are simply those that “survive” my falsification procedure (cf. Ch. 3, §4.3).
The variant of frequentism I have put forward is not perfect –no doubt. However,
given its minor shortcomings, and considering the deeper problems that affect both the a
priori and the subjectivist interpretations, my sui generis frequentism is, I believe, our best
shot at interpreting probability.
3. How My Work Contributes to the Philosophy of Probability
The philosophical debate regarding the interpretations of probability began at least two
centuries ago. However, all of the philosophical analyses of probability that have been
put forward during this rather long time frame are either incomplete or just plain wanting.
Nowadays, most philosophers interested in probability hold that subjectivism is the most
promising interpretation of probability, and that the relative frequency interpretation is
obsolete. But nothing could be farther from the truth! It is this very state of affairs that
motivated my work.
I have explained elsewhere in some detail (Intro. §1, and also partly in §2 above)
in what way I take my dissertation to contribute to the philosophy of probability.
Therefore, I shall be rather concise here. Contemporary philosophy of science considers
the a priori interpretation more or less obsolete. While it is true that this interpretation is
not a viable interpretation of probability, it is not for the reasons that philosophers of
science adduce. Ch. 1 explains the actual, true reasons why the a priori interpretation is,
indeed, obsolete. The subjectivist interpretation, I have said in the previous section, is the
most highly regarded interpretation of probability nowadays. This is a complete mistake.
In Ch. 2 I examine in detail the main two alleged virtues of subjectivism, and I show that,
in fact, when properly analysed these “virtues” disappear like snow in the sunshine. The
situation of the relative frequency interpretation is, in a sense, turned around: whereas
contemporary philosophy of science doesn’t hold frequentism in high esteem, the value
of its stock has only risen by comparison. Ch. 3 shows how the two main problems of the
121
relative frequency may be successfully dealt with, thus indicating why frequentism is our
best shot at interpreting probability.
122
Bibliography
Armendt, B., 1993, “Dutch Books, Additivity and Utility Theory”, Philosophical Topics,
Vol. 21, No. 1: 1-20
Ayer, A. J., 1952, Language Truth and Logic, New York: Dover Publications
Baillie, P., 1973, “Confirmation and the Dutch Book Argument”, The British Journal for
the Philosophy of Science, Vol. 24, No. 4: 393-397
Bartha, P. and Johns, R., 2001, “Probability and Symmetry”, Philosophy of Science, 68
(Proceedings): S109-S122
Black, M., 1981, Language and Philosophy: Studies in Method, Westport, Conn.:
Greenwood Press
Bolker, E. D., 1967, “A Simultaneous Axiomatization of Utility and Personal
Probability”, Philosophy of Science, Vol. 34, No. 4: 333-340
Braithwaite, R. B., 1953, Scientific explanation: A Study of The Function of Theory,
Probability and Law in Science, Cambridge: Cambridge University Press
Carnap, R., 1950, Logical Foundations of Probability, Chicago: University of Chicago
Press
—, 1952, The Continuum of Inductive Methods, Chicago: University of Chicago Press
—, 1955, “Statistical and inductive Probability” reprinted in Readings in the Philosophy
of Science, B. Brody and R. Grandy (eds), Englewood Cliffs: Prentice-Hall, 1989
—, 1963, “Replies and Systematic Expositions” in The Philosophy of Rudolf Carnap, P.
A. Schilpp, (ed.), Open Court, Illinois: La Salle
Chen, R., 1976, “Some Finitely Additive Versions of the Strong Law of Large Numbers”,
Israel Journal of Mathematics, Vol. 24, Nos. 3,4: 244-259
Christensen, D., 1996, “Dutch Book Arguments Depragmatized: Epistemic consistency
for Partial Believers”, The Journal of Philosophy, Vol. 93, No. 9: 450-479
Christensen, D., 2004, Putting Logic in its Place, New York: Oxford University Press
Church, A., 1940, “On the Concept of a Random Sequence”, Bulletin of the American
Mathematical Society, 46: 130-135
De Finetti, B., 1937, “La Prévision: Ses Lois Logiques, Ses Sources Subjectives”,
Annales de l’Institut Henri Poincaré, 7: 1-68; translated as “Foresight. Its Logical Laws,
Its Subjective Sources”, in Studies in Subjective Probability, H. E. Kyburg, Jr. and H. E.
Smokler (eds.), Robert E. Krieger Publishing Company, 1980
—, 1970, Teoria della Probabilità, Turin: Einaudi
—, 1972, Probability, Induction and Statistics, New York: Wiley
—, 1990, (originally published 1974), Theory of Probability, Vol. 1, Wiley Classics
Library, John Wiley & Sons
Earman, J., 1992, Bayes or Bust, Cambridge: MIT Press
Edwards, W., Lindman, H., and Savage, L. J., 1963, “Bayesian Statistical Inference for
Psychological Research”, Psychological Review, LXX: 193-242
Eells, E., 1982, Rational decision and Causality, New York: Cambridge University Press
—, 1983, “Objective Probability Theory Theory”, Synthese, 57: 387-442
Elga, A., 2000, “Self-Locating Belief and the Sleeping Beauty Problem”, Analysis, 60
(2): 143-147.
Eriksson, L. and A. Hájek, 2007, “What Are Degrees of Belief?”, forthcoming in Studia
Logica, special issue on formal epistemology, ed. Branden Fitelson [Preprint available
online at: philrsss.anu.edu.au/people-defaults/alanh/papers/WADOB.pdf]
Festa, R., 1993, Optimum Inductive Methods: A Study in Inductive Probability, Bayesian
Statistics, and Verisimilitude, Dordrecht: Kluwer
Fetzer, J. H., 1983, “Probability and Objectivity in Deterministic and Indeterministic
Situations”, Synthese, 57: 367-386
Fine, T., 1973, Theories of Probability, Academic Press
Fishburn, P., 1986, “The Axioms of Subjective Probability”, Statistical Science, Vol. 1,
No. 3: 335-358
Fisher, R. A., 1935, “Statistical tests”, Nature, Vol. 136: 474-474
—, 1947, The Design of Experiments, 4th edition, Edinburgh: Oliver and Boyd
—, 1956, Statistical Methods and Statistical Inference, Edinburgh: Oliver and Boyd
Fitelson, B., Hájek, A. and Hall, N., “Probability”, in The Routledge Encyclopedia of
Philosophy of Science, J. Pfeiffer, S. Rausch, S. Sarkar (eds.), Routledge
Forster, M. and Sober, E. 1994, “How to Tell when Simpler, More Unified, or Less Ad
Hoc Theories will Provide More Accurate Predictions”, British Journal for the
Philosophy of Science, 45: 1-35.
Gaifman, H., 1988, “A Theory of Higher Order Probabilities”, in Causation, Chance, and
Credence, B. Skyrms and William L. Harper (eds.), Dordrecht: Kluwer
Giere, R. N., 1973, “Objective Single-Case Probabilities and the Foundations of
Statistics”, in Logic, Methodology and Philosophy of Science, IV, P. Suppes, et al., (eds.),
New York: North-Holland
Gillies, D., 1971, “A Falsifying Rule for Probability Statements”, The British Journal for
the Philosophy of Science, vol. 22, No. 3: 231-261
—, 2000a, Philosophical Theories of Probability, London: Routledge
—, 2000b, “Varieties of Propensity”, The British Journal for the Philosophy of Science,
51: 807-835
Glymour, C., 1980, Theory and Evidence, Princeton: Princeton University Press
Goldstein, M., 1983, “The Prevision of a Prevision”, Journal of the American Statistical
Association, 78: 817-819
Goodman, N., 1983, Fact, Fiction and Forecast, Cambridge: Harvard University Press, 4th ed.
Hacking, I., 1965, The Logic of Statistical Inference, Cambridge: Cambridge University
Press
—, 2001, An Introduction to Probability and Inductive Logic, New York: Cambridge
University Press
Hájek, A., 1997, “‘Mises Redux’ — Redux. Fifteen Arguments Against Finite
Frequentism”, Erkenntnis, 45: 209-227
—, 2008, “Arguments For –Or Against– Probabilism”, forthcoming in Degrees of Belief,
Eds. F. Huber and C. Schmidth-Petri, Oxford: Oxford University Press
Herstein, I. N. and Milnor, J., 1953, “An Axiomatic Approach to Measurable Utility”,
Econometrica, Vol. 21, No. 2: 291-297
Hesse, M. B., 1974, The Structure of Scientific Inference, Berkeley: University of
California Press
Hintikka, J., 1965, “A Two-Dimensional Continuum of Inductive Methods” in Aspects of
Inductive Logic, J. Hintikka and P. Suppes, (eds.), Amsterdam: North-Holland
Hitchcock, C., 2002, “Probability and Chance”, in the International Encyclopedia of the
Social and Behavioral Sciences, vol. 18, 12089-12095, London: Elsevier
Howson, C., 1973, “Must the Logical Probability of Laws Be Zero?”, British Journal for
the Philosophy of Science, 24: 153-163
Howson, C. and Urbach, P., 1993, Scientific Reasoning: The Bayesian Approach, Open Court, 2nd edition
Jeffrey, R., 1965, The Logic of Decision, Chicago: University of Chicago Press; 2nd ed. 1983
—, 1992, Probability and the Art of Judgment, Cambridge: Cambridge University Press
Jeffreys, H., 1939, Theory of Probability; reprinted in Oxford Classics in the Physical
Sciences series, Oxford University Press, 1998.
Johnson, W. E., 1921, Logic, Cambridge: Cambridge University Press
Joyce, J., 1998, “A Nonpragmatic Vindication of Probabilism”, Philosophy of Science, 65
(4): 575-603
—, 2005, “How Probabilities Reflect Evidence”, Philosophical Perspectives, 19: 153-178
Kaplan, M., 1996, Decision Theory as Philosophy, Cambridge: Cambridge University
Press
Kemeny, J., 1955, “Fair Bets and Inductive Probabilities”, Journal of Symbolic Logic, 20:
263-273
Kennedy, R. and Chihara, C., 1979, “The Dutch Book Argument: its Logical Flaws, its Subjective Sources”, Philosophical Studies, 36: 19-33
Keynes, J. M., 1921, A Treatise on Probability, Macmillan and Co
Kolmogorov, A. N., 1933, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik; translated as Foundations of the Theory of Probability, Chelsea Publishing Company, 1950
Krantz, D. H. and Luce, R. D., 1971, “Conditional Expected Utility”, Econometrica, Vol.
39, No. 2: 253-271
Kyburg, H. E., 1970, Probability and Inductive Logic, New York: Macmillan
—, 1978, “Subjective Probability: Criticisms, Reflections and Problems”, The Journal of
Philosophical Logic, Vol. 7: 157-180
Kyburg, H. E. and Smokler, H. E., (eds.), 1980, Studies in Subjective Probability, 2nd
ed., Huntington, New York: Robert E. Krieger Publishing Co.
Laplace, P. S., 1814, English edition 1951, A Philosophical Essay on Probabilities, New
York: Dover Publications Inc.
Lewis, D., 1980, “A Subjectivist's Guide to Objective Chance”, in Richard C. Jeffrey (ed.)
Studies in Inductive Logic and Probability, Vol II., Berkeley and Los Angeles: University
of California Press
—, 1986, “Probabilities of Conditionals and Conditional Probabilities II”, Philosophical
Review, 95: 581-589
—, 1994, “Humean Supervenience Debugged”, Mind, 103: 473-490
Luce, R. D. and Raiffa, H., 1957, Games and decisions, New York: Wiley
Maher, P., 1993, Betting on Theories, Cambridge: Cambridge University Press
—, 1997, “Depragmatized Dutch Book Arguments”, Philosophy of Science, 64: 291-305
—, 2000, “Probabilities for Two Properties”, Erkenntnis, 52: 63-91
—, 2001, “Probabilities for Multiple Properties: The Models of Hesse and Carnap and
Kemeny”, Erkenntnis, 55: 183-216
Miller, D. W., 1994, Critical Rationalism: A Restatement and Defence, Chicago and
Lasalle, Il: Open Court
Newman, J. R. (ed.), 1956, The World of Mathematics, New York: Simon & Schuster
Pearl, J., 2000, Causality, Cambridge: Cambridge University Press
Popper, Karl R., 1957, “The Propensity Interpretation of the Calculus of Probability and
the Quantum Theory” in S. Körner (ed.), The Colston Papers, 9: 65-70
—, 1959a, “The Propensity Interpretation of Probability”, British Journal for the Philosophy of Science, 10: 25-42
—, 1959b, The Logic of Scientific Discovery, Basic Books; reprint edition 1992,
Routledge
Ramsey, F. P., 1926, “Truth and Probability”, in The Foundations of Mathematics and other Logical Essays, R. B. Braithwaite (ed.), Routledge & Kegan Paul, 1931, 156-198; reprinted in Studies in Subjective Probability, H. E. Kyburg, Jr. and H. E. Smokler (eds.), 2nd ed., R. E. Krieger Publishing Company, 1980, 23-52; reprinted in Philosophical Papers, D. H. Mellor (ed.), Cambridge: Cambridge University Press, 1990
Reichenbach, H., 1949, The Theory of Probability, Berkeley: University of California
Press
Renyi, A., 1970, Foundations of Probability, Holden-Day, Inc
Rissanen, J. 1999, “Hypothesis Selection and Testing by the MDL Principle”, Computer
Journal, 42 (4): 260-269
Roeper, P. and Leblanc, H., 1999, Probability Theory and Probability Logic, Toronto:
University of Toronto Press
Salmon, W., 1966, The Foundations of Scientific Inference, University of Pittsburgh
Press
Savage, L. J., 1954, The Foundations of Statistics, John Wiley
Schick, F., 1986, “Dutch Bookies and Money Pumps”, Journal of Philosophy, 83: 112-119
Scott, D. and Krauss, P., 1966, “Assigning Probabilities to Logical Formulas”, in Aspects of Inductive Logic, J. Hintikka and P. Suppes (eds.), Amsterdam: North-Holland
Shimony, A., 1970, “Scientific Inference”, in The Nature and Function of Scientific
Theories, R. Colodny (ed.), Pittsburgh: University of Pittsburgh Press
—, 1988, “An Adamite Derivation of the Calculus of Probability”, in Probability and Causality, J. H. Fetzer (ed.), Dordrecht: D. Reidel
Siegmund, D., 1985, Sequential Analysis, New York: Springer-Verlag
Skyrms, B., 1980, Causal Necessity, New Haven: Yale University Press
—, 1987, “Coherence”, in Scientific Inquiry in Philosophical Perspective, N. Rescher (ed.), Pittsburgh, PA: University of Pittsburgh Press
—, 2000, Choice and Chance, 4th ed., Wadsworth, Inc.
Sober, E., 2000, Philosophy of Biology, 2nd ed., Westview Press
Spirtes, P., Glymour, C. and Scheines, R., 1993, Causation, Prediction, and Search, New
York: Springer-Verlag
Stalnaker, R., 1970, “Probabilities and Conditionals”, Philosophy of Science, 37: 64-80
Stove, D. C., 1986, The Rationality of Induction, Oxford: Oxford University Press
Strawson, P. F., 1952, Introduction to Logical Theory, London: Methuen
Todhunter, I., 1949, A History of the Mathematical Theory of Probability From the Time of Pascal to That of Laplace, New York: Chelsea
Van Cleve, J., 1984, “Reliability, Justification and the Problem of Induction”, Midwest
Studies in Philosophy, IX: 555-567
Van Fraassen, B., 1977, “Relative Frequencies”, Synthese, 34: 133-166
—, 1984, “Belief and the Will”, Journal of Philosophy, 81: 235-256
—, 1989, Laws and Symmetry, Oxford: Clarendon Press
—, 1995a, “Belief and the Problem of Ulysses and the Sirens”, Philosophical Studies, 77:
7-37
—, 1995b, “Fine-grained Opinion, Conditional Probability, and the Logic of Belief”,
Journal of Philosophical Logic, 24: 349-377
Venn, J., 1876, The Logic of Chance, 2nd ed., Macmillan and Co; reprinted, New York, 1962
von Mises, R., 1957, Probability, Statistics and Truth, revised English edition, New York: Macmillan
von Neumann, J. and Morgenstern, O., 1944, Theory of Games and Economic Behavior,
Princeton: Princeton University Press; New York: John Wiley and Sons, 1964.
von Plato J., 1994, Creating Modern Probability, Cambridge: Cambridge University
Press
Weatherford, R., 1982, Philosophical Foundations of Probability Theory, Routledge
Woodward, J., 2003, Making Things Happen: A Theory of Causal Explanation, Oxford: Oxford University Press
Zynda, L., 2000, “Representation Theorems and Realism About Degrees of Belief”, Philosophy of Science, Vol. 67, No. 1: 45-69
Appendix A
Suppose that an agent A’s probability for an event γ is determined in the following way. A is forced to fix the price of a bet that pays a sum of money S if γ occurs and zero otherwise; subsequently, by requiring that the price of such a bet equal $p_\gamma \cdot S$, A’s probability for γ, $p_\gamma$, is calculated. Then, if A’s probabilities violate any of the following three equations (here α and β are any two incompatible events, and ω is a sure event)

$\forall \chi,\ p_\chi \ge 0$   (i)
$p_\omega = 1$   (ii)
$p_{\alpha \vee \beta} = p_\alpha + p_\beta$   (iii)

there exists a sequence of transactions of the relevant bets (i.e. of those bets that correspond to A’s probabilities) that assures a monetary loss to A.
Here is the proof. Suppose, contrary to (i), that for some event χ, $p_\chi < 0$; then A is prepared to sell a bet on χ at the price $p_\chi \cdot S$; that is, since $p_\chi < 0$, A is prepared to pay somebody $|p_\chi| \cdot S$ to take a bet that will pay at a minimum zero; A’s loss is therefore assured. Now let τ be a sure event and suppose, contrary to (ii), that $p_\tau < 1$; then A is prepared to sell for $p_\tau \cdot S$ a bet on τ that certainly pays S, so A’s loss is assured; on the other hand, suppose that $p_\tau > 1$; then A is prepared to buy for $p_\tau \cdot S$ a bet on τ that will certainly pay S, and this is a sure loss. Lastly, let α and β be mutually exclusive events and suppose, contrary to (iii), that $p_\alpha + p_\beta < p_{\alpha \vee \beta}$; then A is prepared to sell the bet on α for $p_\alpha \cdot S$, the bet on β for $p_\beta \cdot S$, and to buy them back as the bet on α∨β for $p_{\alpha \vee \beta} \cdot S$; in which case A incurs a sure loss, since she paid $S \cdot (p_{\alpha \vee \beta} - p_\alpha - p_\beta)$ more than she received, but the monetary value of the bet on α∨β equals the sum of the monetary values of the individual bets on α and β (likewise it is easy to show that if $p_\alpha + p_\beta > p_{\alpha \vee \beta}$, A’s loss is assured).
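
To make the additivity case concrete, here is a minimal numerical sketch in Python (my own illustration, not part of the dissertation’s argument; the event names, the prices, and the stake S are assumptions chosen so that (iii) is violated):

```python
# A bookie exploits an agent whose bet prices violate additivity,
# axiom (iii) above. Events alpha and beta are incompatible; the
# agent prices a bet paying S on an event E at p[E] * S.

S = 100.0  # stake paid to the bet's holder if the event occurs

# Incoherent prices: p_alpha + p_beta < p_(alpha or beta)
p = {"alpha": 0.2, "beta": 0.3, "alpha_or_beta": 0.7}

def agent_net(outcome):
    """Agent's net cash flow: she sells the bets on alpha and on beta
    (collecting their prices, owing S if the event occurs) and buys
    the bet on alpha-or-beta (paying its price, collecting S if it
    occurs). `outcome` is 'alpha', 'beta', or 'neither'."""
    net = p["alpha"] * S + p["beta"] * S - p["alpha_or_beta"] * S
    if outcome in ("alpha", "beta"):
        net += -S + S  # the sold bet pays out; the bought bet pays back
    return net

for outcome in ("alpha", "beta", "neither"):
    print(outcome, agent_net(outcome))  # about -20.0 in every case
```

Whatever happens, the agent ends up down $S \cdot (p_{\alpha \vee \beta} - p_\alpha - p_\beta) = 20$ monetary units, the quantity computed in the proof.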
Appendix B
Suppose that an agent A’s probability for any event γ is determined by way of a bet on γ whose price is formulated as $f(p^f_\gamma) \cdot S$, for some fixed function f. Then, if A’s probabilities for two incompatible events α and β, $p^f_\alpha$ and $p^f_\beta$, violate the equation

$f(p^f_{\alpha \vee \beta}) = f(p^f_\alpha) + f(p^f_\beta)$   (iv)

there exists a sequence of transactions of the relevant bets (i.e. of those bets that correspond to A’s probabilities) that assures a monetary loss to A.
Here is the proof. Suppose, contrary to (iv), that $f(p^f_{\alpha \vee \beta}) > f(p^f_\alpha) + f(p^f_\beta)$; then A is prepared to sell the bet on α for $f(p^f_\alpha) \cdot S$, the bet on β for $f(p^f_\beta) \cdot S$, and to buy them back as the bet on α∨β for $f(p^f_{\alpha \vee \beta}) \cdot S$; in which case A incurs a sure loss, since she paid $S \cdot (f(p^f_{\alpha \vee \beta}) - f(p^f_\alpha) - f(p^f_\beta))$ more than she received, but the monetary value of the bet on α∨β equals the sum of the monetary values of the individual bets on α and β (likewise it is easy to show that if $f(p^f_{\alpha \vee \beta}) < f(p^f_\alpha) + f(p^f_\beta)$, A’s loss is assured).
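
The same mechanism can be checked numerically for a non-linear pricing function. In this sketch (my own illustration; the quadratic f and the probability values are assumptions chosen so that (iv) fails), the guaranteed loss is $S \cdot (f(p^f_{\alpha \vee \beta}) - f(p^f_\alpha) - f(p^f_\beta))$:

```python
# The book of Appendix A, but with prices passed through an assumed
# transformation f before being scaled by the stake, as in (iv).

S = 100.0

def f(x):
    return x * x  # an arbitrary pricing transformation, for illustration

# f-values violate (iv): f(0.8) = 0.64 > f(0.3) + f(0.4) = 0.09 + 0.16
p_a, p_b, p_ab = 0.3, 0.4, 0.8

def agent_net(outcome):
    # Sell the bets on alpha and beta at f(p)*S each; buy the bet on
    # alpha-or-beta at f(p_ab)*S.
    net = f(p_a) * S + f(p_b) * S - f(p_ab) * S
    if outcome in ("alpha", "beta"):
        net += -S + S  # the sold bet pays out; the bought bet pays back
    return net

for outcome in ("alpha", "beta", "neither"):
    print(outcome, agent_net(outcome))  # about -39.0 every time
```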
Appendix C
This appendix discusses and reviews two recent, unpublished papers on the foundations of probability. The first paper, “Rejecting Representationalism”, was written by Christopher Meacham and Jonathan Weisberg and is available at
people.umass.edu/cmeacham/Meacham.Weisberg.Rejecting.Representationalism.pdf
The second paper, “Evidential Symmetry and Mushy Structure”, was written by Roger White of MIT and is available at
www.fitelson.org/few/few_08/white.pdf
I begin with Meacham and Weisberg’s paper. The main thesis of this paper is sound: representation theorems do not and cannot offer any support whatsoever to Probabilism (the thesis that subjective probabilities must comply with the axioms of probability); rather, representation theorems merely support the weaker thesis that subjective probabilities may be computed in such a way as to comply with the axioms of probability. So far, so good. The problem with “Rejecting Representationalism” is that its authors unnecessarily complicate a simple point. Let me explain.
Meacham and Weisberg’s argument proceeds as follows. First, the authors point out that representation theorems may be taken to support Probabilism in two distinct ways: representation theorems may be taken as stating that subjective probabilities in fact comply with the axioms of probability (call this the descriptive approach), or they may be taken as saying that subjective probabilities should comply with the axioms of probability (call this the normative approach). Next, the authors distinguish two ways of understanding the descriptive approach, an empirical way and a non-empirical way. A large part of the paper then aims at dismissing all of the possible ways in which representation theorems may be taken to support Probabilism. Finally, the authors consider, and reject, two arguments advanced by Zynda and Christensen to the effect that representation theorems support Probabilism.
As I see it, the main problem with this line of argumentation is the unnecessary threefold distinction into the normative approach, the empirical descriptive approach, and the non-empirical descriptive approach (setting aside that it is not at all clear that there is such a thing as the non-empirical descriptive approach, and setting aside that, because of Kahneman’s and Tversky’s famous results [94], the empirical descriptive approach should have been dismissed in the introduction of the paper). In the case at hand, before proceeding with any philosophical analysis one must understand the underlying mathematics. And the underlying mathematics says something extremely simple and unambiguous: representation theorems fail to establish that probabilities must be represented only as complying with the axioms of probability (Cf. Ch. 2, §3). Once this is understood, distinguishing between normative and descriptive approaches becomes secondary: however you cut it, the argument based on representation theorems does not work. Meacham and Weisberg’s paper should have treated the mathematics as foremost and the philosophy only as a side issue. I now turn to White’s paper.
[94] In their (1972), Kahneman and Tversky presented and analyzed experimental data to the effect that, typically, an agent’s probabilities, measured by an appropriate betting system, do not comply with the axioms of probability.
“Evidential Symmetry and Mushy Structure” deals with the so-called principle of
indifference (PI), the thesis that if there is no evidence favoring one over another of
several mutually exclusive events then these events have the same probability. In
particular, White attempts to accomplish two things: defending PI from the view that the
paradox of inverse probabilities (for a description of this paradox cf. Ch. 1, §2) is fatal to
PI; and defending a variant of PI that has been increasingly advocated in recent years.
I begin with the first point. According to White (§3), the paradox of inverse probabilities, which he calls ‘multiple partition problems’, is not due to the use of PI but, rather, to assuming certain ‘indifference relations’ as true. In the specific instance of the paradox he examines, the case of a square whose side s is known to be between 0 and 2 feet long, one indifference relation is L1≈L2, where L1 stands for ‘s is between 0 and 1 foot long’, L2 stands for ‘s is between 1 and 2 feet long’, and ≈ designates indifference (equivalence, if you will) between the two alternatives. According to White, it is L1≈L2 together with a similar indifference relation that leads to a paradox (in the specific instance he considers), even without PI. White thinks that the paradox follows from the two indifference relations by a simple and legitimate manipulation of those relations (p. 4).
Unfortunately, White’s argument is as flawed as it is simple. Consider again L1≈L2, which expresses indifference between the alternatives L1 and L2. This apparently unassuming relation presupposes PI to begin with; that is, the truth of L1≈L2 presupposes the use of PI. Why? L1 and L2 are not atomic alternatives; rather, they are disjunctions of infinitely many atomic alternatives (e.g. L1 is the disjunction of the atomic alternatives $s = 0$, $s = 0.1$, $s = 0.16$, $s = 0.07$, $s = 0.62$, $s = 0.4$, and so on ad infinitum). But indifference may be invoked only between atomic alternatives. In order to invoke indifference between L1 and L2, one must presuppose that the probability associated with s is uniformly distributed over the interval [0,2]. But this in turn presupposes PI: for what else could guarantee that the probability associated with s is uniformly distributed over [0,2] (Cf. also Ch. 1, §2)?
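
To see concretely how much work the uniformity assumption does, consider the following Monte Carlo sketch (my own illustration, not White’s): it estimates the probability of L1 when the side s is taken to be uniform on [0,2] and, alternatively, when the area s² is taken to be uniform on [0,4]. Each assumption is licensed by applying PI to a different partition, and the two disagree; this is the paradox of inverse probabilities in miniature.

```python
# P(s <= 1) depends on which quantity is assumed uniformly
# distributed, which is why the relation L1 = L2 is not trivial.
import random

N = 1_000_000

# Assumption 1: the side s is uniform on [0, 2].
hits_side = sum(random.uniform(0, 2) <= 1 for _ in range(N))

# Assumption 2: the area s^2 is uniform on [0, 4]; then s = sqrt(area).
hits_area = sum(random.uniform(0, 4) ** 0.5 <= 1 for _ in range(N))

print(hits_side / N)  # ~0.50: L1 and L2 come out equiprobable
print(hits_area / N)  # ~0.25: L1 is now less probable than L2
```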
I now turn to the second point. After discussing the paradox of inverse probabilities, White proceeds to defend the following variant of PI (call it PI*): if there is no evidence favoring one over another of several mutually exclusive events, then each of these events should be assigned (the disjunction of) all probability-values in the interval [0,1]. For example, according to PI* the probability p of heads for a single toss of an unknown coin is $0 \le p \le 1$; and so is the probability of tails. Now, White defends PI* by way of two cases: the case of a set of urns that contain varying ratios of black to white balls, and the case of a “magic” coin. Both cases are adapted from Joyce (2005). The first case: suppose there are eleven urns ($U_0, \ldots, U_{10}$), and suppose that $U_i$ contains i black balls and 10−i white balls; suppose further that you know nothing about which urn is in front of you; what is the probability of extracting a black ball from the urn in front of you? The answer, says White, is [0,1], for in this case it is PI*, and not PI, that should be applied. However, White goes on, suppose that the urn in front of you was selected by a genuinely ‘random’ process; in this case the probability is simply ½, by PI. I confess that I fail to understand the distinction between a random process and a process about which one is completely ignorant. Perhaps by ‘random’ the author means a process whose long-run frequency is balanced, but he does not say. Be that as it may, the urn case is, at the very least, unnecessarily complicated: White might as well have considered the case in which a ball is extracted from an urn containing one black and one white ball, randomly at first, and then by a non-random unknown method. The second case is even more puzzling. It concerns some kind of magic coin. Unfortunately, after reading this example several times I was not able to understand its logic. It seems to me that the author should clarify his discussion of the magic coin case quite a bit.
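
Returning for a moment to the urn case: White’s ½ for the random-selection version is just the PI-weighted average of the urns’ black-ball ratios, as this one-line check shows (my own computation; the numbers come from the example above):

```python
# Eleven urns U_0..U_10, urn U_i holding i black and 10 - i white
# balls; PI gives each urn probability 1/11 of being the one in front
# of you, so the chance of drawing black averages out to 1/2.
p_black = sum((i / 10) * (1 / 11) for i in range(11))
print(p_black)  # 0.5, up to floating-point rounding
```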
These two strange examples aside, I do not deny that the idea of dropping PI in favor of the weaker alternative PI* has a prima facie plausibility. There is, however, a problem: PI* does not work well. To see why, consider a couple of examples. Let E be an event with three possible and mutually exclusive outcomes A, B, C, and suppose that nothing is known about the likelihood of these outcomes. Now, by PI* the probability of A, B and C is the very same: it is any value in the interval [0,1]. But what is the probability p of the complex outcome A∨B? PI* does not allow us to compute it directly. And if we invoke the axiom of additivity we get that $0 \le p \le 1$ (if two mutually exclusive outcomes X, Y have probabilities $a \le P(X) \le b$ and $c \le P(Y) \le d$, then by additivity $\min\{a + c, 1\} \le P(X \vee Y) \le \min\{b + d, 1\}$). However, this is unsatisfactory, for it is clear that the probability of A∨B should be higher than the probability of, say, A. Or consider a second example. Let C be a coin about which nothing is known. According to PI*, the probability of heads for C, h, is $0 \le h \le 1$; likewise the probability of tails for C, t, is $0 \le t \le 1$. But what is the probability q of the certain event ‘heads or tails’? By additivity, $0 \le q \le 1$. Since the occurrence of the outcome ‘heads or tails’ is certain, this is an extremely odd result.
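
A minimal sketch (my own, not White’s or the dissertation’s; `interval_or` is a hypothetical helper name) of the interval arithmetic just described:

```python
def interval_or(x, y):
    """Interval for P(X or Y), X and Y incompatible, given
    a <= P(X) <= b and c <= P(Y) <= d."""
    (a, b), (c, d) = x, y
    return (min(a + c, 1.0), min(b + d, 1.0))

A = B = (0.0, 1.0)  # PI*'s vacuous assignment to each single outcome
print(interval_or(A, B))  # (0.0, 1.0): no tighter than P(A) itself

heads = tails = (0.0, 1.0)
print(interval_or(heads, tails))  # (0.0, 1.0), though the event is certain
```

The vacuous interval [0,1] is absorbing under this rule, which is exactly why A∨B comes out no more probable than A and why the sure event ‘heads or tails’ fails to receive probability 1.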