INTELLIGENT KNOWLEDGE ACQUISITION SYSTEMS:
FROM DESCRIPTIVE TO PREDICTIVE MODELS
by
Mohammad Mehdi Korjani
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2015
Copyright 2015 Mohammad Mehdi Korjani
To My Parents
Acknowledgements
This thesis would not have been possible without the help of my mentor, academic advisor, and
dissertation committee chair, Prof. Jerry M. Mendel, who made very important contributions,
suggestions, and comments. His knowledge, supervision, experience, and constant support, as
well as his kindness and patience, have led me through five and a half years of Ph.D. study. It
has truly been my honor to work with him and to be his last student.
I am also very grateful to Prof. Iraj Ershaghi for his support and eye-opening advice. He
created a stable source of financial and intellectual support for my projects, which allowed me to
focus on my research without apprehension.
I must also thank the Center for Interactive Smart Oilfield Technologies (CiSoft), a joint
USC-Chevron initiative, for providing me with scholarships and wonderful research opportunities.
I am also grateful to my qualification and defense committee members: Prof. Iraj Ershaghi,
Prof. Keith Jenkins, Prof. Jay C.-C. Kuo, and Prof. Shri Narayanan. I thank them for
sharing their valuable time with me and for making insightful suggestions.
I would also like to thank Dr. Andrea Popa, David Anderson, Minshen Hao, and Feilong Liu
for their valuable discussions on many aspects of my work.
My final and most heartfelt acknowledgment must go to my family and friends. In particular, I
want to thank Leila and Milad Korjani, Abie Aalem, Farnaz Adeli, Yassaman Dehghani, Alireza
Imani, Alireza Saraf, Bahar Abedi, Nooshin Moosavi, Minshen Hao, Amin Rezapour, Hadi
Goudarzi, Sumita Barahmand, Marjan Sherafati, and Mahrad Sharif Vaghefi for making my
Ph.D. experience easier and more enjoyable.
Contents
List of Tables IX
List of Figures XIII
Abstract XVI
1 Introduction 1
1.1 Problem Statement and Motivations 1
1.2 Outline 5
2 Fuzzy Set Qualitative Comparative Analysis (fsQCA) 6
2.1 Introduction 6
2.2 fsQCA Overview 6
2.3 The Steps of fsQCA 8
2.3.1 Desired Outcome and Cases 8
2.3.2 Causal Conditions 9
2.3.3 Membership Functions for Desired Outcome and the Causal
Conditions 10
2.3.4 Derived Membership Functions 11
2.3.5 Candidate Causal Combinations (Rules) 11
2.3.6 Surviving Causal Combinations 12
2.3.7 Actual Causal Combinations (Consistency) 17
2.3.8 Complex and Parsimonious Solutions (QM Algorithm) 20
2.3.9 Intermediate Solutions (Counterfactual Analysis) 22
2.3.10 Simplified Intermediate Solutions 26
2.3.11 Believable Simplified Intermediate Solutions 27
2.3.12 Best Instances 27
2.3.13 Coverage 29
2.3.14 Summary 31
2.4 Comparisons of Linguistic Summarization Methods That Use Fuzzy Sets 33
2.5 Conclusions 36
3 Theoretical Aspects of fsQCA 37
3.1 Introduction 37
3.2 A Brief Quantitative Summary of Steps 1-7 of fsQCA 38
3.3 From 2^k N to N Firing-Level Computations 41
3.4 Fast fsQCA 44
3.5 When the Number of Causal Conditions Changes 49
3.6 Recursive Computation of Consistency 52
3.7 On the Obliteration of a Rule 55
3.8 On the Existence of a Candidate Causal Combination 58
3.8.1 Introduction 59
3.8.2 When each variable is described by one term 60
3.8.3 When each variable is described by two terms 60
3.8.4 When each variable is described by three terms 63
3.8.5 Summary 66
4 Challenges to Using fsQCA 68
4.1 Introduction 68
4.2 Challenges to fsQCA 69
4.2.1 Step 1.1: Desired outcome and cases 69
4.2.2 Step 1.2: Causal conditions 71
4.2.3 Step 1.3: Membership functions 72
4.2.4 Step 1.4: Derived membership functions 73
4.2.5 Step 2.1: Candidate causal combinations (rules) 73
4.2.6 Step 2.2: Compute the surviving causal combinations 74
4.2.7 Step 2.3: Compute actual causal combinations 75
4.2.8 Step 2.4: Compute the complex and parsimonious solutions 76
4.2.9 Step 2.5: Perform counterfactual analysis so as to obtain the
intermediate solutions 78
4.2.10 Step 2.6: Compute the simplified intermediate solutions 79
4.2.11 Step 2.7: Compute the believable simplified intermediate solutions 80
4.2.12 Step 3.1: Find the best instances (cases) for each believable
simplified intermediate solution 80
4.2.13 Step 3.2: Compute the coverage of each solution 80
4.3 Conclusion 80
5 A New Methodology for Calibrating Fuzzy Sets in fsQCA Using Level 2
and Interval Type-2 Fuzzy Sets 82
5.1 Introduction 82
5.1.1 Fuzzy Sets and Linguistic Variables 83
5.1.2 fsQCA Criticism 84
5.1.3 Size of a Vocabulary and Calibration 85
5.1.4 When Present Implementations of fsQCA are Correct 85
5.1.5 Robustness to Measurement Errors and Calibration 86
5.1.6 Organization of Rest of This Chapter 86
5.2 A New Way to Calibrate Fuzzy Sets for fsQCA 87
5.2.1 Introduction 87
5.2.2 IT2 FSs 87
5.2.3 IT2 FS as a Model for a Linguistic Term 89
5.2.4 Mapping Word Data into an FOU 90
5.2.5 Data Collection and Calibration 91
5.2.6 Recapitulation 93
5.3 Membership Functions for Use in fsQCA 93
5.3.1 Importance of an S-Shaped MF 94
5.3.2 Level 2 Fuzzy Sets 94
5.3.3 Recapitulation 100
5.4 Breakdown of Democracy Example 100
5.4.1 FOUs 101
5.4.2 MFs for the Six Linguistic Variables 103
5.4.3 fsQCA Using the MFs for Linguistic Variables 104
5.4.4 Recapitulation 108
5.5 Robustness of fsQCA in Breakdown of Democracy Example 108
5.5.1 Uncertainty About Models for t_i 108
5.5.2 Uncertainty About Grades 109
5.5.3 Conclusions 112
5.6 On Obtaining More Precise Causal Combinations 112
5.7 Discussions 114
5.8 Conclusions and Directions for Further Research 115
6 Interval Value fsQCA (IV fsQCA) 117
6.1 Introduction 117
6.2 IV fsQCA Steps 117
6.3 IT2 Min-max Theorem 124
7 On Establishing Nonlinear Combinations of Variables from Small to Big Data
for Use in Later Processing 127
7.1 Introduction 127
7.2 Terminology and Approach 128
7.3 Main Results for the Causal Combination Method (CCM) and the Fast Causal
Combination Method (FCCM) 132
7.3.1 Causal Combination Method (CCM) 132
7.3.2 Fast Causal Combination Method (FCCM) 134
7.4 Computational Speedup for the Fast Causal Combination Method 143
7.5 Additional Ways to Speedup Computations 145
7.6 Conclusions 147
8 Variable Structure Regression 149
8.1 Introduction 149
8.2 Measured Data 151
8.3 Preprocessing 152
8.4 Establish Antecedents of Rules and the Number of Rules 153
8.5 Establish Rules and VSR Equations 157
8.6 Optimizing Parameters and Structure of VSR Model 159
8.7 Experimental Results 167
8.8 Discussion 178
8.9 Conclusion 180
9 Conclusions and Future Works 184
Appendix A: Proofs 187
Appendix B: Prime Implicants and Minimal Prime Implicants 200
Appendix C: Rules of Counterfactual Analysis (CA) 201
Appendix D: Geometry of Consistency and Best Instances 203
Appendix E: The HM Method 206
Appendix F: Uncertainty Measures for IT2 FSs 210
Appendix G: QPSO Algorithm 214
Appendix H: Publications 216
Appendix I: Patent 217
References 218
List of Tables
2.1 Data- and fuzzy-membership-matrix (showing original variables and their
derived fuzzy-set membership function scores) 15
2.2 Causal Combinations whose MF values are greater than 0.50 16
2.3 Distribution of cases across causal combinations and set-theoretic consistency
of causal combinations 16
2.4 Firing levels for nine surviving causal combinations and 18 cases. MFs for the
five causal conditions are in Table 2.1 17
2.5 Calculations and summary of best Instances (BI) for the believable simplified
intermediate solutions. BI procedure is given in Appendix D 29
2.6 Dummy variables and their ranges for the fsQCA spaces 33
2.7 Comparisons of Three Linguistic Summarization (LS) Methods that use fuzzy
sets 34
3.1 Data- and fuzzy-membership-matrix (showing original variables and their
derived fuzzy-set membership function scores) 43
3.2 Min-max calculations and associated causal combinations 43
3.3 Firing levels for six surviving causal combinations and 14 cases. MFs for the
six causal conditions are in Table 3.1 47
3.4 Computations for fsQCA Steps 5 and 6 48
3.5 Computations for Fast fsQCA Steps 5NEW and 6NEW 49
3.6 Frequency and consistency for six surviving causal combinations 53
3.7 Possible candidate causal combinations for two terms per variable and three
situations. Highlighted row indicates the plausibly possible situation and its
possible causal combinations 62
3.8 Possible candidate causal combinations for three terms per variable and 29
situations in Table 3.9 simplified to 23 situations. Highlighted rows indicate
the two plausibly possible situations and their possible causal combinations 64
5.1 Endpoint intervals for e = Instability (of a country) provided by Prof. Charles
Ragin 93
5.2 Some L2 FSs for term set T and their RI L2 MFs 97
5.3 Three breakpoints (BP), left end-point intervals (LEPI) and right end-point
intervals (REPI) for six linguistic variables that are used in the Breakdown of
Democracy example, from data provided by Prof. Charles Ragin 101
5.4 COG and centroid interval endpoints for W_1 (fully out), W_2 (neither in nor
out), and W_3 (fully in) 102
5.5 COG and maximum dispersion intervals for W_1 (fully out), W_2 (neither in
nor out), and W_3 (fully in) 102
5.6 MF values for each case using COG T1 MFs, and surviving causal
combination for each case 105
5.7 Distribution of cases and consistency of T1 surviving causal combinations
using COG T1 MFs 105
5.8 T1-fsQCA solutions using COG T1 MFs 105
5.9 Lower and Upper MFs for each case using Centroid FOUs 106
5.10 Surviving causal combination, F_j, and its LMF and UMF values using
Centroid FOUs 107
5.11 Distribution of cases and consistency of IV surviving causal combinations
using Centroid FOUs 107
5.12 IV fsQCA Solutions using Centroid FOUs 107
5.13 Comparison of solutions to Breakdown of Democracy example using different
kinds of FSs 108
5.14 Comparisons of fsQCA solutions for different amounts of uncertainty added
to data endpoint intervals 109
5.15 Comparison of fsQCA solutions for different amounts of uncertainty added to
breakpoint membership grades 111
5.16 On more precise causal combinations for Estonia, Greece and Italy, where the
adjectives used for Developed, Urban, and Industrial are fully out, neither in
nor out, and fully in. The numbers in this table were obtained by using the
case data in [23] and Fig. 5.10 116
7.1 Data- and fuzzy-membership-matrix (showing original variables and their
derived fuzzy-set membership function scores; adapted from Table 1 in [62]) 136
7.2 Min-max calculations and associated causal combinations (taken from Table 1
in [62]) 137
7.3 Membership grades for the six surviving causal combinations and 14 cases;
membership grades for the six causal conditions are in Table 7.1 (taken from
Table 1 in [16]) 137
7.4 Number of surviving causal combinations for eight problems 139
7.5 Surviving causal combinations and associated number of cases for the
Abalone data set 141
7.6 Computations for CCM 144
7.7 Computations for FCCM 145
8.1 Pseudo-Code for structure and parameter optimization 166
8.2 Data (x_i) and Membership Grades (H_i) for 14 Cases 170
8.3 Min-Max Calculations and Associated Causal Combinations 170
8.4 MF Grades for Three Surviving Causal Combinations and 14 Cases (MFs for
the Six Causal Conditions are in Table 8.2) 171
8.5 Fold-1 First Iteration Surviving Causal Combinations and Regression
Coefficients for the Mackey-Glass Chaotic Time-Series Prediction Problem 170
8.6 Fold-1 Final Rules and Regression Coefficients for Mackey-Glass Chaotic
Time Series Prediction Problem [105] 175
8.8 Number of cases for each data set 176
8.9 Comparison of Number of Fuzzy Rules Obtained From VSR and FRI [105]
Methods for Different Problems 176
8.10 Average RMSE and Standard Deviation of Double Monte-Carlo for Different
Problems 177
8.11 Average RMSE and Standard Deviation of Double Monte-Carlo Runs for
Different Problems 177
8.12 Comparisons of Average RMSE for Different Methods on Different Problems 177
8.13 Ranks and Average Ranks for Different Methods With Respect to
Multivariable Regression Problems and Time Series Prediction Problems 178
8.14 Comparisons of WM and VSR Rules 180
A.1 Possible candidate causal combinations for three terms per variable and 29
situations 199
G.1 Pseudo-Code for QPSO 215
List of Figures
2.1 fsQCA summarized 7
2.2 fsQCA partitions the original 2^k causal combinations into three subsets.
Note that S_F = S_F^S + (S_F - S_F^S) = S_F^A + (S_F^S - S_F^A) + (S_F - S_F^S) 12
2.3 Mnemonic summary of fsQCA 19
2.4 Flowchart for fsQCA; it is continued in Fig. 2.5 32
2.5 Flowchart for fsQCA, continued 32
3.1 Top portion of flowchart for Fast fsQCA; it is continued in Fig. 3.2 46
3.2 Flowchart for fsQCA, continued from Fig. 3.1 46
3.3 fsQCA partitions the original 2^k candidate causal combinations into three
subsets 67
3.4 fsQCA partitions the original 2^k candidate causal combinations into four
subsets 67
5.1 A representative S-shaped MF. Note that 0%, 50% and 100% denote the
locations of x that indicate fully-out, neither in nor out (crossover point) and
fully-in membership, respectively; and, that sometimes [21] the 0% and 100%
breakpoints are located at membership grades of 0.05 and 0.95, respectively,
when a log-odds function is used to mathematically describe the MF 86
5.2 (a) Crisp MF, (b) type-1 MF and (c) FOUs for the linguistic variable Literacy
that is described by three terms, Low Literacy (L), Moderate Literacy (M) and
High Literacy (H) 89
5.3 IT2 FS Ã, along with its lower and upper MFs, FOU and a primary
membership. The flat spot where u = 1 is shared by both LMF(Ã) and
UMF(Ã) 89
5.4 IT2 FS models for five linguistic terms W_1–W_5. Here x has been normalized
to [0, 1], but in general such normalization is unnecessary 90
5.5 Low, Moderate, and High FOUs obtained from the HM method when the data
were obtained from a group of 175 subjects 92
5.6 HM FOUs for Instability (of a country), when data were obtained from one
expert, Prof. Charles Ragin. Note that x = 10e/21 93
5.7 RI MFs for three linguistic terms: (a) T1 COG, (b) IV COG, (c) IV Centroid,
and (d) IV Maximum Dispersion FSs 97
5.8 Piecewise-linear approximations: dashed for the Fig. 5.7a T1 COG MF;
light-weight solid lines for the Fig. 5.7b IV COG MF; and heavy-weight solid
lines for the Fig. 5.7c IV Centroid MF 98
5.9 Log-odds approximations: dashed for the Fig. 5.7a T1 COG MF and solid
lines for the Fig. 5.7c IV Centroid MF 100
5.10 HM method FOUs for Breakdown of Democracy example: (a) Developed, (b)
Urban, (c) Literate, (d) Industrial, (e) Stable, and (f) Breakdown of
Democracy 102
5.11 Maximum Dispersion FOU (black), Centroid FOU (blue), COG T1 MF
(black) and Ragin’s T1 MF (red dashed) used in Breakdown of Democracy
example, for: a) Developed, b) Urban, c) Literate, d) Industrial, e) Stable, and
f) Breakdown of Democracy 103
5.12 Piecewise-linear approximations of S-shaped MFs for Breakdown of
Democracy example using the centroid for t_i and different amounts of
membership-grade uncertainty, for: a) Developed, b) Urban, c) Literate, d)
Industrial, e) Stable, and f) Breakdown of Democracy. The red curves are
Ragin’s T1 MFs 111
5.13 Mapping a number into a word 113
6.1 Four different situations of the IT2 FSs for a case: (a), (b) non-overlapping
intervals, and (c),(d) overlapping intervals 125
8.1 High-level flow-chart for the VSR model 160
8.2 Piecewise-linear MFs 163
8.3 Mackey-Glass time series 168
8.4 LM-FCM MFs 169
8.5 Optimized MFs for the first iteration 172
8.6 Decimal equivalent number of surviving rules for each outer-loop iteration 173
8.7 Fold-1 final optimized MFs 174
A.1 MF of Low (L) and its complement (l) 193
A.2 MF of High (H) and its complement (h) 194
A.3 MFs assigned to variable v_i when x_1 < x_2 194
A.4 MFs assigned to variable v_i when x_2 < x_1 195
A.5 MF of Moderate (M) and its complement (m) 197
A.6 MFs assigned to variables v_i when x_1 < x_3 < x_4 < x_2 198
D.1 Consistency regions. Regions A and B are where maximum consistency can
occur 203
E.1 Three kinds of FOUs and their parameters: (a) Left-shoulder FOU, (b) Interior
FOU, and (c) Right-shoulder FOU. In these figures l = 0 and r = 1 207
F.1 Covering an FOU by T1 FSs. Each T1 FS (called an embedded T1 FS)
contains one dashed line from the left-portion of the FOU, the common flat
top, and one dashed line from the right-portion of the FOU. The LMF and
UMF are also embedded T1 FSs. The FOU is the union of all of the
embedded T1 FSs. In general, embedded T1 FSs do not have to be straight
lines 210
F.2 Mapping each word’s FOU into an interval of numbers, its Centroid C_{W_i},
or a single number, its COG c_{W_i}, the dot in the middle of the Centroid 211
Abstract
Linguistic summarization is a data mining or knowledge discovery approach that describes a
pattern in a database. Techniques that generate linguistic summaries not only facilitate the
understanding and communication of data, but can also be used in decision-making. In this
thesis, Fuzzy Set Qualitative Comparative Analysis (fsQCA) is proposed as a linguistic
summarization approach. fsQCA is a methodology for obtaining linguistic summarizations from
data that are associated with cases. It was developed by the eminent sociologist Prof. Charles C.
Ragin, but has, as of this date, not been applied by engineers or computer scientists. Unlike more
quantitative methods that are based on correlation, fsQCA seeks to establish logical connections
between combinations of causal conditions and an outcome, the result being rules that
summarize (describe) the sufficiency between subsets of all of the possible combinations of the
causal conditions (or their complements) and the outcome. The rules are connected by the word
OR to the output. Each rule is a possible path from the causal conditions to the outcome. We, for
the first time, explain fsQCA in a very quantitative way, something that is needed if engineers
and computer scientists are to use fsQCA.
Having summarized fsQCA mathematically, it is possible to study some of its key
steps in order to better understand them and even to enhance them. We focus on how to greatly
speed up some of the computationally intensive steps of fsQCA and show how to use the speed-
up equations to obtain some interesting and important properties of fsQCA. These properties not
only provide additional understanding about fsQCA, but also lead to different ways to implement
fsQCA.
To actually apply fsQCA to engineering data problems, some challenges had to be overcome;
we explain these challenges. Many of them result from the way membership functions are
determined, which is called the calibration method. We explain why the calibration methods that
are being used by fsQCA scholars must be applied with great care in order for their results to
actually correspond to fuzzy sets; many times they do not lead to fuzzy sets at all, even though
users think they do, which calls into question the validity of fsQCA, since it is built upon fuzzy
sets. We provide a new methodology for calibrating the fuzzy sets that are used in fsQCA. The
result is an approximated Reduced Information Level 2 Fuzzy Set Membership Function (RI L2
FS MF); it is not the MF of an ordinary FS but instead is the MF of a level 2 FS, one that has an
S-shape, the kind of shape that is so widely used by fsQCA scholars and is so important to
fsQCA.
fsQCA rules involve words that are modeled using type-1 fuzzy sets (T1 FSs). Unfortunately,
once the T1 FS membership functions (MFs) have been chosen, all uncertainty about the words
that are used in fsQCA disappears, because T1 MFs are totally precise. Interval type-2 FSs (IT2
FSs), on the other hand, are first-order uncertainty models for words. In this thesis, we extend
fsQCA to IT2 FSs. More specifically, we develop IT2-fsQCA by extending the steps of fsQCA
from T1 FSs to IT2 FSs.
Using some key steps of fsQCA, we present a very efficient method for establishing nonlinear
combinations of variables from small to big data for use in later processing (e.g., regression,
classification, etc.). Variables are first partitioned into subsets each of which has a linguistic term
(called a causal condition) associated with it. Our Causal Combination Method uses fuzzy sets to
model the terms and focuses on interconnections (causal combinations) of either a causal
condition or its complement, where the connecting word is AND which is modeled using the
minimum operation. Our Fast Causal Combination Method is based on a novel theoretical result,
leads to an exponential speedup in computation and lends itself to parallel and distributed
processing; hence, it may be used on data from small to big.
In order to use fsQCA for forecasting, we use some of the steps of fsQCA to obtain an
optimized fuzzy rule extraction system model called Variable Structure Regression (VSR). This
model is a fuzzy linguistic regression in which we simultaneously determine how many
regressors there are and how the variables should be combined in each of the regressors. A novel
feature of this new model is that it not only uses a linguistic term for a variable but also uses the
complement of that term. We also present an iterative procedure for optimizing both the
structure and the parameters of the VSR model. Results obtained from VSR are compared
against five other methods for eight readily available data sets (four for multivariate
approximation and four for forecasting), and VSR ranks #1 in all cases.
Chapter 1
Introduction
1.1 Problem Statement and Motivations
The rapid progress of information technology has made huge amounts of data accessible to
people, a situation that is called data overload. Unfortunately, the raw data alone are often hardly
comprehensible. Data mining approaches, which extract knowledge from the raw data and
present it in a human-readable way, are therefore highly desirable. Data summarization is the
process of obtaining the most important information from data so as to provide a condensed
version for an individual. In particular, data summarization can mean providing a sentence or a
group of sentences with linguistic terms that describes a pattern in a database without doing
(explicit) manual analysis, which is called Linguistic Summarization.
There are different kinds of linguistic summarizations, ranging from a library of pre-chosen
sentences from which the most representative one (or group) is chosen and is then declared to be
the linguistic summarization, to a collection of if-then rules, some or all of which are chosen to
be the linguistic summarization. Each of the different kinds of linguistic summarizations has its
useful place; however, in this thesis we are interested only in linguistic summarizations that are
in the form of if-then rules.
Linguistic data (base) summaries using type-1 fuzzy sets were introduced by Yager [1]–[4],
advanced by Kacprzyk and Yager [4], Kacprzyk, et al. [5] and Zadrożny and Kacprzyk [6],
implemented by Kacprzyk and Zadrożny (see, e.g., [7] and its many references by these authors),
and extended to type-2 fuzzy sets by Niewiadomski [8], [9]. Linguistic summarizations of time
series that use type-1 fuzzy sets have been studied by Kacprzyk and Wilbik, e.g. [10] (see, also, 13
other references by these authors, including Zadrożny, that are in this article). Because all of
these summarizations are for a library of pre-defined summarizers, and are not in the form of if-
then rules, they are not elaborated upon in this thesis; however, detailed comparisons of three
different summarization methods are given in Section 2.4.
Linguistic summarization using if-then rules and type-1 fuzzy sets had its origins in Zadeh’s
classical 1973 paper [11]. Although these if-then rules are the foundation for the development
of many kinds of quantitative rule-based systems, such as fuzzy logic control, rule-based
classification, etc., until recently very few people, other than perhaps Zadeh (e.g., [12]–[16]),
thought of them any longer as linguistic summarizations. This is because it is the mathematical
implementations of the set of rules that has become important in such applications, rather than
the rules themselves. In essence, the rules have become the means to the end, where the end is a
mathematical formula that produces a numerical output. Only since Zadeh’s pioneering works on
computing with words has there been a return to the understanding that a collection of if-then
rules, by themselves, is indeed a linguistic summarization.
Wang and Mendel [17] developed the first method to extract if-then rules from time-series data
(the WM method). Many improvements to the WM method have been published since the
original method. All of these works use the if-then rules as a predictive model, which according
to Hand, et al. [18] “… has the specific objective of allowing us to predict the value of some
target characteristic of an object on the basis of observed values of other characteristics of the
object.”
Recently, Wu and Mendel [19], [20] developed a different way to extract rules from data in
which the rules use interval type-2 fuzzy sets to model the words in both their antecedents and
consequents. Their rules construct a descriptive model, which according to Hand, et al. [18] “…
presents, in convenient form, the main features of the data. It is essentially a summary of the
data, permitting us to study the most important aspects of the data without them being obscured
by the sheer size of the data set.”¹
The linguistic summarization method that is described in this thesis also leads to a descriptive
model, and is called Qualitative Comparative Analysis (QCA). It has been used mainly in the
fields of the social and political sciences and does not seem to have been used (prior to this
work) in engineering or computer science². Consequently, this thesis should be viewed as a
conduit for fsQCA from the less mathematically oriented social and political sciences literatures
into the more mathematically oriented engineering and computer science literatures.
According to Ragin [21, p. 183]: “The goal of QCA is to derive a logically simplified
statement describing the different combinations of conditions linked to an outcome.” Each
combination of conditions and same outcome is sometimes referred to [22] as a type or a
typological configuration.

¹ The linguistic summarizations mentioned above, due to Yager, Kacprzyk, Zadrożny, Niewiadomski and Wilbik, also fall into the class of descriptive models.
² An exception to this are [121] and [122], but they only discuss a few aspects of fsQCA and not all of it.

According to Rihoux and Ragin [23, pp. 33, 66]:
Crisp set Qualitative Comparative Analysis (csQCA) was the first QCA technique,
developed in the late 1980s, by Professor Charles Ragin¹ and programmer Kriss Drass.
Ragin’s research in the field of historical sociology led him to search for tools for the
treatment of complex sets of binary data that did not exist in the mainstream statistics
literature. He adapted for his own research, with the help of Drass, Boolean algorithms
that had been developed in the 1950s by electrical engineers to simplify switching
circuits, most notably Quine² [24] and McCluskey [25]. In these so-called minimization
algorithms, he had found an instrument for identifying patterns of multiple-conjunctural
causation and a tool to simplify complex data structures in a logical and holistic manner
[26]. … csQCA is based on Boolean algebra, which uses only binary data (0 or 1), and is
based on a few simple logical operations³ [union, intersection and complement]. … [In
csQCA,] it is important to follow a sequence of steps, from the construction of a binary
data table to the final ‘minimal formulas.’ … Two key challenges in this sequence, before
running the minimization procedure, are: (1) implementing a useful and meaningful
dichotomization of each variable, and (2) obtaining a ‘truth table’ (table of
configurations) that is free of ‘contradictory configurations.’ … The key csQCA
procedure is ‘Boolean minimization.’
csQCA was extended by Ragin to fuzzy sets, because he realized that categorizing social
science causes and effects as black or white was not realistic. Fuzzy sets let him get around this.
According to [23, p. 120]:
fsQCA retains key aspects of the general QCA approach, while allowing the analysis of
phenomena that vary by level or degree. … The fsQCA procedure … provides a bridge
between fuzzy sets and conventional truth table analysis by constructing a Boolean truth
table summarizing the results of multiple fuzzy-set analyses. … Fuzzy membership
scores (i.e., the varying degree to which cases belong to sets) combine qualitative and
quantitative assessments. … The key set theoretic relation in the study of causal
complexity is the subset relation; cases can be precisely assessed in terms of their degree
of consistency [subsethood] with the subset relation, usually with the goal of establishing
that a combination of conditions is sufficient for a given outcome.

¹ He is Chancellor’s Professor of Sociology at the University of California, Irvine. He was a professor of sociology and political science at the University of Arizona. In the 1980s he was a professor of sociology and political science at Northwestern University.
² Actually, Quine is a famous logician and is not an electrical engineer.
³ Bracketed phrases, inserted by the present authors, are meant to clarify quoted materials.
Both csQCA and fsQCA are set-theoretic methods. They differ from conventional quantitative
variable-based methods (e.g., correlation and regression) in that they [22] “… do not
disaggregate cases into independent, analytically separate aspects but instead treat configurations
as different types of cases.” Additionally, [22] “The basic intuition underlying QCA¹ is that cases
are best understood as configurations of attributes resembling overall types and that a
comparison across cases can allow the researcher to strip away attributes that are unrelated to the
outcome in question.”
According to [21, p. 183], “… QCA summarizes the truth table in a logically shorthand
manner.” This is linguistic summarization.
Kacprzyk and Zadrożny [6] classify linguistic summaries into five forms, type 5 being the most
general form, about which they state:
Type 5 summaries represent the most general form … fuzzy rules describing the
dependencies between specific values of particular attributes. … Two approaches to Type
5 summaries have been proposed. First, a subset of such summaries may be obtained by
analogy with association rules concept and employing their efficient algorithms. Second,
genetic algorithms may be used to search the summaries’ space.
fsQCA provides a type 5 summary and is a new approach for engineers and computer scientists
to obtain such a summary.
Although there are two kinds of fsQCA, one for establishing sufficient conditions and one for
establishing necessary conditions, this thesis focuses only on the former because it is only the
sufficient conditions for a specific outcome that are in the form of if-then rules; hence, our use of
the term “fsQCA” implies the phrase “fsQCA for sufficient conditions.”
One may ask: “Why is our work needed, since Ragin et al. have already published so much
about fsQCA?” Our answer to this rhetorical question is: We, for the first time, explain fsQCA in
a very quantitative way, something that is not found in the existing literature, and something that
is needed if engineers and computer scientists are to use fsQCA.
¹ It is quite common to refer to both csQCA and fsQCA as “QCA,” letting the context determine which QCA it is. More recently, the phrase Configurational Comparative Methods is used to cover all QCA methods.
1.2 Outline
The work that is described in my thesis began in September 2009, has been underway for more
than five years, and has already led to the publications that are listed in Appendix H. The rest of
this thesis is organized as follows: Chapter 2 describes the steps of fsQCA and puts them on an
analytical footing. It explains Counterfactual Analysis (CA), which is a way to overcome the
limitation of a lack of empirical instances, i.e. the problem of limited diversity; CA leads to
so-called intermediate solutions (summarizations). It also provides a comprehensive numerical
example that illustrates every step of fsQCA and also shows how the results from those steps can
be collected together in summary tables. Chapter 3 studies theoretical aspects of fsQCA. It
explains how the fsQCA steps can be modified so that fsQCA is greatly speeded up, the result
being Fast fsQCA. Chapter 4 discusses some challenges that must still be overcome in order to
apply fsQCA to engineering and computer science applications. Without solving these
challenges it is impossible to use fsQCA for summarization. Chapter 5 provides a new
calibration method that solves these challenges in fsQCA. In order to understand the new
calibration method, we return to some history and definitions about fuzzy sets and linguistic
variables. Then we extend the type-1 FSs in fsQCA to interval type-2 (IT2) FSs in Chapter 6. We
argue that once the T1 FS MFs have been chosen in fsQCA, all uncertainty about the words that
are used in fsQCA disappears, because T1 MFs are totally precise. IT2 FSs, on the other hand,
are first-order uncertainty models for words. In that chapter, we develop IT2-fsQCA by
extending the steps of fsQCA from T1 FSs to IT2 FSs. In Chapter 7, a very efficient method for
establishing nonlinear combinations of variables is developed using some key steps of fsQCA.
This method leads to an exponential speedup in computation and lends itself to parallel and
distributed processing; hence, it may be used on data from small to big. Chapter 8 uses some key
steps of fsQCA to provide a predictive model that is called Variable Structure Regression
(VSR). This model is a nonlinear regression method in which we simultaneously determine how
many regressors there are and how the variables should be combined in the regression model.
Finally, Chapter 9 summarizes our work and proposes some future work.
Chapter 2
Fuzzy Set Qualitative Comparative Analysis (fsQCA)¹
2.1. Introduction
Fuzzy Set Qualitative Comparative Analysis (fsQCA) is a methodology for obtaining linguistic
summarizations from data that are associated with cases. It was developed by the eminent social
scientist Prof. Charles C. Ragin, but has, as of this date, not been applied by engineers or
computer scientists. Unlike more quantitative methods that are based on correlation, fsQCA
seeks to establish logical connections between combinations of causal conditions and an
outcome, the result being rules that summarize the sufficiency between subsets of all of the
possible combinations of the causal conditions (or their complements) and the outcome. The
rules are connected by the word OR to the output. Each rule is a possible path from the causal
conditions to the outcome. We, for the first time, explain fsQCA in a very quantitative way,
something that is needed if engineers and computer scientists are to use fsQCA. The material of
this chapter is mainly from our paper “Charles Ragin's Fuzzy Set Qualitative Comparative
Analysis (fsQCA) used for linguistic summarizations” [27].
2.2. fsQCA Overview
fsQCA begins (Fig. 2.1) with (your) substantive knowledge (①) about a problem. You specify
a desired outcome (②) (a separate fsQCA is run for each such outcome) and then choose the
cases (③) from which you hope to extract new knowledge about the potential causes for that
outcome. Next you postulate a set of k potential causes (④) that you believe could have, either
individually or in various combinations, led to the desired outcome. You might be wrong about
postulating a cause, and so you protect yourself against this by simultaneously considering each
cause and its complement.
fsQCA connects the 2^k possible (candidate) causal combinations to the desired outcome as a
simple if-then rule, namely “if this causal combination, then the desired outcome.” Each causal
combination contains exactly k terms (the causal condition or its complement) connected to each
other by AND. All 2^k candidate rules are for the same desired outcome and are therefore
connected by the word OR (⑤).

¹ This chapter is a duplication of our 2012 INS paper [27].
Fig. 2.1. fsQCA summarized.
fsQCA now uses the case-based data to reduce the 2^k candidate rules to a much smaller
number of rules, and it simplifies the rules so that they usually contain causal combinations with
fewer than k terms (⑥). The latter happens because all of the rules are for the same desired
outcome; hence, they can be logically combined using set-theoretic reduction techniques, and by
doing this it frequently happens that some or many causal conditions are absorbed (so they
disappear from the final causal combination).
There may not be enough cases (Ragin calls this “limited diversity”) to provide evidence (or
enough evidence) about all 2^k candidate causal combinations, so more substantive knowledge
is obtained from domain experts (⑦) about whether or not a causal condition, or its complement,
could have led to the desired outcome. This additional substantive knowledge is then
incorporated into further fsQCA computations (⑧).
At the end of fsQCA one has a small collection of simplified if-then rules (⑨) that provide at
least one simplified causal combination for a desired outcome (unless no such rule can be found).
It is then possible to connect cases to each rule that are its best instances (⑩), and to compute
the coverage (⑪) of the cases by each rule.
Fuzzy sets are used in some of the fsQCA steps because things are not always black and white;
instead, they are a matter of degree.
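Before walking through the detailed steps, it may help engineers to see how the final summarization is evaluated. In fsQCA the causal conditions inside a rule are connected by AND (implemented as the minimum) and the rules are connected by OR (implemented as the maximum), as described in Sections 2.3.5 and 2.3.6 below. The sketch below is purely illustrative: the two rules and the membership grades are invented, and the function is not part of fsQCA itself.

    # Toy evaluation of a two-rule fsQCA-style summarization for one case:
    # "IF (A AND b) OR (a AND B), THEN O", where AND = minimum, OR = maximum,
    # and a lower-case letter denotes a complement (grade 1 - A).
    def rule_output(mu, rules):
        """mu: {condition: grade}; rules: lists of literals such as 'A' or 'b'."""
        def literal(term):
            grade = mu[term.upper()]
            return grade if term.isupper() else 1.0 - grade
        return max(min(literal(t) for t in rule) for rule in rules)

    mu = {"A": 0.8, "B": 0.3}
    print(rule_output(mu, [["A", "b"], ["a", "B"]]))  # max(min(0.8, 0.7), min(0.2, 0.3)) = 0.7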
2.3. The Steps of fsQCA
fsQCA seeks to establish logical connections between combinations of causal conditions and a
desired outcome, the result being rules (typological configurations) that summarize¹ (describe)
the sufficiency between subsets of all of the possible combinations of the causal conditions (or
their complements) and the outcome. It is not a methodology that is derived through
mathematics, e.g. as the solution to an optimization problem (nor are the linguistic
summarization methods of Yager, Kacprzyk, Zadrożny, Niewiadomski and Wilbik that are
mentioned in Chapter 1), although, as will be seen below, it uses mathematics. Our mathematical
description of fsQCA does not appear in the existing literature about fsQCA. It is needed,
though, if engineers and computer scientists are to use fsQCA.

fsQCA as explained below has 13 steps² that provide one or more subsets of sufficient
conditions between a collection of postulated causal conditions and a desired outcome.
2.3.1 Desired Outcome and Cases
To begin, one must choose a desired outcome for a specific application, e.g. in an oilfield, high
180-day cumulative oil production rate; among 18 European countries between World Wars 1
and 2, breakdown in democracy; etc. One must also choose appropriate cases for that outcome.
Step 1. Choose a desired outcome and its appropriate cases: Let $S_O$ be the finite space of
possible outcomes, $O_w$, for a specific application, i.e.

$S_O = \{O_w,\ w = 1,\ldots,n_O\}$   (2.1)
¹ Ragin does not think of fsQCA as linguistic summarization; he thinks of it as describing what’s going on between a collection of causal conditions and an outcome. It is only in [47, page 15, Box 1.4] that “summarizing data” is acknowledged as one of the five types of uses of QCA techniques. Consequently, it now seems legitimate to use fsQCA for linguistic summarization. The other four uses for QCA are: check coherence of data, check hypotheses of existing theories, quickly test conjectures, and develop new theoretical arguments. A reviewer of our paper pointed out that: “Linguistic summarization is not used as a term in social science. If explained to current fsQCA users, they would agree that most applications of fsQCA yield linguistic summaries.”
² To the best of our knowledge, fsQCA has never before been enumerated as a collection of 13 steps. In fact, some of the steps have never before been explained in the fsQCA literature.
The desired outcome is O, where $O \in S_O$ (e.g., high production rate, moderate production
rate, low production rate). fsQCA focuses on one outcome at a time, and each fsQCA is
independent of the others.

Let $S_{Cases}$ be the finite space of all appropriate cases (x) that have been labeled¹ 1, 2, …, N, i.e.

$S_{Cases} = \{1, 2, \ldots, N\}$   (2.2)
According to Ragin [28]:
The question “What is the case?” can have different answers in studies that might appear, at
first glance, to have identical casings. … In fact, the first step in much case-oriented inquiry is to
identify the best possible instances of the phenomenon to be explained and then study these
instances in great depth. … casing is outcome driven …. [This means that] … one can have
different choices of cases for different kinds of studies, ranging from a study for which there is
only one case, to a study in which there are a set of cases for the same outcome, to a study for
which there are both negative and positive cases² for the same outcome, to a study that uses the
entire population (such a study seeks generalizations about the population).
Choosing the appropriate cases needs to be done first, and this choice does not have to be done
once and for all, i.e. it can be modified during the entire fsQCA procedure. The appropriate cases
can be different when the desired outcome is different levels of the same variable, i.e. high
production rate and low production rate may have different sets of appropriate cases.
2.3.2 Causal Conditions
Next, one must postulate a subset of causal conditions that you believe (based on your
knowledge) to be a possible cause (either individually or as a sub-group or as an entire group) of
the desired outcome.
¹ Cases have no natural ordering, but instead each case is identified by an integer, so that by knowing the integer one also knows the case. The integers x = 1, 2, …, N are used to represent the N cases, and in this way the cases are ordered. For a person to repeat someone else’s fsQCAs, and compare their intricate details with someone else’s intricate details, they need to know the ordering of the N; hence, it is assumed that this information is provided to them.
² A positive case is one for which the desired outcome is strongly present and a negative case is one for which the desired outcome may not be present at all or is weakly present.
Step 2. Choose k causal conditions (if a condition is described by more than one term, treat
each term as an independent causal condition): Let $S_C$ be the finite space of all possible
causal conditions, $C_i$, for the specific application, i.e.

$S_C = \{C_i,\ i = 1,\ldots,n_C\}$   (2.3)

A subset of the $n_C$ possible causal conditions is chosen whose elements are re-numbered
$1, 2, \ldots, k$, i.e.

$\bar{S}_C = \{C_i,\ i = 1,\ldots,k\} \subseteq S_C$   (2.4)
In Ragin’s works (e.g., [47, Ch. 5]) it is quite common for each causal condition to be
described just by one term, e.g., in a study of the breakdown of democratic systems for 18
countries in Europe between World Wars 1 and 2, he chose the following five candidate causal
conditions: [country is] developed, urban, literate, industrial and unstable. Observe that none of
these words has adjectives appended to it, whereas in most engineering applications for fsQCA it
is very common to have adjectives appended to a causal condition, e.g., low permeability and
high permeability.
2.3.3 Membership Functions for Desired Outcome and the Causal Conditions
fsQCA needs membership functions (MFs) for the k possible causal conditions and the desired
outcome.
Step 3. Treat the desired outcome and causal conditions as fuzzy sets and determine MFs for
them, i.e.

$\mu_O:\ W \subseteq \mathbb{R} \to [0,1],\quad w \mapsto \mu_O(w)$   (2.5)

$\mu_{C_i}:\ X_i \subseteq \mathbb{R} \to [0,1],\quad x_i \mapsto \mu_{C_i}(x_i),\quad i = 1,\ldots,k$   (2.6)
In this chapter, it is assumed that these membership functions, which are continuous functions
of the independent variables, $w$ or $x_i$, are known. How to obtain them may be non-trivial,
but those details (e.g., given in [29]) are not needed in order to understand the major
computations of fsQCA, which are the focus of this chapter.
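Calibration is treated in depth in Chapter 5; purely as an illustration of the kind of S-shaped MF that Step 3 produces (cf. Fig. 5.1, where a log-odds function with 0.05/0.50/0.95 breakpoints is mentioned), here is a minimal sketch. The three anchor values and the example variable are hypothetical, not values used anywhere in this thesis.

    import math

    def s_shaped_mf(x, full_out, crossover, full_in):
        """Log-odds S-shaped MF: ~0.05 at full_out, 0.5 at the crossover,
        ~0.95 at full_in (the breakpoints described for Fig. 5.1)."""
        if x >= crossover:
            scale = math.log(19.0) / (full_in - crossover)   # log(0.95/0.05) at full_in
        else:
            scale = math.log(19.0) / (crossover - full_out)  # -log(19) at full_out
        return 1.0 / (1.0 + math.exp(-scale * (x - crossover)))

    # Hypothetical calibration of one causal condition from a raw score:
    mu = lambda v: s_shaped_mf(v, 300.0, 550.0, 900.0)
    print(round(mu(300), 2), round(mu(550), 2), round(mu(900), 2))  # 0.05 0.5 0.95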
2.3.4 Derived Membership Functions
Next, one computes derived membership functions for the k possible causal conditions and the
desired outcome as functions of the ordered cases¹.
Step 4. Evaluate the MFs for all N appropriate cases, the results being the derived
membership functions, i.e. (x = 1, …, N and i = 1, …, k)

$\mu_O^D:\ (S_{Cases}, S_O) \to [0,1],\quad x \mapsto \mu_O^D(x) = \mu_O(w(x))$   (2.7)

$\mu_{C_i}^D:\ (S_{Cases}, S_{C_i}) \to [0,1],\quad x \mapsto \mu_{C_i}^D(x) = \mu_{C_i}(x_i(x))$   (2.8)
Generally, $\mu_{C_j}^D(x)$ and $\mu_O^D(x)$ are neither normal nor unimodal functions² of x.
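A minimal sketch of Step 4, assuming each case carries one raw value per variable and that the MFs from Step 3 are available as callables (e.g., the s_shaped_mf sketch above); all names and numbers here are illustrative only.

    import math

    def derived_mfs(cases, mfs):
        """Eqs. (2.7)-(2.8): evaluate each MF at every ordered case,
        returning {variable: {case label: derived membership grade}}."""
        return {var: {x: mf(raw[var]) for x, raw in cases.items()}
                for var, mf in mfs.items()}

    # Two illustrative cases with raw values for an outcome O and a condition A:
    cases = {1: {"O": -9.0, "A": 720.0}, 2: {"O": 10.0, "A": 468.0}}
    mfs = {"O": lambda o: 1.0 / (1.0 + math.exp(0.5 * o)),             # toy MFs only
           "A": lambda a: 1.0 / (1.0 + math.exp(-0.012 * (a - 550.0)))}
    print(derived_mfs(cases, mfs))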
2.3.5 Candidate Causal Combinations (Rules)
fsQCA next establishes a set of 2^k candidate causal combinations (rules), one rule for each
possible causal combination of the k causal conditions or its complement.

Step 5. Create 2^k candidate causal combinations (rules) and interpret each as a corner in a
2^k-dimensional vector space³: Let $S_F$ be the finite space of 2^k candidate causal
combinations, called (by us) firing-level fuzzy sets⁴, $F_j$, i.e. ($j = 1,\ldots,2^k$ and
$i = 1,\ldots,k$)

$S_F = \{F_1, \ldots, F_{2^k}\},\quad F_j = A_1^j \cap A_2^j \cap \cdots \cap A_k^j,\quad A_i^j = C_i \text{ or } c_i$   (2.9)

where (using Ragin’s notation) $c_i$ denotes the complement of $C_i$.

¹ Ragin does not use the phrase “derived membership functions,” nor does he interpret such a calculation as a MF of another fuzzy set. The latter is important for the subsethood calculation that is performed below. Instead, he provides tables listing the membership in each causal condition and in the causal combinations for each of the N cases. We provide such tables later in this section. One may feel that Steps 3 and 4 can be combined. We have chosen not to do this so that it is clear that there is a distinction between the MFs and derived MFs of the causal conditions and the desired output. See [123] and [115] for detailed explanations of how Steps 3 and 4 can be performed.
² This is okay, because in a traditional type-1 fuzzy logic system (T1 FLS) (e.g., [42]), when fired-rule output fuzzy sets are combined by the union operation, the resulting fuzzy set is also non-normal and frequently non-unimodal; so, such fuzzy sets are already in wide use.
³ Associating crisp sets with corners of a hypercube and fuzzy sets with points between the corners of a hypercube was also proposed by Kosko [124].
⁴ Ragin does not use the term “firing-level fuzzy set”; instead, he uses the phrase “causal combination.” Here, terms and phrases are used that are already well established in the T1 FLS literature; however, the terms “firing-level fuzzy set” and “causal combination” are used by us interchangeably.
By this step, fsQCA establishes one candidate rule (typology) for the desired outcome O that
has the form: IF $F_1$ or $F_2$ or … or $F_{2^k}$, THEN O, where the logical OR operation (a
disjunction) is implemented using the maximum. Of course, this rule can also be expressed as a
collection of 2^k rules (one for each causal combination), each having the same consequent. In
the rest of fsQCA these candidate rules are either deleted or simplified.
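As a concrete illustration of Step 5, the sketch below enumerates the 2^k corners of the vector space for five causal conditions A–E (as in Example 2.1 below), using Ragin’s convention that an upper-case letter denotes a causal condition and a lower-case letter denotes its complement. The code is illustrative only, not part of fsQCA itself.

    from itertools import product

    def candidate_causal_combinations(conditions):
        """Enumerate all 2^k candidate causal combinations of Eq. (2.9):
        upper case = causal condition C_i, lower case = complement c_i."""
        combos = []
        for corner in product((True, False), repeat=len(conditions)):
            combos.append("".join(c.upper() if present else c.lower()
                                  for c, present in zip(conditions, corner)))
        return combos

    combos = candidate_causal_combinations("abcde")
    print(len(combos))             # 32 = 2^5 candidate causal combinations
    print(combos[0], combos[-1])   # ABCDE ... abcde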
Step 5 is the starting point for the mnemonic summary of Steps 6-11 of fsQCA shown in Fig.
2.2.
Fig. 2.2. fsQCA partitions the original 2^k causal combinations into three subsets. Note that
$S_F = S_F^S + (S_F - S_F^S) = S_F^A + (S_F^S - S_F^A) + (S_F - S_F^S)$.
2.3.6 Surviving Causal Combinations
It is in this step that the first major computations are performed in fsQCA. The 2^k candidate
causal combinations are reduced to a much smaller subset of surviving causal combinations.
Step 6. Compute the MF of each of the 2^k candidate causal combinations in all the appropriate
cases, and keep only the ones—the $R_S$ surviving causal combinations (firing-level surviving
rules)—whose MF values are > 0.5 for an adequate number of cases (this must be specified by
the user), i.e., keep the adequately represented causal combinations that are closer to corners
and not the ones that are farther away from corners: This is a mapping from $\{S_F, S_{Cases}\}$
into $S_F^S$, that makes use of $\mu_{A_i^j}(x)$, where $S_F^S$ is a subset of $S_F$ with $R_S$
elements, i.e. ($j = 1,\ldots,2^k$, $x = 1,\ldots,N$ and $l = 1,\ldots,R_S$)

$\mu_{F_j}:\ (S_F, S_{Cases}) \to [0,1],\quad \mu_{F_j}(x) = \min\{\mu_{A_1^j}(x), \mu_{A_2^j}(x), \ldots, \mu_{A_k^j}(x)\}$   (2.10)

$\mu_{A_i^j}(x) = \mu_{C_i}^D(x) \text{ or } \mu_{c_i}^D(x) = 1 - \mu_{C_i}^D(x),\quad i = 1,\ldots,k$   (2.11)

$t_{F_j}:\ ([0,1], S_{Cases}) \to \{0,1\},\quad t_{F_j}(x) = \begin{cases} 1 & \text{if } \mu_{F_j}(x) > 0.50 \\ 0 & \text{if } \mu_{F_j}(x) \le 0.50 \end{cases}$   (2.12)

$N_{F_j}:\ \{0,1\}^N \to I,\quad N_{F_j} = \sum_{x=1}^{N} t_{F_j}(x)$   (2.13)

$F_l^S:\ (S_F, I) \to S_F^S,\quad F_l^S = F_j \text{ if } N_{F_j} \ge f,\quad j = 1,\ldots,2^k$   (2.14)
$\mu_{F_j}(x)$ in (2.10) is the firing level¹ for the j-th rule. In (2.14), f is an integer frequency
threshold that must be set by the user. The firing-level fuzzy sets for these $R_S$ surviving rules,
which are denoted $F_l^S$, have associated re-numbered membership functions $\mu_{F_l^S}(x)$,
$l = 1,\ldots,R_S$, and are expressed as in (2.9).
Regarding closeness to corners in a 2^k-dimensional vector space: in crisp set QCA [23], a
candidate rule is either fully supported (i.e., its firing-level MF value equals 1) or is not
supported at all (i.e., its firing-level MF value equals 0), and only the fully supported candidate
rules survive. fsQCA backs off from the stringent requirement of crisp set QCA by replacing the
vertex membership value of “1” with a vertex membership value of > 0.5, meaning that if the
firing level is greater than 0.5 then the causal combination is closer to its vertex than it is away
from its vertex. Only those cases whose firing levels are greater than 0.5 are said to support the
existence of a candidate rule.

¹ Ragin does not use this terminology; he calls this the “fuzzy set membership of cases in the causal conditions.” In order to be consistent with terms that are already used in the FS literature, the term “firing level” is used herein.
Regarding the frequency threshold, some guidance on how to choose f is provided by Ragin,
i.e. for a small number of cases f is set at 1 or 2 [23, Ch. 5, p. 107], whereas for a large number
of cases f is at least¹ 10 [21, p. 197]. Unfortunately, the words “small” and “large” are fuzzy, so
the user must vary f until acceptable results are obtained. At this time, there does not seem to be
a way to quantify what “acceptable results” means. We are developing type-2 fsQCA that will
hopefully overcome some of the subjectivities of fsQCA.

Note that (2.10) and (2.11) require $2^k N$ computations. A fast version of fsQCA appears in
Chapter 3 and [30].
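To make Step 6 concrete, the sketch below implements (2.10)-(2.14) directly, under the assumption that the derived membership grades of Step 4 are available as an N × k array; the toy grades and the threshold f are placeholders, not data from this thesis.

    import numpy as np

    def surviving_causal_combinations(mu, f=1):
        """mu: N x k array of derived grades mu^D_{C_i}(x) for N cases.
        Returns {combination label: number of cases with firing level > 0.5},
        keeping only combinations supported by at least f cases (Eq. (2.14))."""
        n_cases, k = mu.shape
        survivors = {}
        for j in range(2 ** k):                       # each corner of the 2^k space
            signs = [(j >> i) & 1 for i in range(k)]  # 1 -> C_i, 0 -> c_i
            grades = np.where(signs, mu, 1.0 - mu)    # Eq. (2.11)
            firing = grades.min(axis=1)               # Eq. (2.10): AND = minimum
            n_support = int((firing > 0.5).sum())     # Eqs. (2.12)-(2.13)
            if n_support >= f:
                label = "".join(chr(65 + i) if s else chr(97 + i)
                                for i, s in enumerate(signs))
                survivors[label] = n_support
        return survivors

    # Toy grades for three cases and two causal conditions A and B:
    mu = np.array([[0.9, 0.2], [0.8, 0.1], [0.3, 0.7]])
    print(surviving_causal_combinations(mu))   # {'Ab': 2, 'aB': 1}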
Example 2.1. In order to illustrate the computations from the candidate rules to the subset of
firing-level surviving rules, we consider an example that is taken from [23, Ch. 5] for which the
desired outcome is O = Breakdown of Democracy (of 18 European countries between World
Wars 1 and 2) and there are five causal conditions: A = developed (country), B = urban
(country), C = literate (country), D = industrial (country) and E = stable (country). The data in
Table 2.1 are taken from Table 5.2 in [23].
Using knowledge and techniques from social science, numerical values were obtained for A–E
for 18 European countries that in Table 2.1 are called² “Cases 1–18.” Numerical values were
initially obtained by Ragin for o = Survival of Democracy, which was assumed to be the
complement³ of Breakdown of Democracy; hence, MF(O) was computed from MF(o) as
1 − MF(o). S-shaped MFs were obtained for Survival of Democracy, developed (country), urban
¹ Ragin and Fiss [21, p. 197] (Fiss is the co-author of Chapter 11) state (for large N): “The fuzzy-set analysis that follows uses a frequency threshold of at least ten strong instances. This value was selected because it captures more than 80 percent of the [more than N = 700] cases assigned to [causal] combinations [in their works].” Setting the threshold at 10 and then deleting all causal combinations where there are fewer than 10 cases left them with 80% of the cases. In a later work f was chosen to be 3 when N = 205.
² The numbered cases correspond to the following countries: 1-Austria, 2-Belgium, 3-Czechoslovakia, 4-Estonia, 5-Finland, 6-France, 7-Germany, 8-Greece, 9-Hungary, 10-Ireland, 11-Italy, 12-Netherlands, 13-Poland, 14-Portugal, 15-Romania, 16-Spain, 17-Sweden, and 18-United Kingdom.
³ Breakdown of Democracy actually is an antonym of Survival of Democracy, and the MF of an antonym is very different from the MF of the complement. Let B = Breakdown of Democracy and S = Survival of Democracy; then, a MF for an antonym is $\mu_B(x) = \mu_S(10 - x)$ [42]. This $\mu_B(x)$ is very different from using the complement of S; however, for the purposes of this example, we use the complement because it is widely used by Ragin.
(country), literate (country), industrial (country) and unstable (country) using a method that is
described in [21, Ch. 5], the details of which are not important for this chapter. The MF for e =
Unstable Country was assumed to be the complement of Stable Country; hence, MF(E) was
computed from MF(e) as 1 − MF(e). Using these MFs, Ragin obtained the MF scores that are
also given in Table 2.1. These MFs implement (2.7) and (2.8).
Table 2.1
Data- and fuzzy-membership-matrix (showing the original outcome variable o, the causal-condition variables, and their derived fuzzy-set membership function scores)ᵃ
Case   o  MF(O)   A  MF(A)   B  MF(B)   C  MF(C)   D  MF(D)   e  MF(E)
1 -9 0.95 720 0.81 33.4 0.12 98 0.99 33.4 0.73 10 0.43
2 10 0.05 1098 0.99 60.5 0.89 94.4 0.98 48.9 1 4 0.98
3 7 0.11 586 0.58 69 0.98 95.9 0.98 37.4 0.90 6 0.91
4 -6 0.88 468 0.16 28.5 0.07 95 0.98 14 0.01 6 0.91
5 4 0.23 590 0.58 22 0.03 99.1 0.99 22 0.08 9 0.58
6 10 0.05 983 0.98 21.2 0.03 96.2 0.99 34.8 0.81 5 0.95
7 -9 0.95 795 0.89 56.5 0.79 98 0.99 40.4 0.96 11 0.31
8 -8 0.94 390 0.04 31.1 0.09 59.2 0.13 28.1 0.36 10 0.43
9 -1 0.58 424 0.07 36.3 0.16 85 0.88 21.6 0.07 13 0.13
10 8 0.08 662 0.72 25 0.05 95 0.98 14.5 0.01 5 0.95
11 -9 0.95 517 0.34 31.4 0.10 72.1 0.41 29.6 0.47 9 0.58
12 10 0.05 1008 0.98 78.8 1 99.9 0.99 39.3 0.94 2 0.99
13 -6 0.88 350 0.02 37 0.17 76.9 0.59 11.2 0 21 0
14 -9 0.95 320 0.01 15.3 0.02 38 0.01 23.1 0.11 19 0.01
15 -4 0.79 331 0.01 21.9 0.03 61.8 0.17 12.2 0 7 0.84
16 -8 0.94 367 0.03 43 0.30 55.6 0.09 25.5 0.21 12 0.20
17 10 0.05 897 0.95 34 0.13 99.9 0.99 32.3 0.67 6 0.91
18 10 0.05 1038 0.98 74 0.99 99.9 0.99 49.9 1 4 0.98
ᵃ This table is modeled after Table 5.2 in [32], and the numbers in it are the same as the ones in that table.
For five causal conditions there are $2^5 = 32$ causal combinations whose MFs have to be
evaluated for all 18 cases. An examination of the 32 MFs for each of the 18 cases reveals that
there is only one causal combination for each case whose membership is greater than¹ 0.50.
Those causal combinations, called $F_{j^*}(x_i)$, are summarized in Table 2.2.

¹ Ragin [21, p. 131] observed that, in an example with four causal conditions, “… each case can have (at most) only a single membership score greater than 0.5 in the logical possible combinations from a given set of causal conditions.” Because a proof of this fact (which is true in general) is not needed in the rest of this chapter, it is not included herein. It is given in Chapter 3.
Table 2.2
Causal Combinations whose MF values are greater than 0.50
Case, x_i   μ_{F_{j*}}(x_i)   F_{j*}(x_i)
1 0.57 AbCDe
2 0.89 ABCDE
3 0.58 ABCDE
4 0.84 abCdE
5 0.58 AbCdE
6 0.81 AbCDE
7 0.69 ABCDe
8 0.57 abcde
9 0.84 abCde
10 0.72 AbCdE
11 0.53 abcdE
12 0.94 ABCDE
13 0.59 abCde
14 0.89 abcde
15 0.83 abcdE
16 0.70 abcde
17 0.67 AbCDE
18 0.98 ABCDE
The firing-level surviving rules (obtained from the last column in Table 2.2) are summarized in
the third column of Table 2.3 (this table is also used in Example 2.2). Observe that only nine out
of the 32 possible causal combinations have survived. Their Best Instances were obtained by
matching up the last and first columns of Table 2.2 for each of the nine causal combinations.
Table 2.4 shows the firing levels for the nine surviving rules for all 18 cases because these
quantities will be used in the next step of fsQCA.
Table 2.3
Distribution of cases across causal combinations and set-theoretic consistency of causal combinations^a

Best Instances | Causal Combination (A B C D E) | Corresponding Vector Space Corner | Number of cases with > 0.5 membership | Set-theoretic Consistency
8, 14, 16 | 0 0 0 0 0 | abcde | 3 | 1
11, 15 | 0 0 0 0 1 | abcdE | 2 | 0.982
1 | 1 0 1 1 0 | AbCDe | 1 | 0.974
7 | 1 1 1 1 0 | ABCDe | 1 | 0.971
4 | 0 0 1 0 1 | abCdE | 1 | 0.861
9, 13 | 0 0 1 0 0 | abCde | 2 | 0.855
5, 10 | 1 0 1 0 1 | AbCdE | 2 | 0.498
6, 17 | 1 0 1 1 1 | AbCDE | 2 | 0.495
2, 3, 12, 18 | 1 1 1 1 1 | ABCDE | 4 | 0.250

^a Bold-faced entries are for consistencies > 0.80.
Table 2.4
Firing levels for nine surviving causal combinations and 18 cases. MFs for the five causal conditions are in Table 2.1.^a
Entries are the memberships of the surviving causal combinations (the minimum of the five causal-condition MFs), i.e., the firing levels.

Case | AbCDe | ABCDE | abCdE | AbCdE | AbCDE | ABCDe | abcde | abCde | abcdE
1 0.57 0.12 0.19 0.27 0.43 0.12 0.01 0.19 0.01
2 0.02 0.89 0 0 0.11 0.02 0 0 0
3 0.02 0.58 0.02 0.02 0.02 0.09 0.02 0.02 0.02
4 0.01 0.01 0.84 0.16 0.01 0.01 0.02 0.09 0.02
5 0.08 0.03 0.42 0.58 0.08 0.03 0.01 0.42 0.01
6 0.05 0.03 0.02 0.19 0.81 0.03 0.01 0.02 0.01
7 0.21 0.31 0.04 0.04 0.21 0.69 0.01 0.04 0.01
8 0.04 0.04 0.13 0.04 0.04 0.04 0.57 0.13 0.43
9 0.07 0.07 0.13 0.07 0.07 0.07 0.12 0.84 0.12
10 0.01 0.01 0.28 0.72 0.01 0.01 0.02 0.05 0.02
11 0.34 0.10 0.41 0.34 0.34 0.10 0.42 0.41 0.53
12 0.00 0.94 0 0 0 0.01 0 0 0
13 0.00 0 0 0 0 0 0.41 0.59 0
14 0.01 0.01 0.01 0.01 0.01 0.01 0.89 0.01 0.01
15 0.00 0 0.17 0.01 0 0 0.16 0.16 0.83
16 0.03 0.03 0.09 0.03 0.03 0.03 0.70 0.09 0.20
17 0.09 0.13 0.05 0.33 0.67 0.09 0.01 0.05 0.01
18 0.01 0.98 0 0 0.01 0.02 0 0 0
^a The bold-faced numbers indicate the one causal combination for each case whose membership is greater than 0.50.
2.3.7 Actual Causal Combinations (Consistency)
So far the calculations of fsQCA have focused exclusively on the antecedents of a candidate
rule (which means that they do not have to be repeated for different desired outcomes, unless the
cases change for such outcomes). The next major calculation involves both the antecedents and
the consequent of a firing-level surviving rule.
A traditional fuzzy logic (FL) rule assumes that its antecedents are sufficient for its consequent, by virtue of the a priori construct of that if-then rule, e.g., "IF D, THEN E" means "D implies E," i.e. "D is sufficient for E." One does not usually question the existence of the stated FL rule; however, at this point in fsQCA it is not known if the antecedents of a firing-level surviving rule are indeed sufficient for the consequent, i.e. one questions the existence of the rule.
If the antecedents are sufficient for the consequent, then the rule actually exists; however, if they are not then the rule does not exist. So, the next calculation of fsQCA establishes whether or not a rule exists. This calculation is a quantification of the fact that a causal combination is sufficient for an outcome if the outcome always occurs when the causal combination is present (however, the outcome may also occur when a different causal combination is present), i.e. the causal combination (the antecedents) is a subset of the outcome. Ragin uses Kosko's subsethood formula^1 to compute the subsethood—consistency^2—of the antecedents in the outcome for each of the R_S firing-strength surviving rules.
Step 7. Compute the consistencies (subsethoods) of the R_S surviving causal combinations, and keep only those R_A causal combinations—the actual causal combinations (actual rules)—whose consistencies are greater than 0.80:^3 This is a mapping from {S_F^S, O, S_Cases} into S_F^A, where S_F^A is a subset of S_F^S, with R_A elements, i.e. (l = 1,...,R_S and m = 1,...,R_A)

ss_K(F_l^S, O): {S_F^S, O, S_Cases} → [0,1]
{μ_{F_l^S}(x), μ_O^D(x)}_{x=1}^N ↦ ss_K(F_l^S, O) = Σ_{x=1}^N min(μ_{F_l^S}(x), μ_O^D(x)) / Σ_{x=1}^N μ_{F_l^S}(x)   (2.15)

F_m^A: [0,1] → S_F^A
ss_K(F_l^S, O) ↦ F_m^A = F_l^S (l → m) ∀ ss_K(F_l^S, O) ≥ 0.80, l = 1,...,R_S   (2.16)

The firing level fuzzy sets for these actual rules, that are denoted F_m^A, have associated re-numbered membership functions μ_{F_m^A}(x), m = 1,...,R_A, and can be expressed as [from (2.9)] (the superscript A in each A_i^{A,m} denotes "actual"):

F_m^A = A_1^{A,m} ∧ A_2^{A,m} ∧ ... ∧ A_k^{A,m}   (2.17)
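For concreteness, (2.15) and (2.16) can be sketched in a few lines of NumPy. The array names mu_S (an R_S × N array of firing levels, as in Table 2.4) and mu_O (the N derived outcome MFs) are illustrative only, not part of fsQCA's notation:

import numpy as np

def consistency(mu_F, mu_O):
    # Kosko subsethood ss_K(F, O) of (2.15), accumulated over the N cases.
    return np.minimum(mu_F, mu_O).sum() / mu_F.sum()

def actual_rules(mu_S, mu_O, threshold=0.80):
    # (2.16): keep the surviving causal combinations whose consistency
    # is at least the threshold; returns their row indices in mu_S.
    return [l for l in range(mu_S.shape[0])
            if consistency(mu_S[l], mu_O) >= threshold]

Applying this sketch to the firing levels in Table 2.4 and the MF(O) column of Table 2.1 should reproduce the consistency column of Table 2.3.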
^1 A study into the robustness of the subsethood computations to other choices for the subsethood formula is found in Wierman [122].
^2 "Consistency" is the term favored by Ragin [21], [23], although he also uses the term "subsethood." Here we use the two terms interchangeably.
^3 A reviewer noted that too much emphasis is placed on the use of the somewhat arbitrary value of 0.8 as a consistency cut-off, and that any value within a limited range of values (e.g., 0.75 to 1) probably would be acceptable. The reviewer also suggested using multiple values for the same analysis (e.g., 0.80 and 0.90) and then assessing the difference between the two sets of results. We agree with these suggestions. Note that Ragin also advocates "… looking for gaps in the upper range of consistency [subsethood] that might be useful for establishing a threshold, keeping in mind that it is always possible to examine several different thresholds and assessing the consequences of lowering or raising the consistency [subsethood] cut-off."
Observation 1: Steps 5–7 partition S_F into three mutually exclusive subspaces (Fig. 2.3): S_F − S_F^S, whose elements are the causal combinations whose firing levels do not pass the frequency threshold test and are called "Remainders;" S_F^S − S_F^A, whose elements are the causal combinations whose firing levels pass the frequency threshold test but whose consistencies are < 0.80; and, S_F^A, whose elements are the causal combinations whose firing levels pass the frequency threshold test and whose consistencies are ≥ 0.80.
Fig. 2.3. Mnemonic summary of fsQCA.
Example 2.2. Consistencies (subsethoods) for the data that are in Tables 2.1 and 2.4 are computed using (2.15), and are summarized in the last column of Table 2.3. Note that each of these calculations uses the MFs for all 18 cases. The rows of Table 2.3 are ordered so that its first row has the largest value for Consistency and the last row has the smallest value for Consistency. Columns 2–6 of this table are for the five causal conditions in a causal combination, and their entries are listed as 0 or 1, where a 0 occurs if the complement of the causal condition appears in the causal combination, and a 1 appears if the causal condition itself appears [e.g., (1,0,1,1,0) ↔ AbCDe]. Observe that of the nine causal combinations only six have consistency values ≥ 0.80 (the ones that are in bold face). It follows, therefore, that the R_A = 6 actual rules are (the + denotes the union—OR—operation)

AbCDe + abCdE + ABCDe + abcde + abCde + abcdE → O.

It is very interesting to observe that the causal combination ABCDE, which had the most cases supporting it—four—has now vanished from the analysis. Referring to Tables 2.1 and 2.3, observe that MF(O) for Cases 2, 3, 12 and 18, which are the best instances of ABCDE, are 0.05, 0.11, 0.05 and 0.05, respectively, which suggests that the low MF of these cases in O may have led to the demise of ABCDE.
[Fig. 2.3 depicts this reduction compactly: {S_F(2^k), S_Cases(N)} → (0.5 / f) → S_F^S(R_S) → (0.8) → S_F^A(R_A) → (QM) → {S_F^PI(R_C), S_F^MPI(R_P)} → (QM, CA, QM) → S_F^I(R_I) → S_F^SI(R_SI) → (0.8) → S_F^BSI(R_BSI); the threshold computations at the beginning and end are fuzzy, while the QM/CA computations in the middle are crisp.]
2.3.8 Complex and Parsimonious Solutions (QM Algorithm)
It is quite possible that there are still too many rules, but now for a totally different reason than before. When the R_A actual rules are combined using the logical OR (disjunction) operation, then, because all of these rules share the same consequent, they can be re-expressed, as: IF (F_1^A ∨ F_2^A ∨ ... ∨ F_{R_A}^A), THEN O, where each F_m^A is the conjunction of one of the 2^k combinations of the k original causal conditions, e.g., F_2^A = C_1 c_2 c_3 ⋯ C_{k−1} c_k, F_5^A = c_1 C_2 c_3 ⋯ c_{k−1} C_k, etc. It should be obvious to anyone who is familiar with set theory that there can be a lot of redundancy in F_1^A ∨ F_2^A ∨ ... ∨ F_{R_A}^A, e.g., if k = 3, R_A = 2, F_1^A = c_1 c_2 c_3 and F_2^A = c_1 c_2 C_3, then by using simple set theory calculations, F_1^A ∨ F_2^A can be simplified, i.e.:

F_1^A ∨ F_2^A = (c_1 c_2 c_3) ∨ (c_1 c_2 C_3) = (c_1 c_2)(c_3 ∨ C_3) = c_1 c_2,

because c_3 ∨ C_3 exhausts the possibilities for the third causal condition, i.e. (in these crisp set-theoretic computations) c_3 ∨ C_3 = 1.

While it was easy to perform these set-theoretic computations by hand, it is difficult to do this for larger values of k and R_A. Instead, Ragin uses the Quine-McCluskey (QM) algorithm^1 to do this automatically.
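Readers who want to experiment can reproduce this kind of Boolean minimization with sympy's simplify_logic; this is our own illustration, and we are not claiming it is the same QM implementation used in [23]:

from sympy import symbols
from sympy.logic.boolalg import simplify_logic

C1, C2, C3 = symbols('C1 C2 C3')
# F1 = c1 c2 c3 and F2 = c1 c2 C3 from the text (~ denotes a complement)
expr = (~C1 & ~C2 & ~C3) | (~C1 & ~C2 & C3)
print(simplify_logic(expr, form='dnf'))   # -> ~C1 & ~C2, i.e. c1 c2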
Step 8. Use the Quine-McCluskey (QM) algorithm to obtain the R_C complex solutions [prime implicants^2] and the R_P parsimonious solutions [minimal prime implicants^3]: This is a mapping from S_F^A, S_F − S_F^S and S_F^S − S_F^A into S_F^PI and S_F^MPI by two different applications of the QM algorithm, i.e.
^1 The Quine-McCluskey algorithm is used to minimize Boolean functions; see, e.g., Appendix A, http://en.wikipedia.org/wiki/Quine-McCluskey_algorithm, or [125].
^2 A prime implicant is a combination of primitive Boolean expressions that differ on only one cause and have the same output.
^3 Minimal prime implicants (also called essential prime implicants by Ragin [126]) cover as many of the primitive Boolean expressions as possible with a logically minimal number of prime implicants. In other words (as pointed out by a reviewer), it is a prime implicant that uniquely covers at least one truth table row and therefore must appear in the minimal truth table solution. For an example of how minimal prime implicants are determined from prime implicants, see [127, pp. 95–98]. Other examples can be found in [125].
QM^PI: {S_F^A, S_F − S_F^S, S_F^S − S_F^A} → S_F^PI
(F ∈ S_F^A is treated as present; F ∈ S_F − S_F^S as absent; F ∈ S_F^S − S_F^A as absent)
F_j^A ↦ F_n^PI (j → n), n = 1,...,R_C   (2.18)

QM^MPI: {S_F^A, S_F − S_F^S, S_F^S − S_F^A} → S_F^MPI
(F ∈ S_F^A is treated as present; F ∈ S_F − S_F^S as don't care; F ∈ S_F^S − S_F^A as absent)
F_j^A ↦ F_p^MPI (j → p), p = 1,...,R_P   (2.19)
Observation 2: QM^PI leads to causal combinations F_n^PI that usually contain fewer causal conditions than the original k, and QM^MPI leads to causal combinations F_p^MPI that almost always contain far fewer causal conditions than the original k.

Observation 3: In (2.19), "don't care" means causal combinations that are either impossible (i.e., there are no cases to support them) or for which there are fewer cases than the frequency threshold. In QM^MPI those don't care combinations that lead to maximal simplifications of causal combinations are chosen.

Observation 4: Ragin equates the prime implicants with a complex solution (linguistic summarization, containing R_C terms), the minimal prime implicants with a parsimonious solution (linguistic summarization, containing R_P terms), and interprets these two solutions as the end-points of a countable continuum of solutions, where the intermediate solutions (linguistic summarizations containing R_I terms) have to be established using a methodology called counterfactual analysis (CA) that is Step 9 below. He believes that the most useful linguistic summarization is an intermediate summarization.
Example 2.3. The prime implicants for AbCDe + abCdE + ABCDe + abcde + abCde + abcdE (Example 2.2) are easy to obtain by first recognizing that:

abCdE + abCde = abCd(e + E) = abCd
abcde + abcdE = abcd(e + E) = abcd   (2.20)

Substituting (2.20) into the original six terms, it follows that:

AbCDe + abCdE + ABCDe + abcde + abCde + abcdE
= (abCdE + abCde) + (abcde + abcdE) + AbCDe + ABCDe
= abCd + abcd + (AbCDe + ABCDe)
= abCd + abcd + ACDe(b + B) = abCd + abcd + ACDe   (2.21)

So, the prime implicants are abCd, abcd and ACDe. Note that (2.21) can be further simplified to

abCd + abcd + ACDe = abd(c + C) + ACDe = abd + ACDe   (2.22)

These are exactly the same solutions that Ragin obtained, and that are given on the bottom of page 115 in [23], and that he refers to as the complex solutions.

The minimal prime implicants, found from the QM algorithm, are a + e. These parsimonious solutions (i.e., parsimonious solution terms) also agree with the ones that are given in [23].

In summary, the complex and parsimonious solutions are:

Complex solution: abd + ACDe → O
Parsimonious solution: a + e → O   (2.23)
In words, these solutions are:

Complex solution: (Not developed AND not urban (rural) AND not industrial) OR (Developed AND literate AND industrial AND not stable) are sufficient causal combinations for Breakdown of Democracy.

Parsimonious solution: (Not developed) OR (Not stable) are sufficient conditions for Breakdown of Democracy.   (2.24)
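As an independent check of (2.21)–(2.23), the six actual rules of Example 2.2 can be fed to a Boolean minimizer; the sketch below (sympy again, with the same hedges as before) recovers the complex solution. The parsimonious solution is not recovered this way, because it additionally treats the remainders as don't cares:

from sympy import symbols
from sympy.logic.boolalg import simplify_logic

A, B, C, D, E = symbols('A B C D E')
# The six actual rules of Example 2.2 (~ denotes a complement, i.e. lower case):
actual = ((A & ~B & C & D & ~E) | (~A & ~B & C & ~D & E)
          | (A & B & C & D & ~E) | (~A & ~B & ~C & ~D & ~E)
          | (~A & ~B & C & ~D & ~E) | (~A & ~B & ~C & ~D & E))
print(simplify_logic(actual, form='dnf'))
# -> (A & C & D & ~E) | (~A & ~B & ~D), i.e. ACDe + abd, as in (2.22)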
2.3.9 Intermediate Solutions (Counterfactual Analysis)
Recall that 2^k − R_S causal combinations were eliminated early on (Step 6) in fsQCA because either there were too few cases to support them or their firing levels were below the threshold of 0.5. Counterfactual analysis (CA) begins with both the complex and parsimonious solutions and modifies the complex solutions subject to the constraint that a parsimonious solution term must always be present (in some form) in the final intermediate solutions. The modifications use causal combinations for which there either were no cases or not enough cases, and require that the user bring a lot of substantive knowledge about the cases into the modification process. Each modified complex solution is called a counterfactual, and each counterfactual is usually less complex in its structure than is the complex solution, unless the complex term does not change as a result of CA, in which case it becomes the counterfactual. Once all of the counterfactuals have been obtained for all of the complex terms, they are combined using the set theory operation union. This result is called the (set of) intermediate solutions, and it contains R_I terms.

CA offers a way to overcome the limitations of a lack of empirical instances, i.e. the problem of limited diversity [31], and involves thought experiments in which the substantive knowledge of a domain expert is used. Recall that diversity refers to whether or not a case actually exists for a particular combination of causal conditions. In most applications it is very common for no cases to exist for many combinations of causal conditions, and this is referred to as "limited diversity."
In the thought experiments one asks: Based on my expert knowledge, (1) Do I believe that C_i strongly influences the desired output? If the answer is YES, then stop, and C_i is put on the list of substantive knowledge. On the other hand, if the answer is NO or DON'T KNOW, then one asks: (2) Is it, instead, c_i that strongly influences the desired output? If the answer is YES, then c_i is put on the list of substantive knowledge. If the answer is NO or DON'T KNOW, then neither C_i nor c_i is put on the list of substantive knowledge, i.e. the substantive knowledge is silent about the causal condition or its complement.

The substantive knowledge cannot contradict the individual terms of the parsimonious solution, e.g., if C_i is in one term of the parsimonious solution, then c_i cannot be used as substantive knowledge for it; however, c_i may be substantive knowledge for another term of the parsimonious solution as long as that term does not contain C_i.
Step 9. Use additional substantive knowledge, obtained from an expert, to perform CA on each term of the complex solution (one at a time), but constrained by each term of the parsimonious solution (one at a time), to obtain intermediate solutions: To begin, this substantive knowledge is used (in thought experiments) to establish the presence or the absence of each causal condition or its complement on the desired outcome. This is a transformation of S_C into S_{K(C)}, where K(C) denotes knowledge applied to C. Then S_{K(C)} is used to map (S_F^PI, S_F^MPI), by means of CA, into S_F^I, the space of intermediate solutions that contains R_I elements, i.e. (i = 1,...,k)

K: S_C → S_{K(C)}
C_i ↦ either C_i, or c_i, or unknown   (2.25)

CA: (S_{K(C)}, S_F^PI, S_F^MPI) → S_F^I
K(C), {F_n^PI}_{n=1}^{R_C}, {F_p^MPI}_{p=1}^{R_P} ↦ {F_q^I}_{q=1}^{R_I}   (2.26)
CA is an algorithm that is implemented by the following pseudo-code:

1 Cycle on all parsimonious terms
2   Cycle on all complex terms
3     If complex term is subset of parsimonious term
4       Apply CA rules
Observation 5: The word "unknown" in (2.25) means no substantive knowledge is known about the i-th causal condition; hence, that causal condition is excluded from CA.

Observation 6: There are R_P parsimonious term cycles and R_C complex term cycles. The CA rules that are used in the pseudo-code are given in Appendix C. An example that illustrates the use of the CA rules is also given in that Appendix.

Observation 7: The number of causal conditions in F_q^I is between the number that are in F_n^PI and F_p^MPI, whereas the number of causal conditions in the original F_j in (2.9) is k.
Example 2.4: Here we perform CA on the results of Example 2.3. Recall [see (2.23)] that the complex terms are abd and ACDe and the parsimonious terms are a and e. As in [23] we shall use the following substantive knowledge: K(C) = {a, b, c, d, unknown}. There are four cycles to CA. Rules 1–7 that are given in Appendix C are used.

Cycle 1 (parsimonious term a, complex term abd): a is contained in abd and so we can perform CA for abd (Rule 1). abd is its own initial counterfactual (Rule 2). This counterfactual flows to the final simplified counterfactual as follows: CA(A): abd → abd (Rule 4); CA(B): abd → abd (Rule 4); CA(C): abd → abd (Rule 5); CA(D): abd → abd (Rule 4); and CA(E): abd → abd (Rule 3). abd is its own counterfactual for Cycle 1.

Cycle 2 (parsimonious term a, complex term ACDe): a is not in ACDe; hence, CA is not performed for this term (Rule 1).

Cycle 3 (parsimonious term e, complex term abd): e is not in abd; hence, CA is not performed for this term (Rule 1).

Cycle 4 (parsimonious term e, complex term ACDe): e is contained in ACDe and so we can perform CA for ACDe (Rule 1). ACDe is its own initial counterfactual (Rule 2). This counterfactual flows to the final simplified counterfactual as follows: CA(A): ACDe → CDe (Rule 6); CA(B): CDe → CDe (Rule 5); CA(C): CDe → De (Rule 6); CA(D): De → e (Rule 6); and CA(E): e → e (Rule 3). e is the simplified counterfactual of ACDe for Cycle 4.

The union of the two counterfactuals, abd + e, gives the intermediate solutions for the desired outcome of Breakdown of Democracy.
2.3.10 Simplified Intermediate Solutions
Because CA leads to a new set of solutions, it is possible that their union can be simplified.
This is accomplished by subjecting the intermediate solutions to the QM algorithm in which the
remainders are set to absent.
Step 10. Perform QM on the R_I intermediate solutions to obtain the R_SI simplified intermediate solutions^1 (this step is similar to Step 8): This is a mapping from S_F^I and S_F − S_F^I into S_F^SI by yet another application of the QM algorithm, i.e.

QM^SI: {S_F^I, S_F − S_F^I} → S_F^SI
(F ∈ S_F^I is treated as present; F ∈ S_F − S_F^I as absent)
F_j^I ↦ F_r^SI (j → r), r = 1,...,R_SI   (2.27)

The firing level fuzzy sets for these simplified intermediate rules, that are denoted F_r^SI, have associated re-numbered membership functions μ_{F_r^SI}(x), r = 1,...,R_SI.
Observation 8: QM^SI may lead to causal combinations F_r^SI that contain fewer causal conditions than are in the intermediate solutions F_q^I.
Example 2.5: The intermediate solutions in Example 2.4 are abd + e. No further simplifications are possible for this union; hence, the simplified intermediate solutions are also abd + e. These solutions also agree with the ones that are given in [23]. Translating these simplified intermediate solutions into words, we have the following:

(Not developed and not urban (rural) and not industrial) OR (unstable) are sufficient causal combinations for Breakdown of Democracy.
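A quick check with the same sympy-based minimizer used earlier (same hedges as before) confirms that abd + e admits no further simplification:

from sympy import symbols
from sympy.logic.boolalg import simplify_logic

A, B, D, E = symbols('A B D E')
print(simplify_logic((~A & ~B & ~D) | ~E, form='dnf'))
# -> ~E | (~A & ~B & ~D), i.e. abd + e is already minimal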
2.3.11 Believable Simplified Intermediate Solutions
It is important to re-compute the consistencies of the R_SI simplified intermediate summarizations, because CA and QM make no use of the fuzzy nature of the causal conditions and outcome, and so this connection back to fuzziness has to be re-established. It can happen, for example, that one or more of the simplified intermediate solutions have a consistency that is well below 0.80, in which case they are not to be believed. The end of all of this is a collection of R_BSI believable simplified intermediate rules (solutions) for a desired outcome.

^1 Ragin does not use different names for the intermediate and simplified intermediate solutions, and it was only in e-mail to the first author that he mentioned his using QM to obtain the simplified intermediate solutions.
Step 11. Retain only those simplified intermediate solutions whose consistencies are approximately greater than 0.80 (see the earlier footnote about the 0.80 consistency cut-off), the believable simplified intermediate solutions: This is a mapping from {S_F^SI, O, S_Cases} into S_F^BSI, where S_F^BSI is a subset of S_F^SI with R_BSI elements, i.e. (r = 1,...,R_SI and s = 1,...,R_BSI):

ss_K(F_r^SI, O): {S_F^SI, O, S_Cases} → [0,1]
{μ_{F_r^SI}(x), μ_O^D(x)}_{x=1}^N ↦ ss_K(F_r^SI, O) = Σ_{x=1}^N min(μ_{F_r^SI}(x), μ_O^D(x)) / Σ_{x=1}^N μ_{F_r^SI}(x)   (2.28)

F_s^BSI: [0,1] → S_F^BSI
ss_K(F_r^SI, O) ↦ F_s^BSI = F_r^SI (r → s) ∀ ss_K(F_r^SI, O) ≥ 0.80, r = 1,...,R_SI   (2.29)
Example 2.6: The set theoretic consistencies of abd and e in Example 2.5 were computed to be [using (2.28)] 0.886 and 0.902, respectively, both of which are greater than 0.80, so both solutions are retained, and therefore abd and e are the believable simplified intermediate solutions (R_BSI = 2). It is rather impressive that fsQCA has reduced 32 candidate causal combinations, each comprised of five causal conditions, to two causal combinations, one with three causal conditions and the other with just one causal condition.
2.3.12 Best Instances
Prior to QM and CA it is easy to connect each case to a causal combination because each surviving causal combination that has a MF value greater than 0.5 can be directly connected to a case, and each actual causal combination whose consistency is greater than 0.8 can also be directly connected to a case (e.g., see Table 2.3). Unfortunately, the causal combinations used in QM are not revealed to the end-user because they are internal to the QM processing. Because CA begins with the results from QM, CA also does not provide a direct connection to the causal combinations. Consequently, after QM and CA it is no longer possible to directly connect cases to each of the R_BSI believable simplified intermediate solutions. Ragin establishes the best instances for the believable simplified intermediate solutions through geometrical arguments that are provided in Appendix D. The gist of those arguments is that best instances occur when max_{s=1,...,R_BSI} μ_{F_s^BSI}(x) ≥ 0.50 and μ_O^D(x) ≥ max_{s=1,...,R_BSI} μ_{F_s^BSI}(x) − δ, where δ is a small positive number that accounts for the subjectivity of MF values. The "maximum" operation in these formulas is due to the terms of the believable simplified intermediate solutions being connected by the word OR. In this chapter we chose δ = 0.05, but further research is needed in order to study the effects of δ on best instances.
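In code, the best-instance test just described can be sketched as follows (illustrative names: mu_terms is an R_BSI × N array of term firing levels and mu_O holds the μ_O^D(x) values; compare Table 2.5):

import numpy as np

def best_instances(mu_terms, mu_O, delta=0.05):
    # Case x is a best instance of the term with the largest firing level
    # mu* when mu* >= 0.50 and mu_O(x) >= mu* - delta (see Table 2.5).
    out = []
    for x in range(mu_terms.shape[1]):
        s = int(np.argmax(mu_terms[:, x]))   # the OR (maximum) over the terms
        mu_star = mu_terms[s, x]
        out.append(s if mu_star >= 0.50 and mu_O[x] >= mu_star - delta else None)
    return out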
Step 12. Connect each solution (believable simplified intermediate solutions and, if desired, the complex and parsimonious solutions) with its best instances, which are subsets of S_Cases: We illustrate this only for the intermediate solutions, since they are the most useful solutions. Finding the best instances for each of the believable simplified intermediate solutions is a mapping from {S_F^BSI, O, S_Cases} into S_BeIn(s) (s = 1,...,R_BSI):

BeIn(s): {S_F^BSI, O, S_Cases} → S_BeIn(s)
{μ_{F_s^BSI}(x), μ_O^D(x)}_{x=1}^N ↦ S_BeIn(s)   (2.30)
BeIn is an algorithm that is implemented by the following pseudo-code:

1 Cycle on all cases
2   Apply Ragin's four-step procedure for determining the Best Instances

The details for implementing Step 2 are in Appendix D.
Example 2.7: This is a continuation of Example 2.6 in which the believable simplified
intermediate solutions were found to be abd and e. According to Wagemann and Schneider [32]
it is a good practice to summarize the believable simplified intermediate solutions in a table such
as the one in Table 2.5. Observe that the Best Instances for abd are Cases 4, 8, and 11 (Estonia,
Greece and Italy), and the Best Instances for e are Cases 1, 7, 14 and 16 (Austria, Germany,
Portugal and Spain).
Table 2.5
Calculations and summary of Best Instances (BI) for the believable simplified intermediate solutions. The BI procedure is given in Appendix D.

Case | μ_abd(x) | μ_e(x) | max(μ_abd(x), μ_e(x)) ≡ μ* | μ* ≥ 0.50? | μ_O^D(x) | μ_O^D(x) ≥ μ* − 0.05? | This case is a BI for?
1 0.09 0.57 0.57 YES 0.95 YES e
2 0 0.02 0.02 NO 0.05 –––––– ––––––
3 0.02 0.09 0.09 NO 0.11 –––––– ––––––
4 0.84 0.09 0.84 YES 0.88 YES abd
5 0.42 0.42 0.42 NO 0.23 –––––– ––––––
6 0.02 0.05 0.05 NO 0.05 –––––– ––––––
7 0.04 0.69 0.69 YES 0.95 YES e
8 0.64 0.57 0.64 YES 0.94 YES abd
9 0.84 0.87 0.87 YES 0.58 NO ––––––
10 0.28 0.05 0.28 NO 0.08 –––––– ––––––
11 0.53 0.42 0.53 YES 0.95 YES abd
12 0 0.01 0.01 NO 0.05 –––––– ––––––
13 0.83 1 1 YES 0.88 NO ––––––
14 0.89 0.99 0.99 YES 0.95 YES e
15 0.97 0.16 0.97 YES 0.79 NO ––––––
16 0.70 0.80 0.80 YES 0.94 YES e
17 0.05 0.09 0.09 NO 0.05 –––––– ––––––
18 0 0.02 0.02 NO 0.05 –––––– ––––––
2.3.13 Coverage
Coverage is an assessment of the way respective terms in the believable simplified intermediate solution^1 "cover" observed cases [23]. Ragin [26] mentions three kinds of coverage and Rihoux and Ragin [23, p. 64] define them as: (1) solution coverage, C_s, which is the proportion of cases that are simultaneously covered by all of the terms (combined by the union); (2) raw coverage, C_r, which is the proportion of cases that are covered by each term one at a time; and, (3) unique coverage, C_u, which is the proportion of cases that are uniquely covered by a specific term (no other terms cover those cases). Each measure of coverage provides a different insight into the believable simplified intermediate solutions.

Presently, there is no threshold for coverage, as there is for consistency, because, in general, coverage is only used descriptively, although sometimes Ragin uses it to exclude a solution. Consequently, there are no guidelines given regarding what is "good coverage" because coverage depends on the nature of the evidence. In this chapter we focus only on raw coverage, because it is in close agreement with all of the steps of fsQCA that focus on each sufficient causal combination separately (however, a formula for solution coverage is provided in (2.33)).

^1 Coverage can also be computed for the complex and parsimonious solutions.
From fuzzy set theory [33] the number of cases covered by a single fuzzy set is a simple summation of membership scores in that fuzzy set; and, the number of cases simultaneously covered by two fuzzy sets is the size of the overlap of the two fuzzy sets. Consequently, if one wants to compute the coverage of T1 FS D in T1 FS E, C(D, E), then one computes (this is also called the sigma count of D in E)

C(D, E) = Σ_{i=1}^N min(μ_D(x_i), μ_E(x_i)) / Σ_{i=1}^N μ_E(x_i)   (2.31)

Although the numerator of this coverage is identical to the numerator in the formula for consistency in (2.15), the denominator of (2.31) is different. (2.31) is the raw coverage formula that is used by Ragin.
Step 13. Compute the raw coverage of each solution: This is a mapping from S_F^BSI, O and S_Cases into [0,1] (s = 1,...,R_BSI), i.e.

C_r(F_s^BSI, O): (S_F^BSI, O, S_Cases) → [0,1]
{μ_{F_s^BSI}(x), μ_O^D(x)}_{x=1}^N ↦ C_r(F_s^BSI, O) = Σ_{x=1}^N min(μ_{F_s^BSI}(x), μ_O^D(x)) / Σ_{x=1}^N μ_O^D(x)   (2.32)
Observation 9: The believable simplified intermediate solutions often contain several terms, F_l^BSI (l = 1,...,R_BSI), connected by the logical OR (modeled by the maximum). We shall refer to the union of the terms of the believable simplified intermediate solutions as the composite solution, and denote it as F^BSI. The firing level of F^BSI is the maximum of the firing levels of each of its terms. Consequently, solution coverage, C_s, which is the proportion of cases that are covered by all of the terms in F^BSI, is obtained from (2.31), as:

C_s(F^BSI, O) = Σ_{x=1}^N min(max_l μ_{F_l^BSI}(x), μ_O(x)) / Σ_{x=1}^N μ_O(x)   (2.33)
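A short sketch of (2.31)–(2.33), under the same illustrative naming conventions as the earlier snippets:

import numpy as np

def raw_coverage(mu_F, mu_O):
    # C_r of (2.31)/(2.32): the sigma count of F in O over the N cases.
    return np.minimum(mu_F, mu_O).sum() / mu_O.sum()

def solution_coverage(mu_terms, mu_O):
    # C_s of (2.33): the composite solution fires at the maximum (OR)
    # of its terms' firing levels; mu_terms is an R_BSI x N array.
    return raw_coverage(np.max(mu_terms, axis=0), mu_O)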
Example 2.8. This is also a continuation of Example 2.6 in which the believable simplified intermediate solutions were found to be abd and e. Using μ_O(x), μ_abd(x) and μ_e(x) that are given in Table 2.5, it is straightforward to compute C_r(abd, O) = 0.678 and C_r(e, O) = 0.657.
2.3.14 Summary
The results of fsQCA for each desired outcome O are given by^1

{F_s^BSI, S_BeIn(s), C_r(F_s^BSI, O), ss_K(F_s^BSI, O)}_{s=1}^{R_BSI}   (2.34)

ss_K(F_s^BSI, O) provides the degree of subsethood for F_s^BSI, a number that is between 0.80 and 1. Usually, ss_K(F_s^BSI, O) < 1, and the results should be organized in decreasing order of ss_K(F_s^BSI, O).
As already mentioned, Steps 6–11 of fsQCA are summarized in the relatively easy-to-remember Fig. 2.3. The emphasis in this diagram is on the reduction of causal combinations from 2^k to R_BSI. Also shown are the "fuzzy" and "crisp" computations. In addition, we provide a flowchart for fsQCA in Figs. 2.4 and 2.5. Because fsQCA involves many operations, each with its own dummy variable, each space and its dummy variable have been collected into Table 2.6.

^1 Other coverage values may also be included in (2.34).
Fig. 2.4. Flowchart for fsQCA; it is continued in Fig. 2.5.
Fig. 2.5. Flowchart for fsQCA, continued.
[Figs. 2.4 and 2.5 depict the fsQCA flowchart. Fig. 2.4 (Steps 1–6): from the N data cases, the desired outcome, and the k postulated causal conditions, MFs are obtained and the derived MFs are computed; the 2^k candidate rules S_F(2^k) are created and their firing levels are computed; the frequency test N_{F_i} > f either makes a causal combination a Remainder (S_F − S_F^S) or keeps it among the R_S firing-level surviving rules S_F^S(R_S), with MFs μ_{F_i^S}(x), i = 1,...,R_S. Fig. 2.5 (Steps 7–13): the derived MF for the desired outcome is used to compute the R_S subsethoods, and those ≥ 0.80 become the R_A actual rules S_F^A(R_A); the QM algorithm produces the prime implicants (complex summarizations, S_F^PI(R_C)) and the minimal prime implicants (parsimonious summarizations, S_F^MPI(R_P)); counterfactual analysis, using substantive knowledge, produces the intermediate summarizations S_F^I(R_I); a QM algorithm produces the simplified intermediate summarizations S_F^SI(R_SI); and computing consistency, best instances and coverages produces the believable simplified intermediate summarizations S_F^BSI(R_BSI).]
Table 2.6
Dummy variables and their ranges for the fsQCA spaces

Space | Dummy variable | Range
S_O | w | 1,...,n_O
S_{C'} | i' | 1,...,n_{C'}
S_C | i | 1,...,k
S_Cases | x | 1,...,N
S_F | j | 1,...,2^k
S_F^S | l | 1,...,R_S
S_F^A | m | 1,...,R_A
S_F^PI | n | 1,...,R_C
S_F^MPI | p | 1,...,R_P
S_F^I | q | 1,...,R_I
S_F^SI | r | 1,...,R_SI
S_F^BSI | s | 1,...,R_BSI
S_BeIn(s) | s | 1,...,R_BSI
2.4. Comparisons of Linguistic Summarization Methods That Use Fuzzy Sets
In this section comparisons are provided between fsQCA and two other methods that use fuzzy
sets for linguistic summarization, namely Yager, Kacprzyk, Zadrożny, et al.’s summarizers (see
Paragraph 2 in Section 2.1 for the references) and Wu and Mendel’s [19], [20] if-then rules. This
is needed so that it is clear how fsQCA differs from those methods. Our comparisons are given in
Table 2.7, which has been organized according to the chronology in which we have presented
fsQCA. The comparisons should be self-explanatory, and so no additional comments about them
are provided.
Note that fsQCA is not meant to be a replacement for these existing linguistic summarization
methods; it is meant to provide a different kind of if-then linguistic summarization, one that
establishes the rules from data that are available about cases.
One may also wonder about a comparison between fsQCA and the if-then rules that are obtained from the Wang-Mendel (WM) method [34]. Because the WM-rules are predictive whereas the fsQCA rules are descriptive, the two are non-competitive and are therefore not compared in this chapter.
Table 2.7
Comparisons of Three Linguistic Summarization (LS) Methods that use fuzzy sets
(For each feature, the entries are given in turn for: fsQCA; Yager, Kacprzyk, Zadrozny, et al.'s summarizers; and Wu & Mendel's if-then rules.)

Focusing on a specific desired outcome?
  fsQCA: Yes (fsQCA must be repeated for each such outcome and usually its complement, one at a time).
  Yager et al.: Yes and No [one or more desired outcomes (summarizers) are chosen, but each can have more than one linguistic term associated with it; summaries are pre-established for all of these linguistic terms].
  Wu & Mendel: Yes and No (a desired outcome is chosen, and is the consequent of the if-then rules; it has m_O linguistic terms associated with it, and rules are pre-established for all of these linguistic terms).

Pre-establishing a library (codebook) of summaries?
  fsQCA: No. Yager et al.: Yes. Wu & Mendel: Yes.

A specific structure for the summaries?
  fsQCA: Yes (summaries are multiple-antecedent if-then rules, all for the same consequent term; rules connected by OR).
  Yager et al.: Yes (there are two canonical forms for the summaries^a; multiple summary connections are not specified explicitly^b).
  Wu & Mendel: Yes (summaries are multiple-antecedent if-then rules, for all of the consequent terms; multiple summary connections are not specified explicitly^b).

Pre-chosen summarizers?
  fsQCA: Yes (called causal conditions). Yager et al.: Yes (called summarizers). Wu & Mendel: Yes (called antecedents).

A summarizer that is described by more than one linguistic term?
  fsQCA: Yes (each linguistic term is considered to be a separate causal condition in the same causal combination of the causal conditions).
  Yager et al.: Yes (usually each of the k summarizers is described by m_i terms; but, each of these terms is not thought of as a separate summarizer in the same summary, i.e. there is only one linguistic term per summarizer for each summary).
  Wu & Mendel: Yes (usually each of the k antecedents is described by m_i linguistic terms; but, each of these linguistic terms is not thought of as a separate antecedent in the same rule, i.e. there is only one linguistic term per antecedent for each rule).

The complement of a summarizer?
  fsQCA: Always (both the causal condition and its complement are used). Yager et al.: Usually not. Wu & Mendel: Never.

The concept of combinations of summarizers?
  fsQCA: Yes (each is called a causal combination in which the causal conditions are connected by AND; also called conjunctural causation).
  Yager et al.: Yes (multiple summarizers are connected by AND, but they do not constitute a causal combination in the sense of fsQCA).
  Wu & Mendel: Yes (multiple antecedents are connected by AND, and they constitute a causal combination in the sense of fsQCA).

A collection of 2^k summaries that are constructed from k summarizers or their complements?
  fsQCA: Yes (to begin, there are 2^k candidate causal combinations—rules).
  Yager et al.: No (if there are k summarizers and each is described by the same number of m linguistic terms, then there can be m^k summarizations in the library of pre-established summaries).
  Wu & Mendel: No (if there are k antecedents and each is described by the same number of m linguistic terms, then there can be m^k rules in the library of pre-established summary rules).

^a The two forms are: (1) Q (a linguistic quantifier) objects from a given database are/have summarizers 1 through N at truth level T (e.g., Many automobile models have heavy weight and low MPG [T = 0.60]); and, (2) Q (a linguistic quantifier) objects from a given database with a pre-specified linguistically-qualified summarizer are/have additional summarizers 1 through N−1 at truth level T (e.g., Many automobile models with heavy weights have low MPG [T = 0.58]).
^b "Multiple summary connections are not specified explicitly" means that when more than one summarizer (or if-then rule) is used, it is not specified if the summarizers (or if-then rules) are connected by the words OR, AND, or ELSE.
Table 2.7 (Continued)

Interpreting a candidate summarization as a corner in a vector space?
  fsQCA: Yes (with 2^k dimensions). Yager et al.: No. Wu & Mendel: No.

Removing a subset of the candidate summarizations based on computing firing levels and a frequency threshold test?
  fsQCA: Yes. Yager et al.: No. Wu & Mendel: No.

Computing subsethood?
  fsQCA: Yes (called consistency).
  Yager et al.: Yes (called truth level; the formulas for truth level and consistency are the same).
  Wu & Mendel: Yes (called truth level; the formulas for truth level and consistency are the same).

Discarding additional candidate summarizations based on a subsethood threshold?
  fsQCA: Yes (usually the threshold is ≥ 0.80).
  Yager et al.: No (in addition to truth level, other summarization measures are computed, with the objective usually being to choose a best summarization).
  Wu & Mendel: No (in addition to truth level, other summarization measures are computed, with the objective usually being to choose a best summarization).

Further processing?
  fsQCA: Yes (QM algorithm used to compute prime and minimal prime implicants, after which Counterfactual Analysis is performed, after which a QM algorithm is again used). Yager et al.: No. Wu & Mendel: No.

Accounting for limited diversity?
  fsQCA: Yes (done during Counterfactual Analysis, by means of thought experiments and using the substantive knowledge of an expert). Yager et al.: No. Wu & Mendel: No.

Multiple summaries for the same outcome?
  fsQCA: Yes (this occurs automatically, and is called equifinal causation—equifinality).
  Yager et al.: Maybe (it depends on the user, and occurs only if the user decides he/she wants more than one summary; the user must choose how many summaries and the structure of the summaries).
  Wu & Mendel: Maybe (it depends on the user, and occurs only if the user decides he/she wants more than one summary; the user must choose how many summaries and the structure of the summaries).

A direct connection to best instances?
  fsQCA: Yes. Yager et al.: No. Wu & Mendel: No.

Collections of summaries?
  fsQCA: Yes (ranging from most complex, to intermediate, to parsimonious; intermediate summaries are considered to be the most useful ones).
  Yager et al.: No (no reason why they could not be obtained, but this would require creating a library of summarizations with numbers of summarizers ranging from 1 to k).
  Wu & Mendel: No (no reason why they could not be obtained, but this would require creating a library of rules with numbers of antecedents ranging from 1 to k).
2.5. Conclusions
As is stated in the Introduction, fsQCA is explained, for the first time, in a very quantitative way, something that is not found in the existing literature, and something that is needed if engineers and computer scientists are to use fsQCA.

Stepping back from the details, fsQCA for sufficient conditions—each of which is a linguistic summarization—involves the following steps: (1) Choose a desired outcome; (2) Choose k causal conditions (if a condition is described by more than one term, treat each term as an independent causal condition); (3) Treat the desired outcome and causal conditions as fuzzy sets, and determine MFs for all of them; (4) Evaluate these MFs for all available cases, the results being derived MFs; (5) Create 2^k candidate causal combinations (rules) and view each as a possible corner in a 2^k-dimensional vector space; (6) Compute the MF of each of the 2^k candidate causal combinations in all of the available cases, and keep only the ones—the R_S surviving causal combinations (firing-level surviving rules)—whose MF values are > 0.5, i.e., keep the causal combinations that are closer to corners and not the ones that are farther away from corners; (7) Compute the consistencies (subsethoods) of these R_S surviving causal combinations, and keep only those R_A causal combinations—the actual causal combinations (actual rules)—whose consistencies are > 0.80; (8) Use the QM algorithm to obtain prime implicants (the complex solutions) and minimal prime implicants (the parsimonious solutions); (9) Use substantive knowledge from an expert to perform Counterfactual Analysis on the complex solutions, constrained by the parsimonious solutions, to obtain the intermediate solutions; (10) Perform QM on the intermediate solutions to obtain the simplified intermediate solutions; (11) Retain only those simplified intermediate solutions whose consistencies are approximately greater than 0.80, the believable simplified intermediate solutions; (12) Connect the believable simplified intermediate solutions (this can also be done for the complex and parsimonious solutions) with their best instances; and, (13) Compute the coverage of each solution.
Chapter 3
Theoretical Aspects of fsQCA^1
3.1. Introduction
We have spent more than five years (beginning in September 2009) studying fsQCA, and with the very generous help of Prof. Ragin, have been able to summarize it mathematically as a collection of 13 steps in Chapter 2. This description of fsQCA cannot be found in Ragin's works, and is essential if fsQCA is to be used by engineers and computer scientists.

Having been able to summarize fsQCA mathematically, it is now possible to study some of its key steps in order to better understand them and to even enhance them. This is all done in the rest of this chapter, which is from our paper "Theoretical aspects of fuzzy set qualitative comparative analysis (fsQCA)" [35]. More specifically, Section 3.2 provides a more mathematical, but brief, description of key Steps 1–7; Section 3.3 provides a new way to find the R_S surviving causal combinations in Step 6—a way that greatly reduces the number of causal combination MF computations; Section 3.4 explains how Steps 5 and 6 can be modified so that fsQCA is greatly speeded up, the result being Fast fsQCA; Section 3.5 shows there are fast ways to perform fsQCA once it has already been performed for k causal conditions, and one wants to then either remove or add some causal conditions; Section 3.6 provides a new recursive (in the cases) way to compute consistency in Step 7; Section 3.7 uses the results from Section 3.6 to study when it is possible for a causal combination to be obliterated by cases that are not associated (or are associated very weakly) with a desired outcome, indicating that fsQCA is quite case-dependent; Section 3.8 examines the question "Do all of the 2^k candidate causal combinations postulated in Steps 5 or 5NEW exist?", and proves that if a variable is described by more than one term (e.g., Low and High), the answer is NO.

^1 This chapter is a duplication of our accepted 2013 INS paper [35].
3.2. A Brief Quantitative Summary of Steps 1-7 of fsQCA
In this section we provide a more quantitative description of the first seven steps of fsQCA
because this is needed in order to understand the rest of this chapter. The seven steps are:
Step 1. Choose a desired outcome and its appropriate cases: Let S_O be the finite space of possible outcomes, O_w, for a specific application, i.e. S_O = {O_w, w = 1,...,n_O}. The desired outcome is O, where O ∈ S_O. fsQCA focuses on one outcome at a time, and each fsQCA is independent of the others.

Let S_Cases be the finite space of all appropriate cases (x) that have been labeled 1, 2, …, N, i.e. S_Cases = {1, 2,..., N}. Chapter 2 provides some discussions about how to choose such cases.
Step 2. Choose k causal conditions (if a variable is described by more than one term, treat each term as an independent causal condition): Let S_{C'} be the finite space of all possible causal conditions, C'_{i'}, for the specific application, i.e. S_{C'} = {C'_{i'}, i' = 1,...,n_{C'}}. A subset of the C'_{i'} possible causal conditions, S_C, is chosen whose elements are re-numbered 1, 2,...,k, i.e.

S_C = {C_i, i = 1,...,k}, where each C_i ∈ S_{C'}   (3.1)

For a specific application, one may choose different S_C and then perform independent fsQCAs for each of them.
Step 3. Treat the desired outcome and causal conditions as fuzzy sets and determine MFs for them ([29] provides ways to obtain the MFs; however, the exact way the MFs are obtained is explained in Chapter 5), i.e.

μ_O: Ω ⊆ ℝ → [0,1], ω ↦ μ_O(ω)   (3.2)

μ_{C_i}: X_i ⊆ ℝ → [0,1], x_i ↦ μ_{C_i}(x_i), i = 1,...,k   (3.3)
Step 4. Evaluate the MFs for all N appropriate cases, the results being the derived MFs, i.e. (x = 1,...,N and i = 1,...,k)

μ_O^D: (S_Cases, S_O) → [0,1], x ↦ ω(x) ↦ μ_O(ω(x)) ≡ μ_O^D(x)   (3.4)

μ_{C_i}^D: (S_Cases, S_C) → [0,1], x ↦ x_i(x) ↦ μ_{C_i}(x_i(x)) ≡ μ_{C_i}^D(x)   (3.5)

It is the derived MFs that are used in the remaining steps of fsQCA.
Step 5. Create 2^k candidate causal combinations (rules) and view each as a corner in a 2^k-dimensional vector space: Let S_F be the finite space of 2^k candidate causal combinations, called (by us) firing level fuzzy sets, F_j, that are given in (3.6), where (using Ragin's notation) c_i denotes the complement of C_i (j = 1,...,2^k and i = 1,...,k):

S_F = {F_1,...,F_{2^k}}, where F_j = A_1^j ∧ A_2^j ∧ ... ∧ A_k^j and A_i^j = C_i or c_i   (3.6)

where ∧ denotes conjunction and is modeled using minimum.
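A small illustration of (3.6): each corner F_j can be encoded as a k-tuple of booleans (True for C_i, False for c_i), and its firing level for one case is the minimum over the chosen MFs. The MF values below are made up purely for illustration:

import numpy as np
from itertools import product

k = 3
corners = list(product([True, False], repeat=k))  # the 2**k candidate F_j of (3.6)
mu_case = np.array([0.9, 0.2, 0.6])               # hypothetical derived MFs for one case

F = (True, False, True)                           # F = C1 ^ c2 ^ C3
print(np.where(F, mu_case, 1 - mu_case).min())    # min(0.9, 0.8, 0.6) = 0.6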
Step 6. Compute the MF of each of the 2^k candidate causal combinations in all the appropriate cases, and keep only the ones—the R_S surviving causal combinations (firing level surviving rules)—whose MF values are > 0.5 for an adequate number of cases (this must be specified by the user), i.e., keep the adequately represented causal combinations that are closer to corners and not the ones that are farther away from corners. This is a mapping from {S_F, S_Cases} into S_F^S that makes use of μ_{A_i^j}(x), where S_F^S is a subset of S_F, with R_S elements, i.e. (j = 1,...,2^k, x = 1,...,N and l = 1,...,R_S)

μ_{F_j}: (S_F, S_Cases) → [0,1], x ↦ μ_{F_j}(x) = min{μ_{A_1^j}(x), μ_{A_2^j}(x),..., μ_{A_k^j}(x)}   (3.7)

μ_{A_i^j}(x) = μ_{C_i}^D(x) or μ_{c_i}^D(x) = 1 − μ_{C_i}^D(x), i = 1,...,k   (3.8)

t_{F_j}: ([0,1], S_Cases) → {0,1}, x ↦ t_{F_j}(x) = 1 if μ_{F_j}(x) > 0.50, and 0 if μ_{F_j}(x) ≤ 0.50   (3.9)

N_{F_j}: {0,1} → I, t_{F_j} ↦ N_{F_j} = Σ_{x=1}^N t_{F_j}(x)   (3.10)

F_l^S: (S_F, I) → S_F^S, F_j ↦ F_l^S = F_j (j → l) ∀ {N_{F_j} ≥ f, j = 1,...,2^k}   (3.11)

In (3.11) f is an integer frequency threshold that must be set by the user (e.g., [22], [21, p. 197]). The firing levels for these R_S surviving rules are denoted F_l^S with associated re-numbered membership functions μ_{F_l^S}(x), l = 1,...,R_S.
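The brute-force Step 6 of (3.7)–(3.11) can be sketched as below, where mu is a hypothetical N × k array of derived MFs μ_{C_i}^D(x); note the 2^k · N firing-level evaluations that Section 3.3 eliminates:

import numpy as np
from itertools import product

def surviving_rules(mu, f=1):
    # mu: (N x k) derived MFs; returns the corner patterns with N_{F_j} >= f.
    N, k = mu.shape
    survivors = []
    for pattern in product([True, False], repeat=k):   # each corner F_j
        sel = np.where(pattern, mu, 1.0 - mu)          # mu_{A_i^j}(x), per (3.8)
        firing = sel.min(axis=1)                       # (3.7): min over the k antecedents
        if (firing > 0.5).sum() >= f:                  # (3.9)-(3.11): frequency test
            survivors.append(pattern)
    return survivors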
Step 7. Compute the consistencies (subsethoods) of the R_S surviving causal combinations, and keep only those R_A causal combinations—the actual causal combinations (actual rules)—whose consistencies are greater than 0.80. This is a mapping from {S_F^S, O, S_Cases} into S_F^A, where S_F^A is a subset of S_F^S, with R_A elements, i.e. (l = 1,...,R_S and m = 1,...,R_A):

ss_K(F_l^S, O): {S_F^S, O, S_Cases} → [0,1]
{μ_{F_l^S}(x), μ_O^D(x)}_{x=1}^N ↦ ss_K(F_l^S, O) = Σ_{x=1}^N min(μ_{F_l^S}(x), μ_O^D(x)) / Σ_{x=1}^N μ_{F_l^S}(x)   (3.12)

F_m^A: [0,1] → S_F^A
ss_K(F_l^S, O) ↦ F_m^A = F_l^S (l → m) ∀ ss_K(F_l^S, O) ≥ 0.80, l = 1,...,R_S   (3.13)

The firing levels for these actual rules, that are denoted F_m^A, have associated re-numbered membership functions μ_{F_m^A}(x), m = 1,...,R_A, and can be expressed [from (3.13)] (the superscript A in each A_i^{A,m} denotes "actual"), as:

F_m^A = A_1^{A,m} ∧ A_2^{A,m} ∧ ... ∧ A_k^{A,m}   (3.14)
With this as a brief summary of Steps 1–7 of fsQCA, it is now possible to study some of these key steps mathematically in order to better understand them and to even enhance them. We do this below in Sections 3.3–3.8.
3.3. From 2^k N to N Firing-Level Computations
Fiss [22, p. 402] states: "… the first step^1 [of fsQCA] is… to construct a data matrix known as a truth table with 2^k rows … . Each row of this table is associated with a specific combination of attributes [a causal combination], and the full table thus lists all possible combinations." Enumerating all of the 2^k causal combinations (used in Step 5) is very tedious, and displaying such a table for even a modest number of cases is difficult-to-impossible.

Additionally, Step 6 may be a computational bottleneck because it requires 2^k N computations of μ_{F_j}(x) in (3.7). N is the number of cases, and in some applications it may be very large, e.g., time series with N = 10^6 time points. The 2^k multiplier makes using such large N in fsQCA either impractical or impossible. Even for modest k (e.g., k = 8) and N (e.g., N = 200), 2^k N is large (e.g., 51,200).

^1 This is our Step 5.
Ragin [21, p. 131] observed the following in an example with four causal conditions: “… each
case can have (at most) only a single membership score greater than 0.5 in the logical possible
combinations from a given set of causal conditions.” We have found this observation to be true
in general and the following theorem locates the one causal combination for each case whose MF
> 0.5.
Theorem 3.1 (Min-max Theorem): Given k causal conditions, C_1, C_2, …, C_k, and their respective complements, c_1, c_2, …, c_k, consider the 2^k causal combinations (j = 1,...,2^k) F_j = A_1^j ∧ A_2^j ∧ ... ∧ A_k^j, where A_i^j = C_i or c_i and i = 1,...,k. Let

μ_{F_j}(x) = min{μ_{A_1^j}(x), μ_{A_2^j}(x),..., μ_{A_k^j}(x)}, x = 1, 2,..., N   (3.15)

Then for each x (case) there is only one j, j*(x), for which μ_{F_{j*(x)}}(x) > 0.5, and μ_{F_{j*(x)}}(x) can be computed as:

μ_{F_{j*(x)}}(x) = min{max(μ_{C_1}^D(x), μ_{c_1}^D(x)),..., max(μ_{C_k}^D(x), μ_{c_k}^D(x))}   (3.16)

F_{j*(x)}(x) is determined from the right-hand side of (3.16), as:

F_{j*(x)}(x) = argmax(μ_{C_1}^D(x), μ_{c_1}^D(x)) ∧ ... ∧ argmax(μ_{C_k}^D(x), μ_{c_k}^D(x))   (3.17)

In (3.17), argmax(μ_{C_i}^D(x), μ_{c_i}^D(x)) denotes the winner of max(μ_{C_i}^D(x), μ_{c_i}^D(x)), namely C_i or c_i.
Proofs of all theorems and corollaries are in Appendix A.
Proof: See Section A.1 in Appendix A.
It is important to note that: (1) (3.17) was obtained without having to enumerate the 2^k causal combinations; and, (2) this theorem reduces the number of computations of μ_{F_j}(x) in (3.15) (in Step 6) from 2^k N to N, which is a very substantial reduction.
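A direct transcription of (3.16) and (3.17) into NumPy (hypothetical array names): one call per case, so N evaluations in all.

import numpy as np

def winning_combination(mu_C):
    # mu_C: length-k vector of derived MFs mu_Ci(x) for one case x.
    mu_c = 1.0 - mu_C                     # complements, per (3.8)
    winners = mu_C > mu_c                 # argmax of each (C_i, c_i) pair, per (3.17)
    level = np.maximum(mu_C, mu_c).min()  # (3.16): min of the pairwise maxima
    return winners, level

# Case 1 of Table 3.1, with MFs ordered (L_H, H_H, L_W, H_W, L_A, H_A):
print(winning_combination(np.array([0, 0.91, 0, 0.08, 0.92, 0])))
# -> winners (False, True, False, False, True, False) and level 0.91,
#    i.e. l_H H_H l_W h_W L_A h_A with firing level 0.91, matching Table 3.2.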
Example 3.1. In order to illustrate the computations from the candidate rules to the subset of firing-level surviving rules, we consider a simplified Auto MPG^1 application for which we chose the desired outcome to be O = Low MPG cars (from 14 four-cylinder automobiles^2). A more complete version of this example appears in [29]. We selected three input variables, namely, Horsepower (H), Weight (W) and Acceleration (A), and two terms (Low and High) for each variable; hence, there are six causal conditions: L_H = Low Horsepower, H_H = High Horsepower, L_W = Light Weight, H_W = Heavy Weight, L_A = Low Acceleration and H_A = High Acceleration. The data in Table 3.1 show the variables and derived MFs. How these MF values were actually obtained is explained in [29].

^1 The MPG data set can be obtained at: http://archive.ics.uci.edu/ml/datasets/Auto+MPG.
^2 The numbered cases correspond to the following cars: 1-Toyota Corona Mark II, 2-Datsun pl510 (70), 3-Volkswagen 1131 Deluxe Sedan, 4-Peugeot 504, 5-Audi 100 LS, 6-Saab 99e, 7-BMW 2002, 8-Datsun pl510 (71), 9-Chevrolet Vega 2300, 10-Toyota Corona, 11-Chevrolet Vega (sw), 12-Mercury Capri 2000, 13-Opel 1900, 14-Plymouth Cricket. These cars all had MF values greater than zero in Low MPG cars.
Table 3.1
Data- and fuzzy-membership-matrix (showing original variables and their derived fuzzy-set membership function scores)

Case | MPG | MF(L_MPG) | H | MF(L_H) | MF(H_H) | W | MF(L_W) | MF(H_W) | A | MF(L_A) | MF(H_A)
1 | 24 | 0.99 | 95 | 0 | 0.91 | 2372 | 0 | 0.08 | 15 | 0.92 | 0
2 | 27 | 0.44 | 88 | 0 | 0.12 | 2130 | 0.31 | 0 | 14.5 | 1 | 0
3 | 26 | 0.74 | 46 | 1 | 0 | 1835 | 1 | 0 | 20.5 | 0 | 1
4 | 25 | 0.93 | 87 | 0 | 0.06 | 2672 | 0 | 0.97 | 17.5 | 0 | 0.06
5 | 24 | 0.99 | 90 | 0 | 0.33 | 2430 | 0 | 0.22 | 14.5 | 1 | 0
6 | 25 | 0.93 | 95 | 0 | 0.91 | 2375 | 0 | 0.08 | 17.5 | 0 | 0.06
7 | 26 | 0.74 | 113 | 0 | 1 | 2234 | 0 | 0 | 12.5 | 1 | 0
8 | 27 | 0.44 | 88 | 0 | 0.12 | 2130 | 0.31 | 0 | 14.5 | 1 | 0
9 | 28 | 0.17 | 90 | 0 | 0.33 | 2264 | 0 | 0 | 15.5 | 0.63 | 0
10 | 25 | 0.93 | 95 | 0 | 0.91 | 2228 | 0.01 | 0 | 14 | 1 | 0
11 | 22 | 1 | 72 | 0.75 | 0 | 2408 | 0 | 0.16 | 19 | 0 | 0.8
12 | 23 | 1 | 86 | 0 | 0.02 | 2220 | 0.02 | 0 | 14 | 1 | 0
13 | 28 | 0.17 | 90 | 0 | 0.33 | 2123 | 0.35 | 0 | 14 | 1 | 0
14 | 26 | 0.74 | 70 | 0.89 | 0 | 1955 | 0.99 | 0 | 20.5 | 0 | 1
For six causal conditions there are 2^6 = 64 causal combinations whose MFs would have to be evaluated for all 14 cases. Using the min-max formulas from Theorem 3.1 we found the winning causal combination for each case. These results are summarized in Table 3.2.
Table 3.2
Min-max calculations and associated causal combinations. Each entry gives the maximum of (MF, complement of MF) followed by the winner (W).

Case | max(L_H, l_H)/W | max(H_H, h_H)/W | max(L_W, l_W)/W | max(H_W, h_W)/W | max(L_A, l_A)/W | max(H_A, h_A)/W | Minimum [using (3.16)] | Causal combination [using (3.17)]
1 | 1/l_H | 0.91/H_H | 1/l_W | 0.92/h_W | 0.92/L_A | 1/h_A | 0.91 | l_H H_H l_W h_W L_A h_A
2 | 1/l_H | 0.88/h_H | 0.69/l_W | 1/h_W | 1/L_A | 1/h_A | 0.69 | l_H h_H l_W h_W L_A h_A
3 | 1/L_H | 1/h_H | 1/L_W | 1/h_W | 1/l_A | 1/H_A | 1 | L_H h_H L_W h_W l_A H_A
4 | 1/l_H | 0.94/h_H | 1/l_W | 0.97/H_W | 1/l_A | 0.94/h_A | 0.94 | l_H h_H l_W H_W l_A h_A
5 | 1/l_H | 0.67/h_H | 1/l_W | 0.78/h_W | 1/L_A | 1/h_A | 0.67 | l_H h_H l_W h_W L_A h_A
6 | 1/l_H | 0.91/H_H | 1/l_W | 0.92/h_W | 1/l_A | 0.94/h_A | 0.91 | l_H H_H l_W h_W l_A h_A
7 | 1/l_H | 1/H_H | 1/l_W | 1/h_W | 1/L_A | 1/h_A | 1 | l_H H_H l_W h_W L_A h_A
8 | 1/l_H | 0.88/h_H | 0.69/l_W | 1/h_W | 1/L_A | 1/h_A | 0.69 | l_H h_H l_W h_W L_A h_A
9 | 1/l_H | 0.67/h_H | 1/l_W | 1/h_W | 0.63/L_A | 1/h_A | 0.63 | l_H h_H l_W h_W L_A h_A
10 | 1/l_H | 0.91/H_H | 0.99/l_W | 1/h_W | 1/L_A | 1/h_A | 0.91 | l_H H_H l_W h_W L_A h_A
11 | 0.75/L_H | 1/h_H | 1/l_W | 0.84/h_W | 1/l_A | 0.8/H_A | 0.75 | L_H h_H l_W h_W l_A H_A
12 | 1/l_H | 0.98/h_H | 0.98/l_W | 1/h_W | 1/L_A | 1/h_A | 0.98 | l_H h_H l_W h_W L_A h_A
13 | 1/l_H | 0.67/h_H | 0.65/l_W | 1/h_W | 1/L_A | 1/h_A | 0.65 | l_H h_H l_W h_W L_A h_A
14 | 0.89/L_H | 1/h_H | 0.99/L_W | 1/h_W | 1/l_A | 1/H_A | 0.89 | L_H h_H L_W h_W l_A H_A
Observe from the last column in this table that only six out of the 64 possible causal combinations have survived, namely: l_H H_H l_W h_W L_A h_A, l_H h_H l_W h_W L_A h_A, L_H h_H L_W h_W l_A H_A, l_H h_H l_W H_W l_A h_A, l_H H_H l_W h_W l_A h_A, and L_H h_H l_W h_W l_A H_A.

Although it is tempting to linguistically simplify these causal combinations at this point (e.g., l_H H_H → H_H, L_A h_A → L_A, etc.) this is not done until the very end of fsQCA (see Example 3.5 in Section 3.6).
3.4. Fast fsQCA
In Fast fsQCA, Steps 1–4 of fsQCA are unchanged, Steps 5 and 6 are changed, and Steps 7–13 are also unchanged. Unlike fsQCA, where all 2^k candidate causal combinations must actually be established and enumerated in Step 5, so that their MF values can be computed in Step 6 for all cases, which requires 2^k N computations, Fast fsQCA does not require that those 2^k candidate causal combinations actually be established or enumerated; it only requires that the concept of those causal combinations be established (because the rest of fsQCA is in terms of such causal combinations). It is not until (3.21) in Step 6NEW below that a specific subset, with R_S elements from all of the 2^k candidate causal combinations, is computed for all N appropriate cases. This requires approximately N R_S computations, which is a vastly smaller number than 2^k N, which is why this is called "Fast fsQCA." The new Steps 5 and 6 are:
Step 5New. Conceptually, create 2^k candidate causal combinations (rules) and view each as a corner in a 2^k-dimensional vector space: Let S_F be the finite space of 2^k candidate causal combinations, called (by us) firing level fuzzy sets, F_j, as in (3.6).
Step 6New. Compute the R_S surviving causal combinations (firing level surviving rules), whose MF values are > 0.5 for an adequate number of cases (this must be specified by the user), i.e., keep the adequately represented causal combinations that are closer to corners and not the ones that are farther away from corners, and then compute the MF of each of the R_S surviving causal combinations in all the appropriate N cases: This is a mapping from {S_C, S_Cases} into S_F^S that makes use of μ_{C_i}(x) and μ_{c_i}(x), where S_F^S is a subset of S_F, with R_S elements. F_{j*(x)}(x) is first computed by (3.17) (S_{F_{j*(x)}} = {F_{j*(1)}(1),...,F_{j*(N)}(N)}), after which the J uniquely different^1 F_{j*(x)}(x) are relabeled F_{j'} (j' = 1,...,J, and S_{F_{j'}} = {F_1,...,F_J}), and then the following are computed (j' = 1,...,J, l = 1,...,R_S and x = 1,...,N):
t_{F_{j'}}: (S_{F_{j*}}, S_{F_{j'}}, S_Cases) → {0,1}, x ↦ t_{F_{j'}}(x) = 1 if F_{j'} = F_{j*(x)}(x), and 0 otherwise   (3.18)

N_{F_{j'}}: {0,1} → I, t_{F_{j'}} ↦ N_{F_{j'}} = Σ_{x=1}^N t_{F_{j'}}(x)   (3.19)

F_l^S: (S_F, I) → S_F^S, F_{j'} ↦ F_l^S = F_{j'} (j' → l) if N_{F_{j'}} ≥ f, and 0 if N_{F_{j'}} < f   (3.20)

μ_{F_l^S}: (S_F^S, S_Cases) → [0,1]
x ↦ μ_{F_l^S}(x) = min{max(μ_{C_1^l}^D(x), μ_{c_1^l}^D(x)),..., max(μ_{C_k^l}^D(x), μ_{c_k^l}^D(x))}   (3.21)
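A sketch of Steps 5NEW and 6NEW (same hypothetical N × k array mu of derived MFs as in the earlier snippets): the winner for each case is found once via Theorem 3.1, duplicates are counted, and the frequency threshold f is applied, for roughly N·k work instead of 2^k·N:

from collections import Counter

def fast_surviving_rules(mu, f=1):
    # (3.17): each case's winner corner is the pattern of mu_Ci(x) >= 0.5
    # (True = C_i, False = c_i); mu is an (N x k) NumPy array.
    winners = [tuple(row >= 0.5) for row in mu]
    counts = Counter(winners)                          # (3.18)-(3.19): the N_{F_j'}
    return [w for w, n in counts.items() if n >= f]    # (3.20): frequency test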
Figure 3.1 is a modified flowchart for Fast fsQCA; Fig. 3.2 remains as is.
Example 3.2. This is a continuation of Example 3.1. Observe in the last column of Table 3.2
that for the 14 cases there exist only 6 J uniquely different causal combinations (these are
listed at the end of Example 3.1). The firing levels for these six surviving causal combinations
for all 14 cases are summarized in Table 3.3. MFs for the six causal conditions are in Table 3.2.
Observe, from Table 3.3, that for each case there is only one causal combination for which its
MF is greater than 0.5; it is shown in boldface.
1
If N < 2
k
, there can be at most N uniquely different F
j*(x)
(x); but, if N ³ 2
k
, there can be at most 2
k
uniquely
different F
j*(x)
(x), although in practice J will be much smaller than 2
k
. Even when N < 2
k
it is quite possible that J
will be much less than
N. Determining the J uniquely different F
j*(x)
(x) requires a sorting algorithm.
46
Fig. 3.1. Top portion of flowchart for Fast fsQCA; it is continued in Fig. 3.2.
Fig. 3.2. Flowchart for fsQCA, continued from Fig. 3.1.
No
Yes
N
F
i
> f ?
A
Compute N
F
i
Make the
Causal Combination
a Remainder
Compute
Derived MFs
N Data Cases
Choose Desired Outcome
Postulate k Causal Conditions
Obtain MFs
R
S
Firing-Level Surviving Rules ( )
m
F
i
S
(x), i = 1,..., R
S
Conceptually Create
2
k
Candidate Rules
Compute F
j*
(x) using (17) and
find the J uniquely different F
j*
(x)
S
F
- S
F
S
S
F
S
(R
S
)
1
1
3
4
5NEW
2
Compute MFs
S
F
¢ j
6NEW
6NEW
6NEW
6NEW
6NEW 8
QM Algorithm
Prime Implicants
Complex Summarizations
Counterfactual
Analysis
Minimal Prime Implicants
Parsimonious Summarizations
Intermediate Summarizations
Compute Consistency,
Best Instances and
Coverages
Simplified Intermediate
Summarizations
QM Algorithm
Believable Simplified
Intermediate
Summarizations
A
Compute R
S
Subsethoods
≥ 0.80 ?
No
Yes
R
A
Actual Rules ( )
Derived MF for
Desired Outcome
S
F
S (R
S
)
S
F
S
- S
F
A
S
F
A
(R
A
)
S
F
- S
F
S
S
F
PI
(R
C
)
S
F
MPI
(R
P
)
S
F
I
(R
I
)
S
F
SI
(R
SI
)
S
F
BSI (R
BSI
)
S
F
- S
F
I
7
7
7
8
9
10
11-13
k Causal
Conditions
Substantive
Knowledge
Desired
Outcome
2 1
47
Table 3.3
Firing levels for six surviving causal combinations and 14 cases. MFs for the six causal conditions are in Table 3.1
Case
Memberships of Surviving Causal Combinations (minimum of six causal-condition-MFs):
Firing Levels
a
l
H
H
H
l
W
h
W
L
A
h
A
l
H
h
H
l
W
h
W
L
A
h
A
L
H
h
H
L
W
h
W
l
A
H
A
l
H
h
H
l
W
H
W
l
A
h
A
l
H
H
h
l
W
h
W
l
A
h
A
L
H
h
H
l
W
h
W
l
A
H
A
1 0.91 0.09 0 0.08 0.08 0
2 0.12 0.69 0 0 0 0
3 0 0 1 0 0 0
4 0 0 0 0.94 0.03 0
5 0.33 0.67 0 0 0 0
6 0 0 0 0.08 0.91 0
7 1 0 0 0 0 0
8 0.12 0.69 0 0 0 0
9 0.33 0.63 0 0 0.33 0
10 0.91 0.09 0 0 0 0
11 0 0 0 0.16 0 0.75
12 0.02 0.98 0 0 0 0
13 0.33 0.65 0 0 0 0
14 0 0 0.89 0 0 0.01
a
Each boldface MF is greater than 0.5.
Example 3.3. In order to establish how much faster Fast fsQCA is as compared to fsQCA, we
have tabulated the computations for Steps 5 and 6 of fsQCA in Table 3.4 and Steps 5NEW and
6NEW of Fast fsQCA in Table 3.5. For simplicity, all operations are treated the same. Observe,
from the bottom lines of these tables, that:
fsQCA Steps 5 and 6 require O(2
k+1
kN) operations. Because fsQCA and Fast fsQCA
both require computing m
C
i
D
(x) and m
c
i
D
(x) (x = 1,..., N) , we do not include these
computations in these tables.
Fast fsQCA Steps 5NEW and 6NEW require O(2kNR
S
) operations
Consequently,
1
fsQCA Steps 5 and 6 (2 )
Speedup (2 / )
Fast fsQCA Steps 5NEW and 6NEW (2 )
k
k
S
S
O kN
OR
O kNR
Speedup is independent of the number of cases, N
As a numerical example, suppose there are six variables with two or three terms per
variable (in which case k = 12 and k = 18, respectively). Then
Speedup(2 terms/variable) = O(2
12
/ R
S
) = O(4,096 / R
S
)
Speedup(3 terms/variable) = O(2
18
/ R
S
) = O(262,144 / R
S
)
Usually R
S
is small as compared to N, e.g., 10-20. For R
S
= 20, these two speedups are
Speedup( 20| 2terms/variable) 205
S
R
48
Speedup( 20| 3terms/variable) 13,107
S
R .
Table 3.4
Computations for fsQCA Steps 5 and 6
Computation fsQCA Steps 5 and 6 Computations
a
1
Enumerate and store the
2
k
candidate causal combinations
F
j
using (6)
The enumeration requires creating
2
k
permutations, each of
length k, in which each element is either
C
i
or
c
i
, and then
labeling each of the permutations, 1, 2, …,
2
k
.
2
Compute and store
m
F
j
(x)
using (7)
For each x
(x = 1,..., N) :
For each j
( j = 1,...,2
k
):
Retrieve the k MF values for each
F
j
(requires
k
operations)
Compute the minimum of the k MF values for each
F
j
(requires
k -1 operations)
Total operations required are:
2
k
Nk + 2
k
N(k -1) = 2
k
N(2k -1)
3
Compute and store
t
F
j
(x) using (9) The computation requires carrying out a test
2
k
times.
4
Compute and store
N
F
j
using (10)
The computation requires
N -1 additions of two numbers,
repeated
2
k
times, for a total of
2
k
(N -1) additions.
5
Compute
F
l
S
using (11)
The computation requires
2
k
tests.
fsQCA Steps 5 and 6 require performing
2
k+1
(kN +1) operations
a
Each permutation, retrieval, minimum, test and addition is counted as one operation.
49
Table 3.5
Computations for Fast fsQCA Steps 5NEW and 6NEW
Computation
Fast fsQCA Steps 5NEW and
6NEW
Computations
a
1
Establish and store the N
F
j*( x)
(x)
using (17)
For each x
(x = 1,..., N) :
Compute the maximum of two MF values, and save
that value for use in Step 6 (requires k operations)
Extract the label of the winning MF value (requires
k extractions)
Store the extracted labels as an ordered chain of k
labels (1 or 0)
Total operations required are: 2kN.
2 Establish and store the J uniquely
different
F
j*( x)
(x)
This requires finding distinct ordered chains from the N
ordered chains (requires on the order of N operations).
3
Compute and store
t
F
¢ j
(x) using (18)
The computation requires carrying out a test N times.
4
Compute and store
N
F
¢ j
using (19)
The computation requires
N -1 additions, repeated J
times, for a total of
J(N -1) additions.
5
Establish
F
l
S
using (20)
The computation requires J tests.
6
Compute and store
m
F
l
S
(x) using (16)
For each x
(x = 1,..., N) :
For each l
(l =1,..., R
S
):
Retrieve the k saved numbers from Computation 1
for each
F
j*( x)
(requires
k operations)
Compute the minimum of the k saved numbers for
each
F
j*( x)
(requires
k -1 operations)
Total operations required are:
( 1) (2 1 )
S S S
NR k NR k k NR
Fast fsQCA Steps 5NEW and 6NEW require performing
2kN(1+ R
S
) + N(J - R
S
+ 2) operations
a
Each maximum, extraction, sort, test, addition, retrieval and minimum is counted as one operation.
3.5. When the Number of Causal Conditions Changes
Sometimes one wants to perform fsQCA for different combinations of causal conditions, by
either including more causal conditions into the mix of the original k causal conditions, or by
removing some of the original k causal conditions. Presently, doing any of these things requires
treating each modified set of causal conditions as a totally new fsQCA. The results in this section
show that there are much easier and faster ways to perform fsQCA once it has already been
performed for k causal conditions.
Observe in (3.17) that, e.g., argmax m
C
1
(x),m
c
1
(x)
( )
is unchanged whether there are one, two,
three, etc. causal conditions. This means that, for each case, the winning causal combination for k
50
causal conditions includes the winning causal combination for ¢ k causal conditions, when ¢ k < k
.
This means that if one knows the winning causal combination for ¢¢ k causal conditions, where
¢¢ k > k , and one wants to know the winning causal combination for k causal conditions, one
simply deletes the undesired ¢¢ k - k causal conditions from the winning causal combination of
¢¢ k causal conditions.
These observations suggest that there are both a forward recursion and a backward recursion
for (3.16) and (3.17).
In what follows, it is assumed that the smallest number of causal conditions for which an
fsQCA is performed is two.
Corollary 3-1-1 (Forward Recursion). For each case, it is true that ( k = 3,4,...):
F
j*(x)
(x | C
1
,C
2
,...,C
k
) = F
j*(x)
(x | C
1
,C
2
,...,C
k-1
)argmax m
C
k
D
(x),m
c
k
D
(x)
( )
(3.22)
m
F
j*( x )
(x | C
1
,C
2
,...,C
k
) = min m
F
j*( x )
(x | C
1
,C
2
,...,C
k-1
),max m
C
k
D
(x),m
c
k
D
(x)
( ) { }
(3.23)
These results, arguably for the first time, connect fsQCA firing-level calculations for k and k -
1 causal conditions, provide an entirely new way to perform fsQCA computations when one
wishes to study different combinations of causal conditions on a desired outcome, and should
lead to a vast reduction in computation time for such a study. It does assume that the same cases
are used throughout.
Proof: See Section A.2 in Appendix A.
Corollary 3-1-2 (Backward Recursion). Let C
j
denote the suppression of causal condition
C
j
. Then it is true that:
F
j*(x)
(x | C
1
,C
2
,..., C
i
,...,C
k
) = F
j*(x)
(x | C
1
,C
2
,...,C
i-1
,C
i+1
,...,C
k
) (3.24)
Proof: Obvious from (3.17).
51
This backward recursion can also lead to a vast reduction in computation time. For example, if
the winning causal combination F
j*(x)
(C
1
,C
2
,...,C
k
) has been determined for five causal
conditions (k = 5), then it can be used to establish the winning causal combination for any
combination of four, three, or two of the causal conditions, by inspection! No new computations
have to be performed, because the winning combinations for fewer causal conditions are always
contained in the winning combinations for more causal combinations.
Example 3.4. Assume, for example, that AbcdE is a winning causal combination for Case x
when five causal conditions are used. Then from (3.24), one can conclude:
If one wants to eliminate causal conditions B and D from a study, then AcE is the winning
causal combination for Case x when only the three causal conditions A, C and E are used.
If one wants to eliminate causal conditions C, D and E from a study, then Ab is the winning
causal combination for Case x when only the two causal conditions A and B are used.
Of course, many other situations for AbcdE
can be considered.
No way has yet been determined for computing m
F
j*( x )
(x | C
1
,C
2
,..., C
i
,...,C
k
) from
m
F
j*( x )
(x | C
1
,C
2
,...,C
i
,...,C
k
) . It seems that once F
j*(x)
(C
1
,C
2
,..., C
i
,...,C
k
) has been determined
from (3.17), m
F
j*( x )
(x | C
1
,C
2
,..., C
i
,...,C
k
) must be computed directly from (3.16), as
( 1,2,..., )
S
lR :
m
F
j*( x )
(x | C
1
,C
2
,..., C
i
,...,C
k
) = min max m
C
1
D
(x),m
c
1
D
(x)
( )
,...,max m
C
i-1
D
(x),m
c
i-1
D
(x)
( )
,
é
ë
max m
C
i+1
D
(x),m
c
i+1
D
(x)
( )
,...,max m
C
k
D
(x),m
c
k
D
(x)
( )
ù
û
(3.25)
Corollary 3-1-3 (Firing Levels are Bounded). If m
F
j*( x )
(x | C
1
,C
2
,...,C
k
1
) has been computed
for k
1
causal conditions, and one now considers k
2
causal conditions, where k
2
> k
1
, and the k
2
causal conditions include the original k
1
causal conditions, then
52
m
F
j*( x )
(x | C
1
,C
2
,...,C
k
2
) £ m
F
j*( x )
(x | C
1
,C
2
,...,C
k
1
) (3.26)
This means that when new causal conditions are added to an existing set of causal conditions,
the firing level for the new winning causal combination (which, by Corollary 1-1, contains the
prior winning causal combination) can never be larger than the prior firing level, i.e. firing levels
tend to become weakened when more causal conditions are included.
Proof: See Section A.3 in Appendix A.
3.6. Recursive Computation of Consistency
Subsethood is the major computation of Step 7 of fsQCA. That computation is (l = 1,2,..., R
S
):
ss
K
(F
l
S
,O) =
min(m
F
l
S
(x),m
O
D
(x))
x=1
N
å
m
F
l
S
(x)
x=1
N
å
(3.27)
Additionally, in Step 7 only those F
l
S
whose subsethoods are greater than or equal to 0.80 are
kept and become the so-called actual causal combinations, F
m
A
, m = 1,..., R
A
.
Before moving to the main subject of this section, we pause to complete the Low mpg example.
Example 3.5. This is a continuation of Examples 3.1 and 3.2. Beginning with the R
S
= 6
surviving causal combinations listed at the end of Example 3.1, repeated here for the
convenience of the readers,
l
H
H
H
l
W
h
W
L
A
h
A
, l
H
h
H
l
W
h
W
L
A
h
A
, L
H
h
H
L
W
h
W
l
A
H
A
, l
H
h
H
l
W
H
W
l
A
h
A
, l
H
H
H
l
W
h
W
l
A
h
A
, L
H
h
H
l
W
h
W
l
A
H
A
,
ss
K
(F
l
S
,O) is computed using (3.27), where all MF values for each case are given in Table 3.1.
Results are summarized in Table 3.6. In that table Frequency, N
f
, is the number of cases for
which each of these causal combinations is the winner; that number is obtained by counting the
number of bold-faced entries in each column of Table 3.3. In Table 3.6, observe that
ss
K
(F
l
S
,O) ³ 0.80 for only four of the six surviving causal combinations; these are the R
A
actual
causal combinations that (see Steps 8-13) move on to the rest of fsQCA (details for these steps
53
are in [29] but for a more complete study of the Low MPG problem). The two applications of the
QM algorithm (Step 8) lead to:
Three complex solutions: l
H
H
H
l
W
h
W
h
A
, L
H
h
H
l
W
h
W
l
A
H
A
, l
H
h
H
l
W
H
W
l
A
h
A
Two parsimonious solutions: h
H
l
W
l
A
, H
H
l
W
h
A
Steps 9-11 lead to three believable simplified intermediate solutions: l
H
H
H
l
W
h
A
, h
H
l
W
H
W
l
A,
L
H
h
H
l
W
l
A
H
A
.
Table 3.6
Frequency and consistency for six surviving causal combinations.
Frequency and Consistency of Surviving Causal Combinations
a
l
H
H
H
l
W
h
W
L
A
h
A
l
H
h
H
l
W
h
W
L
A
h
A
L
H
h
H
L
W
h
W
l
A
H
A
l
H
h
H
l
W
H
W
l
A
h
A
l
H
H
H
l
W
h
W
l
A
h
A
L
H
h
H
l
W
h
W
l
A
H
A
Frequency: N
f
3 6 2 1 1 1
Consistency: ss
K
0.85 0.67 0.78 0.99 0.88 1
a
Each boldface consistency is greater than 0.8.
Because a linguistic summarization must be easy to understand by a person, we now simplify
the occurrence of two terms for a variable in the believable simplified intermediate solutions
(e.g., l
H
H
H
® H
H
, l
W
H
W
® H
W
, L
H
h
H
® L
H
and l
A
H
A
® H
A
),
so that the following three
rules summarize Low mpg autos:
IF Horsepower is High and Weight is Not Light and Acceleration is Not High THEN MPG
is Low
IF Horsepower is Not High and Weight is Heavy and Acceleration is Not Low THEN
MPG is Low
IF Horsepower is Low and Weight is Not Light and Acceleration is High THEN MPG is
Low
Let us now return to the computation of ss
K
(F
l
S
,O). Consider two populations, one of size
N
1
and the other of size N = N
1
+ N
2
, where the N
1
cases are contained in the N cases. In
order to show the dependency of ss
K
(F
l
S
,O) on the population size, we use a conditioning
notation, i.e.:
ss
K
(F
l
S
,O | N
1
) =
min(m
F
l
S
(x),m
O
D
(x))
x=1
N
1
å
m
F
l
S
(x)
x=1
N
1
å
(3.28)
54
ss
K
(F
l
S
,O | N) =
min(m
F
l
S
(x),m
O
D
(x))
x=1
N
å
m
F
l
S
(x)
x=1
N
å
(3.29)
The following theorem shows how ss
K
(F
l
S
,O | N) is computed recursively from
ss
K
(F
l
S
,O | N
1
), which can also save computations.
Theorem 3.2. Suppose N > N
1
, and the N
1
cases are contained in the N cases.
ss
K
(F
l
S
,O | N) is computed recursively from ss
K
(F
l
S
,O | N
1
), as:
1
1
1
1
1
11
1
min( ( ), ( ))
1
( , | ) ( , | )
( ) ( )
1
()
S
l
SS
ll
S
l
N
D
O
F xN SS
K l K lNN
FF x N x
N
F x
xx
ss F O N ss F O N
xx
x
(3.30)
Note that m
F
l
S
(x)
x=1
N
å
can also be computed recursively, but we do not pursue that here. Eq.
(3.30) lets us compute ss
K
(F
l
S
,O | N) in two steps: (1) resize ss
K
(F
l
S
,O | N
1
) and (2) add in a
correction due to the new N
2
cases.
Proof: See Section A.4 in Appendix A.
For those readers who are familiar with recursive filters or recursive estimators (e.g., [36])
(3.30) has the structure of a case-varying recursive filter.
Equation (3.30) allows us to examine the relative sizes of ss
K
(F
l
S
,O | N) and ss
K
(F
l
S
,O | N
1
).
To begin, let
min(m
F
l
S
(x),m
O
D
(x))
x=N
1
+1
N
å
m
F
l
S
(x)
x=N
1
+1
N
å
ss
K
(F
l
S
,O | N
2
) (3.31)
ss
K
(F
l
S
,O | N
2
) is the subsethood of F
l
S
in O, but only for the new N
2
cases. Then:
55
Corollary 3-2-1. It is true that
1
( , | ) ( , | )
SS
K l K l
ss F O N ss F O N (3.32)
if and only if
21
( , | ) ( , | )
SS
K l K l
ss F O N ss F O N (3.33)
Proof: See Section A.5 in Appendix A.
It is clear from Corollary 3-2-1 that it is possible for new cases to cause a rule (causal
combination) to be obliterated if ss
K
(F
l
S
,O | N
2
) is smaller than ss
K
(F
l
S
,O | N
1
); but, how much
smaller must ss
K
(F
l
S
,O | N
2
) be than ss
K
(F
l
S
,O | N
1
) for a rule to be obliterated? This question is
answered in our next section.
3.7. On the Obliteration of a Rule
Corollary 3-2-2. Suppose ss
K
(F
l
S
,O | N
1
) = 0.8 + D(N
1
) ³ 0.8 , where
1
0 £ D(N
1
) £ 0.2. Let
1
1
1
1
()
()
S
l
S
l
N
F x
N
F xN
x
x
(3.34)
Then ( l = 1,..., R
S
)
ss
K
(F
l
S
,O | N) < 0.8, (3.35)
if and only if
ss
K
(F
l
S
,O | N
2
) < 0.8 - D(N
1
)r (3.36)
which means that F
l
S
is obliterated. Another way to express (3.36) is:
r <
0.8 - ss
K
(F
l
S
,O | N
2
)
ss
K
(F
l
S
,O | N
1
) - 0.8
(3.37)
Proof: See Section A.6 in Appendix A.
1
The maximum value of consistency is 1. Ragin’s consistency threshold is 0.8. It is easy to modify the results in this
corollary for another value of this threshold; simply replace 0.8 by that value, say t, in which case 0 £ D(N
1
) £1- t .
56
Observe that (3.36) or (3.37) provide constructive tests to establish if a rule that has survived
the consistency threshold based on N
1
cases will be obliterated by the additional N
2
cases. This
requires computing both sides of (3.36) [or (3.37)] to see if it is (or is not) satisfied. If it is
satisfied, then (3.35) will be true and the l
th
rule will be obliterated. If it is not satisfied, then the
l
th
rule will not be obliterated.
r is a very interesting parameter. It is the ratio of the sums of firing levels, one for the
original N
1
cases and the other for the additional N
2
cases. To-date, we have no other physical
meaning for this parameter, and we also have no physical meaning for the right-hand side of
(3.37).
The results in this section cause one to think about the cases that are or will be used in fsQCA.
Ragin [28] has a lot to say about cases and their choice (see also Chapter 2, Step 1). If, for
example, one insists on always using all of the cases for every desired outcome, then it is quite
possible that important rules will be obliterated by those cases for which their MF value in the
desired outcome is zero. On the other hand, if one includes cases that seem to support the desired
outcome, then the likelihood of obliterating an important rule will be greatly reduced. So, the
choice of cases should be done in Step 1 in a way that actively supports the goal of learning
which causal combinations are the cause of a desired outcome.
Example 3.6. It sometimes happens that membership in the desired outcome for one or more
cases is zero [e.g., this could have happened in the Low MPG application if we had included cars
(cases) whose MF value in Low MPG was zero]. The purpose of this example is to examine the
consequences of including such cases in fsQCA. Suppose that the N cases are sorted into two
subsets, one with N
1
cases and the other with N
2
= N - N
1
cases, where:
m
O
D
(x) > 0 N
1
cases
m
O
D
(x) = 0 N
2
cases
ì
í
ï
î
ï
(3.38)
Note that the causal combinations that were established in Step 6 or Step 6New using the N
1
cases do not change as a result of using the other N
2
cases. This is a direct result of Theorem
3.1, i.e., associated with each case is one and only one winning causal combination, and that
57
winning combination is independent of all other cases. What might change is the value that is
used for the frequency threshold in Step 6 or Step 6New and, of course, there will be one
winning causal combination for each of the N
2
cases (some, or all of them, may be the same as
the winning causal combinations for the N
1
cases). In this example we assume that this
threshold remains the same for both the N
1
cases and N
1
+ N
2
cases.
Obliteration of a rule, if it occurs, occurs in Step 7, and requires our examining ss
K
(F
l
S
,O | N)
in (3.30). Based on (3.38) the second term in (3.30) is zero, so that (3.30) simplifies to (
l = 1,..., R
S
):
1
1
1
1
()
( , | ) ( , | )
()
S
l
S
l
N
F xSS
K l K l N
F x
x
ss F O N ss F O N
x
(3.39)
Because m
F
l
S
(x)
x=1
N
å
³ m
F
l
S
(x)
x=1
N
1
å
, (3.39) reveals that ss
K
(F
l
S
,O | N) £ ss
K
(F
l
S
,O | N
1
) , which
means that including cases for which m
O
D
(x) = 0 can only reduce consistency, which strongly
suggests that one should choose cases in Step 1 for which m
O
D
(x) ¹ 0 .
According to Corollary 3-2-2, ss
K
(F
l
S
,O | N
1
) = 0.8 + D(N
1
); hence, (3.39) becomes:
ss
K
(F
l
S
,O | N) =
m
F
l
S
(x)
x=1
N
1
å
m
F
l
S
(x)
x=1
N
å
´[0.8 + D(N
1
)] (3.40)
We are interested to learn when ss
K
(F
l
S
,O | N) < 0.8, in which case F
l
S
would be obliterated.
Substituting (3.40) into (3.35), we see that F
l
S
will be obliterated if:
m
F
l
S
(x)
x=1
N
1
å
m
F
l
S
(x)
x=1
N
å
´[0.8 + D(N
1
)] < 0.8 (3.41)
or, using a bit of algebra, when (l = 1,..., R
S
):
m
F
l
S
(x)
x=N
1
+1
N
å
>
D(N
1
)
0.8
m
F
l
S
(x)
x=1
N
1
å
(42)
The left-hand side of (3.42) is for the N
2
cases, whereas the right-hand side of (3.42) is for the
N
1
cases.
58
From the constructive test, given by (3.42), one observes that:
If
1
D(N
1
) = 0 , which means that a causal combination has just barely passed the
consistency test based on the N
1
cases, then ss
K
(F
l
S
,O | N) < 0.8 if
m
F
l
S
(x)
x=N
1
+1
N
å
> 0
,
which is always true even when N
2
= 1. So, when a causal combination passes the
consistency test just at the threshold value it is in a very precarious situation. It only takes
one case for which m
O
D
(x) = 0 to obliterate that causal combination!
If the consistency threshold of 0.8 is lowered to t < 0.8, then D(N
1
) / t becomes larger
and so it is more difficult to obliterate a causal combination, e.g., if t = 0.6, then (see
footnote 12) 0 £ D(N
1
) £ 0.4 , and
2
max[D(N
1
) / 0.6] = 0.4 / 0.6 = 2 / 3 >> 0.25. However,
lowering t in this manner is not in the spirit of the consistency test, so avoiding the
obliteration of rule by doing this is not recommended.
If N
1
is large, then m
F
l
S
(x)
x=1
N
1
å
can also be large, because "m
F
l
S
(x) Î[0,1], in which
case it will be more difficult for (3.42) to be satisfied, unless D(N
1
) is very small. If,
however, D(N
1
) » 0 then it does not matter how large N
1
or m
F
l
S
(x)
x=1
N
1
å
are, and it will
again be very easy to obliterate a causal combination.
These observations demonstrate that it is very important to choose cases very carefully for
3.8. On the Existence of a Candidate Causal Combination
3.8.1 Introduction
This section focuses on a question that does not seem to have been asked before in the context
of fsQCA: Do all of the
2
k
candidate causal combinations postulated in Steps 5 or 5NEW exist,
i.e., can they actually occur, or, put another way, are they possible? We shall demonstrate in this
section that the answer depends on how many terms are used for each variable, and, when more
1
According to Corollary 3-2-2, 0 £ D(N
1
) £ 0.2.
2
When 0 £ D(N
1
) £ 0.2, max[D(N
1
) / 0.8] = 0.2 / 0.8 = 0.25.
59
than one term is used for a variable the answer surprisingly (to us) is NO! This has profound
implications on the computation of the parsimonious solutions in Step 8 of fsQCA, which is:
Step 8. Use the Quine-McCluskey (QM) algorithm to obtain
R
C
complex solutions [prime
implicants] and
R
P
parsimonious solutions [minimal prime implicants]. This is a mapping from
1
S
F
A
,
S
F
- S
F
S
and
S
F
S
- S
F
A
into
S
F
PI
and
S
F
MPI
by two different applications of the QM algorithm,
i.e.
F
PI
:{S
F
A
,S
F
S
- S
F
A
,S
F
- S
F
S
} ® S
F
PI
F
j
{ }
j=1
2
k
F
n
PI
{ }
n=1
R
C
, F
n
PI
= QM
PI
S
F
A
present
S
F
- S
F
S
absent
S
F
S
- S
F
A
absent
æ
è
ç
ç
ç
ç
ö
ø
÷
÷
÷
÷
ü
ý
ï
ï
ï
þ
ï
ï
ï
(3.43)
F
MPI
:{S
F
A
,S
F
S
- S
F
A
,S
F
- S
F
S
} ® S
F
MPI
F
j
{ }
j=1
2
k
F
p
MPI
{ }
p=1
R
P
, F
p
MPI
= QM
MPI
S
F
A
present
S
F
- S
F
S
don't care
S
F
S
- S
F
A
absent
æ
è
ç
ç
ç
ç
ö
ø
÷
÷
÷
÷
ü
ý
ï
ï
ï
þ
ï
ï
ï
(3.44)
There is no problem with (3.43); however, there is with (3.44). Observe, in (3.44) that a subset
of the original 2
k
candidate causal conditions (in
S
F
) is treated as “don’t cares.” These are all of
the causal conditions whose firing levels either do not pass the frequency threshold test or were
not revealed by the cases (the remainders that are not in
S
F
S
). If
S
F
- S
F
S
is stated incorrectly,
then the parsimonious solutions will be incorrect.
Our approach in this section is to proceed gradually. First we shall assume that each variable is
described by only one term; next, we shall assume that each variable is described by two terms;
and, finally, we shall assume that each variable is described by three terms.
3.8.2 When each variable is described by one term
1
See Fig. 3.3 for explanations of
S
F
- S
F
S
,
S
F
S
- S
F
A
and
S
F
A
.
60
Suppose that
12
{ , ,..., }
v
Vn
S v v v is the set of all variables. Assume that one term [e.g., Low (L)]
is assigned to each variable. Then the space of all possible causal conditions is
S
C
={C
iL
| "i = 1,...,n
v
} (3.45)
Theorem 3.3 shows that if only one term is assigned to each variable, in which case there can
only be two possible causal combinations for each variable v
i
, namely S
CC
i
= {C
iL
,c
iL
}, then all
of the 2
k
candidate causal
combinations are possible, where
v
kn .
Theorem 3.3 (Possible causal combinations for one term/variable): If one term (e.g., Low)
is assigned to each variable v
i
, and the MF for that term is a shoulder
1
(left, or right) then all of
the 2
k
v
kn
postulated candidate causal combinations in the space of all candidate causal
combinations,
F
S , are possible.
Proof. See Section A.7 in Appendix A.
We have included this simplest of all cases so that the reader can see our method of proof and
learn how to confirm what he or she may have felt was already obvious.
3.8.3 When each variable is described by two terms
Suppose that two terms [e.g., Low (L) and High (H)] are assigned to each variable. Then the
space of all possible causal conditions is
S
C
={C
iL
,C
iH
| "i = 1,...,n
v
} (3.46)
Now, there can be four possible candidate causal combinations for each variable v
i
, namely:
,,,
i
CC iL iH iL iH iL iH iL iH
S C C C c c C c c (3.47)
For simplicity,
iL
C and
iH
C are simplified to L and H, respectively, so that (3.47) can be expressed
more simply, as:
S
CC
i
= LH, Lh,lH,lh
{ }
(3.48)
1
We assume that a left shoulder has a monotonically non-increasing MF, whereas a right shoulder has a
monotonically non-decreasing MF.
61
Theorem 3.4 shows that if two terms are assigned to each variable then all 2
k
candidate causal
combinations are not possible, where
k = 2n
v
.
Theorem 3.4 (Possible candidate causal combinations for two terms/variable): If two
terms are assigned to each variable v
i
[e.g., Low (L) and High (H)], and the MFs for those terms
are a left shoulder and a right shoulder, then all of the 2
k
k = 2n
v
( )
postulated candidate causal
combinations in the space of all candidate causal combinations,
F
S , are not possible.
More specifically, let
x
1
(
x
2
) be the value of variable x when m
L
(x) = m
l
(x) ( m
H
(x) = m
h
(x)).
Then the following are true:
a. If x
1
< x
2
, then S
CC
i
= Lh,lH,lh { }
and causal combination LH never occurs.
b. If x
1
> x
2
, then ,,
i
CC
S LH Lh lH and causal combination lh never occurs.
c. If x
1
= x
2
, then ,
i
CC
S Lh lH and causal combinations LH and lh
never occur.
Proof: See Section A.8 in Appendix A where two different proofs are provided; one uses min-
max Theorem 3.1 and other uses the fundamental definition of a causal combination that is given
in (3.6). Two proofs are provided because many readers will be familiar with (3.6) but not with
the new min-max Theorem.
In case (c), when and m
l
(x) = m
H
(x) (because x
1
= x
2
), we no longer have two
terms per variable, i.e. the two terms have reduced to one term for which Theorem 3.3 then
applies. This case is not as far-fetched as one might believe. If, for example fuzzy c-means is used
to cluster the data for a variable into two clusters, then the MFs for the two clusters will be the
complements of one another. If we assign the linguistic terms Low and High to the two clusters
then we will have m
L
(x) = m
h
(x) and m
l
(x) = m
H
(x). Consequently, in this situation, what one
believes to be two terms per variable reduces to only one term per variable.
Once one realizes that, if x
1
= x
2
then H and L are indistinguishable from l and h, one would
redesign the MFs of H and L so that x
1
¹ x
2
, and this is easy to do.
m
L
(x) = m
h
(x)
62
The possible candidate causal combinations for two terms per variable are summarized in Table
3.7. After the MFs for L and H are established it is easy to find x
1
and x
2
. Those numbers
establish which row of Table 3.7 is activated, and only one row can be activated for each variable.
Of course, a different row may be activated for different variables.
Table 3.7
Possible candidate causal combinations for two terms per
variable
a
and three situations. Highlighted row indicates the
plausibly possible situation and its possible causal
combinations.
Situation lh lH Lh LH
Number of possible
combinations
x
1
< x
2
P P P 3
x
1
> x
2
P P P 3
x
1
= x
2
P P 2
a
P denotes a possible causal combination.
Example 3.7. For the Low MPG application (Examples 3.1, 3.2 and 3.5) two terms have been
assigned to each of three variables (n
v
= 3) . Originally, the number of postulated candidate
causal combinations was 2
6
= 64, whereas, according to Theorem 3.4 and Table 3.7, the number
of possible candidate causal combinations is at most 3
3
= 27. This is a 27 / 64 = 0.422 reduction
factor, which, although not huge, is notable.
Leaving the Low MPG application, assume next that two terms are assigned to each of six
variables ( n
v
= 6 ). Originally, the number of postulated candidate causal combinations was
2
12
= 4
6
= 4096,
whereas, according to Theorem 3.4 and Table 3.7, the number of possible
candidate causal combinations is at most
3
6
= 729. This is a
729 / 4096 = 0.178
reduction factor,
Even though Table 3.7 contains three situations, we believe that in practice only the situation in
the highlighted row, for x
1
< x
2
, will occur, because the MF for Low should be sufficiently to the
left of the MF for High. We therefore distinguish between causal combinations that are possible
versus those that are plausibly possible.
63
Corollary 3-4-1. Of the three situations that may occur when two terms are used for a variable,
as summarized in Table 3.7, only the situation when x
1
< x
2
is plausible, for which only the three
causal combinations lh, lH and Lh are plausibly possible.
No proof is required for this Corollary.
3.8.4 When each variable is described by three terms
It is quite common in engineering and computer science to want to use three terms per variable,
e.g. Low (L), Moderate (M) and High (H). In this case, the space of all possible causal conditions
is
S
C
= {C
iL
,C
iM
,C
iH
| "i = 1,...,n
v
} (3.49)
Now there are eight possible causal combinations for each variable v
i
, which we will immediately
express in terms of L, l, M, m, H and h, as:
S
CC
i
= {lmh,lmH,lMh,lMH,Lmh,LmH,LMh,LMH} (3.50)
Theorem 3.5 shows that if three terms are assigned to each variable, then all the 2
k
candidate
causal combinations are not possible, where k = 3n
v
.
Theorem 3.5 (Possible candidate causal combinations for three terms/variable): If three
terms are assigned to each variable v
i
[e.g., Low (L), Moderate (M) and High (H)], and the MFs
for those terms are a left shoulder, interior and right shoulder
1
, respectively, then all of the 2
k
(k = 3n
v
)
postulated candidate causal combinations in the space of all candidate causal
combinations,
F
S , are not possible. More specifically, let x
1
( x
2
) be the value of variable x
when m
L
(x) = m
l
(x)
( m
H
(x) = m
h
(x) ), and, x
3
and x
4
be the two intersection points of
m
M
(x) = m
m
(x) (see Fig. A.5 in Appendix A). Then the possible candidate causal combinations
depend on 23 relative locations of x
1
, x
2
, x
3
and x
4
, and are given in Table 3.8.
1
We assume that all MFs are convex. See footnote 16 for left and right shoulders. An interior MF is one that is first
monotonically non-decreasing and is then monotonically non-increasing.
64
Table 3.8
Possible candidate causal combinations for three terms per variable
a
and 29 situations in Table
3.9 simplified to 23 situations. Highlighted rows indicate the two plausibly possible situations
and their possible causal combinations.
Situation lmh lmH lMh lMH Lmh LmH LMh LMH
Number
of possible
combinations
x
1
< x
2
x
3
< x
4
£ x
1
< x
2
P P P P 4
x
3
< x
1
< x
4
< x
2
P P P P P 5
x
3
< x
1
< x
4
= x
2
P P P P 4
x
1
£ x
3
< x
4
< x
2
P P P P 4
x
1
< x
3
< x
4
£ x
2
P P P P 4
x
3
= x
1
< x
4
= x
2
P P P 3
x
3
< x
1
< x
2
< x
4
P P P P P 5
x
3
= x
1
< x
2
< x
4
P P P P 4
x
1
< x
3
< x
2
< x
4
P P P P P 5
x
1
< x
2
£ x
3
< x
4
P P P P 4
x
2
< x
1
x
3
< x
4
£ x
2
< x
1
P P P P 4
x
3
< x
2
< x
4
< x
1
P P P P P 5
x
3
< x
2
< x
4
= x
1
P P P P 4
x
2
£ x
3
< x
4
< x
1
P P P P 4
x
2
< x
3
< x
4
£ x
1
P P P P 4
x
3
= x
2
< x
4
= x
1
P P P 3
x
3
< x
2
< x
1
< x
4
P P P P P 5
x
3
= x
2
< x
1
< x
4
P P P P 4
x
2
< x
3
< x
1
< x
4
P P P P P 5
x
2
< x
1
£ x
3
< x
4
P P P P 4
x
1
= x
2
x
3
< x
4
£ x
1
P P P 3
x
3
< x
1
< x
4
P P P P 4
x
1
£ x
3
< x
4
P P P 3
a
P denotes that causal combination is possible to occur.
Proof: See Section A.9 in Appendix A.
Referring to Table 3.8, after the MFs for L, M and H are established, it is easy to find x
1
, x
2
,
x
3
and x
4
. Those numbers establish which row of Table 3.8 is activated, and only one row can be
activated for each variable. Of course, a different row may be activated for different variables.
65
Example 3.8. For the Low MPG application (Examples 3.1, 3.2, 3.5 and 3.7) assume that three
terms have been assigned to each of three variables (n
v
= 3) . Originally, the number of
postulated candidate causal combinations was 2
9
= 512, whereas, according to Theorem 3.5 and
Table 3.8, the number of possible candidate causal combinations is at most 5
3
= 125. This is a
125 / 512 = 0.244 reduction factor, which, although still not huge, is even more notable. Below,
when we introduce the concept of a plausible situation, we will see that only three or four causal
combinations are possible; so, for plausible situations, the actual number of possible candidate
causal combinations is at most 4
3
= 64 . This is a 64 / 512 = 0.125 reduction factor, which is
quite substantial.
Assume, next, as we did in Example 3.7, that three terms are assigned to each of six variables
(n
v
= 6) . Originally, the number of postulated candidate causal combinations was
8
6
= 2
18
= 262,144 ,
whereas, according to Theorem 5 and Table 3.8, the number of possible
candidate causal combinations is at most 5
6
= 15,625.
Although this is still a fairly large number
of candidate causal combinations it is a 15,625 / 262,144 = 0.059 reduction factor, which is very
substantial. For four plausible situations, the actual number of possible causal combinations is at
most 4
6
= 4096. This is a 4096 / 262,144 = 0.0156 reduction factor, which is enormous.
Even though Table 3.8 contains 23 situations, we believe that in practice only three situations
will occur, because not only should the MF for Low be sufficiently to the left of the MF for High,
but the MF for Moderate should be reasonably situated between the MFs of Low and High. We
therefore again distinguish between causal combinations that are possible versus those that are
plausibly possible.
Corollary 3-5-1. Of the 23 situations that may occur when three terms are used for a variable,
as summarized in Table 3.8, only three situations are plausible, namely
1
:
1
This is based on what to us is the very reasonable requirement (see Fig. A.6)
x
1
£ x
3
< x
4
£ x
2
.
66
1. x
1
£ x
3
< x
4
< x
2
and x
1
< x
3
< x
4
£ x
2
for which only the same four causal combinations lmh,
lmH, lMh and Lmh are plausibly possible.
2.
x
3
= x
1
< x
4
= x
2
for which only the three causal combinations lmH, lMh and Lmh are
plausibly possible.
It is interesting to observe that causal combinations lmH, lMh and Lmh are shared by all three of
these plausible situations.
3.8.5 Summary
In order to summarize the importance of Theorems 3.3–3.5 on computing the parsimonious
solutions using the QM algorithm, we remind the reader that: (1) if they are only using one term
per variable (Theorem 3.3), then all of the postulated candidate causal combinations exist (are
possible), and their obtained parsimonious solutions are correct; however, (2) if they are using
two (Theorem 3.4) or three (Theorem 3.5) terms per variable (or, even if only one variable uses
more than one term), then all of the
2
k
postulated candidate causal combinations do not occur
(i.e., many are impossible) and to use all of them to compute the parsimonious solutions is
incorrect. This new phenomenon is not related to limited diversity, because no case (or
substantive knowledge) can ever cause an impossible candidate causal combination to appear in
the fsQCA solution.
So, (3.44) must be modified to:
F
j
{ }
j=1
2
k
F
p
MPI
{ }
p=1
R
P
, F
p
MPI
= QM
MPI
S
F
A
present
S
F
(possible) - S
F
S
don't care
S
F
(impossible) absent
S
F
S
- S
F
A
absent
æ
è
ç
ç
ç
ç
ç
ç
ç
ç
ö
ø
÷
÷
÷
÷
÷
÷
÷
÷
(3.51)
where the elements in ()
F
possible S and
S
F
(impossible) can be constructed from our Tables 3.7
and 3.8.
67
In addition, in Step 8 of Fig. 3.2,
S
F
- S
F
S
must be replaced by ()
S
F
F
possible SS and
S
F
(impossible), and, in Step 10,
S
F
- S
F
I
must be replaced by
S
F
(possible)- S
F
I
. Similarly,
in the Step 6NEW block of Fig. 3.1 called “Make the Causal Combination a Remainder”
S
F
- S
F
S
should be replaced by ()
S
F
F
possible SS .
In Chapter 2 we had the diagram that is repeated below as Fig. 3.3. Whereas this diagram is
still correct when each variable uses only one term, it is incorrect when even one variable uses
more than one term. A corrected version of it appears in Fig. 3.4.
Fig. 3.3. fsQCA partitions the original 2
k
candidate causal combinations into three subsets.
Fig. 3.4. fsQCA partitions the original 2
k
candidate causal combinations into four subsets.
Finally, we wish to point out that all of the results in this section are valid for crisp sets as well
as for fuzzy sets. This follows from the fact that a crisp set is contained within the definition of a
fuzzy set.
68
Chapter 4
Challenges to Using fsQCA
4.1. Introduction
fsQCA’s steps, grouped here into three higher-level steps—Preparatory, Processing and
Summarization—are the basis for the rest of this chapter, and are:
1. Preparatory steps
a. Choose a desired outcome (e.g., Low MPG autos) and associated N cases (the autos that are associated
with low mpg autos).
b. Postulate k causal conditions (e.g., horsepower, weight, age, displacement, …). If a variable is described
by more than one term (e.g., Low Horsepower, Moderate Horsepower, High Horsepower) then treat each
term as an independent causal condition.
c. Treat the desired outcome and causal conditions as fuzzy sets, and determine membership functions (MFs)
for all of them.
d. Evaluate these MFs for all available cases, obtaining derived MFs.
2. Processing steps
2.1 Create 2
k
candidate causal combinations that are the candidate antecedents of the fsQCA if-then rules.
2.2 Compute the MF of each of these candidate causal combinations in all of the available cases, and keep
only the R
S
surviving causal combinations whose MF values are > 0.5, and that occur for N
F
i
cases,
where
i
F
Nf .
2.3 Compute the subsethoods (consistencies) of these R
S
surviving causal combinations, and keep only those
R
A
actual causal combinations whose subsethoods are ≥ 0.80 (a design parameter advocated by Ragin—it
can be modified by an end user).
2.4 Use the Quine-McClusky (QM) algorithm to obtain the R
C
complex solutions (summarizations) and the
R
P
parsimonious solutions.
2.5 Perform Counterfactual Analysis (CA) on the R
C
complex solutions, constrained by the R
P
parsimonious
solutions, to obtain the R
I
intermediate solutions.
69
2.6 Perform QM on the intermediate solutions to obtain the R
SI
simplified intermediate solutions.
2.7 Retain only those simplified intermediate solutions whose subsethoods are approximately ≥ 0.80, the R
BSI
believable simplified intermediate solutions.
3. Summarization steps
3.1 Connect each believable simplified intermediate solution with its best instances (cases).
3.2 Compute the coverage of each solution.
We summarized fsQCA mathematically in Chapter 2 and were then able to study some of its
key steps mathematically in Chapter 3 so as to better understand them and to even enhance them.
Some of those enhancements are used in this chapter, where each of the above 13 steps are
examined and challenges to their implementations are identified. Without an understanding of
the challenges as well as their solutions we do not believe it is possible for fsQCA to be applied
to engineering and computer science problems.
The rest of this chapter is organized as follows: Section 4.2 explains the challenges to fsQCA;
and Section 4.3 provides conclusion.
4.2. Challenges to fsQCA
In this section we establish which of the 13 fsQCA steps has a challenge and state the
challenge. There are nine such challenges.
4.2.1. Step 1.1: Desired outcome and cases
There is no challenge to choosing a desired outcome, i.e. one uses fsQCA to establish causal
combinations for a specific outcome that is the focus of a study, e.g., to establish the causal
combinations associated with: Breakdown of Democracy between World Wars 1 and 2 among 18
European countries [23, Ch. 5]; High Performance Firms for a set of 205 high technology firms
located in the United Kingdom [22]; High Fluid Production Rate for a group of 100 oil wells;
etc.
There is, however, a challenge to choosing the cases that are associated with the desired
outcome. In Ragin’s works, as well as others (e.g., [22], [23], [28]) cases are usually chosen
based on substantive knowledge about the specific problem, e.g. to establish the causal
combinations associated with Breakdown of Democracy between World Wars 1 and 2 among 18
70
European countries, the cases are those 18 countries; or, to establish the causal combinations
associated with High Performance Firms for a set of 205 high technology firms located in the
United Kingdom, the cases could be all 205 companies.; or, to establish the causal combinations
associated with High Fluid Production Rate for a group of 100 oil wells, the cases could be all
100 oil wells.
The last two examples raise an interesting question: Should one use all 205 companies or only
the subset of them that are in the set of High Performance Firms, and should one use all 100 oil
wells or only the subset of them that are in the set of High Fluid Production Rate oil wells?
According to Ragin [28]
The question “What is the case?” can have different answers in studies that might appear, at first glance, to
have identical casings. … In fact, the first step in much case-oriented inquiry is to identify the best possible
instances of the phenomenon to be explained and then study these instances in great depth. … casing is
outcome driven …. [This means that] … one can have different choices of cases for different kinds of studies,
ranging from a study for which there is only once case, to a study in which there are a set of cases for the same
outcome, to a study for which there are both negative and positive cases
1
for the same outcome, to a study that
uses the entire population (such a study seeks generalizations about the population).
If one chooses to use all of the available cases, then there is no challenge associated with Step
1.1; however, if one chooses to use only a subset of those cases that are “associated” with the
desired outcome then how to do this is a challenge. Of course, a related question is: Why not use
all of the cases? So our first challenge is:
Challenge #1: How does one determine the cases that are associated with the desired outcome
when one does not want to use all of the cases?
Solution #1: We have shown in Section 3.7 (on the obliteration of a rule) that cases with MF
values for the desired outcome that are equal to zero may obliterate an actual causal combination
by causing its consistency to become less than the consistency threshold. Consequently, cases
with zero output membership should be excluded from the fsQCA procedure for that desired
outcome, because they are not associated with that desired outcome at all. We use this rule to
choose the associated cases.
1
A positive case is any case that displays the outcome, whereas a negative case is any case that does not display the
outcome.
71
Let
All
Cases
S be the finite space of all appropriate cases. We choose only those cases whose
desired outcome MF is greater than zero (positive cases) as the appropriate cases in
{ 1,2,..., }
Cases
SN .
4.2.2. Step 1.2: Causal conditions
In this step one must postulate the k causal conditions. When only one term is used for each
variable (which is frequently done by Ragin and other social scientists
1
) k equals the number of
variables, and substantive (expert) knowledge is used to postulate those variables (causal
conditions), after which there is no challenge to this step. Although one could argue that
postulating these causal conditions is itself a challenge, we do not treat this as a challenge
because such substantive knowledge is an integral part of fsQCA, i.e. without it one cannot
perform fsQCA. The rest of fsQCA establishes if a postulated causal condition is an actual cause,
either individually or in combination with other causal conditions.
When, e.g., two or three terms are used to granulate a variable then a rule will usually involve
two or three terms—doublets and triplets—for the same variable that are connected using the
word AND, e.g., “Low Acceleration and
2
Not High Acceleration,” or “Not Low Acceleration and
Not High Acceleration.” Such doublets (or even more complicated triplets) are very difficult for
a person to understand, and, in our opinion, this defeats the purpose of a linguistic
summarization, which should be understandable to a person. Practitioner of fsQCA treat terms
related to a variable independently, but there is no evidence that support it. So our next
challenges is
Challenge #2: Given a vocabulary of terms for a variable, does one treat a subset of them
dependently or independently?
Solution #2: We show in the next chapter that how to treat terms related to a variable
dependently and solve most of the challenges when terms treated independently.
1
In a private correspondence to the second author, Ragin has indicated that he sometimes chooses two terms for a
variable. This is also evident in Chapter 11 of [21], co-authored by Fiss, who use Low and High in two terms for
Parental Income and a Test Score.
2
Recall that fsQCA uses both the causal condition and its complement during Processing Step 2.
72
4.2.3. Step 1.3: Membership functions
Ragin [26] has interesting general discussions about measuring membership functions (MFs);
however, it is in [23] where one finds the details about two methods for “calibrating fuzzy sets.”
In the direct method “… the researcher specifies the values of an interval scale that correspond to
the three qualitative breakpoints that structure a fuzzy set: full membership, full non-membership
and the crossover point. These three qualitative breakpoints are then used to transform the
original interval scale to fuzzy membership scores.” In the indirect method the researcher has to
provide a “qualitative assessment of the degree to which cases with given scores on the interval
scale are members of the target set. The researcher assigns each case into one of six categories
and then uses a simple estimation technique to rescale the original measure so that it conforms to
these qualitative assessments.”
In the next chapter, we will explain why the calibration method, or any of the other calibration
methods that are being used by fsQCA scholars, must be applied with great care in order for their
results to actually correspond to fuzzy sets, and that many times (depending, as explained below,
upon the wording of the causal condition) they do not lead to fuzzy sets at all, even though users
think they do, which calls into question the validity of fsQCA since it is built upon fuzzy sets.
Usually, the only information available about the variables is their measured values, but they
are only available for the given cases. So our third challenge is:
Challenge #3: Given a vocabulary of terms for a variable and measured values of the variable,
how does one determine (linguistically meaningful) MFs for all of the words in that vocabulary
from the data?
Solution #4: In the next chapter, we provide a new methodology for calibrating the fuzzy sets
that are used in fsQCA, one that is based on clearly distinguishing between a linguistic variable
and the linguistic terms for that variable. The result MF is for the linguistic variable, and is not
the MF of an ordinary FS but instead is the MF of a level 2 FS, one that has an S-shape, the kind
of shape that is so widely used by fsQCA scholars, and is so important to fsQCA.
73
4.3.4. Step 1.4: Derived membership functions
Once MFs have been found for all of the causal conditions and the desired outcome, MF values
are computed for each case. The cases are ordered and when the resulting MF values are put in
that same order the results are called “derived MFs.” Such MFs are what are used in the fsQCA
Processing Steps. There is no challenge to this step.
4.2.5. Step 2.1: Candidate causal combinations (rules)
fsQCA next postulates a set of
2
k
candidate causal combinations for a desired outcome (rules),
one rule for each possible causal combination of the k causal conditions or their complements.
Such a causal combination is expressed as the conjunction of k causal conditions or their
complements, where the conjunction is modeled using the minimum operator. Using the notation
advocated by Ragin, C
i
denotes the i
th
causal condition, c
i
denotes its complement, and
multiplication of conditions denotes conjunction, e.g.
C
1
and
c
2
= C
1
Ù c
2
= C
1
c
2
.
Fiss [22, p. 402] states: “… the first step [of fsQCA] is… to construct a data matrix known as a
truth table with
2
k
columns. Each column of this table is associated with a specific combination
of attributes [the causal combination], and the full table thus lists all possible combinations.”
Each row is related to a case. Enumerating all of the
2
k
causal combinations (used in Step 2.2) is
very tedious, and displaying such a table for even a modest number of cases is difficult-to-
impossible. So our fourth challenge is:
Challenge #4: Can one bypass having to enumerate all of the
2
k
causal combinations and
still complete Step 2.2?
Solution #4: Let
S
F
be the finite space of 2
k
candidate causal combinations, called firing
level fuzzy sets, F
i
, that are given in (1), where c
i
denotes the complement of C
i
( 1,...,2 and 1,..., )
k
j i k :
1 1 2
2
{ ,..., } ...
or
k
j j j
F j k
j
i i i
F F F A A A
A C c
S
(4.1)
74
One does not know ahead of time which of the 2
k
candidate causal combinations should
actually be used in a rule. Ragin saves only those combinations where MF value is > 0.5. Such a
causal combination is called a surviving causal combination. We proved in Chapter 3 that for
each case only one of the 2
k
candidate causal combinations has a MF value that is > 0.5, and a
simple formula for establishing the winning causal combination, j *(x) , from all causal
combinations, F
j
, ( j = 1,...,2
k
) is:
m
F
j*( x )
(x) = min max m
C
1
(x),m
c
1
(x)
( )
,...,max m
C
k
(x),m
c
k
(x)
( ) { }
(4.2)
F
j*(x)
(x)is determined from the right-hand side of (14), as:
11
*( )
*( ) *( )
1
( ) max ( ), ( ) ... max ( ), ( )
...
kk
j x C c C c
j x j x
k
F x arg x x arg x x
AA
(4.3)
In (3), argmax m
C
i
(x),m
c
i
(x)
( )
denotes the winner of max m
C
i
(x),m
c
i
(x)
( )
, namely C
i
or c
i
.
4.2.6. Step 2.2: Compute the surviving causal combinations
This step requires computing the MF values of the
2
k
causal combinations in all N cases, after
which only those R
S
causal combinations whose MF values are greater than 0.5 (for enough
cases) survive. These
2
k
N MF computations can be a bottleneck, especially for large N. So our
fifth challenge is:
Challenge #5: Can one compute the R
S
surviving causal conditions without having to compute
the MF values of the
2
k
causal combinations in all N cases?
Solution #5: Usually, not all of the N winning causal combinations are different, i.e., the same
winner can occur for more than one case. Consequently, after the winning causal combination is
found for each of the N cases, the R
S
uniquely different
F
j*
(x) are found; and, they are
relabeled F
l
S
.The procedure to compute the R
S
surviving causal combinations is:
1. Compute
*( )
()
jx
Fx using (4.3).
75
2. Find the
S
R uniquely different F
l
S
to establish the R
S
surviving causal combinations
F
l
S
(l = 1,..., R
S
), as:
()
S
lj
F F j l (4.4)
From the structure of
F
j*( x)
(x) in the second line of (4.3),
F
l
s
in (4.6), and (4.2), follows that:
F
l
S
(x) = A
1
l
Ù...Ù A
k
l
(4.5)
m
F
l
S
(x) = min max m
C
1
l
(x),m
c
1
l
(x)
( )
,...,max m
C
k
l
(x),m
c
k
l
(x)
( ) { }
(4.6)
where l = 1,..., R
S
and x =1,..., N .
4.2.7. Step 2.3: Compute actual causal combinations
This step is the first one where the MF of the desired output is used. A measure of subsethood
between each of the surviving causal combinations and the desired output, called consistency, is
computed, and only the causal combinations whose consistencies are greater than 0.80 are kept;
they are called the actual causal combinations, and there are R
A
of them.
Using 0.8 as consistency cut-off raises an objection because it is a somewhat arbitrary number
and any value within a limited range of values about it would probably also be acceptable. Ragin
[23, Ch. 5] advocates ‘‘. . . looking for gaps in the upper range of consistency [subsethood] that
might be useful for establishing a threshold, keeping in mind that it is always possible to
examine several different thresholds and assessing the consequences of lowering or raising the
consistency [subsethood] cut-off.’’ Therefore, how to select the consistency cut-off is not clear.
So our sixth challenge is:
Challenge #6: How should the consistency cut-off be selected?
Solution #6: Ragin uses 0.8 (or some other single number) as the consistency cut-off threshold,
to identify actual causal combinations. Of course, using the number 0.8 is somewhat arbitrary,
i.e., any value within a limited range of values about 0.8 ought to be acceptable. In this paper we
use a consistency band, namely [0.75, 0.80], instead of just a crisp threshold. By using this band
the actual causal combinations will still be those whose consistencies are ≥ 0.80, but now instead
76
of discarding all causal combinations whose consistencies are < 0.80 the ones whose
consistencies are in [0.75, 0.80) will be treated as new remainders and are referred to by us as
consistency-band remainders. We consider causal combinations in the consistency band as
remainders because their consistencies are not so much less than 0.80 that they can be
confidently discarded, nor are they ≥ 0.80 so that they can be treated as actual causal
combinations
1
. Using a consistency band increases the robustness of solutions because instead of
discarding all causal combinations in [0.75, 0.80), they will now play a role in computing the
parsimonious solutions in Step 2.4.
4.2.8. Step 2.4: Compute the complex and parsimonious solutions
In this step the R
A
actual causal combinations are unioned (since they are all paths to the same
desired outcome), and then simplified in two different ways using the Quine McCluskey (QM)
algorithm. Both ways treat the causal conditions in the R
A
actual causal combinations as crisp
sets so that crisp set reduction techniques can be used. The rationale for doing this is that at this
point in fsQCA one truly believes that the R
A
actual causal combinations explain the desired
outcome; but, since they are connected by the word “OR” logical simplifications are possible.
The QM algorithm is used to minimize Boolean functions (see, e.g., Appendix B). It is difficult
to write code to implement the QM algorithm, so we have relied on third-party code that is
available from the Internet. Ragin has software for fsQCA that includes the QM algorithm; it can
be accessed at:
http://www.u.arizona.edu/~cragin/fsQCA/software.shtml.
We have used Espresso that can be accessed at:
www.dei.isep.ipp.pt/~acc/bfunc/.
The first application of the QM algorithm finds the prime implicants, each one of which is a
combination of primitive Boolean expressions that differ in only one cause and have the same
output as all of the others. Ragin refers to each prime implicant as a complex solution. There is
no challenge to finding the complex solutions because this application of the QM algorithm only
involves using the R_A actual causal combinations, and R_A is quite small as compared to the
2^k original number of candidate causal combinations.
The second application of the QM algorithm finds the minimal prime implicants that cover as
many of the primitive Boolean expressions as possible with a logically minimal number of prime
implicants. Ragin refers to each minimal prime implicant as a parsimonious solution. Finding the
minimal prime implicants not only involves using the R_A actual causal combinations, but also
involves using remainders, i.e. those causal combinations that are in the subset of the 2^k original
candidate causal combinations less the R_S causal combinations that passed the frequency
threshold test in Step 2.1. Because R_S is usually quite small as compared to 2^k, for the rest of
this discussion we approximate 2^k - R_S by 2^k.
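The two applications of the QM algorithm can be illustrated with SymPy's SOPform, a QM-style
Boolean minimizer (this is only an illustration of the idea, not the Espresso software that we
actually used; the three causal conditions A, B, C and the specific minterms are hypothetical):

    from sympy import symbols
    from sympy.logic import SOPform

    A, B, C = symbols('A B C')

    # Actual causal combinations as minterms over (A, B, C):
    # 1 = condition present, 0 = its complement; e.g. [1, 0, 1] is AbC.
    actual = [[1, 0, 1], [1, 1, 1]]

    # Complex solution: minimize using only the actual causal combinations.
    complex_solution = SOPform([A, B, C], actual)                   # -> A & C

    # Parsimonious solution: remainders are supplied as don't-cares,
    # which permits further simplification.
    remainders = [[0, 0, 1], [0, 1, 1]]
    parsimonious_solution = SOPform([A, B, C], actual, remainders)  # -> C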
It is in computing the minimal prime implicants that one runs into a bottleneck. Ragin's
software can only find them for k ≤ 11, whereas Espresso can only find them for k ≤ 15. Recall
that if there are V variables, each described by n_v terms (v = 1, ..., V), there will be
k = n_1 + n_2 + ... + n_V causal conditions if the terms are treated independently. If all n_v
equal 1, then this sum equals V, so that Ragin's software can be used for as many as 11 variables,
and Espresso can be used for as many as 15 variables. So far there is no problem because in most
linguistic summarization applications 11 to 15 variables (or even fewer than 11) are quite
adequate.
Suppose next that n_v = 2 for all V variables; then k = 2V, so that Ragin's software can only be
used for as many as five variables, and Espresso can only be used for as many as seven variables.
Now there could be a problem if there are more than seven variables. For linguistic
summarization, many applications do not have more than seven variables, so for those
applications there still will not be a problem.
Finally, suppose that n_v = 3 for all V variables; then k = 3V, so that Ragin's software can
only be used for as many as three variables, and Espresso can only be used for as many as five
variables. Now there definitely is a problem because many, if not most, applications have more
than five variables. So, our seventh challenge is:
Challenge #7: How should multiple terms for a variable be handled so that one is able to
compute the parsimonious solutions?
Solution #7: We will explain in the next chapter that terms related to a variable should be
treated dependently. We provide a new way to find FOUs for variables that can solve this
problem.
4.2.9. Step 2.5: Perform counterfactual analysis so as to obtain the intermediate solutions
As stated in Chapter 2:
Counterfactual analysis (CA) begins with both the complex and parsimonious solutions and modifies the
complex solutions subject to the constraint that a parsimonious solution term must always be present (in some
form) in the final intermediate solutions. The modifications use causal combinations for which there either were
no cases or not enough cases, and require that the user bring a lot of substantive knowledge about the cases into
the modification process. Each modified complex solution is called a counterfactual, and each counterfactual is
usually less complex in its structure than is the complex solution, unless the complex term does not change as a
result of CA, in which case it becomes the counterfactual. Once all of the counterfactuals have been obtained
for all of the complex terms, they are combined using the set-theory operation union. This result is called the
(set of) intermediate solutions, and it contains R_I terms.
CA [23], [27] offers a way to overcome the limitations of a lack of empirical instances, i.e. the problem of
limited diversity, and involves thought experiments in which the substantive knowledge of a domain expert is
used. Diversity refers to whether or not a case actually exists for a particular combination of causal conditions.
In most applications it is very common for no cases to exist for many combinations of causal conditions, and
this is referred to as 'limited diversity'.
In the thought experiments one asks: Based on my expert knowledge, (1) Do I believe that C_i strongly
influences the desired output? If the answer is YES, then stop, and C_i is put on the list of substantive
knowledge. On the other hand, if the answer is NO or DON'T KNOW, then one asks: (2) Is it, instead, c_i that
strongly influences the desired output? If the answer is YES, then c_i is put on the list of substantive
knowledge. If the answer is NO or DON'T KNOW, then neither C_i nor c_i is put on the list of substantive
knowledge, i.e. the substantive knowledge is silent about the causal condition or its complement.
If “expert knowledge” is available so that these two questions can be answered then there is no
challenge to this step; however, many times the “expert knowledge” needed to answer these
questions is not available, because the user of fsQCA does not have it. The only information
available to such a user about the variables is, as mentioned above, their measured values, but
only for the given cases. So, our eighth challenge is:
Challenge #8: How can substantive knowledge be extracted from the measured values of the
variables so that the above two knowledge questions can be answered, after which CA can be
performed?
Solution #8: In many applications an expert is not available or is unable to provide the
substantive knowledge that is needed in order to perform CA; however, data are available about
the causal conditions and the desired output. So, we create substantive knowledge directly from
this data. We call this data-based substantive knowledge (DBSK).
DBSK establishes whether each causal condition or its complement (by itself) implies the
desired outcome, i.e., whether a causal condition is itself sufficient for the desired outcome. The three
steps of DBSK are (for i = 1, ..., k):
1) If the consistency (subsethood) of C_i in the desired output is ≥ 0.75 (we use the lower end of
the consistency band as the consistency threshold in DBSK), i.e.,

$ss_K(C_i, O) = \dfrac{\sum_{x=1}^{N} \min\left(\mu_{C_i}(x), \mu_O(x)\right)}{\sum_{x=1}^{N} \mu_{C_i}(x)} \ge 0.75$   (4.7)

then C_i is put on the list of DBSK.
2) If the consistency of c_i in the desired output is ≥ 0.75, i.e.,

$ss_K(c_i, O) = \dfrac{\sum_{x=1}^{N} \min\left(\mu_{c_i}(x), \mu_O(x)\right)}{\sum_{x=1}^{N} \mu_{c_i}(x)} \ge 0.75$   (4.8)

then c_i is put on the list of DBSK.
3) If the consistency of neither C_i nor c_i in the desired outcome is ≥ 0.75, then DBSK is
unknown about the causal condition or its complement.
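A minimal sketch of DBSK, assuming that the derived MF values of each causal condition and
of the desired outcome over the N cases are stored as NumPy arrays, is:

    import numpy as np

    def dbsk(mf_conditions, mf_outcome, threshold=0.75):
        # mf_conditions: dict mapping the name of C_i to its MF values;
        # the complement c_i is computed as 1 - MF(C_i).
        knowledge = {}
        for name, m_C in mf_conditions.items():
            m_c = 1.0 - m_C
            ss_C = np.minimum(m_C, mf_outcome).sum() / m_C.sum()   # (4.7)
            ss_c = np.minimum(m_c, mf_outcome).sum() / m_c.sum()   # (4.8)
            if ss_C >= threshold:
                knowledge[name] = name          # C_i goes on the DBSK list
            elif ss_c >= threshold:
                knowledge[name] = '~' + name    # c_i goes on the DBSK list
            else:
                knowledge[name] = None          # DBSK is silent about this one
        return knowledge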
4.2.10. Step 2.6: Compute the simplified intermediate solutions
As stated in Chapter 2, "Because CA leads to a new set of solutions, it is possible that their
union can be simplified. This is accomplished by subjecting the [R_I] intermediate solutions to
the QM algorithm in which all remainders are set to absent." This application of the QM
algorithm, leading to the R_SI simplified intermediate solutions, is similar to computing the
complex solutions, as explained above in Section 4.2.8; hence, there is no challenge to this step
because it only involves using the R_I intermediate solutions, where R_I is even smaller than R_A.
4.2.11. Step 2.7: Compute the believable simplified intermediate solutions
There is no challenge in this step to compute consistencies. The R_SI simplified intermediate
solutions whose consistencies are greater than the consistency cut-off threshold constitute the
R_BSI believable simplified intermediate solutions.
4.2.12. Step 3.1: Find the best instances (cases) for each believable simplified intermediate solution
The method for doing this is explained in Chapter 2. Although it involves some tests, there is
no challenge to this step, because R_BSI is rather small.
4.2.13. Step 3.2: Compute the coverage of each solution
As stated in Chapter 2: "Coverage is the assessment of the way respective terms in the
believable simplified intermediate solution cover observed cases." There can be different kinds
of coverage: solution coverage, the proportion of cases that are simultaneously covered by all of
the terms (combined by the union); raw coverage, the proportion of cases that are covered by
each term one at a time; and unique coverage, the proportion of cases that are covered by a
specific term and by no other term. The specific formula for each kind of coverage is simple to
implement; hence, there is no challenge to this step.
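For the reader who wants to implement them, minimal sketches of the three coverage measures
(ours, assuming at least two solution terms, with derived MF values stored as NumPy arrays) are:

    import numpy as np

    def raw_coverage(m_term, m_outcome):
        # Proportion of the outcome covered by a single solution term.
        return np.minimum(m_term, m_outcome).sum() / m_outcome.sum()

    def solution_coverage(term_mfs, m_outcome):
        # Coverage of the union (pointwise max) of all solution terms.
        union = np.max(np.vstack(term_mfs), axis=0)
        return raw_coverage(union, m_outcome)

    def unique_coverage(term_mfs, m_outcome, i):
        # Coverage attributable to term i alone: the solution coverage
        # minus the coverage of the union of all of the other terms.
        others = [m for j, m in enumerate(term_mfs) if j != i]
        return solution_coverage(term_mfs, m_outcome) - solution_coverage(others, m_outcome)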
4.3. Conclusion
In this chapter, the challenges to using fsQCA were stated and solutions to these challenges
were provided. These solutions need to be implemented by engineers and computer scientists who
want to apply fsQCA to some applications.
More details about solutions to some of the challenges are provided in Chapter 5. The solutions
we have provided for the eight challenges are not necessarily unique, and research is continuing
on improving some of them. To us, arguably the most controversial assumption that is made in
fsQCA is to treat multiple terms for the same variable as independent causal conditions. Doing
this can overload the QM algorithm software that is used to find the parsimonious solutions,
leads to linguistically complicated causal combinations that contain doublets (triplets) that then
need to be simplified, and leads to possible and impossible causal combinations. In the next
chapter, we provide a new way to establish membership functions for linguistic variables by
distinguishing MFs for variables from MFs for terms. The new calibration method solves most of
the challenges faced when using fsQCA.
Chapter 5
A New Methodology for Calibrating Fuzzy Sets in fsQCA Using Level 2 and Interval Type-2 Fuzzy Sets
(This chapter is a duplicate of our INS paper [128].)
5.1. Introduction
The steps of fsQCA are summarized and explained in Chapter 2, and it is not necessary for the
reader to understand all of them to read the present chapter. The first four steps are very easy to
understand and are: 1) Choose a desired outcome and associated cases; 2) Postulate k causal
conditions; 3) Treat the desired outcome and causal conditions as fuzzy sets, and determine
membership functions (MFs) for all of them; and, 4) Evaluate these MFs for all available cases,
obtaining derived MFs. This chapter focuses mainly on steps 3 and 4, which together we refer to
as “calibrating the fuzzy sets.” The rest of fsQCA does not change.
The main calibration method that is presently used is Ragin's direct method, about which
Ragin [23, Ch. 5] states (other methods can also be used for calibration, including Ragin's
indirect method [31], but, regardless of which method is used, the criticism that is given below
applies to all of them):
Fuzzy sets are calibrated using external criteria, which in turn must follow from and conform to the researcher's
conceptualization, definition, and labeling of the set in question. Using the ... direct method, the researcher
specifies the values of an interval scale that corresponds to the three qualitative breakpoints that structure a
fuzzy set: full membership, full non-membership, and the crossover point. These three benchmarks are then
used to transform the original interval-scale values to fuzzy membership scores. ... The end product of this
method is the fine-grained calibration of the degree of membership of cases in sets, with scores ranging from
0.0 to 1.0.
The direct method is not limited to three breakpoints, e.g. [26, p. 156] explains it for three, five
and seven breakpoints, [21, p. 88] explains it for seven breakpoints, and [23] explains it for ten
breakpoints; however, most researchers only use three breakpoints because it becomes more and
more difficult for a person to specify five, seven or ten numerical breakpoints. Equivalent terms
for crossover point [21, p. 88] are neither in nor out [23, p. 29] and not fully out nor fully in
[26, p. 156].
In this chapter we will explain why this calibration method, or any of the other calibration
methods that are being used by fsQCA scholars, must be applied with great care in order for their
results to actually correspond to fuzzy sets, and that many times (depending, as explained below,
upon the wording of the causal condition) they do not lead to fuzzy sets at all, even though users
think they do, which calls into question the validity of fsQCA since it is built upon fuzzy sets.
(We were inspired to reach this conclusion by the criticism of Ragin's calibration methods that is
in [129].) In order to understand this strong (some might even say shocking) criticism better, we
first return to some history and definitions about fuzzy sets and linguistic variables, much of
which is well known to fuzzy set scholars but may not be as well known to scholars in the
fsQCA community.
5.1.1 Fuzzy Sets and Linguistic Variables
In 1965, when Zadeh introduced a fuzzy set [37] it was a new and novel mathematical
construct in which an object could simultaneously reside in more than one set but to different
degrees of membership; however, such a set was not yet connected to what he later called a
linguistic variable, something that occurred only around 1971 in [38], [39] (and then in the more
widely referenced [11], [40]). Because the distinction between a linguistic variable and a fuzzy
set is crucial to understanding our criticism, a few definitions are presented next.
Definition 5.1. A fuzzy set (FS) A is comprised of a domain X of real numbers (also called the
universe of discourse of A) together with a membership function (MF) $\mu_A : X \to [0,1]$. For each
$x \in X$, the value of $\mu_A(x)$ is the degree of membership, or membership grade, of x in A. If
$\mu_A(x) = 1$ or $\mu_A(x) = 0$ for every $x \in X$, then the FS A is said to be a crisp set.
In this chapter we shall refer to A as a type-1 FS, because later we shall also use interval type-2
FSs (IT2 FSs) for which MF values are no longer single numbers but instead are intervals of
numbers that allow for MF uncertainties.
Definition 5.2. If a variable can take words in natural languages as its values, it is called a
linguistic variable, where the words are characterized by FSs defined in the universe of discourse
in which the variable is defined [41].
Definition 5.3. Each linguistic variable [11], [40], [42] is fully characterized by a quintuple
(v, T, X, g, m) in which v is the name of the variable, T is the set of linguistic terms of v that
refer to a base variable whose values range over the universal set X, g is a syntactic rule for
generating linguistic terms, and m is a semantic rule that assigns to each linguistic term $t \in T$ its
meaning, m(t), which is a fuzzy set on X, that is, $m : T \to F(X)$, where F(X) denotes the set
of fuzzy sets of X, one fuzzy set for each $t \in T$. It is common to refer to v as the linguistic
variable. (Although "term" means one or more words, it is quite common in the FS literature to
see "word" used instead of "term," even when a term includes more than one word. In this
chapter, we also interchange "term" and "word.")
Example 5.1. Some examples of linguistic variables, v, are: Developed Country, Urban Country,
Industrial Country, Stable Country, Profitable Company, Institutional Veto Points, All-day
School Systems, Horsepower, Acceleration, Production Rate, etc. Some examples of the set of
linguistic terms, T, for these linguistic variables are (because some of the linguistic terms may be
so similar to each other, it may not be necessary to use all of them; one usually chooses the
linguistic terms so that their membership functions overlap and cover X):
1. For Developed, Urban, Industrial, or Stable (Country), and Profitable (Company), T =
{Barely, Hardly, Somewhat, Moderately, Fully, Extremely}
2. For Institutional Veto Points and All-day School Systems, T = {None to Very Few, Some,
A Moderate Amount of, Many, A Large Number of, A Very Large Number of}
3. For Horsepower, Acceleration and Production Rate, T = {Very Low, Low, Moderate, High,
Very High}
Observe that the linguistic terms must make linguistic sense for their linguistic variable, which is
where g in Definition 5.3 comes into play; so, for example, Somewhat Acceleration makes no
linguistic sense nor does Very High All-day School Systems. Note, also, that it is the elements of
T that are treated as fuzzy sets, and, of course, each of these fuzzy sets is described by a MF.
5.1.2 fsQCA Criticism
The concepts of linguistic variable and its linguistic terms seem to be missing explicitly in the
fsQCA literature. (There is no reference to [11], [40], [130] in Ragin's books [21], [23], [28],
[126] or in the recent book [32]. Interestingly, though, the examples in [32] use one linguistic
term as a causal condition for each linguistic variable, whereas the examples in Ragin's just-
referenced books do not use linguistic terms, but instead use linguistic variables. Recently,
however, Ragin stated: "Sets and set labels almost always involve adjectives in some way. For
example, it is possible to assess degree of membership in heavy; it is not possible to assess
degree of membership in weight. This aspect of sets is fundamental to set-theoretic analysis.")
This is of crucial importance because it is the linguistic terms that must be calibrated and not the
linguistic variable. Unfortunately, both of Ragin's calibration methods [21] (the direct and
indirect methods) are widely used by many fsQCA adherents to calibrate the linguistic variable.
Fuzzy sets are meant for linguistic terms of a linguistic variable that are naturally ordered.
Examples of variables that are (are not) naturally ordered are temperature, pressure, height,
profit, etc. (beautiful, ill, happy, etc.). It should be clear from the given examples that variables
that are "naturally ordered" means variables whose domains are naturally ordered sets, i.e. sets
that are equipped with a linear order relation (a binary relation that is transitive, anti-symmetric,
and total). An example of such a set is the set of real numbers. Ragin's fuzzy-set scoring
methods are fuzzy methods only when they are applied to the linguistic terms of naturally
ordered linguistic variables.
fsQCA frequently requires a MF for a naturally ordered linguistic variable. In this chapter we
explain how such a MF can be obtained, even though at this point this sounds contradictory to
what we have just explained about MFs being for linguistic terms.
5.1.3 Size of a Vocabulary and Calibration
Importantly, the size of the vocabulary (i.e., the number of linguistic terms in T) for a linguistic
variable may affect the calibration of the fuzzy sets. If, for example, only two linguistic terms are
used to describe Profitable, namely Hardly Profitable and Fully Profitable, then their fuzzy sets
will look very different from their fuzzy sets when all of the Example 5.1 terms are used for
Profitable, because the term Barely Profitable now appears before Hardly Profitable, and the
term Extremely Profitable now appears after Fully Profitable. Except for [31], there does not
seem to be any discussion about this in the fsQCA literature.
5.1.4 When Present Implementations of fsQCA are Correct
As long as the causal conditions and desired outcome in fsQCA are treated properly as
linguistic terms (and not as linguistic variables), the size of the vocabulary for a linguistic
variable is known during calibration, and it is the linguistic terms that are calibrated, then
implementing present versions of fsQCA for an application is okay.
Note that Ragin’s direct method may still be used to calibrate the most left- and right-sounding
linguistic terms, such as Very Low and Very High, because those membership functions are
1
S-
shaped (Fig. 5.1) and can therefore be constructed using three (or more) breakpoints; however, it
cannot be used to directly calibrate the linguistic terms that lie between Very Low and Very High,
because those terms are not modeled using S-shaped membership functions (see [31, p. 41]).
Fig. 5.1. A representative S-shaped MF. Note that 0%, 50% and 100% denote the locations of x that indicate fully-
out, neither in nor out (crossover point) and fully-in membership, respectively; and, that sometimes [21] the 0% and
100% breakpoints are located at membership grades of 0.05 and 0.95, respectively, when a log-odds function is used
to mathematically describe the MF.
5.1.5 Robustness to Measurement Errors and Calibration
The robustness of fsQCA to measurement errors and to the entire calibration process are
serious concerns to fsQCA practitioners. Presently, fsQCA makes no direct allowances for
measurement errors or for preciseness of the choices for MF breakpoints. Some excellent
discussions about robustness are in [32, pp. 284–295]. We also address robustness in this
chapter.
5.1.6 Organization of Rest of This Chapter
The rest of this chapter is organized as follows: Section 5.2 describes a new calibration
methodology, i.e. a new way of collecting data from one or more subjects about the linguistic
terms that are associated with a linguistic variable and mapping that data into interval type-2
fuzzy sets; Section 5.3 explains the importance of the S-shaped MF to fsQCA, introduces a level
2 FS as a FS model for a linguistic variable, and provides a methodology for obtaining a kind of
S-shaped (type-1 or interval-valued) level 2 MF for a linguistic variable; Section 5.4 re-examines
Ragin’s well-studied Breakdown of Democracy example using new data provided to us by Prof.
Ragin; Section 5.5 examines the robustness of fsQCA to different kinds of uncertainties; Section
5.6 explains how more precise statements of fsQCA causal combinations can be obtained as a
byproduct of our new calibration methodology; Section 5.7 provides some discussions about
how using the S-shaped MFs for the RI L2 FSs obtained in this chapter overcomes past
challenges to using fsQCA; and, Section 5.8 draws conclusions and proposes some future
research directions.
5.2. A New Way to Calibrate Fuzzy Sets for fsQCA
5.2.1. Introduction
We require that each linguistic variable used in fsQCA have a vocabulary of linguistic terms
assigned to it by one or more experts (practitioners), where the linguistic terms fit a naturally
ordered scale, and there can be as many linguistic terms for each linguistic variable as desired.
The size of the vocabulary must be made known to the subjects (or single subject) during the
collection of the data from them during the calibration process.
We strongly believe that a FS model for a linguistic term must capture the linguistic
uncertainties of that term, so that those uncertainties can flow through fsQCA computations, just
as unpredictable uncertainties, that are modeled using probability, flow through probability
computations.
There are two kinds of uncertainties associated with a linguistic term (e.g., [17], [43]): (1) intra-
uncertainty, the uncertainty that each individual has about the linguistic term; and (2) inter-
uncertainty, the uncertainty that a group of individuals has about the linguistic term. These
uncertainties are associated with the maxim (e.g. [43]-[46]) words mean different things to
different people, which has been the rationale for modeling a linguistic term as an IT2 FS rather
than as a T1 FS.
5.2.2 IT2 FSs
Because IT2 FSs may be unfamiliar to fsQCA scholars, we first present a few definitions and
some discussions about them [17], [46], [47]. Readers who are already familiar with IT2 FSs can
skip this section.
Definition 5.4. An IT2 FS, denoted $\tilde{A}$, is characterized by an IT2 MF $\mu_{\tilde{A}} : X \to D[0,1]$, where
D[0,1] is the set of closed subintervals of [0,1], i.e.

$D[0,1] = \{\,[a,b] : 0 \le a \le b \le 1\,\}$   (5.1)

x and u are called the primary and secondary variables of $\tilde{A}$, respectively, and $\mu_{\tilde{A}}(x,u)$ is a 3D
MF. The numerical value of $\mu_{\tilde{A}}(x,u)$ is called the secondary grade of $\tilde{A}$; however, because the
secondary grade of an IT2 FS is always 1, it plays no role of importance for such a T2 FS. (In a
general T2 FS (GT2 FS), $\mu_{\tilde{A}}(x,u)$ can be different for $x \in X$ and $u \in [0,1]$, in which case the third
dimension of a GT2 FS is very important. Other kinds of IT2 FSs are described in [131]; our kind
of IT2 FS is the same as an interval-valued FS (IV FS) [131], i.e. IT2 FS = IV FS, and in order to
preserve the connection to the already very large literature about Definition 5.4 IT2 FSs, we
continue to refer to such FSs in this paper as IT2 FSs.)
Definition 5.5. The footprint of uncertainty of IT2 FS $\tilde{A}$, $\mathrm{FOU}(\tilde{A})$, is:

$\mathrm{FOU}(\tilde{A}) = \bigcup_{x \in X} J_x$   (5.2)

It is a closed region that is bounded from below by the lower MF of $\tilde{A}$ [$\mathrm{LMF}(\tilde{A}) \equiv \underline{\mu}_{\tilde{A}}(x)$] and
from above by the upper MF of $\tilde{A}$ [$\mathrm{UMF}(\tilde{A}) \equiv \overline{\mu}_{\tilde{A}}(x)$], where [48]

$\mathrm{LMF}(\tilde{A}) \equiv \underline{\mu}_{\tilde{A}}(x) = \inf\{u \mid u \in J_x\} \quad \forall x \in X$   (5.3)

$\mathrm{UMF}(\tilde{A}) \equiv \overline{\mu}_{\tilde{A}}(x) = \sup\{u \mid u \in J_x\} \quad \forall x \in X$   (5.4)
Definition 5.6. At each $x \in X$, $J_x$ is called the primary membership of $\tilde{A}$, where

$J_x = \left[\underline{\mu}_{\tilde{A}}(x), \overline{\mu}_{\tilde{A}}(x)\right]$   (5.5)

$\mathrm{FOU}(\tilde{A})$ can be expressed as the set-theoretic union of $J_x$ over $x \in X$.
Example 5.2. Fig. 5.2 depicts the MFs for crisp sets and T1 FSs, and the FOUs for IT2 FSs, for
three linguistic terms associated with the variable Literacy: Low Literacy (L), Moderate
Literacy (M) and High Literacy (H). The MFs (FOUs) for L and H are called shoulder MFs
(FOUs), whereas the MF for M is called an interior MF (FOU). Sometimes the FOU of each IT2
FS is explained as a blurred version of the MF of each of its respective T1 FSs. Observe in Fig.
5.2c, for an IT2 FS, that at each x its primary memberships may be single points (the flat spots at
which the membership grades equal 1), indicating no uncertainty about them, or an interval of
values, indicating uncertainty about them. Fig. 5.3 illustrates $\mathrm{LMF}(\tilde{A})$, $\mathrm{UMF}(\tilde{A})$, $\mathrm{FOU}(\tilde{A})$ and
$J_x$ for an interior FOU.
Fig. 5.2. (a) Crisp MF, (b) type-1 MF and (c) FOUs for the linguistic variable Literacy that is described by three
terms, Low Literacy (L), Moderate Literacy (M) and High Literacy (H).
Fig. 5.3. IT2 FS $\tilde{A}$, along with its lower and upper MFs, FOU and a primary membership $J_{x'}$. The flat spot where
u = 1 is shared by both the lower and upper MFs.
5.2.3 IT2 FS as a Model for a Linguistic Term
It is said (e.g., [17], [43]) that an IT2 FS is a first-order uncertainty model for a linguistic term,
because its grade of membership is the same for the entire FOU. (A GT2 FS would be called a
second-order uncertainty model for a word because its grades of membership vary over its FOU.
As of the year 2015, it is not known what data to collect from a group of subjects in order to
model a linguistic term as a GT2 FS; e.g., see [132] for a way to model hedged words using GT2
FSs.) As is explained below, data that are collected from one or more subjects about a linguistic
term will be mapped into an IT2 FS, such as any of the ones that are, e.g., shown in Fig. 5.4 for
five generic linguistic terms (words): Very Low, Low, Moderate, High and Very High. Observe
that there are only three kinds of FOUs: Left-shoulder (W_1), Interior (W_2, W_3 and W_4) and
Right-shoulder (W_5). The flat spots are associated with the portions of the collected data that are
in total agreement across all of the subjects. Note that, in general, there is more linguistic
uncertainty about a word that is located near the middle of a scale than at the two ends of the
scale, and that uncertainty is usually asymmetric.
Fig. 5.4. IT2 FS models for five linguistic terms W_1-W_5. Here x has been normalized to [0, 1], but in general such
normalization is unnecessary.
Even though a linguistic variable may reside on a natural scale, sometimes that scale is
normalized to [0, 1] or [0, 10] without any loss of generality. Because the normalization constant
is known for each linguistic variable, it is straightforward to go back and forth between the
normalized and un-normalized scales if one wishes to do this.
5.2.4 Mapping Word Data into an FOU
A number of methods have been published on how to map data collected from a group of
subjects into the FOU of a word. The Interval Approach (IA) [17, Ch. 3], [49] was the first such
method, but it was replaced by the Enhanced Interval Approach (EIA) [50], which makes use of
more information from the collected data than does the IA, and also corrects a small problem
about the way in which the parameters of the LMF of an FOU are computed. More recently, the
HM method was developed [51], [52]; it makes use of even more information from the collected
data than does the EIA. The FOUs that are found from the HM method look like the ones in Fig.
5.4; but, the FOUs that are found from the EIA (or IA) approach do not look like those FOUs,
because the EIA does not make explicit use of a data region about which there is total agreement
across all of the subjects. Consequently, only the HM method is used in this chapter; it is
reviewed in Appendix E. (Another approach for obtaining an IT2 FS word model from the same
kind of data that are used by the IA and EIA is [133]; it is based on calculating the median
boundaries of the range of membership functions associated with the words.)
The IA, EIA and HM method were all developed assuming that data can be collected from a
group of subjects, and that such a group is available. Sometimes such a group is not available
and only one knowledgeable expert is available, e.g. Prof. Ragin. Mendel and Wu [53] have
presented a different way to collect data from one person (it can also be used by a group of
subjects), after which that data can be used to generate even more data for what might be called a
“virtual group of subjects.” The HM method is then directly applied to that larger set of data. The
way for going from data that are collected from a single subject to using that data in the HM
method is also reviewed in Appendix E.
5.2.5 Data Collection and Calibration
It is very important that methods for collecting data from a group of subjects or even from an
individual should not introduce methodological uncertainties into the data collection procedure.
Most people do not know what a fuzzy set is, and so a method that asks an individual to provide
a membership function (or FOU) for a linguistic term has methodological uncertainty associated
with it that becomes co-mingled with the linguistic term’s uncertainty, and the two kinds of
uncertainty cannot be separated. Consequently, we do not advocate asking subjects to provide
MFs or FOUs.
When data are collected from a group of n subjects, they are asked a question like [17], [49]-
[52]: Suppose that a word (we do not use "linguistic term" because most people are unsure about
what this means) can be located on a scale of l to r, and you want to locate the end-points of the
interval that you associate with the word on that scale. On the scale of l to r, where would you
locate the end-points of the interval that you associate with the word? We have administered this
kind of survey many times and found that most people have no trouble in answering this
question. For each word, the i-th subject provides interval endpoints $a^{(i)}$ and $b^{(i)}$, and the group
of n subjects provides $\{[a^{(i)}, b^{(i)}]\}_{i=1}^{n}$.
Example 5.3. In [51] the HM method was applied to data collected (on a scale [l, r] = [0, 10])
from 175 subjects for 32 generic words (i.e., words not attached to any specific context) ranging
from Teeny-weeny to Maximum amount, including the words Low, Moderate, and High. The HM
FOUs for Low, Moderate, and High are depicted in Fig. 5.5.
When data are collected from a single subject, they are asked two similar questions like [54]:
Suppose that a word can be located on a scale of l to r, and you want to locate the end-points of
the interval that you associate with the word on that scale, but you are unsure of these two end-
points: (Q1) [(Q2)] On the scale of l to r, what are the endpoints of an interval of numbers that
you associate with the left [right] end-point of the word? (In addition, we instruct the subject
that if you are absolutely certain about the two end-points then you do not have to provide
end-point intervals, you only have to provide the left and right end-points; and, we instruct the
subject not to overlap the left and right end-point intervals.) A single subject provides
[a_L, b_L] and [a_R, b_R] for each linguistic term.
Fig. 5.5. Low, Moderate, and High FOUs obtained from the HM method when the data were obtained from a group
of 175 subjects.
Example 5.4. This example shows FOUs obtained from the HM method when data are available
from only one subject. We asked Prof. Charles Ragin to provide us left-hand and right-hand
intervals for a three-term vocabulary (Low, Moderate, and High) for each variable in the context
of his famous Breakdown of Democracy example, for which the desired outcome is O =
Breakdown of Democracy (of 18 European countries between World Wars 1 and 2) and there are
five linguistic variables: A = Developed (country), B = Urban (country), C = Literate (country),
D = Industrial (country) and E = Stability (of a country). In [23] and [27] it is explained that
numerical values were initially obtained by Ragin for A, B, C, D, e and o (a lower-case letter
denotes the complement of its upper-case version), and that MF(O) was computed from MF(o)
as 1 - MF(o), and MF(E) was computed from MF(e) as 1 - MF(e). Consequently, we asked him
to provide end-point intervals also for A, B, C, D, e and o. His endpoint intervals for
e = Instability (of a country) are presented in Table 5.1. Fig. 5.6 depicts three normalized FOUs
for Low, Moderate, and High Instability when the HM method was applied to these data.
Comparing the sizes of the respective FOUs in Figs. 5.5 and 5.6, observe that less linguistic
uncertainty (exemplified by smaller FOU area) is present, and FOUs have smaller spans, when
data are collected in a context than when they are not.
Table 5.1
Endpoint intervals for e = Instability (of a country) provided by Prof. Charles Ragin (e ∈ [0, 21]).

Word | Left-hand interval [a_L, b_L] | Right-hand interval [a_R, b_R]
Low | [0, 3] | [4, 5]
Moderate | [5, 7] | [8, 9]
High | [9, 12] | [15, 21]
Fig. 5.6. HM FOUs for Instability (of a country), when data were obtained from one expert, Prof. Charles Ragin.
Note that x = 10e/21.
5.2.6 Recapitulation
So far we have calibrated IT2 FSs for a collection of linguistic terms that are associated with a
linguistic variable. To remind the reader, it is not the linguistic variable that is calibrated; it is
the linguistic terms that are associated with the linguistic variable that are calibrated. The
“external criteria” mentioned by Ragin [23, Ch. 5] as a prerequisite to calibration are now
provided by collecting data intervals from either a group of subjects or just one subject. These
are totally different kinds of external criteria than are presently used in fsQCA. Our approach is
now totally in line with the concepts of linguistic variables and linguistic terms, and so it is not
subject to criticism about confusing a linguistic term with a linguistic variable.
5.3. Membership Functions for Use in fsQCA
At this point, although we now have valid MFs for linguistic terms, they do not conform to the
so-called T1 MFs that are presently used in fsQCA, where, as explained above, those T1 MFs
may not really be MFs at all for T1 FSs. This section explains how to map our word FOUs into
proper S-shaped MFs similar to the ones that are presently used by practitioners of fsQCA.
94
5.3.1 Importance of an S-Shaped MF
One of the most interesting and important features of fsQCA is that it can completely remove a
linguistic variable from being considered as a sufficient condition for a desired outcome. For
example, Ragin’s two final intermediate sufficient conditions for Breakdown of Democracy are
[23, Ch. 5], [27]: (not developed country and not urban country and not industrial country) or
(unstable country). Observe that fsQCA has removed the variable Literacy from these sufficient
conditions.
Being able to completely remove a variable by fsQCA can only occur when a linguistic
variable is described by one linguistic term. (This fact is not stated explicitly in the fsQCA
literature.) So, as soon as a practitioner of fsQCA uses even two linguistic terms for a linguistic
variable, it is no longer possible to completely remove the linguistic variable from fsQCA [55].
(It is customary in fsQCA to treat all of the linguistic terms of a linguistic variable as
independent causal conditions; however, there is no theoretical justification for doing this, and
doing this can lead to all sorts of problems with remainders that are used in some of the other
steps of fsQCA [55]; see, also, Section 5.7.) Although it may be possible to remove one of the
linguistic terms, e.g. High IQ or Low IQ, it is not possible to simultaneously remove both Low IQ
and High IQ. So, using only one linguistic term for a linguistic variable is very important (some
might even say it is essential) to fsQCA.
As a result of this, the MF that has been used by fsQCA scholars and practitioners is the
S-shaped MF depicted in Fig. 5.1. But, as we have explained above, this may not be a MF of a T1
FS, so then what exactly is it? The answer to this question lies in a kind of FS that has not been
used too often (even by FS scholars).
5.3.2 Level 2 Fuzzy Sets
One kind of generalization [42], [56]–[60] of an ordinary FS (T1 or T2) involves fuzzy sets
defined within a universal set whose elements are ordinary FSs. These FSs are known as level 2
(L2) fuzzy sets.
Definition 5.7. A level 2 FS (L2 FS), $\tilde{v}$, is comprised of a domain of linguistic terms (words)
that are associated with the domain X of real numbers for a linguistic variable, together with a
MF $\mu_{\tilde{v}} : F(X) \to [0,1]$ for a T1 L2 FS, or $\mu_{\tilde{v}} : F(X) \to D[0,1]$ for an interval-valued (IV) L2 FS,
where F(X) denotes the set of all ordinary FSs (or words) of X. (Higher kinds of L2 FSs are
possible, but in this paper we are only interested in T1 and IV L2 FSs. F(X) is also known as the
fuzzy power set of X. In order to distinguish between a L2 FS and a T1 FS or an IT2 FS, we shall
refer to the latter as ordinary FSs in the sequel.)
A linguistic variable can be treated as a L2 FS by using its linguistic terms that are in its term
set as its power set. Because the ordinary FSs of a linguistic variable are naturally ordered, the
MF of a L2 FS is also naturally ordered.
Example 5.5. Referring to Example 5.1, consider the linguistic variable Developed Country,
whose term set is T = {Barely, Hardly, Somewhat, Moderately, Fully, Extremely} ≡ {B, H, S, M,
F, E}. When Developed Country (D) is treated as a T1 L2 FS, $\tilde{D}$, then (using the so-called fuzzy
set notation for a FS [48], introduced by Zadeh [37])

$\tilde{D} = \mu_B / B + \mu_H / H + \mu_S / S + \mu_M / M + \mu_F / F + \mu_E / E$   (5.6)

When Developed Country (D) is treated as an IV L2 FS, $\tilde{D}$, then

$\tilde{D} = [\underline{\mu}_B, \overline{\mu}_B] / B + [\underline{\mu}_H, \overline{\mu}_H] / H + \cdots + [\underline{\mu}_E, \overline{\mu}_E] / E$   (5.7)

General versions of (5.6) and (5.7), for a linguistic variable v with M linguistic terms $t_1, \ldots, t_M$, are:

$\tilde{v} = \sum_{i=1}^{M} \mu_{t_i} / t_i$   (5.8)

$\tilde{v} = \sum_{i=1}^{M} [\underline{\mu}_{t_i}, \overline{\mu}_{t_i}] / t_i$   (5.9)
5.3.2.1 On Models for $t_i$: In order to use (5.8) or (5.9) in fsQCA we need to use a mathematical
model for $t_i$. One possible model is its 2D FOU, but then (5.8) would be a 3D MF and (5.9)
would be a 4D MF, neither of which can be connected to the S-shaped MF that is used in fsQCA.
The approach that we take is to replace $t_i$ by a reduced-information version of it.
Because the FOU of a word conveys both the linguistic uncertainties about the word as well as
the natural ordering of the word, we consider three possible reduced-information versions of $t_i$,
namely: (1) the centroid of the IT2 FS for $t_i$, $C_{t_i}$; (2) the center of gravity (COG) of the centroid,
$c_{t_i}$; and (3) the maximum dispersion of the centroid, $\max D_{t_i}$. Appendix F explains these three
uncertainty measures and how to compute them. Here we point out that: the centroid of an IT2
FS is an interval of real numbers, $C_{t_i} = [\underline{C}_{t_i}, \overline{C}_{t_i}]$; the center of gravity of the centroid is a
single real number, $c_{t_i} = (\underline{C}_{t_i} + \overline{C}_{t_i})/2$; and the maximum dispersion of the centroid is also an
interval of real numbers, $\max D_{t_i} = [\underline{\max D}_{t_i}, \overline{\max D}_{t_i}]$, one that is wider than the centroid,
and is analogous to the ± standard deviation interval about the mean of a random variable. Each
of these uncertainty measures summarizes the linguistic uncertainties about the word and
preserves the natural ordering of the word.
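Appendix F remains the authoritative description of these computations; for orientation only, the
following sketch of the widely used Karnik-Mendel (KM) iterative procedure computes the
centroid interval $[\underline{C}, \overline{C}]$ of an IT2 FS whose FOU is sampled at points x with lower MF lmf
and upper MF umf, along with the COG of that centroid (the function and variable names are ours):

    import numpy as np

    def km_centroid(x, lmf, umf, tol=1e-6):
        def end_point(right):
            theta = (lmf + umf) / 2.0          # start from the average MF
            c = (x * theta).sum() / theta.sum()
            while True:
                # switch between the upper and lower MFs on either side of c
                if right:
                    theta = np.where(x <= c, lmf, umf)
                else:
                    theta = np.where(x <= c, umf, lmf)
                c_new = (x * theta).sum() / theta.sum()
                if abs(c_new - c) < tol:
                    return c_new
                c = c_new
        c_left, c_right = end_point(right=False), end_point(right=True)
        return c_left, c_right, (c_left + c_right) / 2.0   # interval and its COG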
5.3.2.2 On Models for level 2 membership grades: In order to use (5.8) or (5.9) in fsQCA we
also need to choose $\mu_{t_i}$ for (5.8) and $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ for (5.9). There are no unique ways for making
these choices. Some examples of how to choose the grades are: (1) $\mu_{t_i}$ may be specified by an
expert as a number (e.g., if $t_i$ is a breakpoint term, and there are three such terms, then $\mu_{t_i}$ could
be chosen as 0, 0.5 or 1; this is discussed in more detail in Section 5.4.2), or as an interval of
numbers (e.g., [0, 0.1], [0.4, 0.6] and [0.9, 1]); (2) $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ may be specified as a normalized
$[\underline{C}_{t_i}, \overline{C}_{t_i}]$ (normalization is to [0, 1] to conform to the range of membership grades of a FS),
accounting for uncertainty about $c_{t_i}$; or, (3) $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ may be specified as a normalized
$[\underline{\max D}_{t_i}, \overline{\max D}_{t_i}]$, accounting for uncertainty about $[\underline{C}_{t_i}, \overline{C}_{t_i}]$, etc.
5.3.2.3 MFs for reduced-information L2 FSs (RI L2 FSs): The MFs for the RI L2 FSs and the
names for those FSs are given in the third and fourth columns of Table 5.2, respectively. We use
different notations for these RI L2 MFs than for the approximated MFs $\mu_v(x)$ and
$[\underline{\mu}_v(x), \overline{\mu}_v(x)]$ that are obtained below, because the former MFs are undefined over some
$x \in X$, something that is remedied below; e.g. observe, in the first and second rows of Table 5.2,
that the T1 COG and IV COG MFs are defined only at M point values of x, whereas in the third
and fourth rows, the IV Centroid and IV Maximum Dispersion MFs are defined only over M
intervals of x.
Fig. 5.7 depicts the RI MFs for three linguistic terms. Each of its four figures is for one of the
rows in Table 5.2. The shaded rectangles in Fig. 5.7c and 5.7d are called granules (e.g., [61],
[62]). Observe that maximum dispersion granules are wider than centroid granules because they
capture more uncertainty about word FOUs.
Table 5.2
Some L2 FSs for term set T and their RI L2 MFs. (COG is short for Center of Gravity. Each RI L2 FS MF is obtained
from (5.8) or (5.9) by replacing $t_i$ with the reduced-information version in the first column and using the grade in the
second column.)

Replacement for $t_i$ | Grade | RI L2 FS MF | Name of RI L2 FS
$c_{t_i}$ | $\mu_{t_i}$ | $\sum_{i=1}^{M} \mu_{t_i} / c_{t_i}$ | T1 COG
$c_{t_i}$ | $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ | $\sum_{i=1}^{M} [\underline{\mu}_{t_i}, \overline{\mu}_{t_i}] / c_{t_i}$ | IV COG
$[\underline{C}_{t_i}, \overline{C}_{t_i}]$ | $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ | $\sum_{i=1}^{M} [\underline{\mu}_{t_i}, \overline{\mu}_{t_i}] / [\underline{C}_{t_i}, \overline{C}_{t_i}]$ | IV Centroid
$[\underline{\max D}_{t_i}, \overline{\max D}_{t_i}]$ | $[\underline{\mu}_{t_i}, \overline{\mu}_{t_i}]$ | $\sum_{i=1}^{M} [\underline{\mu}_{t_i}, \overline{\mu}_{t_i}] / [\underline{\max D}_{t_i}, \overline{\max D}_{t_i}]$ | IV Maximum Dispersion
Fig. 5.7. RI MFs for three linguistic terms: (a) T1 COG, (b) IV COG, (c) IV Centroid, and (d) IV Maximum
Dispersion FSs.
5.3.2.4 Approximated MFs for RI L2 FSs: As has been mentioned above, and is clear from Fig.
5.7, there are intervals of x where the MF is undefined, something that has to be remedied
because a MF should be defined for all $x \in X$. In this section we briefly describe two approaches
for accomplishing this. The results from both approaches can be summarized as $\mu_v(x)$ or
$[\underline{\mu}_v(x), \overline{\mu}_v(x)]$. It is $\mu_v(x)$ or $[\underline{\mu}_v(x), \overline{\mu}_v(x)]$ that will be our MFs for a RI L2 FS model of a
linguistic variable for $x \in X$.
A rationale for approximation (that may involve both interpolation and extrapolation) is a
thought experiment in which linguistic terms are inserted to the left and to the right of the
actually used linguistic terms. Each of these linguistic terms contributes a point, as in Fig. 5.7a,
an interval, as in Fig. 5.7b, or a granule, as in Figs. 5.7c or 5.7d. The more linguistic terms there
are the more points, intervals or granules there will be.
Fig. 5.8. Piecewise-linear approximations: dashed for the Fig. 5.7a T1 COG MF; light-weight solid lines for the
Fig. 5.7b IV COG MF; and heavy-weight solid lines for the Fig. 5.7c IV Centroid MF.
(a) Piecewise-linear approximation: Examples of piecewise-linear approximations are depicted
in Fig. 5.8. The dashed T1 MF is obtained by using straight lines to connect a Fig. 5.7a left-dot to
its right-dot neighbor. The light weight solid straight lines describe the lower and upper MFs of
the IV COG MF, and are obtained by using straight lines to connect the Fig. 5.7b lower or upper
interval end-points (going from left to right), respectively. The heavy weight solid straight lines
describe the lower and upper MFs of the IV Centroid MF, and are obtained by using straight
lines to connect the Fig. 5.7c granule lower-right corners or upper-left corners (going from left to
right), respectively. All of these MFs are then extended, by means of vertical and horizontal
lines, to 0 and 1 as shown.
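For the T1 COG MF (the dashed curve in Fig. 5.8), the piecewise-linear approximation amounts
to linear interpolation through the (COG, grade) points with flat extensions outside of them; a
minimal sketch (ours, with the breakpoint grades 0, 0.5, 1 corresponding to a three-term
vocabulary) is:

    import numpy as np

    def piecewise_linear_mf(x, cogs, grades=(0.0, 0.5, 1.0)):
        # Connect the (COG, grade) dots with straight lines; np.interp
        # holds the end values (0 and 1) outside the first and last COGs.
        return np.interp(x, cogs, grades)

    # e.g., the COGs of the three breakpoint terms for E = Stable (Table 5.4):
    mu = piecewise_linear_mf([4.0, 9.55, 20.0], [5.07, 9.55, 14.98])
    # -> [0.0, 0.5, 1.0]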
(b) Log-odds approximation: Ragin [21, pp. 87-94] prefers to approximate by using a log-odds
function, so in this section we explain how to obtain such an approximated MF for the RI L2 FS.
We do this for a three-word term set because we later apply these results to a term set that
contains three breakpoint terms.
Let $y_A(x)$ be the odds of membership (the ratio of the membership of being in a generic T1 FS
A over the membership of not being in A), i.e. $y_A(x) = \mu_A(x) / (1 - \mu_A(x))$, and let $z_A(x)$ be the
log-odds of membership, i.e. $z_A(x) = \ln y_A(x)$. Then

$\mu_A(x) = \dfrac{\exp(z_A(x))}{1 + \exp(z_A(x))}$   (5.10)

Given three points, $x_1, x_2, x_3$ (to be defined below), whose membership grades are assigned the
values 0.05, 0.5 and 0.95, respectively (0.05 and 0.95 are used instead of 0 and 1 because the
log-odds transformation is incapable of producing membership grades that are exactly equal to
0 or 1), then [21, pp. 87-94] $z_A(x)$ is found as:

$z_A(x) = \begin{cases} \dfrac{3}{x_2 - x_1}(x - x_2), & x < x_2 \\ \dfrac{3}{x_3 - x_2}(x - x_2), & x \ge x_2 \end{cases}$   (5.11)

Once $z_A(x)$ has been computed by using (5.11), $\mu_A(x)$ is computed by using (5.10).
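A minimal sketch of (5.10)-(5.11), using Ragin's three breakpoints for E = Stable from Table 5.3
as the points $x_1, x_2, x_3$, is:

    import numpy as np

    def log_odds_mf(x, x1, x2, x3):
        # S-shaped MF via (5.10)-(5.11): membership is ~0.05 at x1,
        # 0.5 at x2 (the crossover point), and ~0.95 at x3.
        x = np.asarray(x, dtype=float)
        z = np.where(x < x2,
                     3.0 * (x - x2) / (x2 - x1),
                     3.0 * (x - x2) / (x3 - x2))
        # equivalent to (5.10), written in a numerically safer form
        return 1.0 / (1.0 + np.exp(-z))

    mu = log_odds_mf([5.0, 9.5, 15.0], 5.0, 9.5, 15.0)
    # -> approximately [0.047, 0.5, 0.953]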
Examples of log-odds approximations are depicted in Fig. 5.9. The dashed T1 MF is obtained
by using $x_1 = c_{t_1}$, $x_2 = c_{t_2}$ and $x_3 = c_{t_3}$ in (5.11). The solid lines describe the lower and upper
MFs of the IV Centroid MF, and are obtained by using $x_1 = \underline{C}_{t_1}$, $x_2 = \underline{C}_{t_2}$ and $x_3 = \underline{C}_{t_3}$ to
generate the UMF, and $x_1 = \overline{C}_{t_1}$, $x_2 = \overline{C}_{t_2}$ and $x_3 = \overline{C}_{t_3}$ to generate the LMF.
Fig. 5.9. Log-odds approximations: dashed for the Fig. 5.7a T1 COG MF and solid lines for the Fig. 5.7c IV
Centroid MF.
5.3.3 Recapitulation
It is $\mu_v(x)$ or $[\underline{\mu}_v(x), \overline{\mu}_v(x)]$ that is used in fsQCA. Their constructions have made use of:
(1) Data collected from a group of subjects or one expert about words in an application-dependent
vocabulary; (2) The HM method, to map the data into an FOU for each word in the vocabulary;
(3) A L2 FS, as the first step in obtaining a MF for the variable; (4) Replacing each term in the
domain of linguistic terms of the L2 FS by a reduced-information (RI) version of it, thereby
obtaining the MF (T1 or IV) of a RI L2 FS; and, (5) Approximating the MFs for the RI L2 FS,
using piecewise-linear or log-odds functions, thereby obtaining approximated MFs for the RI L2
FS. These MFs are S-shaped.
We now have an answer to the question posed at the end of Section 5.3.1, namely "What exactly
is the S-shaped MF that is used by the fsQCA community?" Our answer is: It is the MF of an
approximated RI L2 FS for the entire linguistic variable. Note that it is not the MF of an ordinary
FS (T1 or IT2), but it is still the MF of a FS, albeit a L2 FS.
In the sequel, we shall shorten the phrase "approximated MF for a RI L2 FS" to "MF for the
linguistic variable."
5.4. Breakdown of Democracy Example
In this section we re-examine Ragin’s Breakdown of Democracy example. To begin we remind
the reader that this example is taken from [23, Ch. 5], for which the desired outcome is O =
Breakdown of Democracy (of 18 European countries between World Wars 1 and 2) and there are
five causal conditions: A = Developed (country), B = Urban (country), C = Literate (country), D
= Industrial (country) and E = Stable (country). O, A, B, C, D and E are linguistic variables (v)
and Ragin's term set, $T_R(v)$, for each of these linguistic variables contains three linguistic
breakpoint terms (we leave the extensions to more than three breakpoints to the readers):

$T_R(v) = \{\textit{fully out}, \textit{neither in nor out}, \textit{fully in}\}$   (5.12)

These linguistic breakpoint terms are linearly ordered and act as adjectives when applied to a
linguistic variable (e.g., fully out of being a Developed country).
5.4.1 FOUs
In order to obtain FOUs for the three linguistic breakpoint terms for each of the six linguistic
variables, we asked Prof. Ragin to provide us with ranges and uncertainty end-point intervals for
each of them. Since he had already chosen breakpoints in his earlier publications, we asked him
to provide his minimum and maximum % uncertainties for each of the breakpoints, from which it
was easy for us to construct the uncertainty end-point intervals. The uncertainty left and right
end-point intervals based on his data are summarized in Table 5.3. Fig. 5.10 depicts FOUs for the
Table 5.3 data computed by using the HM method.
Table 5.3
Three breakpoints (BP), left end-point intervals (LEPI) and right end-point intervals (REPI) for six linguistic variables
that are used in the Breakdown of Democracy example, from data provided by Prof. Charles Ragin. (The situations
where the right end-point of the LEPI equals the left end-point of the REPI, e.g. for all of the W_2 words, occurred as
a result of Prof. Ragin providing zero up to maximum % uncertainty about his breakpoint value.)

Linguistic Variable (Range) | W_1 = fully out: BP, LEPI, REPI | W_2 = neither in nor out: BP, LEPI, REPI | W_3 = fully in: BP, LEPI, REPI
A: Developed [300, 1200] | 400, [320, 380], [440, 480] | 550, [467.5, 550], [550, 632.5] | 900, [720, 810], [990, 1080]
B: Urban [10, 80] | 25, [20, 22.5], [27.5, 30] | 50, [45, 50], [50, 55] | 65, [61.75, 65], [65, 68.25]
C: Literate [30, 100] | 50, [40, 50], [50, 60] | 75, [63.7, 75], [75, 86.25] | 90, [81, 85.5], [94.5, 99]
D: Industrial [10, 50] | 20, [17, 20], [20, 23] | 30, [25.5, 30], [30, 34.5] | 40, [36, 38], [42, 44]
E: Stable [0, 21] | 5, [4.5, 4.75], [5.25, 5.5] | 9.5, [8.55, 9.5], [9.5, 10.45] | 15, [12.75, 15], [15, 17.25]
O: Breakdown of Democracy [-10, 10] | -9, [-9.45, -9], [-9, -8.55] | 0, [-4, 0], [0, 4] | 10, [9.5, 10], [10, 10]
The FOU centroids and the COGs of those centroids are summarized in Table 5.4, and the
FOU COGs and maximum dispersion intervals are summarized in Table 5.5. Observe from these
tables that the maximum dispersion intervals are wider than their respective centroid intervals.
These uncertainty intervals are used to generate MFs for the six linguistic variables, as explained
next.
Fig. 5.10. HM method FOUs for Breakdown of Democracy example: (a) Developed, (b) Urban, (c) Literate,
(d) Industrial, (e) Stable, and (f) Breakdown of Democracy.
Table 5.4
COG and centroid interval endpoints for W_1 = fully out, W_2 = neither in nor out, and W_3 = fully in.

Linguistic Variable | W_1: COG, Centroid | W_2: COG, Centroid | W_3: COG, Centroid
A: Developed | 381.26, [373.84, 388.68] | 550.94, [534.21, 567.67] | 902.79, [877.41, 928.17]
B: Urban | 24.52, [23.72, 25.33] | 50.06, [49.15, 50.98] | 65.01, [64.52, 65.50]
C: Literate | 49.87, [47.95, 51.80] | 74.43, [72.00, 76.86] | 88.21, [86.91, 89.51]
D: Industrial | 19.90, [19.13, 20.67] | 30.06, [29.03, 31.10] | 39.24, [38.48, 39.99]
E: Stable | 5.07, [4.99, 5.15] | 9.55, [9.36, 9.74] | 14.98, [14.43, 15.54]
O: Breakdown of Democracy | -8.99, [-9.12, -8.87] | 0.09, [-0.93, 1.11] | 9.72, [9.67, 9.77]
Table 5.5
COG and maximum dispersion intervals for W_1 = fully out, W_2 = neither in nor out, and W_3 = fully in.

Linguistic Variable | W_1: COG, Max Dispersion | W_2: COG, Max Dispersion | W_3: COG, Max Dispersion
A: Developed | 381.26, [332.84, 429.16] | 550.94, [461.66, 640.57] | 902.79, [800.82, 997.93]
B: Urban | 24.52, [20.91, 28.20] | 50.06, [43.20, 56.04] | 65.01, [61.72, 68.57]
C: Literate | 49.87, [38.22, 62.74] | 74.43, [63.47, 87.62] | 88.21, [82.47, 95.54]
D: Industrial | 19.90, [16.98, 23.27] | 30.06, [24.56, 35.03] | 39.24, [35.21, 43.56]
E: Stable | 5.07, [4.64, 5.53] | 9.55, [8.22, 10.72] | 14.98, [12.15, 18.09]
O: Breakdown of Democracy | -8.99, [-9.52, -8.49] | 0.09, [-5.09, 5.63] | 9.72, [9.23, 10]
5.4.2 MFs for the Six Linguistic Variables
Fig. 5.11 depicts approximated MFs for RI L2 FSs for the six linguistic variables, obtained by
using the log-odds approximation method that is described in Section 5.3.2.4. Each of the plots
contains four items:
Fig. 5.11. Maximum Dispersion FOU (black), Centroid FOU (blue), COG T1 MF (black) and Ragin's T1 MF (red
dashed) used in the Breakdown of Democracy example, for: (a) Developed, (b) Urban, (c) Literate, (d) Industrial,
(e) Stable, and (f) Breakdown of Democracy.
1. FOU LMF and UMF obtained by using the maximum dispersion intervals that are given
in Table 5.5, referred to below as a Maximum Dispersion FOU (row 4 in Table 5.2).
2. FOU LMF and UMF obtained by using the centroids that are given in Table 5.4, referred
to below as a Centroid FOU (row 3 in Table 5.2).
3. T1 MF obtained by using the COG of the centroids that are given in Table 5.4, referred to
below as a COG T1 MF (row 1 in Table 5.2).
4. Log-odds T1 MF that was obtained just by using Ragin’s three breakpoints that are given
in Table 5.3, referred to below as Ragin’s T1 MF (row 1 in Table 5.2).
Observe in Fig. 5.11 that the Centroid FOUs are contained within the Maximum Dispersion
FOUs and are considerably smaller than the latter, and Ragin’s T1 MF always falls within the
two FOUs and is very close to the COG T1 MF (sometimes they are visually indistinguishable).
The closeness of the COG and Ragin’s T1 FSs is due to the way in which interval end-point
uncertainty bands were created. Recall that they were created about historical breakpoints, which
(as can be seen in Table 5.3) are always the COGs of the interval end-point uncertainty bands. In
a new application of our methodology, where historical breakpoints would not be already known,
an expert would only specify breakpoint end-point intervals, and there would be no curve that is
analogous to Ragin’s T1 MF for historical breakpoints.
5.4.3 fsQCA Using the MFs for Linguistic Variables
When T1 L2 FSs are used in fsQCA we refer to this as T1 fsQCA, whereas when IV L2 FSs
are used in fsQCA we refer to this as IV fsQCA. Because T1 fsQCA has been in the literature for
a very long time we do not review it here (see, e.g., [62] for a complete description of all of its
steps, and [23, Ch. 5], [32]). Additionally, software is available to perform T1 fsQCA (see Table
11.1 in [23] as well as accompanying discussions about it in its Section 11.1.10). Because IV
fsQCA is not well known, having only appeared in [63] (which calls it IT2 fsQCA; we now
prefer the name IV fsQCA, and so that is what is used herein), we include a summary of its steps
in Chapter 6.
5.4.3.1 T1 fsQCA: Beginning with the case data that are in [23], and using the COG T1 MFs
depicted in Fig. 5.11, we obtained MF values for each case in the linguistic variables A-E (they
are the causal conditions) as well as in the desired outcome O. These values are given in Table
5.6. The one causal combination, out of a possible number of 2^5 = 32 candidate causal
combinations, that survives the T1 fsQCA MF > 0.5 test for each case is given in the last column
of that table (a proof that only one causal combination survives this test for a case is in [55]).
There are only nine uniquely different causal combinations in that column and they appear in the
first column of Table 5.7. Consistency (also known as subsethood) values for each of these nine
causal combinations are given in the last column of that table, from which we see that the first
six causal combinations have consistency values that are greater than the customary threshold
value of 0.80. Using these six causal combinations and completing all of the remaining steps of
T1 fsQCA, one obtains the complex, parsimonious and intermediate solutions that are given in
Table 5.8. These are exactly the same solutions that were obtained by Ragin [23, pp. 115-117].
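The MF > 0.5 test is easy to reproduce; a minimal sketch (ours, applied to Case 1 of Table 5.6) is:

    def surviving_combination(case_mfs):
        # For each case, exactly one of the 2^k candidate causal combinations
        # has MF > 0.5: take each condition uncomplemented (upper case) when
        # its MF exceeds 0.5, and complemented (lower case) otherwise.
        return ''.join(name.upper() if m > 0.5 else name.lower()
                       for name, m in case_mfs.items())

    print(surviving_combination({'A': 0.81, 'B': 0.13, 'C': 0.99,
                                 'D': 0.75, 'E': 0.43}))   # -> AbCDe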
Table 5.6
MF values for each case using COG T1 MFs, and surviving causal combination for each case. (The numbered cases
correspond to the following countries: 1 Austria, 2 Belgium, 3 Czechoslovakia, 4 Estonia, 5 Finland, 6 France,
7 Germany, 8 Greece, 9 Hungary, 10 Ireland, 11 Italy, 12 Netherlands, 13 Poland, 14 Portugal, 15 Romania,
16 Spain, 17 Sweden, and 18 United Kingdom.)

Case | Developed (A) | Urban (B) | Literate (C) | Industrial (D) | Stable (E) | Breakdown of Democracy (O) | Surviving Causal Combination
1 | 0.81 | 0.13 | 0.99 | 0.75 | 0.43 | 0.95 | AbCDe
2 | 0.99 | 0.89 | 0.99 | 1.00 | 0.98 | 0.04 | ABCDE
3 | 0.58 | 0.98 | 0.99 | 0.91 | 0.92 | 0.10 | ABCDE
4 | 0.19 | 0.08 | 0.99 | 0.01 | 0.92 | 0.88 | abCdE
5 | 0.59 | 0.04 | 1.00 | 0.08 | 0.59 | 0.22 | AbCdE
6 | 0.98 | 0.03 | 0.99 | 0.83 | 0.96 | 0.04 | AbCDE
7 | 0.89 | 0.79 | 0.99 | 0.97 | 0.30 | 0.95 | ABCDe
8 | 0.05 | 0.10 | 0.13 | 0.37 | 0.43 | 0.93 | abcde
9 | 0.10 | 0.17 | 0.91 | 0.08 | 0.13 | 0.57 | abCde
10 | 0.72 | 0.05 | 0.99 | 0.01 | 0.96 | 0.08 | AbCdE
11 | 0.36 | 0.10 | 0.42 | 0.48 | 0.59 | 0.95 | abcdE
12 | 0.98 | 1.00 | 1.00 | 0.95 | 0.99 | 0.04 | ABCDE
13 | 0.03 | 0.18 | 0.61 | 0.00 | 0.00 | 0.88 | abCde
14 | 0.02 | 0.02 | 0.01 | 0.11 | 0.01 | 0.95 | abcde
15 | 0.02 | 0.04 | 0.17 | 0.00 | 0.85 | 0.78 | abcdE
16 | 0.04 | 0.31 | 0.09 | 0.21 | 0.20 | 0.93 | abcde
17 | 0.95 | 0.14 | 1.00 | 0.68 | 0.92 | 0.04 | AbCDE
18 | 0.98 | 0.99 | 1.00 | 1.00 | 0.98 | 0.04 | ABCDE

Table 5.7
Distribution of cases and consistency of T1 surviving causal combinations using COG T1 MFs. (The first six rows
are the actual causal combinations, with consistencies > 0.8.)

COG T1 Surviving Causal Combination | Frequency | Consistency
abcde | 3 | 1
abcdE | 2 | 0.98
AbCDe | 1 | 0.97
ABCDe | 1 | 0.97
abCdE | 1 | 0.85
abCde | 2 | 0.85
AbCdE | 2 | 0.50
AbCDE | 2 | 0.49
ABCDE | 4 | 0.24

Table 5.8
T1-fsQCA solutions using COG T1 MFs.

Solution | T1 Causal Combination
Complex | abd + ACDe
Parsimonious | a + e
Intermediate | abd + e
106
5.4.3.2 IV fsQCA: Beginning with the case data that are in [23], and using the Centroid FOUs
depicted in Fig. 5.11, we obtained LMF and UMF values for each case in the linguistic variables
A–E as well as in desired outcome O. These values are given in Table 5.9. The one causal
combination, out of a possible number of
2
5
= 32 candidate causal combinations that survives
the IV fsQCA MF > 0.5 test (see Step 6 in Chapter 6) for each case is given in Table 5.10.
Comparing the column of Table 5.10 with the last column of Table 6, observe that the
surviving causal combinations are the same.
Consistency values for the nine uniquely different causal combinations are given in Table
5.11. Comparing Tables 5.11 and 5.7, observe that they have exactly the same six causal
combinations that pass the respective T1 and IT2 consistency tests. Using these six causal
combinations and completing all of the remaining steps of IV fsQCA, one obtains the complex,
parsimonious and intermediate solutions that are given in Table 5.12. These are exactly the same
solutions that were obtained in Section 4.3.1 and by Ragin [23, pp. 115–117], the same solutions
were also obtained by us for the Maximum Dispersion FOUs (we do not show the details
because they are so similar to the ones shown for the Centroid FOUs).
Translating the common intermediate solution into words, we have:
IF Not Developed and Not Urban and Not Industrial THEN Breakdown of Democracy
IF Unstable THEN Breakdown of Democracy
Table 5.9
Lower and Upper MFs for each case using Centroid FOUs.
Case
Causal Conditions Outcome
1 0.79 0.83 0.12 0.14 0.99 1.00 0.70 0.80 0.40 0.46 0.95 0.95
2 0.99 0.99 0.87 0.91 0.99 0.99 1.00 1.00 0.98 0.98 0.04 0.04
3 0.54 0.62 0.97 0.98 0.99 0.99 0.90 0.93 0.91 0.92 0.10 0.11
4 0.16 0.23 0.07 0.09 0.99 0.99 0.01 0.01 0.91 0.92 0.87 0.89
5 0.55 0.62 0.03 0.04 1.00 1.00 0.07 0.10 0.56 0.61 0.20 0.24
6 0.97 0.98 0.03 0.04 0.99 0.99 0.79 0.86 0.95 0.96 0.04 0.04
7 0.88 0.90 0.75 0.82 0.99 1.00 0.96 0.97 0.27 0.34 0.95 0.95
8 0.05 0.06 0.09 0.11 0.10 0.17 0.31 0.43 0.40 0.46 0.93 0.94
9 0.08 0.11 0.15 0.19 0.88 0.93 0.06 0.09 0.10 0.15 0.51 0.62
10 0.69 0.75 0.05 0.06 0.99 0.99 0.01 0.01 0.95 0.96 0.07 0.08
11 0.30 0.44 0.09 0.12 0.36 0.48 0.41 0.55 0.56 0.61 0.95 0.95
12 0.98 0.98 1.00 1.00 1.00 1.00 0.94 0.96 0.99 0.99 0.04 0.04
13 0.03 0.03 0.17 0.20 0.50 0.70 0.00 0.00 0.00 0.00 0.87 0.89
14 0.02 0.02 0.02 0.02 0.01 0.02 0.10 0.14 0.00 0.01 0.95 0.95
15 0.02 0.02 0.03 0.04 0.14 0.21 0.00 0.01 0.84 0.86 0.76 0.81
16 0.03 0.04 0.28 0.34 0.07 0.12 0.18 0.25 0.17 0.23 0.93 0.94
17 0.94 0.95 0.12 0.15 1.00 1.00 0.62 0.74 0.91 0.92 0.04 0.04
18 0.98 0.99 0.99 0.99 1.00 1.00 1.00 1.00 0.98 0.98 0.04 0.04
107
Table 5.10
Surviving causal combination, ,
and its LMF and UMF values using
Centroid FOUs.
x
1
0.54 0.60
2
0.87 0.91
3
0.54 0.62
4
0.77 0.84
5
0.55 0.61
6
0.79 0.86
7
0.66 0.73
8
0.54 0.60
9
0.81 0.85
10
0.69 0.75
11
0.45 0.59
12
0.94 0.96
13
0.50 0.70
14
0.86 0.90
15
0.79 0.86
16
0.66 0.72
17
0.62 0.74
18
0.98 0.98
Table 5.11
Distribution of cases and consistency of IV
surviving causal combinations using Centroid
FOUs.
a
a
Shaded rows are actual causal combinations with
consistencies > 0.8.
Table 5.12
IV fsQCA Solutions using Centroid FOUs.
Solution IT2 Causal Combination
Complex
Parsimonious
Intermediate
Centroid FOU
Causal Combinations
Frequency Consistency
3 1.00
2 0.98
1 0.97
1 0.97
1 0.86
2 0.85
2 0.51
2 0.49
4 0.25
108
5.4.4 Recapitulation
We have demonstrated that by our calibration procedure one is able to obtain exactly the same
solutions to the Breakdown of Democracy example as obtained by Prof. Ragin. A reader may be
wondering at this point “So why was all of this needed? Can’t we just continue to use Ragin’s
breakpoints?” Our answers to these questions are: (1) We have used a very different calibration
method, one that overcomes our earlier criticism and therefore strengthens fsQCA as a viable
methodology; (2) Using our calibration method we are able to combine information from
different experts which is not possible using Ragin’s direct method; and, (3) The fact that our
two Intermediate solutions, one based on T1 FSs and the other based on IT2 FSs, are the same
(Table 5.13) demonstrates that fsQCA is robust to the interval end-point uncertainties provided
to us by Prof. Ragin. Those uncertainties led to word FOUs whose lower and upper MFs were
used in IV fsQCA, as is explained in Appendix F. Because robustness is such an important issue
in fsQCA, we examine it more closely next.
Table 5.13
Comparison of solutions to Breakdown of Democracy example using different kinds of FSs.
Solution Ragin’s T1 MF COG T1 MF Centroid FOUs
Maximum
dispersion FOUs
Complex abd + ACDe abd +ACDe
Parsimonious a + e a + e
Intermediate abd + e abd + e
5.5. Robustness of fsQCA in Breakdown of Democracy Example
In this section we study the robustness of fsQCA to the uncertainty about FOUs as well as to
the uncertainty about membership grades assigned to linguistic terms. One way to think about
what we are going to do is to examine Fig. 5.8 (or Figs. 5.7c and 5.7d) and ask: “How large can
the granules become before the IV fsQCA solutions to the Breakdown of Democracy example
change?”
5.5.1 Uncertainty About Models for
t
i
In this section we examine the robustness of fsQCA to uncertainty about the models for
t
i
(column 1 in Table 5.2, the horizontal dimension of the granules in Fig. 5.8), using log-odds
approximated MFs for the RI L2 FSs. The length of the horizontal dimension of the granules in
109
Fig. 5.8 is determined by the amount of uncertainty there is about the end-point intervals used in
the HM method. Hence, we began with Table 5.3’s left and right end-point intervals, and
increased each of these interval’s end points by 5% increments up to a maximum change of 30%
(e.g.,
[a,b] ®[a - 0.05a,b+ 0.05a]).
1
Doing this leads to word FOUs that look like the ones in
Fig. 5.10, but have larger centroid and maximum dispersion intervals (i.e., the horizontal lengths
of the granules shown in Fig. 5.8 increase). When a change to an fsQCA solution occurred, we
then reduced the total % increment at which this occurred by 1% increments, until we reached a
word end-point interval size for which the solution did not change.
Doing this, we found that (see Table 5.14): (1) When Centroid FOUs were used, the results
from IV fsQCA are identical to those from T1 fsQCA if less than 22% (i.e., 23% - 1%)
uncertainty values are used, and (2) When Maximum Dispersion FOUs were used the results from
IV fsQCA are identical to those from T1 fsQCA if less than 17% uncertainty values are used. The
robustness of IV fsQCA decreases when Maximum Dispersion FOUs are used because the
Maximum Dispersion generates wider intervals as compared to those generated by using the
Centroid. Observe also that the intermediate results from IV fsQCA with more uncertainty are
simpler than the intermediate results from T1 fsQCA (i.e.,
versus
). This is due to
passing the consistency threshold in IV fsQCA and not passing it in T1 fsQCA.
Table 5.14
Comparisons of fsQCA solutions for different amounts of uncertainty added to data endpoint intervals.
Solution T1 fsQCA
IV fsQCA using Centroid FOUs
IV fsQCA using Maximum dispersion
FOUs
10% Uncert. 23% Uncert. 10% Uncert. 18% Uncert.
Complex abd + ACDe
Parsimonious a + e
Intermediate abd + e
5.5.2 Uncertainty About Grades
In this section we examine the robustness of fsQCA to uncertainty about the grades chosen for
a L2 FS (column 2 in Table 5.2, the vertical length of the granules in Fig. 5.8 (or Figs. 5.7c and
5.7d) using piecewise linear
2
approximated MFs for the RI L2 FSs. The length of the vertical
1
If a % change causes a variable to exceed an end-point of its range, we use the end-point.
2
Piecewise-linear approximations are easier to obtain for this part of the robustness study than are the log-odds
approximations, because the latter have only been established for the three exact breakpoint values, 0.05, 0.5 and
0.95.
110
dimension of the granules in Fig. 5.8 is determined by the amount of uncertainty there is about the
breakpoint grades and has nothing to do with using the HM method to obtain FOUs for the
breakpoint words (the horizontal length of the granules in Fig. 5.8); but when the vertical length
of a granule changes so do the lower and upper FOU bounding functions.
Recall that Ragin assigned a fixed membership grade of 0.05, 0.5 and 0.95, respectively, to the
linguistic terms fully out, neither in nor out, and fully out. Instead of using these fixed grades, we
added ± 5% to each of them up to a maximum of ± 40%, obtaining intervals for each of these
breakpoint membership grades (e.g.,
0.5 ®[0.5- 0.05 ´ 0.5,0.5+ 0.05 ´ 0.5])
1
. Fig. 5.12 depicts
the resulting piecewise-linear approximations of IV Centroid FOUs when ± 10%, ± 20%, ± 30%
and ± 40% of the breakpoint grades were added to the above grades for each linguistic term. As
before, when a change to an fsQCA solution occurred, we then reduced the ± % change made to
the modified breakpoint values at which this occurred by ± 1% increments, until we reached a
breakpoint interval size for which the solution did not change.
Our results are summarized in see Table 5.15, from which we see that: (1) (compare columns 2
and 3) using T1 fsQCA results for piecewise-linear approximated MFs for T1 COG (RI L2 FSs)
are identical to those obtained by using Ragin’s T1 MF; (2) (compare columns 2, 4 and 5) using
IV fsQCA with IV COG, we obtain identical results to those from T1 fsQCA if less than ± 34%
uncertainty is used; (3) (compare columns 2, 6 and 7) using IV fsQCA with IV Centroid, we
obtain identical results to those from T1 fsQCA if less than ± 31% uncertainty is used; and, (4)
(compare columns 2, 8 and 9) using IV fsQCA with Max Dispersion, we obtain identical results
to those from T1 fsQCA if less than ± 39% uncertainty is used. So, fsQCA seems to be very robust
to uncertainty about breakpoint membership grades.
We do not want to read too much into these results because they are all based on replacing 0,
and 1 by 0.05 and 0.95, respectively, in order to use the log-odds approximations. It seems to us
that a more direct approach is to use a membership grade of 0 for fully out and 1 for fully in and
piecewise linear approximations.
1
If a % change caused 0.95 to exceed 1, we used 1. Even a -40% change to 0.05 does not cause its lower value to
become negative.
111
(a) (b)
(c) (d)
(e) (f)
Fig. 5.12. Piecewise-linear approximations of S-shaped MFs for Breakdown of Democracy example using the
centroid for
t
i
and different amounts of membership grade uncertainty, , for: a) Developed, b) Urban, c) Literate,
d) Industrial, e) Stable, and f) Breakdown of Democracy. The red curves are Ragin’s T1 MFs.
Table 5.15
Comparison of fsQCA solutions for different amounts of uncertainty (Un) added to breakpoint membership grades.
Piecewise-linear approximations
a
Solution
Ragin’s
T1 MF +
T1
fsQCA
T1 fsQCA
using T1
COG
IV fsQCA
using IV COG
IV fsQCA using IV
centroid
IV fsQCA using
Max dispersion
0% Un ± 30% Un ± 35% Un ± 30% Un ± 32% Un ± 30% Un ± 40% Un
Complex
abd +
ACDe
abd +
ACDe
Parsimonious a + e a + e
Intermediate abd + e abd + e
a
T1 COG, IV COG, IV Centroid and IV Max Dispersion correspond to the situations that are depicted in Figs. 5.7a, 5.7b,
5.7c and 5.7d, respectively. Ragin’s T1 MF is defined in item 4 of Section 4.2.
112
5.5.3 Conclusions
Based on the results presented in Sections 5.1 and 5.2, we conclude that fsQCA is relatively
robust in both the horizontal and vertical dimensions of Fig. 5.8 (or Figs. 5.7c and 5.7d). The
horizontal dimension lets us incorporate uncertainty about the models for the linguistic terms by
moving from point values (Figs. 5.7a and 5.7b) to intervals (Figs. 5.7c and 5.7d), whereas the
vertical dimension lets us incorporate uncertainty about the breakpoint membership grades, also
moving from point values (Fig. 5.7a) to intervals (Figs. 5.7b, 5.7c and 5.7d). Additionally, using
piecewise linear approximated RI L2 FSs gives the same results in IV fsQCA as using log-odds
approximated MFs in T1 fsQCA.
5.6. On Obtaining More Precise Causal Combinations
The final causal combinations that are obtained from fsQCA, be they the complex,
intermediate or parsimonious solutions, will be in terms of linguistic variables, just as they
presently are. This is because the S-shaped approximated MFs (FOUs) for RI L2 FSs that are
used in T1 or IV fsQCA are for the linguistic variables. Because those MFs (FOUs) were derived
from FOUs for all of the linguistic variable’s terms, it is (arguably, for the first time) possible to
also obtain more precise statements of those causal combinations, e.g. for their best instances, as
is explained next. Such more precise statements may be of value to practitioners of fsQCA.
Each case has a numerical value for each of its linguistic variables (e.g., as in [23]). Our rule
for mapping a number into a word is: At each measured value of x the winning word is the one
with the largest average MF grade, where for an IT2 FS the average MF grade is the average of
its LMF and UMF at x.
Example 5.8. In Fig. 5.13, at
x = x
2
, the winning word is found by solving
Avgm
W
3
(x
2
)>
?
Avgm
W
4
(x
2
). If the inequality is true then
x
2
is assigned to the word W
3
; otherwise,
it is assigned to W
4
. At
x
2
, Fig. 5.13 reveals that
Avgm
W
3
(x
2
) =
1
2
[m
W
3
(x
2
) + m
W
3
(x
2
)] and
Avgm
W
4
(x
2
) =
1
2
m
W
4
(x
2
), from which it follows that
x
2
is assigned to W
3
. At
x
1
, the winning
word is
W
2
.
113
Fig. 5.13. Mapping a number into a word.
Example 5.9. In Ragin’s Breakdown of Democracy example one intermediate solution is
abd— not developed, not urban, and not industrial. Its best instances are Estonia, Greece and
Italy. To us not developed, not urban, and not industrial is somewhat vague, mainly because of
the complement; however, because each case can have a linguistic term assigned to its
membership in the linguistic variable (by using the method in Example 8), it is now possible to
obtain a more precise statement of abd for these countries, e.g., for Estonia (Table 5.16), abd can
be replaced by Estonia is fully out of Developed, fully out of Urban, and fully out of Industrial,
and, for Italy (Table 5.16), abd can be replaced by Italy is fully out of Developed, fully out of
Urban, and neither in nor out of Industrial. It is important to observe that no complement of the
linguistic terms Developed, Urban or Industrial have been used.
If we had started with a larger vocabulary (e.g., for 5 or 7 breakpoints) then greater linguistic
resolution would have been obtained.
Table 5.16
On more precise causal combinations for Estonia, Greece and Italy, where the adjectives used for Developed, Urban,
and Industrial are fully out, neither in nor out, and fully in. The numbers in this table were obtained by using the case
data in [23] and Fig. 5.10.
Best
Instance
Average of
LMF and UMF
Precise
linguistic
term
Average of
LMF and UMF
Precise
linguistic
term
Average of
LMF and UMF
Precise
linguistic
term
Developed Urban Industrial
Fully
out
Neither
in nor
out
Fully
in
Fully
out
Neither
in nor
out
Fully
in
Fully
out
Neither
in nor
out
Fully
in
Estonia
0.84 0.13 0.10 Fully out 0.72 0.24 0 Fully out 1 0.01 0 Fully out
Greece
0.67 0 0 Fully out 0.58 0.33 0 Fully out 0.25 0.69 0
Neither in
nor out
Italy
0.61 0.28 0.14 Fully out 0.56 0.34 0 Fully out 0.19 0.80 0
Neither in
nor out
a
Bold-face numbers are the winning linguistic terms for each case (country).
W
1
W
3
W
5
W
4
W
2
m
W
j
(x)
x
x
1
x
2
0
1
114
5.7. Discussions
Linguistic summarizations for engineering and computer science applications [1], [2], [4], [6],
[7], [19], [64], [65] begin by establishing term sets that contain more than one linguistic term for
one or more linguistic variables and by then creating summarizations using different
combinations of the linguistic terms. When we first began working on fsQCA linguistic
summarizations (more than six years ago) we were also of this same mindset, but we ran into
obstacles that could not be overcome (although we tried many different approaches) [29]. All
obstacles were due to using more than one linguistic term for a linguistic variable and treating
each of them as an independent causal condition. Doing this caused the following problems: (1)
A linguistic variable could not be completely removed by means of fsQCA (as explained earlier
in Section 3.1); (2) Impossible causal combinations occurred [62]
1
, and exactly which causal
combinations they are and how many of them there are depends on how many linguistic terms
each linguistic variable is granulated into (their number becomes very large as the number of
linguistic terms increases); (3) Enumeration of the impossible causal combinations must be done
ahead of time so that they are not used either as remainders in the QM algorithm that obtains the
parsimonious solutions, or during counterfactual analysis; and, (4) Programs for implementing
the QM algorithm (e.g., Logic Friday that is available at http://sontrak.com/) often crash because
they are limited by the number of remainder terms.
By using the S-shaped MFs (FOUs) for linguistic variables obtained in this chapter, all of these
challenges disappear, i.e. it is possible to remove a linguistic variable from fsQCA; there are no
impossible causal combinations (see [32] for additional discussions); and existing QM algorithm
software does not get overwhelmed (unless the number of linguistic variables becomes too large,
but, for a linguistic summarization to be understandable to a human there really should not be too
many variables in the antecedent of an fsQCA rule).
1
An impossible causal combination is one whose MF can never be greater than 0.5, regardless of the case-based
data. For example, when two linguistic terms are used for IQ, namely, Low IQ (L) and High IQ (H), it is impossible
for the combination L and H to occur. Intuitively, this makes sense, because it is not possible for a person to
simultaneously be of Low and High IQ (see, also, [32]). In this simple situation it is easy to connect intuition and
mathematics; however, this is much more difficult to do when three or more linguistic terms are used to describe a
linguistic variable.
115
5.8. Conclusions and Directions for Further Research
In this chapter we have provided a new way for calibrating the fuzzy sets that are used in
fsQCA, one that is based on clearly distinguishing between a linguistic variable and the linguistic
terms for that variable, and that overcomes our criticism about the MF that is used by fsQCA
practitioners. The resulting fuzzy sets are reduced-information level 2 fuzzy sets (RI L2 FS)
whose MFs are approximated so that they are defined for
x ÎX , and, these MFs for the
linguistic variables.
Our new calibration method can be summarized as:
Linguistic variable ® Linguistic Terms ® Data ® FOU
® RI L2 FS ® Approximated MF for RI L2 FS
= MF for the Linguistic variable
(5.13)
Its major steps are: (1) For each linguistic variable, a vocabulary of linearly ordered linguistic
terms (words) are chosen; (2) Interval end-point data are collected for each of the linguistic
variables, either from a group of subjects or from one expert; (3) The data for each word are
mapped into the footprint of uncertainty (FOU) of an IT2 FS using the HM method; (4) A
reduced information L2 FS is created by replacing each word with an uncertainty measure and
choosing an appropriate membership grade for it; and, (5) The MF of the RI L2 FS is
approximated so that the resulting MF is for
x ÎX .
As mentioned above, the resulting approximated RI L2 FS MF is for the linguistic variable,
and, it is not the MF of an ordinary FS (T1 or IT2) but is instead the MF of a L2 FS. This MF has
an S-shape, (T1 or IV), which is the kind of MF shape that is so widely used by fsQCA scholars,
and (as explained in this chapter) is so important to fsQCA.
We have applied our new calibration procedure to Ragin’s Breakdown of Democracy example,
using new data provided to us by him, and have demonstrated that we are able to obtain his
earlier solutions using either T1 or IV fsQCA, something that should be reassuring to fsQCA
scholars. By using IV fsQCA we are also able to study the robustness of fsQCA to breakpoint
location uncertainties as well as to membership grade uncertainties, and have demonstrated that
IV fsQCA is robust to both. Finally, because the S-shaped MFs (FOUs) were derived from FOUs
for all of the linguistic variable’s terms, we have shown how it is possible to also obtain more
precise statements of those causal combinations that do not use complements of the causal
116
conditions (e.g., for their best instances), something that may be of added value to practitioners
of fsQCA.
One of the big challenges to using our new calibration procedure is to either modify existing
fsQCA software or to create new software for both T1 fsQCA and IV fsQCA. Software for IV
fsQCA is available at
1
http://sipi.usc.edu/~mendel.
We would like to remind the reader that as one goes from linguistic-term FOUs to their
centroids (or maximum dispersion intervals) to the COGs of those centroids information is lost.
How to use the entire FOU in fsQCA computations for such a L2 FS remains to be studied.
Finally, although our new calibration procedure has been motivated by fsQCA and has led to
the S-shaped interpolated MF for a RI L2 FS, we suggest that perhaps this mapping from a set of
FOUs for the linguistic terms of a linguistic variable into a L2 MF for the linguistic variable may
also be useful outside of fsQCA, e.g. in computing with words [17].
1
After reaching the page, click on Publication/Software/Software/I agree to these conditions. A software folder will
download, that contains the folder “IV fsQCA”. This folder will be online by end of May 2015.
117
Chapter 6
Interval Value fsQCA (IV fsQCA)
6.1 Introduction
FsQCA rules involve words that are modeled using type-1 fuzzy sets (T1 FSs). Unfortunately,
once the T1 FS membership functions (MFs) have been chosen, all uncertainty about the words
that are used in fsQCA disappears, because T1 MFs are totally precise. Interval type-2 FSs (IT2
FSs), on the other hand, are first-order uncertainty models for words. In this chapter, we extend
fsQCA to Interval-Value (IV) FSs using IT2 FSs. More specifically, we develop IV-fsQCA by
extending the steps of fsQCA from T1 FSs to IV FSs using IT2 FSs.
6.2 IV fsQCA Steps
In this section, the T1 FSs that are used in the steps of T1 fsQCA are replaced by the new S-
shaped approximated MFs for RI L2 FS, , i.e. the MF of the linguistic variable . We use
double-tilde over-bars over a letter to remind the reader that these quantities are modeled as L2
FSs. Some of the T1 fsQCA steps do not change when
are used; however, for
completeness, we show all of the steps of IV fsQCA. Our wording of the steps follows the
wording of the T1 fsQCA steps given in Chapter 2 and 3.
Step 1. Choose a desired outcome, , and its appropriate cases: Let be the finite space of
possible outcomes, , for a specific application, i.e.
(6.1)
The desired outcome, which is application dependent, is , where . IV fsQCA, just as
T1 fsQCA, focuses on one outcome at a time, and each IV fsQCA is independent of the others.
Step 1 for IV fsQCA is exactly the same as it is for T1 fsQCA.
Let
Cases
S be the finite space of all appropriate cases (x) that have been labeled 1, 2, …, N, i.e.
118
{ 1,2,..., }
Cases
SN (6.2)
Step 2. Choose k causal conditions (variables
1
), (i =1,…, k). Let be the finite space of
all possible causal conditions, , for the specific application, i.e. . A
subset of the possible causal conditions, , is chosen whose elements are re-numbered
1,2,...,k , i.e.
(6.3)
Step 2 for IV fsQCA is exactly the same as it is for T1 fsQCA.
Step 3. Treat the desired outcome and causal conditions as RI L2 FSs and determine MFs for
them: Obtaining an S-shaped approximated MF for a RI L2 FS (i.e., the MF of the linguistic
variable) is explained in Sections 2 and 3. Let be the MF of the desired outcome, , and
be the MF of causal condition i, , i.e.
2
(6.4)
(6.5)
This step is different from T1 fsQCA because it uses S-shaped approximated MFs.
Step 4. Evaluate the MFs for all N appropriate cases the results being the IV derived
membership functions, i.e. ( 1,..., and 1,..., ) x N i k
1
Because we are now using a MF for the entire linguistic variable, there will only be one causal condition per
variable, and so causal condition and variable are synonymous. If, instead, one chooses to use one of the linguistic
terms as a causal condition (e.g., Low profit, or High profit), then we advocate using only one of them at a time in
fsQCA, in order to avoid the impossible causal combinations problems that are explained in Section 7.
2
The notation 1/[a,b] indicates that membership grade is 1 for all elements in [a, b].
119
(6.6)
(6.7)
Step 5. Conceptually, create 2
k
candidate causal combinations (rules) and view each as a
corner in a 2
k
-dimensional vector space. Let be the finite space of 2
k
candidate causal
combinations, called (by us) firing interval fuzzy sets, , i.e. ( 1,..., ik and 1,...,2
k
j )
(6.8)
where
denotes the complement of . This step is taken from Fast fsQCA (Chapter 3), and is
actually the same as in Fast fsQCA, except that it now uses RI L2 FSs.
Step 6. Compute the
S
R surviving causal combinations (firing interval surviving rules)
(6.9)
where
Cases
xS
(6.10)
in which
(6.11)
Then, keep the adequately different
S
R
surviving causal combinations that occur for more than f
cases. This is a mapping from into that makes use of and , where
is a subset of , with
S
R elements, i.e. is first computed by using (6.9) and (6.10),
after which the
S
R uniquely different are relabeled ( 1,...,
S
lR , and
). The detailed computations are ( 1,...,
S
lR and 1,..., xN ):
120
(6.12)
(6.13)
(6.14)
(6.15)
(6.16)
In (6.16) f is an integer frequency threshold that must be set by the user. The firing intervals
for these
S
R surviving rules are denoted with associated re-numbered membership functions
.
This step is very different from T1 fsQCA because the winning causal condition in (6.9) is
computed by using the IT2 min-max theorem (see discussion at the end of this chapter), which is
an extension of the T1 min-max Theorem (chapter 3, that is used in T1 fsQCA) to IT2 FSs [63].
Step 7. Compute the consistencies (subsethoods) of the surviving causal combinations,
and keep only those
A
R causal combinations—the actual causal combinations (actual rules)—
whose consistencies are greater than 0.80. This is a mapping from into ,
where is a subset of , with
A
R elements, i.e. ( 1,...,
S
lR and 1,...,
A
mR ):
R
S
121
(6.17)
(6.18)
The firing intervals for these actual rules, that are denoted , have associated re-numbered
MFs 1,...,
A
mR , and can be expressed (the superscript A in each denotes
“actual”), as:
(6.19)
This step is different from T1 fsQCA because it uses S-shaped approximated MFs and a
subsethood formula in (6.17) that is valid for such FSs that is due originally to Vlachos and
Sergiadis [66], because (6.17) reduces to the same subsethood formula that is used in T1 fsQCA
when all IV L2 FSs reduce to T1 L2 FSs.
The rest of IV fsQCA simplifies the actual causal combinations, as in T1 fsQCA.
Step 8. Use the Quine-McCluskey (QM) algorithm to obtain complex solutions (prime
implicants) and parsimonious solutions (minimal prime implicants). A mapping from ,
and into is used to obtain the complex solution, i.e.
(6.20)
122
The parsimonious solution, , is obtained by a different application of the QM algorithm,
i.e.
(6.21)
This step of IV fsQCA is performed in the crisp domain so it is the same as in T1 fsQCA.
Step 9. Use additional substantive knowledge, obtained from an expert, to perform
Counterfactual Analysis (CA) on each term of the complex solutions (one at a time), but
constrained by each term of the parsimonious solutions (one at a time), to obtain intermediate
solutions. To begin, this substantive knowledge is used (in thought experiments) to establish the
presence or the absence of each causal condition or its complement on the desired outcome. This
is a transformation of into , where denotes knowledge applied to . Then
is used to map ( , ), by means of CA into , the space of intermediate solutions that
contains
I
R elements, i.e.
(6.22)
(6.23)
In the thought experiments one asks: Based on my expert knowledge, (1) Do I believe that
strongly influences the desired output? If the answer is YES, then stop, and is put on the list
of substantive knowledge. On the other hand, if the answer is NO or DON’T KNOW, then one
asks: (2) Is it, instead,
that strongly influences the desired output? If the answer is YES, then
is put on the list of substantive knowledge. If the answer is NO or DON’T KNOW, then
123
neither nor are put on the list of substantive knowledge, i.e. the substantive knowledge is
silent about the causal condition or its complement.
This step is the same as in T1 fsQCA and is also performed in the crisp domain.
Step 10. Perform QM on the intermediate solutions to obtain the simplified intermediate
solutions. It is quite possible that the union of the intermediate rules can be further simplified;
this step does this, and is similar to Step 8. This is a mapping from and , but into
by yet another application of the QM algorithm, i.e.
(6.24)
This step is the same as in T1 fsQCA and is also in the crisp domain.
Step 11. Retain only those simplified intermediate solutions whose consistencies are
approximately greater than 0.80, the believable simplified intermediate solutions. This is a
mapping from into , where is a subset of with R
BSI
elements, i.e.
(r = 1,..., R
SI
and s = 1,..., R
BSI
):
(6.25)
(6.26)
This step is different from the one in T1 fsQCA because it uses Vlachos and Sergiadis’s IT2 FS
subsethood measure [66], [67], as in Step 7.
124
6.3 IT2 Min-max Theorem
We use interval ranking [68] in order to find the winning causal conditions between an IT2
causal condition and its complement.
Theorem 1 (IT2 Min-max Theorem): Given k causal conditions, , ,…, and their
respective complements, , , …,
where
(6.27)
Consider the 2
k
candidate causal combinations ( 1,...,2 )
k
j where
or and 1,..., ik . Let where ( 1,2,..., xN ):
(6.28)
is determined by:
(6.29)
where
(6.30)
Then for each x (case) there is only one j, j*(x), for which has the
highest interval ranking and can be computed as , where:
(6.31)
125
In (6.30), denotes the winner of , namely or .
Proof: We used Ishibuchi and Tanaka’s [69] or Hu and Wang’s [70] approaches for
maximization problems of intervals. Intervals can be classified into non-overlapping, partially
overlapping and completely overlapping intervals. Only two types of intervals occur when a
causal condition is compared to its complement, as is depicted in Fig. 6.1 for four different
situations.
Fig. 6.1. Four different situations of the IT2 FSs for a case:
(a), (b) non-overlapping intervals, and (c),(d) overlapping intervals.
If A = a
L
, a
R
[ ]
and B = b
L
,b
R
[ ]
are two intervals, then Ishibuchi and Tanaka’s [69] approach
defines the order relation for maximization problems as
A ³ B iff a
L
³ b
L
and a
R
³ b
R
(6.32)
Applying (6.30) to and for four cases in Fig. 6.1
we have
(6.33)
126
Notice that in (6.33) the winning causal condition is the one whose interval has the higher upper
bound interval, as expressed in (6.30).
127
Chapter 7
On Establishing Nonlinear Combinations of Variables from
Small to Big Data for Use in Later Processing
1
7.1 Introduction
Suppose one is given data of any size, from small to big, for a group of v input variables that
one believes caused
2
an output, and that one does not know which (nonlinear) combinations of
the input variables caused the output. This chapter presents a very efficient method for
establishing the initial (nonlinear) combinations of variables that can then be used in later
modeling and processing. For example, in non-linear regression (e.g., [71], [72, Ch. 9]) one
needs to choose the nonlinear interactions among the variables as well as the number of terms in
the regression model
3
, in pattern classification (e.g., [73]) that is based on mathematical features
(e.g., [74]) one needs to choose the nonlinear nature of those features as well as the number of
such features, and in some neural networks (e.g., [75]) one needs to know which combinations of
the inputs and how many such combinations should be fanned out to one or more of the
network’s various layers. Our Causal Combination Method (CCM) that is described in Section
7.3 provides the initial combinations of the variables as well as their number, and can also be
used in later processing to readjust the combinations of the variables as well as their number that
are used in a model. Our Fast Causal Combination Method (FCCM) that is also described in
Section 7.3 is a very efficient way of implementing CCM for data of any size.
Establishing which combinations of variables to use in a model can be interpreted as a form of
data preprocessing. According to [76]: “Data preprocessing is a data mining technique that
involves transforming raw data into an understandable format. Real-world data is often
incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain
many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing
prepares raw data for further processing.” According to [12] data preprocessing includes
1
This chapter is a duplication of our paper [134].
2
How to choose the variables is crucial to the success of any model. This paper assumes that the user has already
established the variables that (may) affect the outcome.
3
According to [135], “…in practice, due to complex and often informal nature of a priori knowledge, …
specification of approximating functions may be difficult or impossible.”
128
cleaning, normalization, transformation, feature extraction and selection, etc. Our preprocessing
is about transformation of raw data into patterns.
CCM focuses on interconnections of either a causal condition (defined in Section 7.2) or its
complement where the connecting word is AND which is modeled using the minimum operation.
Note that, because you might be wrong about postulating a cause you protect yourself against
this by considering both the cause and its complement (an idea that was first suggested
1
by Ragin
in [21, p. 131]). The interconnection of either a causal condition or its complement for all of the
v variables is called a causal combination. As will be seen in Section 2, there can be a (very)
large number of candidate causal combinations. CCM prunes the (very) large number of
candidate causal combinations (using data about the v input variables) to a much smaller subset
of surviving causal combinations, and FCCM does this in a very efficient way. These surviving
causal combinations are the nonlinear combinations of the input variables that can then be used
in later modeling or processing.
In summary, this chapter presents a very efficient method for establishing the initial
(nonlinear) combinations of variables that can then be used in later modeling and processing by
using a novel form of preprocessing that transforms raw data into patterns through the use of
fuzzy sets. We will show that our method lends itself to massive distributed and parallel
processing which makes it suitable for data of all sizes from small to big.
The rest of this chapter is organized as follows: Section 7.2 describes the terminology and
approach that is used in the rest of the chapter; Section 7.3 provides the main results for CCM
and FCCM; Section 7.4 quantifies the computational speedup for FCCM; Section 7.5 provides
some additional ways to speed up FCCM; and Section 6 draws conclusions.
7.2. Terminology and Approach
A data pair is
(x(t), y(t)), where
x(t) = col(x
1
(t),...,x
v
(t)),
x
i
(t) is the i
th
input variable and
y(t) is the output for that
x(t) . Each data pair is treated as a “case,” index t denotes a data case
and there does not have to be a unique natural ordering of the cases over t (in a multi-variable
approximation application there is no natural ordering of the data cases, but in a time-series
1
Traditional interconnections usually do not consider both a cause and its complement; in fact, one almost never
sees the complement of a cause in an interconnection of causes (e.g., in the antecedents of either a crisp or fuzzy
rule).
129
forecasting application the data cases would have a natural temporal ordering). We assume that
N data pairs are available, and refer to the collection of these data pairs as
S
Cases
, where:
S
Cases
={(x(t), y(t))}
t=1
N
(7.1)
For Big Data [77], N ranges from huge (O(N) = 10
10
) to monster (O(N) = 10
12
) to very large
(O(N) = 10
>12
).
We begin by partitioning each input variable into subsets each of which may be thought of as
having a linguistic term
1
associated with it, e.g., the variable Pressure can be partitioned into
Low Pressure, Moderate Pressure and High Pressure. Because it is very difficult to know where
to draw a crisp line between each of the subsets, so as to separate one from the other, they are
modeled herein as fuzzy sets, and, there can be from 1 to
n
v
subsets (terms) for each input
variable. The terms for each input variable that are actually used in CCM are called causal
conditions.
If one chooses to use only one term for each variable (e.g., Profitable Company, Educated
Country, Permeable Oil Field, etc.), then the words “variable,” “term” and “causal condition”
can be interchanged, i.e., they are synonymous. If, on the other hand, one chooses to use more
than one term for each variable (e.g., Low Pressure, Moderate Pressure and High Pressure), i.e.,
to granulate [61] each variable, as is very commonly done in engineering and computer science
applications, then one must distinguish between the words “variable,” “term” and “causal
condition.” We elaborate further on this next.
If, e.g., there are V variables, each described by
n
v
terms (
v = 1,...,V ) then (as in fsQCA [21],
[23], [55], [78]) each of the terms will be treated as a separate causal condition
2
(this is
illustrated below in Example 7.1), and, as a result, there will be
k = n
1
+ n
2
+ + n
V
causal
conditions.
1
The actual names that are given to the subsets are not important for this paper, e.g., they may be given
linguistically meaningful names (as in our example of Pressure) or symbolic names (e.g., A, B, C; T1, T2, T3; etc.).
2
One may raise an objection to doing this because of perceived correlations between terms for the same variable
(e.g., perhaps Low Pressure and Moderate Pressure are highly correlated, or Moderate Pressure and High Pressure
are highly correlated). Such perceptions depend on how overlapped the fuzzy sets are for the terms and does not
have to be accounted for during CCM because the mathematics for CCM will take care of the overlap automatically.
130
We let x
v
(v = 1,...,V )
denote a variable and T
l
(x
v
)
(l = 1,...,n
v
)
denote the terms for each
variable. For simplicity, in this chapter the same numbers of terms are used for each variable, i.e.
n
v
= n
c
for "v , so that the total number of causal conditions is k = n
c
V .
The terms are organized according to the (non-unique) ordering of the V input variables, as
{T
1
(x
1
),...,T
n
c
(x
1
),....,T
1
(x
V
),...,T
n
c
(x
V
)}. This set of n
c
V terms is then mapped into an ordered set
of possible causal conditions, S
¢ C
, as follows:
{T
1
(x
1
),...,T
n
c
(x
1
),....,T
1
(x
V
),...,T
n
c
(x
V
)}
® { ¢ C
1
(x
1
),..., ¢ C
n
c
(x
1
),..., ¢ C
n
c
(V-1)+1
(x
V
),..., ¢ C
n
c
V
(x
V
)}
º { ¢ C
1
,..., ¢ C
n
c
,..., ¢ C
n
c
(V -1)+1
,..., ¢ C
n
c
V
}
(7.2)
A subset of the ¢ C
¢ i
possible causal conditions, S
C
, chosen (by the end-user) from S
¢ C
forms
the finite space of causal conditions, where S
C
º {C
i
ÎS
¢ C
, i = 1,...,n
c
V}. It is not uncommon for
one to choose S
C
= S
¢ C
.
A causal combination is a connection of
k = n
c
V
conditions, each of which is either a causal
condition or its complement for each variable.
We treat the causal conditions as fuzzy sets and determine membership functions (MFs) for
them ([21], [29] provide ways to obtain the MFs; Fuzzy c-Means (FCM) can also be used to do
this, We developed a new method to calibrate MFs in Chapter 5); however, the exact way in
which these MFs are obtained is not needed in the rest of this chapter), i.e.
(7.3a)
which can be expressed as:
( ) 1,...,
li
C v C c
i k nV (7.3b)
These MFs are evaluated for all N cases
1
, the results being called derived MFs, i.e.
(t =1 ,..., N and i =1 ,...,k)
1
In Section 7.6 we describe how this can be done using parallel or distributed processing.
131
m
C
i
D
:(S
Cases
,S
C
) ® [0,1]
t x
i
(t) m
C
i
(x
i
(t)) º m
C
i
D
(t)
ü
ý
ï
þ
ï
(7.4)
Next, we
1
conceptually postulate 2
k
( k = n
c
V ) candidate causal combinations. Let S
F
be the
finite space of 2
k
candidate causal combinations, F
i
, i.e.
S
F
= {F
1
,..., F
2
k
} ' F
j
= A
1
j
Ù A
2
j
Ù ... Ù A
k
j
A
i
j
= C
i
or c
i
ü
ý
ï
þ
ï
( j = 1,...,2
k
and i = 1,..., k) (7.5)
where c
i
denotes the complement of C
i
and Ù denotes conjunction and is modeled using
minimum. Our preprocessing that is described in Section 3 focuses on reducing the number of
elements in
S
F
from 2
k
, which can be very large, to a much smaller number.
Example 7.1. Regarding the allocation of money in a number of investment alternatives [17], let
x
1
= risk of losing capital, x
2
= vulnerability to inflation, x
3
= amount of profit received, and
x
4
= liquidity. In this example, it is assumed that x
1
, x
2
and x
3
are each described by the same
three terms: Low (L), Moderate (M) and High (H), whereas x
4
is described by Bad B), Fair (F)
and Good (G). Because V = 4 and n
c
= 3, k = n
c
V = 12, and:
{T
1
(x
1
),...,T
3
(x
1
),....,T
1
(x
4
),...,T
3
(x
4
)} = {Low(x
1
), Moderate(x
1
), High(x
1
),
Low(x
2
), Moderate(x
2
), High(x
2
),
Low(x
3
), Moderate(x
3
), High(x
3
),
Bad(x
4
), Fair(x
4
), Good(x
4
)}
º {C
1
,C
2
,C
3
,...,C
10
,C
11
,C
12
}
There will be 2
12
= 4096 candidate causal combinations, each with 12 terms that are
connected by AND, one example of which is
2
:
1 1 1 2 2 2 3 3 3 4 4 4
( ) ( ) ( ) ( )
x x x x x x x x x x x x
F L m h l M h l m h b f G
1
In CCM one must actually enumerate (create) all of the 2
k
candidate causal combinations. As we show below, a
distinguishing feature of FCCM is that we only have to conceptually postulate them.
2
How to simplify an interconnection of three terms for the same variable is not needed for CCM or FCCM, but may
be something that someone is interested in doing at the very end of their final processing in order to make such an
interconnection more linguistically interpretable. Discussions on how to do this that are based on the similarity of
fuzzy sets can be found in [29].
132
7.3. Main Results for the Causal Combination Method (CCM) and the Fast Causal
Combination Method (FCCM)
In this section we present the details for the CCM and FCCM.
7.3.1. Causal Combination Method (CCM)
In CCM one actually creates the 2
k
(k = n
c
V ) candidate causal combinations, computes each of
their MFs in all of the N cases, and then keeps only the ones—the R
S
surviving causal
combinations—whose MF values are > 0.5 (our reason for doing this is explained below in
Comment 1) for an adequate number of cases (which must be specified by the user). This is a
mapping from {S
F
, S
Cases
} into S
F
S
that makes use of m
A
i
j
(t), where S
F
S
is a subset of S
F
, with
R
S
elements, i.e. ( j = 1,...,2
k
, t = 1,..., N and l =1 ,..., R
S
) [27]
m
F
j
:(S
F
,S
Cases
) ® [0,1]
t m
F
j
(t) = min m
A
1
j
(t),m
A
2
j
(t),..., m
A
k
j
(t)
{ }
ü
ý
ï
þ
ï
(7.6)
m
A
i
j
(t) = m
C
i
D
(t) or m
c
i
D
(t) =1- m
C
i
D
(t) i =1 ,...,k (7.7)
t
F
j
:([0,1],S
Cases
) ® {0,1}
t t
F
j
(t) =
1 if m
F
j
(t) > 0.50
0 if m
F
j
(t) £ 0.50
ì
í
ï
î
ï
ü
ý
ï
ï
þ
ï
ï
(7.8)
N
F
j
:{0,1} ® I
t
F
j
N
F
j
= t
F
j
(t)
x=1
N
å
ü
ý
ï
þ
ï
(7.9)
F
l
S
:(S
F
, I) ® S
F
S
F
j
F
l
S
= F
j
( j ® l) N
F
j
³ f, j = 1 ,..., 2
k
{ }
ü
ý
ï
þ
ï
(7.10)
where F
¢ j
( ¢ j ® l) means F
¢ j
is added to the set of surviving causal combinations as F
l
S
, and l is
the index of the surviving set. In (10) f is an integer frequency threshold that must be set by the
133
user (see Comment 2 below). The surviving causal combinations are denoted F
l
S
with associated
re-numbered MFs m
F
l
S
(t), l =1 ,..., R
S
.
Comment 7.1. Each of the 2
k
candidate causal combinations can be interpreted as a corner in a
2
k
-dimensional vector space [23, Ch. 5]. Paraphrasing [27, p. 7]: Choosing the surviving causal
combinations according to (6)-(10) can be interpreted as keeping the adequately represented
causal combinations that are closer to corners and not the ones that are farther away from
corners. Regarding closeness to corners in a 2
k
-dimensional vector space, if crisp sets were used
instead of fuzzy sets, a candidate causal combination is either fully supported (i.e., its MF value
equals 1) or is not supported at all (i.e., its MF value equals 0), and only the fully supported
candidate causal combinations survive. Using fuzzy sets lets one back off from the stringent
requirement of using crisp sets, by replacing the vertex membership value of “1” with a vertex
membership value of >0.5, meaning that if the MF value for a causal combination is greater than
0.5 then the causal combination is closer to a vertex than it is away from its vertex. Only those
cases whose causal combination MF values are greater than 0.5 are said to support the existence
of a candidate causal combination.
Comment 7.2. In order to implement (7.10) threshold f (cut-off frequency) has to be chosen. This
choice is arbitrary and depends on an application and how many cases are available. Discussions
on how to choose f are given in [21], and [23]. Often f is set equal to the value of N
F
¢ j
that
captures more than 80 percent of the cases assigned to causal combinations [22]; alternatively, it
is sometimes chosen as 1. When the latter is done it is not uncommon for there to be many cases
that are associated with this cut-off frequency. They must all be kept, because as of yet there is
no natural way to determine which of these cases is more important than the other.
Comment 7.3. A brute force way to carry out the CCM computations in (7.6)-(7.10) is to create a
table in which there are N rows, one for each case, and
2
k
= 2
n
c
V
columns, one for each of the
causal combinations. The entries into this table are
m
F
j
(t) and there will be
N2
n
c
V
such entries.
Such a table is called a Truth Table by Ragin [21], [23]. One then searches through this very
134
large table and keeps only those causal combinations whose MF entries are > 0.5. If
f = 1 then
all such causal combinations, removing duplications, become the set of
R
S
surviving causal
combinations. For Big Data (and even for not-so-big data),
N2
n
c
V
will be enormous and so this
brute force way to carry out the computations in (6)-(10) is totally impractical.
7.3.2. Fast Causal Combination Method (FCCM)
Ragin [21] observed the following in an example with four causal conditions: “… each case
can have (at most) only a single membership score greater than 0.5 in the logical possible
combinations from a given set of causal conditions [i.e., in the candidate causal combinations].”
This somewhat surprising result is true in general and in [55] the following theorem that locates
the one causal combination for each case whose MF > 0.5 was presented:
Theorem 7.1 (Min-max Theorem) [Chapter 2]. Given k causal conditions, C
1
, C
2
,…, C
k
and
their respective complements, c
1
, c
2
, …, c
k
. Consider the 2
k
candidate causal combinations
( j = 1,...,2
k
)
12
...
j j j
jk
F A A A where A
i
j
= C
i
or c
i
and i = 1,...,k . Let
m
F
j
(t) = min{m
A
1
j
(t), m
A
2
j
(t),..., m
A
k
j
(t)}, t =1 ,2,..., N (7.11)
Then for each t (case) there is only one j, j*(t), for which m
F
j*(t )
(t) > 0.5 and m
F
j*(t )
(t) can be
computed as:
m
F
j*(t )
(t) = min max m
C
1
D
(t), m
c
1
D
(t)
( )
,...,max m
C
k
D
(t), m
c
k
D
(t)
( ) { }
(7.12)
F
j*(t )
(t) is determined from the right-hand side of (7.12), as:
11
*( )
*( ) *( )
1
( ) max ( ), ( ) ... max ( ), ( )
...
kk
D D D D
j t C c C c
j t j t
k
F t arg t t arg t t
AA
(7.13)
In (7.13), argmax m
C
i
D
(t),m
c
i
D
(t)
( )
denotes the winner of
max m
C
i
D
(t),m
c
i
D
(t)
( )
, namely C
i
or c
i
.
135
For completeness, a proof of this theorem is given in Appendix A, because this theorem is the
basis for the following FCCM, which is a very efficient (fast) method for computing (7.6)-
(7.10)
1
:
3. Compute F
j*(t)
(t) using (7.13).
4. Find the J uniquely different F
j*(t)
(t) and re-label them F
¢ j
( ¢ j = 1,..., J).
5. Compute t
F
¢ j
, where (t =1 ,..., N)
t
F
¢ j
(t) =
1 if F
¢ j
= F
j*(t )
(t)
0 otherwise
ì
í
ï
î
ï
(7.14)
6. Compute N
F
¢ j
, where
N
F
¢ j
= t
F
¢ j
(t)
x=1
N
å
(7.15)
7. Establish the R
S
surviving causal combinations F
l
S
(l = 1,..., R
S
), as:
F
l
S
=
F
¢ j
( ¢ j ® l) if N
F
¢ j
³ f
0 if N
F
¢ j
< f
ì
í
ï
î
ï
(7.16)
From the structure of F
j*(t)
(t) in the second line of (13),
F
l
s
in (16) can be expressed
2
as:
F
l
S
(t) = A
1
l
Ù... Ù A
k
l
(7.17)
where l = 1,..., R
S
and t =1 ,..., N .
Comment 7.4. Observe from Steps 1-5 that the explicit enumeration of all
2
k
= 2
n
c
V
candidate
causal combinations is not required in FCCM; it is required in CCM.
Example 7.2. In order to illustrate the FCCM computations we consider
3
a simplified Auto
MPG
4
application for Low MPG cars (from 14 four-cylinder automobiles
1
). We selected three
1
FCCM is modeled after Step 6NEW in Fast fsQCA, as described in Chapter 3.
2
Note that in FCCM j *(t) ® ¢ j ® l .
3
This example and its tables are taken from Examples 1 and 2 in Chapter 3; it is included in this paper in order to
illustrate the FCCM, because a reviewer of this paper felt doing this would help the readers better grasp the ideas of
FCCM.
4
The MPG data set can be obtained at: http://archive.ics.uci.edu/ml/datasets/Auto+MPG.
136
input variables, namely, Horsepower (H), Weight (W) and Acceleration (A), and two terms (Low
and High) for each variable; hence, there are six causal conditions: L
H
= Low Horsepower, H
H
=
High Horsepower, L
W
= Light Weight, H
W
= Heavy Weight, L
A
= Low Acceleration and H
A
=High Acceleration. The data in Table 7.1 show the variables and derived MFs. How these MF
values were actually obtained is explained in [29].
Table 7.1
Data- and fuzzy-membership-matrix (showing original variables and their derived
fuzzy-set membership function scores; adapted from Table 1 in [62]).
Case
Causal Condition and Derived MF scores
H MF(L
H
) MF(H
H
) W MF(L
W
) MF(H
W
) A MF(L
A
) MF(H
A
)
1 95 0 0.91 2372 0 0.08 15 0.92 0
2 88 0 0.12 2130 0.31 0 14.5 1 0
3 46 1 0 1835 1 0 20.5 0 1
4 87 0 0.06 2672 0 0.97 17.5 0 0.06
5 90 0 0.33 2430 0 0.22 14.5 1 0
6 95 0 0.91 2375 0 0.08 17.5 0 0.06
7 113 0 1 2234 0 0 12.5 1 0
8 88 0 0.12 2130 0.31 0 14.5 1 0
9 90 0 0.33 2264 0 0 15.5 0.63 0
10 95 0 0.91 2228 0.01 0 14 1 0
11 72 0.75 0 2408 0 0.16 19 0 0.8
12 86 0 0.02 2220 0.02 0 14 1 0
13 90 0 0.33 2123 0.35 0 14 1 0
14 70 0.89 0 1955 0.99 0 20.5 0 1
For six causal conditions there are
6
2 64 candidate causal combinations whose MFs have to
be evaluated for all 14 cases. Using the min-max formulas from Theorem 7.1 we found the
winning causal combination for each case. These results are summarized in Table 7.2.
1
The numbered cases correspond to the following cars: 1-Toyota Corona Mark II, 2-Datsun pl510 (70), 3-
Volkswagen 1131 Deluxe Sedan, 4-Peugeot 504, 5-Audi 100 LS, 6-Saab 99e, 7-BMW 2002, 8-Datsun pl510 (71),
9-Chevrolet Vega 2300, 10-Toyota Corona, 11-Chevrolet Vega (sw), 12-Mercury Capri 2000, 13-Opel 1900, 14
Plymouth Cricket. These cars all had MF values greater than zero in Low MPG cars.
137
Table7. 2
Min-max calculations and associated causal combinations (taken from Table 1 in [62])
Case
Maximum (MF, complement of MF)/Winner (W) Minimum
calculation
[Using (12)]
Causal
combination
[Using (13)]
Max(L H,l H)
/W
Max(H H,h H)
/W
Max(L W,l W)
/W
Max(H W,h W)
/W
Max(L A,l A)
/W
Max(H A,h A)
/W
1 1/ l H 0.91/ H H 1/ l W 0.92/ h W 0.92/ L A 1/ h A 0.91 l HH Hl Wh WL Ah A
2 1/ l H 0.88/ h H 0.69/ l W 1/ h W 1/ L A 1/ h A 0.69 l Hh Hl Wh WL Ah A
3 1/ L H 1/ h H 1/ L W 1/ h W 1/ l A 1/ H A 1 L Hh HL Wh Wl AH A
4 1/ l H 0.94/ h H 1/ l W 0.97/ H W 1/ l A 0.94/ h A 0.94 l Hh Hl WH Wl Ah A
5 1/ l H 0.67/ h H 1/ l W 0.78/ h W 1/ L A 1/ h A 0.67 l Hh Hl Wh WL Ah A
6 1/ l H 0.91/ H H 1/ l W 0.92/ h W 1/ l A 0.94/ h A 0.91 l HH Hl Wh Wl Ah A
7 1/ l H 1/ H H 1/ l W 1/ h W 1/ L A 1/ h A 1 l HH Hl Wh WL Ah A
8 1/ l H 0.88 h H 0.69/ l W 1/ h W 1/ L A 1/ h A 0.69 l Hh Hl Wh WL Ah A
9 1/ l H 0.67/ h H 1/ l W 1/ h W 0.63/ L A 1/ h A 0.63 l Hh Hl Wh WL Ah A
10 1/ l H 0.91/ H H 0.99/ l W 1/ h W 1/ L A 1/ h A 0.91 l HH Hl Wh WL Ah A
11 0.75/ L H 1/ h H 1/ l W 0.84/ h W 1/ l A 0.8/ H A 0.75 L Hh Hl Wh Wl AH A
12 1/ l H 0.98/ h H 0.98/ l W 1/ h W 1/ L A 1/ h A 0.98 l Hh Hl Wh WL Ah A
13 1/ l H 0.67/ h H 0.65/ l W 1/ h W 1/ L A 1/ h A 0.65 l Hh Hl Wh WL Ah A
14 0.89/ L H 1/ h H 0.99/ L W 1/ h W 1/ l A 1/ H A 0.89 L Hh HL Wh Wl AH A
Observe from the last column in this table that only six uniquely different, out of the 64
possible, causal combinations have survived, namely:
l
H
H
H
l
W
h
W
L
A
h
A
, l
H
h
H
l
W
h
W
L
A
h
A
, L
H
h
H
L
W
h
W
l
A
H
A
, l
H
h
H
l
W
H
W
l
A
h
A
, l
H
H
H
l
W
h
W
l
A
h
A
, L
H
h
H
l
W
h
W
l
A
H
A
.
The membership function grades for these six surviving causal combinations for all 14 cases are
summarized in Table 7.3, e.g. (the numerical values for the membership grades or their
complements are taken from Table 7.1)
MG(l
H
H
H
l
W
h
W
L
A
h
A
| Case 1) = min(1, 0.91, 1, 0.92, 0.92, 1) = 0.91
Observe, from Table 7.3, that for each case there is only one causal combination for which its
MF is greater than 0.5; it is shown in boldface and illustrates the truth of the Min-Max Theorem.
Table 7.3
Membership grades for the six surviving causal combinations and 14 cases; membership grades for
the six causal conditions are in Table 7.1 (taken from Table 1 in [16])
Case
Membership Grades of Surviving Causal Combinations (i.e., minimum of six causal-condition-MFs):
l
H
H
H
l
W
h
W
L
A
h
A
l
H
h
H
l
W
h
W
L
A
h
A
L
H
h
H
L
W
h
W
l
A
H
A
l
H
h
H
l
W
H
W
l
A
h
A
l
H
H
h
l
W
h
W
l
A
h
A
L
H
h
H
l
W
h
W
l
A
H
A
1 0.91 0.09 0 0.08 0.08 0
2 0.12 0.69 0 0 0 0
3 0 0 1 0 0 0
4 0 0 0 0.94 0.03 0
5 0.33 0.67 0 0 0 0
6 0 0 0 0.08 0.91 0
7 1 0 0 0 0 0
8 0.12 0.69 0 0 0 0
9 0.33 0.63 0 0 0.33 0
10 0.91 0.09 0 0 0 0
11 0 0 0 0.16 0 0.75
12 0.02 0.98 0 0 0 0
13 0.33 0.65 0 0 0 0
14 0 0 0.89 0 0 0.01
a
Each boldface membership grade is greater than 0.5.
138
Example 7.3. In this example, we illustrate the number of surviving causal combinations for
eight readily available data sets: Abalone [79], Concrete Compressive Strength [79], Concrete
Slump Test [79], Wave Force [80], Chemical Process Concentration Readings [3], Chemical
Process Temperature Readings [81], Gas Furnace [81] and Mackey-Glass Chaotic Time Series
[82]. Our results are summarized in Table 7.4.
To begin we found (one, two or three) MFs for each problem’s variables using a modification
of Fuzzy c-Means (FCM) [83] called Linguistically Modified Fuzzy c-Means (LM-FCM) that is
described in
1
[13]. For one term per variable we ran FCMs for two clusters, and, because the two
MFs are complements of one another,
2,3
we chose any one of the two as the term’s MF. For two
terms per variable we ran FCMs for three clusters and assigned Low to the left-most cluster and
High to the right-most cluster. For three terms per variable, we ran FCMs for five clusters and
assigned Low to the left-most cluster, Moderate to the middle cluster and High to the right-most
cluster. Of course, there are other ways to arrive at the MFs; and, since Theorem 1 is very
dependent upon the MFs, the results in Table 7.4 will be different for different choices for the
MFs.
Focusing next on the numbers in Table 7.4, observe that the number of surviving causal
combinations always increases as the number of terms per variable increases for all eight
problems. This makes sense from an information viewpoint, because describing a variable with
more terms implies more information is being extracted from the data, and this requires more
causal combinations to do it.
Observe, also, that:
1) When there is only one term per variable and three variables (as occurs for Wave Force, Chemical Process Concentration Readings and Chemical Process Temperature Readings), then the number of surviving causal combinations is either the same as, or close to, the number of candidate possible causal combinations. This suggests (to us) that one can (should) extract more information from these data sets by using more than one term per variable.
Table 7.4
Number of surviving causal combinations for eight problems. For each problem the columns give, per choice of the number of terms per variable: the number of candidate causal combinations, the number of plausibly possible causal combinations (two- and three-term cases only), and the number of surviving causal combinations.

Problem (Cases, V) | One term: candidate/possible^a / surviving | Two terms: candidate^b / plausibly possible^c / surviving | Three terms: candidate^d / plausibly possible^e / surviving
Abalone [79] (4,177 cases, V = 7) | 128 / 55 | 16,384 / 2,187 / 118 | 2,097,152 / 16,384 / 352
Concrete Compressive Strength [79] (1,030 cases, V = 8) | 256 / 73 | 65,536 / 6,561 / 218 | 16,777,216 / 65,536 / 439
Concrete Slump Test [79] (103 cases, V = 9) | 512 / 71 | 262,144 / 19,683 / 90 | 134,217,728 / 262,144 / 97
Wave Force [80] (317 cases, V = 3) | 8 / 8 | 64 / 27 / 20 | 512 / 64 / 28
Chemical Process Concentration Readings [3] (194 cases, V = 3) | 8 / 8 | 64 / 27 / 22 | 512 / 64 / 24
Chemical Process Temperature Readings [81] (223 cases, V = 3) | 8 / 6 | 64 / 27 / 10 | 512 / 64 / 16
Gas Furnace [81] (293 cases, V = 6) | 64 / 25 | 4,096 / 729 / 66 | 262,144 / 4,096 / 113
Mackey-Glass Chaotic Time Series [82] (1,000 cases, V = 4) | 16 / 8 | 256 / 81 / 16 | 4,096 / 256 / 22

a. This is 2^V, because all candidate causal combinations are possible (see Comment 7.5).
b. This is (2^2)^V = 4^V.
c. Instead of 4 candidate causal combinations per variable, there are only 3 plausibly possible candidate causal combinations per variable; hence, the total number of plausibly possible candidate causal combinations is 3^V (see Comment 7.5).
d. This is (2^3)^V = 8^V.
e. Instead of 8 candidate causal combinations per variable, there are only 4 plausibly possible candidate causal combinations per variable; hence, the total number of plausibly possible candidate causal combinations is 4^V (see Comment 7.5).
2) In all other situations the number of surviving causal combinations is considerably smaller than the number of candidate or plausibly possible causal combinations. This difference increases when more terms per variable are used; e.g., using three terms per variable, the number of candidate causal combinations for the Concrete Slump Test data set is 134,217,728, whereas the number of surviving causal combinations is 97.
3) Abalone has seven variables and 4,177 cases, and only 55/128, 118/2,187 and 352/16,384 surviving causal combinations occurred when one, two or three terms were used per variable; these are very significant reductions from the plausibly possible numbers of causal combinations (128, 2,187 and 16,384, respectively).
4) Concrete Compressive Strength and Concrete Slump Test have even greater reductions
from the plausibly possible number of causal combinations to the number of surviving
causal combinations. Concrete Slump Test is interesting because there are so few cases
(103). If one wants to use three (two) terms per variable then there are 97 (90) surviving
causal combinations, almost one per case, indicating (to us) that it is not a good idea to
use three (two) terms per variable for this data set when so few cases are available.
5) Forecasting the Mackey-Glass Chaotic Time Series is a very well studied problem, in
which it is quite common to begin with four past samples (variables) and exactly 16 rules.
This is in agreement with using two terms per variable where the number of surviving
causal combinations turns out to be 16.
Example 7.4. In this example we illustrate the surviving causal combinations for the Abalone [79] data set. Abalone has seven variables, namely: Length, Diameter, Height, Whole Weight, Shucked Weight, Viscera Weight, and Shell Weight. We used one MF for each variable, as described in Example 7.3, and let H_1 ≡ High Length, H_2 ≡ High Diameter, H_3 ≡ High Height, H_4 ≡ High Whole Weight, H_5 ≡ High Shucked Weight, H_6 ≡ High Viscera Weight, and H_7 ≡ High Shell Weight.¹ Table 7.5 enumerates the 55 surviving causal combinations. Its last column indicates the number of cases associated with each rule. Observe that the first six causal combinations cover 3,439 cases, which is more than 82 percent of all of the cases; so, if one uses the 80% rule that is mentioned in Comment 7.2, this could be done with only six rules.

¹ You may not agree with our linguistic use of "High" for all of the variables; however, as explained earlier, the actual names given to subsets are not important for using the results in this chapter.
Table 7.5
Surviving causal combinations and the number of cases associated with each, for the Abalone data set

Rule Number | H_1  H_2  H_3  H_4  H_5  H_6  H_7 | Number of cases
1 0 0 0 0 0 0 0 1443
2 1 1 1 1 1 1 1 1362
3 0 0 1 1 1 1 1 229
4 0 0 0 1 1 1 1 181
5 0 0 0 0 1 0 0 124
6 0 0 0 1 1 1 0 100
7 1 1 0 1 1 1 1 75
8 0 1 1 1 1 1 1 74
9 0 0 0 0 0 1 0 64
10 0 0 0 0 0 0 1 58
11 0 0 0 0 1 1 0 52
12 1 0 1 1 1 1 1 33
13 0 0 0 0 0 1 1 33
14 0 0 1 0 0 0 0 31
15 0 0 0 1 0 1 1 30
16 0 0 1 1 1 1 0 28
17 0 0 0 1 1 0 1 28
18 0 1 0 1 1 1 1 25
19 0 0 1 1 0 1 1 24
20 0 0 1 1 1 0 1 22
21 0 0 1 0 0 1 0 15
22 0 0 1 0 1 0 0 13
23 0 0 1 0 0 0 1 13
24 1 0 0 1 1 1 1 12
25 0 0 0 1 0 0 1 11
26 1 0 0 1 1 1 0 10
27 0 0 0 1 1 0 0 9
28 0 0 0 0 1 0 1 9
29 0 0 1 0 1 1 0 8
30 0 0 1 0 0 1 1 8
31 0 1 0 0 0 0 0 8
32 0 0 1 1 1 0 0 6
33 0 0 0 1 0 1 0 5
34 0 0 1 1 0 0 1 4
35 1 1 0 0 0 0 0 4
36 1 1 1 1 1 1 0 3
37 1 1 1 1 0 1 1 3
38 0 1 0 1 1 0 1 2
39 0 0 1 0 1 0 1 2
40 1 1 1 1 1 0 1 1
41 0 0 0 0 1 1 1 1
42 0 1 0 0 1 1 0 1
43 1 0 0 1 1 0 1 1
44 1 1 0 1 1 1 0 1
45 0 0 1 0 1 1 1 1
46 0 1 1 1 1 0 1 1
47 1 1 0 0 0 0 1 1
48 1 1 0 0 1 1 0 1
49 1 1 0 0 1 0 0 1
50 0 1 1 1 1 1 0 1
51 1 1 1 0 0 0 1 1
52 0 1 0 1 0 1 1 1
53 0 1 1 1 0 1 1 1
54 1 0 1 1 0 1 1 1
55 0 1 0 1 1 1 0 1
Comment 7.5. We proved in Chapter 3 that if each of the V variables is described by only one term (n_c = 1), then all of the 2^k = 2^V candidate causal combinations can occur, i.e., are possible, meaning that there may be actual cases for which every one of the 2^k candidate causal combinations has a MF > 0.5.¹ However, if one or more of the variables is described by more than one term, then not all of the 2^k = 2^{n_c V} candidate causal combinations are possible, meaning that some of the 2^k candidate causal combinations can never have a MF > 0.5; this does not depend on the cases (i.e., on the data), but instead derives from the mathematics that is associated with

μ_{F_j}(t) = min{ μ_{A_1^j}(t), μ_{A_2^j}(t), ..., μ_{A_k^j}(t) } > 0.5

For example, if a variable is described by two terms, High (H) and Low (L), then of the four possible combinations LH, lH, Lh and lh, when the MF of L is sufficiently to the left of the MF of H (as usually occurs²), it is theoretically impossible for LH to occur, because MF(LH) will always be < 0.5 regardless of the data (cases). We proved in Chapter 3 that, instead of 2^2 = 4 candidate causal combinations per variable, there are only 3 plausibly possible candidate causal combinations per variable, namely lH, Lh and lh; hence, the total number of plausibly possible candidate causal combinations for V variables is 3^V (and not 4^V).

If a variable is described by three terms, High (H), Moderate (M) and Low (L), then of the eight possible combinations LMH, LMh, LmH, Lmh, lMH, lMh, lmH and lmh, when the MF of L is sufficiently to the left of the MF of M, the MF of M is reasonably situated between the MFs of L and H (as usually occurs³), and the MF of H is sufficiently to the right of the MF of M, it is theoretically impossible for LMH, LMh, LmH and lMH to occur, because the MFs of these combinations will always be < 0.5 regardless of the data (cases). We proved in Chapter 3 that, instead of 2^3 = 8 candidate causal combinations per variable, there are only 4 plausibly possible candidate causal combinations per variable,⁴ namely Lmh, lMh, lmH and lmh; hence, the total number of plausibly possible candidate causal combinations for V variables is 4^V (and not 8^V).

¹ Sketch the MFs for, e.g., Low (L) and Not Low (l), and let x_1 denote the point at which μ_L(x) = μ_l(x). It will be obvious that, in the Min-Max Theorem, argmax(μ_L^D(x), μ_l^D(x)) = L if x ≤ x_1 and argmax(μ_L^D(x), μ_l^D(x)) = l if x > x_1. Since both L and l may occur (regardless of actual case data), all postulated causal combinations F_j (j = 1,...,2^k) are possible when only one term is assigned to each variable.
² This is called (Chapter 3) the plausibly possible situation for two terms per variable.
³ This is called (Chapter 3) a plausibly possible situation for three terms per variable.
⁴ If, perchance, the MF of Moderate intersects the MFs of Low and High at exactly the points at which their MFs equal 1/2, then there are only 3 plausibly possible candidate causal combinations per variable, namely Lmh, lMh and lmH, and the number of plausibly possible candidate causal combinations becomes the same as in the case of two terms per variable, namely 3^V.
Similar results for more than three terms per variable have not been worked out.
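As a quick numerical illustration of the two-term case, the following minimal Python sketch (with assumed shoulder MFs, not the LM-FCM MFs used elsewhere in this chapter) confirms that min(μ_L(x), μ_H(x)) stays below 0.5 for every x, so LH can never be a winning causal combination:

    import numpy as np

    x = np.linspace(0.0, 1.0, 1001)
    mu_L = np.clip((0.55 - x) / 0.3, 0.0, 1.0)   # Low: left-shoulder MF
    mu_H = np.clip((x - 0.45) / 0.3, 0.0, 1.0)   # High: right-shoulder MF
    # The largest value of min(mu_L, mu_H) over all x is about 0.17 < 0.5,
    # so MF(LH) < 0.5 regardless of the data (cases).
    print(np.minimum(mu_L, mu_H).max())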
While these theoretical results are very important in fsQCA, because fsQCA uses remainder causal combinations in later logical reductions (where it is imperative to use only the remainders that are plausibly possible), they are only of passing interest to us because they are not needed for FCCM (or CCM); i.e., FCCM reveals exactly which of the 2^k candidate causal combinations are possible because of their actual occurrence as a result of the preprocessing.
7.4. Computational Speedup for the Fast Causal Combination Method
In order to establish how much faster FCCM is as compared to CCM, we have tabulated the
steps of CCM in Table 7.6 and FCCM in Table 7.7. Observe, from the bottom lines of these
tables, that:
• Both CCM and FCCM depend linearly on N.
• FCCM depends linearly on n_cV, whereas CCM depends more than exponentially on n_cV (because of the n_cV factor that multiplies 2^{n_cV}).
• CCM requires O(N · 2n_cV · 2^{n_cV}) operations. (Because CCM and FCCM both require computing μ_{C_i}^D(t) and μ_{c_i}^D(t) for t = 1,...,N, we do not include these computations in the tables.)
• FCCM requires O(N(2n_cV + 2 + J)) operations.
Consequently,

Speedup = (#CCM operations) / (#FCCM operations) = O(N · 2n_cV · 2^{n_cV}) / O(N(2n_cV + 2 + J)) ≈ O(2^{n_cV})    (7.18)
• Speedup is independent of the number of cases. It depends on the number of variables and the number of terms per variable, and it can be huge; e.g., if V = 10 and n_c = 3, then 2^{n_cV} = 2^{30} ≈ 10^9.
• Speedup may be somewhat misleading. In the previous example CCM would require O(60 × 10^9 × N) operations, whereas FCCM would require O(N(60 + 2 + J)) ≈ O(60N) operations; so CCM uses vastly more operations than FCCM.
• For more variables (i.e., a larger value of V), 2^{n_cV} becomes astronomical and CCM is impossible to perform, but FCCM can still be performed.
Table 7.6
Computations for CCM (k = n_cV)

Step 1. Enumerate and store the 2^k candidate causal combinations F_j using (5). The enumeration requires creating 2^k permutations, each of length k, in which each element is either C_i or c_i, and then labeling each of the permutations 1, 2, ..., 2^k.
Step 2. Compute and store μ_{F_j}(t) using (6). For each t (t = 1,...,N) and each j (j = 1,...,2^k): retrieve the k MF values for F_j (requires k operations), and compute the minimum of the k MF values (requires k − 1 operations). Total operations: 2^k N k + 2^k N(k − 1) = 2^k N(2k − 1).
Step 3. Compute and store t_{F_j}(t) using (8). This requires carrying out a test 2^k times.
Step 4. Compute and store N_{F_j} using (9). This requires N − 1 additions of two numbers, repeated 2^k times, for a total of 2^k(N − 1) additions.
Step 5. Compute F_l^S using (10). This requires 2^k tests.

In total, CCM requires performing on the order of O(N · 2n_cV · 2^{n_cV}) operations.^a

a. Each permutation, retrieval, minimum, test and addition is counted as one operation.
Table 7.7
Computations for FCCM (k = n_cV)

Step 1. Establish and store the N winning causal combinations F_{j*(t)}(t) using (13). For each t (t = 1,...,N): compute the maximum of two MF values (requires k operations), extract the label of the winning MF value (requires k extractions), and store the extracted labels as an ordered chain of k labels (1 or 0). Total operations: 2kN.
Step 2. Establish and store the J uniquely different F_{j*(t)}(t). This requires finding the distinct ordered chains among the N ordered chains (on the order of N operations).
Step 3. Compute and store t_{F_{j′}}(t) using (14). This requires carrying out a test N times.
Step 4. Compute and store N_{F_{j′}} using (15). This requires N − 1 additions, repeated J times, for a total of J(N − 1) additions.
Step 5. Establish F_l^S using (16). This requires J tests.

In total, FCCM requires performing N(2k + 2 + J) = N(2n_cV + 2 + J) operations.^a

a. Each maximum, extraction, sort, test and addition is counted as one operation.
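To make Table 7.7 concrete, here is a minimal NumPy sketch of FCCM (our own illustrative code, with assumed array shapes; it is not the implementation used to produce the results in this chapter):

    import numpy as np

    def fccm_surviving(mu, f=1):
        # mu: (N, k) array of membership grades mu_{C_i}(t); complements are 1 - mu.
        # Step 1 (cf. (7.13)): each case's winning causal combination comes from
        # k independent two-way comparisons; the 2^k candidates are never enumerated.
        chains = (mu > 0.5).astype(int)                       # (N, k) label chains
        # Steps 2-4: find the J uniquely different chains and count supporting cases.
        chains_u, counts = np.unique(chains, axis=0, return_counts=True)
        # Step 5: keep only combinations supported by at least f cases.
        keep = counts >= f
        return chains_u[keep], counts[keep]

    rng = np.random.default_rng(0)
    combos, n_cases = fccm_surviving(rng.random((1000, 12)), f=5)
    print(len(combos))   # number of surviving causal combinations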
7.5. Additional Ways to Speed Up Computations
Sometimes one may want to perform FCCM for different combinations of causal conditions, either by including more causal conditions in the mix of the original k causal conditions or by removing some of the original k causal conditions. Presently, doing any of these things requires treating each modified set of causal conditions as a totally new FCCM problem. Next, we show that there are much easier and faster ways to perform FCCM once it has already been performed for k causal conditions.
Observe in (7.13) that, e.g., argmax(μ_{C_1}(t), μ_{c_1}(t)) is unchanged whether there are one, two, three, etc. causal conditions. This means that, for each case, the winning causal combination for k causal conditions includes the winning causal combination for k′ causal conditions, when k′ < k; so, if one knows the winning causal combination for k causal conditions and one wants to know the winning causal combination for k′ < k causal conditions, one simply deletes the undesired k − k′ causal conditions from the winning causal combination for k causal conditions.
These observations suggest that there are both a forward recursion and a backward recursion
for (7.13). In what follows, it is assumed that the smallest number of causal conditions for which
FCCM is performed is two.
Corollary 7-1-1 (Forward Recursion) (Chapter 3). For each case, it is true that (k = 3, 4, ...):

F_{j*(t)}(t | C_1, C_2, ..., C_k) = F_{j*(t)}(t | C_1, C_2, ..., C_{k−1}) ∧ argmax(μ_{C_k}^D(t), μ_{c_k}^D(t))    (7.19)
Proof: See Section A.2 in Appendix A.
Corollary 7-1-2 (Backward Recursion) (Chapter 3). Let Ĉ_i denote the suppression (removal) of causal condition C_i. Then it is true that:

F_{j*(t)}(t | C_1, C_2, ..., Ĉ_i, ..., C_k) = F_{j*(t)}(t | C_1, C_2, ..., C_{i−1}, C_{i+1}, ..., C_k)    (7.20)
Proof: Obvious from (7.13).
This backward recursion can also lead to a vast reduction in computation time. For example, if the winning causal combination F_{j*(t)}(C_1, C_2, ..., C_k) has been determined for five causal conditions (k = 5), then it can be used to establish the winning causal combination for any combination of four, three, or two of the causal conditions, by inspection! No new computations have to be performed, because the winning combinations for fewer causal conditions are always contained in the winning combinations for more causal conditions.
Example 7.5 (continuation of Example 7.1). Assume that L_{x1} m_{x1} h_{x1} l_{x2} M_{x2} h_{x2} l_{x3} m_{x3} h_{x3} b_{x4} f_{x4} G_{x4} is the winning causal combination for Case t when twelve causal conditions are used. Then, from (7.20), one can conclude:

• If one wants, e.g., to eliminate the liquidity variable from a study (i.e., causal conditions B_{x4}, F_{x4} and G_{x4}), then L_{x1} m_{x1} h_{x1} l_{x2} M_{x2} h_{x2} l_{x3} m_{x3} h_{x3} is the winning causal combination for Case t when only the risk, inflation and profit variables are used.
• If one wants, e.g., to eliminate all of the Moderate and Fair causal conditions from a study (i.e., M_{x1}, M_{x2}, M_{x3} and F_{x4}), then L_{x1} h_{x1} l_{x2} h_{x2} l_{x3} h_{x3} b_{x4} G_{x4} is the winning causal combination for Case t.

Of course, many other situations for L_{x1} m_{x1} h_{x1} l_{x2} M_{x2} h_{x2} l_{x3} m_{x3} h_{x3} b_{x4} f_{x4} G_{x4} can be considered.
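In code, both recursions amount to trivial list operations on a case's winning label chain; e.g. (a toy Python sketch with hypothetical labels and grades):

    chain5 = [1, 0, 0, 1, 1]            # winning chain for one case, k = 5

    # Backward recursion (7.20): suppressing conditions 2 and 5 just deletes
    # their positions from the chain -- no recomputation is needed.
    keep = [0, 2, 3]
    chain3 = [chain5[i] for i in keep]  # -> [1, 0, 1]

    # Forward recursion (7.19): adding a condition C_6 costs one comparison.
    mu_C6 = 0.3                         # hypothetical grade of C_6 for this case
    chain6 = chain5 + [1 if mu_C6 > 0.5 else 0]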
Very importantly, FCCM lends itself to parallel and distributed computation, and there are many different ways to do this because (7.13) can be decomposed in many different ways, e.g.,

F_{j*(t)}(t) = argmax(μ_{C_1}^D(t), μ_{c_1}^D(t)) ∧ ... ∧ argmax(μ_{C_k}^D(t), μ_{c_k}^D(t)),  t = 1,...,N    (7.21)

If variables (or groups of variables) are measured by smart sensors in different locations, then the winning causal condition for each of the k terms can be computed at those locations. These k winners can then be transmitted to a central processor that computes the surviving causal combinations. This could be done for all N cases at once, for one case at a time, or in parallel for batches of cases, where the batches are distributed among as many processors as is practical, e.g.,

F_{j*(t)}(t) = argmax(μ_{C_1}^D(t), μ_{c_1}^D(t)) ∧ ... ∧ argmax(μ_{C_k}^D(t), μ_{c_k}^D(t)),  t = 1,...,N_1
F_{j*(t)}(t) = argmax(μ_{C_1}^D(t), μ_{c_1}^D(t)) ∧ ... ∧ argmax(μ_{C_k}^D(t), μ_{c_k}^D(t)),  t = N_1 + 1,...,N_2
  ⋮
F_{j*(t)}(t) = argmax(μ_{C_1}^D(t), μ_{c_1}^D(t)) ∧ ... ∧ argmax(μ_{C_k}^D(t), μ_{c_k}^D(t)),  t = N_{D−1} + 1,...,N_D    (7.22)

where the set of cases is distributed into D subsets. The final set of surviving causal combinations is obtained by combining the results of the distributed computations, i.e.,

{F_{j*(t)}(t)}_{t=1}^{N} = {F_{j*(t)}(t)}_{t=1}^{N_1} ∪ {F_{j*(t)}(t)}_{t=N_1+1}^{N_2} ∪ ... ∪ {F_{j*(t)}(t)}_{t=N_{D−1}+1}^{N_D}    (7.23)
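A minimal Python sketch of this decomposition (assumed shapes; the batching scheme shown is just one of many possibilities):

    import numpy as np

    def batch_chains(mu_batch):
        return (mu_batch > 0.5).astype(int)   # per-case winners, as in (7.21)

    mu = np.random.rand(1000, 12)             # N = 1000 cases, k = 12 conditions
    partial = [batch_chains(b) for b in np.array_split(mu, 4)]   # D = 4 batches,
                                              # computable on separate processors
    all_chains = np.vstack(partial)           # the union in (7.23)
    survivors, counts = np.unique(all_chains, axis=0, return_counts=True)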
7.6. Conclusions
This chapter has presented a very efficient method for establishing the initial (nonlinear) combinations of variables that can then be used in later modeling and processing (e.g., regression, classification, neural networks, etc.) by using a novel form of preprocessing that transforms raw data into patterns through the use of fuzzy sets. Our method lends itself to massive distributed and parallel processing, which makes it suitable for data of all sizes, from small to big.
Variables are first partitioned into subsets each of which has a linguistic term (called a causal
condition) associated with it. Our Causal Combination Method uses fuzzy sets to model the
terms and focuses on interconnections (causal combinations) of either a causal condition or its
complement, where the connecting word is AND which is modeled using the minimum
operation. Our Fast Causal Combination Method is based on a novel theoretical result, leads to
an exponential speedup in computation and lends itself to parallel and distributed processing;
hence, it can be used on data from small to big.
Although this chapter has focused only on establishing nonlinear combinations of variables
from data, a reader may be wondering what one can do with the final FCCM surviving causal
combinations. It is important to realize that FCCM is not limited to Big Data. It can also be
applied to moderate and small data sets. For small to moderate data sets, one can use the
surviving causal combinations to perform regression, classification and fsQCA. There is a lot of
research going on worldwide to figure out how to do regression and classification for Big Data.
In the next chapter we will show how the surviving causal combinations can be used in a new
regression model, called Variable Structure Regression (VSR). Using the FCCM surviving
causal combinations one can simultaneously determine the number of terms in the (nonlinear)
regression model as well as the exact mathematical structure for each of the terms (basis
functions). VSR has been tested on the eight classical (and readily available) small to moderate
size data sets that are stated in Table 7.4 (four are for multi-variable function approximation and
four are for forecasting), has been compared against five other methods, and has ranked #1
against all of them for all of the eight data sets.
Chapter 8
Variable Structure Regression (VSR)
8.1 Introduction
Linear regression models relate a measured output (y) to a collection of p measured variables (x_1, x_2, ..., x_p), each of which is believed to contribute to the output. These models are very widely used in just about all science, engineering and non-engineering real-world applications (e.g., behavioral science [84], biostatistics [85], business [86], econometrics [87], financial engineering [88], insurance [89], medicine [90], petroleum engineering [91], etc. [92], [93]). A typical linear regression model has the following structure [94]:

y(x_1, x_2, ..., x_p) = b_0 + Σ_{v=1}^{p} b_v x_v    (8.1)

where b_0 and b_v are the regression coefficients, and the bias b_0 is a constant that does not depend on any of the variables (including such a term is standard practice in regression models).

Assuming that an output has a linear dependence on its variables is often too simplistic for many real-world applications; hence, the following nonlinear regression model is often used:
y(x_1, x_2, ..., x_p) = b_0 + Σ_{v=1}^{R_S} b_v φ_v(x_1, x_2, ..., x_p)    (8.2)

in which the regressors φ_v(x_1, x_2, ..., x_p) are nonlinear functions of x_1, x_2, ..., x_p. These nonlinear functions are often also called basis functions, so in this paper we use the terms "regressors" and "basis functions" interchangeably. Many choices have been made in the past for the basis functions, e.g., polynomials (orthogonal and non-orthogonal), trigonometric, Gaussian, radial, fuzzy, etc.

A search in Google under "nonlinear regression" (on October 13, 2013) listed about 4,510,000 results; so, it is beyond the realm of this paper to provide a complete list of articles that have been written about nonlinear regression. Instead, we refer the reader to, e.g., [94], [95].
There are four major challenges to implementing (8.2): 1) choosing the variables; 2) choosing the nonlinear structures of the regressors; 3) choosing how many terms, R_S, to include in (8.2); and 4) optimizing the parameters that complete the description of the model.
For Challenge 1, how to choose the variables is crucial to the success of any regression model. In this paper we assume that the user has established the variables that affect the outcome, using methods already available for doing this (e.g., [96]). Our focus in this paper is on Challenges 2-4.
For Challenge 2, in real-world applications the nonlinear structures of the regressors are usually not known ahead of time, and are therefore chosen either as products of the variables (e.g., two at a time, three at a time, etc.) or in other more complicated ways (e.g., trigonometric, exponential, logarithmic functions, etc.); sometimes linear terms are also included in (8.2). Sometimes deep knowledge about the application provides justification for the choices made for the nonlinear terms; often, however, one does not have such deep knowledge, and a lot of time is spent, using trial and error, trying to establish the nonlinear dependencies. We shall demonstrate below that Variable Structure Regression (VSR) establishes the exact nonlinear structure of each of the R_S regressors in (8.2) automatically.
For Challenge 3, how to choose R_S is also usually done by trial and error, and this can be very tedious. We shall also demonstrate below that VSR establishes R_S automatically.
For Challenge 4, in addition to the regression coefficients that appear in (8.2), each regressor in
VSR will be a parametric function of variables, and numerical values must be specified for all
such parameters. Usually, one does not know how to specify such numerical values ahead of
time. Instead, as we will explain below, VSR follows the now common practice of determining
numerical values for all such parameters, as well as for the regression coefficients in (8.2) by
using some given data and one or more optimization methods that make the regression model
optimally fit that data.
As a result of our solutions to Challenges 2-4, the nonlinear regression model in (8.2) will have a variable structure, which is why we have called this kind of regression VSR. Exactly what we mean by a "variable structure" is deferred until Section 8.6, because the variability of the structure in (8.2) occurs in two different ways in VSR.
The rest of this paper is organized as follows: Section 8.2 deals with measured data; Section
8.3 explains how the data are preprocessed; Section 8.4 explains how the antecedents of rules as
well as the number of rules are determined simultaneously from the data; Section 8.5 explains
how rules are established by using the results from Section 8.4, and how the formula for the VSR
model in (8.2) derives from them; Section 8.6 explains how all of the parameters in the VSR
model as well as its structure are optimized; Section 8.7 presents experimental results; Section
8.8 provides some discussions; and, Section 8.9 provides conclusions, the strengths of the VSR
method and future works.
8.2 Measured Data
A data pair is (x(t), y(t)), where x = col(x_1, x_2, ..., x_p) and y(t) is the output for that x(t). Each data pair is treated as a "case," and the index t denotes a data case. In our studies there may or may not be a natural ordering of the cases over t: in a multi-variable function approximation application the data have no natural ordering, but in a time-series forecasting application the data cases have a natural temporal ordering.

Assume that we are given a data set of N data pairs, and refer to the collection of these data pairs as S_Cases, where:

S_Cases = {(x(t), y(t))}_{t=1}^{N}    (8.3)
Later in this paper the N learning data pairs are divided into three data sets, one of which is used for training (adjusting model parameters), one for validation (estimating the generalization error), and one for testing (evaluating the performance of the model). More specifically: 1) N_trn of the N data cases form the training set S_Cases^trn, where

S_Cases^trn = {x_trn(t) : y_trn(t)}_{t=1}^{N_trn}    (8.4)

2) N_val of the N data cases form the validation set S_Cases^val, where

S_Cases^val = {x_val(t) : y_val(t)}_{t=1}^{N_val}    (8.5)

and 3) N_test of the N data cases form the testing set S_Cases^test, where

S_Cases^test = {x_test(t) : y_test(t)}_{t=1}^{N_test}    (8.6)
The training set is used to optimize the parameters of (8.2), the validation set is used to stop the
training, and the testing set is used to evaluate the overall performance of the optimized model;
this is fully explained in Section 8.6.
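A minimal Python sketch of this three-way split (the 20% fractions below are assumptions for illustration only; Section 8.7 describes the splits actually used):

    import numpy as np

    def split_cases(X, y, frac_test=0.2, frac_val=0.2, seed=0):
        # Randomly partition N cases into training, validation and testing sets.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_test = int(frac_test * len(y))
        n_val = int(frac_val * (len(y) - n_test))
        test, val, trn = (idx[:n_test], idx[n_test:n_test + n_val],
                          idx[n_test + n_val:])
        return (X[trn], y[trn]), (X[val], y[val]), (X[test], y[test])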
8.3 Preprocessing
Each variable x_q (q = 1,...,p) is mapped into subsets, each of which may be thought of as having a linguistic term¹ associated with it; e.g., the variable Pressure can be partitioned into Low Pressure, Moderate Pressure and High Pressure. Because it is very difficult to know where to draw a crisp line between the subsets, so as to separate one from the other, they are modeled herein as fuzzy sets, and there can be from 1 to n_v subsets (terms) for each input variable. The terms for each input variable that are actually used in CCM are called causal conditions.
If one chooses to use only one term for each variable (e.g., Profitable Company, Educated
Country, Permeable Oil Field, etc.), then the words “variable,” “term” and “causal condition”
can be interchanged, i.e., they are synonymous. If, on the other hand, one chooses to use more
than one term for each variable (e.g., Low Pressure, Moderate Pressure and High Pressure), i.e.,
to granulate each variable, as is very commonly done in engineering and computer science
applications, then one must distinguish between the words “variable,” “term” and “causal
condition.” We elaborate further on this next.
If, e.g., there are p variables, each described by n_i terms (i = 1,...,p), then each of the terms is treated as a separate causal condition² (this is illustrated below in Example 8.1); as a result, there will be k = n_1 + n_2 + ... + n_p causal conditions.

We let T_l(x_i) (l = 1,...,n_i; i = 1,...,p) denote the terms for each variable. For simplicity, in this paper the same number of terms is used for each variable, i.e., n_i = n_c for all i, so that the total number of causal conditions is k = n_c p.
The terms are organized according to the (non-unique) ordering of the p input variables, as {T_1(x_1),...,T_{n_c}(x_1),...,T_1(x_p),...,T_{n_c}(x_p)}. This set of n_c p terms is then mapped into an ordered set of causal conditions, S_C, as follows:
{T_1(x_1),...,T_{n_c}(x_1),...,T_1(x_p),...,T_{n_c}(x_p)}
→ {C_1(x_1),...,C_{n_c}(x_1),...,C_{(p−1)n_c+1}(x_p),...,C_{n_c p}(x_p)}
→ {C_1,...,C_{n_c},...,C_{(p−1)n_c+1},...,C_{n_c p}}
A causal combination is a conjunction of k = n_c p conditions, each of which is either a causal condition or its complement, for each variable.

There are many different ways to establish the MFs. One way is to choose a prescribed function, e.g., a two-parameter sigmoidal function or a two-parameter piecewise-linear function. Another way is to use a modified version of Fuzzy c-Means (FCM) [83] called Linguistically Modified Fuzzy c-Means (LM-FCM) [29]. Regardless of how one does this, in a later step of VSR these MFs will be optimized (changed).

¹ The actual names that are given to the subsets are not important for this paper; e.g., they may be given linguistically meaningful names (as in our example of Pressure) or symbolic names (e.g., A, B, C; T1, T2, T3; etc.).
² One may raise an objection to doing this because of perceived correlations between terms for the same variable (e.g., perhaps Low Pressure and Moderate Pressure are highly correlated, or Moderate Pressure and High Pressure are highly correlated). Such perceptions depend on how overlapped the fuzzy sets for the terms are, and they do not have to be accounted for during CCM because the mathematics of CCM handles the overlap automatically.
8.4 Establish Antecedents of Rules and the Number of Rules
VSR simultaneously establishes the if-part (the antecedent) of a rule, as well as the number of rules, R_S. Each rule will generate one basis function in (8.2), as is explained in Section 8.5. The (compound) antecedent of each rule contains one linguistic term or its complement for each of the k causal conditions, and each of these linguistic terms is combined with the others by using the word "and" (e.g., A_1 and A_2 and ... and A_k; Pressure is Not Low and Pressure is Moderate and Pressure is Not High and Temperature is Low and Temperature is Not Moderate and Temperature is Not High). This interconnection is called a causal combination.

Observe that different terms related to a variable are treated as independent causal conditions in the antecedent. Note that in a traditional if-then rule the antecedent uses only the terms and not their complements. In VSR, protection against being wrong when postulating a term is achieved by considering both the term and its complement.
To begin, 2^k candidate causal combinations (the 2 is due to using both the term and its complement) are conceptually postulated (we will show below that the causal combinations do not actually have to be enumerated). If, e.g., k = 6, there would be 64 candidate causal combinations, or, if k = 10, there would be 1024 candidate causal combinations.

One does not know ahead of time which of the 2^k candidate causal combinations should actually be used as a compound antecedent in a rule. VSR prunes this large collection by using the MFs that were determined in Section 8.3, as well as the MF for "A_1 and A_2 and ... and A_k" (obtained using fuzzy set mathematics) and a simple test. The results of doing this are the R_S surviving causal combinations.
Let S_F be the collection of the following 2^k candidate causal combinations F_j (j = 1,...,2^k and i = 1,...,k):

S_F = {F_1,...,F_{2^k}},  F_j = A_1^j ∧ A_2^j ∧ ... ∧ A_k^j,  A_i^j = C_i or c_i    (8.7)

where ∧ denotes conjunction (the "and" operator), is modeled using minimum, and c_i denotes the complement of C_i. The R_S surviving causal combinations are found from all of the 2^k candidate causal combinations by means of the following three-step procedure (we call this procedure "conceptual" because a computationally efficient alternative is described below):
1. Imagine that the 2^k candidate causal combinations are enumerated (i.e., imagine a table with 2^k columns, each of which is one of the candidate causal combinations);
2. Imagine that the MF of each of the 2^k candidate causal combinations is computed for each of the N_trn cases that are in the training set S_Cases^trn (the table has N_trn rows; each case corresponds to one row, and each entry is the MF value of a candidate causal combination evaluated for that case); and
3. Keep only those candidate causal combinations whose MF > 0.5 for at least f cases, where f is a threshold that has to be specified ahead of time.
Comment 8.1 (Vector Space Interpretation): This conceptual procedure can be interpreted as keeping the adequately represented causal combinations that are close to corners of a k-dimensional vector space (there are 2^k such corners) and discarding the ones that are farther away from such corners (Chapter 2). If crisp sets were used, then a candidate causal combination would either be fully supported (i.e., its MF value equals 1) or not supported at all (i.e., its MF value equals 0), and only the fully supported candidate causal combinations would survive. Using fuzzy sets lets us back off from the stringent requirement of crisp sets by replacing the vertex membership value of 1 with a vertex membership value of > 0.5, meaning that if the MF value is greater than 0.5 then the causal combination is closer to its vertex than away from it. Only those cases whose MF values are greater than 0.5 are said to support the existence of a candidate causal combination. These ideas originated in fsQCA (Chapters 2 and 3).
Carrying out this conceptual procedure would require computing 2^k N_trn compound MFs, a number that can easily become enormous due to possibly large values of N_trn and/or k. Importantly, it is not actually necessary to carry out the conceptual approach just stated, as we explain next.
We have proven in Chapter 3 that for each case only one of the 2^k candidate causal combinations has a MF value that is > 0.5 (this means that, in the above table, each row will contain only one element that is > 0.5). More importantly, we have provided a simple formula for establishing exactly which candidate causal combination that is. The result is provided in the following:
Min-Max Theorem: Given k terms C_1, C_2, ..., C_k and their respective complements c_1, c_2, ..., c_k, consider the 2^k candidate causal combinations (j = 1,...,2^k)

F_j = A_1^j ∧ ... ∧ A_i^j ∧ ... ∧ A_k^j,  where A_i^j = C_i or c_i and i = 1,...,k

and let

μ_{F_j}(t) = min{ μ_{A_1^j}(t),...,μ_{A_i^j}(t),...,μ_{A_k^j}(t) },  t = 1, 2, ..., N_trn    (8.8)

where

μ_{A_i^j}(t) = μ_{C_i}(t)  or  μ_{c_i}(t) = 1 − μ_{C_i}(t),  i = 1,...,k    (8.9)

Then for each t (case) there is only one j, j*(t), for which μ_{F_{j*(t)}}(t) > 0.5, and μ_{F_{j*(t)}}(t) can be computed as:

μ_{F_{j*(t)}}(t) = min{ max(μ_{C_1}(t), μ_{c_1}(t)),...,max(μ_{C_k}(t), μ_{c_k}(t)) }    (8.10)

F_{j*(t)}(t) is determined from the right-hand side of (8.10), as:

F_{j*(t)}(t) = A_1^{j*} ∧ ... ∧ A_k^{j*} = argmax(μ_{C_1}(t), μ_{c_1}(t)) ∧ ... ∧ argmax(μ_{C_k}(t), μ_{c_k}(t))    (8.11)

(Whether or not this kind of result is true for other t-norms and complements is an open research problem.)
In (8.11), argmax(μ_{C_i}(t), μ_{c_i}(t)) denotes the winner of max(μ_{C_i}(t), μ_{c_i}(t)), namely C_i or c_i.

Not all of the N_trn winning causal combinations will be different, i.e., the same winner frequently occurs for more than one case. Consequently, after the winning causal combination is found for each of the N_trn cases, the J uniquely different F_{j*}(t) are found, and they are relabeled F_{j′} (j′ = 1,...,J). Details of an application of this Min-Max Theorem are shown in Section 8.7.
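For a small worked example (the numbers here are hypothetical, chosen only to illustrate the theorem), suppose k = 3 and, for some case t, μ_{C_1}(t) = 0.7, μ_{C_2}(t) = 0.2 and μ_{C_3}(t) = 0.6. Then max(0.7, 0.3) = 0.7 (winner C_1), max(0.2, 0.8) = 0.8 (winner c_2) and max(0.6, 0.4) = 0.6 (winner C_3); hence, from (8.10) and (8.11), μ_{F_{j*(t)}}(t) = min(0.7, 0.8, 0.6) = 0.6 > 0.5 and F_{j*(t)}(t) = C_1 ∧ c_2 ∧ C_3. Every other candidate causal combination contains at least one losing term, whose MF is < 0.5, so its minimum cannot exceed 0.5.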
We are now ready to state how the R_S surviving causal combinations are actually computed:

1. Compute F_{j*(t)}(t) using (8.11).
2. Find the J uniquely different F_{j*(t)}(t) and re-label them F_{j′} (j′ = 1,...,J).
3. Compute t_{F_{j′}}(t), where (t = 1,...,N_trn):

   t_{F_{j′}}(t) = 1 if F_{j′} = F_{j*(t)}(t), and 0 otherwise    (8.12)

4. Compute N_{F_{j′}}, where:

   N_{F_{j′}} = Σ_{t=1}^{N_trn} t_{F_{j′}}(t)    (8.13)

5. Establish the R_S surviving causal combinations F_v^S (v = 1,...,R_S), as:

   F_v^S = F_{j′}(j′ → v) if N_{F_{j′}} ≥ f, and F_{j′} is discarded if N_{F_{j′}} < f    (8.14)

where F_{j′}(j′ → v) means that F_{j′} is added to the set of surviving causal combinations as F_v^S, and v is the index of the surviving set.
In order to implement (8.14) the threshold f has to be chosen. In our work we chose f = 1. This choice is arbitrary and depends on the application and on how many cases are available. Discussions on how to choose f are given in Chapters 2 and 3. One popular way to choose f is as the smallest integer such that at least 80% of all cases are covered by the set of surviving causal combinations.
From F_j in (8.7) and F_v^S in (8.14), it follows that (v = 1,...,R_S):

F_v^S(x_1,...,x_p) = A_1^v(x_1) ∧ A_2^v(x_2) ∧ ... ∧ A_k^v(x_p)    (8.15)
8.5 Establish Rules and VSR Equations
The R_S surviving causal combinations lead to the following TSK rules [97], [98], [99, Ch. 13]:

S_v: IF x_1 is A_1^v and ... and x_p is A_k^v, THEN y_v(x) = b_v,  v = 1,...,R_S    (8.16)
where the constants b_v have yet to be determined (they will be the regression coefficients that appear in (8.2)).

Note, again, that these rules are different from the usual kinds of rules that appear in a fuzzy logic rule-based system, because in (8.16) a term may be the complement of a fuzzy set rather than just the fuzzy set.
The MF of the antecedent of each rule in (8.16) is μ_{F_v^S}(x), where:

μ_{F_v^S}(x) = μ_{A_1^v}(x_1) ⋆ μ_{A_2^v}(x_2) ⋆ ... ⋆ μ_{A_k^v}(x_p)    (8.17)

in which ⋆ denotes a t-norm.
Note that μ_{F_v^S}(x) is a highly nonlinear function of the input variables because of the nonlinear dependence of each MF on its input variable [e.g., μ_{A_1^v}(x_1)]. The other source of nonlinearity comes from (8.15), where the MFs are connected by t-norms. Usually the t-norm in (8.17) is chosen as the product or the minimum. The product is a favorite choice (and is ours) because it uses all of the k MF values, whereas the minimum uses only one of them, the smallest one (it is unforgiving).
Comment 8.2 (Modeling the Conjunction): VSR uses two models for AND: the minimum and the product. The minimum is used first, to establish the subset of the 2^k candidate causal combinations that should survive, because the minimum is very selective, and selectivity is exactly what is needed to do this. The product is used to obtain the formula for the VSR model, because we want that formula to be a continuous function of its variables; this always occurs for the product t-norm but does not necessarily occur for the minimum t-norm.
The formula for the VSR model begins with (8.16) and (8.17), assumes that fired rules are aggregated using Center of Sets (COS) defuzzification, and is (see [46, Ch. 13])¹:

g(x) = [ Σ_{v=1}^{R_S} b_v μ_{F_v^S}(x) ] / [ Σ_{v=1}^{R_S} μ_{F_v^S}(x) ]    (8.18)
which can also be written as:

g(x) = Σ_{v=1}^{R_S} b_v [ μ_{F_v^S}(x) / Σ_{v=1}^{R_S} μ_{F_v^S}(x) ]    (8.19)
When a bias is added to (8.18), as is done in a regression model, then:

y(x) = b_0 + g(x) = b_0 + Σ_{v=1}^{R_S} b_v [ μ_{F_v^S}(x) / Σ_{v=1}^{R_S} μ_{F_v^S}(x) ]    (8.20)
(8.20) is now in the form of a basis function expansion [100], [99, Ch. 13] in which the basis functions², denoted φ_v(x), are:

φ_v(x) ≡ μ_{F_v^S}(x) / Σ_{v=1}^{R_S} μ_{F_v^S}(x)    (8.21)

Consequently, (8.20) can also be expressed as:

y(x) = b_0 + Σ_{v=1}^{R_S} b_v φ_v(x)    (8.22)

(8.22) is our VSR model, as stated in (8.2), except that now we have an explicit formula for the basis functions, given by (8.21) along with (8.17), as well as a way to specify R_S.
¹ In a usual fuzzy logic system [99], when (8.18) is obtained by using a COS defuzzifier, b_v is the Center-of-Gravity (COG) of the consequent set for rule v. In VSR, b_v is treated as a regression coefficient that will later be optimized using the N_trn data pairs.
² These basis functions have also been called fuzzy basis functions (FBFs), and (8.19) has also been called a FBF expansion.
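To make (8.17)-(8.22) concrete, here is a minimal NumPy sketch of the VSR forward computation (our own illustrative code; for simplicity it assumes one term per variable, so k = p and each causal condition acts on one input):

    import numpy as np

    def vsr_predict(X, mfs, chains, b0, b):
        # X: (N, p) inputs; mfs: list of p callables returning mu_{C_i}(x);
        # chains: (R_S, p) 0/1 array of surviving causal combinations
        # (1 -> C_i, 0 -> its complement c_i); b0, b: regression coefficients.
        mu = np.column_stack([mfs[i](X[:, i]) for i in range(X.shape[1])])
        # (8.17) with the product t-norm: antecedent MF of each rule
        mu_rules = np.stack([np.prod(np.where(c == 1, mu, 1.0 - mu), axis=1)
                             for c in chains], axis=1)           # (N, R_S)
        phi = mu_rules / mu_rules.sum(axis=1, keepdims=True)     # FBFs, (8.21)
        return b0 + phi @ b                                      # (8.22)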
Comment 8.3 (Novelty of the VSR Equation): Fuzzy basis function expansions originated with Wang and Mendel [44]; so, by itself there is nothing novel in the year 2013 about (8.22). Choosing the antecedents by using the procedure described in Section 8.4 is, however, very novel. Not only does it provide precise structural information about the dependence of each antecedent on either the term or its complement, but it also provides the number of terms in the basis function expansion, R_S.
Comment 8.4 (On Providing a Linguistic Interpretation to the Basis Function Terms in (8.22)): Once optimal values have been found for b_1,...,b_{R_S}, it is possible to map each of these regression coefficients into a fuzzy set that is associated with an output word, because they are also the consequents of the rules in (8.16). In order to do this, one first decides on the number of words to be used for the output (e.g., Low, Moderate and High) and uses that number as the number of clusters when LM-FCM is applied to {y_trn(t)}_{t=1}^{N_trn}. Next, each b_v is located on the y axis and is projected vertically so that it intersects one or more of the just-determined MFs for y_trn. Finally, the word associated with each b_v is chosen to be the one with the largest of these MF values. By using this method, each rule in (8.16), or basis function in (8.22), can be provided with a linguistic interpretation, something that may be of great value to an end-user in understanding the VSR model.
8.6 Optimizing Parameters and Structure of VSR Model
In order to completely specify the VSR model in (8.22), one needs to specify its structure as well as numerical values for all of its parameters. The structure of (8.22) is established once one knows R_S and the surviving causal combinations. The parameters in (8.22) are of two kinds, the MF parameters that appear in the basis functions and the regression coefficients; both kinds of parameters need to be determined before (8.22) is completely specified. Recall, however, that both R_S and the surviving causal combinations depend on the MFs for each of the p variables; so, when the MF parameters change, the structure of (8.22) may also change. Consequently, we optimize the MF parameters of the VSR model, and the optimized MFs may change the structure of the model. This is summarized in the high-level flow chart of Fig. 8.1: the outer loop is devoted to changing the structure using the Min-Max Theorem, and the inner loop is devoted to parameter optimization.
Fig. 8.1. High-level flow-chart for the VSR model.
A. Parameter Optimization
There are different approaches to optimizing the parameters in the VSR model (e.g., see [99, Ch. 13]), including determining both the MF parameters and the regression coefficients simultaneously by means of one nonlinear optimization, or determining them separately but iteratively, alternating between a linear optimization for the regression coefficients and a nonlinear optimization for the MF parameters. VSR, as explained in this paper, uses the latter approach because each of the optimization problems is of lower dimension than the combined optimization problem would be; however, there is nothing sacrosanct about doing it this way.
A.1 Optimizing the Regression Coefficients: The least-squares (LS) method (e.g., [36]) is used to find the regression coefficients b_0 and b_v (v = 1,...,R_S) by using the training data. The training data are also used to compute the training error. In addition, the validation data are used to compute a validation error that is needed later to help find the overall optimized VSR model, as is explained below in Section C.
Using the notation in (8.4) and (8.5) for the elements of the training and validation sets, (8.22) can be expressed for each of those data sets as:

y_trn(t) = b_0 + Σ_{v=1}^{R_S} b_v φ_v(x_trn(t)),  t = 1,...,N_trn    (8.23)

y_val(t) = b_0 + Σ_{v=1}^{R_S} b_v φ_v(x_val(t)),  t = 1,...,N_val    (8.24)

Collecting the N_trn and N_val equations in (8.23) and (8.24), they can be expressed more compactly in vector-matrix format as:

y_trn = Φ_trn b    (8.25)

y_val = Φ_val b    (8.26)

where

y_trn = [y_trn(1),...,y_trn(N_trn)]^T    (8.27)

y_val = [y_val(1),...,y_val(N_val)]^T    (8.28)

b = [b_0, b_1,...,b_{R_S}]^T    (8.29)
Φ_trn =
[ 1   φ_1(x_trn(1))       ...   φ_{R_S}(x_trn(1))
  ⋮
  1   φ_1(x_trn(N_trn))   ...   φ_{R_S}(x_trn(N_trn)) ]    (8.30)

Φ_val =
[ 1   φ_1(x_val(1))       ...   φ_{R_S}(x_val(1))
  ⋮
  1   φ_1(x_val(N_val))   ...   φ_{R_S}(x_val(N_val)) ]    (8.31)
The least-squares optimized regression coefficients, b_LS, obtained by minimizing ||y_trn − Φ_trn b||², can be expressed as [36]:

b_LS = (Φ_trn^T Φ_trn)^{−1} Φ_trn^T y_trn    (8.32)

We do not actually compute b_LS using (8.32), because to do so is well known to be fraught with numerical difficulties; instead, the Singular Value Decomposition (SVD) method is used (e.g., [36]) because of its excellent numerical properties.

After b_LS is computed, the training and validation RMSEs are computed as:

J_trn = [ N_trn^{−1} ||y_trn − Φ_trn b_LS||² ]^{1/2}    (8.33)

J_val = [ N_val^{−1} ||y_val − Φ_val b_LS||² ]^{1/2}    (8.34)
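In code, the SVD-based LS step and the RMSEs of (8.32)-(8.34) are one-liners (a sketch; numpy.linalg.lstsq uses an SVD-based LAPACK solver, which is why we use it here rather than forming the normal equations):

    import numpy as np

    def ls_coefficients(Phi, y):
        b_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # SVD-based solve of (8.32)
        return b_ls

    def rmse(Phi, y, b_ls):
        return np.sqrt(np.mean((y - Phi @ b_ls) ** 2))   # (8.33)/(8.34)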
Comment 8.5 (On Computing Φ_trn and Φ_val): In order to compute Φ_trn in (8.30), the basis functions have to be evaluated at the N_trn data in the training data set. This is done (initially) using the LM-FCM MFs that were obtained as described in Section 8.3, because those MFs are available for all of the N_trn data. In order to compute Φ_val in (8.31), the basis functions have to be evaluated at the N_val data in the validation data set. This is where one needs the (piecewise-linear) interpolated LM-FCM MFs that are described at the end of Section 8.3.
A.2 Optimizing MF Parameters: In this step the MF parameters are optimized; however, as mentioned above, when the MF parameters change, the basis functions in (8.21) must also change (because the initial basis functions use the MFs obtained from LM-FCM), after which the LS-optimized regression coefficients must also change. Consequently, there is a natural iteration (G times) between optimizing the regression coefficients and optimizing the MF parameters.

In this step one uses the same R_S surviving causal combinations (i.e., the same structure of the VSR model) that were found in Section 8.4; however, because the MFs are changed in the present step from those used in Section 8.4, the procedure of Section 8.4 (i.e., structure modification) is returned to after the G iterations of the inner parameter-optimization loop have been completed.
In order to optimize the MF parameters one must first choose a parametric model for each MF. Note that the MFs found by LM-FCM are not parametric models. Although we continue to use the same number of MFs for each variable, we now choose simple parametric models for them, models that have as few parameters as possible (parsimonious models), in order to keep the total number of MF parameters optimized in this step as small as possible. We use piecewise-linear MF models and Quantum Particle Swarm Optimization (QPSO) (e.g., [101]-[103]) as our MF parameter optimization method. Any other swarm optimization procedure could be used instead of QPSO.
For a right-shoulder MF (Fig. 8.2), the MF model is (i = 1,...,p):

μ_H(x_i) = 0 if x_i ≤ a_i^r;  (x_i − a_i^r)/(b_i^r − a_i^r) if a_i^r ≤ x_i ≤ b_i^r;  1 if x_i ≥ b_i^r    (8.35)

For a left-shoulder MF (Fig. 8.2), the MF model is (i = 1,...,p):

μ_L(x_i) = 1 if x_i ≤ a_i^l;  (b_i^l − x_i)/(b_i^l − a_i^l) if a_i^l ≤ x_i ≤ b_i^l;  0 if x_i ≥ b_i^l    (8.36)

For a middle MF (Fig. 8.2), the MF model is (i = 1,...,p):

μ_M(x_i) = 0 if x_i ≤ a_i^m;  1 − (2/(b_i^m − a_i^m)) |x_i − (a_i^m + b_i^m)/2| if a_i^m ≤ x_i ≤ b_i^m;  0 if x_i ≥ b_i^m    (8.37)
Fig. 8.2. Piecewise-linear MFs.
QPSO is a globally convergent iterative search algorithm that does not use derivatives, generally outperforms the original PSO [104], and has fewer parameters to control. It is a population-based optimization technique, in which a population, called a swarm, contains a set of M different particles. Each particle represents a possible solution to an optimization problem (a minimization problem in the present case). The position of each particle is updated in each QPSO iteration by using that particle's most recent own best solution, the mean of the personal best positions of all particles, and the global best solution found by all particles so far.
QPSO is used to optimize the MF parameters by minimizing the training objective function (the training RMSE of (8.33)). The MF parameters for each particle are collected into a parameter vector [see (G-1)] and are initialized randomly. A description of our QPSO algorithm is given in Appendix G.

QPSO is run a fixed, pre-specified number (G) of iterations (we chose G = 200); however, if the objective function values for two consecutive iterations are very close (≤ ε_0; we chose ε_0 = 10^{−5} in all of our Section 8.7 applications), then the iterations are stopped and G is set equal to that value.
In each of the G iterations QPSO generates new MFs for each of the M particles, so new basis functions and regression coefficients are needed for each particle; these are re-computed by using (8.21) and (8.32) for each of the M particles. In each of the G iterations the validation error of each particle is also computed. After G iterations, the one model that has the smallest validation error¹ is found and saved, i.e.:
m* = arg min_{g=1,...,G; m=1,...,M} J_val^m(g) = arg min_{g=1,...,G; m=1,...,M} || y_val − Φ_val^m(g) b_LS^m(g) ||    (8.38)

J_val^{m*} = min_{g=1,...,G; m=1,...,M} J_val^m(g)    (8.39)
The value m* establishes φ_v(x)|_{m*} and b_LS(m*) for the winning model, and that model is expressed as:

y(x | m*) = b_{LS,0}|_{m*} + Σ_{v=1}^{R_S} b_{LS,v}|_{m*} φ_v(x | m*)    (8.40)

¹ It is well known that the training error always decreases, whereas the validation error first decreases and then increases. The best model occurs where the validation error has its global minimum.
To reiterate: among the G·M QPSO particle evaluations there is one best particle, the one with the smallest validation error (m*); its parameters are used in (8.40) to describe the best model for a fixed structure of the VSR model.
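Schematically, the inner loop evaluates each particle by rebuilding the model it implies (a Python sketch only; build_mfs and design_matrix are hypothetical helpers standing in for (8.35)-(8.37) and (8.21)/(8.30), ls_coefficients and rmse are defined in the sketch after (8.34), and the QPSO position update itself follows Appendix G):

    def particle_fitness(theta, X_trn, y_trn, X_val, y_val, chains):
        mfs = build_mfs(theta)                        # hypothetical: (8.35)-(8.37)
        Phi_trn = design_matrix(X_trn, mfs, chains)   # hypothetical: (8.21), (8.30)
        b_ls = ls_coefficients(Phi_trn, y_trn)        # (8.32), via SVD
        Phi_val = design_matrix(X_val, mfs, chains)
        return rmse(Phi_val, y_val, b_ls)             # validation error for (8.38)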
B. Structure Identification
After the G iterations of parameter optimization have been completed, one passes from the inner loop to the outer-loop structure-identification stopping rule. Until that stopping rule is satisfied, the antecedents of the rules, as well as their number, are re-established by using the five-step procedure described in Section 8.4. Now, however, the MFs used in that procedure use the MF parameters of the winning particle m*.
Structure optimization is performed a pre-specified number (r_max) of times (we chose r_max = 100 iterations; see Fig. 8.1), or until the same set of rules re-appears in one of the r_max structure-optimization iterations.¹

The r_max iterations of the structure-optimization outer loop lead to r_max models {y^(r)(x)}_{r=1}^{r_max}, each obtained as explained for (8.38)-(8.40) and each described by its own surviving causal combinations, optimized parameters and validation error, i.e. (r = 1,...,r_max):

y^(r)(x | m*) = b_{LS,0}^(r)|_{m*} + Σ_{v=1}^{R_S^(r)} b_{LS,v}^(r)|_{m*} φ_v^(r)(x | m*), together with J_val^(r)(m*)    (8.41)
Comment 8.6 (A Reason for the "Variable Structure" Designation): One reason for referring to (8.22) as a "variable-structure" model is the structure optimization that occurs during the complete design of the VSR model, during whose iterations the structure of the model changes. A second reason for this designation is provided in item 6 of Section 8.9.

A pseudo-code for structure and parameter optimization is given in Table 8.1.
C. Establishing the Final Model
Our final model is the model in (8.41) that has the smallest validation error, namely:

y*(x) = y^(r*)(x | m*) = b_{LS,0}^(r*)|_{m*} + Σ_{v=1}^{R_S^(r*)} b_{LS,v}^(r*)|_{m*} φ_v^(r*)(x | m*)    (8.42)

where

r* = arg min_{r=1,...,r_max} J_val^(r)(m*)    (8.43)

¹ When a set of rules re-appears, QPSO will converge to the same set of results if the numbers of particles and iterations of QPSO are large enough. In our applications of VSR we chose 1000 particles and 200 iterations of QPSO, and we have observed the re-appearance of a set of rules many times.
D. Test on Testing Data
MF values for the testing data are computed from (8.35)-(8.37), and then the FBFs of the testing data are collected, as:

Φ_test =
[ 1   φ_1(x_test(1))        ...   φ_{R_S}(x_test(1))
  ⋮
  1   φ_1(x_test(N_test))   ...   φ_{R_S}(x_test(N_test)) ]    (8.44)

The testing error is:

J_test(m*) = [ N_test^{−1} || y_test − Φ_test b_LS(m*) ||² ]^{1/2}    (8.45)
Table 8.1
Pseudo-code for structure and parameter optimization

Initialize MFs using LM-FCM
Set ε_0 (inner-loop stopping constant)
Set r_max (maximum number of outer-loop iterations)
For r = 1 to r_max
    Compute surviving causal combinations using the training data set
    Create rules S_v^(r)
    If S_v^(r) appeared in a previous iteration
        Stop outer-loop iterations
    End If
    Set M (number of particles)
    Set G (maximum number of QPSO generations)
    Find basis functions in (8.21)
    Find LS coefficients as in (8.32)
    Find optimized MFs using QPSO
    Initialize J_trn^(r)(1)
    For g = 2 to G
        Find basis functions in (8.21)
        Find LS coefficients as in (8.32)
        Find optimized MFs using QPSO
        Compute J_trn^(r)(g)
        If J_trn^(r)(g) − J_trn^(r)(g − 1) < ε_0
            Stop inner-loop iterations
        End If
        Set g = g + 1
    End For
    Compute Φ_val^(r)(g) and J_val^(r)(g)
    Find the winning model using (8.38)
    Store b_LS^(r)(m*), the optimized MFs, and S_v^(r)
    Set r = r + 1
End For
8.7 Experimental Results
In this section we apply the VSR method to eight readily available data sets that were also used
in [105]. They are: four multivariate regression problems (Abalone [79], Concrete Compressive
Strength [79], Concrete Slump Test [79], and Wave Force [80]) and four time-series prediction
problems (Mackey-Glass Chaotic Time Series [82], Chemical Process Concentration Readings
[81], Chemical Process Temperature Readings [81], and Gas Furnace [81]). We also compare the
results from the VSR method with the results obtained by using the same five methods that were
used in [105].
To do this we performed double Monte-Carlo simulations. We divided the N data pairs into three data sets: training, validation and testing. We randomly selected about 20% of the data for testing; the remaining 80% were used for learning and were again divided into training and validation folds. For each randomly selected testing set, we performed 30 Monte-Carlo simulations of the five-fold cross-validation method on the learning data. Because we randomly select the testing data, we also performed another Monte-Carlo simulation over the testing data; e.g., the Gas Furnace time-series prediction problem has 293 cases, from which we randomly selected 265 cases for learning and 28 cases for testing. We then randomly sampled the 265 cases into five subsets: in each of the five folds we used four of the data subsets (212 cases) for training and the remaining subset (53 cases) for validation, and we performed 30 Monte-Carlo simulations on them. Because the testing cases are selected randomly, we repeated this procedure 15 times (the second Monte-Carlo).

Because our procedure is the same for all eight problems, we provide all of its details only for the Mackey-Glass time-series prediction problem [82], after which we provide the Monte-Carlo simulation results for all eight data sets.
A. Mackey-Glass Chaotic Time Series Prediction Problem [82]
A Mackey-Glass time series can be generated using the following nonlinear delay differential equation:

dx(T)/dT = 0.2 x(T − τ) / (1 + x^{10}(T − τ)) − 0.1 x(T)    (8.46)
As in [105] we chose τ = 17, x(0) = 1.2 and x(T) = 0 for −τ ≤ T < 0, and used x(T), x(T−6), x(T−12) and x(T−18) to predict x(T+6). The Mackey-Glass time series we actually used is depicted in Fig. 8.3 (observe that 0.21 ≤ x(T) ≤ 1.31). We generated 1000 cases S_Cases, where:

$$S_{Cases} = \left\{\mathbf{x}(t),\, y(t)\right\}_{t=1}^{1000} \qquad (8.47)$$

$$\mathbf{x}(t) = \left[x(t-1),\, x(t+5),\, x(t+11),\, x(t+17)\right], \quad t = 1,\ldots,1000 \qquad (8.48)$$

$$y(t) = x(t+23), \quad t = 1,\ldots,1000 \qquad (8.49)$$
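For readers who wish to regenerate the data, a minimal sketch is given below. The thesis does not state the integration scheme it used, so forward-Euler with step 0.1 is our assumption; the sampling at integer T and the case construction follow (8.47)-(8.49):

```python
import numpy as np

def mackey_glass(n_samples, tau=17, dt=0.1, x0=1.2):
    """Forward-Euler integration of (8.46); x(T) = 0 for -tau <= T < 0, x(0) = x0."""
    per = int(round(1.0 / dt))                 # integration steps per unit of T
    delay = int(round(tau / dt))               # the delay tau in integration steps
    x = np.zeros(n_samples * per + delay + 1)  # indices [0, delay) hold the zero history
    x[delay] = x0
    for k in range(delay, len(x) - 1):
        x_tau = x[k - delay]
        x[k + 1] = x[k] + dt * (0.2 * x_tau / (1.0 + x_tau**10) - 0.1 * x[k])
    return x[delay::per][:n_samples]           # one sample per unit of T

s = mackey_glass(1100)                         # a little extra so t + 23 stays in range
t = np.arange(1, 1001)
X = np.stack([s[t - 1], s[t + 5], s[t + 11], s[t + 17]], axis=1)  # (8.48)
y = s[t + 23]                                                     # (8.49)
```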
We randomly selected 200 cases for testing the model. To obtain the five data sets that were used in five-fold cross-validation [106], we randomly partitioned the remaining 800 cases in (8.47) into five subsets, each with 160 cases. In each of the five folds we used four of the data subsets (640 cases) for training and the remaining data subset (160 cases) for validation. In the sequel we illustrate the VSR procedure only for the first fold.
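The split just described amounts to one random 200/800 partition followed by a random five-way partition of the 800 learning cases; a sketch, assuming the cases built above:

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(1000)
test_idx, learn_idx = perm[:200], perm[200:]           # 200 testing, 800 learning cases
folds = np.array_split(rng.permutation(learn_idx), 5)  # five subsets of 160 cases
for i in range(5):
    val_idx = folds[i]                                 # 160 validation cases
    trn_idx = np.concatenate([folds[j] for j in range(5) if j != i])  # 640 training cases
```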
Fig. 8.3. Mackey-Glass time series.
To begin, we applied LM-FCM (for two clusters) to all 640 x(T) from the first fold and obtained the two MFs that are depicted in Fig. 8.4. Because three of the four causal conditions are time-delayed versions of x(T), we used the same MFs for x(T), x(T−6), x(T−12) and x(T−18) as their initial MFs. During parameter optimization we use a two-parameter MF for each of x(T), x(T−6), x(T−12) and x(T−18), and we will see that those MFs all become different from one another as parameter optimization proceeds.
Fig. 8.4. LM-FCM MFs.
Next, the set of surviving causal combinations was established by using the five-step procedure that is stated after Theorem 8.1. Because we cannot display the steps of this procedure for all 640 cases, we illustrate the computations for a small subset of 14 cases.
Table 8.2 depicts x_1 ≡ x(T−18), x_2 ≡ x(T−12), x_3 ≡ x(T−6) and x_4 ≡ x(T), as well as their MF grades in H (Fig. 8.4), for the 14 cases. For four causal conditions there are 2^4 = 16 candidate causal combinations. The min-max formulas in (8.10) and (8.11) were used to find the winning causal combination for each case. These results are summarized in Table 8.3. Each row in that table gives the MF value as well as the winner of the maximum of the MFs for High and the complement of High for each of the four x_i, followed by the minimum of the four winning MF values, followed by the winning causal combination. Observe, from the last column of that table, that only three out of the 16 possible causal combinations have survived, namely: H_1H_2h_3h_4, h_1h_2h_3h_4 and H_1H_2H_3H_4. Their MF grades, for all 14 cases, are summarized in Table 8.4 (this is included here only to illustrate the truth of Theorem 8.1). Observe, from Table 8.4, that for each case there is only one causal combination for which its MF is greater than 0.5; it is shown in boldface. Observe, also, that H_1H_2h_3h_4 is supported by only one case, h_1h_2h_3h_4 is supported by seven cases and H_1H_2H_3H_4 is supported by six cases.
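The min-max computation of Table 8.3 is mechanical, and a small sketch makes it concrete. For one case, each condition contributes max(μ_Hi, μ_hi) with μ_hi = 1 − μ_Hi; the firing level is the minimum of those maxima, and the per-condition winners spell out the surviving causal combination:

```python
import numpy as np

def winning_combination(mu_H):
    """mu_H: MF grades of the k causal conditions in High for one case (a Table 8.2 row).
    Returns the min-max firing level (8.10) and the winning combination (8.11)."""
    mu_h = 1.0 - mu_H                                  # complement MF grades
    winners = np.where(mu_H >= mu_h, 'H', 'h')         # per-condition winner
    firing = np.maximum(mu_H, mu_h).min()              # min of the per-condition maxima
    return firing, ''.join(f'{w}{i+1}' for i, w in enumerate(winners))

# Case 1 of Table 8.2 reproduces row 1 of Table 8.3:
print(winning_combination(np.array([0.69, 0.52, 0.34, 0.20])))  # (0.52, 'H1H2h3h4')
```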
Table 8.2
Data (x_i) and Membership Grades (μ_Hi) for 14 Cases

Case   x_1      μ_H1   x_2      μ_H2   x_3      μ_H3   x_4      μ_H4
1      0.8627   0.69   0.8125   0.52   0.7652   0.34   0.7206   0.20
2      0.6907   0.11   0.7198   0.19   0.7433   0.27   0.7618   0.34
3      1.1525   1.00   1.1679   1.00   1.1816   1.00   1.1936   1.00
4      0.4926   0.00   0.5260   0.00   0.5652   0.00   0.6084   0.02
5      1.2326   1.00   1.2377   1.00   1.2438   1.00   1.2512   1.00
6      0.5496   0.00   0.5991   0.01   0.6425   0.04   0.6797   0.10
7      1.0609   1.00   1.0832   1.00   1.1040   1.00   1.1234   1.00
8      1.0532   1.00   1.0759   1.00   1.0972   1.00   1.1171   1.00
9      0.9703   0.95   0.9645   0.95   0.9585   0.94   0.9529   0.94
10     0.7128   0.16   0.6975   0.13   0.6832   0.11   0.6719   0.09
11     0.6719   0.08   0.6650   0.07   0.6636   0.07   0.6685   0.08
12     1.0938   1.00   1.1139   1.00   1.1326   1.00   1.1498   1.00
13     0.7903   0.41   0.7849   0.41   0.7775   0.39   0.7683   0.36
14     0.6761   0.08   0.7205   0.19   0.7626   0.34   0.8015   0.49
Table 8.3
Min-Max Calculations and Associated Causal Combinations

       max(μ_Hi, μ_hi) / Winner (W_i)                    Minimum         Causal combination
Case   W_1        W_2        W_3        W_4              [using (8.10)]  [using (8.11)]
1      0.69/H_1   0.52/H_2   0.66/h_3   0.80/h_4         0.52            H_1 H_2 h_3 h_4
2      0.88/h_1   0.81/h_2   0.73/h_3   0.66/h_4         0.66            h_1 h_2 h_3 h_4
3      1.00/H_1   1.00/H_2   1.00/H_3   1.00/H_4         1.00            H_1 H_2 H_3 H_4
4      1.00/h_1   1.00/h_2   1.00/h_3   0.98/h_4         0.98            h_1 h_2 h_3 h_4
5      1.00/H_1   1.00/H_2   1.00/H_3   1.00/H_4         1.00            H_1 H_2 H_3 H_4
6      1.00/h_1   0.99/h_2   0.96/h_3   0.90/h_4         0.90            h_1 h_2 h_3 h_4
7      1.00/H_1   1.00/H_2   1.00/H_3   1.00/H_4         1.00            H_1 H_2 H_3 H_4
8      1.00/H_1   1.00/H_2   1.00/H_3   1.00/H_4         1.00            H_1 H_2 H_3 H_4
9      0.95/H_1   0.95/H_2   0.94/H_3   0.94/H_4         0.94            H_1 H_2 H_3 H_4
10     0.84/h_1   0.87/h_2   0.89/h_3   0.91/h_4         0.84            h_1 h_2 h_3 h_4
11     0.92/h_1   0.93/h_2   0.93/h_3   0.92/h_4         0.92            h_1 h_2 h_3 h_4
12     1.00/H_1   1.00/H_2   1.00/H_3   1.00/H_4         1.00            H_1 H_2 H_3 H_4
13     0.59/h_1   0.59/h_2   0.61/h_3   0.64/h_4         0.59            h_1 h_2 h_3 h_4
14     0.92/h_1   0.81/h_2   0.66/h_3   0.51/h_4         0.51            h_1 h_2 h_3 h_4
Table 8.4
MF Grades for the Three Surviving Causal Combinations and 14 Cases
(MFs for the Four Causal Conditions are in Table 8.2)^a

Case   H_1H_2h_3h_4   h_1h_2h_3h_4   H_1H_2H_3H_4
1      0.52           0.31           0.20
2      0.11           0.66           0.11
3      0.00           0.00           1.00
4      0.00           0.98           0.00
5      0.00           0.00           1.00
6      0.00           0.90           0.00
7      0.00           0.00           1.00
8      0.00           0.00           1.00
9      0.06           0.05           0.94
10     0.13           0.84           0.09
11     0.07           0.92           0.07
12     0.00           0.00           1.00
13     0.41           0.59           0.36
14     0.08           0.51           0.08

a Each boldface MF is greater than 0.5.
When all of the 640 training cases were used, eight causal combinations (rules) survived; they are given in Table 8.5. Observe that the first two rules are supported by a combined 600 cases, leaving only 40 cases to be shared among the other six rules.
The eight surviving causal combinations were substituted into (8.22), (8.21) and (8.17) in order to establish the VSR equation, whose nine regression coefficients were then computed via SVD, the results being: β_0 = 2.1215 and b_1, ..., b_8, whose values are given in the last column of Table 8.5.
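Given the FBF matrix of (8.44) built on the training data, the regression coefficients are an ordinary linear least-squares fit, which numpy solves via SVD; Phi and y below are random stand-ins for the 640 x 9 training FBF matrix and targets:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(640), rng.random((640, 8))])  # stand-in FBF matrix
y = rng.random(640)                                          # stand-in training targets

theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # SVD-based least squares
beta_0, b = theta[0], theta[1:]                  # intercept beta_0, rule coefficients b_1..b_8
```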
The QPSO-optimized MFs are depicted in Fig. 8.5; we used 1000 particles and a maximum of
200 QPSO generations. Observe that these MFs are different from the LM-FCM MFs in Fig. 8.4.
Table 8.5
Fold-1 First-Iteration Surviving Causal Combinations and Regression Coefficients
for the Mackey-Glass Chaotic Time-Series Prediction Problem^a

Rule     H_x(t-18)  H_x(t-12)  H_x(t-6)  H_x(t)  Number    Regression
Number                                           of cases  Coefficient
1        0          0          0         0       364        0.1731
2        1          1          1         1       236       -0.9544
3        0          1          1         1        12       -0.8633
4        0          0          0         1        11       -0.1243
5        1          1          1         0         3        0.3420
6        1          1          0         0         2       -0.5567
7        0          0          1         1         5        0.3247
8        1          0          0         0         7        2.6770

a 1 (0) represents the causal condition (complement of the causal condition) in a
surviving causal combination (rule).
Fig. 8.5. Optimized MFs for the first iteration (r = 1).
After finding the optimized MFs and regression coefficients, the following training and validation objective function values were computed for the first complete iteration (r = 1) of the first fold, using (8.33) and (8.34): J_trn(1) = 0.0273 and J_val(1) = 0.0254.
This entire procedure was then repeated for r = 2. The optimized MFs from r = 1 were used to establish the structure of a new set of rules for r = 2 (see Fig. 8.1), etc. When r = 9 the rules became the same as those from r = 8, so it was not necessary to go through all of the r_max = 100 iterations in the outer loop. Fig. 8.6 shows the surviving rules^1 for each of the outer-loop iterations. Observe that the sets of rules change across the outer-loop iterations, which is why we call this method "Variable Structure" regression. We began with eight rules, reached a maximum of 10 rules and finished with six rules.

1 The antecedent of each rule contains a causal condition or its complement; hence, each rule can be represented as a binary number. Its corresponding decimal number is used to show the surviving rules in each outer-loop iteration.
Fig. 8.6. Decimal equivalent number of surviving rules for each outer-loop iteration.
Our final model was established by using (8.40) and (8.41). The final model, i.e., the one with the smallest validation error, occurred at r = 3. Its rules are given in Table 8.6; β_0 = 0.4501 and its regression coefficients b_1, ..., b_8 are given in the last column of that table. Observe that the final VSR model has eight rules, just as did the original model; however, the last two rules in the final model are different from the last two rules in the first model (Table 8.5). Observe also, from Tables 8.5 and 8.6, that the number of cases covered by each rule is also different. Cases have re-aligned themselves as the structure of the VSR model changed.
The final optimized MFs are depicted in Fig. 8.7; they look quite different from the LM-FCM MFs in Fig. 8.4.
The final training and validation RMSEs are J*_trn = 0.0042 and J*_val = 0.0044, both of which are close to an order of magnitude smaller than their values from the first iteration^1.

1 To find the final model across the five folds, VSR is first performed on all folds, after which the final winning model across the five folds is the one that has the smallest RMSE on all of the data.
Fig. 8.7. Fold-1 final optimized MFs.
Table 8.6
Fold-1 Final Rules and Regression Coefficients for the
Mackey-Glass Chaotic Time-Series Prediction Problem^a

Rule     H_x(t-18)  H_x(t-12)  H_x(t-6)  H_x(t)  Number    Regression
Number                                           of cases  Coefficient
1        0          0          0         0       321        0.7328
2        1          1          1         1       208       -0.3147
3        0          1          1         1        23       -0.9112
4        0          0          0         1         8       -1.0021
5        1          1          1         0        23        0.8922
6        1          1          0         0        31        0.0674
7        0          1          0         0        19        1.3423
8        0          1          0         1         7        0.0457

a 1 (0) represents the causal condition (complement of the causal condition) in a
surviving causal combination (rule).
B. Multivariate Regression and Time Series Prediction Problems
In this section we summarize the results obtained when VSR was applied to four multivariate regression problems (Abalone [79], Concrete Compressive Strength [79], Concrete Slump Test [79], and Wave Force [80]) and four time-series prediction problems (Mackey-Glass Chaotic Time Series [82], Chemical Process Concentration Readings [81], Chemical Process Temperature Readings [81], and Gas Furnace [81]). Performance results for these eight problems were recently reported in [105]. So that it is easy to compare the results in this paper with the ones in [105], we use the same style of tables as the ones in [105].
Table 8.7 summarizes the characteristics of the multivariate regression problems [79], [80] and the time-series prediction problems [81], [82]. It also contains the number of rules in a fuzzy rule base obtained by applying the Wang-Mendel method [34] to the data. Observe that, except for Wave Force Prediction, the linear-regression error standard deviations are much worse for the multivariate regression problems than they are for the time-series prediction problems.
Table 8.7
Characteristics of Fuzzy Rule Bases for Multivariable Regression and Time-Series
Prediction Problems Constructed by the Fuzzy Rules Generation Method in [105]

Problem                                       Attributes   Cases   STD of linear-      Isosceles triangular    Fuzzy rules
                                              (variables)          regression errors   fuzzy sets/attribute    based on [42]
Abalone [79]                                  7            4177    1.54                5                       121
Concrete Compressive Strength [79]            8            1030    6.29                4                       296
Concrete Slump Test [79]                      9             103    1.47                3                        83
Wave Force Prediction [80]                    3             317    0.08                4                        24
Chemical Process Concentration Reading [81]   3             194    0.21                4                        27
Chemical Process Temperature Reading [81]     3             223    0.09                4                        16
Gas Furnace Prediction [81]                   6             293    0.22                4                        72
Mackey-Glass Chaotic Time-Series [82]         4            1000    0.0845              5                        68
We performed double Monte-Carlo simulations of the five-fold cross-validation method for VSR on all eight problems. We divided the N data pairs into three data sets: training, validation and testing. We randomly selected about 20% of the data for testing; the remaining 80% were used for learning and were in turn divided into training and validation folds. Table 8.8 shows the number of cases in each data set for all eight problems. For each randomly selected testing data set, we performed 30 Monte-Carlo simulations of the five-fold cross-validation method on the learning data. Because we select the testing data randomly, we also performed another 15 Monte-Carlo simulations for the testing data set. As in Section A, we used 1000 particles, a maximum of 200 QPSO generations and a maximum of 100 outer-loop iterations. Table 8.9 compares the number of fuzzy rules in the FRI reduced fuzzy rule base [105] and the average number of rules from the VSR method (for the 15 × 30 × 5 = 2250 Monte-Carlo simulations). Except for the two concrete problems, the number of rules from the VSR method is considerably smaller than the number of rules from FRI. We provide an explanation for this below.
Table 8.8
Number of Cases for Each Data Set

Data Set                                  Training  Validation  Testing  All data
Wave Force                                 228        57          32       317
Gas Furnace                                212        53          28       293
Mackey Glass                               640       160         200      1000
Chemical Process Temperature Reading       144        36          43       223
Chemical Process Concentration Reading     124        31          39       194
Concrete Compressive Strength              660       165         205      1030
Concrete Slump Test                         68        17          18       103
Abalone                                   2672       668         837      4177
Table 8.9
Comparison of Number of Fuzzy Rules Obtained From the VSR and FRI [105] Methods
for Different Problems

Problem                                       Rules in the FRI       Average number of
                                              reduced rule base^a    fuzzy rules in VSR
Abalone [79]                                  41                     31.99
Concrete Compressive Strength [79]            45                     74.02
Concrete Slump Test [79]                      26                     39.11
Wave Force Prediction [80]                    24                      8.03
Chemical Process Concentration Reading [81]   27                      9.12
Chemical Process Temperature Reading [81]     16                      6.94
Gas Furnace Prediction [81]                   72                     15.23
Mackey-Glass Chaotic Time-Series [82]         68                     14.63

a The numbers in this column were taken from the second row of Table IV in [105].
Table 8.10 shows the average RMSEs and standard deviations on all data sets for all eight problems. Table 8.11 shows a comparison of the average RMSE after double Monte-Carlo simulations of the five-fold cross-validation method (15 × 30 × 5 = 2250 RMSEs) obtained by the VSR method. In all cases they are smaller than the average RMSEs for the FRI method in [105] that uses interval type-2 fuzzy sets.
Table 8.12 shows that the VSR method obtains smaller average RMSEs than the methods presented in [105], [107]–[109]. It is interesting to note that the reduction in RMSE by FRI for the Concrete Compressive Strength and Concrete Slump problems is not very large (as compared to prior methods), but it is large for VSR. It appears that more rules are needed to achieve a significant reduction in RMSE, and VSR is able to find them. Note, also, that the average VSR RMSE for the Mackey-Glass prediction problem is an order of magnitude smaller than that from the best previously reported method in [105], and this is accomplished on average using only 14.63 rules (as compared to using 68 rules) and type-1 fuzzy sets.
Table 8.10
Average RMSE and Standard Deviation of Double Monte-Carlo for Different Problems

Data Set                                          Training  Validation  Testing  All data
Abalone                                  Mean     2.1648    2.1800      2.1524   2.1658
                                         STD      0.0577    0.1168      0.1097   0.0477
Concrete Compressive Strength            Mean     5.687     6.2433      8.823    6.8723
                                         STD      0.3354    0.4573      0.4168   0.5401
Concrete Slump Test                      Mean     1.3782    2.5523      2.3206   2.5341
                                         STD      0.3713    0.9276      0.9475   1.3005
Wave Force                               Mean     0.1245    0.1286      0.1367   0.1268
                                         STD      0.0047    0.0126      0.0214   0.0036
Chemical Process Concentration Reading   Mean     0.2977    0.3338      0.3434   0.3146
                                         STD      0.0152    0.0463      0.04269  0.0057
Chemical Process Temperature Reading     Mean     0.1951    0.1961      0.2070   0.1991
                                         STD      0.0843    0.0980      0.0996   0.0864
Gas Furnace                              Mean     0.3136    0.3277      0.4066   0.3310
                                         STD      0.0605    0.0925      0.2205   0.0753
Mackey Glass                             Mean     0.0052    0.0052      0.0056   0.0054
                                         STD      0.0049    0.0047      0.0065   0.0052
Table 8.11
Average RMSE (± Standard Deviation) of Double Monte-Carlo Runs for Different Problems

Problem                                       Average FRI RMSE [105]^a   Average VSR RMSE
Abalone [79]                                   2.5296 ± 0.0020           2.1658 ± 0.0477
Concrete Compressive Strength [79]            13.6080 ± 0.0320           6.8723 ± 0.5401
Concrete Slump Test [79]                       5.9325 ± 0.0160           2.5341 ± 1.3005
Wave Force Prediction [80]                     0.1396 ± 0.0001           0.1268 ± 0.0036
Chemical Process Concentration Reading [81]    0.3370 ± 0.0002           0.3146 ± 0.0057
Chemical Process Temperature Reading [81]      0.2965 ± 0.0037           0.1991 ± 0.0864
Gas Furnace Prediction [81]                    0.7787 ± 0.0031           0.3310 ± 0.0753
Mackey-Glass Chaotic Time-Series [82]          0.0475 ± 0.01             0.0054 ± 0.0052

a The numbers in this column were taken from Table V in [105].
Table 8.12
Comparisons of Average RMSE for Different Methods^a on Different Problems

Problem                                      HS [109]   CCL [107]  CK [108]   FRI-T1     FRI-IT2    VSR
                                                                              [105]      [105]
Abalone Problem [13]                         3.1511     2.6470     3.1599     2.6312     2.5296     2.1658
Concrete Compressive Strength Problem [13]   14.7940    15.6982    15.0666    14.2704    13.6080    6.8723
Concrete Slump Test Problem [13]             6.4636     6.7137     6.3193     6.5963     5.9325     2.5341
Wave Force Prediction Problem [18]           0.1617     0.1710     0.2164     0.1486     0.1396     0.1268
Chemical Process Concentration Reading
Prediction [2]                               0.3705     0.3818     0.3721     0.3375     0.3370     0.3146
Chemical Process Temperature Reading
Prediction [2]                               0.4972     0.5383     0.4911     0.3490     0.2965     0.1991
Gas Furnace Problem [2]                      1.2640     2.0914     1.2688     0.8573     0.7787     0.3310
Mackey-Glass Chaotic Time Series [8]         0.0712     0.1609     0.0973     0.0605     0.0475     0.0054

HS, CCL, CK and FRI-T1 use type-1 Gaussian fuzzy sets; FRI-IT2 uses the optimal learned
interval type-2 Gaussian fuzzy sets.
a The numbers for the first five methods were taken from Table VI in [105].
Using the Friedman Test [110], [111] we performed a statistical performance analysis between the VSR method and the methods presented in [105], [107]–[109]. Based on the experimental results shown in Tables 8.11 and 8.12, we obtained the ranks and average ranks of the VSR method and the methods presented in [105], [107]–[109] for all eight problems. Observe, from Table 8.13, that the VSR method has the top rank (1) and the best average rank of all the methods presented in [105], [107]–[109].
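The Friedman test itself is a one-line call in common statistics packages; a sketch using scipy with the average RMSEs of Table 8.12 (one list per method, ordered by problem):

```python
from scipy.stats import friedmanchisquare

hs   = [3.1511, 14.7940, 6.4636, 0.1617, 0.3705, 0.4972, 1.2640, 0.0712]
ccl  = [2.6470, 15.6982, 6.7137, 0.1710, 0.3818, 0.5383, 2.0914, 0.1609]
ck   = [3.1599, 15.0666, 6.3193, 0.2164, 0.3721, 0.4911, 1.2688, 0.0973]
fri1 = [2.6312, 14.2704, 6.5963, 0.1486, 0.3375, 0.3490, 0.8573, 0.0605]
fri2 = [2.5296, 13.6080, 5.9325, 0.1396, 0.3370, 0.2965, 0.7787, 0.0475]
vsr  = [2.1658,  6.8723, 2.5341, 0.1268, 0.3146, 0.1991, 0.3310, 0.0054]

stat, p = friedmanchisquare(hs, ccl, ck, fri1, fri2, vsr)  # ranks within each problem
print(stat, p)   # a small p-value: the methods do not all perform the same
```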
Table 8.13
Ranks and Average Ranks for Different Methods^a With Respect to Multivariable
Regression Problems and Time-Series Prediction Problems

Problem                                      HS [109]   CCL [107]  CK [108]   FRI-T1     FRI-IT2    VSR
                                                                              [105]      [105]
Abalone Problem [13]                         5          4          6          3          2          1
Concrete Compressive Strength Problem [13]   4          6          5          3          2          1
Concrete Slump Test Problem [13]             4          6          3          5          2          1
Wave Force Prediction Problem [18]           4          5          6          3          2          1
Chemical Process Concentration Reading
Prediction [2]                               4          6          5          3          2          1
Chemical Process Temperature Reading
Prediction [2]                               5          6          4          3          2          1
Gas Furnace Problem [2]                      4          6          5          3          2          1
Mackey-Glass Chaotic Time Series [8]         4          6          5          3          2          1

Average Ranks                                4.25       5.625      4.875      3.25       2          1

a The numbers in the first five columns were computed by using the numbers in Table VII in [105].
8.8 Discussion
VSR begins with a large collection of candidate causal combinations and, using data, finishes with a much smaller collection of them. As explained in Comment 8.5, each term in the VSR model has a linguistic interpretation, because each term in the final optimized VSR model is associated with one linguistic if-then rule. In retrospect, one may therefore choose to view VSR as a new method for extracting rules from data or as a rule-reduction technique. Space does not permit us to provide overviews of both of these subjects, for which there are many references. Instead, we shall only make a small number of observations.
The direct extraction of rules from data began in 1991 [34]. The WM method and its extensions [112] are very widely used. Table 8.14 compares WM and VSR rules. After going through this table, the reader will observe that WM and VSR rules are very different, most notably in their number, nature, novelty, length of antecedent and intended use. Although VSR rules were developed in the context of regression, they can also be used for data mining. Although WM rules were developed for data mining, they have also been used for prediction (e.g., [112]) and regression (e.g., [113]); however, they do not seem to lead to accurate results. Whether or not the VSR rules are as easy to interpret as the WM rules, because the former use the complement of a term as well as the term itself, remains to be seen.
Rule reduction traces its origin to [114] who state: “… dense rule bases should be reduced so
that only the minimal necessary number of rules remain still containing the essential information
in the original base, and all other rules are replaced by the interpolation algorithm that however
can recover them with a certain accuracy prescribed before reduction.” Yang and Shen [103]
state: “Fuzzy rule interpolation significantly improves the robustness of fuzzy inference system.
It provides a way to reduce the complexity of fuzzy systems by omitting those rules [that] can be
approximated by their neighboring ones. Also, importantly, it enhances the applicability of fuzzy
systems by allowing a certain conclusion to be generated even if the existing rule base does not
cover a given observation.” Although VSR does reduce the original set of causal combinations,
there is no rule interpolation associated with the remaining rules. So, VSR is not the same as
traditional rule reduction.
Table 8.14
Comparisons of WM and VSR Rules

WM rules:    IF x_1 is A_1^v and ... and x_p is A_p^v, THEN y^v = B^v
VSR rules^a: IF x_1 is A_1^v and ... and x_p is A_p^v, THEN y^v = b^v (<=> B^v)

Partitioning of variables
  WM:  Each of the V variables (x_1, x_2, ..., x_V) is partitioned into a pre-assigned
       number of terms (fuzzy sets), n_1, n_2, ..., n_V, respectively.
  VSR: Each of the V variables is partitioned into one term (fuzzy set) and its
       complement; or, each of the V variables is partitioned into n_1, n_2, ..., n_V
       terms, respectively (called causal conditions), as well as their complements.

Membership functions
  WM:  MFs must be specified for each term.
  VSR: MFs must be specified for each term; the MF of a complement is computed from
       the MF of the term.

Number of rules
  WM:  prod_{v=1}^{V} n_v.
  VSR: R_S, found by very novel data processing using the min-max Theorem;
       R_S << prod_{v=1}^{V} n_v.

Nature of a rule
  WM:  Each rule is complete, in that the data determines both its (compound)
       antecedent and its consequent.
  VSR: Each rule is incomplete, in that the data only determines its (compound)
       antecedent; rule consequents become regression coefficients that are
       learned/optimized from the data.

Length of antecedent
  WM:  V; one term for each variable; terms are connected using AND.
  VSR: jV, when each variable is partitioned into j terms and its complement (each
       compound antecedent contains j terms per variable, either the term or its
       complement); terms are connected using AND.

Conflicting rules^b
  WM:  Can occur; resolved in a simple way (see [45]).
  VSR: Do not occur.

Novelty
  WM:  First method to extract rules directly from data.
  VSR: Use of the complement of a term.

Incomplete set of rules^c
  WM:  Can occur, due to a priori partitioning of the V variables into n_1, n_2, ..., n_V
       terms and not enough data; two other versions of the original WM method have
       been developed for completing the set of rules [112].
  VSR: Does not occur; the data establishes how many rules there should be, rather
       than an a priori partitioning of the V variables into n_1, n_2, ..., n_V terms.

Interpretability
  WM:  Easy, because antecedents use terms.
  VSR: Complements of terms may be difficult to interpret.

Intended use
  WM:  Data mining.
  VSR: Regression (could also be used in rule-based classification).

a For illustrative purposes, we only show a VSR rule for which each variable is
  partitioned into one term and its complement.
b Rules with the same antecedent but different consequents.
c When the data does not reveal all prod_{v=1}^{V} n_v rules.
8.9 Conclusions
This paper has presented a new kind of non-linear regression model, Variable Structure Regression (VSR), in which type-1 fuzzy sets are used to pre-process variables so as to simultaneously determine how many regressors there are and how the variables should be combined in each of the regressors. A novel feature of this new model is that it not only uses a linguistic term for a variable but also the complement of that term.
We have also presented an iterative procedure for optimizing both the structure and parameters
of the VSR model. The most novel part of this procedure is the outer structural optimization loop
(Fig. 8.1). Although we have used QPSO for the inner parameter optimization loop, any
evolutionary computing method could be used.
Results obtained from VSR were compared against five other methods for eight readily
available data sets (four for multivariate approximation and four for forecasting) and VSR ranks
#1 in all cases.
Stepping back from all of the details, we wish to summarize the strengths of VSR:
1) VSR automatically finds the number of terms (R_S + 1) in the nonlinear regression model (8.2), thereby freeing the end-user from time-consuming trial-and-error studies to determine this.
2) VSR automatically establishes the mathematical structure of each of the terms in the nonlinear regression model (8.2), thereby further freeing the end-user from time-consuming trial-and-error studies to determine this.
3) Each term in the VSR model has a linguistic interpretation, because each term in the final optimized VSR model is associated with one linguistic if-then rule, thereby providing the end user with a physical understanding of each term.
4) The expert knowledge of long-time employees can be preserved, because qualitative expert knowledge may be combined with quantitative data^1 [99]:
   a) Each such piece of knowledge contributes one new term to the VSR model.
   b) That term always retains its identity, so it is easy to evaluate the relative importance of qualitative and quantitative terms.
5) The importance of each rule can be determined automatically, by examining the number of cases that support each rule, thereby helping the end user to better understand the model^2.
6) The VSR model acts like a multitude of models, because the number of its terms that are actually activated (i.e., that produce a non-zero component) by a set of measured variables changes automatically, i.e., different groups of terms in the VSR model (8.2) are activated depending upon what the numerical values are for the variables, and this happens on the fly^3 (this is the second reason for the designation "VSR"):
   a) A variable-structure model is very different from a traditional regression model in which all of the terms in (8.2) are always activated and contribute to the final answer regardless of the numerical values of the input variables.
   b) It is as though the end-user decided to partition the multi-dimensional variable-space and found a different model for each of the partitions, without actually having to do this; both the partitioning and the different model for each partition occur automatically, thereby leading to a very novel kind of regression model, one that could not have been obtained by the end user without an enormous amount of trial and error.
   c) Because each term in the VSR model corresponds to a rule, this variable-structure property is equivalent to choosing the most important rules that correspond to different regions of the variable space, which could correspond to a special area or zone.
7) Applications of the VSR model to eight readily available data sets have demonstrated that it provides the best results as compared to five other methods (one of which used interval type-2 fuzzy sets).

1 This was not mentioned earlier. If an expert's knowledge can be put into the form of an if-then rule, then it can also be quantified using the mathematics of fuzzy sets and fuzzy logic.
2 One may also be able to apply methods from classical regression modeling that focus on establishing which terms in (8.2) are really significant, to further simplify (8.2).
3 This is also due to the mathematics of fuzzy sets and fuzzy logic and is true for any fuzzy logic rule-based regression model.
Two weaknesses of VSR are:
1) Parameter optimization is a computational bottleneck because of the large number of
particles and iterations that are required by a swarm optimization algorithm. Other faster
swarm algorithms are needed; or, perhaps the next generation of computers—quantum
computers—will resolve this.
2) No mathematical theory is available to explain why VSR works so well, even for only
one term per variable. Such a theory would be very helpful.
We conclude this article with some interesting VSR research issues:
1) Compare the VSR model to a feed-forward neural network (FFNN). VSR uses sigmoidal preprocessing on the measurements and then nonlinear combining of those results; an FFNN combines measurements linearly and then applies a sigmoidal nonlinearity to that result. Is there some sort of mathematical connection between the two?
2) Extend VSR:
a) From T1 to Interval Type-2 (IT2) to General Type-2 (GT2) VSR models. We presently do not have a min-max theorem for IT2 or GT2 FSs, and without such a result the computations of the surviving causal combinations would be very time consuming.
b) To non-singleton fuzzification in order to compensate for noisy data.
c) To more than one term per variable. Computing the surviving causal
combinations in the cases of two and three terms per variable has been studied
extensively in [35]. There are some very interesting theoretical results that occur.
d) To a VS Classifier (VSC), in which the nonlinear discriminant functions would
have the same structure as (8.22), but all of its parameters would be optimized so
as to minimize classification errors.
Chapter 9
Conclusions and Future Work
It is quite common these days for people who work in the general field of computational intelligence (CI), which includes fuzzy sets as one of its major pillars (the others being neural networks and evolutionary computing), to inquire how a CI technique can be used to solve problems in interdisciplinary or non-traditional (i.e., non-engineering or non-computer-science) fields. The expectation is that there will be a flow from CI into these fields. Rarely does the flow occur in the other direction. Charles Ragin's fsQCA is one of those remarkable exceptions and represents a flow from social science and political science into CI.
We explained fsQCA for the first time in a very quantitative way, something that is not found in the existing literature, and something that is needed if engineers and computer scientists are to use fsQCA.
This thesis has also provided a new theoretical result for fsQCA, a result that could only have been obtained after a formal quantification of fsQCA had occurred. We have provided various ways to speed up the computations within fsQCA. The min-max Theorem is our most useful result and is one that we now use all of the time. Doing so has led to a modification of Steps 5 and 6 in fsQCA to Steps 5NEW and 6NEW in "Fast fsQCA."
By using the recursive formula for consistency we have not only been able to speed up its computations, but have also gained an understanding about when a rule can be obliterated, which has helped us in choosing cases.
The embedding corollaries are very useful in that they also provide insights about fsQCA.
The fact that all of the 2^k candidate causal combinations postulated in Steps 5 or 5NEW may not exist, when a variable is described by two or more terms, was a surprise to us. This has profound implications for the way in which the parsimonious solutions have to be computed when using the QM algorithm; it also affects Step 9, where Counterfactual Analysis (CA) is performed, because CA must now also be constrained to the subset of possible remainders.
In order to solve the problem of dealing with multiple terms per variable, we have provided a new way for calibrating the fuzzy sets that are used in fsQCA, one that is based on clearly distinguishing between a linguistic variable and the linguistic terms for that variable, and that overcomes our criticism about the MF that is used by fsQCA practitioners. The resulting fuzzy sets are reduced-information level-2 fuzzy sets (RI L2 FSs). This MF has an S-shape (T1 or IV), which is the kind of MF shape that is so widely used by fsQCA scholars, and (as explained in this thesis) is so important to fsQCA.
We have applied our new calibration procedure to Ragin's Breakdown of Democracy example, using new data provided to us by him, and have demonstrated that we are able to obtain his earlier solutions using either T1 or IV fsQCA, something that should be reassuring to fsQCA scholars. By using IV fsQCA we are also able to study the robustness of fsQCA to breakpoint-location uncertainties as well as to membership-grade uncertainties, and have demonstrated that IV fsQCA is robust to both. Finally, because the S-shaped MFs (FOUs) were derived from FOUs for all of the linguistic variable's terms, we have shown how it is possible to also obtain more precise statements of those causal combinations that do not use complements of the causal conditions (e.g., for their best instances), something that may be of added value to practitioners of fsQCA.
Because words mean different things to different people (e.g., [17]), the fuzzy sets that are used in fsQCA should be IV fuzzy sets rather than type-1 fuzzy sets. This means that fsQCA needs to be re-examined for IV FSs. We extended the steps of fsQCA to IV fuzzy sets because we believe uncertainty about a linguistic variable can be captured with such FSs rather than with T1 FSs. To do this, we extended Ragin's direct method to generate IV MFs and used them for IV fsQCA. In order to implement IV fsQCA, three major modifications were made to fsQCA: (1) we extended our T1 min-max theorem to IT2 FSs so that surviving causal combinations could be easily computed; (2) we used an interval ranking procedure to help establish a surviving causal combination; and (3) we used the Vlachos-Sergiadis IT2 FS subsethood measure instead of Kosko's subsethood measure. Note that IV fsQCA reduces to fsQCA when all of the IV FSs reduce to T1 FSs.
We also used some key steps of fsQCA to present a very efficient method for establishing the initial (nonlinear) combinations of variables that can then be used in later modeling and processing (e.g., regression, classification, neural networks, etc.), by using a novel form of preprocessing that transforms raw data into patterns through the use of fuzzy sets. Our method lends itself to massive distributed and parallel processing, which makes it suitable for data of all sizes, from small to big.
We have also shown how the surviving causal combinations can be used in a new regression model, called Variable Structure Regression (VSR). Using the surviving causal combinations, one can simultaneously determine the number of terms in the (nonlinear) regression model as well as the exact mathematical structure of each of the terms (basis functions). VSR has been tested on eight classical (and readily available) small-to-moderate-size data sets (four are for multivariable function approximation and four are for forecasting), has been compared against five other methods, and has ranked #1 against all of them for all eight data sets.
All of the results in this report were possible only after we quantified fsQCA.
We believe that fsQCA has the potential to be widely applicable in engineering and computer science. In future work we shall report on its use in an Auto-MPG application and the Breakdown of Democracy example, as well as in other applications.
We would like to remind the reader that as one goes from linguistic-term FOUs to their centroids (or maximum dispersion intervals) to the COGs of those centroids, information is lost. How to use the entire FOU in fsQCA computations for such an L2 FS remains to be studied.
Finally, although our new calibration procedure has been motivated by fsQCA and has led to the S-shaped interpolated MF for an RI L2 FS, we suggest that perhaps this mapping from a set of FOUs for the linguistic terms of a linguistic variable into an L2 MF for the linguistic variable may also be useful outside of fsQCA, e.g., in computing with words [17].
Validation of fsQCA also needs to be examined, because engineers and computer scientists are so used to doing this for, e.g., classification problems. Validation experiments will provide a level of confidence to engineers and computer scientists so that they will try fsQCA on other problems. We reported on some validation experiments in [115]; however, we need to study more of its theoretical aspects.
Appendix A. Proofs
A.1 Proof of Theorem 3.1
When F_j = A_1^j ∧ A_2^j ∧ ⋯ ∧ A_k^j, where A_i^j = C_i or c_i, then μ_{F_j}(x) is given by (3.15), where μ_{A_i^j}(x) = μ_{C_i}^D(x) or μ_{c_i}^D(x), or, equivalently,

$$\mu_{A_i^j}(x) = \min\left\{\mu_{C_i}^D(x),\,\mu_{c_i}^D(x)\right\} \;\text{or}\; \max\left\{\mu_{C_i}^D(x),\,\mu_{c_i}^D(x)\right\} \qquad (A.1)$$

where

$$\max\left\{\mu_{C_i}^D(x),\,\mu_{c_i}^D(x)\right\} > 0.5, \qquad \min\left\{\mu_{C_i}^D(x),\,\mu_{c_i}^D(x)\right\} \le 0.5 \qquad (A.2)$$

Consequently, only if μ_{F_{j*(x)}}(x) is given by (3.16) can (3.15) have a MF value that is > 0.5. Observe that (3.17) is an immediate consequence of (3.16).
A.2 Proof of Corollary 3-1-1
It is easy to prove both (3.22) and (3.23) by using (3.16), (3.17) and mathematical induction.
Here the proofs are illustrated for k = 3.
(a) Proof of (3.22): From (3.17), it follows that:

$$F_{j^*(x)}(x\,|\,C_1,C_2) = \arg\max\left(\mu_{C_1}^D(x),\mu_{c_1}^D(x)\right)\arg\max\left(\mu_{C_2}^D(x),\mu_{c_2}^D(x)\right) \qquad (A.3)$$

$$F_{j^*(x)}(x\,|\,C_1,C_2,C_3) = \arg\max\left(\mu_{C_1}^D(x),\mu_{c_1}^D(x)\right)\arg\max\left(\mu_{C_2}^D(x),\mu_{c_2}^D(x)\right)\arg\max\left(\mu_{C_3}^D(x),\mu_{c_3}^D(x)\right) \qquad (A.4)$$

Comparing (A.3) and (A.4), it is easy to see that:

$$F_{j^*(x)}(x\,|\,C_1,C_2,C_3) = F_{j^*(x)}(x\,|\,C_1,C_2)\arg\max\left(\mu_{C_3}^D(x),\mu_{c_3}^D(x)\right) \qquad (A.5)$$

which is (3.22).
(b) Proof of (3.23): From (3.16), it follows that:

$$\mu_{F_{j^*(x)}}(x\,|\,C_1,C_2) = \min\left\{\max\left(\mu_{C_1}^D(x),\mu_{c_1}^D(x)\right),\,\max\left(\mu_{C_2}^D(x),\mu_{c_2}^D(x)\right)\right\} \qquad (A.6)$$

$$\mu_{F_{j^*(x)}}(x\,|\,C_1,C_2,C_3) = \min\left\{\max\left(\mu_{C_1}^D(x),\mu_{c_1}^D(x)\right),\,\max\left(\mu_{C_2}^D(x),\mu_{c_2}^D(x)\right),\,\max\left(\mu_{C_3}^D(x),\mu_{c_3}^D(x)\right)\right\} \qquad (A.7)$$

(A.7) can also be expressed as:

$$\mu_{F_{j^*(x)}}(x\,|\,C_1,C_2,C_3) = \min\left\{\min\left[\max\left(\mu_{C_1}^D(x),\mu_{c_1}^D(x)\right),\,\max\left(\mu_{C_2}^D(x),\mu_{c_2}^D(x)\right)\right],\,\max\left(\mu_{C_3}^D(x),\mu_{c_3}^D(x)\right)\right\} \qquad (A.8)$$

which can then be expressed as:

$$\mu_{F_{j^*(x)}}(x\,|\,C_1,C_2,C_3) = \min\left\{\mu_{F_{j^*(x)}}(x\,|\,C_1,C_2),\,\max\left(\mu_{C_3}^D(x),\mu_{c_3}^D(x)\right)\right\} \qquad (A.9)$$

which is (3.23). QED.
A.3 Proof of Corollary 3-1-3
(3.26) follows from (3.23) by making use of the facts: min(a, b) = a if a ≤ b, or min(a, b) = b if b < a.
A.4 Proof of Theorem 3.2
Beginning with (3.29) and N > N_1, it follows that:

$$ss_K(F_l^S,O\,|\,N) = \frac{\sum_{x=1}^{N_1}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right) + \sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)}$$

$$= \frac{\sum_{x=1}^{N_1}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x) + \sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)}$$

$$= \frac{1}{1+\dfrac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}}\times\frac{\sum_{x=1}^{N_1}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)} + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \qquad (A.10)$$

Using the formula for ss_K(F_l^S, O | N_1) that is given in (3.28), it is easy to see that (A.10) can be expressed as in (3.30).
A.5 Proof of Corollary 3-2-1
(a) Necessity of (3.33): When ss_K(F_l^S, O | N) ≥ ss_K(F_l^S, O | N_1), then, using (3.30), it follows that:

$$\frac{1}{1+\dfrac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}}\,ss_K(F_l^S,O\,|\,N_1) + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \ge ss_K(F_l^S,O\,|\,N_1) \qquad (A.11)$$

This inequality can be reorganized, by bringing its leftmost term to its right-hand side, as follows:

$$\frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \ge ss_K(F_l^S,O\,|\,N_1)\left[1-\frac{1}{1+\dfrac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}}\right]$$

$$\ge ss_K(F_l^S,O\,|\,N_1)\left[1-\frac{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)+\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}\right]$$

$$\frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \ge ss_K(F_l^S,O\,|\,N_1)\,\frac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \qquad (A.12)$$

Because $\sum_{x=1}^{N}\mu_{F_l^S}(x) > 0$ and $\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x) > 0$, (A.12) can be expressed as:

$$\frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} \ge ss_K(F_l^S,O\,|\,N_1) \qquad (A.13)$$

The left-hand side of this last equation is ss_K(F_l^S, O | N_2) [see (3.31)]; hence, it follows that (A.13) can be expressed as in the top part of (3.33).

The proof for ss_K(F_l^S, O | N) < ss_K(F_l^S, O | N_1) is so similar to the one just given for ss_K(F_l^S, O | N) ≥ ss_K(F_l^S, O | N_1) that we leave it to the reader.

(b) Sufficiency of (3.33): Begin with the top part of (3.33), which is also (A.13), and reverse the steps that led to it. Doing this, one will obtain the top part of (3.32), since the left-hand side of (A.11) is ss_K(F_l^S, O | N), by virtue of Theorem 3.2.
A.6 Proof of Corollary 3-2-2
(a) Sufficiency of (3.36): To begin, we express ss_K(F_l^S, O | N) in (3.30) in terms of r, as:

$$ss_K(F_l^S,O\,|\,N) = \frac{1}{1+\dfrac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}}\,ss_K(F_l^S,O\,|\,N_1) + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} \qquad (A.14)$$

Beginning with the identity

$$\sum_{x=1}^{N}\mu_{F_l^S}(x) = \sum_{x=1}^{N_1}\mu_{F_l^S}(x) + \sum_{x=N_1+1}^{N}\mu_{F_l^S}(x) \qquad (A.15)$$

it is straightforward to show that

$$\frac{\sum_{x=1}^{N}\mu_{F_l^S}(x)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} = \frac{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} + 1 = r+1 \qquad (A.16)$$

so that

$$\frac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} = \frac{1}{r+1} \qquad (A.17)$$

Substituting (A.17) into (A.14), it follows that:

$$ss_K(F_l^S,O\,|\,N) = \frac{r}{r+1}\,ss_K(F_l^S,O\,|\,N_1) + \frac{1}{r+1}\,ss_K(F_l^S,O\,|\,N_2) \qquad (A.18)$$

which is an interesting alternate way to express ss_K(F_l^S, O | N).

If (3.36) is true, then, making use of the fact that ss_K(F_l^S, O | N_1) = 0.8 + Δ(N_1), it follows that (A.18) becomes:

$$ss_K(F_l^S,O\,|\,N) < \frac{1}{r+1}\left[r\times\left(0.8+\Delta(N_1)\right) + \left(0.8-\Delta(N_1)\,r\right)\right] = \frac{1}{r+1}\left[0.8(r+1)\right] = 0.8 \qquad (A.19)$$

which is (3.35) and completes the proof of the sufficiency of (3.36).

(b) Necessity of (3.36): Substituting ss_K(F_l^S, O | N_1) = 0.8 + Δ(N_1) into the right-hand side of (3.30), and assuming that ss_K(F_l^S, O | N) < 0.8, it follows that:

$$\frac{1}{1+\dfrac{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)}{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}}\times\left(0.8+\Delta(N_1)\right) + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} < 0.8 \qquad (A.20)$$

Performing straightforward algebra on (A.20), and using (3.31) and (3.34), it follows that:

$$\frac{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)}\times\left(0.8+\Delta(N_1)\right) + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=1}^{N}\mu_{F_l^S}(x)} < 0.8$$

$$\left(0.8+\Delta(N_1)\right)\sum_{x=1}^{N_1}\mu_{F_l^S}(x) + \sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right) < 0.8\sum_{x=1}^{N}\mu_{F_l^S}(x)$$

$$\Delta(N_1)\sum_{x=1}^{N_1}\mu_{F_l^S}(x) + \sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right) < 0.8\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)$$

$$\Delta(N_1)\,\frac{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} + \frac{\sum_{x=N_1+1}^{N}\min\left(\mu_{F_l^S}(x),\mu_O^D(x)\right)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} < 0.8$$

$$ss_K(F_l^S,O\,|\,N_2) < 0.8 - \Delta(N_1)\,\frac{\sum_{x=1}^{N_1}\mu_{F_l^S}(x)}{\sum_{x=N_1+1}^{N}\mu_{F_l^S}(x)} = 0.8 - \Delta(N_1)\,r \qquad (A.21)$$

which is (3.36). This completes the proof of the necessity of (3.36).

It is straightforward to obtain (3.37) from (3.36), by expressing ss_K(F_l^S, O | N_1) = 0.8 + Δ(N_1) as

$$\Delta(N_1) = ss_K(F_l^S,O\,|\,N_1) - 0.8 \qquad (A.22)$$

substituting (A.22) into (3.36), and then solving for r.
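The decomposition (A.18) is exact and easy to check numerically; the following sketch draws random MF values, splits them at N_1, and verifies that the weighted combination of the two partial consistencies reproduces the overall one:

```python
import numpy as np

def ss_K(mu_F, mu_O):
    """Kosko subsethood (consistency): sum of min(mu_F, mu_O) over sum of mu_F."""
    return np.minimum(mu_F, mu_O).sum() / mu_F.sum()

rng = np.random.default_rng(1)
mu_F, mu_O = rng.random(100), rng.random(100)
N1 = 60
r = mu_F[:N1].sum() / mu_F[N1:].sum()                   # the ratio r in (3.34)
lhs = ss_K(mu_F, mu_O)                                  # ss_K(F, O | N)
rhs = (r * ss_K(mu_F[:N1], mu_O[:N1])                   # weight r/(r+1) on ss_K(.|N_1)
       + ss_K(mu_F[N1:], mu_O[N1:])) / (r + 1)          # weight 1/(r+1) on ss_K(.|N_2)
assert np.isclose(lhs, rhs)                             # (A.18) holds exactly
```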
A.7 Proof of Theorem 3.3
Let S_F be the finite space of the 2^k postulated candidate causal combinations (j = 1, ..., 2^k and i = 1, ..., k), where [see (3.6)]:

$$S_F = \left\{F_1,\ldots,F_{2^k}\right\}, \qquad F_j = A_1^j\wedge A_2^j\wedge\cdots\wedge A_k^j, \quad A_i^j = C_i \text{ or } c_i \qquad (A.23)$$

For simplicity, C_{iL} is written below as L_i.

Equation (3.17), in our min-max Theorem 3.1, lets us establish the winning causal combination in Step 6NEW. It involves the MFs for L_i and l_i for all k causal conditions. Because there is no coupling between the causal conditions in (3.17), we shall focus on argmax(μ_L^D(x), μ_l^D(x)) [where μ_l^D(x) = 1 − μ_L^D(x)], dropping the causal-condition index i for notational simplicity, and shall determine if it is possible for argmax(μ_L^D(x), μ_l^D(x)) to be either L or l, in which case these two combinations are indeed possible.

Let x_1 be the value of x when μ_L(x) = μ_l(x). It is obvious, from Fig. A.1, that

$$\arg\max\left(\mu_L^D(x),\mu_l^D(x)\right) = \begin{cases} L & x(x)\le x_1 \\ l & x(x)>x_1 \end{cases} \qquad (A.24)$$

Since both L and l may occur, the good news is that all postulated candidate causal combinations F_j (j = 1, ..., 2^k) are possible when only one term is assigned to each variable.
Fig. A.1. MF of Low (L) and its complement (l).
A.8 Proofs of Theorem 3.4
A.8.a Proof based on min-max Theorem 3.1: As in the proof of Theorem 3.3, we begin with (3.17), which lets us establish the winning causal combination in Step 6NEW. (3.17) now simultaneously involves the MFs for L and l, and H and h, for all variables; hence, we focus on argmax(μ_L^D(x), μ_l^D(x)) argmax(μ_H^D(x), μ_h^D(x)), again dropping the causal-condition index i for notational simplicity, and shall determine if it is possible for argmax(μ_L^D(x), μ_l^D(x)) argmax(μ_H^D(x), μ_h^D(x)) to include LH, Lh, lH and lh, in which case these four candidate combinations would indeed be possible.

We will make use of (A.24) for L and l. The comparable results for H and h are easily deduced from Fig. A.2, and are:

$$\arg\max\left(\mu_H^D(x),\mu_h^D(x)\right) = \begin{cases} H & x(x)\ge x_2 \\ h & x(x)<x_2 \end{cases} \qquad (A.25)$$
Fig. A.2. MF of High (H) and its complement (h).
When L, l, H, and h are displayed on the same figure (e.g., Fig. A.3), it becomes evident that the universe of discourse of each variable is divided into three regions based on the relative locations of x_1 and x_2; hence, three situations can occur:

(a) x_1 < x_2: Based on the geometry depicted in Fig. A.3, it is straightforward to conclude that:

$$\arg\max\left(\mu_L^D(x),\mu_l^D(x)\right)\arg\max\left(\mu_H^D(x),\mu_h^D(x)\right) = \begin{cases} Lh & x(x)\le x_1 \\ lh & x_1<x(x)<x_2 \\ lH & x(x)\ge x_2 \end{cases} \qquad (A.26)$$

Observe that LH is not possible.

Fig. A.3. MFs assigned to variable v_i when x_1 < x_2.
(b) x_1 > x_2: Based on the geometry depicted in Fig. A.4, it is straightforward to conclude that:

$$\arg\max\left(\mu_L^D(x),\mu_l^D(x)\right)\arg\max\left(\mu_H^D(x),\mu_h^D(x)\right) = \begin{cases} Lh & x(x)\le x_2 \\ LH & x_2\le x(x)\le x_1 \\ lH & x(x)\ge x_1 \end{cases} \qquad (A.27)$$

Observe that lh is not possible.

Fig. A.4. MFs assigned to variable v_i when x_2 < x_1.
(c) x_1 = x_2: In this case (e.g., see Fig. A.3 or Fig. A.4) there is no middle line in (A.26) [or (A.27)], so that:

$$\arg\max\left(\mu_L^D(x),\mu_l^D(x)\right)\arg\max\left(\mu_H^D(x),\mu_h^D(x)\right) = \begin{cases} Lh & x(x)\le x_1 \\ lH & x(x)\ge x_1 \end{cases} \qquad (A.28)$$

Observe that neither LH nor lh is possible. In case (c), because L = h and l = H, (A.28) simplifies further to:

$$\arg\max\left(\mu_L(x),\mu_l(x)\right)\arg\max\left(\mu_H(x),\mu_h(x)\right) = \begin{cases} L & x(x)\le x_1 \\ H & x(x)\ge x_1 \end{cases} = \begin{cases} h & x(x)\le x_1 \\ l & x(x)\ge x_1 \end{cases} \qquad (A.29)$$
A.8.b Proof based on (3.6): From (3.6), recall that:

$$F_j = A_1^j\wedge A_2^j\wedge\cdots\wedge A_k^j \qquad (A.30)$$

For a causal combination to survive Step 6 of fsQCA,

$$\mu_{F_j}(x) = \min\left\{\mu_{A_1^j}(x),\,\mu_{A_2^j}(x),\ldots,\mu_{A_k^j}(x)\right\} > 0.5 \qquad (A.31)$$

If, however, any one of the μ_{A_i^j}(x) in (A.31) is always less than 0.5 for all x, then, regardless of the other k − 1 MF values, μ_{F_j}(x) can never pass the test in (A.31), in which case F_j is not a possible candidate causal combination.

In our situation, where a variable is modeled using two terms, say A_1^j and A_2^j, it is useful to re-express (A.31) as:

$$\mu_{F_j}(x) = \min\left\{\min\left[\mu_{A_1^j}(x),\,\mu_{A_2^j}(x)\right],\,\mu_{A_3^j}(x),\ldots,\mu_{A_k^j}(x)\right\} > 0.5 \qquad (A.32)$$

Now, if min[μ_{A_1^j}(x), μ_{A_2^j}(x)] < 0.5 for all x, then, regardless of the other k − 2 MF values, μ_{F_j}(x) can never pass the test in (A.32), in which case F_j is not a possible candidate causal combination.

By this argument, we direct our attention at min[μ_{A_1^j}(x), μ_{A_2^j}(x)], where A_1^j = L or l and A_2^j = H or h. MFs for L, l, H and h are depicted in Fig. A.3 for x_1 < x_2. Using these MFs, it is straightforward to show (just sketch the minimum values on the left-hand side of (A.33)) that:

$$\begin{cases} \min(L,h)>0.5 & x(x)\le x_1 \\ \min(l,h)>0.5 & x_1\le x(x)\le x_2 \\ \min(l,H)>0.5 & x(x)\ge x_2 \\ \min(L,H)<0.5 & \forall\, x(x)\in X \end{cases} \qquad (A.33)$$

Observe, from this equation, that Lh, lH and lh are possible, but LH is impossible, which is in agreement with item a in Theorem 3.4.

Similar proofs can be obtained for items b and c in Theorem 3.4, and are left to the reader.
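Item a of Theorem 3.4 can also be checked numerically. The sketch below uses logistic curves as stand-ins for the S-shaped MFs (our assumption; any MFs with μ = 0.5 crossings at x_1 < x_2 behave the same way) and confirms that min(μ_L, μ_H) never exceeds 0.5, while lh does occur between the crossings:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
x1, x2 = 0.35, 0.65                                   # crossover points, x1 < x2
mu_L = 1.0 / (1.0 + np.exp( 30.0 * (x - x1)))         # Low: decreasing, 0.5 at x1
mu_H = 1.0 / (1.0 + np.exp(-30.0 * (x - x2)))         # High: increasing, 0.5 at x2

assert np.all(np.minimum(mu_L, mu_H) < 0.5)           # LH can never pass the 0.5 test
assert np.minimum(1 - mu_L, 1 - mu_H).max() > 0.5     # lh does occur for x1 < x < x2
```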
A.9 Proof of Theorem 3.5
As in the proofs of Theorems 3.3 and 3.4, we begin with (3.17), which lets us establish the winning causal combination in Step 6NEW. It now simultaneously involves the MFs for L and l, M and m, and H and h for all variables; hence, we focus on argmax(μ_L^D(x), μ_l^D(x)) argmax(μ_M^D(x), μ_m^D(x)) argmax(μ_H^D(x), μ_h^D(x)), again dropping the causal-condition index i for notational simplicity, and shall determine if it is possible for this product to include lmh, lmH, lMh, lMH, Lmh, LmH, LMh, and LMH, in which case the eight possible candidate combinations would indeed be possible.

We will make use of (A.24) for L and l and (A.25) for H and h. The comparable results for M and m are easily deduced from Fig. A.5, and are:

$$\arg\max\left(\mu_M^D(x),\mu_m^D(x)\right) = \begin{cases} m & x(x)\le x_3 \\ M & x_3<x(x)<x_4 \\ m & x(x)\ge x_4 \end{cases} \qquad (A.34)$$

Fig. A.5. MF of Moderate (M) and its complement (m).
Table A.1 reveals there are 29 situations that must be considered. In order to illustrate how the Table A.1 results were obtained, we examine the 12th situation, when x_1 < x_2 and x_1 < x_3 < x_4 < x_2. One way to determine argmax(μ_L^D(x), μ_l^D(x)) argmax(μ_M^D(x), μ_m^D(x)) argmax(μ_H^D(x), μ_h^D(x)) is to use the forward recursion in (3.22) of Corollary 3-1-1. Another way is to use a geometric approach, as we have done for the case of two terms per variable. We take the latter approach.

Figure A.6 depicts the six MFs when x_1 < x_3 < x_4 < x_2. Observe that there are five regions (between the vertical lines) in which argmax(μ_L^D(x), μ_l^D(x)) argmax(μ_M^D(x), μ_m^D(x)) argmax(μ_H^D(x), μ_h^D(x)) may be different. It follows, from an examination of Fig. A.6 in each of the five regions, that^1:

$$\arg\max\left(\mu_L^D,\mu_l^D\right)\arg\max\left(\mu_M^D,\mu_m^D\right)\arg\max\left(\mu_H^D,\mu_h^D\right) = \begin{cases} Lmh & x(x)<x_1 \\ lmh & x_1<x(x)<x_3 \\ lMh & x_3<x(x)<x_4 \\ lmh & x_4<x(x)<x_2 \\ lmH & x(x)>x_2 \end{cases} \qquad (A.35)$$

Observe that rows two and four are the same.

Fig. A.6. MFs assigned to variable v_i when x_1 < x_3 < x_4 < x_2.

1 Observe, also, that in each region the winning causal combination has a MF value that is > 0.5.
We leave it to the reader to work out the results for the remaining 28 situations. This is very straightforward to do, albeit tedious, because there are so many situations.
Finally, by examining the 29 situations in Table A.1, it is possible to combine some of them, thereby obtaining the results that are in Table 3.8, for 23 situations.
Table A.1
Possible candidate causal combinations for three terms per variable^a and 29 situations

Situation                      lmh  lmH  lMh  lMH  Lmh  LmH  LMh  LMH   Number possible
x_1 < x_2:
  x_3 < x_4 < x_1 < x_2         P    P    .    .    P    .    P    .     4
  x_3 < x_1 < x_4 < x_2         P    P    P    .    P    .    P    .     5
  x_3 < x_1 = x_4 < x_2         P    P    .    .    P    .    P    .     4
  x_3 < x_1 < x_4 = x_2         .    P    P    .    P    .    P    .     4
  x_3 = x_1 < x_4 < x_2         P    P    P    .    P    .    .    .     4
  x_3 = x_1 < x_4 = x_2         .    P    P    .    P    .    .    .     3
  x_3 < x_1 < x_2 < x_4         .    P    P    P    P    .    P    .     5
  x_3 = x_1 < x_2 < x_4         .    P    P    P    P    .    .    .     4
  x_1 < x_3 < x_2 < x_4         P    P    P    P    P    .    .    .     5
  x_1 < x_3 < x_2 = x_4         P    P    P    .    P    .    .    .     4
  x_1 < x_3 = x_2 < x_4         P    P    .    P    P    .    .    .     4
  x_1 < x_3 < x_4 < x_2         P    P    P    .    P    .    .    .     4
  x_1 < x_2 < x_3 < x_4         P    P    .    P    P    .    .    .     4
x_2 < x_1:
  x_3 < x_4 < x_2 < x_1         .    P    .    .    P    P    P    .     4
  x_3 < x_2 < x_4 < x_1         .    P    .    .    P    P    P    P     5
  x_3 < x_2 = x_4 < x_1         .    P    .    .    P    P    P    .     4
  x_3 < x_2 < x_4 = x_1         .    P    .    .    P    .    P    P     4
  x_3 = x_2 < x_4 < x_1         .    P    .    .    P    P    .    P     4
  x_3 = x_2 < x_4 = x_1         .    P    .    .    P    .    .    P     3
  x_3 < x_2 < x_1 < x_4         .    P    .    P    P    .    P    P     5
  x_3 = x_2 < x_1 < x_4         .    P    .    P    P    .    .    P     4
  x_2 < x_3 < x_1 < x_4         .    P    .    P    P    P    .    P     5
  x_2 < x_3 < x_1 = x_4         .    P    .    .    P    P    .    P     4
  x_2 < x_3 = x_1 < x_4         .    P    .    P    P    P    .    .     4
  x_2 < x_3 < x_4 < x_1         .    P    .    .    P    P    .    P     4
  x_2 < x_1 < x_3 < x_4         .    P    .    P    P    P    .    .     4
x_1 = x_2:
  x_3 < x_4 <= x_1              .    P    .    .    P    .    P    .     3
  x_3 < x_1 < x_4               .    P    .    P    P    .    P    .     4
  x_1 <= x_3 < x_4              .    P    .    P    P    .    .    .     3

a P denotes that the causal combination is possible.
Appendix B. Prime Implicants and Minimal Prime Implicants
The actual rules (primitive Boolean expressions) can be simplified in two different ways, leading to two sets of sufficient conditions: the R_C prime implicants and the R_P minimal prime implicants. Prime implicants are obtained from the primitive Boolean expressions by using Boolean-algebra reduction techniques (that are equivalent to set-theoretic reduction techniques) that simplify (reduce) those expressions until no further simplifications are possible. The following are examples of reduction techniques that are used frequently: ABC ∨ ABC = ABC, A ∨ a = 1, and ABC ∨ AB = AB. The latter, known as the absorption rule, is true because:

$$ABC \vee AB = ABC \vee AB(C\vee c) = ABC \vee ABC \vee ABc = ABC \vee ABc = AB(C\vee c) = AB \qquad (B.1)$$

Sometimes it is possible to perform these reductions by hand; however, when there are many causal conditions and combinations it is very tedious (and next to impossible) to do this by hand. The Quine-McCluskey (QM) minimization method can be used to obtain the prime implicants automatically. This requires setting the causal combinations in S_F^A as present and the causal combinations in both S_F − S_F^S and S_F^S − S_F^A as absent (see Fig. 2.3).
To implement the QM algorithm we used free software called "Logic Friday" that is available at: http://sontrak.com/.
Many times there are too many prime implicants, i.e., they are not all needed in order to cover the primitive Boolean expressions in S_F^A. A second running of the QM algorithm, in which the subsets S_F − S_F^S and S_F^A are combined by the union operation and are then simplified (reduced), produces the minimal prime implicants. This requires setting the causal combinations in S_F^A as present, the causal combinations in S_F^S − S_F^A as absent, and the causal combinations in S_F − S_F^S as don't care. In other words, remainders are set to be present for the desired outcome if and only if they result in simplifications of the primitive Boolean expressions; otherwise, they are treated as absent.
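The thesis used Logic Friday; the same two QM runs can also be illustrated with sympy's SOPform, whose third argument is the don't-care set. The tiny example below (our own, not from the thesis) shows a remainder enabling a shorter, parsimonious implicant:

```python
from sympy import symbols
from sympy.logic import SOPform

A, B, C = symbols('A B C')
actual    = [[1, 1, 0]]   # the lone actual combination, ABc
remainder = [[1, 0, 0]]   # a remainder, Abc, with no empirical instances

print(SOPform([A, B, C], actual))             # A & B & ~C : without don't cares
print(SOPform([A, B, C], actual, remainder))  # A & ~C     : remainder used as don't care
```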
When the set of remainders S_F − S_F^S is vacuous (i.e., it contains no elements), a situation that may occur, then there will be no minimal prime implicants, and consequently no parsimonious solutions.
Appendix C. Rules of Counterfactual Analysis (CA)
No rules for CA appear in Ragin’s publications. After applying CA to many examples, and
interacting with Prof. Ragin many times about CA, we arrived at the following seven rules for
CA:
Rule 1: If the complex solution term does not contain the parsimonious solution term, then
CA is not performed for that combination of parsimonious and complex solution terms;
otherwise, CA can be performed.
Rule 2: The complex solution term acts as its own initial counterfactual.
Rule 3: If the substantive knowledge is silent about a causal condition, whether or not it appears in a counterfactual (e.g., K(C_j) = unknown), then no change is made to the counterfactual.
Rule 4: If the substantive knowledge about a causal condition is identical to the way that the causal condition appears in the counterfactual, then no change is made to the counterfactual.
Rule 5: If there is substantive knowledge about a causal condition (e.g., K(C_j) = C_j or K(C_l) = c_l), but that causal condition does not appear explicitly in the counterfactual, then the counterfactual remains unchanged (due to absorption, e.g., ABC ∨ ABCD = ABC). In effect, such substantive knowledge can be bypassed during the CA for that complex solution term.
Rule 6: If the substantive knowledge about a causal condition is the complement of the way that the causal condition appears in the counterfactual, and this causal condition does not appear explicitly in the parsimonious term for which the counterfactual has been obtained, then the new counterfactual no longer contains that causal condition [due to, e.g., A ∨ a = I, so, for example, ABC ∨ ABc = AB(C ∨ c) = AB].
Rule 7: If the substantive knowledge about a causal condition is the complement of the way that the causal condition appears in the counterfactual, and this causal condition also appears explicitly in the parsimonious term for which the counterfactual has been obtained, then the substantive knowledge about this causal condition is rejected and the counterfactual remains unchanged. In essence, such knowledge is treated as a contradiction of the way that the causal condition appears in that parsimonious term.
In our approach to CA, set theory simplifications are performed after a counterfactual is
obtained for each causal condition. The result is a transformation from one counterfactual to the
next until all k causal conditions have been considered, the last counterfactual being called the
simplified counterfactual for the complex term. This is repeated for all of the complex terms and
parsimonious terms after which set theory simplifications are applied to all of the simplified
counterfactuals to obtain the intermediate solutions.
Example C-1: In this example, we focus on only one of the complex terms as well as two parsimonious terms. Note that, because there are two parsimonious terms, there must actually be at least two complex terms; however, to keep this example short we are only focusing on one of them.

The complex term is ABcDF; the parsimonious terms are Bc and E; and the substantive knowledge is $K(C) = \{a, b, C, \text{unknown}, e, F\}$.

Cycle 1 for parsimonious term Bc: Bc is contained in ABcDF and so we can perform CA for ABcDF (Rule 1). The initial counterfactual is ABcDF (Rule 2). This counterfactual flows to the final simplified counterfactual as follows: $ABcDF \xrightarrow{CA(A)} BcDF$ (Rule 6), $BcDF \xrightarrow{CA(B)} BcDF$ (Rule 7), $BcDF \xrightarrow{CA(C)} BcDF$ (Rule 7), $BcDF \xrightarrow{CA(D)} BcDF$ (Rule 3), $BcDF \xrightarrow{CA(E)} BcDF$ (Rule 5), and $BcDF \xrightarrow{CA(F)} BcDF$ (Rule 4). BcDF is the simplified counterfactual for the complex term ABcDF.

Cycle 2 for parsimonious term E: E is not in ABcDF; hence, CA is not performed for this term (Rule 1).
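A compact way to see how the seven rules interact is the following Python sketch, which reproduces Example C-1. The encoding of a term as a dictionary (condition name mapped to True for uppercase/present, False for lowercase/absent) and of knowledge (True/False/None for unknown) is our own illustrative choice.

    def counterfactual(complex_term, parsimonious, knowledge):
        # Rule 1: CA applies only if the parsimonious term is contained in the complex term.
        if any(complex_term.get(c) != v for c, v in parsimonious.items()):
            return None
        cf = dict(complex_term)          # Rule 2: the term is its own initial counterfactual
        for c, k in knowledge.items():
            if k is None or c not in cf: # Rules 3 and 5: unknown knowledge, or condition absorbed away
                continue
            if cf[c] == k:               # Rule 4: knowledge agrees with the counterfactual
                continue
            if c in parsimonious:        # Rule 7: contradiction with the parsimonious term; reject
                continue
            del cf[c]                    # Rule 6: drop the condition (X + x = 1)
        return cf

    term = {'A': True, 'B': True, 'C': False, 'D': True, 'F': True}    # ABcDF
    know = {'A': False, 'B': False, 'C': True, 'D': None, 'E': False, 'F': True}
    print(counterfactual(term, {'B': True, 'C': False}, know))  # Cycle 1 -> BcDF
    print(counterfactual(term, {'E': True}, know))              # Cycle 2 -> None (Rule 1)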
Appendix D. Geometry of Consistency and Best Instances
In order to understand how Ragin determines the best instances for each believable simplified intermediate solution, it is important to first discuss the geometry of consistency. The geometry of consistency lets us reconnect the $R_{BSI}$ believable simplified intermediate solutions to the fuzzy natures of the desired outcome and the causal conditions that appear in each of these solutions.
Fig. D.1. Consistency regions. Regions A and B are where maximum consistency can occur.
Figure D.1 is modeled after Fig. 3.1 in [23]. The 45-degree line in Fig. D.1 is very important and derives from the consistency formula (2.15): because maximum consistency is 1, it can only occur when $\min(m_{F_l^S}(x), m_{O^D}(x)) = m_{F_l^S}(x)$ for $x = 1, 2, \ldots, N$, which will be true only if $m_{O^D}(x) \ge m_{F_l^S}(x)$ for $x = 1, 2, \ldots, N$. So, if $m_{O^D}(x)$ lies on or above the 45-degree line $m_{O^D}(x) = m_{F_l^S}(x)$ for $x = 1, 2, \ldots, N$ (anywhere in the region $A \cup B$), then $ss_K(F_l^S, O^D) = 1$. This leads to:

Geometrical Fact #1: For maximum consistency the pairs $(m_{F_l^S}(x), m_{O^D}(x))$ ($x = 1, \ldots, N$) must be above the 45-degree line, i.e., they must be in the upper triangle $A \cup B$ on the plot of $m_{O^D}(x)$ versus $m_{F_l^S}(x)$.
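For reference, here is a minimal sketch of that consistency computation, assuming (2.15) is the Kosko subsethood measure $ss_K(F_l^S, O^D) = \sum_x \min(m_{F_l^S}(x), m_{O^D}(x)) / \sum_x m_{F_l^S}(x)$; the MF values are hypothetical.

    def consistency(m_f, m_o):
        # Kosko subsethood of the firing levels m_f in the outcome MF m_o.
        return sum(min(f, o) for f, o in zip(m_f, m_o)) / sum(m_f)

    m_f = [0.7, 0.6, 0.9]          # firing levels of one solution term
    m_o = [0.8, 0.9, 1.0]          # desired-outcome MF values
    print(consistency(m_f, m_o))   # 1.0: every case lies on or above the 45-degree line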
Next we explain why, if all cases hypothetically were to lie in $A \cup B$, it is impossible for all of them to lie in Region A, but it is possible for all cases to lie in Region B.
[Fig. D.1 plots $m_{O^D}(x)$ versus $m_{F_l^S}(x)$ on the unit square; the 45-degree line $m_{O^D}(x) = m_{F_l^S}(x)$ bounds the upper triangle, which splits at $m_{F_l^S}(x) = 0.5$ into Region A (left) and Region B (right, labeled "Desirable region").]
1. The MF of each of the $R_A$ actual causal combinations is > 0.50 for at least one case. This is because only those $R_S$ causal combinations whose firing levels are > 0.50 for at least one case make it to the consistency test, and that test reduces the number of causal combinations from $R_S$ to $R_A$. Each of the $R_A$ causal combinations still has a firing level that is > 0.50 for at least one case.
2. During QM and CA, causal conditions are only removed from a causal combination; hence, the number of causal conditions in a causal combination can never increase as a result of QM and CA. For QM, this is obvious from the fact that, in obtaining both the prime and minimal prime implicants, set theory (e.g., absorption) is used to combine terms, meaning that the number of causal conditions in a term after QM is never larger than the number of causal conditions in a term before QM. For CA, statement 2 is also clearly true from the fact that an intermediate solution has a number of causal conditions between the number in the complex and parsimonious solutions.
3. If $m_{F_i^*}(x \mid C_1, C_2, \ldots, C_{k_1})$ has been computed for $k_1$ causal conditions $C_1, C_2, \ldots, C_{k_1}$, and one now considers $k_2$ causal conditions $C_1, C_2, \ldots, C_{k_2}$, where $k_2 < k_1$, then (for all x)

$m_{F_i^*}(x \mid C_1, C_2, \ldots, C_{k_2}) \ge m_{F_i^*}(x \mid C_1, C_2, \ldots, C_{k_1})$   (D-1)

This is obvious from the conjunctive nature of a causal combination when minimum is used for the conjunction operations, and it leads to:
4. When causal conditions are removed from an existing causal combination, the firing level for the resulting causal combination can never be smaller than the prior firing level, i.e., firing levels tend to become strengthened when fewer causal conditions are included in a causal combination. Just read (D-1) from right to left.
5. Geometrical Fact #2: There must be at least one case for which $m_{F_l^{BSI}}(x) > 0.5$, which means it is not possible for all cases to be in Region A in Fig. D.1. Prior to QM and CA we began with $R_A$ causal combinations which, from Item 1, had MF values greater than 0.50 for at least one case. The $R_A$ causal combinations are reduced to $R_{SI}$ causal combinations by the QM-CA-QM calculations (Steps 8-10). From Items 2 and 4, it must therefore be true that $m_{F_l^{BSI}}(x) > 0.5$ for at least one case.
6. Geometrical Fact #3: A case can only be in Region B if $m_{F_l^{BSI}}(x) > 0.5$; hence, if all cases lie in $A \cup B$, this is only possible if all cases lie in Region B.

Item 6 does not mean that all cases will lie in Region B. It means that Region B is the most desirable region for a case to lie in to obtain the largest possible value for consistency, which is why it is labeled "Desirable region" in Fig. D.1. Ragin (in e-mail to the first author, on October 26, 2010) suggests that Region B should extend a bit below the 45-degree line to account for the subjectivity of MF values.
In general, the $R_{BSI}$ believable simplified intermediate solutions are connected by the word OR; in the procedure below the maximum operation is used to model this word. Ragin's procedure (rules) for determining the Best Instances is ($x = 1, \ldots, N$):

1. For Case x, compute $\max_{s=1,\ldots,R_{BSI}} m_{F_s^{BSI}}(x)$.

2. If $\max_{s=1,\ldots,R_{BSI}} m_{F_s^{BSI}}(x) \le 0.50$, then this case is not a Best Instance for any of the terms in the believable simplified intermediate solutions.

3. If $\max_{s=1,\ldots,R_{BSI}} m_{F_s^{BSI}}(x) > 0.50$, then let $s'(x) = \arg\max_{s=1,\ldots,R_{BSI}} m_{F_s^{BSI}}(x)$. It is possible that there can be more than one such value of $s'(x)$ because there can be ties.

4. For each such $s'(x)$, if $m_{O^D}(x) \ge m_{F_{s'(x)}^{BSI}}(x)$, then this case is a Best Instance for $F_{s'(x)}^{BSI}$. On the other hand, if $m_{O^D}(x) < m_{F_{s'(x)}^{BSI}}(x)$, then this case is not a Best Instance for any of the terms in the believable simplified intermediate solutions.
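The following sketch implements this four-step procedure, assuming mf[s][x] holds $m_{F_s^{BSI}}(x)$ for term s and case x, and out[x] holds $m_{O^D}(x)$; the numbers are hypothetical.

    def best_instances(mf, out):
        winners = {s: [] for s in range(len(mf))}
        for x in range(len(out)):
            peak = max(m[x] for m in mf)             # Step 1: OR modeled by max
            if peak <= 0.50:                         # Step 2: weak membership everywhere
                continue
            for s, m in enumerate(mf):               # Step 3: all arg-max terms (ties allowed)
                if m[x] == peak and out[x] >= m[x]:  # Step 4: case must lie above the 45-degree line
                    winners[s].append(x)
        return winners

    mf = [[0.8, 0.4, 0.6], [0.3, 0.7, 0.6]]   # two BSI terms, three cases
    out = [0.9, 0.6, 0.5]                      # desired-outcome MF per case
    print(best_instances(mf, out))             # {0: [0], 1: []}: only case 0 qualifies

Case 1 has strong membership in term 1 (0.7) but lies below the 45-degree line (outcome 0.6), so it is rejected in Step 4, exactly as the geometry of consistency requires.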
Appendix E. The HM Method
The material in this appendix is taken from [51], [52] and is included here for completeness.
E.1 Calibration for Data Collected From a Group of Subjects

Going from n data intervals $\{[a^{(i)}, b^{(i)}]\}_{i=1}^{n}$ for a word to an IT2 FS for the word using the HM Method is done in two parts, one that focuses on removing uninformative intervals (the Data Part) and the other that focuses on mapping the remaining set of intervals into the FOU (the Fuzzy Set Part).
E.1.1 Data Part

The Data Part uses statistics and probability, starting with the n intervals, to (see [50] for the details of each of these steps¹):

1) Remove bad data (i.e., intervals that fall outside of [l, r]; not everyone takes a survey seriously), $n \to n'$.

2) Remove outliers (using Box and Whisker tests [116]) in two steps, $n' \to n'' \to m'$.

3) Keep only the data intervals that are within an acceptable two-sided tolerance limit (using statistical tolerance limits [50]), in two steps, $m' \to m^{+} \to m''$.

4) Remove data intervals that have no overlap or too little overlap with other data intervals (using a statistical test that enforces the maxim [17] that words must mean similar things to different people for effective communication to occur), $m'' \to m$.

At the end of the Data Part, the original n data intervals have been reduced to a set of m (surviving) data intervals $\{[a^{(i)}, b^{(i)}]\}_{i=1}^{m}$, where $m \le n$.

¹ The processing steps in the Data Part for the HM method are the same as in the Data Part of the EIA.
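To make the first two stages concrete, here is a minimal sketch of the bad-data and outlier removal, assuming [l, r] = [0, 10] and the usual 1.5 x IQR Box-and-Whisker rule of [116]; the data and thresholds are illustrative, not the exact statistics of [50].

    import numpy as np

    def box_whisker_keep(values):
        # Keep values inside the 1.5 * IQR whiskers.
        q1, q3 = np.percentile(values, [25, 75])
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        return (values >= lo) & (values <= hi)

    def data_part_stage12(intervals, l=0.0, r=10.0):
        iv = np.asarray(intervals, dtype=float)
        # Step 1: bad data -- keep only intervals inside [l, r] with a < b.
        iv = iv[(iv[:, 0] >= l) & (iv[:, 1] <= r) & (iv[:, 0] < iv[:, 1])]
        # Step 2: outlier tests on left ends, right ends, then interval lengths.
        iv = iv[box_whisker_keep(iv[:, 0]) & box_whisker_keep(iv[:, 1])]
        iv = iv[box_whisker_keep(iv[:, 1] - iv[:, 0])]
        return iv

    data = [(2, 4), (1, 5), (3, 4.5), (-1, 4), (2, 9.9), (2.5, 4.2), (1.5, 3.8)]
    print(data_part_stage12(data))  # (-1, 4) fails the [l, r] test; (2, 9.9) falls to the outlier tests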
E.1.2 Fuzzy Set Part

The Fuzzy Set Part uses these m data intervals to (see [52] for the details of each of these steps):

5) Establish the nature of the FOU as either a Left-shoulder, Interior or Right-shoulder FOU (the data speak, i.e., we do not choose the nature of the FOU ahead of time) by using one-sided tolerance intervals for the end-points $[a^{(i)}, b^{(i)}]$ of the m data intervals, $a'$ for $\{a^{(i)}\}_{i=1}^{m}$ and $b'$ for $\{b^{(i)}\}_{i=1}^{m}$. The first (last) word [i = 1 (m)] is always modeled as a Left (Right)-shoulder FOU. For $i = 2, \ldots, m-1$: if $a' < 0$, the word is modeled as (Fig. E.1a) a Left-shoulder FOU; or, if $b' > r$, the word is modeled as (Fig. E.1c) a Right-shoulder FOU; otherwise, the word is modeled as (Fig. E.1b) an Interior FOU.
Fig. E.1. Three kinds of FOUs and their parameters: (a) Left-shoulder FOU, (b) Interior FOU, and
(c) Right-shoulder FOU. In these figures l = 0 and r = 1.
6) Compute the overlap $[o_l, o_r]$ of the m intervals. For a Left-shoulder FOU, $[o_l, o_r] = [0, \min_i b^{(i)}]$; for an Interior FOU, $[o_l, o_r] = [\max_i a^{(i)}, \min_i b^{(i)}]$; and, for a Right-shoulder FOU, $[o_l, o_r] = [\max_i a^{(i)}, r]$.

7) Remove the overlap $[o_l, o_r]$ from each of the original m intervals $\{[a^{(i)}, b^{(i)}]\}_{i=1}^{m}$. For a Left-shoulder FOU, this leaves one new set of smaller intervals, $\{[o_r, b^{(i)}]\}_{i=1}^{m}$; for a Right-shoulder FOU, this leaves one new set of smaller intervals, $\{[a^{(i)}, o_l]\}_{i=1}^{m}$; and, for an Interior FOU, this leaves two new sets of smaller intervals, $\{[a^{(i)}, o_l]\}_{i=1}^{m}$ and $\{[o_r, b^{(i)}]\}_{i=1}^{m}$.

8) Map the set(s) of smaller intervals into the parameters that define the respective FOU. For a Left- or Right-shoulder FOU (Figs. E.1a and E.1c) there are exactly two such parameters, $b_l$ and $b_r$, or $a_l$ and $a_r$; but, for an Interior FOU (Fig. E.1b) there are four such parameters, $a_l$, $a_r$, $b_l$ and $b_r$. For the Interior FOU, $\{[a^{(i)}, o_l]\}_{i=1}^{m}$ are mapped into $a_l$ and $a_r$, and $\{[o_r, b^{(i)}]\}_{i=1}^{m}$ are mapped into $b_l$ and $b_r$.
In Step 8, the mappings are done so that two measures of uncertainty about the m smaller-length data intervals are mapped into two comparable measures of uncertainty about the FOU. As an illustration of this, consider the left-hand portion of the FOU for an Interior FOU (Fig. E.1b), and the mapping of $\{[a^{(i)}, o_l]\}_{i=1}^{m}$ into $a_l$ and $a_r$. This mapping is obtained in four steps:

i. The centroid and the average centroid [67], [117] are computed as in (E.1) and (E.2) of [52].

ii. The standard deviation of the centroid [67], [117] is computed as in (E.3) of [52].

iii. The sample mean and standard deviation of the m intervals $\{[a^{(i)}, o_l]\}_{i=1}^{m}$ are computed, and are denoted $\hat{\mu}_{LH}$ and $\hat{\sigma}_{LH}$, respectively.

iv. $a_l$ and $a_r$ are solved for from the two equations in (E.4), which equate the uncertainty measures of Steps i and ii to those of Step iii. The standard deviation of (E.3) is used in (E.4) because it is larger than the average standard deviation, and because the average centroid contains $a_l + a_r$, so that it would be impossible to unravel both $a_l$ and $a_r$ by using both the average centroid and the average standard deviation.
The solutions to (E.4) are:

$a_l = \max\left(0,\ o_l - 3\sqrt{2}\,\hat{\sigma}_{LH}\right)$
$a_r = \min\left(o_l,\ 6\hat{\mu}_{LH} + 3\sqrt{2}\,\hat{\sigma}_{LH} - 3o_l\right)$   (E.5)
For $a_l$, the max guards against $o_l - 3\sqrt{2}\,\hat{\sigma}_{LH}$ being negative, and for $a_r$, the min guards against $6\hat{\mu}_{LH} + 3\sqrt{2}\,\hat{\sigma}_{LH} - 3o_l$ being larger than $o_l$.
Comparable results for the right-hand portion of the FOU for an Interior FOU (Fig. E.1b) are:

$b_l = \max\left(o_r,\ 6\hat{\mu}_{RH} - 3\sqrt{2}\,\hat{\sigma}_{RH} - 5o_r\right)$
$b_r = \min\left(r,\ o_r + 3\sqrt{2}\,\hat{\sigma}_{RH}\right)$   (E.6)

where $\hat{\mu}_{RH}$ and $\hat{\sigma}_{RH}$ are the sample mean and standard deviation, respectively, of $\{[o_r, b^{(i)}]\}_{i=1}^{m}$.

Compare Figs. E.1a and E.1b to observe that the Left-shoulder FOU is the same as the right-hand portion of the Interior FOU, so (E.6) is also used to determine the two parameters of a Left-shoulder FOU. Similarly, the Right-shoulder FOU is the same as the left-hand portion of the Interior FOU, so (E.5) is also used to determine the two parameters of a Right-shoulder FOU.
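The sketch below evaluates (E.5) and (E.6) for an Interior FOU. Because (E.1)-(E.4) are not reproduced here, it substitutes the mean and standard deviation of the interval midpoints for $\hat{\mu}_{LH}$, $\hat{\sigma}_{LH}$, $\hat{\mu}_{RH}$, $\hat{\sigma}_{RH}$ (one plausible reading of "the sample mean and standard deviation of the m intervals"; see [52] for the exact statistics). The overlap and data are hypothetical.

    import math
    import statistics as st

    S2 = 3 * math.sqrt(2)

    def interior_fou_params(a_ends, b_ends, o_l, o_r, r=1.0):
        # Assumption: interval statistics taken on midpoints of the shrunken intervals.
        mids_lh = [(a + o_l) / 2 for a in a_ends]
        mids_rh = [(o_r + b) / 2 for b in b_ends]
        mu_lh, sd_lh = st.mean(mids_lh), st.stdev(mids_lh)
        mu_rh, sd_rh = st.mean(mids_rh), st.stdev(mids_rh)
        a_l = max(0.0, o_l - S2 * sd_lh)                    # (E.5)
        a_r = min(o_l, 6 * mu_lh + S2 * sd_lh - 3 * o_l)
        b_l = max(o_r, 6 * mu_rh - S2 * sd_rh - 5 * o_r)    # (E.6)
        b_r = min(r, o_r + S2 * sd_rh)
        return a_l, a_r, b_l, b_r

    # Hypothetical m = 5 surviving intervals on [0, 1] with overlap [0.4, 0.6]:
    a_ends = [0.20, 0.25, 0.30, 0.15, 0.28]
    b_ends = [0.70, 0.75, 0.80, 0.72, 0.78]
    print(interior_fou_params(a_ends, b_ends, o_l=0.4, o_r=0.6))

Note how the max/min guards of (E.5) and (E.6) keep the FOU parameters inside [0, $o_l$] and [$o_r$, r], exactly as described above.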
E.2 Calibration for Data Collected From One Subject
Starting with the left-hand and right-hand end-point intervals, $[a_L, b_L]$ and $[a_R, b_R]$ [54]:

1) Assume that each of the end-point intervals is uniformly distributed, and then compute the mean and variance for both of them.

2) Assign the mean and variance of the left and right intervals from Step 1 to uniform probability distributions and generate 100 random numbers $(L_1, L_2, \ldots, L_{50}; R_1, R_2, \ldots, R_{50})$. Form 50 end-point pairs from these random numbers¹: $\{(L_1, R_1), \ldots, (L_{50}, R_{50})\}$.

3) Assume each pair of end-points has been collected from a different subject (or from the same subject who is sampled 50 times, where the spacing of the samples is long enough so that the subject does not remember his/her past responses), i.e., a virtual group of subjects.

4) Apply the HM method to the 50 intervals to obtain the (Person) FOU for the word. This FOU only accounts for a person's intra-uncertainty about the word.

Using the additional data that are provided by a single subject, this four-step procedure reduces to the HM method for collecting data from a group of subjects².
¹ We chose 50 intervals because the convergence results given in [54] have demonstrated that FOUs converge in a mean-square sense when around 30 or more intervals are used. Any number ≥ 30 should be adequate.

² Of course, the interval end-point data could also be obtained from a group of subjects, in which case the four-step procedure of this section could be applied to each subject's data, leading to a larger set of n starting data intervals than in Section 5.2. Our experience with extracting data from subjects by means of questionnaires is that they like simple questions. The Section 5.2 questions are simpler than the ones in Section 5.3, so we do not advocate doing what we have just suggested.
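A minimal sketch of the data generation in Steps 1-3, assuming hypothetical end-point intervals $[a_L, b_L] = [1, 3]$ and $[a_R, b_R] = [6, 9]$; the HM method of Section E.1 would then be applied to the 50 generated intervals.

    import random

    def virtual_intervals(a_l, b_l, a_r, b_r, n=50, seed=1):
        rng = random.Random(seed)
        pairs = []
        while len(pairs) < n:
            L = rng.uniform(a_l, b_l)   # Steps 1-2: uniform end-point distributions
            R = rng.uniform(a_r, b_r)
            if L < R:                   # keep only well-formed intervals
                pairs.append((L, R))
        return pairs                    # Step 3: a virtual group of 50 "subjects"

    print(virtual_intervals(1, 3, 6, 9)[:3])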
Appendix F. Uncertainty Measures for IT2 FSs
In this appendix we provide background information about: (1) the centroid of an IT2 FS, (2) the center of gravity of the centroid, and (3) the maximum dispersion of the centroid.
F.1 Centroid of an IT2 FS
It is always possible to cover the FOU of an IT2 FS, $\tilde{A}$, by a collection of T1 FSs (Fig. F.1) that are called embedded T1 FSs, so that $\tilde{A}$ can be expressed as the set-theoretic union of those T1 FSs [17], [118]. Conceptually, it is then possible to compute the center of gravity (COG) for each of the embedded T1 FSs. Regardless of how many embedded T1 FSs it takes to cover $FOU(\tilde{A})$¹, the collection of their COGs has both a smallest value, $c_l(\tilde{A})$, and a largest value, $c_r(\tilde{A})$. The collection of these COGs is called the centroid of $\tilde{A}$, $C_{\tilde{A}}$, where²

$C_{\tilde{A}} = [c_l(\tilde{A}), c_r(\tilde{A})]$   (F.1)

$C_{\tilde{A}}$ is a measure of the uncertainty in $\tilde{A}$. The larger (smaller) the length of $C_{\tilde{A}}$, the larger (smaller) is the uncertainty in $\tilde{A}$. Geometrically, larger (smaller) uncertainty in $\tilde{A}$ is manifested by fatter (thinner) FOUs. In the limiting case, when both the lower and upper MFs of $\tilde{A}$ become the same T1 MF, $C_{\tilde{A}}$ reduces to one number, the COG, $c_{\tilde{A}}$, of that T1 MF.
Fig. F.1. Covering an FOU by T1 FSs. Each T1 FS (called an embedded T1 FS) contains one dashed line from the left-portion of the FOU, the common flat top, and one dashed line from the right-portion of the FOU. The lower and upper MFs of $\tilde{A}$ are also embedded T1 FSs. $\tilde{A}$ is the union of all of the embedded T1 FSs. In general, embedded T1 FSs do not have to be straight lines.
¹ For continuous universes of discourse, this number is uncountably infinite; however, for sampled continuous universes of discourse (as is the case when a digital computer is used to solve problems involving IT2 FSs), there is a large but countable number of such T1 FSs.

² Strictly speaking, the centroid is also a fuzzy set; however, because its membership grade is 1 for all of its elements (due to the secondary membership grades of an IT2 FS all being equal to 1), it is customary not to show its membership grades.
Generally speaking, closed-form formulas do not exist for computing $c_l(\tilde{A})$ and $c_r(\tilde{A})$, even for the seemingly simple FOU in Fig. F.1. Fortunately, there are many simple algorithms for computing $c_l(\tilde{A})$ and $c_r(\tilde{A})$, and software is available on-line for doing this (e.g., Enhanced KM Algorithms, http://sipi.usc.edu/mendel/software/, https://sites.google.com/site/drwu09/publications/completepubs), but none is needed for reading the rest of this appendix.

For linguistic terms, whose FOUs are located in a natural ordering on the variable's numerical axis, the centroid is also located in a natural ordering on the variable's numerical axis. This is shown in Fig. F.2 for the three linguistic terms W1–W3.
Fig. F.2. Mapping each word's FOU into an interval of numbers, its centroid $C_{W_i}$, or a single number, its COG $c_{W_i}$, the dot in the middle of the centroid.
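To show what such an algorithm looks like, here is a sketch of the Karnik-Mendel (KM) iterations for $[c_l, c_r]$ on a sampled domain; the triangular FOU is hypothetical, and this is our illustration rather than the on-line software referenced above.

    import numpy as np

    def km_centroid(x, lmf, umf):
        def one_side(left_side):
            theta = (lmf + umf) / 2
            c = np.sum(x * theta) / np.sum(theta)
            while True:
                # c_l: upper MF left of the switch point, lower MF right of it;
                # c_r: the reverse.
                theta = np.where(x <= c, umf if left_side else lmf,
                                          lmf if left_side else umf)
                c_new = np.sum(x * theta) / np.sum(theta)
                if np.isclose(c_new, c):
                    return c_new
                c = c_new
        return one_side(True), one_side(False)

    x = np.linspace(0, 1, 101)
    umf = np.maximum(0, 1 - np.abs(x - 0.5) / 0.4)         # upper MF: triangle
    lmf = 0.6 * np.maximum(0, 1 - np.abs(x - 0.5) / 0.25)  # lower MF: thinner triangle
    print(km_centroid(x, lmf, umf))   # c_l < 0.5 < c_r for this symmetric FOU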
F.2 COG of Centroid of an IT2 FS
The COG, $c_{\tilde{A}}$, of $C_{\tilde{A}}$ is a number that expresses the average uncertainty about each linguistic term. We shall refer to $c_{\tilde{A}}$ as the COG of the word. COGs are also located in a natural ordering on the variable's numerical axis, as is indicated in Fig. F.2. The formula for $c_{\tilde{A}}$ is:

$c_{\tilde{A}} = \frac{1}{2}\left[c_l(\tilde{A}) + c_r(\tilde{A})\right]$   (F.2)

It lies along the x-axis.
F.3 Maximum Dispersion of the Centroid
Just as the standard deviation provides a useful measure of dispersion about the mean in probability, it also provides a useful measure of dispersion about the centroid end-points for an IT2 FS. Wu and Mendel [67] show that the variance of an IT2 FS $\tilde{A}$, $V_{\tilde{A}}$, is the union of the relative variances, $v_{\tilde{A}}(A_e)$, of all its embedded T1 FSs $A_e$, i.e.,

$V_{\tilde{A}} = \bigcup_{\forall A_e} v_{\tilde{A}}(A_e) = \left[v_l(\tilde{A}), v_r(\tilde{A})\right]$   (F.3)

where (for sampled values of x)

$v_{\tilde{A}}(A_e) = \frac{\sum_{i=1}^{N} \left[x_i - c_{\tilde{A}}\right]^2 \mu_{A_e}(x_i)}{\sum_{i=1}^{N} \mu_{A_e}(x_i)}$   (F.4)

in which $c_{\tilde{A}}$ is computed by (F.2). How to compute $V_{\tilde{A}}$ (by using EKM algorithms) is explained in [67].

The standard deviation of $\tilde{A}$ is

$\left[\sqrt{v_l(\tilde{A})}, \sqrt{v_r(\tilde{A})}\right]$   (F.5)

One way that it has been used [67] is to combine it with $c_{\tilde{A}}$ to provide left and right end-point intervals of dispersion about $C_{\tilde{A}}$, as

$\left[c_{\tilde{A}} - \sqrt{v_l(\tilde{A})},\ c_{\tilde{A}} + \sqrt{v_l(\tilde{A})}\right]$ and $\left[c_{\tilde{A}} - \sqrt{v_r(\tilde{A})},\ c_{\tilde{A}} + \sqrt{v_r(\tilde{A})}\right]$   (F.6)

We refer to the outer interval in (F.6) as $D_{\tilde{A}}^{max}$, where

$D_{\tilde{A}}^{max} = \left[c_{\tilde{A}} - \sqrt{v_r(\tilde{A})},\ c_{\tilde{A}} + \sqrt{v_r(\tilde{A})}\right]$   (F.7)

It is an interval of real numbers that lies along the x-axis and is of length $2\sqrt{v_r(\tilde{A})}$.
A figure showing $D_{W_i}^{max}$ for the three FOUs depicted in Fig. F.2 would look very similar to the centroid intervals that are shown on that figure. Each of the $D_{W_i}^{max}$ intervals would also be centered around the COG of each centroid, but the $D_{W_i}^{max}$ interval would be of length $2\sqrt{v_r(W_i)}$, whereas $C_{W_i}$ is of length $c_r(W_i) - c_l(W_i)$.
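As a small numerical companion, the sketch below evaluates the relative variance for the two extreme embedded T1 FSs (the lower and upper MFs) of the triangular FOU used in the centroid sketch of Section F.1. It assumes (F.4) is the MF-weighted squared deviation about $c_{\tilde{A}}$ and does not perform the EKM search of [67] that yields $v_l(\tilde{A})$ and $v_r(\tilde{A})$.

    import numpy as np

    def relative_variance(x, mu_e, c):
        # Weighted squared deviation about the COG c, per (F.4).
        return np.sum((x - c) ** 2 * mu_e) / np.sum(mu_e)

    x = np.linspace(0, 1, 101)
    umf = np.maximum(0, 1 - np.abs(x - 0.5) / 0.4)
    lmf = 0.6 * np.maximum(0, 1 - np.abs(x - 0.5) / 0.25)
    print(relative_variance(x, umf, 0.5), relative_variance(x, lmf, 0.5))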
Appendix G. QPSO Algorithm
This appendix provides a brief summary of QPSO [101], [102], [104], [119], [120]. Because of our choice of the MF depicted in Fig. 8.4, each particle m ($m = 1, \ldots, M$) in QPSO contains two MF parameters per variable, $(a_{m,i}, b_{m,i})$, for all p variables. The generalization of our discussions below to MFs that are described by more than two parameters is straightforward.

The current position (vector) of the m-th particle at generation g is defined as:

$q_m(g) = (q_{m,1}(g), q_{m,2}(g), \ldots, q_{m,2p}(g))$   (G.1)

A particle's best position (pbest, i.e., the position that produces the minimal value of J over the entire history of that particle), $p_m = (p_{m,1}, p_{m,2}, \ldots, p_{m,2p})$, is computed as ($w = 1, \ldots, 2p$):

$p_{m,w}(g+1) = \eta\, p_{m,w}(g) + (1 - \eta)\, p_{gbest,w}(g)$   (G.2)

where $g = 1, \ldots, G-1$ is the index of a generation (iteration) and $\eta$ is a random variable uniformly distributed in (0,1]; and $p_{gbest}(g)$ [whose components are $p_{gbest,w}(g)$] denotes the global best position (gbest) found in the history of the entire swarm, i.e. ($m = 1, \ldots, M$):

$p_{gbest}(g) = \arg\min_{p_m(g),\ m=1,\ldots,M} J\left(p_m(g)\right)$   (G.3)

A global point, the mean best position of the population, is introduced into QPSO; it is denoted $\bar{m}$, and is defined as the sample mean of the $p_m(g)$ positions of all M particles, i.e.

$\bar{m}(g) = \frac{1}{M}\sum_{m=1}^{M} p_m(g)$   (G.4)

At the end of each generation, a new position of a particle is obtained as ($w = 1, \ldots, 2p$):

$q_{m,w}(g+1) = p_{m,w}(g+1) \pm \kappa \left|\bar{m}_w(g) - q_{m,w}(g)\right| \ln(1/u)$   (G.5)

where the parameter $\kappa$, called the contraction-expansion coefficient, can be tuned to control the convergence speed of the algorithm, and u is also a random variable uniformly distributed in (0,1]. In (G.5), the plus and minus signs are randomly selected to generate the new position of a particle.

Note that QPSO finds the optimized MFs based on the following criterion:
(G.6)
Pseudo-code for QPSO is given in Table G.1.
Table G.1
Pseudo-Code for QPSO

Initialize $q_m(0)$ randomly ($m = 1, \ldots, M$)
Set $p_m(0) = q_m(0)$
For g = 1 to G-1
    Calculate $\bar{m}(g) = \frac{1}{M}\sum_{m=1}^{M} p_m(g)$
    Calculate $J(p_m(g))$ ($m = 1, \ldots, M$)
    $p_{gbest}(g) = \arg\min_{p_m(g),\ m=1,\ldots,M} J(p_m(g))$
    For m = 1 to M (number of particles)
        Calculate $F_{trn}$ using (30)
        Calculate $J(q_m(g))$ using (32)
        If $J(q_m(g)) < J(p_m(g))$, set $p_m(g) = q_m(g)$
        End If
        For w = 1 to 2p (number of components in each particle)
            $\eta$ = rand(0,1)
            $p_{m,w}(g+1) = \eta\, p_{m,w}(g) + (1-\eta)\, p_{gbest,w}(g)$
            u = rand(0,1)
            If rand(0,1) > 0.5
                $q_{m,w}(g+1) = p_{m,w}(g+1) - \kappa\,|\bar{m}_w(g) - q_{m,w}(g)|\,\ln(1/u)$
            Else
                $q_{m,w}(g+1) = p_{m,w}(g+1) + \kappa\,|\bar{m}_w(g) - q_{m,w}(g)|\,\ln(1/u)$
            End If
        End For
    End For
End For
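The following runnable sketch implements the updates (G.2)-(G.5), assuming a generic objective J (here the sphere function) in place of the thesis's MF-tuning criterion (G.6); kappa, M, G and the dimension are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def qpso(J, dim=4, M=20, G=200, kappa=0.75):
        q = rng.uniform(-5, 5, (M, dim))   # particle positions, (G.1)
        p = q.copy()                       # personal bests (pbest)
        for g in range(G):
            Jp = np.apply_along_axis(J, 1, p)
            gbest = p[np.argmin(Jp)]                        # (G.3)
            mbar = p.mean(axis=0)                           # mean best position, (G.4)
            eta = rng.uniform(0, 1, (M, dim))
            attractor = eta * p + (1 - eta) * gbest         # (G.2)
            u = rng.uniform(1e-12, 1, (M, dim))
            sign = np.where(rng.uniform(0, 1, (M, dim)) > 0.5, -1.0, 1.0)
            q = attractor + sign * kappa * np.abs(mbar - q) * np.log(1 / u)  # (G.5)
            better = np.apply_along_axis(J, 1, q) < Jp      # keep improved pbests
            p[better] = q[better]
        return gbest

    print(qpso(lambda v: np.sum(v ** 2)))   # should approach the zero vector

Randomizing the sign in (G.5) is what gives QPSO its quantum-inspired sampling around the local attractor, in contrast to the velocity update of classical PSO.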
Appendix H. Publications
1. J. M. Mendel and M. Korjani, “A New Methodology for Calibrating Fuzzy Sets in fsQCA
Using Level 2 and Interval Type-2 Fuzzy Sets,” Submitted to Information Sciences, 2015
2. J. M. Mendel and M. Korjani, “On establishing nonlinear combinations of variables from
small to big data for use in later processing,” Information Sciences, vol. 280, pp. 98-110,
2014.
3. J. M. Mendel and M. M. Korjani, “Charles Ragin's Fuzzy Set Qualitative Comparative
Analysis (fsQCA) used for linguistic summarizations,” Information Sciences, vol. 202, pp. 1-
23, 2012.
4. J. M. Mendel and M. M. Korjani, “Theoretical Aspects of Fuzzy Set Qualitative Comparative Analysis (fsQCA),” Information Sciences, vol. 237, pp. 137-161, 2013.
5. M. M. Korjani and J. M. Mendel, “Interval type-2 fuzzy set qualitative comparative analysis (IT2-fsQCA),” in Proc. NAFIPS 2014, Boston, MA, June 2014.
6. M. M. Korjani and J. M. Mendel, “Fuzzy Set Qualitative Comparative Analysis (fsQCA):
challenges and applications,” Proc. NAFIPS 2012, Berkeley, CA, August 2012 (Won Best
paper award).
7. M. M. Korjani and J. M. Mendel, “Validation of fuzzy set qualitative comparative analysis
(fsQCA) by means of a granular description of a function,” NAFIPS 2012.
8. J. M. Mendel and M. M. Korjani, “Fast fuzzy set qualitative comparative analysis,” Proc.
NAFIPS 2012, Berkeley, CA, August 2012.
9. M. M. Korjani and J. M. Mendel, “Fuzzy Love Selection by Means of Perceptual
Computing,” IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 766-770,
2013.
10. M. M. Korjani and J. M. Mendel, “Non-linear Variable Structure Regression (VSR) and its
Application in Time-Series Forecasting,” IEEE International Conference on Fuzzy Systems,
2014.
11. M. M. Korjani, J. M. Mendel and I. Ershaghi, “A Predictive Model for Improving the
Efficiency of Frac Jobs,” SPE Western Regional Meeting 2015.
Appendix I. Patent
1. M. M. Korjani, J. Mendel, and F. Liu, “A Predictive model of tight oil reservoir,” US
70205.0493US01, Submitted Dec. 2014.
References
[1] R. R. Yager, “Database discovery using fuzzy sets,” Int. J. Intell. Syst., vol. 11, no. 9, pp. 691–712, 1998.
[2] R. R. Yager, “A new approach to the summarization of data,” Inf. Sci. (Ny)., vol. 28, no. 1, pp. 69–86, 1982.
[3] J. T. Rickard, J. Aisbett, R. R. Yager, and G. Gibbon, “Linguistic Weighted Power Means,” pp. 2185–2192,
2011.
[4] J. Kacprezyk and R. R. Yager, “Linguistic summaries of data using fuzzy logic,” Int. J. Gen. Syst., vol. 30,
no. 2, pp. 133–154, Jan. 2001.
[5] J. Kacprzyk, A. Wilbik, and S. Zadrozny, “Linguistic summarization of trends: a fuzzy logic based
approach,” pp. 2166–2172, 2006.
[6] J. Kacprzyk and S. Zadrozny, “Linguistic database summaries and their protoforms: towards natural
language based knowledge discovery tools,” Inf. Sci. (Ny)., vol. 173, no. 4, pp. 281–304, Jun. 2005.
[7] J. Kacprzyk and S. Zadrozny, “Computing With Words Is an Implementable Paradigm: Fuzzy Queries,
Linguistic Data Summaries, and Natural-Language Generation,” Fuzzy Systems, IEEE Transactions on, vol.
18, no. 3. pp. 461–472, 2010.
[8] A. Niewiadomski, “A Type-2 Fuzzy Approach to Linguistic Summarization of Data,” IEEE Trans. Fuzzy
Syst., vol. 16, no. 1, pp. 198–212, Feb. 2008.
[9] A. Niewiadomski and I. Superson, “On multi-subjectivity in linguistic summarization of relational
databases,” vol. 8, no. 1, pp. 15–34, 2014.
[10] J. Kacprzyk and A. Wilbik, “A comprehensive comparison of time series described by linguistic summaries
and its application to the comparison of performance of a mutual fund and its benchmark,” 2010 IEEE
World Congr. Comput. Intell. WCCI 2010, pp. 0–7, 2010.
[11] L. A. Zadeh, “Outline of a New Approach to the Analysis of Complex Systems and Decision Processes,”
Syst. Man Cybern. IEEE Trans., vol. SMC-3, no. 1, pp. 28–44, 1973.
[12] L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Trans. Fuzzy Systems, vol. 4, no. 2, pp. 103–111, 1996.
[13] L. A. Zadeh, “From computing with numbers to computing with words: from manipulation of measurements to manipulation of perceptions,” Int. J. Appl. Math. Comput. Sci., vol. 12, no. 3, pp. 307–324, 2002.
[14] L. A. Zadeh, “A New Direction in AI: Toward a Computational Theory of Perceptions,” AI Mag., vol. 22,
no. 1, pp. 73–84, 2001.
[15] L. A. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets Syst., vol. 90, no. 2, pp. 111–127, Sep. 1997.
[16] L. A. Zadeh, “Toward human level machine intelligence—is it achievable?,” in Proc. IEEE ICCI, pp. 11–22, Aug. 2008.
[17] J. Mendel and D. Wu, Perceptual Computing: Aiding People in Making Subjective Judgments. Wiley-IEEE
Press, 2010.
[18] D. J. Hand, P. Smyth, and H. Mannila, Principles of Data Mining. Cambridge, MA, USA: MIT Press, 2001.
[19] D. Wu and J. M. Mendel, “Linguistic Summarization Using IF-THEN Rules,” no. 213, 2009.
[20] D. Wu and J. M. Mendel, “Linguistic summarization using IFTHEN rules and interval Type-2 fuzzy sets,”
IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 136–151, 2011.
[21] C. C. Ragin, Redesigning Social Inquiry: Fuzzy Sets and Beyond. University of Chicago Press, 2008.
[22] P. C. Fiss, “Building better causal theories: A fuzzy set approach to typologies in organization research,”
Acad. Manag. J., vol. 54, no. 2, pp. 393–420, 2011.
[23] B. Rihoux and C. C. Ragin, Configurational Comparative Methods: Qualitative Comparative Analysis
(QCA) and Related Techniques (Applied Social Research Methods). SAGE Publications, Inc, 2008.
[24] W. V Quine, “The Problem of Simplifying Truth Functions,” Am. Math. Mon., vol. 59, no. 8, pp. 521–531,
Oct. 1952.
[25] E. J. Mccluskey, Introduction to the theory of switching circuits. McGraw-Hill, 1965.
[26] C. C. Ragin, Fuzzy-Set Social Science. University Of Chicago Press, 2000.
[27] J. M. Mendel and M. M. Korjani, “Charles Ragin’s Fuzzy Set Qualitative Comparative Analysis (fsQCA)
used for linguistic summarizations,” Inf. Sci. (Ny)., vol. 202, pp. 1–23, Oct. 2012.
[28] D. Byrne and C. C. Ragin, The SAGE Handbook of Case-Based Methods. SAGE Publications, 2009.
[29] M. M. Korjani and J. M. Mendel, “Fuzzy set Qualitative Comparative Analysis (fsQCA): Challenges and
applications,” 2012 Annu. Meet. North Am. Fuzzy Inf. Process. Soc., pp. 1–6, Aug. 2012.
[30] J. M. Mendel and M. M. Korjani, “Fast Fuzzy set Qualitative Comparative Analysis (Fast fsQCA),” 2012
Annu. Meet. North Am. Fuzzy Inf. Process. Soc., pp. 1–6, Aug. 2012.
[31] J. M. Mendel and C. C. Ragin, “fsQCA: Dialog between Mendel and Ragin,” USC-SIPI Report, Jan. 2012.
[32] C. Q. Schneider and C. Wagemann, “Standards of Good Practice in Qualitative Comparative Analysis
(QCA) and Fuzzy-Sets,” Comp. Sociol., vol. 9, no. 3, pp. 397–418, 2010.
[33] H. M. Hersh and A. Caramazza, “A fuzzy set approach to modifiers and vagueness in natural language.,”
Journal of Experimental Psychology: General, vol. 105, no. 3. American Psychological Association, US, pp.
254–276, 1976.
[34] L. X. Wang and J. M. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Trans. Syst. Man
Cybern., vol. 22, no. 6, pp. 1414–1427, 1992.
[35] J. M. Mendel and M. M. Korjani, “Theoretical aspects of Fuzzy Set Qualitative Comparative Analysis
(fsQCA),” in Information Sciences, 2013, vol. 237, pp. 137–161.
[36] J. M. Mendel, Lessons in Estimation Theory for Signal Processing Communications and Control.
Englewood Cliffs, NJ: Prentice Hall Signal Processing Series, 1995.
[37] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.
[38] L. A. Zadeh, “Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems,” G. J. Klir and B. Yuan, Eds. River Edge, NJ,
USA: World Scientific Publishing Co., Inc., 1996, pp. 148–179.
[39] L. A. Zadeh, “Quantitative fuzzy semantics,” Inf. Sci. (Ny)., vol. 3, no. 2, pp. 159–176, Apr. 1971.
[40] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning—I,” Inf. Sci., vol. 8, no. 3, pp. 199–249, Jan. 1975.
[41] L. X. Wang, A Course in Fuzzy Systems and Control. Prentice Hall PTR, 1997.
[42] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ,
USA: Prentice-Hall, Inc., 1995.
[43] J. M. Mendel, “Computing with words and its relationships with fuzzistics,” Inf. Sci. (Ny)., vol. 177, no. 4,
pp. 988–1006, Feb. 2007.
[44] J. M. Mendel, “The perceptual computer: an architecture for computing with words,” 10th IEEE Int. Conf.
Fuzzy Syst. (Cat. No.01CH37297), vol. 1, pp. 35–38, 2001.
[45] J. M. Mendel, “Computing with words, when words can mean different things to different people,” Int’l.
ICSC Congr. Comput. Intell. Methods Appl. Third Int’l. ICSC Symp. FL Appl’s., pp. 158–164, 1999.
[46] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions, vol. 2. 2001.
[47] J. M. Mendel, “Type-2 fuzzy sets and systems: an overview,” Computational Intelligence Magazine, IEEE,
vol. 2, no. 1. pp. 20–29, 2007.
[48] J. Aisbett, J. T. Rickard, and D. G. Morgenthaler, “Type-2 fuzzy sets as functions on spaces,” IEEE Trans.
Fuzzy Syst., vol. 18, no. 4, pp. 841–844, 2010.
[49] F. Liu and J. M. Mendel, “Encoding Words Into Interval Type-2 Fuzzy Sets Using an Interval Approach,”
Fuzzy Systems, IEEE Transactions on, vol. 16, no. 6. pp. 1503–1521, 2008.
[50] D. Wu, J. M. Mendel, and S. Coupland, “Enhanced Interval Approach for Encoding Words Into Interval
Type-2 Fuzzy Sets and Its Convergence Analysis,” Fuzzy Systems, IEEE Transactions on, vol. 20, no. 3. pp.
499–513, 2012.
[51] M. Hao and J. M. Mendel, “Modeling words by normal interval type-2 fuzzy sets,” Norbert Wiener in the
21st Century (21CW), 2014 IEEE Conference on. pp. 1–8, 2014.
[52] M. Hao and J. M. Mendel, “Encoding words into normal interval type-2 fuzzy sets: HM method,” Submitt.
to Inf. Sci., 2015.
[53] J. M. Mendel and D. Wu, “Determining interval type-2 fuzzy set models for words using data collected from
one subject: Person FOUs,” Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. pp. 768–
775, 2014.
[54] J. M. Mendel and D. Wu, “Determining Interval Type-2 Fuzzy Set Models for Words Using Data Collected
From One Subject : Person FOUs,” 2014 IEEE Int. Conf. Fuzzy Syst., 2014.
[55] J. M. Mendel and M. M. Korjani, “Theoretical aspects of Fuzzy Set Qualitative Comparative Analysis
(fsQCA),” Inf. Sci. (Ny)., vol. 237, pp. 137–161, Jul. 2013.
[56] G. de Tré and R. de Caluwe, “Level-2 fuzzy sets and their usefulness in object-oriented database
modelling,” Fuzzy Sets Syst., vol. 140, no. 1, pp. 29–49, Nov. 2003.
[57] D. Dubois and H. Prade, “Fuzzy sets, probability and measurement,” Eur. J. Oper. Res., vol. 40, no. 2, pp.
135–154, May 1989.
[58] M. Ganesh, Introduction to Fuzzy Sets and Fuzzy Logic. Prentice-Hall of India, 2006.
[59] S. Gottwald, “Set theory for fuzzy sets of higher level,” Fuzzy Sets Syst., vol. 2, pp. 125–151, 1979.
[60] G. S. Kanhaiya and S. B. Nimse, “A comparative study of Level II fuzzy sets and Type II fuzzy sets,” Int’l.
J. Adv. Res. Comput. Eng. Technol., vol. 1, no. 4, pp. 215–219, 2012.
[61] A. Bargiela and W. Pedrycz, Granular Computing: An Introduction. Springer US, 2003.
[62] J. M. Mendel and M. M. Korjani, “Theoretical aspects of Fuzzy Set Qualitative Comparative Analysis
(fsQCA),” Inf. Sci. (Ny)., vol. 237, pp. 137–161, Jul. 2013.
[63] M. M. Korjani and J. M. Mendel, “Interval Type-2 Fuzzy Set Qualitative Comparative Analysis ( IT2-
fsQCA ),” 2014.
[64] J. Kacprzyk and S. Zadrozny, “Computing with words is an implementable paradigm: fuzzy queries, linguistic data summaries, and natural-language generation,” IEEE Trans. Fuzzy Systems, vol. 18, no. 3, pp. 461–472, 2010.
[65] D. Wu and J. M. Mendel, “Linguistic summarization using IF–THEN rules and interval type-2 fuzzy sets,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 136–151, 2011.
[66] I. K. Vlachos and G. D. Sergiadis, “Subsethood, entropy, and cardinality for interval-valued fuzzy sets—An
algebraic derivation,” Fuzzy Sets Syst., vol. 158, no. 12, pp. 1384–1396, Jun. 2007.
[67] D. Wu and J. M. Mendel, “Uncertainty measures for interval type-2 fuzzy sets,” Inf. Sci. (Ny)., vol. 177, no.
23, pp. 5378–5393, Dec. 2007.
[68] S. Karmakar and A. K. Bhunia, “A comparative study of different order relations of intervals,” Reliab.
Comput., vol. 16, pp. 38–72, 2012.
[69] H. Ishibuchi and H. Tanaka, “Multiobjective programming in optimization of the interval objective
function,” Eur. J. Oper. Res., vol. 48, no. 2, pp. 219–225, 1990.
[70] B. Q. Hu and S. Wang, “A novel approach in uncertain programming part I: new arithmetic and order
relation for interval numbers,” J. Ind. Manag. Optim., vol. 2, no. 4, p. 351, 2006.
[71] C. Ritz and J. C. Streibig, Nonlinear Regression with R. Springer New York, 2008.
[72] A. Sen and M. Srivastava, Regression Analysis. Springer New York, 1990.
[73] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley, 2012.
[74] S. S. Viglione, “4 Applications of Pattern Recognition Technology,” in Adaptive, Learning and Pattern
Recognition Systems Theory and Applications, vol. Volume 66, J. M. M. and K. S. F. B. T.-M. in S. and
Engineering, Ed. Elsevier, 1970, pp. 115–162.
[75] S. S. Haykin, Neural Networks and Learning Machines, no. v. 10. Prentice Hall, 2009.
[76] C. Janssen, “Data preprocessing,” Techopedia, http://www.techopedia.com/definition/14650/data-preprocessing, 2015.
[77] T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, “Fuzzy c-Means Algorithms for
Very Large Data,” Fuzzy Systems, IEEE Transactions on, vol. 20, no. 6. pp. 1130–1146, 2012.
[78] J. M. Mendel and M. M. Korjani, “Charles Ragin’s Fuzzy Set Qualitative Comparative Analysis (fsQCA)
used for linguistic summarizations,” Inf. Sci. (Ny)., vol. 202, pp. 1–23, Oct. 2012.
[79] M. Lichman, “UCI Machine Learning Repository,” 2013.
[80] R. J. Hyndman, “Time Series Data Library,” 2015. [Online]. Available:
https://datamarket.com/data/list/?q=provider:tsdl.
[81] G. E. P. Box and G. M. Jenkins, Time series analysis: forecasting and control. Holden-Day, 1976.
[82] R. S. Crowder, “Predicting the Mackey-Glass time series with cascade-correlation learning,” in
Connectionist Models: Proceedings of the 1990 Summer School, 1990, pp. 117–123.
[83] J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers,
1981.
[84] J. Cohen, Applied Multiple Regression/correlation Analysis for the Behavioral Sciences, no. v. 1. Taylor &
Francis, 2003.
[85] E. Vittinghoff, D. V Glidden, S. C. Shiboski, and C. E. McCulloch, Regression Methods in Biostatistics:
Linear, Logistic, Survival, and Repeated Measures Models. Springer New York, 2012.
[86] E. W. Frees, Data Analysis Using Regression Models: The Business Perspective. Prentice Hall, 1996.
[87] J. Kmenta, Elements of Econometrics, no. v. 1. University of Michigan Press, 1997.
[88] D. Ruppert, Statistics and Data Analysis for Financial Engineering. Springer, 2010.
[89] E. W. Frees, Regression Modeling with Actuarial and Financial Applications. Cambridge University Press,
2010.
[90] W. Vach, Regression Models as a Tool in Medical Research. Taylor & Francis, 2012.
[91] I. B. Mutafar and I. Razali, “A study on prediction of output in oilfield using multiple linear regression,”
Int’l. J. Appl. Sci. Technol., vol. 1, no. 4, pp. 107–113, 2011.
[92] N. R. Draper and H. Smith, Applied Regression Analysis. Wiley, 2014.
[93] M. H. Kutner, C. J. Nachtsheim, and J. Neter, Applied Linear Regression Models. McGraw-Hill Higher
Education, 2003.
[94] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis. Wiley, 2012.
[95] G. A. F. Seber and C. J. Wild, Nonlinear Regression. Wiley, 2003.
[96] L. L. Nathans, F. L. Oswald, and N. Kim, “Interpreting multiple linear regression: a guidebook of variable
importance,” Pract. Assessment, Res. Eval., vol. 17, no. 9, 2012.
[97] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling and control,”
IEEE Trans. Syst. Man Cybern., vol. 15, pp. 116–132, 1985.
[98] M. Sugeno and G. T. Kang, “Structure identification of fuzzy model,” Fuzzy Sets Syst., vol. 28, pp. 15–33,
1988.
[99] J. M. Mendel, Uncertain rule-based fuzzy logic system: introduction and new directions. Prentice--Hall
PTR, 2001.
[100] L. X. Wang and J. M. Mendel, “Fuzzy basis functions, universal approximation, and orthogonal least-
squares learning,” IEEE Trans. Neural Networks, vol. 3, pp. 807–814, 1992.
[101] M. Xi, J. Sun, and W. Xu, “An improved quantum-behaved particle swarm optimization algorithm with
weighted mean best position,” Appl. Math. Comput., vol. 205, pp. 751–759, 2008.
[102] X.-Z. Wang, Y.-L. He, L.-C. Dong, and H.-Y. Zhao, “Particle swarm optimization for determining fuzzy
measures from data,” Information Sciences, vol. 181. pp. 4230–4252, 2011.
[103] L. Yang and Q. Shen, “Closed form fuzzy interpolation,” Fuzzy Sets Syst., vol. 225, pp. 1–22, 2013.
[104] R. C. Eberhart and Y. Shi, “Particle swarm optimization: developments, applications and resources,” in
Evolutionary Computation, 2001. Proceedings of the 2001 Congress on, 2001, vol. 1, pp. 81–86.
[105] S.-M. Chen, Y.-C. Chang, and J.-S. Pan, “Fuzzy rules interpolation for sparse fuzzy rule-based systems
based on interval type-2 Gaussian fuzzy sets and genetic algorithms,” Fuzzy Syst. IEEE Trans., vol. 21, no.
3, pp. 412–425, 2013.
[106] J. Kittler and P. A. Devijver, “Statistical properties of error estimators in performance assessment of
recognition systems.,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 4, pp. 215–220, 1982.
[107] Y.-C. Chang, S.-M. Chen, and C.-J. Liau, “Fuzzy interpolative reasoning for sparse fuzzy-rule-based
systems based on the areas of fuzzy sets,” Fuzzy Syst. IEEE Trans., vol. 16, no. 5, pp. 1285–1301, 2008.
[108] S.-M. Chen and Y.-K. Ko, “Fuzzy interpolative reasoning for sparse fuzzy rule-based systems based on-cuts
and transformations techniques,” Fuzzy Syst. IEEE Trans., vol. 16, no. 6, pp. 1626–1648, 2008.
[109] Z. Huang and Q. Shen, “Fuzzy interpolative reasoning via scale and move transformations,” Fuzzy Syst.
IEEE Trans., vol. 14, no. 2, pp. 340–359, 2006.
[110] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. Am. Stat. Assoc., vol. 32, no. 200, pp. 675–701, Dec. 1937.
[111] R. L. Iman and J. M. Davenport, “Approximations of the critical region of the fbietkan statistic,” Commun.
Stat. - Theory Methods, vol. 9, no. 6, pp. 571–595, Jan. 1980.
[112] L.-X. Wang, “The WM method completed: a flexible fuzzy system approach to data mining,” Fuzzy Syst.
IEEE Trans., vol. 11, no. 6, pp. 768–782, 2003.
[113] J. Casillas, P. Martínez, and A. D. Benítez, “Learning consistent, complete and compact sets of fuzzy rules
in conjunctive normal form for regression problems,” Soft Comput., vol. 13, no. 5, pp. 451–465, 2009.
[114] L. T. Koczy and K. Hirota, “Size reduction by interpolation in fuzzy rule bases,” Syst. Man, Cybern. Part B
Cybern. IEEE Trans., vol. 27, no. 1, pp. 14–25, 1997.
[115] M. M. Korjani and J. M. Mendel, “Validation of Fuzzy set Qualitative Comparative Analysis (fsQCA):
Granular description of a function,” 2012 Annu. Meet. North Am. Fuzzy Inf. Process. Soc., pp. 1–6, Aug.
2012.
[116] A. Hayter, Probability and Statistics for Engineers and Scientists. Cengage Learning, 2012.
[117] N. N. Karnik and J. M. Mendel, “Centroid of a Type-2 Fuzzy Set,” Inf. Sci., vol. 132, no. 1–4, pp. 195–220,
Feb. 2001.
[118] J. M. Mendel, R. I. John, and F. Liu, “Interval type-2 fuzzy logic systems made simple,” IEEE Trans. Fuzzy Syst., vol. 14, no. 6, pp. 808–821, 2006.
[119] M. Xi, J. Sun, and W. Xu, “An improved quantum-behaved particle swarm optimization algorithm with
weighted mean best position,” Appl. Math. Comput., vol. 205, no. 2, pp. 751–759, Nov. 2008.
[120] R. Li, W. J. Li, L. Zhang, and M. Li, “An improved quantum-behaved particle swarm classifier based on
weighted mean best position,” in Proceedings - 2009 IEEE International Conference on Intelligent
Computing and Intelligent Systems, ICIS 2009, 2009, vol. 4, pp. 327–331.
[121] M. J. Wierman and W. J. Tastle, “Measurement theory and subsethood,” 2010 Annu. Meet. North Am. Fuzzy
Inf. Process. Soc., no. i, pp. 1–5, Jul. 2010.
[122] M. Wierman, T. Clark, J. Larsen, and J. Mordeson, “An empirical study using fuzzy implication operators and subsethood measures to detect causality.”
[123] M. M. Korjani and J. M. Mendel, “Fuzzy set Qualitative Comparative Analysis (fsQCA): Challenges and
applications,” Fuzzy Information Processing Society (NAFIPS), 2012 Annual Meeting of the North
American. pp. 1–6, 2012.
[124] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence.
Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1991.
[125] E. Mendelson, Schaum’s outline of theory and problems of Boolean algebra and switching circuits.
McGraw-Hill, 1970.
[126] C. C. Ragin, The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. University
of California Press, 2014.
[127] C. C. Ragin, The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. University
of California Press, 1989.
[128] J. M. Mendel and M. M. Korjani, “A New Methodology for Calibrating Fuzzy Sets in fsQCA Using Level 2
and Interval Type-2 Fuzzy Sets,” Inf. Sci. (Ny)., 2015.
[129] G. Lakoff, “Set theory and fuzzy sets: their relationship to natural language,” Qual. Multi-Method Res., vol.
12, no. 1, pp. 9–14, 2014.
[130] L. A. Zadeh, “A fuzzy-set-theoretic interpretation of linguistic hedges,” pp. 37–41, 2008.
[131] H. Bustince Sola, J. Fernandez, H. Hagras, F. Herrera, M. Pagola, and E. Barrenechea, “Interval Type-2
Fuzzy Sets are generalization of Interval-Valued Fuzzy Sets: Towards a Wider view on their relationship,”
Fuzzy Systems, IEEE Transactions on, vol. PP, no. 99. p. 1, 2014.
[132] A. Bilgin, H. Hagras, A. Malibari, M. J. Alhaddad, and D. Alghazzawi, “Towards a general type-2 fuzzy
logic approach for Computing With Words using linear adjectives,” Fuzzy Systems (FUZZ-IEEE), 2012
IEEE International Conference on. pp. 1–8, 2012.
[133] H. Tahayori and A. Sadeghian, “Median Interval Approach to Model Words with Interval Type-2 Fuzzy
Sets,” Int. J. Adv. Intell. Paradig., vol. 4, no. 3/4, pp. 313–336, Feb. 2012.
[134] J. M. Mendel and M. M. Korjani, “On establishing nonlinear combinations of variables from small to big
data for use in later processing,” Inf. Sci. (Ny)., vol. 280, no. 0, pp. 98–110, 2014.
[135] V. Cherkassky and F. M. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 2007.
Abstract

Linguistic summarization is a data mining or knowledge discovery approach that describes a pattern in a database. Techniques that generate linguistic summaries not only facilitate the understanding and communication of data, but can also be used in decision-making. In this thesis, Fuzzy Set Qualitative Comparative Analysis (fsQCA) is proposed as a linguistic summarization approach. FsQCA is a methodology for obtaining linguistic summarizations from data that are associated with cases. It was developed by the eminent sociologist Prof. Charles C. Ragin, but has, as of this date, not been applied by engineers or computer scientists. Unlike more quantitative methods that are based on correlation, fsQCA seeks to establish logical connections between combinations of causal conditions and an outcome, the result being rules that summarize (describe) the sufficiency between subsets of all of the possible combinations of the causal conditions (or their complements) and the outcome. The rules are connected by the word OR to the output. Each rule is a possible path from the causal conditions to the outcome. We, for the first time, explain fsQCA in a very quantitative way, something that is needed if engineers and computer scientists are to use fsQCA.

Having summarized fsQCA mathematically, it is possible to study some of its key steps in order to better understand them and even to enhance them. We focus on how to greatly speed up some of the computationally intensive steps of fsQCA and show how to use the speed-up equations to obtain some interesting and important properties of fsQCA. These properties not only provide additional understanding about fsQCA, but also lead to different ways to implement fsQCA.

To actually apply fsQCA to some engineering data problems, there are some challenges that had to be overcome, and we explain them. Many of these challenges result from the way membership functions are determined, which is called the calibration method. We explain why calibration methods that are being used by fsQCA scholars must be applied with great care in order for their results to actually correspond to fuzzy sets; many times they do not lead to fuzzy sets at all, even though users think they do, which calls into question the validity of fsQCA since it is built upon fuzzy sets. We provide a new methodology for calibrating the fuzzy sets that are used in fsQCA. The result is an approximated Reduced Information Level 2 Fuzzy Set Membership Function (RI L2 FS MF); it is not the MF of an ordinary FS but instead is the MF of a level 2 FS, one that has an S-shape, the kind of shape that is so widely used by fsQCA scholars and is so important to fsQCA.

FsQCA rules involve words that are modeled using type-1 fuzzy sets (T1 FSs). Unfortunately, once the T1 FS membership functions (MFs) have been chosen, all uncertainty about the words that are used in fsQCA disappears, because T1 MFs are totally precise. Interval type-2 FSs (IT2 FSs), on the other hand, are first-order uncertainty models for words. In this thesis, we extend fsQCA to IT2 FSs. More specifically, we develop IT2-fsQCA by extending the steps of fsQCA from T1 FSs to IT2 FSs.

Using some key steps of fsQCA, we present a very efficient method for establishing nonlinear combinations of variables from small to big data for use in later processing (e.g., regression, classification, etc.). Variables are first partitioned into subsets, each of which has a linguistic term (called a causal condition) associated with it. Our Causal Combination Method uses fuzzy sets to model the terms and focuses on interconnections (causal combinations) of either a causal condition or its complement, where the connecting word is AND, which is modeled using the minimum operation. Our Fast Causal Combination Method is based on a novel theoretical result, leads to an exponential speedup in computation, and lends itself to parallel and distributed processing.