Advances in Linguistic Data-Oriented Uncertainty Modeling, Reasoning, and Intelligent Decision Making

by

Mohammad Reza Rajati

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical Engineering)

May 2015

Copyright 2015 Mohammad Reza Rajati

Dedicated to my parents, Ali Rajati and Nahid Hakami-Kermani, the best teachers I have ever had.

Acknowledgements

And, as the Cock crew, those who stood before
The Tavern shouted "Open then the Door!
You know how little while we have to stay,
And, once departed, may return no more."

Omar Khayyam (Translated from Persian by Edward FitzGerald)

First and foremost, I would like to express my earnest gratitude towards my advisor and Committee Chair, Prof. Jerry M. Mendel, for his passion, enthusiasm, understanding, and for the valuable life lessons I received from him. His rigorous methodologies and patience made my PhD studies a pleasant journey for me. Completion of the present dissertation would not have been possible without the huge amounts of time I spent with him, benefiting from his critical thinking and his elegant and unique problem-solving strategies.

I would also like to thank my PhD Qualifying Exam and Dissertation Committee members, Profs. Fred Aminzadeh, Iraj Ershaghi, Edmond A. Jonckheere, and Shrikanth Narayanan, for their valuable time and the constructive feedback they provided for the enhancement of the present work.

Special thanks go to Prof. Lotfi A. Zadeh of UC Berkeley, the father of fuzzy logic, who has been a true inspiration to me since I first became familiar with intelligent systems. Important parts of the present work are based on his ideas, and my face-to-face discussions and private communications with him played a pivotal role in the completion of this dissertation.

I also appreciate the research atmosphere of the Department of Electrical Engineering at USC, which contributed to my productivity and problem-solving abilities. My interactions with students, faculty members, and staff were extremely helpful in my academic endeavor.

Discussions and collaborations with many people contributed to the completion and betterment of this dissertation. I would like to specifically thank Drs. Reza Banirazi, Ali Bolourchi, Lisa Brenskelle, Mohsen Farhadloo, Minshen Hao, Hamid Hatami-Hanza, Reza Jafarkhani, Mostafa Kalami, Dongwoo Kang, Abe Kazemzadeh, Vahid Keshavarzzadeh, Bart Kosko, Hadi Meidani, Sanjay Purushotham, Joaquín Rapela, Terry Rickard, Elham Sahebkar Khorasani, David Wilkinson, Dongrui Wu, Iman Yadegaran, Daoyuan Zhai, and many other people whom I met on various occasions, including conferences, or who were reviewers of my papers.

I would like to cordially acknowledge the Annenberg Fellowship Program, the Summer Research Institute Fellowship Program, and CiSoft (a USC-Chevron USA Inc. alliance) for their generous support of my research.

During my undergraduate and graduate studies at Amirkabir University of Technology (Tehran Polytechnic), K. N. Toosi University of Technology, and the University of Southern California, I attended many classes that significantly shaped my academic career. There is a very long list of classes that I took with many excellent instructors, but I would like to specifically acknowledge Profs. Edmond A. Jonckheere, Bart Kosko, Jerry Mendel, and Robert Scholtz in the Electrical Engineering Department of USC for what I learned from them in their classes.
I studied Mathematical Statistics in the USC Department of Mathematics, and I appreciate the great amount of knowledge about probability and statistics that I gained from the professors in that department.

I would like to thank Prof. Hamid Khaloozadeh of K. N. Toosi University of Technology, my MSc advisor, to whom I owe a significant part of my whole academic life, and Prof. Witold Pedrycz, my MSc co-advisor, for all I have learned from him. I would also like to thank Prof. Mehrdad Abedi, with whom I took my first course in Electrical Engineering, for making my first impression of Electrical Engineering a very memorable one.

The first time I heard about fuzzy logic was during an informal course for freshmen at Tehran Polytechnic in Fall 2001. I would like to thank its instructor, Dr. Aidin Mehdipour, who was then a senior in the EE Department of Tehran Polytechnic. My first formal course on computational intelligence was with Prof. Mohammad Bagher Menhaj, from whom I learned the basic concepts of fuzzy logic, neural networks, and evolutionary computation, and with whom I published my first academic papers at international conferences. My knowledge of fuzzy logic became more complete after taking a graduate-level course on fuzzy control with Dr. Alireza Fatehi, to whom I also owe an important part of my academic life.

It would be unjust not to mention the great efforts of those teachers with whom I took classes in "lower education." Undoubtedly, they contributed enormously to my knowledge of mathematical concepts, foreign languages, and Persian, as well as other subjects. I hereby wholeheartedly thank them for what they have done for me.

Last, but not least, I would like to extend my highest gratitude to my beloved family for their continuous and unconditional support and love through these many tough years of separation from them. My father is the most important source of inspiration and wisdom in my life. I have inherited his passion for poetry, languages, philosophy, and knowledge through both genes and pedagogy. My mother is a unique example of a soft-hearted person with exceptional patience and vision, to whom I owe even my ability to read and write! I would not be where I am without their teaching and guidance, and I cannot find proper words for thanking them for all they have generously granted me. My beloved siblings, Ahmad Reza and Sepideh, have always been very supportive and kind to me, and I hereby thank them.

Mohammad Reza Rajati
November 2014, Los Angeles, CA

Table of Contents

Abstract

1 Prologue
1.1 Introduction: Computing with Words and Advanced Computing with Words
1.2 Some ACWW Problems
1.3 Linguistic Probabilities
1.4 Advanced Computing with Words and Linguistic Probabilities
1.5 Outline of The Present Dissertation

2 Linguistic Goal Oriented Decision Making with Rule Based Systems
2.1 Introduction
2.2 A Heuristic Approach to Goal Oriented Decision Making with Rule-Based Systems: Exact Matching
2.2.1 Step (ii): Obtain Descriptive Rules
2.2.2 Step (iii): Group The Rules
2.2.3 Step (iv): Establish Intermediate Goals
2.2.4 Step (v): Establish $W$, The Set of Pairs of Consecutive Intermediate Goals
2.2.5 Step (vi): Determine Decision Rules Based on Exact Matching
2.3 Extension of Linguistic Goal Oriented Decision Making (LGODM) Using Similarity between Rules
2.3.1 Steps Needed for Constructing LGODM Based on Similarity
2.4 Discussion
2.4.1 Jaccard Similarity between Rules
2.4.2 Dealing with Rule Weights
2.4.3 Obtain Transition and Progression Rules
2.4.4 Repeat Obtaining Transition and Progression Rules for All Pairs of Consecutive Goals in $W$
2.4.5 Obtain Sustainment Rules
2.4.6 Construct the Rule Base for The Decision Maker
2.5 Pruning The Similarity-Based Decision Rules
2.6 Implementation of The Fuzzy Decision System
2.7 LGODM for Enhanced Oil Recovery with Steam
2.7.1 Background
2.7.2 Design of A Choke Decision System Using LGODM
2.7.3 Validation of The Choke Decision System
2.8 Conclusions and Future Work

3 Modeling Linguistic Probabilities and Linguistic Quantifiers Using Interval Type-2 Fuzzy Sets
3.1 Introduction
3.2 Enhanced Interval Approach
3.3 Modeling Probability Words and Linguistic Quantifiers Using EIA
3.4 Establishing Reduced-Size Vocabularies
3.5 Conclusions and Future Work

4 Uncertainty Modeling and Reasoning with Linguistic Belief Structures
4.1 Introduction
4.2 Background
4.3 Extension of Belief Structures to Fuzzy Focal Elements and Numeric Mass Assignments
4.4 Belief Structures with Fuzzy Focal Elements and Fuzzy Probability Mass Assignments
4.5 Linguistic Belief Structures
4.6 Combination of Evidence and Operations on Belief Structures
4.6.1 Dempster's Rule of Combination and Operations on Belief Structures
4.6.2 Expected Value of A Linguistic Belief Structure
4.7 Reasoning with Linguistic Belief Structures: Two Examples
4.8 Conclusions and Future Work

5 Extension of Set Functions to Interval Type-2 Fuzzy Sets: Applications to Evidential Reasoning with Linguistic Belief Structures
5.1 Introduction
5.2 The Extension Principle for Set-Valued Set Functions
5.3 Evidential Reasoning with An Interval Type-2 Fuzzy Valued Measure
5.3.1 Background
5.3.2 Extension of Belief Structures to Fuzzy Focal Elements and Numeric Mass Assignments
5.3.3 Belief Structures with Fuzzy Focal Elements and Fuzzy Probability Mass Assignments
5.3.4 Linguistic Belief Structures
5.3.5 Extending the Concept of Belief Interval Using Extension Principle for Set Functions
5.3.6 A Fuzzy-Valued Measure for Linguistic Belief Structures
5.4 Conclusions and Future Work

6 Probability Calculations Using the Generalized Extension Principle for Type-1 Fuzzy Sets: Applications to Advanced Computing with Words
6.1 Introduction
6.2 Problem Description
6.3 Implementation of The Solution to The PJW Problem
6.3.1 Modeling Words
6.3.2 Approximate Solution to the Optimization Problem
6.4 On Correctness of The Results
6.5 An Engineering ACWW Problem
6.6 Other Zadeh ACWW Challenge Problems
6.6.1 Tall Swedes Problem (AHS)
6.6.2 Robert's Problem (RP)
6.6.3 Swedes and Italians Problem (SIP)
6.7 Discussion
6.8 The Relationship between Perceptual Computing and Advanced Computing with Words using the GEP
6.9 Conclusions and Future Work

7 Probability Calculations Using Variations of The Generalized Extension Principle for Interval Type-2 Fuzzy Sets: Applications to Advanced Computing with Words
7.1 Introduction
7.2 Problem Description
7.3 Extensions of the GEP to IT2 FSs
7.3.1 Extension of the GEP for Real-Valued Functions to IT2 FSs
7.3.2 Extension of the GEP to IT2 FSs Using the GEP for Embedded T1 FSs
7.3.3 Two-Stage Extension of The GEP to IT2 FSs Using The Extension of The GEP to IT2 FSs for Real-Valued Functions and Embedded Type-1 Fuzzy Sets
7.4 Extension of Zadeh's Solution to Interval Type-2 Fuzzy Sets
7.4.1 The Probability of an IT2 Fuzzy Event
7.4.2 Solution of The PJW Problem Using The Extension of The GEP for Real-Valued Functions to IT2 FSs
7.4.3 Solution of the PJW Problem via Extension of the GEP to IT2 FSs Using the One-Stage and Two-Stage GEP for Embedded T1 FSs
7.4.4 An Interesting Special Case
7.5 Implementation of The Solutions to The PJW Problem
7.5.1 Modeling Words
7.5.2 Solving The PJW Problem by Extension of GEP for Real-Valued Functions to IT2 FSs
7.5.3 Solving The PJW Problem by Extension of The GEP to IT2 FSs Using The GEP for Embedded T1 FSs
7.6 Conclusions and Future Work

8 Syllogistic Reasoning for Advanced Computing with Words
8.1 Our Solutions to The Tall Swedes Problem
8.1.1 Translating the Tall Swedes Problem into A Novel Weighted Average
8.1.2 Solution Using T1 FSs and the FWA
8.1.3 Solution Using IT2 FSs and the LWA
8.2 Zadeh's Methodology for Solving The Magnus Problem
8.3 Implementation of Zadeh's Solution to The Magnus Problem
8.4 Critique of Zadeh's Solution to The Magnus Problem
8.5 Fuzzy Reasoning and Calculation of Linguistic Upper and Lower Probabilities via Linguistic Weighted Averages for The Magnus Problem
8.6 Our Solution to The Swedes and Italians Problem
8.7 Implementation of the Solution to The Swedes and Italians Problem
8.8 Syllogistic Reasoning Using The Fuzzy Belief Measure for Advanced Computing with Words
8.9 Conclusions and Future Work

9 Epilogue: Advanced Computing with Words: Status, Challenges, and Future

A List of Patents and Publications Related to The Dissertation

B Some Proofs and Important Theorems
B.1 Properties of The Numeric Probability of T1 and IT2 Fuzzy Events
B.2 $\alpha$-cut Decomposition Theorem
B.3 Proof of Theorem 8.1
B.4 Proof of Theorem 8.2
B.5 Proof for $Atleast(Q) = Q$ When $Q$ Is Monotonically Non-decreasing
B.6 Existence of Solutions to The Optimization Problems of Equation (8.32)

BIBLIOGRAPHY

List of Figures

1.1 Fuzzy Logic = Computing with Words: BCWW, ICWW, and ACWW.
2.1 The process of grouping the descriptive rules according to their consequents $C(i)$. $x = (x_1, x_2, \ldots, x_p)$ and $A = (A_1, A_2, \ldots, A_p)$ are used to express the rules more compactly.
2.2 Rule comparisons flow so as to gradually increment $y$ to become High when it is initially described by either Low, Medium Low, Medium, or Medium High.
2.3 Rule comparisons flow so as to gradually drive $y$ to become Low when it is initially described by either Medium Low, Medium, Medium High, or High.
2.4 Rule comparisons flow so as to gradually drive $y$ to become Medium when it is initially described by either Low, Medium Low, Medium High, or High.
2.5 The schema for the procedure of obtaining rules to drive the system from a less desirable goal $C$ to a more desirable goal, $C^+$.
2.6 The schema for performing the procedure on consecutive pairs of intermediate goals for obtaining all of the $T_M$ and $P_M$ rules for moving the system towards the ultimate goal, $C^*$.
2.7 The schema for obtaining sustainment rules for keeping the system in the ultimate goal, $C^*$. $L$ is the number of descriptive rules in the $C^*$ group.
2.8 Cyclic Steam Stimulation.
2.9 A schema of the LGODM in a choke decision system.
2.10 A NARX model of an oil well. The boxes that contain the letter D represent tap delays.
2.11 Simulation of oil well production including the LGODM. $z(t) = (P(t), \theta(t), Q(t))$. The boxes containing the letter D represent tap delays. Heavier lines represent vector signals.
2.12 Simulated production for well 6.
2.13 Recommended choke changes for well 6.
2.14 Simulated pressure for well 6.
2.15 Simulated temperature drop for well 6.
3.1 FOUs of the vocabulary of 34 linguistic probabilities on a percentage scale.
3.2 FOUs of the vocabulary of 24 linguistic quantifiers on a percentage scale.
4.1 Vocabulary of IT2 FSs representing linguistic probabilities.
4.2 Vocabulary of IT2 FSs representing quality of service.
4.3 Average quality of service in the restaurant.
4.4 Lower and upper probabilities that the quality of service is good in the restaurant, calculated using Yen's measures.
4.5 IT2 FS models of usuality words.
4.6 IT2 FS models of usuality words.
4.7 Focal elements of the belief structure $\tilde{B}_{Travel}$ calculated by DNLWA.
4.8 Probability mass assignments of the belief structure $\tilde{B}_{Travel}$. The words in the numerator of the expressive formula for the DNLWA are given in the title of each subfigure.
4.9 The IT2 FS modeling the hypothesis of spending around $10,000.
4.10 The lower and upper probabilities of the hypothesis of spending around $10,000, computed using Yen's measures.
6.1 FOU of an IT2 FS described by nine parameters.
6.2 Vocabulary of IT2 FSs representing linguistic heights.
6.3 Vocabulary of IT2 FSs representing linguistic probabilities.
6.4 Middle embedded type-1 fuzzy sets for left shoulder, interior, and right shoulder FOUs.
6.5 Vocabulary of linguistic heights modeled by middle embedded type-1 fuzzy sets.
6.6 Vocabulary of middle embedded type-1 fuzzy set models of linguistic probabilities.
6.7 Scatter plots for each $P_W$ using UMFs as T1 FS models of words, when distributions are Gaussian.
6.8 The detected envelopes $P_W(v)$ for the plots in Fig. 6.7. UMFs were used as T1 FS models of words and probability distributions were Gaussian.
6.9 The detected envelopes $P_W(v)$ when middle embedded type-1 fuzzy set models of words and Gaussian distributions were used.
6.10 The detected envelopes $P_W(v)$, which are solutions to the PJW problem "Probably John is $W$. What is the probability that John is $W$?"
6.11 Vocabulary of IT2 FSs representing linguistic reliability [173, Fig. 7.5].
6.12 The detected envelopes $P_R$ when UMF fuzzy set models of words and Weibull distributions were used.
6.13 The perceptual computer that uses FS models for words.
7.1 FOU of an IT2 FS described by nine parameters.
7.2 Vocabulary of IT2 FSs representing linguistic heights obtained by EIA.
7.3 Vocabulary of IT2 FSs representing linguistic probabilities obtained by EIA.
7.4 Vocabulary of IT2 FSs with normal LMFs, representing linguistic heights.
7.5 Vocabulary of IT2 FSs with normal LMFs, representing linguistic probabilities.
7.6 The detected envelopes $\tilde{P}_{\tilde{W}}(v)$ when IT2 FS models of words in Fig. 7.5 and the GEP for real-valued functions were used.
7.7 The detected envelopes $\tilde{P}_{\tilde{W}}(v)$ when IT2 FS models of words in Fig. 7.5 and the two-stage GEP were used.
8.1 (a) The 7-word vocabulary. (b) T1 FS models for Tall and notTall. (c) T1 FS models for Most and Few. (d) Average height, computed by an FWA.
8.2 (a) The five-word vocabulary. (b) IT2 FS models for $\widetilde{Tall}$ and $\widetilde{notTall}$. (c) IT2 FS models for $\widetilde{Most}$ and $\widetilde{Few}$. (d) Average height, computed by an LWA. Note that all upper membership functions in (a), (b), (c), and (d) are the same as the T1 FSs in Figs. 8.1(a), 8.1(b), 8.1(c), and 8.1(d), respectively.
8.3 The membership functions of the vocabulary of type-1 linguistic probabilities.
8.4 The membership functions of $Most \times Most$.
8.5 Linguistic information about the distribution of blond people among Swedes.
8.6 Linguistic lower probability that Magnus is blond.
8.7 Linguistic upper probability that Magnus is blond.
8.8 The fuzzy set model for "Much taller".
8.9 Complement of not Much taller (the dark shaded FOU), which is equal to $notMuchtaller_1 \cup notMuchtaller_2$.
8.10 The fuzzy sets $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$, which are calculated by ordinary LWAs.
8.11 The fuzzy set $\widetilde{AH}_1$, which is the union of $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$.
8.12 The fuzzy set not $\widetilde{AH}_1$.
8.13 The fuzzy sets $\tilde{E}$, $\tilde{F}$, and $\tilde{G}$, whose union yields not $\widetilde{AH}_1$.
8.14 The fuzzy sets $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$.
8.15 The fuzzy set $\widetilde{AH}_2$, which is the union of $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$.
8.16 The vocabulary of interval type-2 fuzzy set models for amounts of difference.
8.17 Vocabulary of IT2 FSs representing linguistic probabilities.
8.18 Vocabulary of IT2 FSs representing the focal elements of $\tilde{B}_{Height}$.
8.19 Probability of the fuzzy event Short.
B.1 A convex T1 FS $A$ and its complement, $B$, which is also convex.

List of Tables

1.1 Some of Zadeh's ACWW problems involving linguistic probabilities and linguistic quantifiers.
1.2 Engineering versions of some of Zadeh's ACWW problems.
2.1 Increase in well production in the presence of the LGODM.
3.1 Parameters of FOUs of the vocabulary of 17 words derived from Improbable and Probable as well as Tossup. The scale is percentage.
3.2 Parameters of FOUs of the vocabulary of 18 words derived from Unlikely and Likely as well as Tossup and Possible. The scale is percentage.
3.3 Parameters of FOUs of the vocabulary of 24 linguistic quantifiers. The scale is percentage.
3.4 Pairwise similarities between the 17 words of the vocabulary derived from Improbable, Probable and Tossup.
3.5 Pairwise similarities between the 18 words of the vocabulary derived from Unlikely, Likely, Tossup, and Possible.
3.6 Pairwise similarities between the 24 words of the vocabulary of linguistic quantifiers.
4.1 Membership function parameters of the probability words depicted in Fig. 6.3.
4.2 Jaccard similarities between $\tilde{Y}_{AQoS}$ and members of the vocabulary of linguistic quality of service.
4.3 Jaccard similarities between $\tilde{Y}_{AQoS}$ and members of the vocabulary of linguistic quality of service.
4.4 Jaccard similarities between $\widetilde{LProb}^-(Good)$ and $\widetilde{LProb}^+(Good)$ and members of the vocabulary of linguistic probabilities $\tilde{P}_i$.
4.5 FOU parameters of usuality words.
4.6 Jaccard similarities between $\widetilde{LProb}^-(\widetilde{10k})$ and $\widetilde{LProb}^+(\widetilde{10k})$ and members of the vocabulary of linguistic probabilities $\tilde{P}_i$.
6.1 Membership function parameters of the height words depicted in Fig. 6.2.
6.2 Pairwise similarities between the height words depicted in Fig. 6.2.
6.3 Membership function parameters of the probability words depicted in Fig. 6.3.
6.4 Pairwise similarities between the probability words depicted in Fig. 6.3.
6.5 Membership function parameters of the height words depicted in Fig. 6.5.
6.6 Membership function parameters of the probability words in Fig. 6.6.
6.7 Similarities between the T1 FSs depicted in Fig. 6.8 and UMFs for linguistic probability words.
6.8 Similarities between the T1 FSs depicted in Fig. 6.9 and middle T1 FSs for linguistic probability words.
6.9 Summary of the solutions to the problem "What is the probability that John is $W$?", given "Probably John is tall".
6.10 Similarities between the words in Fig. 6.10 and the UMFs of the linguistic probability words of Fig. 6.3.
6.11 Membership functions of the words depicted in Fig. 6.11 [173, Table 7.13].
6.12 Pairwise similarities between the words depicted in Fig. 6.11.
6.13 Similarities between the words depicted in Fig. 6.12 and linguistic probability words.
6.14 Summary of the solutions to the problem "What is the probability that the reliability of product X is $R$?", given "Probably product X has high reliability".
7.1 Membership function parameters of the height words depicted in Fig. 7.2.
7.2 Pairwise similarities between the height words depicted in Fig. 7.2.
7.3 Membership function parameters of the probability words depicted in Fig. 7.3.
7.4 Pairwise similarities between the probability words depicted in Fig. 7.3.
7.5 Membership function parameters of the height words depicted in Fig. 7.4.
7.6 Pairwise similarities between the height words depicted in Fig. 7.4.
7.7 Membership function parameters of the probability words depicted in Fig. 7.5.
7.8 Pairwise similarities between the probability words depicted in Fig. 7.5.
7.9 Similarities between the IT2 FSs depicted in Fig. 7.6 and the linguistic probability words depicted in Fig. 7.5.
7.10 Similarities between the IT2 FSs depicted in Fig. 7.7 and linguistic probability words in Fig. 7.5.
7.11 Summary of the solutions to the problem "What is the probability that John is $W$?", given "Probably John is tall".
8.1 Similarities between $AH$ in Fig. 8.1(d) and the five words in Fig. 8.1(a).
8.2 Similarities between $\widetilde{AH}$ in Fig. 8.2(d) and the five words in Fig. 8.2(a).
8.3 Similarities between Zadeh's solution $Most \times Most$ and linguistic probabilities.
8.4 Similarities between the linguistic lower and upper probabilities and the members of the vocabulary of linguistic probabilities.
8.5 Similarities between $\widetilde{AH}_2$ and members of the vocabulary of linguistic height differences.

Abstract

In this dissertation, we focus on data-oriented uncertainty modeling, reasoning, and inference, and their applications to intelligent systems that implement the paradigm of Computing with Words (CWW).
Computing with Words problems have been classified into at least two categories: Basic Computing with Words and Advanced Computing with Words. Basic Computing with Words mainly deals with applications of rule-based systems, while Advanced Computing with Words deals with implicit assignment of linguistic truth, probability, and possibility through intricate natural language statements.

In this dissertation, we present a Linguistic Goal-Oriented Decision-Making method using rule-based systems. Unlike previous applications of rule-based systems, which use them in mere function approximation schemes or apply them to modeling rather uncomplicated expert knowledge, our proposed goal-oriented decision-making method attempts to determine the desired states of a system (described using words) by investigating the linguistic rules that specify the conditions that yield them, and then designs a methodology to move the system states towards more desirable states by changing a decision variable and comparing the rules that describe the conditions that yield each of the states of the system. This approach is another realization of decision-making with words or control with words, and can be seen as a bridge between applications of rule-based systems and Computing with Words. We apply our method to the problem of enhanced oil recovery as well.

We also proceed with solving Advanced Computing with Words problems that deal with implicit assignments of truth, probability, and possibility to various attributes through natural languages. We specifically focus on problems that deal with linguistic probabilities. We demonstrate how Interval Type-2 Fuzzy Set (IT2 FS) models of probability words can be synthesized using data collected from subjects, and establish a general framework based on the Dempster-Shafer Theory of Evidence to calculate probabilities and perform inference based on natural language information containing linguistic quantifiers and linguistic probabilities, via constructs that are called Linguistic Belief Structures. We demonstrate Novel Weighted Averages and Doubly Normalized Weighted Averages as essential tools for inferring from Linguistic Belief Structures. We also develop an Extension Principle for extending set-valued functions to Interval Type-2 Fuzzy Sets and apply it to inference from Linguistic Belief Structures.

We also use Syllogistic Reasoning and the methodology of Linguistic Belief Structures to solve Advanced Computing with Words challenge problems that were proposed by Zadeh. We also implement Zadeh's methodology of handling linguistic probabilities, which involves the Generalized Extension Principle. When solving Advanced Computing with Words problems, the Generalized Extension Principle yields functional optimization problems that are very difficult to solve analytically, so we devise a numerical method to deal with those optimization problems. As Zadeh's methodology is based on Type-1 Fuzzy Sets (T1 FSs), we first implement it for T1 FSs. Then we extend it to Interval Type-2 Fuzzy Sets, since they are viable models of the various types of uncertainty that are associated with a word. A critical review of the status, challenges, and future of Advanced Computing with Words is also presented.

Chapter 1
Prologue

Everything is vague to a degree you do not realize till you have tried to make it precise.
Bertrand Russell

1.1 Introduction: Computing with Words and Advanced Computing with Words

Computing with Words (CWW or CW) [31,80,98,168,170,175,199,226,262,270,325,327,329,345] was probably conceived as the main area of application of fuzzy logic [316] when the field originated. It is a methodology of computation whose objects are words rather than numbers, although those words are linked to numbers, and to classical calculations that can be carried out by computing machinery, via the membership functions of the fuzzy sets associated with the words. To do that, a process called precisiation of meaning [333] is essential, which, in a nutshell, is determining the membership functions of the fuzzy sets that model the words. Zadeh distinguishes two subareas of CWW:

(i) Basic Computing with Words (BCWW), which mainly deals with descriptions of complex systems in terms of fuzzy IF-THEN rules [334]. Soft constraints are generally possibilistic, i.e., they describe (physical) attributes like speed, temperature, pressure, color, etc. The soft constraints are assigned explicitly by the IF-THEN rules. BCWW mainly deals with the function approximation properties of fuzzy systems [134,264], and has been applied to engineering problems [176] such as modeling [110,248], control systems design [204,267], clustering [205], and data mining [265,269].

(ii) Advanced Computing with Words (ACWW) [175,333], which deals with the assignment of soft constraints through complicated natural language statements. Its problems are closer to natural languages, in that the assignments of soft constraints are usually implicit; e.g., in the statement "Most Swedes are tall," the soft constraint "Most" is implicitly assigned to the proportion of Swedes who are tall. Besides possibilistic soft constraints, ACWW deals with veristic and probabilistic [330,331] soft constraints. World Knowledge (that is, information either given in the statement of a problem or needed to solve the problem) or Common Sense Knowledge plays a pivotal role in solving ACWW problems.

The fuzzy logic community has mainly focused on the function approximation applications of fuzzy logic, especially after the seminal work of Mamdani and Assilian [160], which uses an interpretation of fuzzy systems that is particularly suitable for function approximation. Such an interpretation is different from the natural language aspect of fuzzy logic and from the idea of building computational machinery that is fed with words, computes with words, and communicates solutions of problems in terms of words.

Nevertheless, CWW has witnessed a resurrection through its application to various real-world problems including analysis of complex systems [259], control [162,348,351], risk assessment [154], decision-making [60,92,97,99,116,163,166,284], classification [7], website quality assessment [100], reinforcement learning [352], text processing [346], object oriented programming [21], information retrieval [22,136], expert systems [124,126,202,247], reputation systems [314], and natural language generation [117].
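To make the rule-based, function-approximation reading of BCWW concrete, the following minimal sketch runs a single Mamdani-style rule (min implication, centroid defuzzification) over made-up membership functions. The domains, word models, and all numbers are hypothetical illustrations, not taken from this dissertation.

```python
import numpy as np

x = np.linspace(0.0, 100.0, 1001)   # input domain, e.g., a temperature
y = np.linspace(0.0, 10.0, 1001)    # output domain, e.g., a valve opening

def tri(z, a, b, c):
    """Sampled triangular membership function with support [a, c], peak at b."""
    return np.clip(np.minimum((z - a) / (b - a), (c - z) / (c - b)), 0.0, 1.0)

temp_is_high   = np.interp(x, [60.0, 80.0], [0.0, 1.0])  # right-shoulder "High"
opening_is_low = tri(y, 0.0, 2.0, 4.0)                   # triangular "Low"

def mamdani(x0):
    """One rule: IF temperature is High THEN opening is Low (assumes the rule
    fires to a nonzero degree, so the centroid is well defined)."""
    firing = np.interp(x0, x, temp_is_high)       # firing level of the rule
    implied = np.minimum(firing, opening_is_low)  # min (Mamdani) implication
    return np.sum(implied * y) / np.sum(implied)  # centroid defuzzification

print(mamdani(75.0))  # a crisp number: BCWW ends with a number, not a word
```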
Many theoretical approaches have been proposed to formalize CWW, among which are: 2-tuple models [98], rough sets [207], concept algebra [273], fuzzy arithmetic [48], voting model semantics [141], ontologies [93,222], Turing Machines [209,260], fuzzy automata [313], formalization of the Generalized Constraint Language (GCL) [125] (GCL was proposed by Zadeh [326,330,338]), fuzzy Petri nets [30], meta-linguistic axioms [256], approximate reasoning [299], and Perceptual Computing (Per-C) [164,166,173,194].

It is not surprising that different people have different viewpoints on CWW [175]. In many of the above contributions, the term CWW has been used inclusively, in the sense that any application of fuzzy logic for reasoning with words has been considered an instance of CWW. While there is no problem with such a viewpoint (especially since the founding father even classifies rule-based function approximation applications of fuzzy logic as BCWW), we prefer to view CWW as a methodology of computation whose inputs, reasoning procedures, and outputs involve natural language words, phrases, and propositions rather than just numeric values. Nevertheless, the boundaries of CWW seem to be fuzzy themselves. In fact, there are CWW applications for which the assignments of soft constraints are not implicit [277], and are even performed through fuzzy IF-THEN rules [282], but whose inputs and outputs are still natural language words. Those applications also usually do not deal with linguistic truth, possibility, or probability, something that is expected of ACWW problems. On the other hand, there are applications of fuzzy logic for function approximation that deal with linguistic truth [78,228] and linguistic probability [66,144,146,249].

We adhere to two tests for calling a computational method CWW [173] (detailed discussions about these tests are given in [173, pp. 312-313]), both of which we suggest must be passed, or else the work should not be called CWW. A third test is optional but strongly suggested. The tests are:

(i) A word must lead to a membership function rather than a membership function leading to a word.
(ii) The output from CWW must be at least a word and not just a number.
(iii) Because words mean different things to different people, they should be modeled using at least Interval Type-2 Fuzzy Sets (IT2 FSs).

Test number 3 is "optional" so as not to exclude much research on CWW that uses Type-1 Fuzzy Sets (T1 FSs), even though we strongly believe that this test should also be a requirement for CWW, since a T1 FS cannot simultaneously capture intra-personal and inter-personal uncertainties about a word, whereas an IT2 FS can.

(Footnote: Another test appeared in [173]: numbers alone may not activate the CWW engine (e.g., IF-THEN rules), because numbers are modeled as singleton FSs and there is nothing fuzzy about them. Jon Garibaldi (Univ. of Nottingham) has pointed out, in a conversation with the first author, that in, e.g., a medical application, patient data may all be numbers, rules are used, and the outputs are not defuzzified but are instead mapped into a linguistic output (with similarity). We agree with this, and have therefore dropped that test.)
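The "mapped into a linguistic output (with similarity)" step in the footnote above is easy to make concrete. The sketch below maps a computed output fuzzy set back to the most similar word in a vocabulary using the Jaccard similarity; the T1 word models on a [0, 1] probability scale are invented for illustration (IT2 FS versions of this kind of decoder are what the later chapters actually use).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)  # probability scale

def tri(a, b, c):
    """Sampled triangular T1 FS with support [a, c] and peak at b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

def jaccard(mu_a, mu_b):
    """Jaccard similarity of two sampled T1 FSs."""
    return np.minimum(mu_a, mu_b).sum() / np.maximum(mu_a, mu_b).sum()

vocabulary = {               # hypothetical word models, not the dissertation's
    "Unlikely": tri(0.00, 0.15, 0.35),
    "Tossup":   tri(0.35, 0.50, 0.65),
    "Likely":   tri(0.60, 0.80, 1.00),
}
output_fs = tri(0.55, 0.75, 0.95)  # some fuzzy set computed by a CWW engine

word = max(vocabulary, key=lambda w: jaccard(output_fs, vocabulary[w]))
print(word)  # -> Likely: the decoder returns a word, satisfying test (ii)
```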
Although Zadeh has partitioned CWW into only BCWW and ACWW, we feel that CWW should be partitioned into BCWW, Intermediate CWW (ICWW), and ACWW, as depicted in Fig. 1.1, which is our schema of the viewpoint of Fuzzy Logic = Computing with Words. To avoid confusion, in this chapter we call the function approximation applications of fuzzy logic BCWW; they deal with fuzzy systems that have numerical outputs, like fuzzy rule-bases, fuzzy classifiers, and fuzzy clustering. We reserve the name ICWW for those CWW problems that involve simple assignments of attributes, including the usage of IF-THEN rules [285] or aggregation operators such as Ordered Weighted Averages (OWAs) [300], Fuzzy Weighted Averages (FWAs) [59,82,150], or Linguistic Weighted Averages (LWAs) [277], but whose outputs are linguistic. In linguistic summarization, data lead to a model that may involve simple assignments of attributes through IF-THEN rules, as well as implicit assignments of attributes through linguistic quantifiers (e.g., most people who aren't exposed to sunlight have serious vitamin D deficiency). Thus, linguistic summarization seems to have some features of ACWW, but those features appear in the answer that it yields, not necessarily in the world knowledge that it deals with, which is why we have located it in ICWW. Observe in Fig. 1.1 that fuzzy IF-THEN rules occur in both BCWW and ICWW problems. ACWW problems involve intricate assignments of truth, probability, and possibility. Observe that the Perceptual Computer (Per-C) methodology can address both ICWW and ACWW problems, which is illustrated in Fig. 1.1 by showing an overlap between Per-C and both ICWW and ACWW.

[Figure 1.1: Fuzzy Logic = Computing with Words: BCWW, ICWW, and ACWW. The focus of this figure is the applied facet of fuzzy logic; it excludes other aspects of it, such as the logical and pure mathematical aspects.]

In this chapter, we focus mainly on ACWW and provide a roadmap for solving ACWW problems. We briefly review how truth, probability, and possibility can be dealt with in the framework of ACWW, sketch solutions to some selected ACWW problems using different methodologies, and provide a perspective on the status, challenges, and future of ACWW.

1.2 Some ACWW Problems

Zadeh has introduced some ACWW problems that involve everyday reasoning and decision making with linguistic probabilities [326,333,338,339,341]. Some of those problems are given in Table 1.1. To demonstrate how such problems occur in more realistic statements, we show, in Table 1.2, the engineering versions of the problems in the first four rows of Table 1.1. We do this because we strongly believe that for ACWW to be taken more seriously, its problems must be shown to be of more practical relevance.

1.3 Linguistic Probabilities

Since the inception of fuzzy logic, there has always been a controversy over its relationship with probability theory [324]. It is now widely accepted that fuzzy logic and probability theory are "complementary rather than competitive" [324]. Perhaps the first serious attempt to connect fuzzy set theory to probability theory was made by Zadeh himself in [317], where he devised a numeric probability measure for fuzzy events.
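In that measure, the probability of a fuzzy event $A$ is $P(A) = \int \mu_A(x)\,p(x)\,dx$, the expected membership of $A$ under the density $p$. Here is a minimal numerical sketch, assuming a shoulder membership function for "Tall" and a Gaussian height density, both invented purely for illustration:

```python
import numpy as np

h = np.linspace(100.0, 220.0, 12001)                # height grid, in cm
mu_tall = np.interp(h, [170.0, 185.0], [0.0, 1.0])  # shoulder MF for "Tall"
p = np.exp(-0.5 * ((h - 175.0) / 7.0) ** 2) / (7.0 * np.sqrt(2.0 * np.pi))

# Zadeh's probability of the fuzzy event "Tall": expected membership under p,
# approximated by a Riemann sum on the uniform grid.
dh = h[1] - h[0]
print(np.sum(mu_tall * p) * dh)
```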
In natural languages, one deals with quantifiers like most, few, some, and a lot, and with linguistic probabilities such as likely, unlikely, etc. Linguistic quantifiers represent a soft constraint on the portion of a population, and linguistic probabilities represent the linguistic uncertainty about the probability of occurrence of some event. Zadeh established a framework based on fuzzy set theory for analyzing fuzzy quantifiers [320] and for syllogistic reasoning with them [322]. In these works, he showed that linguistic quantifiers have a very close relationship to linguistic probabilities, i.e., he showed that, mathematically, a fuzzy quantifier can be viewed as a linguistic probability. Additionally, studies in the field of psychology have demonstrated that probability terms in natural languages are better modeled by intervals and fuzzy sets [258], i.e., by interval and fuzzy probabilities. Together, these facts call for a mathematical framework in which fuzzy probabilities can be analyzed rigorously.

Zadeh invented a notion of fuzzy probability for fuzzy events [321], one that relies on the concept of the fuzzy $\Sigma$-count of a fuzzy set. Yager has a totally different approach to calculating the fuzzy probability of a fuzzy event [288,290], one in which the fuzzy probability of a fuzzy event is constructed by calculating the probabilities of each $\alpha$-cut of that event. Although the abovementioned attempts at formalizing the notion of fuzzy probabilities are elegant, they do not go through an axiomatic approach to the definition of fuzzy probabilities. Recently, however, Halliwell and Shen [88,89] have provided a mathematically sound axiomatic approach to defining a fuzzy probability measure, one that can be considered the general setting for analyzing fuzzy probabilities, and it is the one that we shall adopt.
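To contrast the two constructions just described: Yager's route replaces the single expectation with a family of crisp probabilities, one per level, pairing each $\alpha$ with $P(A_\alpha) = \int_{A_\alpha} p(x)\,dx$ and assembling those level/probability pairs into a fuzzy probability. A rough sketch of that first step, reusing the same illustrative "Tall" model and Gaussian density as in the previous listing (again purely hypothetical numbers):

```python
import numpy as np

h = np.linspace(100.0, 220.0, 12001)
mu_tall = np.interp(h, [170.0, 185.0], [0.0, 1.0])  # shoulder MF for "Tall"
p = np.exp(-0.5 * ((h - 175.0) / 7.0) ** 2) / (7.0 * np.sqrt(2.0 * np.pi))
dh = h[1] - h[0]

for alpha in (0.25, 0.50, 0.75, 1.00):
    cut = mu_tall >= alpha                 # crisp alpha-cut of the fuzzy event
    print(alpha, np.sum(p[cut]) * dh)      # its ordinary (crisp) probability
```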
In his recent articles, Zadeh advocates constructing a computational theory of perceptions [326,329]. The great ability of humans to make decisions in the presence of uncertainty is of special interest to the AI community; such a feature is due to humans' capability for manipulating imprecise perceptions. As a result, humans like to interact using words and to evaluate situations using everything from numbers to intervals of numbers to words. Hence, it is highly desirable to have a computational landscape for analyzing perceptions, and Zadeh argues that fuzzy logic provides such a landscape. Humans' perceptions can lead to their assessments of the probabilities of events in real life: they assign linguistic words to the probabilities of events. An important problem is how to infer from such linguistic probabilities using the mathematics of fuzzy logic. Mendel has argued that a scientifically correct model for a word is a type-2 fuzzy set [168,175]: because words mean different things to different people, their uncertainty can be modeled using type-2 fuzzy sets, whereas it cannot be using type-1 fuzzy sets, and the linguistic uncertainties captured by such fuzzy sets can propagate through all computations. Linguistic quantifiers and linguistic probabilities are among the important natural language information carriers through which people describe their perceptions and decisions; hence, a formal framework for the manipulation of type-2 fuzzy probabilities is needed.

1.4 Advanced Computing with Words and Linguistic Probabilities

Computing with words (CWW) is believed to play a pivotal role in the mainstream of research on automation of everyday reasoning and decision-making [93], [142], [206], [342]. Everyday reasoning exploits information stated in natural language; therefore, pertinent techniques are needed to describe and interpret information stated in natural language propositions, to combine information coming from multiple sources with different degrees of reliability, and to make decisions based on such information.

In essence, CWW can be viewed as a computational theory of perceptions [328]. Human perceptions are about physical and mental objects like length, color, speed, time, direction, strength, likelihood, truth, intent, etc. The key challenge for building a computational theory of perceptions is that perceptions of such objects are imprecise, due to the limited accuracy of the human sensorimotor system. As a result, one of the central concepts of CWW is "precisiation of meaning," which calls for building a computational model pertinent to the semantics of a statement in a natural language. This is mainly performed by assigning fuzzy sets to linguistic attributes, be they possibilistic, probabilistic, or veristic [337]. It is also believed that for CWW, a viable method of completing this task is collecting data about a linguistic constraint from subjects, and building type-2 fuzzy sets based on the empirical data [151].

As we mentioned earlier, Zadeh recently proposed that two levels of CWW can be distinguished [170]. In basic (level-1) CWW, information carriers are numbers, intervals, and words. Propositions are usually simple "assignment" ones like: X is between 3 and 4 ($X \in [3,4]$), X is very small, or If X is high, then Y is low. In advanced (level-2) CWW, information carriers can also be more complicated natural language propositions involving modifiers, truth values, possibilistic constraints, linguistic probabilities, usuality constraints, etc. Essentially, in level-2 CWW, the assignment of constraints in such statements can be implicit; e.g., using the modifier "Some" in the statement "Some PhD students at the University of Southern California are Iranian" implies assigning a linguistic value (Some) to the portion of USC's PhD students who are Iranian. Zadeh asserts that in level-2 CWW, propositions (and not just words) are precisiated, i.e., the whole statement is reduced to the assignment of generalized constraints to some variables; therefore, level-2 CWW must exploit techniques for capturing the semantics of a proposition in a natural language in terms of the semantics of its building blocks.

As was mentioned earlier, a scientifically correct first-order model for a word is a type-2 fuzzy set. Zadeh also predicts that type-2 fuzzy sets will play a central role in Advanced (or level-2) CWW [170]; thus, it can be envisioned that type-2 fuzzy sets will be used as the rudiments of Advanced CWW in the forthcoming years. In essence, we view CWW as the interactions between a group of people and at least one other person (or a computer). Hence, the group's perception of a word needs to affect the uncertainty about the word in the word's fuzzy set model. This can be viewed as the main motivation for using type-2 fuzzy sets in CWW. Moreover, as will be seen in the sequel, type-2 fuzzy sets provide a natural framework for providing uncertain numeric values like "about 20%" as solutions to CWW problems, where the term "about" captures the inter-person and intra-person uncertainties propagated via information aggregation methods.

One of the most important problems in CWW is aggregation of linguistic information for hierarchical multi-criteria decision making. Linguistic Weighted Averages are believed to provide an appropriate framework for performing this task [277], [279], especially when we deal with words precisiated with type-2 fuzzy sets.
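To show the mechanics behind such averages, here is a minimal $\alpha$-cut implementation of the fuzzy weighted average $y = \sum_i w_i x_i / \sum_i w_i$ for T1 fuzzy numbers (the LWA is its IT2 generalization). All the triangular numbers below are invented for illustration. Because the objective is monotonic in each $w_i$, its extrema over the box of weight intervals occur at vertices, so for a handful of terms the endpoints can be found by enumeration; the Karnik-Mendel algorithms do the same job far more efficiently.

```python
from itertools import product

def tri_cut(a, b, c, alpha):
    """Alpha-cut [left, right] of a triangular fuzzy number (a, b, c)."""
    return (a + alpha * (b - a), c - alpha * (c - b))

def fwa_cut(x_cuts, w_cuts):
    """Alpha-cut of the FWA: extremize sum(w*x)/sum(w) over the weight box.
    x is set to its left (right) endpoint for the lower (upper) bound."""
    vertices = list(product(*w_cuts))  # all corners of the weight box
    lo = min(sum(w * x for w, (x, _) in zip(v, x_cuts)) / sum(v) for v in vertices)
    hi = max(sum(w * x for w, (_, x) in zip(v, x_cuts)) / sum(v) for v in vertices)
    return lo, hi

scores  = [(6.0, 7.0, 8.0), (4.0, 5.0, 6.0), (8.0, 9.0, 10.0)]  # fuzzy scores
weights = [(0.6, 0.8, 1.0), (0.2, 0.4, 0.6), (0.7, 0.9, 1.0)]   # word weights

for alpha in (0.0, 0.5, 1.0):
    x_cuts = [tri_cut(*s, alpha) for s in scores]
    w_cuts = [tri_cut(*w, alpha) for w in weights]
    print(alpha, fwa_cut(x_cuts, w_cuts))
```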
A set of challenge problems for testing CWW methodologies has been proposed in [336], [325], [339], [333]. Perhaps the most famous challenge problem in this set is the so-called "tall Swedes" problem: Most Swedes are tall. What is the average height of Swedes? One of the most challenging problems in ACWW is the manipulation of propositions containing linguistic and imprecise probabilities [340], [170], and some of Zadeh's challenge problems deal with linguistic probabilities.

1.5 Outline of The Present Dissertation

This dissertation presents some advances in data-oriented uncertainty modeling, reasoning, and intelligent decision-making that are relevant to the paradigm of Computing with Words. Basic Computing with Words mainly deals with applications of rule-based systems. In this dissertation, we present a linguistic goal-oriented decision-making method based on rule-based systems. The method first establishes a rule-base that describes the output of a system in terms of some independent variables and a decision variable, based on historical data that are available about the system. Those descriptive rules are then grouped, and the conditions that lead to particular linguistic outcomes (e.g., Low, Medium, or High production of an oil well) are determined. Then, the descriptive rules in each group are compared to those of a group with a more desirable outcome, so that the necessary changes in the decision variable that move the system from a less desirable outcome towards a more desirable outcome are determined. Consequently, decision rules are synthesized that prescribe future changes in the decision variable based on the current value of the decision variable and the values of the independent variables. Since this procedure resembles making decision or control rules in natural language, one can call it goal-oriented control with words, and it can be seen as another bridge between rule-based systems and the paradigm of computing with words.

The main direction of the rest of the present research is towards using linguistic probabilities for reasoning in the presence of uncertainty. Linguistic probabilities carry both probabilistic and linguistic (fuzzy) uncertainties. We will show how they are modeled using data collected from subjects, how one can perform inference using them, and how they can be employed to solve Advanced Computing with Words problems. In particular, we show how natural language statements about linguistic probabilities can be modeled as Linguistic Belief Structures.
In Chapter 6, we implement Zadeh’s methodology of handling linguistic probabilities, which involves the Generalized Extension Principle. As Zadeh’s methodology is based on Type-1 Fuzzy Sets (T1 FSs), we implement it for T1 FSs. In Chapter 7, we present various versions of the Generalized Extension Principle for Interval Type-2 Fuzzy Sets, and implement solutions to Advanced Computing with Words problems using them. In Chapter 8, we use Syllogistic Reasoning and the methodology of Linguistic Belief Structures that were developed in Chapter 4, to solve Advanced Computing with Words challenge problems that were proposed by Zadeh. Finally, in Chapter 9, we present a critical review of the status, challenges, and future of Advanced Computing with Words. 12 Table 1.1: Some of Zadeh’s ACWW problems involving linguistic probabilities and linguistic quantifiers. World Knowledge Problem Statement Most Swedes are tall. What is the average height of Swedes? Probably John is tall. What is the probability that John is short/very tall/not very tall? Most Swedes are much taller than most Italians. What is the difference between the average height of Swedes and the average height of Italians? Usually Robert leaves the office at 5 p.m. everyday and usually it takes him about an hour to get home. At what time does Robert get home? What is the probability that he is home before 6:15 p.m.? Vera has a son in mid-twenties and a daughter in mid-thirties. Usually mother’s age at birth of a child is between approxi- mately 20 and approximately 40. What is Vera’s age? Most Swedes are tall, and most tall Swedes are blond. What is the probability that Magnus (a Swede picked at random) is blond? Usually, most United flights from San Francisco leave on time. I am scheduled to take a United flight from San Francisco. What is the probability that my flight will be delayed? Usually several cars are stolen every day in Berkeley. What is the average number of cars stolen per month in Berkeley? X is a real-valued random variable. Usu- allyX is much larger than approximatelya and usuallyX is much smaller than approx- imatelyb. What is the probability thatX is approxi- matelyc, wherec is a number betweena and b? A and B are boxes, each containing 20 balls of various sizes. Most of the balls in A are large, a few are medium and a few are small; and most of the balls in B are small, a few are medium and a few are large. The balls in A and B are put into box C. What is the number of balls in C which are neither large nor small? A box contains about 20 balls of various sizes and there are many more large balls than small balls. What is the number of small balls? 13 Table 1.2: Engineering versions of some of Zadeh’s ACWW problems. World Knowledge Problem Statement Most of the products of Company X have somewhat short life-times. What is the average life-time that is ex- pected from the products of Company X? Probably product X is highly reliable. What is the probability that the reliability of X is low? Most of the products of Company X have much lower maintenance costs than most of the products of Company Y . What is the difference between the average maintenace costs of the products of Com- pany X and the average maintenace costs of the products of Company Y? Usually Product I lasts for about 3 years and is then replaced by a refurbished one, and usually, the refurbished Product I lasts for about 2 years. What is the probability that a new Product I is not needed until the seventh year? 
Chapter 2

Linguistic Goal Oriented Decision Making with Rule Based Systems

Setting a goal is not the main thing. It is deciding how you will go about achieving it and staying with that plan.

Tom Landry, American football player and coach

We present a novel goal oriented decision making algorithm that uses a rule-based description of a system to build a set of fuzzy rules in order for the output of that system to attain a linguistic goal. The linguistic goal oriented decision making (LGODM) method uses a set of descriptive rules that are synthesized using historical data. Those descriptive rules are grouped according to their consequents, so as to identify the situations in which more desired outcomes occur. The rules having a less desirable goal in their consequent are compared to those with a more desirable consequent to identify pairs of rules that have exactly the same antecedents (excluding the antecedent that describes the decision variable), but have different consequents. By doing this, we wish to find situations in which all of the influential variables except the decision variable are commensurate with more desired outcomes. Such situations call for a change in the decision variable so that the desired outcome is attained; therefore, we construct rules that express what changes in the decision variable have to be made in such situations to drive the system from a less desirable outcome to a more desirable outcome. The concept of exact matching is sometimes too restrictive and may lead to not using some of the rules, since their antecedents do not match any other rule exactly; hence, we extend the LGODM strategy using the concept of similarity, and construct weighted decision rules. We then demonstrate how such decision rules can be pruned if they include groups of rules that share exactly the same antecedents and consequents, but different weights. We apply the linguistic decision making method to the problem of designing a decision system for changing the choke setting in oil wells with Cyclic Steam Stimulation to enhance their production rates. We then train neural network models of wells using historical data, which act as virtual wells, and validate LGODM by applying it to those virtual wells.

2.1 Introduction

Fuzzy logic [316] has demonstrated its immense strength in applications [176] such as control [201, 204, 302, 312] and decision making [19, 41], especially through the calculi of fuzzy IF-THEN rules [334]. A topic of interest in fuzzy decision making is goal oriented decision making [72, 180, 181, 227, 307], which focuses on decision making for attaining goals that are specified in natural languages.

Rule-based fuzzy systems have been used in the field of mobile robotics to make a mobile robot reach a prespecified goal (which is usually a location), and have included reactive behavior to events (e.g., collision avoidance). In mobile robotics, the rules that make the robot attain its goal are known from common sense, and are provided to the robot [12, 152, 246, 354]. In [307], a layered structure of rules was devised for goal-oriented mobile navigation, where the outputs of the rule-bases in each layer were fused so that the robot accomplished sub-goals that let it get closer to the desired goal while also avoiding obstacles.
In [311], a population-based search algorithm was used to determine the initial values of input nodes that cause some other nodes in a fuzzy cognitive map [133] to reach a specified value or wind up in a limit cycle, indicating that a certain non-fuzzy goal was attained.

Modeling goals as fuzzy goals or soft goals has also found applications in the field of Requirements Engineering [234], which is a sub-area of Software Engineering that formulates, maintains, and documents software requirements. Fuzzy IF-THEN rules can be provided by experts for attaining the fuzzy goals associated with functional requirements for software. Another area that found goal oriented design using fuzzy sets useful was the design of VLSI systems [69], where characteristics of the final system can be specified using natural language words, which means that the design goals are fuzzy.

The common feature of goal-oriented methods that deal with fuzzy goals is that they rely on rules for reaching goals that are given by experts, mainly based on common sense. In reality, the systems that are being dealt with today may be so complex that it is not easy for experts to provide rules for attaining system goals. One might imagine that this problem could be addressed by using online unsupervised or semi-supervised learning methods, e.g., reinforcement learning (and all of its variations, including Q-learning), because they are able to cope with goal directedness [130], even when goals are fuzzy [25]. Examples of such approaches for training fuzzy systems were proposed in [20, 67, 77, 113, 148, 149, 304, 349, 350]. However, some real complex systems may be very slow, in the sense that it may take a lot of time for the decision maker to learn how to guide the system towards the goal, and this may result in, for example, great economic losses. Sometimes, as in the case of this chapter, historical data about the system over a long period of time may be available, and that data can be mined to find a decision maker that drives the system towards its goal, instead of training a decision maker online for attaining the goals.

The above discussion may give rise to a solution to goal oriented problems that involves training a decision maker that mimics a human decision maker who knows how to attain the goals. An example of such an approach was given in [265, 266], where a fuzzy system was given examples of the behavior of a driver, so as to learn how to park a truck. Unfortunately, a human decision maker who knows how to attain the goals is a luxury that is not available for all applications. Sometimes real life processes are so complex that it is not possible for humans to know how to move such complicated systems towards their goals. This can simply be a result of there being too many variables that affect the output of the system for a human to handle, or a direct result of the complexity of the system. Even when a complex system can be managed by human decision makers, its complexity may not permit them to make the best possible decisions.

In this study, we present a new methodology for decision making and control with goals that are described in natural languages, called Linguistic Goal Oriented Decision Making (LGODM). The essence of this approach is to transform a set of natural language IF-THEN rules that describe the behavior of the output of a system into a set of decision rules that let the system achieve a goal that is also described linguistically.
These rules can either be extracted from data [4, 105, 114, 195, 265, 269, 303] or may be provided by experts. Examples of statements that describe goals for specific systems are: “Increase the amount of flow” in a process control system; “Increase the profit” in a business activity; “Increase the production rate” for an oil well; “Reduce the frequency of vibrations” in a mechanical system; and “Achieve medium glucose levels in the blood” for the human body.

In order to realize which antecedents result in a specified outcome (e.g., a moderate blood glucose level), in our methodology we first group the rules according to their consequents. Note that classification of rules according to their consequents to linguistically determine the situations (antecedents) that result in a certain outcome (consequent) was performed in [32, 147]. In particular, in [32], conditions that result in particular levels of trustworthiness in internet marketing were obtained by determining the rules that have a certain level of trustworthiness in their consequents.

Grouping the descriptive rules according to their consequents may be confused with rule-based classification; however, rule grouping is very different from using rule-based systems for classification [4, 43, 104]. More specifically, in order to solve classification problems, prototype patterns are fed into a fuzzy system, and its structure and parameters are modified so that its defuzzified output value determines the class into which each pattern falls as accurately as possible. On the other hand, in our methodology, a fuzzy system consisting of IF-THEN rules is first established, using historical data, that describes the relationship between some influential variables and the output of a system. Those descriptive rules are then grouped according to their consequents to realize which conditions result in a particular outcome. Finally, decision rules are constructed that determine what actions have to be taken so that the output of the system is forced towards attaining a prescribed goal.

An alternative to LGODM that may come to the mind of a reader is to use any of the various kinds of adaptive control [14, 138], including adaptive neural control [36, 111, 135, 185, 208, 225]. Adaptive control is basically a control methodology that uses signals that carry information about the behavior of a system (e.g., the tracking error between the output of the system and a reference model) in order to tune the parameters of a controller so that the performance of the controller improves over time. Consequently, when there are descriptive rules about a system or control rules provided by experts, one should be able to improve them by tuning their parameters (e.g., the parameters of the fuzzy sets used in those rules) using pertinent error signals; this is the essence of adaptive fuzzy control [34, 101, 155, 237, 238, 244, 255, 263, 267, 347], as well as fuzzy neural network-based adaptive control [35, 275]. Adaptive fuzzy control is often done in two ways: indirect adaptive fuzzy control and direct adaptive fuzzy control. Indirect adaptive fuzzy control involves fuzzy systems that are constructed using knowledge about the plant, i.e., the descriptive rules. The knowledge about the plant is used to design a controller that cancels out the nonlinearity of the system and forces it to obey the dynamics of a stable linear system satisfying some desired performance criteria for tracking a reference model.
Moreover, some parametric rules (e.g., rules whose consequent fuzzy sets are described by free parameters) are used along with the rules that describe the plant. Since the control law is designed based on the descriptive rules, this means that some parameters are used in the control law that change in time. An adaptation law (often based on Lyapunov synthesis) is used to change those parameters over time so that the tracking error of the closed-loop system obeys a linear system with desired properties.

Direct adaptive fuzzy control begins with knowledge about controlling the plant in terms of fuzzy IF-THEN rules, incorporates some parametric control rules into such knowledge, and uses an adaptation law to adjust the parameters. Those parameter changes result in consecutive decrements in a Lyapunov function that guarantees asymptotic convergence of the output of the system to the output of a reference model.

In both the indirect and direct adaptive fuzzy control methods, parameters have to be adapted during the operation of the system. As mentioned before, some complex systems are so slow that it is not feasible to wait for them to operate long enough so that an adaptive controller can have its parameters tuned for making them obey a desired behavior. Also, expert knowledge or mimicry of previous operations by humans may be inadequate to operate such systems. Consequently, adaptive control methods and rule-based mimicry of human operators are inadequate for designing the behavior of such a system. On the other hand, there might be a lot of historical data available about such a system. Numerical data about a system can be used to build a descriptive rule-based model of the system. Here, we provide a methodology that converts such a descriptive rule-based model into a decision maker that causes the system to attain some prescribed goals. This methodology does not rely on online tuning of parameters to make the system obey a desired behavior; therefore, it is not an adaptive method. Instead, it transforms a set of descriptive rules into a set of decision rules, based on mining the situations where the system’s behavior is in accord with a desired goal, and drives the system towards such situations.

The rest of this chapter is organized as follows: In Section 2.2, we provide a heuristic approach to LGODM with rule-based systems that uses grouping of the descriptive rules and exact matching. In Sections 2.3 and 2.4, we extend exact matching to similarity matching. In Section 2.5, we provide a method for pruning the Section 2.3 weighted decision making rules. In Section 2.6, we demonstrate how the decision system is implemented. In Section 2.7, we apply LGODM to a choke that controls the flow of fluids coming from oil wells whose production is enhanced by using steam injection. Finally, in Section 2.8, we provide some conclusions and future work.

2.2 A Heuristic Approach to Goal Oriented Decision Making with Rule-Based Systems: Exact Matching

Assume that g(x, u): X_1 × X_2 × ... × X_p × U → Y is a function, and that the variable that can be manipulated by a decision maker is u, while the other variables x = (x_1, x_2, ..., x_p) cannot be directly changed by the decision maker. We call u the decision variable (see footnote 2.1). In particular, when x = (x̃(t), t) = (x̃_1(t), x̃_2(t), ..., x̃_{p−1}(t), t), y(t) = g(x(t), u(t)) = g((x̃(t), t), u(t)) can represent the output equation of a (discrete-time) dynamical system. We choose u(t) so that y(t) satisfies a linguistic goal.
In the framework of control systems, we determine a control inputu(t) for the system at each timet, so thaty(t) gets closer 2.2 to the linguistic goal that is set for it. Next, we provide an outline for a method that performs this task using exact rule matching, and elaborate on each step later in Sections 2.2.1 to 2.2.3. (i). Establish fuzzy sets that describe the variablesx i ,u,y. Assume that the number of those fuzzy sets areN x i ;N u ; andN y , respectively. 2.3 (ii). ObtainL descriptive rules that have the influential variablesx 1 ;x 2 ; ;x p and the decision variableu in their antecedents and the outputy in their consequents using the fuzzy sets obtained in Step (i). 2.4 (iii). Assume that the consequents of the rules are ordered and collected inG =fC(1);C(2);:::; C(M)g. Group the descriptive rules intoM groups, according to thoseM possible conse- quents for the output variabley. 2.5 (iv). Establish the ultimate linguistic goalC = C(q)2G to be achieved, and then determine all of the intermediate goals for achievingC(q). 2.1 The decision variable is also commonly referred to as the manipulated variable or control variable. We prefer the term “decision variable,” because we perceive it to be more general than the other two names. 2.2 A qualitative measure of the closeness ofy(t) to the fuzzy set that describes the linguistic goal can be defined as the distance ofy(t) from the centroid of the fuzzy set that describes the linguistic goal. 2.3 This can be done, e.g., by fuzzy clustering using historical data. 2.4 This can be done by many methods of rule extraction; we use the Wang-Mendel method [265,269] in this chapter. 2.5 MNy , since it is possible that we obtain descriptive rules in which some of theNy fuzzy sets never appear. 22 (v). EstablishW, the set of ordered pairs of intermediate goals that drive the output toC(q). 2.6 (vi). Repeat the following steps for all of the members ofW: (a) Call a member ofW, (C ;C + ). (b) Determine all possible ordered pairs of descriptive rules such that the first one hasC in its consequent (i.e., it is from theC group) and the second one has the desiredC + in its consequent (i.e., it is from theC + group). For each of those ordered pairs: i. Call the elements of an order pair, respectivelyR andR + , where: R : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB theny isC R + : If (x 1 isA + 1 andx 2 isA + 2 and . . . andx p isA + p ) andu isB + theny isC + (2.1) ii. ForR andR + , ifA i = A + i A i ;8i (i.e., if the rule describes a condition where allx i ’s are in exactly the same regime- exact matching), then construct a Transition decision rule (T -Rule), that is derived fromR andR + , as: T (R ;R + ) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isB + B (2.2) 2.6 For example, ifG =fC(1);C(2);C(3)g andC(q) =C(2), thenW =f(C(1);C(2)); (C(3);C(2))g. 23 (c) IfC + =C , i.e. ifR + hasC =C(q) in its consequent (an example of such a rule is shown in (2.3)), construct a sustainment decision rule (S-Rule) as in (2.4), to sustain the output inC . R : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB theny isC (2.3) S(R ) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu is (already)B then u isAboutZero (not changed) (2.4) The collection of all of theT -Rules andS-Rules that are obtained through the above procedure comprise the rule base for LGODM with exact matching. In the following sub-sections, we discuss the Steps (ii)-(vi) of LGODM with exact matching in more detail. 
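Before those details, note that Steps (iii)-(vi) translate naturally into code once rules are represented symbolically. The following is a minimal sketch in Python, assuming rules are encoded as tuples of term labels and deferring all fuzzy-set arithmetic (B⁺ ⊖ B⁻ is rendered as a symbolic string) to the equations that follow; the helper names and the 0-indexed term ordering are our own illustrative choices, not part of the formal development:

```python
from collections import defaultdict

# A descriptive rule is encoded as (x_terms, u_term, y_term), e.g.
# (("L", "H"), "L", "M") reads: If x1 is L and x2 is H and u is L then y is M.

def group_rules(rules):
    """Step (iii): group the descriptive rules according to their consequents."""
    groups = defaultdict(list)
    for x_terms, u_term, y_term in rules:
        groups[y_term].append((x_terms, u_term))
    return groups

def build_W(terms, q):
    """Step (v): ordered pairs of consecutive intermediate goals for goal C(q).

    terms is the ordered list [C(1), ..., C(M)]; q is the 0-based index of C(q).
    """
    up = list(zip(terms[:-1], terms[1:]))                  # (C(i), C(i+1))
    if q == len(terms) - 1:                                # case (i): q = M
        return up
    if q == 0:                                             # case (ii): q = 1
        return list(zip(terms[:0:-1], terms[-2::-1]))      # (C(i), C(i-1))
    # case (iii): 1 < q < M -- climb up to C(q), step down from above it
    down = [(terms[i], terms[i - 1]) for i in range(len(terms) - 1, q, -1)]
    return up[:q] + down

def exact_matching_rules(rules, terms, q):
    """Step (vi): synthesize T-Rules and S-Rules by exact antecedent matching."""
    groups = group_rules(rules)
    t_rules, s_rules = [], []
    for c_minus, c_plus in build_W(terms, q):
        for x_m, u_m in groups.get(c_minus, []):
            for x_p, u_p in groups.get(c_plus, []):
                if x_m == x_p:  # all x_i antecedents agree exactly
                    t_rules.append((x_m, u_m, f"du is {u_p} (-) {u_m}"))
    for x_t, u_t in groups.get(terms[q], []):  # sustain the ultimate goal
        s_rules.append((x_t, u_t, "du is AboutZero"))
    return t_rules, s_rules

terms = ["L", "ML", "M", "MH", "H"]
rules = [(("L", "H"), "L", "M"), (("L", "H"), "H", "MH"), (("M", "M"), "M", "MH")]
t, s = exact_matching_rules(rules, terms, q=3)  # ultimate goal: Medium High
print(t)  # [(('L', 'H'), 'L', 'du is H (-) L')]
print(s)  # two S-Rules sustaining y in MH
```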
2.2.1 Step (ii): Obtain Descriptive Rules

Assume that y = g(x, u) is modeled (by collecting data and expert knowledge) using a set of fuzzy descriptive rules (1 ≤ l ≤ L):

R^l: If x_1 is A^l_1 and x_2 is A^l_2 and ... and x_p is A^l_p and u is B^l then y is C^l    (2.5)

These rules have the input variables x = (x_1, ..., x_p) as well as the decision variable u in their antecedents and the output, y, in their consequent (e.g., for increasing the production rate of oil wells (Section 2.7), the variables pressure, temperature drop, and time are the input variables and the choke setting is the decision variable). In Step (i), a term set has been associated with each variable (e.g., Low, Medium Low, Medium, Medium High, and High for the problem of increasing the production rate of an oil well), and the fuzzy sets in each rule (2.5) are chosen from these term sets.

Footnote 2.7: y, x, and u may or may not depend on t; therefore, we omit their possible dependency on t.

2.2.2 Step (iii): Group The Rules

For LGODM, each of the L descriptive rules is put into a group that is associated with the fuzzy set in its consequent, i.e., rules whose consequents have exactly the same word are collected into the same group. Doing this lets one linguistically describe under what conditions (antecedents) the output is considered to be one of the C(i)’s (each C(i) ∈ {C(1), C(2), ..., C(M)} is a possible term for the output); e.g., for the production rate of an oil well, the rules that have Low, Medium Low, Medium, Medium High, and High in their consequents can be grouped into five groups. Based on this grouping, one can state in natural language in what situations the output y will be in each of the groups. The procedure of grouping the descriptive rules is illustrated in Fig. 2.1.

Figure 2.1: The process of grouping the L descriptive rules into M groups according to their consequents C(i). x = (x_1, x_2, ..., x_p) and A = (A_1, A_2, ..., A_p) are used to express the rules more compactly.

2.2.3 Step (iv): Establish Intermediate Goals

In many decision making problems abrupt changes in the output are not desirable; therefore, it is necessary to drive the output variable to its desired value gradually. To do this, we may set intermediate goals so that y reaches its desired linguistic target by transitioning gradually from one intermediate goal to the next. Assume, e.g., that the output y is described by the following five words: Low, Medium Low, Medium, Medium High, High. Assume, also, that the goal is to make y become High, but to do this gradually; instead of driving y from Low or Medium directly to High, we take a different approach, by establishing decision rules that drive y from being Low
to Medium Low, from being Medium Low to Medium, from being Medium to Medium High, and from being Medium High to High.

Figure 2.2: Rule comparisons flow (Low → Medium Low → Medium → Medium High → High) so as to gradually increment y to become High when it is initially described by either Low, Medium Low, Medium, or Medium High.

Figure 2.3: Rule comparisons flow (High → Medium High → Medium → Medium Low → Low) so as to gradually drive y to become Low when it is initially described by either Medium Low, Medium, Medium High, or High.

Figure 2.4: Rule comparisons flow (Low → Medium Low → Medium ← Medium High ← High) so as to gradually drive y to become Medium when it is initially described by either Low, Medium Low, Medium High, or High.

Fig. 2.2 summarizes how y can be incremented so as to gradually become High. It is only when y starts out as Medium High that no intermediate goal is needed. For y starting out in Low, three intermediate goals are needed; for y starting out in Medium Low, two such goals are needed; and for y starting out in Medium, only one such goal is needed. Assume, next, that the goal is to make y become Low or Medium. Figs. 2.3 and 2.4 show how this can also be done in a gradual manner.

2.2.4 Step (v): Establish W, The Set of Pairs of Consecutive Intermediate Goals

After determining the intermediate goals, the set of ordered pairs of consecutive intermediate goals, W, has to be established. The scenarios that are mentioned in Section 2.2.3 can be summarized as follows. Assume that G = {C(1), C(2), ..., C(M)}, and one wants to drive the output to the goal C(q) by meeting all the possible intermediate goals one by one:

(i). If q = M, i.e., C(q) = C(M), then W = {(C(1), C(2)), (C(2), C(3)), ..., (C(M−1), C(M))}.

(ii). If q = 1, i.e., C(q) = C(1), then W = {(C(M), C(M−1)), (C(M−1), C(M−2)), ..., (C(2), C(1))}.

(iii). If 1 < q < M, then W = {(C(1), C(2)), (C(2), C(3)), ..., (C(q−1), C(q)), (C(q+1), C(q)), ..., (C(M−1), C(M−2)), (C(M), C(M−1))}.

Example 2.1. Assume that G = {Low, Medium Low, Medium, Medium High, High}. Then:

(i). If C(q) = Low, then W = {(H, MH), (MH, M), (M, ML), (ML, L)}.

(ii). If C(q) = High, then W = {(L, ML), (ML, M), (M, MH), (MH, H)}.

(iii). If C(q) = Medium High, then W = {(L, ML), (ML, M), (M, MH), (H, MH)}.

where L, ML, M, MH, and H are acronyms for the members of G.

There are other conceivable scenarios for determining the intermediate goals, which are application dependent. For example, one can imagine that in Example 2.1 the set W is designed so that the output goes from Low to Medium and from Medium to High, skipping some of the intermediate goals. Or, it may be desired that the output goes to Low when it is Medium Low, and when it is Medium or Medium High, it gradually goes to High, in which case W = {(ML, L), (M, MH), (MH, H)}. For the purposes of this chapter, however, we consider the three scenarios: q = M, q = 1, and 1 < q < M. Our algorithms are general and do not depend on a particular choice of W.

2.2.5 Step (vi): Determine Decision Rules Based on Exact Matching

Determining the decision rules for Δu is done by comparing the rules in the rule groups. Assume that one wishes to drive the output from C⁻ to C⁺. Consider the pair of descriptive rules in (2.1), the first of which (R⁻) has C⁻ in its consequent, whereas the second one has C⁺. An exact matching decision rule is constructed only when the antecedents of both rules (in parentheses), excluding u, include exactly the same fuzzy sets.
The main reason for considering such rules is that we seek situations where the variables x 1 ;x 2 ;:::;x p are in a regime so that if only the decision variableu is changed, the output will be driven towardsC + . Comparing the pair of rules in (2.1) suggests that, when8i;A i = A + i A i , the decision variable u must be changed in order to make y go from C to C + ; therefore, a decision rule to obtainC + is: if the variables (x 1 ;:::;x p ) are in the regime that potentially gives the desired outcomeC + (e.g., x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ), butu is not in the regime that gives the desired outcome (e.g.,u isB , but it should beB + fory to be inC + ), then changeu so thaty can go fromC toC + . The specific decision rule, based on the comparison of the rules in (2.1), is shown in (2.2). Note that denotes fuzzy subtraction. The fuzzy setB + B is computed by using fuzzy arithmetic [122] and the Extension Principle [318], as: B + B (z) = sup z=sv min ( B +(s); B (v)) (2.6) 29 Therefore, for each descriptive rule in a rule group with a less desired goal, sayC , one (or more) descriptive rule(s) in the more desired group C + has to be found which has exactly the same antecedents that describex 1 ;x 2 ;:::;x p . Then by rule comparison, one (or more) decision rule(s) in the form of (2.2) has to be synthesized. The only way to have more than one exact match to a descriptive ruleR is that there are multiple rules inC + all of whose antecedents that describex i ’s are exactly the same except the one that describesu. Lemma 2.1. For eachR in the form of (2.1), the maximum number of different rules in the form ofR + in (2.1) that satisfy the condition8i;A i =A + i isN u . Proof. Because the rules in the form ofR + must satisfy8i;A i = A + i and their antecedents are necessarily equal toC + (since they are in theC + group), the only way that can make them different from each other is that they have different fuzzy sets describingu (i.e., differentB + ’s). There areN u different fuzzy sets that describeu, therefore, at mostN u differentR + ’s may exist inC + that satisfy8i;A i =A + i . Once this procedure (finding possible exact match(es) inC + and synthesizing a decision rules based on rule comparison) is completed for all of the descriptive rules inC , all the decision rules for driving the output fromC toC + have been obtained. This procedure is described in more algorithmic rigor in Steps 6.1 and 6.2 that are stated at the beginning of Section 2.2. Those steps have to be performed for all of the pairs of intermediate goals, as established in Step (v). 30 Example 2.2. Consider the following pair of descriptive rules that have exactly the same an- tecedents (excluding the decision variable): R : If (x 1 isLow andx 2 isHigh andx 3 isHigh) andu isLow then y isLow R + : If (x 1 isLow andx 2 isHigh andx 3 isHigh) andu isHigh then y isMedium (2.7) The decision rule that is inferred from the rules in (2.7) is: T (R ;R + ) : If (x 1 isLow andx 2 isHigh andx 3 isHigh) andu isLow then u isHigh Low (2.8) As mentioned, in connection with Figs. 2.2-2.4,C + may not be the final desired output,C , becausey must transition fromC toC + to ... toC , although sometimes (depending onC ), C + = C . Consequently, one not only needs a decision rule for changingu so thaty is driven towardsC , but one also needs a decision rule to maintainu wheny is already in its desired state. Thus, the two decision rules associated with the desired outcomeC + =C are: 31 T (R ;R ) : If (x 1 isA 1 andx 2 isA 2 and . . . 
andx p isA p ) andu isB then u isB B S(R ) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu is (already)B then u isAboutZero (not changed) (2.9) whereR is the rule in the groupC , an intermediate goal from whichy has to be driven towards C (for example, Medium High in Fig. 2.2 and both Medium High and Medium Low in Fig. 2.4). The collection of all of the decision rules serves as the heart of a fuzzy decision system that forces the output to fulfill the linguistic goal that was determined for it. Theorem 2.1. The number of decision rules yielded by exact matching is less than or equal to P ij(C(i);C(j))2W L i N u +L q , whereL i is the number of descriptive rules in theC(i) group, N u is the number of fuzzy sets that describe the decision variable u, and L q is the number of descriptive rules in the group associated withC =C(q). Proof. Assume that (C(i);C(j))2W. According to Lemma 2.1, the number of exact matches in C(j) for a descriptive rule that belongs to group C(i) is less than or equal to N u , therefore the number of exact matches inC(j) for all of the rules inC(i) is at mostL i N u ; hence, the number ofT -Rules is at most P ij(C(i);C(j))2W L i N u . Moreover, The number of sustainment rules isL q , hence the number of decision rules yielded by exact matching is less than or equal to P ij(C(i);C(j))2W L i N u +L q . 32 Corollary 1. If each of the intermediate goalsC(i)2G appears only once as the first element of an ordered pair inW (e.g., as occurs in the three cases given in Section (v)), then the number of decision rules is less than or eqal toLN u +L q , whereL is the number of descriptive rules. Proof. In this case, P ij(C(i);C(j))2W L i = L, hence, according to Theorem 2.1, the number of decision rules is less than or equal toLN u +L q . Example 2.3. In Example 2.2, we showed decision rules that drive the output to the intermediate goal, Medium. Now we show rules that drive the output from Medium High to the final goal, High. The first representative rule is selected from the group whose consequent is Medium High and the second representative rule is selected from the group whose consequent is High, i.e. the desired goal: R : If (x 1 isMediumLow andx 2 isMedium andx 3 isMedium) andu isLow then y isMediumHigh R : If (x 1 isMediumLow andx 2 isMedium andx 3 isMedium) andu isHigh then y isHigh (2.10) 33 The following pair of decision rules rules can be inferred from (2.10): T (R ;R ) : If (x 1 isLow andx 2 isHigh andx 3 isMedium) andu isLow then u isHigh Low S(R ) : If (x 1 isLow andx 2 isHigh andx 3 isMedium) andu is (already)High then u isAboutZero (not changed) (2.11) In anticipation of using similarity in Section 2.3 instead of exact matching, consider again the pair of descriptive rules that are given in (2.1), where the derived decision rule is given in (2.2). Note that this decision rule requires8i;A i = A + i , which can be quantified by means of the following weight: w = 8 > < > : 1 8i;A i =A + i 0 otherwise (2.12) This weight can be incorporated intoT (R ;R + ), as follows: T (R ;R + ) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isB + B with weightw(R ;R + ) (2.13) which means that the decision rule is synthesized only if8i;A i =A + i . 
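The fuzzy subtraction B⁺ ⊖ B⁻ in the consequents of these decision rules can be computed numerically by discretizing (2.6). Below is a minimal sketch, assuming sampled universes and triangular sets of our own choosing (the grids and parameters are illustrative, not values used elsewhere in this chapter):

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function peaked at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzy_subtract(mu_plus, mu_minus, s_grid, z_grid):
    """Discretized (2.6): mu_{B+ (-) B-}(z) = sup_{z = s - v} min(mu_{B+}(s), mu_{B-}(v))."""
    mu_out = np.zeros_like(z_grid)
    for k, z in enumerate(z_grid):
        v = s_grid - z                                  # v such that z = s - v
        inside = (v >= s_grid[0]) & (v <= s_grid[-1])   # keep v within the universe
        if inside.any():
            mins = np.minimum(mu_plus[inside], np.interp(v[inside], s_grid, mu_minus))
            mu_out[k] = mins.max()
    return mu_out

# Choke settings normalized to [0, 1]; "Low" and "High" are illustrative sets.
s = np.linspace(0.0, 1.0, 201)
mu_high = trimf(s, 0.6, 0.85, 1.1)    # B+
mu_low = trimf(s, -0.1, 0.15, 0.4)    # B-
z = np.linspace(-1.0, 1.0, 401)       # domain of the change du
mu_change = fuzzy_subtract(mu_high, mu_low, s, z)
print("peak of B+ (-) B- at du approx", z[np.argmax(mu_change)])  # approx 0.85 - 0.15 = 0.7
```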
It is possible that for a given ruleR i in a group, no rule is found in the target group that has exactly the same antecedents as those ofR i , so that exact matching between the antecedents of 34 the descriptive rules that are being compared is impossible, in which case, the information carried byR i cannot be used in building the decision maker. In the extreme case, there may be few rules with common antecedents found by exact matching, which means that most of the information in the descriptive rules remain unused in the decision rules, so that the decision maker is unreliable. Consequently, we have found that exact matching is too stringent. In the next section we explain how exact matching can be relaxed by using similarity instead of exact matching, and how the weight of the decision rules can be calculated to be a number between 0 and 1 to reflect the degree of matching between the parent rules. In the next section, we explain similarity-based LGODM. 2.3 Extension of Linguistic Goal Oriented Decision Making (LGODM) Using Similarity between Rules To consider the situation where the antecedents of the rules being compared are not exactly the same, we relax to our original decision-making philosphy: to find pairs of rules in two different groups whose antecedents (excepting the one describing the decision variable u) are fired by the same regimes of variables, to find pairs of rules in two different groups whose antecedents, excepting the one that describes u, are fired by regimes of variables that are similar enough. To enforce this, each rule in a group will be compared to all of the rules in the target group to synthesize the decision rules, and similarity of the antecedents in two rules will be used to construct a weight that is analogous tow in (2.12). 35 Assume that we begin with the pair of descriptive rules (R ;R + ) in (2.1), which are repeated here for the convenience of the readers R : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB theny isC R + : If (x 1 isA + 1 andx 2 isA + 2 and . . . andx p isA + p ) andu isB + theny isC + (2.14) According to (2.14),x can beA orA + , andu can beB orB + ; therefore, to movey from C to C + , we consider each of the pairs (A ;B ); (A ;B + ); (A + ;B ), and (A + ;B + ) to see what kind of decision rule (if any) emanates from it. (i). (A ;B ) causesy to be inC . The only change that can be made in this situation is for B to becomeB + , hoping that going from (A ;B ) to the destination (A ;B + ) yields C + . The decision rule whose antecedents are (A ;B ) and consequent isB + B will cause some sort of transition fromC . IfA andA + are similar enough, the destination (A ;B + ) will be similar to the ideal situation of (A + ;B + ), which (according to (2.14)) yieldsC + . Consequently, whenA andA + are similar enough the transition fromC will actually be towardsC + . Thus, a measure of similarity ofA andA + ,w(R ;R + ), will be assigned to the decision rule as a measure of surity that the decision rule contributes to movingy fromC towardsC + . This leads to the following Transition Rule (T -Rule): T (R ;R + ) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isB + B with weightw(R ;R + ) (2.15) 36 Exactly, how to calculatew(R ;R + ) is explained in Section 2.3.1, and its formula is given in (2.17). (ii). 
(A + ;B ) describes a condition wherex isA + , which (according to (2.14)) is commen- surate with y being C + ; however, in this condition, u is B (as inR ), but it needs to becomeB + (as inR + ) so thaty becomesC + . This means that u should beB + B so thaty progresses towardsC + . This leads to the following Progression Rule (P-Rule): P(R ;R + ) : If (x 1 isA + 1 andx 2 isA + 2 and . . . andx p isA + p ) andu isB then u isB + B with weightw(R ;R + ) (2.16) (iii). (A + ;B + ) describes a condition when (according to (2.14)) it is evident thaty is already inC + ; therefore, no decision rule is needed. One may argue that for this antecedent, one needs to design a rule that enforces u to beAroundZero, to maintainy inC + ; however, this is not beneficial for the decision-making system, unlessC + =C , because to reachC usuallyy must move fromC + towards an even more desirable goal,C ++ , using decision rules obtained by comparing the descriptive rules in groupsC + andC ++ . So, keepingy in C + , unlessC + =C is not beneficial. (iv). (A ;B + ) is an interesting pair, because theR + rule in (2.14) indicates that whenu isB + , y should be inC + . Becausex isA and notA + , unlessA is very similar toA + , one cannot obtain a decision rule having (A ;B + ) in its antecedent, and since it is already in B + , even if A = A + , no decision rule is needed for u. Therefore, no decision rule is created for (A ;B + ). 37 So, as a result of examining the four pairs, we see that only two of them lead to a decision rule, one being a Transition rule, and the other being a Progression rule. Both rules involve a weight that will be captured using a measure of similarity. Exactly how to use these rules and calculate the weights are described next. 2.3.1 Steps Needed for Constructing LGODM Based on Similarity Assume that the descriptive rules have been grouped according to their consequents. Also assume that the consequents of the rules are described by fuzzy setsC(1);C(2);:::;C(M), as there are M groups. The following steps are needed to construct a LGODM system based on the concept of similarity of rules. As in exact matching, rules are needed to drive the output of the system towards a more desirable goal (for example from Low to Moderate, and from Moderate to High), and rules are needed to keep the output in the desirable goal. As in Section 2.2, assume that one wishes to design rules that drive the output to the ultimate goal,C(q) = C , whereC is a member of ofW = C(1);:::;C(M). Steps 1-5 of the exact matching procedure in Section 2.2 remain unchanged, and so are not repeated here: Step 6 has many changes, the details of which are discussed in Sections 2.4.1 - 2.4.5. (vi). Repeat the following steps for all of the members ofW, the set of pairs of consecutive intermediate goals: (a) Call a member ofW, (C ;C + ). (b) Determine all possible ordered pairs of descriptive rules such that the first one hasC in its consequent (i.e., it is from theC group) and the second one has the desiredC + in its consequent (i.e., it is from theC + group). For each of those ordered pairs: i. Call the elements of the ordered pair, respectivelyR andR + , as in (2.14). 38 ii. ForR andR + , calculate the following Jaccard rule similarity: w(R ;R + ) = 1 p p X r=1 R Xr min( A r (x); A + r (x))dx R Xr max( A r (x); A + r (x))dx (2.17) iii. ForR andR + , construct aT -Rule and aP-Rule as in (2.15) and (2.16), re- spectively: iv. 
Modify the consequents ofT (R ;R + ) andP(R ;R + ) to obtainM P andN T - Rules that take into account the effect of the rule weight on theT -Rules and P-Rules, as: T M (R ;R + ;w) : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isD (2.18) P M (R ;R + ;w) : If (x 1 isA + 1 andx 2 isA + 2 and . . . andx p isA + p ) andu isB then u isD (2.19) whereD, which accounts for bothB + B andw(R ;R + ), is obtained as: D (z) = min( B + B (z);w(R ;R + )) (2.20) and B + B is computed in (2.6). 39 (c) IfC + =C , i.e. ifR + hasC =C(q) in its consequent (an example of such a rule is shown in (2.3)), construct a sustainment decision rule (S-Rule) as in (2.4), to sustain the output inC . The collection of all of theT M ,P M , andS rules comprise the similarity-based LGODM rules. 2.4 Discussion 2.4.1 Jaccard Similarity between Rules As already mentioned, the credibility of theP andT -rules depends on the similarity of the an- tecedents of the pair of descriptive rules that are being compared; therefore, we have assigned a credibility degree or a weight to the decision rules to attenuate their strength. This weight should reflect how similar the fuzzy sets in the antecedents of the two rules that are being compared are. If they have exactly the same fuzzy sets in their antecedents, the weight should be 1. If all of the antecedents are totally dissimilar (8r;A + r \A r =;), the weight should be 0; and if some of the pairsA r ;A + r are similar, the weight should be a number between 0 and 1. To calculate such a weight, we need to be able to measure the similarity [42, 46, 81] between two fuzzy sets. Jaccard similarity measure [46, 109] is a famous similarity measure for two fuzzy sets on a universe of discourseX, and is calculated as: s J (A;B) = R X min( A (x); B (x))dx R X max( A (x); B (x))dx (2.21) 40 w(R ;R + ) in (2.15), and (2.16) is defined by us as the average Jaccard similarity between thep antecedents of the two descriptive rules 2.8 in (2.14), i.e.: w(R ;R + ) = 1 p p X r=1 s J (A r ;A + r ) = 1 p p X r=1 R Xr min( A r (x); A + r (x))dx R Xr max( A r (x); A + r (u))dx (2.22) 2.4.2 Dealing with Rule Weights There are many ways of dealing with rule weights [106,106,187] in rule-based systems. We deal with them as a degree of truth and take a t-norm between the consequent of the rule and its weight, so as to attenuate the contribution of the rule to the output of the fuzzy system. This is what we have done in (2.20). 2.4.3 Obtain Transition and Progression Rules Fig. 2.5 summarizes our algorithm (C ;C + ) for driving the output fromC toC + . The rules havingC in their consequent are shown asR i C ; i = 1; 2;:::;L , and the rules havingC + in their consequent are shown asR j C + ; j = 1; 2;:::;L + . First, all possible pairs of (R i C ;R j C + ) are constructed. Then, operator acts on each pair to yield two rules and a weight, i.e.: (R ;R + ) = T (R ;R + );P(R ;R + );w(R ;R + ) (2.23) 2.8 The average similarity between the fuzzy sets in two rules can be viewed as a measure of similarity between rules. The concept of similarity between rules was previously applied to the problem of rule-base simplification in [235]. 
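Equations (2.21), (2.22), and (2.20) discretize directly. The following sketch computes the Jaccard similarity of two sampled fuzzy sets by trapezoidal integration, averages it over the p antecedent pairs to obtain the rule weight, and then min-attenuates a consequent by that weight; the triangular sets are illustrative choices of ours:

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function peaked at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapz(y, x):
    """Trapezoidal integration (kept local to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def jaccard(mu_a, mu_b, grid):
    """Discretized (2.21): s_J(A, B) = integral of min over integral of max."""
    den = trapz(np.maximum(mu_a, mu_b), grid)
    return trapz(np.minimum(mu_a, mu_b), grid) / den if den > 0 else 0.0

def rule_weight(ants_minus, ants_plus, grid):
    """Discretized (2.22): average Jaccard similarity over the p antecedent pairs."""
    return float(np.mean([jaccard(a, b, grid) for a, b in zip(ants_minus, ants_plus)]))

x = np.linspace(0.0, 1.0, 201)
low = trimf(x, -0.5, 0.0, 0.5)
med = trimf(x, 0.0, 0.5, 1.0)
high = trimf(x, 0.5, 1.0, 1.5)

# Antecedents of R- and R+ for p = 2 variables: one pair identical, one overlapping.
w = rule_weight([low, med], [low, high], x)
print(f"w(R-, R+) = {w:.3f}")   # (1.0 + s_J(med, high)) / 2

# (2.20): min-attenuate the consequent (a stand-in for B+ (-) B-) by the weight.
mu_D = np.minimum(med, w)
```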
Figure 2.5: The schema for the procedure Ξ(C⁻, C⁺) of obtaining rules to drive the system from a less desirable goal C⁻ to a more desirable goal, C⁺.

R⁻ and R⁺ are dummy variables for the generic rules in (2.14), and T, P, and w are respectively determined from (2.15), (2.16), and (2.17). Next, operator Ψ acts on the three outputs of Φ(R⁻, R⁺) to yield T_M and P_M, i.e.:

Ψ(T(R⁻, R⁺), P(R⁻, R⁺), w(R⁻, R⁺)) = (T_M(R⁻, R⁺), P_M(R⁻, R⁺))    (2.24)

where T_M and P_M are defined in (2.18) and (2.19). We call the process of obtaining all T_M and P_M rules for driving the output from C⁻ to C⁺, Ξ(C⁻, C⁺), where:

Ξ(C⁻, C⁺) = ∪_{i=1,...,L⁻} ∪_{j=1,...,L⁺} {T_M(R^i_{C⁻}, R^j_{C⁺}), P_M(R^i_{C⁻}, R^j_{C⁺})}    (2.25)

Observe that the consecutive action of Φ and Ψ occurs for each pair of rules (R^i_{C⁻}, R^j_{C⁺}), i = 1, 2, ..., L⁻, j = 1, 2, ..., L⁺, and there are two decision rules, T_M and P_M, for each pair of descriptive rules. Consequently, there are a total of 2L⁻L⁺ decision rules for driving the output from C⁻ to C⁺ (how to prune the decision rules is explained in Section 2.5).

Figure 2.6: The schema for performing the Ξ procedure on consecutive pairs of intermediate goals, (C(1), C(2)), ..., (C(q−1), C(q)), (C(q+1), C(q)), ..., (C(M), C(M−1)), for obtaining all of the T_M and P_M rules for moving the system towards the ultimate goal, C*.

2.4.4 Repeat Obtaining Transition and Progression Rules for All Pairs of Consecutive Goals in W

The procedure of obtaining T_M and P_M rules has been explained so far just for two consecutive goals, (C⁻, C⁺) ∈ W, but it has to be repeated for all of the ordered pairs that belong to W. The rules that are obtained by doing this are collected in L_1, where:

L_1 = ∪_{(C⁻, C⁺) ∈ W} Ξ(C⁻, C⁺)    (2.26)

This is summarized in Fig. 2.6 for a particular W, where 1 < q < M.

Example 2.4. If the descriptive rules have the three consequents Low, Medium, and High, and the ultimate goal is Medium, the above procedure has to be performed twice, once for C⁻ = Low, C⁺ = Medium, and once for C⁻ = High, C⁺ = Medium, i.e., for all of the members of W = {(Low, Medium), (High, Medium)}. In other words, Ξ(Low, Medium) and Ξ(High, Medium) have to be calculated. Moreover, if the ultimate goal is High, the above procedure has to be performed twice, once for C⁻ = Low, C⁺ = Medium, and once for C⁻ = Medium, C⁺ = High, i.e., for all of the members of W = {(Low, Medium), (Medium, High)}. In other words, Ξ(Low, Medium) and Ξ(Medium, High) have to be calculated.

2.4.5 Obtain Sustainment Rules

If the target group is the group of rules having the ultimate goal in their consequents, i.e., if C⁺ = C*, operator Λ acts on each of the rules of the group, which are in the form (2.3), to yield the sustainment rule in (2.4), as is depicted in Fig. 2.7. The collection of sustainment rules is L_2, where:

L_2 = ∪_{j=1,2,...,L_q} Λ(R^j_{C(q)})    (2.27)

Figure 2.7: The schema for obtaining the sustainment rules S(R^1_{C*}), ..., S(R^{L*}_{C*}) for keeping the system in the ultimate goal, C*. L* is the number of descriptive rules in the C* group.
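Read as pseudocode, Φ, Ψ, and Ξ are just a few nested loops. The sketch below enumerates the 2L⁻L⁺ weighted rules of (2.25) for one pair of groups, keeping the fuzzy sets symbolic; the crude term-overlap weight stands in for (2.22) and, like the helper names, is our own simplification:

```python
def phi(r_minus, r_plus, weight_fn):
    """(2.23): Phi maps a rule pair to (T-rule, P-rule, weight)."""
    (x_m, u_m, _), (x_p, u_p, _) = r_minus, r_plus
    w = weight_fn(x_m, x_p)
    t_rule = (x_m, u_m, f"du is {u_p} (-) {u_m}")  # T-rule (2.15): antecedents from R-
    p_rule = (x_p, u_m, f"du is {u_p} (-) {u_m}")  # P-rule (2.16): antecedents from R+
    return t_rule, p_rule, w

def psi(t_rule, p_rule, w):
    """(2.24): Psi attaches the weight (later min-ed into the consequent per (2.20))."""
    return (t_rule, w), (p_rule, w)

def xi(group_minus, group_plus, weight_fn):
    """(2.25): all 2 * L- * L+ weighted T_M and P_M rules for (C-, C+)."""
    out = []
    for r_m in group_minus:
        for r_p in group_plus:
            out.extend(psi(*phi(r_m, r_p, weight_fn)))
    return out

# Toy weight: fraction of antecedent terms that agree (a crude stand-in for (2.22)).
term_overlap = lambda x_m, x_p: sum(a == b for a, b in zip(x_m, x_p)) / len(x_m)

low_group = [(("L", "H"), "L", "L"), (("M", "H"), "M", "L")]
med_group = [(("L", "H"), "H", "M")]
for rule, w in xi(low_group, med_group, term_overlap):
    print(w, rule)   # 2 * 2 * 1 = 4 weighted rules
```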
2.4.6 Construct the Rule Base for The Decision Maker The set of rules,L , for the LGODM fuzzy decision system is comprised of the rules that result from applying to all of the members ofW and the rules that result from appling to the rules in the groupC(q) =C , i.e.: 45 L =L 1 [L 2 = 0 @ [ (C ;C + )2W (C ;C + ) 1 A [ 0 @ [ j=1;2;:::;Lq (R j C(q) ) 1 A (2.28) Assume that in a decision rule, eachx i ; (i = 1; 2;:::;p),u, and u can be described by a finite number of fuzzy sets, say N x i , N u , and N u , respectively; then, the number of possible (unweighted) decision rules in the form of (2.15) and (2.16) is: n R N u N u p i=1 N x i (2.29) Assume thatW is the set of all pairs of intermediate goals. We showed that each rule comparison between the rule groups leads to 2L m L k weighted rules, and therefore, 2L m L k weights. Conse- quently, the number of all rule comparisons is P (m;k)j(C(m);C(k))2W 2L m L k . Considering theL q sustainment rules (whose weights are 1), there are a total ofn w weighted decision rules, where: n w X (m;k)j(C(m);C(k))2W 2L m L k +L q (2.30) This means that the number of weights that are obtained is alson w . Lemma 2.2. Assume that the collection of all exact matching rules isL E . ThenL E L . Proof. Obvious, because sustainment rules are the same using either exact matching or similarity, and exact matching decision rules are equivalent to similarity matching decision rules whose weights equal 1, i.e. they are a subset of similarity matching rules. 46 2.5 Pruning The Similarity-Based Decision Rules In this section, we show that it is possible to have decision rules with exactly the same consequents and antecedents, but different weights, and use this fact to prune the rules. To do this, we invoke the following pigeonhole principle: Proposition 2.1 (The Pigeonhole Principle [79]). For natural numbersk;m, ifn > km objects are distributed amongm sets, then one of the sets contains at leastk + 1 objects. This principle is instantiated in real-world situations, for example, when there aren pigeons inm nests (k = 1), andn>m, then at least one nest has two or more pigeons in it. Lemma 2.3. Ifn w >n R , there are LGODM decision rules that have exactly the same antecedents and consequents, but different weights. Proof. Assume that the weights are pigeons and the unweighted rules are the nests. If n w > n R , i.e, when the number of weighted rules is larger than the maximum number of unweighted rules whose antecedents and consequents are in the form of those in (2.15) and (2.16) excluding their weights, there has to be at least one unweighted rule (nest) that has two weights (pigeons) associated with it. In other words, some of the weighted rules have exactly the same antecedents and consequents, but different weights. In this section, we show that it is sufficient to retain only the rule with the largest weight (if they happen to have the same weights, pruning removes the duplicate rules) among these rules. Example 2.5. Assume that one has 30 descriptive rules that havex 1 ;x 2 ;x 3 , andu in their an- tecedents and y in their consequent. Assume that each of the antecedents and the consequent (i.e., x 1 ;x 2 ;x 3 ;u, and y) can be described by three fuzzy sets, e.g. Low, Medium, High, and 47 W =f(Low;Medium); (Medium;High)g If the decision rules are compared for LGODM, there are seven possibilities for the sets that describe u,fAboutZero;Low Medium;Low High;Medium Low;Medium High;High Low;High Mediumg. Therefore, there are 3 4 7 = 567 possible unweighted rules. 
On the other hand, assume that there are 20 de- scriptive rules in each of the three groups, C(1) = Low;C(2) = Medium;C(3) = High, i.e. L 1 = L 2 = L 3 = 20. According to (2.30), the number of weights and weighted rules are 2L 1 L 2 + 2L 2 L 3 +L 3 = 2 20 20 + 2 20 20 + 20 = 1620. Since 1620> 567, by the pigeonhole principle, one can conclude that there are decision rules that have the same antecedents and consequents, but different weights. In Lemma 2.3, we presented a sufficient condition for the set of LGODM decision rules to contain rules that have exactly the same antecedents and consequents, but different weights. In the sequel, we show how to use this to prune the decision rules. In the next Lemma, we demonstrate that for a group of weighted rules with exactly the same antecedents and consequents but different weights, the result of inference will be equal to the result of inference from the rule with the largest weight. Lemma 2.4. Assume that the following set ofM decision rules have exactly the same antecedents and consequents, but different weights: R i : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isE, weight =v i ; i = 1; 2;:::;M (2.31) Assume that min(a;b) is used as both the t-norm and the material implication for Mamdani in- ference from the decision rules in (2.31), and max(c;d) is used for combining the fuzzy sets 48 inferred from them. 2.9 Then, for any arbitrary set of inputs (x 1 ;x 2 ;:::;x p ) = (x 0 1 ;x 0 2 ;:::;x 0 p ) andu =u 0 , the fuzzy set,F (see (2.35)), aggregated from the set ofM rules is equal to the fuzzy setF i ? determined just fromR i ? , the rule with the maximum weightv i ? max(v 1 ;:::;v M ). Proof. We have to modify the consequents ofR i in (2.31) to obtain unweighted rules in the following form: M i : If (x 1 isA 1 andx 2 isA 2 and . . . andx p isA p ) andu isB then u isD i i = 1; 2;:::;M (2.32) where D i(z) = min(v i ; E (z));i = 1; 2;:::;M, andz denotes a variable inZ, the domain of the output of the decision system, u. Assume that the inputs to the decision rulesM i arex r = x 0 r ;r = 1; 2;:::;p, andu = u 0 . The fuzzy inference fromM i yieldsF i (e.g., [176, 201, 204, 268]: F i(z) = (A;B))D i(x 0 ;u 0 ;z) = D i(z)^ A 1 (x 0 1 )^ A 2 (x 0 2 )^ Ap (x 0 p )^ B (u 0 ) = D i(z)^f i (x 0 ;u 0 );i = 1; 2;:::;M;8z2Z (2.33) wherex 0 = (x 0 1 ;x 0 2 ;:::;x 0 p ) andA = (A 1 ;A 2 ;:::;A p ). Note that, sincex 0 andu 0 are the inputs to the fuzzy inference system and are numerical values,f i (x;u 0 ) is a numerical value for fixedx 0 andu 0 ; therefore, F i is written only as a function ofz.f i (x 0 ;u 0 ) is called the firing level ofM i . Because the antecedents and the consequents of allM rules are the same, all of the firing levels 2.9 Note that we use the notation min(a;b) =a^b and max(c;d) =c_d for minimum ofa andb and maximum ofc andd whenever it is more convenient to do so. 49 are the same; hence,8i; f ? (x 0 ;u 0 )f i (x 0 ;u 0 ) = min(min r ( Ar (x 0 r )); B (u 0 )). Consequently, the fuzzy setF i inferred from each of the rules is: F i(z) = min(f i (x 0 ;u 0 ); D i(z)) = min(f i (x 0 ;u 0 ); min(v i ; E (z)) = min(f ? (x 0 ;u 0 );v i ; E (z)) i = 1; 2;:::;M (2.34) Because max is used for aggregation of the rules, the fuzzy setF inferred from all M rulesM i is calculated, as: F (z) = F 1(z)_::: F M (z) = (f ? (x 0 ;u 0 )^v 1 ^ E (z))_:::_ (f ? (x 0 ;u 0 )^v M ^ E (z)) = (f ? (x 0 ;u 0 )^ E (z))^ (v 1 _:::_v M ) (2.35) Becausev 1 _:::_v M = max(v 1 ;:::;v M )v i ?, (2.35) becomes: F (z) =f ? (x 0 ;u 0 )^ E (z)^v i ? =f ? 
(x 0 ;u 0 )^ D i ? (z) = F i ? (z) (2.36) which means that all the rules except forR i ? (i.e., the rule with the largest weight) can be ne- glected without any change in the aggregated output fuzzy set associated with all M rulesM i . Theorem 2.2. Assume a weighted rule base where rules have exactly the same antecedents and consequents, but have different rule weights, i.e.: R ij : If (x 1 isA j 1 andx 2 isA j 2 and . . . andx p isA j p ) andu isB j then u isE j , weight =v ij ; i = 1; 2;:::;M j ;j = 1; 2;:::;N (2.37) 50 wherej denotes the group of rules that have exactly the same antecedents and consequents, andi denotes a rule in one of those groups that has a weight that is possibly different from the weights of the other rules in that group. Then, provided that t-norm, t-conorm, and the implication operator are the same as those in Lemma 5.1, the aggregated output fuzzy set associated with the rule base in (2.37) is equivalent to one associated with a rule base that contains only the rule having the highest weight from each group, i.e.: R i ? j : If (x 1 isA j 1 andx 2 isA j 2 and . . . andx p isA j p ) andu isB j then u isE j , weight =v i ? j ; i ? = arg max(v ij ) i=1;2;:::;M j ; j = 1; 2;:::;N (2.38) Proof. The proof is straightforward using Lemma 5.1. Here, we only give a sketch of the proof. According to Lemma 5.1, inference from thej th group of rules is equivalent to inference from the rule in that group with the highest weight. Because the max t-conorm is used to aggregate the rules, and it is associative, inference from all of the rules can be reformulated as the inference from each group of the rules and then taking a union between all those results. However, inference from each group is equivalent to inference from the rule with highest weight within that group. Therefore, inference from all of the rules is equivalent to taking the union between the results of inference from rules with highest weights within each group, which is equivalent to the inference from (2.38). 51 Theorem 2.2 provides a simple method for pruning weighted rules, 2.10 namely: determine groups of rules with exactly the same antecedents and consequents, and then discard all but one rule that has the largest weight in each group. Our complete LGODM procedure includes rule pruning, i.e., after performing Steps 1-5 of the Section 2.2 procedure: (vi). Repeat the following steps for all of the members ofW, the set of pairs of consecutive intermediate goals: (a) Call a member ofW, (C ;C + ). (b) Determine all possible ordered pairs of descriptive rules such that the first one hasC in its consequent (i.e., it is from theC group) and the second one has the desiredC + in its consequent (i.e., it is from theC + group). For each of those ordered pairs: i. Call the elements of the ordered pair, respectivelyR andR + , as in (2.14). ii. ForR andR + , calculate the following Jaccard rule similarity: w(R ;R + ) = 1 p p X r=1 R Xr min( A r (x); A + r (x))dx R Xr max( A r (x); A + r (x))dx (2.39) iii. ForR andR + , construct aT -Rule and aP-Rule as in (2.15) and (2.16), re- spectively: (c) IfC + =C , i.e. ifR + hasC =C(q) in its consequent (an example of such a rule is shown in (2.3)), construct a sustainment decision rule (S-Rule) as in (2.4), to sustain the output inC . 2.10 Another approach to dealing with the problem of too many rules is to prune those rules that have low credibility by only using those rules whose weights are above a certain threshold. 
In the extreme case, when only rules are kept whose weights equal unity, the similarity approach reduces to the exact matching approach presented in Section 2.2. 52 (vii). Collect all the decision rules that were obtained in Steps 6.1-6.3. Determine groups of decision rules that have exactly the same antecedents and consequents, but possibly have different weights. (viii). In each of the groups obtained in Step (vii), keep the rule that has the largest weight and discard those with smaller weights. (ix). Modify the consequent of all of the Step (viii) rules by taking a t-norm between their mem- bership functions and their weight, according to (2.20), and call these the “modified deci- sion rules”. There areN P such rules. One can also imagine applying rule-base simplification techniques [16,38,128,129,197,253, 310] to obtain an even smaller number of LGODM rules, but doing this is left for future research. 2.6 Implementation of The Fuzzy Decision System The modified decision rules comprise the pruned fuzzy rule base for LGODM. Since the effects of rule weights have been taken into account in the last step of LGODM, standard Mamdani inference along with min t-norm, max t-conorm, singleton fuzzification, and center of gravity defuzzification can be used to implement LGODM. Assume that we obtainedN P pruned rules whose antecedents are modified using their weights, and are therefore, unweighted: R j : If (x 1 isA j 1 andx 2 isA j 2 and . . . andx p isA j p ) andu isB j then u isD j j = 1; 2;:::;N P (2.40) 53 where D j ’s are obtained using (2.20). Assume that the inputs to the decision rules are x r = x 0 r ;r = 1; 2;:::;p, andu =u 0 . The firing level of thej th rule is calculated as: f j (x 0 ;u 0 ) = ^ r=1;2;:::;p A j r (x 0 r ) (2.41) Fuzzy inference from thej th rule yields a fuzzy setF j : F j(z) = (A j ;B j ))D j(x 0 ;u 0 ;z = D j(z)^f j (x 0 ;u 0 ); j = 1; 2;:::;N P ;8z2Z (2.42) The aggregated fuzzy setF is obtained by taking the union betweenF j ’s: F (z) = _ j=1;2;:::;N P F j(z) = _ j=1;2;:::;N P D j(z)^f j (x 0 ;u 0 ) (2.43) The output of the fuzzy decision system, u, is obtained by defuzzification ofF : u = R Z z F (z)dz R Z F (z)dz (2.44) 2.7 LGODM for Enhanced Oil Recovery with Steam In this section, 2.11 we apply the LGODM methodology to enhancing the oil production rate for oil wells that undergo Enhanced Oil Recovery using steam injection. 2.11 Parts of the material in this section about thermal oil recovery are adopted from [3, 8]. 54 2.7.1 Background Oil production in petroleum reservoirs commonly experiences three phases called 2.12 primary, secondary, and tertiary. The primary recovery phase relies mainly upon the natural pressure of the reservoir and grav- ity forces to extract the oil. If needed, artificial lift techniques [70] such as rod pumps can be used to assist the recovery process. The primary recovery phase results in the recovery of between 5% to 15% of the total oil in a reservoir [251]. To extract more amounts of oil in the reservoir, secondary recovery techniques are used that usually include injecting water (water flooding) or gas into the well, so as to force the oil to the surface. Secondary oil recovery contributes to the recovery of 20% to 40% of the total oil in a reservoir [231]. In areas where oil fields are mature, tertiary recovery techniques (also known as enhanced oil recovery techniques) are used that contribute to extracting 30% to 60% of the total oil in the reservoir. 
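Stepping back to Sections 2.5 and 2.6 for a moment before continuing with the application: Steps (vii)-(ix) and the inference equations (2.41)-(2.44) can be prototyped in a few dozen lines. The sketch below prunes duplicate weighted rules to their max-weight representative (as Theorem 2.2 licenses), folds each weight into its consequent as in (2.20), and runs min-max Mamdani inference with centroid defuzzification; all membership functions and rules are illustrative stand-ins, not the ones identified from well data:

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function peaked at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapz(y, x):
    """Trapezoidal integration, kept local to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x_grid = np.linspace(0.0, 1.0, 201)    # normalized x_i and u
z_grid = np.linspace(-1.0, 1.0, 401)   # domain of du
mf = {"L": trimf(x_grid, -0.5, 0.0, 0.5),
      "M": trimf(x_grid, 0.0, 0.5, 1.0),
      "H": trimf(x_grid, 0.5, 1.0, 1.5)}
out_mf = {"PosMed": trimf(z_grid, 0.2, 0.5, 0.8),      # "increase u moderately"
          "AboutZero": trimf(z_grid, -0.2, 0.0, 0.2)}  # "leave u unchanged"

# Weighted decision rules: ((x-terms, u-term), du-term, weight).
weighted = [((("L", "H"), "L"), "PosMed", 0.9),
            ((("L", "H"), "L"), "PosMed", 0.4),   # duplicate with a lower weight
            ((("L", "H"), "H"), "AboutZero", 1.0)]

# Steps (vii)-(viii): within each duplicate group keep only the largest weight.
pruned = {}
for ant, cons, w in weighted:
    pruned[(ant, cons)] = max(pruned.get((ant, cons), 0.0), w)

def infer_du(x1, x2, u):
    """(2.41)-(2.44): min firing, weight folded in as in (2.20), max aggregation."""
    agg = np.zeros_like(z_grid)
    for ((x_terms, u_term), cons), w in pruned.items():
        levels = [np.interp(v, x_grid, mf[t]) for t, v in zip(x_terms, (x1, x2))]
        firing = min(min(levels), np.interp(u, x_grid, mf[u_term]))
        agg = np.maximum(agg, np.minimum(np.minimum(out_mf[cons], w), firing))
    return trapz(z_grid * agg, z_grid) / trapz(agg, z_grid)   # centroid (2.44)

print(f"du = {infer_du(x1=0.2, x2=0.8, u=0.1):.3f}")  # a moderate increase in u
```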
One of the most common methods of enhanced oil recovery is thermal oil recovery, which consists of injecting steam into the well so as to decrease the viscosity of the crude oil, making it flow to the surface. Cyclic Steam Stimulation (CSS), also called Cyclic Steam Injection (CSI) and Huff and Puff [8], is a very common thermal recovery method that involves the cyclic injection of steam into the reservoir (see Fig. 2.8). Each cycle of steam injection has three steps: injection (huff), soaking, and production (puff), and each well goes through a number of cycles to extract crude oil that cannot be extracted using primary and secondary recovery techniques.

In the injection step, steam is injected into the well for a period of time (usually days to weeks) to stimulate the reservoir, i.e., to heat the crude oil in the reservoir to a temperature at which its viscosity is low enough for it to become mobilized. After enough steam has been injected to stimulate the well so that it will continue to produce oil for weeks, in the soaking step the well is shut down, and the steam is left in the well so that it "soaks" for a few days. The well is then opened, and production begins due to the natural flow of liquid. As production progresses over time, the reservoir temperature drops, so that the oil flow rate decreases until it reaches a level low enough to call for another cycle of steam injection, soaking, and production.

Figure 2.8: Cyclic Steam Stimulation. (See Footnote 2.13.)

Footnote 2.13: This image is a work of a United States Department of Energy (or predecessor organization) employee, taken or made as part of that person's official duties. As a work of the U.S. federal government, the image is in the public domain.

The control of both the flow of steam during the injection phase and the liquid offload in the production phase is performed by a choke [139], which is a valve-like apparatus whose opening is stated as a percentage open (0% to 100%) or as an equivalent diameter in 64ths of an inch. According to [8]:

"At the early stages of CSS application, CSS was considered as an old-school oil production method in which operations are ahead of research developments (Ramey et al., 1969). The literature shows that many publications, explaining CSS processes, were based on field experiences rather than research work. There are a lot of unknowns about the process parameters such as the number of stimulation cycles, well orientation and number of wells, operating condition, the increase of water cut, among others. Therefore, in early CSS field applications, the process was performed as a trial-and-error field-scale experiment (Ramey, 1967). After many research studies and field experiences, important technology problems were reduced."

Due to the trial-and-error nature of the cyclic steam stimulation process in its early stages, as well as the complexity of the process, which is such that it is not amenable to accurate physical modeling, there are still mature oil fields that use cyclic steam stimulation but rely on operators and rather ad hoc methods to change the choke, in order to control the flow rate of the liquid during the production step of each cycle, so as to prevent wasting steam when no liquid is coming out of the well. Blowing steam from the well wastes the reservoir's expensive thermal energy, which is very undesirable.
Therefore, automating the choke changes requires an automatic decision-making system.

Due to the lack of proper mathematical models for these fields, a model-based decision-making approach is not an option for automating the choke changes; therefore, one must rely on data-driven modeling and decision-making methodologies to develop a choke decision system. Additionally, due to the complexity of the cyclic steam stimulation procedure and the long time that it takes to complete a cycle, there is considerable skepticism that operators know all of the conditions that result in higher production rates. This rules out designing a decision maker that merely relies on expert knowledge, or that learns to mimic the behavior of experts. On the other hand, the nature of the problem suggests that the offload from the well has three phases: mostly liquid, mixed liquid and gas, and mostly gas, which correspond to high production, medium production, and low production. The decision maker has to change the choke so as to drive the output of the wells towards higher production. In Section 2.7.2, we show how LGODM can be utilized to design such a decision maker for the oil wells.

Synthesis of a choke decision system for CSS seems to be a proper application of LGODM because:

(i). Reliable mathematical models of the oil wells that undergo CSS are not available.

(ii). Decisions are made for the system by human operators, based on heuristics.

(iii). Human-made decisions are not fully reliable.

(iv). A multitude of data is available for oil wells that undergo CSS, because CSS is usually utilized in mature oil fields that have been in operation for many years.

(v). Oil production is a slow process; it would take a long time for a self-adapting or learning controller to "observe" all modes of production and learn how to react.

Consequently, LGODM can be used to convert "low quality" descriptive rules into control rules of "higher quality," i.e., to infer how to change the choke to obtain better production rates than the decisions made by human operators.

2.7.2 Design of a Choke Decision System Using LGODM

The first step in designing an LGODM system is selecting the variables used to establish the rules that describe the production rate Q'(t). With assistance from petroleum engineering experts, we selected Pressure (Footnote 2.14), Temperature Drop (Footnote 2.15), Current Choke Setting, and Time in the cycle (Footnote 2.16) as the variables for constructing a set of descriptive rules. Those variables are denoted P'(t), θ'(t), u(t), and t, respectively. The decision variable u(t) is the choke setting, and the output of the system (an oil well) is the production rate Q'(t).

Footnote 2.14: Pressure of the flow that comes out of the well onto the surface.

Footnote 2.15: Temperature Drop is the difference between the temperatures before and after the choke.

Footnote 2.16: Time in the cycle is the time passed since the well resumed production after the last soaking phase.

Because the scales of the variables P'(t), θ'(t), Q'(t) are very different, we normalized them by dividing each by its maximum value (obtained from the historical data). The normalized variables are denoted P(t), θ(t), and Q(t). P'(t), θ'(t), Q'(t) are measured for hundreds of wells over long periods of time, e.g., from around one to two years. P'(t) and θ'(t) are measured every five minutes, but Q'(t) is measured at a much lower frequency, e.g., every few days. We applied a first-order hold to Q'(t) to obtain Q'(t) values at the same frequency as P'(t) and θ'(t). Note that the dependence of P, θ, Q, and u on time is shown as their dependence on the time in a cycle, which means that the beginning of the production cycle was considered as the initial time for all variables.
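For concreteness, this preprocessing can be sketched as follows, assuming the measurements are available as timestamped NumPy arrays; the function and variable names here are our own illustration, not part of the field data system:

```python
import numpy as np

def align_and_normalize(t_fast, p_raw, theta_raw, t_slow, q_raw):
    """Bring Q'(t) (measured every few days) onto the 5-minute grid of
    P'(t) and theta'(t) via a first-order hold, then normalize each
    variable by its historical maximum."""
    # First-order hold: linear interpolation of Q' onto the fast time grid.
    q_fast = np.interp(t_fast, t_slow, q_raw)

    # Normalize by the historical maxima, as described above.
    p = p_raw / p_raw.max()
    theta = theta_raw / theta_raw.max()
    q = q_fast / q_fast.max()
    return p, theta, q
```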
We used the Wang-Mendel (WM) method [265, 269] to establish a set of descriptive rules from the historical data of each well. To do this, we described each of the variables P'(t), θ'(t), u(t), Q'(t) with three fuzzy sets, called Low, Medium, and High, and the variable t with three fuzzy sets, called Beginning, Middle, and End. An example of a descriptive rule obtained by the WM method is:

If [P'(t) is Low and θ'(t) is High and t is End and u(t) is High], then Q'(t) is Medium    (2.45)

An example of an LGODM decision rule is:

If [P'(t) is Low and θ'(t) is High and t is End and u(t) is High], then Δu'(t+1) is Medium High, weight = 0.85    (2.46)

Because a choke is a physical device that controls a physical system, according to petroleum engineering experts there is a threshold on the amount by which it is allowed to change; abrupt changes in the choke can produce large transients in the variables of the oil well, which are highly undesirable. Thus, Δu'(t+1) from the LGODM fuzzy system is adjusted to Δu(t+1), as:

$$\Delta u(t+1)=\mathrm{sgn}\big(\Delta u'(t+1)\big)\min\big(|\Delta u'(t+1)|,\delta\big) \qquad (2.47)$$

where δ is the threshold for changes in the choke setting; it was set to 6.4 in this study, which is 10% of the maximum possible choke. It follows from (2.47) that:

$$u(t+1)=u(t)+\Delta u(t+1) \qquad (2.48)$$

The scheme for determining u(t+1) is depicted in Fig. 2.9.

Figure 2.9: A schema of the LGODM in a choke decision system. (Block diagram: P'(t), θ'(t), u(t), and t feed the rule-based fuzzy decision system, whose output Δu'(t+1) passes through the threshold to give Δu(t+1), and hence u(t+1).)

2.7.3 Validation of the Choke Decision System

As mentioned in Section 2.7.1, CSS field experience is ahead of academic research and, as a result, in some oil fields no physical modeling has been performed for steamed oil wells; therefore, we cannot rely on physical models to validate the efficiency of LGODM by simulation before implementing it in the oilfield. Field tests are a practical way to test the algorithm in a real setting; unfortunately, field tests are very expensive and time consuming. Consequently, we established a data-driven model of a virtual well with which we can examine the validity of LGODM.

Our virtual well is a predictive model of an actual well; it provides an estimate of P(t+1), θ(t+1), Q(t+1) given delayed versions of them and delayed versions of u(t), and is a Nonlinear Auto-Regressive model with eXogenous inputs (a NARX model) that uses historical data [40, 85, 94, 186] (see Footnote 2.17), i.e.:

$$\big(P(t+1),\theta(t+1),Q(t+1)\big)=\varphi\big(P(t),\dots,P(t-k),\;\theta(t),\dots,\theta(t-k),\;Q(t),\dots,Q(t-k),\;u(t),\dots,u(t-k_u)\big) \qquad (2.49)$$

Letting z(t) ≜ (P(t), θ(t), Q(t)), one can express (2.49) as:

$$z(t+1)=\varphi\big(z(t),z(t-1),\dots,z(t-k),\;u(t),u(t-1),\dots,u(t-k_u)\big) \qquad (2.50)$$

Footnote 2.17: Note that the variable Time in Cycle, t, is the time passed since the beginning of the production phase; therefore, we use it as the time index of the variables in the NARX model as well. Obviously, the NARX model is a dynamical system and its variables depend on time.

We approximated the function φ(·) with a feedforward neural network (FFNN), φ̂(·), whose inputs are ψ_l = (z(l), z(l-1), ..., z(l-k), u(l), u(l-1), ..., u(l-k_u)), l = 1, 2, ..., L, and whose outputs are z(l+1). A schema of the neural network is depicted in Fig. 2.10. We trained the FFNN using a Levenberg-Marquardt static backpropagation algorithm [85, 86]. By static backpropagation, we mean that the L inputs ψ_l and the L targets z(l+1) are presented to the network during training, and the outputs of the network are not fed back to the network during training.
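As an illustration of how the static training set can be assembled, the following sketch stacks the pairs (ψ_l, z(l+1)) from one well's historical record; the array layout and names are our own assumptions:

```python
import numpy as np

def build_narx_dataset(z, u, k, k_u):
    """Build (psi_l, z(l+1)) pairs for static FFNN training.

    z: (T, 3) array of normalized (P, theta, Q) samples;
    u: (T,) array of choke settings; k, k_u: numbers of tap delays.
    """
    start = max(k, k_u)
    X, Y = [], []
    for l in range(start, len(z) - 1):
        zpart = z[l - k: l + 1][::-1].ravel()  # z(l), z(l-1), ..., z(l-k)
        upart = u[l - k_u: l + 1][::-1]        # u(l), u(l-1), ..., u(l-k_u)
        X.append(np.concatenate([zpart, upart]))
        Y.append(z[l + 1])
    return np.array(X), np.array(Y)
```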
The following Mean-Square Error (MSE) objective function was minimized to train the neural network:

$$J=\sum_{l=0}^{L-1}\big\|z(l+1)-\hat{\varphi}(\psi_l)\big\|_2^2 \qquad (2.51)$$

where ||·||_2 denotes the L_2 norm and φ̂(ψ_l) denotes the response of the network to ψ_l.

Figure 2.10: A NARX model of an oil well. The boxes that contain the letter D represent tap delays.

The feedforward neural network comprised one hidden layer with 5 neurons, each with a tangent-sigmoid activation function; 70% of the data was used for training, 10% for validation, and 20% for testing. Additionally, k = k_u = 36 was used as the number of delays; this is equivalent to three hours of data, and was recommended to us by petroleum engineering experts.

The trained FFNN NARX model φ̂(·) played the role of a virtual well for us, as depicted in Fig. 2.11. To simulate the CSS production phase using the virtual well in the presence of LGODM, we simulated the same number of cycles as was available to us in the historical data, and made the length of each cycle equal to the actually occurring length. Because we had no data available for the injection and soaking phases of the wells (LGODM is not needed for those stages, because it is designed only for the production stage), for each simulated cycle we set the initial values of the inputs to φ̂(·), i.e., (P(0), θ(0), Q(0), u(0)), to be the same as the normalized values of P'(t), θ'(t), Q'(t), u(t) that were measured at the beginning of that cycle (t = 0).

The LGODM decided the next value of the choke, u(t+1), based on the current values of z'(t) = (P'(t), θ'(t), Q'(t)) (see Footnote 2.18) and the time in cycle, t. Consequently, the virtual well predicted the next values of the normalized pressure, temperature drop, and production rate, i.e., z(t+1) = (P(t+1), θ(t+1), Q(t+1)), until the end of each cycle. This simulation process was continued until all of the cycles of a well were simulated. We simulated 9 wells that had between 12 and 19 huff-and-puff cycles. After pruning the decision rules, we found between 71 and 243 decision rules for the wells. The choke was changed every 180 minutes.

Footnote 2.18: The variables (P'(t), θ'(t), Q'(t)) were fed to the decision maker by passing z(t) = (P(t), θ(t), Q(t)) through rescaling, i.e., by multiplying the normalized values by their maximum possible values obtained from the historical data.

Figure 2.11: Simulation of oil well production including the LGODM. z(t) = (P(t), θ(t), Q(t)). The boxes containing the letter D represent tap delays; heavier lines represent vector signals.
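The closed-loop simulation just described can be sketched as follows; `lgodm_decide` stands for the fuzzy decision system (returning the raw change Δu') and `narx_predict` for the trained FFNN, and both names, as well as the buffer handling, are our own scaffolding rather than the original implementation:

```python
import numpy as np

def simulate_cycle(z0, u0, T, k, lgodm_decide, narx_predict, delta=6.4):
    """Simulate one CSS production cycle of the virtual well under LGODM.

    z0: initial normalized (P, theta, Q); u0: initial choke setting;
    T: cycle length in steps; k: number of tap delays (k = k_u here).
    """
    z_hist = [np.asarray(z0, dtype=float)] * (k + 1)
    u_hist = [float(u0)] * (k + 1)

    for t in range(T):
        # LGODM proposes a raw choke change from the current state and time.
        du_raw = lgodm_decide(z_hist[-1], u_hist[-1], t)
        # Threshold the change as in (2.47)-(2.48).
        du = np.sign(du_raw) * min(abs(du_raw), delta)

        # The virtual well predicts z(t+1) from z(t..t-k) and u(t..t-k_u).
        z_next = narx_predict(z_hist[-(k + 1):], u_hist[-(k + 1):])

        z_hist.append(np.asarray(z_next, dtype=float))
        u_hist.append(u_hist[-1] + du)

    return np.array(z_hist[k:]), np.array(u_hist[k:])
```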
The number of cycles for each well, I; the numbers of descriptive rules in the Low, Medium, and High groups, L_1, L_2, and L_3, respectively; and the numbers of unpruned and pruned decision rules, N_Unpruned and N_Pruned, are shown in Table 2.1. The parameter ρ = (N_Unpruned - N_Pruned)/N_Unpruned × 100% for each well is also shown in Table 2.1; it is between 91.5% and 97.7%. Observe that, after pruning, a significantly reduced number of decision rules is obtained.

Table 2.1: Increase in well production in the presence of the LGODM

Well number | S% | I | L_1 | L_2 | L_3 | N_Unpruned | N_Pruned | ρ%
1 | 11.2 | 17 | 52 | 30 | 14 | 3974 | 164 | 95.9
2 | 10.1 | 13 | 40 | 34 | 12 | 3548 | 214 | 94.0
3 | 13.3 | 12 | 57 | 27 | 19 | 7201 | 202 | 97.2
4 | 17.6 | 17 | 47 | 27 | 8 | 2798 | 200 | 92.9
5 | 20.2 | 19 | 43 | 28 | 11 | 3035 | 71 | 97.7
6 | 18.9 | 14 | 51 | 33 | 12 | 4170 | 154 | 96.3
7 | 15.3 | 18 | 37 | 25 | 20 | 2870 | 243 | 91.5
8 | 17.2 | 14 | 65 | 23 | 21 | 3977 | 203 | 94.9
9 | 16.5 | 15 | 46 | 35 | 11 | 4001 | 159 | 96.0
Average | 15.6 | 15.2 | 48.7 | 29.1 | 14.2 | 3952.6 | 184.2 | 95.2

The percentage increase in production, S, when LGODM was used to change the choke in a particular well is also shown in Table 2.1; it shows that, on average, the decision maker increases the production rates of the oil wells by 15.6%. S was calculated as:

$$S=\frac{\sum_{i=1}^{I}\sum_{t=1}^{T_i}\big(Q_s^i(t)-Q_a^i(t)\big)}{\sum_{i=1}^{I}\sum_{t=1}^{T_i}Q_a^i(t)}\times 100\% \qquad (2.52)$$

where Q_s^i(t) and Q_a^i(t) are, respectively, the simulated and actual normalized production rates in the i-th cycle at time t, I is the number of cycles, and T_i is the maximum time in cycle i.

The results of the well simulations for a sample well (well no. 6) are depicted in Figs. 2.12-2.15. This well showed an 18.9% increase in production when LGODM was used for making decisions about the choke changes. The simulations suggest that the decision maker tries to keep the variables constant within each cycle, hence giving constant production rates in each cycle. In Fig. 2.12, the simulated normalized production rate is shown. Because the initial values of the variables in some of the cycles are similar, the production rates in those cycles are similar; therefore, multiple cycles with the same production rate appear as a single constant-valued line. In Fig. 2.13, the choke changes are depicted. Observe that, in order to keep the production rates constant, the choke determined by LGODM oscillates, but the amplitude of those oscillations is not large. In Fig. 2.14, the simulated normalized pressure is depicted; observe that the pressure stays constant for a long time in each cycle, due to the constant choke settings. In Fig. 2.15, the simulated normalized temperature drop is shown; some chattering can be seen in it, although its trends are fairly constant.

Figure 2.12: Simulated production Q(t) for well 6.

Figure 2.13: Recommended choke changes for well 6.

Figure 2.14: Simulated pressure P(t) for well 6.

Figure 2.15: Simulated temperature drop θ(t) for well 6.

2.8 Conclusions and Future Work

In this chapter, we presented a novel goal-oriented linguistic decision making method that uses rule-based systems. We started by grouping the rules according to their consequents and finding rules that had exactly the same antecedents (excluding the antecedent containing the decision variable) in each group.
Then, decision rules were established by determining a change in the decision variable that makes the output of the system move from one group to a more desirable group. Exact matching of the rules was relaxed to similarity matching, and the similarity of the pair of descriptive rules from which each decision rule is obtained was used as a credibility degree for that rule. A methodology for pruning weighted rules was also devised, to avoid a proliferation of rules.

LGODM was used to establish a decision system for changing the choke in oil wells that undergo cyclic steam stimulation. For validation, a neural network was trained that played the role of a virtual well. Our simulations showed that LGODM leads to a considerable increase in the production of the oil wells.

LGODM is applicable to systems that are very slow, in the sense that changes in them take a long time to occur, and for which historical data are available, so that descriptive rules can be obtained. Extending the present LGODM methodology to multi-input multi-output systems remains to be done. Moreover, it is straightforward to enhance LGODM so that it can be used in an adaptive manner: recent data about the behavior of the system can be used to obtain new descriptive rules, which can be converted into new decision rules that replace the existing LGODM rules.

Another important line of research is to apply LGODM to other types of systems (e.g., economic, financial, biomedical, and social systems) for which a prior mathematical model is not available but historical data are plentiful, to establish whether LGODM can enhance their performance.

Chapter 3

Modeling Linguistic Probabilities and Linguistic Quantifiers Using Interval Type-2 Fuzzy Sets

Probability but no truth, facility but no freedom–it is owing to these two fruits that the tree of knowledge cannot be confused with the tree of life.
Friedrich Nietzsche

3.1 Introduction

In this chapter, we synthesize interval type-2 fuzzy set models of linguistic probability words and linguistic quantifiers by applying the Enhanced Interval Approach to data collected from subjects about those words. We establish some user-friendly sub-vocabularies of linguistic probabilities and linguistic quantifiers, so that they can be used in advanced computing with words applications. The user-friendly vocabularies are based on axioms of fuzzy probabilities.

Fuzzy probabilities [27, 288, 290, 321] model the linguistic uncertainty about the probability of an event, which may be a result of the fuzziness of the event itself, and/or of the fact that humans' perceptions of the probability of an event are uncertain.

In [317], a framework for calculating the numeric probability of a fuzzy event is established. Yager [288] argues that the probability of a fuzzy event must be a fuzzy set itself, so, based on [317], he defines the fuzzy probability of a fuzzy event. To make the theory of fuzzy probabilities rigorous, axiomatic frameworks have recently been proposed for it in [88, 89, 183].

Some research has quite recently been devoted to modeling everyday reasoning and decision making [87, 344]. Zadeh suggests using fuzzy logic and the framework of Advanced Computing with Words (ACWW) [175] for the automation of everyday reasoning and decision making. ACWW is a methodology of computation in which the carriers of information can be numbers, intervals, and words.
In such a methodology, the assignment of attributes to variables may be implicit, and one generally deals with linguistic truth, probability, and possibility. Modeling of the words of natural languages plays a pivotal role in ACWW. Mendel argues that, since words mean different things to different people, a first-order uncertainty model of a word should be an interval type-2 fuzzy set (IT2 FS) [168]. Moreover, Zadeh anticipates that type-2 fuzzy sets will play a more important role in ACWW in the future [175]. Therefore, it is plausible to implement reasoning schemes for ACWW using type-2 fuzzy sets. Consequently, synthesizing IT2 FS models of linguistic probabilities is an essential prerequisite for solving ACWW problems that involve linguistic probabilities [24, 210, 212, 215, 219-221]. Zadeh [320] shows that linguistic quantifiers (e.g., Most, Few, Some, etc.) are mathematically equivalent to linguistic probabilities; therefore, it is viable to model them along with linguistic probabilities, so as to use them in solving ACWW problems that involve them.

In this chapter, we establish vocabularies of linguistic probability words and linguistic quantifiers and apply the Enhanced Interval Approach (EIA) [286] to data that was collected from subjects about those words. Then, we establish user-friendly vocabularies of probability words. Our results can then be used by all other researchers.

3.2 Enhanced Interval Approach

The EIA is an improved version of the Interval Approach (IA) [151] for synthesizing IT2 FS models of words from interval data collected from subjects about those words. The structures of the two methods are very similar, but the EIA modifies the steps of the IA so that its resulting IT2 FSs are reasonably narrower than those yielded by the IA. In this chapter, we do not intend to go deep into the EIA methodology; hence, we only briefly introduce its steps.

First, n subjects are asked to provide the intervals they associate with a word on a certain scale. In [151, 286], a 0-10 scale was used; in this study, we use a 0-100% scale, because it is a natural scale for linguistic probabilities. Then, the following steps are performed on the collected intervals (see Footnote 3.1):

Footnote 3.1: The first four steps are called the Data Part of the method, and the fifth step is called the Fuzzy Set Part.

(i). Bad data processing: In this step, only valid intervals are accepted. Valid intervals are intervals [a^(i), b^(i)] for which 0 ≤ a^(i) < b^(i) ≤ 100 and b^(i) - a^(i) < 100. n' intervals remain after this step.

(ii). Outlier processing: In this step, a Box-Whisker outlier removal test is performed, first on the interval endpoints a^(i) and b^(i), and then on the interval lengths L^(i) = b^(i) - a^(i). After removing intervals whose endpoints are outliers, n'' intervals remain; after removing intervals whose lengths are outliers, m' intervals remain.

(iii). Tolerance limit processing: Tolerance limit processing is performed first on a^(i) and b^(i), and then on L^(i) = b^(i) - a^(i). For the former, only intervals that satisfy the following are accepted:

$$a^{(i)}\in[\hat m_a-k\hat\sigma_a,\;\hat m_a+k\hat\sigma_a], \qquad b^{(i)}\in[\hat m_b-k\hat\sigma_b,\;\hat m_b+k\hat\sigma_b] \qquad (3.1)$$

where k is determined such that the given limits contain at least 95% of the subject data intervals, with 95% confidence (it is assumed that the data interval endpoints are approximately normal). Here, m̂_a and σ̂_a are the sample mean and sample standard deviation of the m' left endpoints, and m̂_b and σ̂_b are the sample mean and sample standard deviation of the m' right endpoints.
This step reduces the m' interval endpoints to m+ interval endpoints. m̂_L and σ̂_L are then computed from the remaining data, and only intervals satisfying the following are kept:

$$L^{(i)}\in[\hat m_L-k'\hat\sigma_L,\;\hat m_L+k'\hat\sigma_L] \qquad (3.2)$$

where k' = min(k_1, k_2, k_3), in which k_1 is determined such that one can assert with 95% confidence that the interval in (3.2) contains at least 95% of the L^(i)'s, and

$$k_2=\hat m_L/\hat\sigma_L \qquad (3.3)$$

$$k_3=(100-\hat m_L)/\hat\sigma_L \qquad (3.4)$$

Equation (3.3) guarantees that, in (3.2), m̂_L - k'σ̂_L ≥ 0, and (3.4) guarantees that, in (3.2), m̂_L + k'σ̂_L ≤ 100, so that intervals whose length L^(i) falls outside the probability interval [0%, 100%] (and therefore has no meaning) are rejected. This step reduces the m+ intervals to m'' intervals.

(iv). Reasonable-interval processing: In this step, intervals that have little overlap with the others are removed, following the philosophy that "words must mean the same thing to different people, otherwise effective communication cannot be established." It is assumed that both the left endpoints and the right endpoints are distributed normally around the mean values of the intervals that have survived, i.e., m̂_a and m̂_b, and each reasonable interval ought to include the point ξ* at which the two Gaussian distribution functions intersect. In the EIA, this step is also utilized to discard intervals that are overly long, by allowing only the intervals that satisfy the following to survive:

$$2\hat m_a-\xi^*\le a^{(i)}<\xi^*<b^{(i)}\le 2\hat m_b-\xi^* \qquad (3.5)$$

A formula for ξ* is given in [286]. This step reduces the m'' intervals to m intervals.

(v). Fuzzy set part: In this step, first the nature of the Footprint of Uncertainty (FOU) is determined, i.e., whether the FOU is a left shoulder, an interior, or a right shoulder; this is done using a classification algorithm. Then, each interval is converted into a T1 FS. The resulting T1 FSs may be inadmissible (i.e., their parameters may lie outside the 0-100 scale). Those inadmissible T1 FSs are removed, leaving m* surviving T1 FSs. The parameters of the FOU are derived from the parameters of the surviving m* intervals, using simple formulas and bounding procedures that are explained in [286].

3.3 Modeling Probability Words and Linguistic Quantifiers Using the EIA

To begin, we established a vocabulary of linguistic probabilities consisting of all of the combinations of one of the probability words Improbable, Unlikely, Probable, and Likely with one of the hedges Extremely, Highly, Very, Quite, Pretty, Fairly, and Somewhat; the word Tossup and the unhedged probability words are also included in the vocabulary. The word Possible is also included in the vocabulary: although possibility is different from probability, this word has been used by many people as having a probability connotation, and it has been investigated in data-oriented studies of subjective and linguistic probability [?, ?, 258]. Our final vocabulary comprises 34 probability words.

We also established a vocabulary of 24 linguistic quantifiers consisting of the following words: Almost none, A small number, A little bit, Little, A little, A few, Some, Enough, Many, Much, Plenty, So many, Lots, A good deal, A great deal, So much, A lot, A large amount, Most, A majority, A large number, Too many, Too much, Almost all.

We collected data about the linguistic probabilities from 100 subjects and about the linguistic quantifiers from 111 subjects (see Footnote 3.2), on the Amazon Mechanical Turk website [1].

Footnote 3.2: The native language of the subjects was English.

Each subject was asked the following question for a set of randomly selected words containing half of the words of each vocabulary:
Each subject was asked the following question for a set of randomly selected words containing half of the words of each vocabulary: 3.2 The native language of the subjects was English. 76 “On a scale of 0%-100%, what are the endpoints of an interval that you associate with the word— ?” The surveys were randomized, so that the subjects could not correlate their answers. For each probability word, between 37 and 58 intervals, and for each linguistic quantifier, between 46 and 66 intervals were collected. We applied the EIA to the data, and obtained the FOUs for the words. Then we ranked them according to their average centroids [280]. The FOUs of probability words are depicted in Fig. 3.1, and those of linguistic quantifiers are depicted in Fig. 3.2. 0 50 100 0 0.5 1 Extremely Unlikely 0 50 100 0 0.5 1 Extremely Improbable 0 50 100 0 0.5 1 Highly Unlikely 0 50 100 0 0.5 1 Highly Improbable 0 50 100 0 0.5 1 Very Unlikely 0 50 100 0 0.5 1 Very Improbable 0 50 100 0 0.5 1 Quite Unlikely 0 50 100 0 0.5 1 Quite Improbable 0 50 100 0 0.5 1 Pretty Improbable 0 50 100 0 0.5 1 Unlikely 0 50 100 0 0.5 1 Improbable 0 50 100 0 0.5 1 Pretty Unlikely 0 50 100 0 0.5 1 Fairly Improbable 0 50 100 0 0.5 1 Somewhat Improbable 0 50 100 0 0.5 1 Fairly Unlikely 0 50 100 0 0.5 1 Somewhat Unlikely 0 50 100 0 0.5 1 Tossup 0 50 100 0 0.5 1 Somewhat Probable 0 50 100 0 0.5 1 Possible 0 50 100 0 0.5 1 Somewhat Likely 0 50 100 0 0.5 1 Fairly Likely 0 50 100 0 0.5 1 Probable 0 50 100 0 0.5 1 Fairly Probable 0 50 100 0 0.5 1 Likely 0 50 100 0 0.5 1 Pretty Probable 0 50 100 0 0.5 1 Pretty Likely 0 50 100 0 0.5 1 Quite Likely 0 50 100 0 0.5 1 Quite Probable 0 50 100 0 0.5 1 Very Likely 0 50 100 0 0.5 1 Very Probable 0 50 100 0 0.5 1 Highly Likely 0 50 100 0 0.5 1 Highly Probable 0 50 100 0 0.5 1 Extremely Likely 0 50 100 0 0.5 1 Extremely Probable Figure 3.1: FOUs of the vocabulary of 34 linguistic probabilities on a percentage scale. In order to establish reduced-size user-friendly vocabularies which contain a reasonable num- ber of words, we considered two sub-vocabularies of probability words: the first one is derived 77 0 20 40 60 80 100 0 0.5 1 Almost none 0 20 40 60 80 100 0 0.5 1 A small number 0 20 40 60 80 100 0 0.5 1 A little bit 0 20 40 60 80 100 0 0.5 1 Little 0 20 40 60 80 100 0 0.5 1 A little 0 20 40 60 80 100 0 0.5 1 A few 0 20 40 60 80 100 0 0.5 1 Some 0 20 40 60 80 100 0 0.5 1 Enough 0 20 40 60 80 100 0 0.5 1 Many 0 20 40 60 80 100 0 0.5 1 Much 0 20 40 60 80 100 0 0.5 1 Plenty 0 20 40 60 80 100 0 0.5 1 So many 0 20 40 60 80 100 0 0.5 1 Lots 0 20 40 60 80 100 0 0.5 1 A good deal 0 20 40 60 80 100 0 0.5 1 A great deal 0 20 40 60 80 100 0 0.5 1 So much 0 20 40 60 80 100 0 0.5 1 A lot 0 20 40 60 80 100 0 0.5 1 A large amount 0 20 40 60 80 100 0 0.5 1 Most 0 20 40 60 80 100 0 0.5 1 A majority 0 20 40 60 80 100 0 0.5 1 A large number 0 20 40 60 80 100 0 0.5 1 Too many 0 20 40 60 80 100 0 0.5 1 Too much 0 20 40 60 80 100 0 0.5 1 Almost all Figure 3.2: FOUs of the vocabulary of 24 linguistic quantifiers on a percentage scale. from the words Improbable, Probable and the second from Likely, Unlikely; both vocabularies contained the word Tossup, and only the second one contained Possible. The parameters of the FOUs of those vocabularies are given in Tables 3.1 and 3.2. The parameters of FOUs of the vocabulary of linguistic quantifiers are given in Table 3.3. The four parameters of the UMF deter- mine its trapezoidal shape. 
The first four parameters of the LMF determine its trapezoidal shape (the EIA always gives a triangular MF, therefore the second and third parameters of the LMF are equal). The fifth parameter of the LMF is its height. Next, we computed pairwise Jaccard similarities for each of the vocabularies as [46,173,257]: s J ( e A; e B) = R U (min( e A (u); e B (u)) + min( e A (u); e B (u)))du R U (max( e A (u); e B (u)) + max( e A (u); e B (u)))du (3.6) 78 The pairwise similarities of the vocabulary associated with the words Probable and Improbable are given in Table 3.4. The pairwise similarities of the vocabulary associated with the words Likely and Unlikely are given in Table 3.5. Table 3.1: Parameters of FOUs of the vocabulary of 17 words derived from Improbable and Probable as well as Tossup. The scale is percentage. Upper membership function Lower membership function Extremely Improbable (0.00, 0.00, 1.83, 13.16) (0.00, 0.00, 0.46, 6.27, 1.00) Highly Improbable (0.00, 0.00, 5.92, 18.16) (0.00, 0.00, 0.92, 12.85, 1.00) Very Improbable (2.93, 10.00, 12.50, 17.07) (8.96, 11.67, 11.67, 16.04, 0.76) Quite Improbable (0.86, 15.00, 20.00, 29.14) (12.93, 17.50, 17.50, 22.07, 0.65) Pretty Improbable (6.89, 15.00, 20.00, 28.11) (12.93, 17.50, 17.50, 22.07, 0.65) Improbable (5.86, 17.50, 25.00, 34.14) (18.96, 22.00, 22.00, 26.04, 0.58) Fairly Improbable (11.89, 22.50, 27.50, 38.11) (17.93, 25.00, 25.00, 32.07, 0.76) Somewhat Improbable (9.82, 25.00, 30.00, 45.18) (22.93, 27.50, 27.50, 32.07, 0.65) Tossup (35.86, 50.00, 55.00, 64.14) (48.96, 50.83, 50.83, 51.41, 0.41) Somewhat Probable (47.93, 55.00, 60.00, 67.07) (52.93, 57.50, 57.50, 62.07, 0.65) Probable (50.86, 65.00, 70.00, 79.14) (62.93, 67.50, 67.50, 72.07, 0.65) Fairly Probable (50.86, 65.00, 72.50, 84.14) (61.89, 68.00, 68.00, 72.07, 0.58) Pretty Probable (60.86, 70.00, 77.50, 89.14) (68.96, 73.00, 73.00, 76.04, 0.58) Quite Probable (65.86, 77.50, 87.50, 98.11) (78.96, 82.50, 82.50, 86.04, 0.53) Very Probable (71.89, 82.50, 90.00, 98.11) (82.93, 87.00, 87.00, 92.07, 0.58) Highly Probable (76.89, 85.00, 90.00, 98.11) (83.96, 87.50, 87.50, 91.04, 0.65) Extremely Probable (86.84, 97.72, 100.00, 100.00) (94.05, 99.54, 100.00, 100.00, 1.00) 3.4 Establishing Reduced-Size Vocabularies In this section, we construct reduced-size user-friendly vocabularies of linguistic probabilities as well as linguistic quantifiers by excluding some of the words that are highly similar to the others. This is done to make the vocabulary provide a good partitioning of the space of numeric 79 Table 3.2: Parameters of FOUs of the vocabulary of 18 words derived from Unlikely and Likely as well as Tossup and Possible. The scale is percentage. 
Word | Upper membership function | Lower membership function
Extremely Unlikely | (0.00, 0.00, 2.73, 13.16) | (0.00, 0.00, 0.46, 5.95, 1.00)
Highly Unlikely | (0.00, 0.00, 5.46, 13.16) | (0.00, 0.00, 0.92, 10.22, 1.00)
Very Unlikely | (0.00, 0.00, 10.92, 26.33) | (0.00, 0.00, 1.38, 18.16, 1.00)
Quite Unlikely | (0.86, 12.50, 20.00, 29.14) | (13.96, 17.00, 17.00, 21.04, 0.58)
Unlikely | (5.86, 17.50, 25.00, 34.14) | (18.96, 22.00, 22.00, 26.04, 0.58)
Pretty Unlikely | (11.89, 18.50, 25.00, 33.11) | (17.93, 21.18, 21.18, 23.45, 0.46)
Fairly Unlikely | (16.89, 25.00, 30.00, 38.11) | (22.93, 27.50, 27.50, 32.07, 0.65)
Somewhat Unlikely | (15.86, 30.00, 37.50, 48.11) | (28.96, 33.46, 33.46, 36.04, 0.62)
Tossup | (35.86, 50.00, 55.00, 64.14) | (48.96, 50.83, 50.83, 51.41, 0.41)
Possible | (44.82, 57.50, 67.50, 80.18) | (58.96, 62.50, 62.50, 66.04, 0.53)
Somewhat Likely | (44.44, 57.50, 67.50, 80.18) | (57.93, 62.50, 62.50, 67.07, 0.53)
Fairly Likely | (50.86, 62.50, 70.00, 84.14) | (63.96, 67.00, 67.00, 71.04, 0.58)
Likely | (54.82, 68.00, 80.00, 94.14) | (67.93, 73.68, 73.68, 80.73, 0.55)
Pretty Likely | (61.89, 72.50, 82.50, 94.14) | (74.14, 77.50, 77.50, 80.86, 0.53)
Quite Likely | (65.86, 75.00, 82.50, 94.14) | (73.96, 78.00, 78.00, 81.04, 0.58)
Very Likely | (70.86, 82.50, 90.00, 99.14) | (83.96, 87.00, 87.00, 91.04, 0.58)
Highly Likely | (76.89, 84.50, 90.00, 98.11) | (83.96, 87.11, 87.11, 90.86, 0.59)
Extremely Likely | (80.25, 93.17, 100.00, 100.00) | (89.78, 99.08, 100.00, 100.00, 1.00)

Table 3.3: Parameters of the FOUs of the vocabulary of 24 linguistic quantifiers. The scale is percentage.

Word | Upper membership function | Lower membership function
Almost none | (0.00, 0.00, 1.64, 10.53) | (0.00, 0.00, 0.18, 2.32, 1.00)
A small number | (0.00, 0.00, 5.92, 19.75) | (0.00, 0.00, 0.73, 7.58, 1.00)
A little bit | (0.00, 0.00, 5.92, 19.43) | (0.00, 0.00, 0.92, 11.58, 1.00)
Little | (0.00, 0.00, 5.92, 19.75) | (0.00, 0.00, 0.92, 11.58, 1.00)
A little | (0.00, 0.00, 6.38, 26.33) | (0.00, 0.00, 0.92, 11.58, 1.00)
A few | (0.34, 5.00, 7.50, 13.86) | (3.96, 6.36, 6.36, 9.04, 0.68)
Some | (3.79, 17.50, 27.50, 46.21) | (19.17, 22.50, 22.50, 24.83, 0.53)
Enough | (40.86, 55.00, 62.50, 74.14) | (51.89, 58.00, 58.00, 61.86, 0.58)
Many | (43.79, 60.00, 70.00, 86.21) | (58.96, 64.00, 64.00, 66.04, 0.58)
Much | (50.03, 64.50, 70.50, 84.14) | (65.55, 67.88, 67.88, 70.86, 0.47)
Plenty | (55.86, 67.50, 77.50, 88.11) | (70.76, 72.86, 72.86, 76.04, 0.49)
So many | (53.79, 67.50, 80.00, 96.21) | (67.93, 72.86, 72.86, 78.11, 0.49)
Lots | (53.79, 70.00, 77.50, 96.21) | (67.93, 74.29, 74.29, 82.07, 0.70)
A good deal | (59.82, 72.50, 80.00, 95.18) | (73.96, 77.00, 77.00, 81.04, 0.58)
A great deal | (59.82, 75.00, 85.00, 99.14) | (75.72, 79.05, 79.05, 82.07, 0.43)
So much | (65.51, 74.00, 80.00, 94.14) | (73.96, 77.27, 77.27, 81.04, 0.61)
A lot | (60.86, 75.00, 87.50, 99.14) | (79.17, 82.14, 82.14, 84.83, 0.49)
A large amount | (60.86, 75.00, 87.50, 98.11) | (77.93, 82.14, 82.14, 88.11, 0.49)
Most | (70.86, 82.50, 90.00, 99.14) | (82.93, 87.00, 87.00, 92.07, 0.58)
A majority | (34.18, 85.98, 100.00, 100.00) | (83.42, 98.62, 100.00, 100.00, 1.00)
A large number | (60.51, 86.72, 100.00, 100.00) | (76.84, 98.16, 100.00, 100.00, 1.00)
Too many | (80.25, 98.17, 100.00, 100.00) | (93.73, 99.54, 100.00, 100.00, 1.00)
Too much | (85.52, 98.99, 100.00, 100.00) | (98.68, 99.91, 100.00, 100.00, 1.00)
Almost all | (84.52, 97.27, 100.00, 100.00) | (96.37, 99.54, 100.00, 100.00, 1.00)

3.4 Establishing Reduced-Size Vocabularies

In this section, we construct reduced-size user-friendly vocabularies of linguistic probabilities as well as linguistic quantifiers, by excluding some of the words that are highly similar to others. This is done to make each vocabulary provide a good partitioning of the space of numeric probabilities. We are also inspired by the works of Halliwell and Shen [88, 89], who provide axioms for type-1 fuzzy probabilities. To begin, let us briefly review the axioms provided by Halliwell and Shen.
Assume that Ω is a sample space, and F is the σ-algebra of events associated with Ω, i.e.:

(i). F ⊆ P_Ω

(ii). F ≠ ∅

(iii). A ∈ F ⇒ A' ∈ F

(iv). {A_i} (i = 1, 2, ...) ⊆ F ⇒ ∪_i A_i ∈ F

in which P_Ω is the family of subsets of Ω, and A' = Ω - A is the complement of A. A function LProb: F → N_[0,1] is called a type-1 fuzzy probability measure (see Footnote 3.3) if and only if, for any A ∈ F:

(i). 0 ⪯ LProb(A) ⪯ 1

(ii). LProb(∅) = 0 and LProb(Ω) = 1

(iii). If {A_i} (i = 1, 2, ...) ⊆ F and i ≠ j ⇒ A_i ∩ A_j = ∅ (i.e., the A_i's are mutually disjoint), then LProb(∪_i A_i) ⪯ ⊕_i LProb(A_i)

(iv). LProb(A') = 1 ⊖ LProb(A)

in which N_[0,1] is the set of all fuzzy numbers over the unit interval, ⪯ is the order relation induced by the fuzzy minimum operator (and is equivalent to the α-cut ranking method), ⊕ represents a special addition for fuzzy numbers [61, 143] (see Footnote 3.4), and ⊖ is the fuzzy subtraction operation.

Footnote 3.3: Halliwell and Shen use the term linguistic probability measure.

Footnote 3.4: The addition methods given in these references are examples of ⊕. It is different from the arithmetic addition of two fuzzy sets that is carried out by the Extension Principle.

The axioms suggest that the probability measure must yield a type-1 fuzzy number over [0, 1] and must assign the antonym of the probability of a fuzzy event to the probability of the complement of that event. There are different models for the antonym of a type-1 fuzzy set suggested in the literature [47, 193, 254], but Halliwell and Shen adopted ¬P = 1 ⊖ P as the antonym of P, since it is more in accord with the nature of linguistic probabilities.

Modeling linguistic probabilities with IT2 FSs via the EIA complies with Halliwell and Shen's approach, since the EIA yields trapezoidal FOUs whose upper membership functions are normal and whose lower membership functions are subnormal, i.e., interval type-2 weak fuzzy numbers [214]. Since Halliwell and Shen's approach also suggests that a fuzzy probability measure must be able to assign the antonym of a probability word to the complement of a fuzzy event, it is plausible to make sure that the antonym of each probability word exists in the reduced-size vocabulary (see Footnote 3.5). We therefore performed the following steps for each of the vocabularies:

(i). Identify pairs of words whose Jaccard similarities are greater than or equal to the threshold 0.5 (and that, therefore, are more similar than not).

(ii). Prune a number of the words, so that the pairwise similarities of the remaining words are all less than the threshold 0.5 (see Footnote 3.6). Note that there is no unique way of doing this in general but, as a rule of thumb, one can keep those words to which many other words are similar, so as to keep the number of words in the reduced-size vocabulary small.

(iii). If the antonym of a word was pruned in the last step, also prune that word.

Footnote 3.5: Note that, since the IT2 FS models of the words are synthesized by collecting data from people, they do not satisfy ¬P = 1 ⊖ P exactly; however, one could expect ¬P and 1 ⊖ P to be highly similar.

Footnote 3.6: As noted by Mendel and Wu [175], high similarity may mean different things to different people. The threshold can be used as a means to control the number of words in the reduced-size vocabulary. If we choose, e.g., 0.6 as the threshold (as Mendel and Wu did in [175]), we obtain larger vocabularies after pruning.

Table 3.4: Pairwise similarities between the 17 words of the vocabulary derived from Improbable, Probable, and Tossup.
Word | EI HI VI QI PI I FI SI T SP Probable FP PP QP VP HP EP
Extremely Improbable (EI) | 1.00 0.57 0.15 0.11 0.04 0.04 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Highly Improbable (HI) | 0.57 1.00 0.27 0.17 0.10 0.09 0.02 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Very Improbable (VI) | 0.15 0.27 1.00 0.30 0.20 0.14 0.03 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quite Improbable (QI) | 0.11 0.17 0.30 1.00 0.82 0.51 0.25 0.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Pretty Improbable (PI) | 0.04 0.10 0.20 0.82 1.00 0.54 0.26 0.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Improbable (I) | 0.04 0.09 0.14 0.51 0.54 1.00 0.53 0.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Fairly Improbable (FI) | 0.00 0.02 0.03 0.25 0.26 0.53 1.00 0.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Somewhat Improbable (SI) | 0.01 0.03 0.04 0.22 0.22 0.41 0.66 1.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Tossup (T) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 1.00 0.34 0.11 0.10 0.01 0.00 0.00 0.00 0.00
Somewhat Probable (SP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.34 1.00 0.22 0.19 0.04 0.00 0.00 0.00 0.00
Probable | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.11 0.22 1.00 0.83 0.32 0.11 0.04 0.00 0.00
Fairly Probable (FP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.19 0.83 1.00 0.44 0.18 0.09 0.04 0.00
Pretty Probable (PP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.04 0.32 0.44 1.00 0.37 0.20 0.12 0.00
Quite Probable (QP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.11 0.18 0.37 1.00 0.60 0.46 0.10
Very Probable (VP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.09 0.20 0.60 1.00 0.78 0.12
Highly Probable (HP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.12 0.46 0.78 1.00 0.15
Extremely Probable (EP) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.12 0.15 1.00

Table 3.5: Pairwise similarities between the 18 words of the vocabulary derived from Unlikely, Likely, Tossup, and Possible.
Word | EU HU VU QU U PU FU SU T Possible SL FL L PL QL VL HL EL
Extremely Unlikely (EU) | 1.00 0.75 0.39 0.12 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Highly Unlikely (HU) | 0.75 1.00 0.52 0.13 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Very Unlikely (VU) | 0.39 0.52 1.00 0.35 0.19 0.12 0.04 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quite Unlikely (QU) | 0.12 0.13 0.35 1.00 0.49 0.38 0.14 0.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Unlikely (U) | 0.04 0.04 0.19 0.49 1.00 0.73 0.33 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Pretty Unlikely (PU) | 0.00 0.00 0.12 0.38 0.73 1.00 0.35 0.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Fairly Unlikely (FU) | 0.00 0.00 0.04 0.14 0.33 0.35 1.00 0.43 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Somewhat Unlikely (SU) | 0.00 0.00 0.04 0.10 0.21 0.22 0.43 1.00 0.08 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Tossup (T) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.08 1.00 0.26 0.26 0.12 0.04 0.00 0.00 0.00 0.00 0.00
Possible | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.26 1.00 0.97 0.59 0.30 0.18 0.12 0.04 0.01 0.00
Somewhat Likely (SL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.26 0.97 1.00 0.58 0.30 0.18 0.12 0.04 0.01 0.00
Fairly Likely (FL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.59 0.58 1.00 0.45 0.28 0.20 0.09 0.03 0.01
Likely (L) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.30 0.30 0.45 1.00 0.68 0.57 0.27 0.18 0.08
Pretty Likely (PL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.18 0.18 0.28 0.68 1.00 0.85 0.37 0.25 0.10
Quite Likely (QL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.12 0.20 0.57 0.85 1.00 0.41 0.28 0.11
Very Likely (VL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.04 0.09 0.27 0.37 0.41 1.00 0.77 0.27
Highly Likely (HL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.03 0.18 0.25 0.28 0.77 1.00 0.29
Extremely Likely (EL) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.08 0.10 0.11 0.27 0.29 1.00

Table 3.6: Pairwise similarities between the 24 words of the vocabulary of linguistic quantifiers.
Word | AN ASN ALB Li AL AF S E Ma Mu Pl SMa Lo AG AGD SMu ALo ALA Mo AM ALN TMa TMu AA
Almost none (AN) | 1.00 0.43 0.38 0.38 0.32 0.29 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
A small number (ASN) | 0.43 1.00 0.88 0.89 0.75 0.47 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
A little bit (ALB) | 0.38 0.88 1.00 0.99 0.84 0.49 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Little (Li) | 0.38 0.89 0.99 1.00 0.84 0.49 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
A little (AL) | 0.32 0.75 0.84 0.84 1.00 0.42 0.18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
A few (AF) | 0.29 0.47 0.49 0.49 0.42 1.00 0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Some (S) | 0.03 0.12 0.11 0.11 0.18 0.07 1.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
Enough (E) | 0.00 0.00 0.00 0.00 0.00 0.00 0.01 1.00 0.48 0.33 0.19 0.18 0.16 0.10 0.08 0.05 0.07 0.07 0.01 0.20 0.04 0.00 0.00 0.00
Many (Ma) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.48 1.00 0.68 0.48 0.43 0.39 0.30 0.26 0.22 0.24 0.24 0.10 0.32 0.13 0.01 0.00 0.00
Much (Mu) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.68 1.00 0.56 0.49 0.44 0.34 0.28 0.24 0.25 0.25 0.09 0.29 0.13 0.01 0.00 0.00
Plenty (Pl) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.19 0.48 0.56 1.00 0.75 0.71 0.58 0.47 0.46 0.43 0.43 0.18 0.33 0.20 0.03 0.00 0.01
So many (SMa) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.18 0.43 0.49 0.75 1.00 0.85 0.70 0.59 0.56 0.55 0.54 0.30 0.42 0.28 0.09 0.05 0.06
Lots (Lo) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.16 0.39 0.44 0.71 0.85 1.00 0.72 0.58 0.60 0.51 0.51 0.27 0.39 0.27 0.09 0.05 0.06
A good deal (AG) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.30 0.34 0.58 0.70 0.72 1.00 0.75 0.82 0.65 0.65 0.33 0.37 0.31 0.10 0.05 0.06
A great deal (AGD) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.08 0.26 0.28 0.47 0.59 0.58 0.75 1.00 0.67 0.86 0.85 0.49 0.45 0.42 0.16 0.11 0.12
So much (SMu) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.22 0.24 0.46 0.56 0.60 0.82 0.67 1.00 0.60 0.60 0.36 0.31 0.31 0.10 0.05 0.06
A lot (ALo) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.24 0.25 0.43 0.55 0.51 0.65 0.86 0.60 1.00 0.94 0.55 0.46 0.46 0.17 0.12 0.13
A large amount (ALA) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.24 0.25 0.43 0.54 0.51 0.65 0.85 0.60 0.94 1.00 0.54 0.45 0.47 0.16 0.10 0.11
Most (Mo) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.10 0.09 0.18 0.30 0.27 0.33 0.49 0.36 0.55 0.54 1.00 0.39 0.50 0.24 0.17 0.18
A majority (AM) | 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.20 0.32 0.29 0.33 0.42 0.39 0.37 0.45 0.31 0.46 0.45 0.39 1.00 0.67 0.28 0.17 0.22
A large number (ALN) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.13 0.13 0.20 0.28 0.27 0.31 0.42 0.31 0.46 0.47 0.50 0.67 1.00 0.36 0.21 0.28
Too many (TMa) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.03 0.09 0.09 0.10 0.16 0.10 0.17 0.16 0.24 0.28 0.36 1.00 0.59 0.78
Too much (TMu) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.05 0.05 0.11 0.05 0.12 0.10 0.17 0.17 0.21 0.59 1.00 0.76
Almost all (AA) | 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.06 0.06 0.06 0.12 0.06 0.13 0.11 0.18 0.22 0.28 0.78 0.76 1.00

In the vocabulary of the words derived from Improbable and Probable, we recognized the pairs of words that have Jaccard similarities greater than 0.5 (see the underscored values in Table 3.4).
Thus, we pruned the following words from the vocabulary: Highly Improbable, Quite Improbable, Pretty Improbable, Fairly Improbable, Fairly Probable, Quite Probable, and Highly Probable. Since the word Pretty Improbable was pruned, we also pruned its antonym, Pretty Probable (see the underscored words in Table 3.4). Therefore, the following words survive: Extremely Improbable, Very Improbable, Improbable, Somewhat Improbable, Tossup, Somewhat Probable, Probable, Very Probable, and Extremely Probable.

In the vocabulary of the words derived from Likely and Unlikely, we recognized the pairs of words that have Jaccard similarities greater than 0.5 (see the underscored values in Table 3.5). Thus, we pruned the following words from the vocabulary: Highly Unlikely, Pretty Unlikely, Possible, Fairly Likely, Pretty Likely, Quite Likely, and Highly Likely. Since the words Fairly Likely and Quite Likely were pruned, we also pruned their antonyms, Fairly Unlikely and Quite Unlikely (see the underscored words in Table 3.5). Therefore, the following words survive: Extremely Unlikely, Very Unlikely, Somewhat Unlikely, Unlikely, Tossup, Somewhat Likely, Likely, Very Likely, and Extremely Likely.

Finally, in the vocabulary of linguistic quantifiers, we recognized the pairs of words that have Jaccard similarities greater than 0.5 (see the underscored words and values in Table 3.6). Thus, the following words survived: Almost none, A small number, A few, Some, Enough, So much, Most, and Almost all.

3.5 Conclusions and Future Work

In this chapter, we used the EIA to synthesize IT2 FS models of probability words and linguistic quantifiers from data collected from English-speaking subjects on the Amazon Mechanical Turk website. We began with 24 linguistic quantifiers and 34 linguistic probabilities. Then, we formed sub-vocabularies of the probability words and pruned highly similar words from those sub-vocabularies, as well as from the vocabulary of linguistic quantifiers, so as to establish reduced-size vocabularies that provide a suitable partitioning of the space of numeric probabilities and are user-friendly. The FOUs that we obtained and the vocabularies that we established are pertinent tools for researchers who deal with the various problems that involve linguistic quantifiers and probabilities. Their upper membership functions can be used as type-1 fuzzy set models of linguistic probabilities and quantifiers.

In the future, there must be efforts to establish vocabularies of usuality words, which are also treated as fuzzy probabilities, to solve a variety of ACWW problems.

Chapter 4

Uncertainty Modeling and Reasoning with Linguistic Belief Structures

All credibility, all good conscience, all evidence of truth come only from the senses.
Friedrich Nietzsche

4.1 Introduction

Uncertainty modeling is an important topic in science and engineering. Many types of uncertainty have been recognized, including randomness, fuzziness, vagueness, and roughness [17, 123, 140, 203, 233, 316]. Several types of uncertainty may be needed to model a real-world problem or to represent knowledge; hence, mixtures of different types of uncertainty have also been studied [64, 184, 317, 331]. Belief structures represent a generalization of probability theory, through the concept of a random set [223], which is a generalization of a random variable that takes sets, instead of numbers, as its values. Therefore, belief structures can be seen as a model that simultaneously represents set-valued and probabilistic uncertainty.
Dempster [49, 50] showed that such a mapping induces lower and upper probabilities for an event, instead of a numeric probability value. Shafer [236] showed that such a framework can be used to establish a mathematical theory of evidence, through the concepts of belief (credibility) and plausibility, which is also called the Dempster-Shafer Theory of Evidence [51].

The theory of belief structures was extended to fuzzy sets by Zadeh [325], and was expanded by Yager [289, 292, 294]. Consequently, fuzzy evidence theory has been an important topic of interest in the fuzzy community [102, 107, 157, 196, 306, 309]. In all of these works, evidence (the focal elements) is represented by type-1 fuzzy sets, and the probability mass assignments are numeric. There have also been a few works that assume interval-valued probability mass assignments for belief structures [53, 73-76, 143, 190, 245, 272, 274, 295]. Recently, such belief structures have also been used for the design of rule-based fuzzy systems [9, 10].

In this chapter, we generalize the theory of belief structures to the case in which both the focal elements and the probability mass assignments are represented by words modeled using interval type-2 fuzzy sets, since it has been shown by Mendel [168] that a first-order uncertainty model for a word is an interval type-2 fuzzy set. Interval type-2 fuzzy set models of words can be synthesized by collecting data from subjects and applying the Enhanced Interval Approach [286] to that data. We call such belief structures linguistic belief structures. Note that the theory we develop in this chapter can be conveniently used when any of the focal elements or probability mass assignments are numeric and non-fuzzy, or are type-1 fuzzy sets, because these are special cases of interval type-2 fuzzy sets.

The rest of this chapter is organized as follows: In Section 4.2, we review the basic concepts of random sets, belief structures, and evidence theory. In Section 4.3, we demonstrate how the concept of a belief structure has been extended to fuzzy focal elements. In Section 4.4, belief structures with both fuzzy focal elements and fuzzy probability mass assignments are studied; it is shown that, instead of normalized sums, Fuzzy Weighted Averages can be used to calculate the lower and upper probabilities, so as to guarantee their existence. In Section 4.5, belief structures with linguistic focal elements and probability masses modeled by interval type-2 fuzzy sets are introduced. In Section 4.6, the combination of evidence provided by multiple belief structures and operations on belief structures are studied. In Section 4.7, some illustrative examples are provided. Finally, in Section 4.8, some conclusions are drawn and a framework for future work is offered.

4.2 Background

Consider a set-valued mapping Γ: Ω_0 → P_Ω, where Ω_0 is a sample space, Ω is called the referential set or frame of discernment, and P_Ω is the power set of Ω. Assume that the sample space Ω_0 is discrete and finite (Ω can be uncountable and/or infinite; see Footnote 4.1). Consider a probability measure ν(·) on Ω_0. For all A ⊆ Ω, the mapping m: P_Ω → [0, 1]:

$$m(A)=\frac{\nu(\{s\mid\Gamma(s)=A\})}{1-\nu(\{s\mid\Gamma(s)=\emptyset\})} \qquad (4.1)$$

is called a basic probability mass assignment. For A ⊆ Ω, if m(A) ≠ 0, A is called a focal element, and m(·) is said to define a Dempster-Shafer model or belief structure on Ω. We show the belief structure B by the set of pairs of focal elements and probability masses, B = {(A_1, m_1), ..., (A_n, m_n)}, and denote the set of focal elements of m(·) by Φ_m.

Footnote 4.1: A continuous frame of discernment will not drastically change the theory, since the number of focal elements remains finite. The comparable theory for continuous and infinite sample spaces is beyond the scope of this chapter; interested readers may refer to [242, 287].
Dempster showed that, for each B ⊆ Ω:

$$\mathrm{Prob}(B)\in\big[\mathrm{Prob}^-(B),\;\mathrm{Prob}^+(B)\big]=\Big[\sum_{A\subseteq B}m(A),\;\sum_{A\cap B\neq\emptyset}m(A)\Big] \qquad (4.2)$$

Note that if Γ(·) is a random variable (see Footnote 4.2), then m(A) is the probability of A, since {s | Γ(s) = ∅} = ∅ and 1 - ν({s | Γ(s) = ∅}) = 1; therefore, m(A) is just an extension of the concept of a probability distribution on Ω, and is induced by the random set Γ(·). The normalization by the term 1 - ν({s | Γ(s) = ∅}) helps m(·) satisfy the following important property, which is analogous to the familiar property satisfied by a probability distribution:

$$\sum_{A\subseteq\Omega}m(A)=\sum_{A\in\Phi_m}m(A)=1 \qquad (4.3)$$

Footnote 4.2: The random variable can be viewed as Γ: Ω_0 → P_Ω, where the range of Γ(·) consists only of the singletons, i.e., {ω_i} ∈ P_Ω.

The lower and upper probabilities of B, defined in (4.2), are called the belief (credibility) and plausibility of B, respectively:

$$\mathrm{bel}(B)=\sum_{A\subseteq B}m(A) \qquad (4.4)$$

$$\mathrm{pls}(B)=\sum_{A\cap B\neq\emptyset}m(A) \qquad (4.5)$$

4.3 Extension of Belief Structures to Fuzzy Focal Elements and Numeric Mass Assignments

Zadeh [325] notes that the lower and upper probabilities in (4.2) can be written as:

$$\mathrm{Prob}^-(B)=\mathrm{bel}(B)=\sum_{A\in\Phi_m}m(A)\,I(A,B), \qquad \mathrm{Prob}^+(B)=\mathrm{pls}(B)=\sum_{A\in\Phi_m}m(A)\,O(A,B) \qquad (4.6)$$

where I: P_Ω × P_Ω → [0, 1],

$$I(A,B)=\begin{cases}1 & A\subseteq B\\ 0 & A\not\subseteq B\end{cases} \qquad (4.7)$$

and O: P_Ω × P_Ω → [0, 1],

$$O(A,B)=\begin{cases}1 & A\cap B\neq\emptyset\\ 0 & \text{otherwise}\end{cases} \qquad (4.8)$$

I(A, B) is an indicator function of the inclusion of A in B, and O(A, B) is an indicator function of the overlap (non-empty intersection) of A and B. When the focal elements are fuzzy sets, i.e., when m: F_Ω → [0, 1], the concepts of indicators of inclusion and overlap can be relaxed to degrees of inclusion and overlap.

Zadeh suggests the following measures of inclusion and overlap, I: F_Ω × F_Ω → [0, 1] and O: F_Ω × F_Ω → [0, 1]:

$$I(A,B)=\inf_{\omega\in\Omega}\big[\max\big(1-\mu_A(\omega),\mu_B(\omega)\big)\big] \qquad (4.9)$$

$$O(A,B)=\sup_{\omega\in\Omega}\big[\min\big(\mu_A(\omega),\mu_B(\omega)\big)\big] \qquad (4.10)$$

He believes that, in a fuzzy setting, the concepts of lower and upper probabilities are no longer valid, because it does not make sense to argue what the minimum and maximum probabilities of an event induced by the mapping m are; therefore, he prefers to call them the Expected Certainty, EC, and the Expected Possibility, EΠ: Assume that one has a belief structure with focal elements {A_1, A_2, ..., A_n} ⊆ F_Ω, whose probability masses are determined by m: F_Ω → [0, 1]. Then, for a hypothesis B ∈ F_Ω, the Expected Certainty and Expected Possibility are also calculated via (4.6) for the lower and upper probabilities, replacing the indicator functions of inclusion and non-empty overlap with degrees of inclusion and non-empty overlap.

Yager presents the following general class of measures as candidates for I and O:

$$I_Y(A,B)=T_{\omega\in\Omega}\big[S\big(C(\mu_A(\omega)),\mu_B(\omega)\big)\big] \qquad (4.11)$$

$$O_Y(A,B)=S_{\omega\in\Omega}\big[T\big(\mu_A(\omega),\mu_B(\omega)\big)\big] \qquad (4.12)$$

where T, S, and C represent a t-norm, a t-conorm, and a c-norm, respectively. Yager [289] proves that I = I_Y and O = O_Y satisfy the following desirable properties:

(i). sup_{ω∈Ω} μ_A(ω) = 1 ⇒ I(A, B) ≤ O(A, B).

(ii). I(A, B) = 1 ⇔ A ⊆ B.

(iii). ∀A, B ∈ P_Ω: O(A, B) = 1 ⇔ A ∩ B ≠ ∅.

(iv). ∀A, B ∈ P_Ω: O(A, B) = 0 ⇔ A ∩ B = ∅.

(v). I(A, B) = 1 - O(A, B').

Unfortunately, due to the presence of min and max in the definitions of I_Y and O_Y, they are conservative, i.e.,
4.4 Belief Structures with Fuzzy Focal Elements and Fuzzy Probability Mass Assignments

The idea of fuzzifying the mass assignments was originally suggested by Zadeh in his paper on the granularity of information, where he extends Dempster-Shafer theory to fuzzy sets [325]. Interval-valued belief structures [54, 73–76, 190, 245, 272, 274, 295] are special cases of such belief structures; they have been used for reasoning [53] and employed in rule-based systems [9, 10].

When both focal elements and probability mass assignments are fuzzy sets, i.e., when $m : \mathcal{F}_\Omega \to \mathcal{F}_{[0,1]}$, the Extension Principle can be used to generalize (4.6). Assume that $m(\cdot)$ induces the following belief structure:

\mathcal{B} = \{(A_1, M_1), (A_2, M_2), \ldots, (A_n, M_n)\}    (4.13)

where $A_i \in \mathcal{F}_\Omega$ are the focal elements and $M_i \in \mathcal{F}_{[0,1]}$ are fuzzy probability mass assignments. Then the lower and upper probabilities of a fuzzy hypothesis $B \in \mathcal{F}_\Omega$ become fuzzy sets themselves, so we can consider them as linguistic probabilities and denote them by $\mathrm{LProb}^-$ and $\mathrm{LProb}^+$. They can be calculated as:

\mu_{\mathrm{LProb}^-(B)}(z) = \sup_{\substack{z = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n \\ p_1 + p_2 + \cdots + p_n = 1}} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))
\mu_{\mathrm{LProb}^+(B)}(z) = \sup_{\substack{z = p_1 y_1 + p_2 y_2 + \cdots + p_n y_n \\ p_1 + p_2 + \cdots + p_n = 1}} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))    (4.14)

where $x_i = I(A_i, B)$ and $y_i = O(A_i, B)$. The above normalized sums [214] are interactive additions of type-1 fuzzy sets [62]. It was shown [214] that a necessary and sufficient condition for the existence of solutions to the above optimization problems is:

\forall \alpha \in [0, 1]: \qquad \sum_{i=1}^n a_i'(\alpha) \leq 1 \leq \sum_{i=1}^n b_i'(\alpha)    (4.15)

where $a_i'(\alpha)$ and $b_i'(\alpha)$ are the left and right endpoints of $M_i(\alpha)$, the $\alpha$-cut of $M_i$ [Footnote 4.3]. Therefore, when the fuzzy probability mass assignments are postulated when building a Dempster-Shafer model, such a condition must be imposed, so as to make the fuzzy probability mass assignments consistent and to have solutions for the optimization problems described in (4.14), i.e., so that lower and upper probabilities exist and can be calculated.

Footnote 4.3: Here, we assume that the $M_i$'s have the properties of fuzzy numbers, possibly except normality, i.e., they are convex, have bounded supports (by the definition of the probability mass assignment, their support is $[0, 1]$), and are upper semi-continuous, so that their $\alpha$-cuts are bounded intervals.

Unfortunately, when type-1 fuzzy set models of the probability mass assignments are extracted by collecting data from subjects [33, 37, 192], it is difficult to imagine that one can impose any conditions on the data collected from the subjects so as to make the resulting membership functions satisfy the conditions of (4.15). Therefore, it was suggested in [214] to use Fuzzy Weighted Averages (FWAs) [59, 82, 150] instead of normalized sums, because FWAs involve an inherent normalization that removes the inconsistencies of the weights (fuzzy probability mass assignments), which guarantees that they always exist. Therefore, instead of (4.14), one can compute $\mu_{\mathrm{LProb}^-(B)}(z)$ and $\mu_{\mathrm{LProb}^+(B)}(z)$ as:

\mu_{\mathrm{LProb}^-(B)}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))
\mu_{\mathrm{LProb}^+(B)}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))    (4.16)

(4.16) can be summarized by the following expressive formulas:

\mathrm{LProb}^-(B) = \frac{\sum_{i=1}^n M_i x_i}{\sum_{i=1}^n M_i}, \qquad \mathrm{LProb}^+(B) = \frac{\sum_{i=1}^n M_i y_i}{\sum_{i=1}^n M_i}    (4.17)
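A minimal brute-force sketch of one $\alpha$-cut of the FWA in (4.17) follows. At a fixed $\alpha$ level each fuzzy mass $M_i$ reduces to an interval and each $x_i = I(A_i, B)$ is a number; because the weighted average is linear-fractional in the $m_i$, its extremes over the box of intervals occur at interval endpoints, so small problems can be solved by enumeration (efficient exact algorithms, such as Karnik-Mendel iterations, exist; all names below are ours):

```python
import itertools
import numpy as np

def interval_weighted_average(x, weight_intervals):
    """Endpoints of sum(m_i * x_i) / sum(m_i) over m_i in [a_i, b_i]."""
    lo, hi = float('inf'), float('-inf')
    for m in itertools.product(*weight_intervals):
        y = float(np.dot(m, x)) / float(np.sum(m))
        lo, hi = min(lo, y), max(hi, y)
    return lo, hi

x = [0.2, 0.7, 1.0]                        # degrees of inclusion I(A_i, B)
W = [(0.1, 0.3), (0.2, 0.4), (0.3, 0.5)]   # one alpha-cut of each mass M_i
print(interval_weighted_average(x, W))     # one alpha-cut of LProb^-(B)
```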
4.5 Linguistic Belief Structures

As we discussed, a first-order uncertainty model for a word is an interval type-2 fuzzy set; therefore, it is viable to envision Linguistic Belief Structures in which focal elements, probability masses, and even the knowledge about compatibility measures are allowed to be words, modeled by interval type-2 fuzzy sets. Note that a Linguistic Belief Structure is a general belief structure that allows some of the focal elements, probability masses, and compatibility measures to be modeled by numbers, intervals, or type-1 fuzzy sets, since they are special cases of interval type-2 fuzzy sets.

When both focal elements and probability mass assignments are interval type-2 fuzzy sets, i.e., when $m : \tilde{\mathcal{F}}_\Omega \to \tilde{\mathcal{F}}_{[0,1]}$, the Extension Principle can be used to generalize (4.6). Assume that $m(\cdot)$ induces the following belief structure:

\tilde{\mathcal{B}} = \{(\tilde{A}_1, \tilde{M}_1), (\tilde{A}_2, \tilde{M}_2), \ldots, (\tilde{A}_n, \tilde{M}_n)\}    (4.18)

where $\tilde{A}_i \in \tilde{\mathcal{F}}_\Omega$ are the focal elements and $\tilde{M}_i \in \tilde{\mathcal{F}}_{[0,1]}$ are interval type-2 probability mass assignments. Then the lower and upper probabilities of an interval type-2 fuzzy hypothesis $\tilde{B} \in \tilde{\mathcal{F}}_\Omega$ become interval type-2 fuzzy sets themselves, so we can consider them as linguistic probabilities and denote them by $\widehat{\mathrm{LProb}}^-$ and $\widehat{\mathrm{LProb}}^+$. They can be calculated as:

\overline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{\substack{z = p_1 x_1 + \cdots + p_n x_n \\ p_1 + \cdots + p_n = 1}} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{\substack{z = p_1 x_1 + \cdots + p_n x_n \\ p_1 + \cdots + p_n = 1}} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (4.19)

\overline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{\substack{z = p_1 y_1 + \cdots + p_n y_n \\ p_1 + \cdots + p_n = 1}} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{\substack{z = p_1 y_1 + \cdots + p_n y_n \\ p_1 + \cdots + p_n = 1}} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (4.20)

where $x_i$ and $y_i$ are measures of inclusion and overlap between $\tilde{A}_i$ and $\tilde{B}$. Assume that $I$ and $O$ are inclusion and overlap measures for T1 FSs. Then one way to derive inclusion and overlap measures for IT2 FSs is:

I(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(I(\underline{A}, \underline{B}) + I(\overline{A}, \overline{B})\big), \qquad O(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(O(\underline{A}, \underline{B}) + O(\overline{A}, \overline{B})\big)    (4.21)

It was shown [214] that necessary and sufficient conditions for the existence of solutions to the above optimization problems are:

\forall \alpha \in [0, 1]: \; \sum_{i=1}^n \overline{a}_i'(\alpha) \leq 1 \leq \sum_{i=1}^n \overline{b}_i'(\alpha); \qquad \forall \alpha \in [0, h_{\min}]: \; \sum_{i=1}^n \underline{a}_i'(\alpha) \leq 1 \leq \sum_{i=1}^n \underline{b}_i'(\alpha)    (4.22)

where $\overline{a}_i'(\alpha)$ and $\overline{b}_i'(\alpha)$ are the left and right endpoints of $\overline{M}_i(\alpha)$, the $\alpha$-cut of the UMF $\overline{M}_i$; $\underline{a}_i'(\alpha)$ and $\underline{b}_i'(\alpha)$ are the left and right endpoints of $\underline{M}_i(\alpha)$, the $\alpha$-cut of the LMF $\underline{M}_i$; and $h_{\min}$ is the minimum of the heights of the $\underline{M}_i$'s. Therefore, when interval type-2 fuzzy probability mass assignments are postulated when building a Dempster-Shafer model, such conditions must be imposed, so as to make the interval type-2 fuzzy probability mass assignments consistent and to have solutions for the optimization problems described in (4.19) and (4.20), i.e., so that lower and upper probabilities exist and can be calculated.
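The existence conditions (4.22) are easy to test numerically. The sketch below checks them at sampled $\alpha$ levels, assuming each mass word is supplied as $\alpha$-cut endpoint functions for its UMF and LMF; the representation and names are our own:

```python
def normalized_sum_exists(umf_cuts, lmf_cuts, h_min, n_levels=11):
    """Check (4.22): sum of left endpoints <= 1 <= sum of right endpoints,
    for UMF cuts on [0, 1] and LMF cuts on [0, h_min]."""
    for k in range(n_levels):
        alpha = k / (n_levels - 1)
        lo = sum(cut(alpha)[0] for cut in umf_cuts)
        hi = sum(cut(alpha)[1] for cut in umf_cuts)
        if not (lo <= 1.0 <= hi):
            return False
        if alpha <= h_min:
            lo = sum(cut(alpha)[0] for cut in lmf_cuts)
            hi = sum(cut(alpha)[1] for cut in lmf_cuts)
            if not (lo <= 1.0 <= hi):
                return False
    return True
```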
When interval type-2 fuzzy set models of the probability mass assignments are extracted by collecting data from subjects [151, 286], it is very difficult to imagine that one can impose any conditions on the data collected from the subjects so as to make the resulting membership functions satisfy the two sets of conditions in (4.22); therefore, it was suggested in [214] to use Linguistic Weighted Averages (LWAs) [277] instead of normalized sums, because LWAs involve an inherent normalization that removes the inconsistencies of the weights (interval type-2 fuzzy probability mass assignments), which guarantees that they always exist. Therefore, instead of (4.19) and (4.20), one can compute $\widehat{\mathrm{LProb}}^-(\tilde{B})$ and $\widehat{\mathrm{LProb}}^+(\tilde{B})$ as:

\overline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (4.23)

\overline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (4.24)

(4.23) and (4.24) can be summarized by the following expressive formulas:

\widehat{\mathrm{LProb}}^-(\tilde{B}) = \frac{\sum_{i=1}^n \tilde{M}_i x_i}{\sum_{i=1}^n \tilde{M}_i}, \qquad \widehat{\mathrm{LProb}}^+(\tilde{B}) = \frac{\sum_{i=1}^n \tilde{M}_i y_i}{\sum_{i=1}^n \tilde{M}_i}    (4.25)
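The LWA of (4.25) can be computed per $\alpha$-cut by reusing interval_weighted_average from the FWA sketch above. Our reading of the LWA construction in [277] — stated here as an assumption — is that the UMF of the result is the FWA of the weights' UMFs (for $\alpha \in [0, 1]$) and the LMF is the FWA of their LMFs (for $\alpha$ up to the minimum LMF height $h_{\min}$):

```python
def lwa_alpha_cut(x, umf_cuts, lmf_cuts, alpha, h_min):
    """One alpha-cut of the LWA in (4.25); cut functions map alpha -> (a, b)."""
    y_umf = interval_weighted_average(x, [cut(alpha) for cut in umf_cuts])
    y_lmf = (interval_weighted_average(x, [cut(alpha) for cut in lmf_cuts])
             if alpha <= h_min else None)  # LMF cuts exist only up to h_min
    return y_umf, y_lmf
```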
4.6 Combination of Evidence and Operations on Belief Structures

Combination of evidence is an important problem in evidence theory and has received noticeable attention [23, 65, 71, 182, 240, 241]. Evidence combination has been used in many applications for which information fusion is needed [18, 56, 96, 224], and in decision making [52]. The idea of combination of evidence stems from Dempster's Rule of Combination [49], which suggests how to combine the knowledge represented by two belief structures into a new belief structure. Nevertheless, it was shown by Zadeh [323] that Dempster's rule of combination can yield counter-intuitive results. This argument has caused a lot of controversy in the literature of evidential reasoning [83, 84, 145, 243] and led to the introduction of new rules of combination [29, 63, 112, 292, 293].

4.6.1 Dempster's Rule of Combination and Operations on Belief Structures

Yager [291] explains that Dempster's Rule of Combination simply performs the union operation on two belief structures, which therefore provides a methodology for performing operations on belief structures. Assume that $G : \mathcal{P}_{\Omega_1} \times \mathcal{P}_{\Omega_2} \to \mathcal{P}_{\Omega_3}$ is a set operator. Also assume that $\mathcal{B}_1$ and $\mathcal{B}_2$ are belief structures on $\Omega_1$ and $\Omega_2$, whose probability mass assignments are determined by $m_i : \mathcal{P}_{\Omega_i} \to [0, 1]$, $i = 1, 2$. The set operator $G(\cdot)$ can be extended [Footnote 4.4] to operate on $\mathcal{B}_1$ and $\mathcal{B}_2$ to yield $\mathcal{B}_3 = G(\mathcal{B}_1, \mathcal{B}_2)$ with the probability mass assignment $m_3 : \mathcal{P}_{\Omega_3} \to [0, 1]$, which is calculated as:

m_3(C) = \sum_{C = G(A, B)} m_1(A)\, m_2(B)    (4.26)

Footnote 4.4: This argument can be extended very easily to functions of one belief structure (e.g., exponentiation, inversion) or of more than two belief structures. We adhere to Yager's treatment of this problem, which focuses on binary operators.

Yager calls $G$ a non-null forming operator if $A, B \neq \emptyset \Rightarrow G(A, B) \neq \emptyset$. If $G$ is non-null forming, then for any $m_1$ and $m_2$, $m_3(\emptyset) = 0$. Examples of non-null forming operators are the union, the Cartesian product, and the summation of two sets. An example of a null-forming (or degenerate) set operator is the intersection operation. If (4.26) is used to take the intersection of two belief structures, i.e., when $G(A, B) = A \cap B$, there might be some $A$'s and $B$'s that have empty intersections, yielding $m_3(\emptyset) > 0$, which is undesirable and may sound counter-intuitive [Footnote 4.5]. Dempster suggests that the probability mass that falls into the empty set be proportionally divided among the other focal elements by a normalization factor:

m_3(C) = \frac{\sum_{C = A \cap B, \; A \cap B \neq \emptyset} m_1(A)\, m_2(B)}{1 - \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)}    (4.27)

Note that $\cap$ can be substituted with any degenerate operator $G(A, B)$. As we mentioned, it was demonstrated by Zadeh [323] that (4.27) can yield counter-intuitive results. Therefore, Yager suggests that the probability mass that falls into the empty set be assigned to the frame of discernment (referential set) $\Omega$, so that Yager's combined belief structure has the following probability mass assignment:

m_3(C) = \begin{cases} \sum_{C = G(A, B)} m_1(A)\, m_2(B) & C \neq \emptyset, \Omega \\ \sum_{\Omega = G(A, B)} m_1(A)\, m_2(B) + \sum_{G(A, B) = \emptyset} m_1(A)\, m_2(B) & C = \Omega \end{cases}    (4.28)

Footnote 4.5: If belief structures are analogous to probability mass functions, they must intuitively assign a zero probability mass to the empty set.

Arithmetic operators of summation $\mathcal{B}_1 \oplus \mathcal{B}_2$, subtraction $\mathcal{B}_1 \ominus \mathcal{B}_2$, multiplication $\mathcal{B}_1 \otimes \mathcal{B}_2$, and division $\mathcal{B}_1 \otimes \mathcal{B}_2^{-1}$ between belief structures can easily be defined using (4.26) [Footnote 4.6].

Footnote 4.6: Note that any point operator $g : U \times V \to W$ can be extended to a set operator $G : \mathcal{P}_U \times \mathcal{P}_V \to \mathcal{P}_W$ using $G(A, B) = \{w = g(u, v) \mid u \in A, \; v \in B\}$.
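For crisp belief structures, Yager's rule (4.28) is straightforward to implement: conflict mass is reassigned to the frame of discernment rather than renormalized as in Dempster's rule (4.27). A sketch follows (dict/frozenset representation and names are our own choices):

```python
from collections import defaultdict

def yager_combine(B1, B2, G, omega):
    """B1, B2: dicts {frozenset focal: mass}; G: set operator; omega: frame."""
    m3 = defaultdict(float)
    for A, mA in B1.items():
        for B, mB in B2.items():
            m3[frozenset(G(A, B))] += mA * mB
    conflict = m3.pop(frozenset(), 0.0)   # mass sent to the empty set
    if conflict > 0:
        m3[frozenset(omega)] += conflict  # reassigned to Omega, per (4.28)
    return dict(m3)

# Example with the null-forming operator intersection on Omega = {1, 2, 3}
B1 = {frozenset({1}): 0.7, frozenset({1, 2}): 0.3}
B2 = {frozenset({2}): 0.5, frozenset({2, 3}): 0.5}
print(yager_combine(B1, B2, lambda A, B: A & B, {1, 2, 3}))
# -> m({2}) = 0.3, m(Omega) = 0.7
```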
The above argument can be extended to belief structures with fuzzy focal elements and to linguistic belief structures. Since linguistic belief structures are quite general and include fuzzy focal elements and fuzzy probability masses as special cases, we focus on arithmetic operations on linguistic belief structures.

Assume that one has a belief structure $\tilde{\mathcal{B}}_1$ with focal elements $\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n\} \subseteq \tilde{\mathcal{F}}_U$, whose probability mass assignments are determined by $m_1 : \{\tilde{A}_1, \ldots, \tilde{A}_n\} \to \tilde{\mathcal{F}}_{[0,1]}$, and a belief structure $\tilde{\mathcal{B}}_2$ with focal elements $\{\tilde{B}_1, \tilde{B}_2, \ldots, \tilde{B}_p\}$, whose probability mass assignments are determined by $m_2 : \{\tilde{B}_1, \ldots, \tilde{B}_p\} \to \tilde{\mathcal{F}}_{[0,1]}$. Note that $\tilde{\mathcal{F}}_U$ represents the set of all IT2 FSs over the universe of discourse $U$. The belief structure $\tilde{\mathcal{B}}_3 = G(\tilde{\mathcal{B}}_1, \tilde{\mathcal{B}}_2)$ [Footnote 4.7] has focal elements $\{\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_r\}$ and its probability mass assignments are determined by $m_3 : \{\tilde{C}_1, \ldots, \tilde{C}_r\} \to \tilde{\mathcal{F}}_{[0,1]}$. Let us denote $m_1(\tilde{A}_i) = \tilde{M}_{1i}$, $m_2(\tilde{B}_j) = \tilde{N}_{2j}$, and $m_3(\tilde{C}_k) = \tilde{P}_{3k}$. Then (4.26) can be generalized to the IT2 probability mass assignments $\tilde{M}_{1i}$, $i = 1, \ldots, n$, and $\tilde{N}_{2j}$, $j = 1, \ldots, p$, by applying the Extension Principle to the following function:

\varphi_K(x_1, \ldots, x_n, y_1, \ldots, y_p) = \sum_{(i, j) \in K} x_i y_j, \qquad K = \{(i, j) \mid G(\tilde{A}_i, \tilde{B}_j) = \tilde{C}_k\}    (4.29)

where $\tilde{C}_k = G(\tilde{A}_i, \tilde{B}_j)$ is the result of applying $G$ to the IT2 FSs $\tilde{A}_i$ and $\tilde{B}_j$, and is determined as [90, 91]:

\underline{\mu}_{\tilde{C}_k}(z) = \sup_{z = G(x, y)} \min\big(\underline{\mu}_{\tilde{A}_i}(x), \underline{\mu}_{\tilde{B}_j}(y)\big), \qquad \overline{\mu}_{\tilde{C}_k}(z) = \sup_{z = G(x, y)} \min\big(\overline{\mu}_{\tilde{A}_i}(x), \overline{\mu}_{\tilde{B}_j}(y)\big)    (4.30)

Footnote 4.7: Note that we still assume that all of the belief structures have a finite number of focal elements.

Therefore:

\tilde{P}_{3k} = \varphi_K(\tilde{M}_{11}, \ldots, \tilde{M}_{1n}, \tilde{N}_{21}, \ldots, \tilde{N}_{2p}), \qquad K = \{(i, j) \mid G(\tilde{A}_i, \tilde{B}_j) = \tilde{C}_k\}    (4.31)

It is a well-known fact that the numeric mass assignments in a belief structure sum up to 1; therefore, the extension of the above function to IT2 FSs must be performed subject to the constraint determined by the following relation:

D = \Big\{(x_1, \ldots, x_n, y_1, \ldots, y_p) \;\Big|\; \sum_{i=1}^n x_i = 1, \; \sum_{j=1}^p y_j = 1\Big\}    (4.32)

Consequently:

\underline{\mu}_{\tilde{P}_{3k}}(z) = \sup_{\substack{z = \sum_{(i,j) \in K} x_i y_j \\ \sum_{i=1}^n x_i = 1, \; \sum_{j=1}^p y_j = 1}} \min\big(\underline{\mu}_{\tilde{M}_{11}}(x_1), \ldots, \underline{\mu}_{\tilde{M}_{1n}}(x_n), \underline{\mu}_{\tilde{N}_{21}}(y_1), \ldots, \underline{\mu}_{\tilde{N}_{2p}}(y_p)\big)    (4.33)

\overline{\mu}_{\tilde{P}_{3k}}(z) = \sup_{\substack{z = \sum_{(i,j) \in K} x_i y_j \\ \sum_{i=1}^n x_i = 1, \; \sum_{j=1}^p y_j = 1}} \min\big(\overline{\mu}_{\tilde{M}_{11}}(x_1), \ldots, \overline{\mu}_{\tilde{M}_{1n}}(x_n), \overline{\mu}_{\tilde{N}_{21}}(y_1), \ldots, \overline{\mu}_{\tilde{N}_{2p}}(y_p)\big)    (4.34)

It was shown [214] that constraints of the same form as $D$ may make the optimization problems associated with the Extension Principle have no solution, especially when models of the probability words are synthesized by collecting data about words from subjects. To avoid this, one can use the Doubly Normalized Linguistic Weighted Average (DNLWA) operator, which is the extension to IT2 FSs of the following function:

\psi_K(x_1, \ldots, x_n, y_1, \ldots, y_p) = \frac{\sum_{(r, s) \in K} x_r y_s}{\sum_{j=1}^n x_j \sum_{j=1}^p y_j}    (4.35)

Therefore:

\tilde{P}_{3k} = \psi_K(\tilde{M}_{11}, \ldots, \tilde{M}_{1n}, \tilde{N}_{21}, \ldots, \tilde{N}_{2p}), \qquad K = \{(i, j) \mid G(\tilde{A}_i, \tilde{B}_j) = \tilde{C}_k\}    (4.36)

Consequently:

\underline{\mu}_{\tilde{P}_{3k}}(z) = \sup_{z = \psi_K(x_1, \ldots, x_n, y_1, \ldots, y_p)} \min\big(\underline{\mu}_{\tilde{M}_{11}}(x_1), \ldots, \underline{\mu}_{\tilde{M}_{1n}}(x_n), \underline{\mu}_{\tilde{N}_{21}}(y_1), \ldots, \underline{\mu}_{\tilde{N}_{2p}}(y_p)\big)    (4.37)

\overline{\mu}_{\tilde{P}_{3k}}(z) = \sup_{z = \psi_K(x_1, \ldots, x_n, y_1, \ldots, y_p)} \min\big(\overline{\mu}_{\tilde{M}_{11}}(x_1), \ldots, \overline{\mu}_{\tilde{M}_{1n}}(x_n), \overline{\mu}_{\tilde{N}_{21}}(y_1), \ldots, \overline{\mu}_{\tilde{N}_{2p}}(y_p)\big)    (4.38)

The DNLWA of (4.36) can be summarized by the following expressive formula:

\tilde{P}_{3k} \equiv \tilde{Y}_{\mathrm{DNLWA}} = \frac{\sum_{(r, s) \in K} \tilde{M}_{1r} \tilde{N}_{2s}}{\sum_{j=1}^n \tilde{M}_{1j} \sum_{j=1}^p \tilde{N}_{2j}}    (4.39)

4.6.2 Expected Value of a Linguistic Belief Structure

Assume that $\tilde{\mathcal{B}}$ is a linguistic belief structure $\tilde{\mathcal{B}} = \{(\tilde{A}_1, \tilde{M}_1), \ldots, (\tilde{A}_n, \tilde{M}_n)\}$ whose probability masses are determined by $m : \tilde{\mathcal{F}}_\Omega \to \tilde{\mathcal{F}}_{[0,1]}$. We define the expected value of $\tilde{\mathcal{B}}$, $E\{\tilde{\mathcal{B}}\}$, as:

E\{\tilde{\mathcal{B}}\} = \frac{\sum_{i=1}^n \tilde{M}_i \tilde{A}_i}{\sum_{i=1}^n \tilde{M}_i}    (4.40)

The expected value is a very useful concept in the realm of random variables, where it provides the average of a random variable. It plays the same role for linguistic belief structures, i.e., it yields a linguistic average of the linguistic belief structure.
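One $\alpha$-cut of the expected value (4.40) can be sketched as follows. At a fixed $\alpha$ level both the focal elements and the masses reduce to intervals; since the weighted average is increasing in each focal value $x_i$, the minimum (maximum) uses the left (right) focal endpoints, and the mass endpoints are enumerated as in the earlier FWA sketch (names and the endpoint argument are our own):

```python
import itertools
import numpy as np

def expected_value_cut(focal_cuts, mass_cuts):
    """One alpha-cut of E{B} in (4.40); focal_cuts and mass_cuts are lists of
    (left, right) interval endpoints at this alpha level."""
    xlo = [c[0] for c in focal_cuts]
    xhi = [c[1] for c in focal_cuts]
    lo = min(float(np.dot(m, xlo)) / float(np.sum(m))
             for m in itertools.product(*mass_cuts))
    hi = max(float(np.dot(m, xhi)) / float(np.sum(m))
             for m in itertools.product(*mass_cuts))
    return lo, hi

print(expected_value_cut([(5.5, 7.0), (2.0, 3.0)], [(0.6, 0.7), (0.4, 0.5)]))
```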
4.7 Reasoning with Linguistic Belief Structures: Two Examples

In this section, we present two examples of reasoning with belief structures, and show how probabilities of hypotheses can be inferred from subjective knowledge that is provided in natural language.

Example 4.1. Assume that the quality of service in a restaurant is described by a regular customer as: "It's probable one can get somewhat good service; it's a tossup that the quality of service is bad; and it's very improbable that the service will be very bad." What can be said about the quality of service in this restaurant? What is the probability that the service is good?

First, we establish a vocabulary of words and their IT2 FS models for the linguistic variable probability. The latter [213] are obtained by collecting data from subjects on the Amazon Mechanical Turk website [1] [Footnote 4.8] and applying the Enhanced Interval Approach [286] to that data. The words are: {Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable}. Their FOUs are depicted in Fig. 4.1, and their parameters are given in Table 4.1.

Footnote 4.8: The native language of the subjects was English.

[Figure 4.1: Vocabulary of IT2 FSs representing linguistic probabilities. Nine panels, one FOU per probability word, each on the domain [0, 1].]

Table 4.1: Membership function parameters of the probability words depicted in Fig. 4.1
Word | UMF parameters | LMF parameters
Extremely improbable | (0, 0, 0.0183, 0.1316) | (0, 0, 0.0046, 0.0627, 1.0000)
Very improbable | (0.0293, 0.1000, 0.1250, 0.1707) | (0.0896, 0.1167, 0.1167, 0.1604, 0.7643)
Improbable | (0.0586, 0.1750, 0.2500, 0.3414) | (0.1896, 0.2200, 0.2200, 0.2604, 0.5757)
Somewhat improbable | (0.0982, 0.2500, 0.3000, 0.4518) | (0.2293, 0.2750, 0.2750, 0.3207, 0.6464)
Tossup | (0.3586, 0.5000, 0.5500, 0.6414) | (0.4896, 0.5083, 0.5083, 0.5141, 0.4107)
Somewhat probable | (0.4793, 0.5500, 0.6000, 0.6707) | (0.5293, 0.5750, 0.5750, 0.6207, 0.6464)
Probable | (0.5086, 0.6500, 0.7000, 0.7914) | (0.6293, 0.6750, 0.6750, 0.7207, 0.6464)
Very probable | (0.7189, 0.8250, 0.9000, 0.9811) | (0.8293, 0.8700, 0.8700, 0.9207, 0.5757)
Extremely probable | (0.8684, 0.9772, 1.0000, 1.0000) | (0.9405, 0.9954, 1.0000, 1.0000, 1.0000)

We also use the vocabulary in [173, Table 7.14] for the words describing quality of service. Its words are Very bad, Bad, Somewhat bad, Fair, Somewhat good, Good, and Very good; their FOUs are shown in Fig. 4.2, and their parameters are given in Table 4.2.

[Figure 4.2: Vocabulary of IT2 FSs representing quality of service. Seven panels, one FOU per word, each on the domain [0, 10].]

The information in the problem statement yields the following linguistic belief structure for the quality of service in the restaurant:

\tilde{\mathcal{B}} = \{(\text{Somewhat Good}, \text{Probable}), \; (\text{Bad}, \text{Tossup}), \; (\text{Very Bad}, \text{Very Improbable})\}    (4.41)

The expected value of this linguistic belief structure, $\tilde{Y}_{AQoS}$, can be calculated according to (4.40) to obtain the summarized information that can be given about the quality of service in the restaurant; it is shown in Fig. 4.3. To present the result in a way that is more understandable to humans, we calculated its Jaccard similarity with the members of the vocabulary of words describing quality of service; the similarities are given in Table 4.3. Note that the Jaccard similarity measure $s_J(\tilde{A}, \tilde{B})$ [46, 173] between IT2 FSs $\tilde{A}$ and $\tilde{B}$ over $U$ is computed as:

s_J(\tilde{A}, \tilde{B}) = \frac{\int_U \min(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\, du + \int_U \min(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\, du}{\int_U \max(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\, du + \int_U \max(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\, du}    (4.42)
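For sampled membership functions, (4.42) reduces to elementwise min/max sums. The following sketch assumes the LMFs and UMFs are sampled on one common grid (argument names are ours; a trapezoidal rule could replace the plain sums):

```python
import numpy as np

def jaccard_it2(umf_a, lmf_a, umf_b, lmf_b):
    """Jaccard similarity (4.42) between two IT2 FSs sampled on one grid."""
    num = np.sum(np.minimum(umf_a, umf_b)) + np.sum(np.minimum(lmf_a, lmf_b))
    den = np.sum(np.maximum(umf_a, umf_b)) + np.sum(np.maximum(lmf_a, lmf_b))
    return float(num / den)
```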
Table 4.2: Membership function parameters of the quality-of-service words depicted in Fig. 4.2
Quality of Service | UMF parameters | LMF parameters
Very Bad | (0, 0, 0.59, 3.95) | (0, 0, 0.09, 1.32, 1)
Bad | (0.28, 2, 3, 5.22) | (1.79, 2.37, 2.37, 2.71, 0.48)
Somewhat Bad | (0.98, 2.75, 4, 5.41) | (2.79, 3.3, 3.3, 3.71, 0.42)
Fair | (2.38, 4.5, 6, 8.18) | (4.79, 5.12, 5.12, 5.35, 0.27)
Somewhat Good | (4.02, 5.65, 7, 8.41) | (5.89, 6.34, 6.34, 6.81, 0.4)
Good | (4.38, 6.5, 7.75, 9.62) | (6.79, 7.25, 7.25, 7.91, 0.47)
Very Good | (5.21, 8.27, 10, 10) | (7.66, 9.82, 10, 10, 1)

[Figure 4.3: Average quality of service in the restaurant, $\tilde{Y}_{AQoS}$.]

The average of the belief structure has the largest similarity with the word Good; therefore, it can be concluded that: "On average, the quality of service in the restaurant is good."

Table 4.3: Jaccard similarities between $\tilde{Y}_{AQoS}$ and members of the vocabulary of linguistic quality of service
Quality of Service ($\tilde{Q}_i$) | $s_J(\tilde{Q}_i, \tilde{Y}_{AQoS})$
Very Bad | 0.00039379
Bad | 0.032774
Somewhat Bad | 0.053137
Fair | 0.38084
Somewhat Good | 0.67899
Good | 0.75052
Very Good | 0.25816

Next, we calculate the lower and upper probabilities that the quality of service is good. Since Yager's compatibility measures are very conservative, we use the extension to IT2 FSs of the compatibility measures proposed by Yen [309]. His compatibility measures for T1 FSs are:

I_{\mathrm{Yen}}(A, B) = \sum_i [\alpha_i - \alpha_{i-1}] \inf_{\omega \in A_{\alpha_i}} \mu_B(\omega), \qquad O_{\mathrm{Yen}}(A, B) = \sum_i [\alpha_i - \alpha_{i-1}] \sup_{\omega \in A_{\alpha_i}} \mu_B(\omega)

We extend them to IT2 FSs as:

I_{\mathrm{Yen}}(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(I_{\mathrm{Yen}}(\underline{A}, \underline{B}) + I_{\mathrm{Yen}}(\overline{A}, \overline{B})\big), \qquad O_{\mathrm{Yen}}(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(O_{\mathrm{Yen}}(\underline{A}, \underline{B}) + O_{\mathrm{Yen}}(\overline{A}, \overline{B})\big)
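Yen's T1 measures discretize naturally over a ladder of $\alpha$ levels. The sketch below assumes sampled membership functions and a uniform $\alpha$ ladder (both our own simplifications for illustration):

```python
import numpy as np

def yen_measures(mu_A, mu_B, levels=20):
    """Discretized Yen I and O for T1 FSs sampled on one grid."""
    alphas = np.linspace(0.0, 1.0, levels + 1)
    I = O = 0.0
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        cut = mu_A >= a1                      # alpha-cut of A at level a1
        if cut.any():
            I += (a1 - a0) * float(mu_B[cut].min())  # inf of mu_B over the cut
            O += (a1 - a0) * float(mu_B[cut].max())  # sup of mu_B over the cut
    return I, O
```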
The lower and upper probabilities of the hypothesis that the quality of service in the restaurant is good are calculated according to (4.23) and (4.24); they are shown in Fig. 4.4. We mapped each of them into the probability word in Fig. 4.1 that has the largest similarity with it; the similarities are given in Table 4.4. The lower probability mapped into Extremely Improbable, and the upper probability mapped into Tossup. Therefore: "The probability that the quality of service is good is between extremely improbable and tossup."

[Figure 4.4: Lower and upper probabilities, $\widehat{\mathrm{LProb}}^-(\text{Good})$ and $\widehat{\mathrm{LProb}}^+(\text{Good})$, that the quality of service in the restaurant is good, calculated using Yen's measures.]

Table 4.4: Jaccard similarities between $\widehat{\mathrm{LProb}}^-(\text{Good})$ and $\widehat{\mathrm{LProb}}^+(\text{Good})$ and members of the vocabulary of linguistic probabilities $\tilde{P}_i$
Probability Word ($\tilde{P}_i$) | $s_J(\widehat{\mathrm{LProb}}^-(\text{Good}), \tilde{P}_i)$ | $s_J(\widehat{\mathrm{LProb}}^+(\text{Good}), \tilde{P}_i)$
Extremely Improbable | 0.13761 | 0
Very Improbable | 0.098437 | 0
Improbable | 0.0067466 | 0
Somewhat Improbable | 0 | 0.0090673
Tossup | 0 | 0.19203
Somewhat Probable | 0 | 0.0050374
Probable | 0 | 0
Very Probable | 0 | 0
Extremely Probable | 0 | 0

It is well known that the centroid of an IT2 FS can be used to quantify its uncertainty (e.g., [173]). The centroid can therefore be used to quantify the uncertainty about the numeric lower and upper probabilities that solve the above problem. The term around can be used to express the uncertainty represented by the centroids; it expresses the inter-person and intra-person uncertainties about the words that are propagated by the LWA. This cannot be done by any solution involving T1 FSs, since T1 FSs do not reflect the uncertainty about the membership values. The centroid of the lower probability (calculated using Yen's measures) is $[0.055, 0.067]$, and its average centroid is $0.061$; the centroid of the corresponding upper probability is $[0.45, 0.47]$, and its average centroid is $0.46$. Consequently, one can also summarize the solution using Yen's measures as: "The probability that the quality of service is good is from around 6.1% to around 46%."

Example 4.2. It is probable that Cyrus finds a job with a good salary; it is somewhat improbable that he finds a job with a very good salary; and it's a tossup that he gets a job with a fair salary. Usually, each year, he spends around 15% of his salary on travel; sometimes he spends around 25%; and rarely does he spend from 30% to 40%. Assume that the maximum possible salary for Cyrus is $100,000 per year. On average, how much money will he spend on travel? How often will he be able to spend around $10,000 on travel in a year?

The problem description implies the following belief structure for Cyrus's predicted salary:

\tilde{\mathcal{B}}_{Salary} = \{(\text{Good}, \text{Probable}), \; (\text{Fair}, \text{Tossup}), \; (\text{Very Good}, \text{Somewhat Improbable})\}    (4.43)

It also implies the following belief structure on the percentage of his salary that is spent on travel:

\tilde{\mathcal{B}}_{Percentage} = \{(\text{Around 15\%}, \text{Usually}), \; ([30\%, 40\%], \text{Rarely}), \; (\text{Around 25\%}, \text{Sometimes})\}    (4.44)

This example shows how the combination of belief structures can be used for reasoning. To derive a belief structure on the amount of money that is spent on travel, $\tilde{\mathcal{B}}_{Travel}$, we need to combine the above belief structures using multiplication. Therefore, according to (4.39):

\tilde{\mathcal{B}}_{Travel} = \tilde{\mathcal{B}}_{Salary} \otimes \tilde{\mathcal{B}}_{Percentage} = \Big\{
(\widetilde{15\%} \otimes \tilde{G}, \; \tilde{U}\tilde{P}/D), \; (\widetilde{15\%} \otimes \tilde{F}, \; \tilde{U}\tilde{T}/D), \; (\widetilde{15\%} \otimes \widetilde{VG}, \; \tilde{U}\widetilde{SI}/D),
(J \otimes \tilde{G}, \; \tilde{R}\tilde{P}/D), \; (J \otimes \widetilde{VG}, \; \tilde{R}\widetilde{SI}/D), \; (J \otimes \tilde{F}, \; \tilde{R}\tilde{T}/D),
(\widetilde{25\%} \otimes \tilde{G}, \; \tilde{S}\tilde{P}/D), \; (\widetilde{25\%} \otimes \tilde{F}, \; \tilde{S}\tilde{T}/D), \; (\widetilde{25\%} \otimes \widetilde{VG}, \; \tilde{S}\widetilde{SI}/D) \Big\}    (4.45)

where $D = (\tilde{U} + \tilde{R} + \tilde{S})(\tilde{P} + \tilde{T} + \widetilde{SI})$; Good, Very Good, and Fair are shown by $\tilde{G}$, $\widetilde{VG}$, and $\tilde{F}$; Probable, Somewhat Improbable, and Tossup are shown by $\tilde{P}$, $\widetilde{SI}$, and $\tilde{T}$; and Usually, Sometimes, and Rarely are shown by $\tilde{U}$, $\tilde{S}$, and $\tilde{R}$. Moreover, $J = [30\%, 40\%]$, and Around 15% and Around 25% are denoted $\widetilde{15\%}$ and $\widetilde{25\%}$. Note that $\tilde{L} = \tilde{M} \otimes \tilde{N}$ is the multiplication of IT2 FSs; $\tilde{L}$ is determined by its lower and upper MFs, which are T1 FSs:

\underline{\mu}_{\tilde{L}}(z) = \sup_{z = xy} \min\big(\underline{\mu}_{\tilde{M}}(x), \underline{\mu}_{\tilde{N}}(y)\big), \qquad \overline{\mu}_{\tilde{L}}(z) = \sup_{z = xy} \min\big(\overline{\mu}_{\tilde{M}}(x), \overline{\mu}_{\tilde{N}}(y)\big)    (4.46)
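On $\alpha$-cuts, the product (4.46) reduces to interval multiplication. Since the domains here (salaries and percentages) are nonnegative, the product of intervals $[a, b]$ and $[c, d]$ is simply $[ac, bd]$; applying this to the UMF cuts for all $\alpha$, and to the LMF cuts up to the smaller LMF height, yields the FOU of $\tilde{L}$. A minimal sketch under that nonnegativity assumption:

```python
def product_cut(cut_m, cut_n):
    """One alpha-cut of M (x) N in (4.46), for nonnegative supports."""
    (a, b), (c, d) = cut_m, cut_n
    return (a * c, b * d)  # valid because a, c >= 0 on these domains

print(product_cut((0.10, 0.20), (40000.0, 70000.0)))  # percentage x salary
```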
The models for the usuality words Rarely, Sometimes, and Usually are shown in Fig. 4.5, and their parameters are given in Table 4.5. The IT2 FS models for Around 15% and Around 25% are shown in Fig. 4.6.

[Figure 4.5: IT2 FS models of the usuality words Rarely, Sometimes, and Usually, each on the domain [0, 1].]

Table 4.5: FOU parameters of the usuality words
Usuality | UMF parameters | LMF parameters
Rarely | (0.0586, 0.1750, 0.2500, 0.3414) | (0.1896, 0.2200, 0.2200, 0.2604, 0.5757)
Sometimes | (0.0379, 0.1750, 0.2750, 0.4621) | (0.1917, 0.2250, 0.2250, 0.2483, 0.5286)
Usually | (0.5086, 0.6500, 0.7000, 0.7914) | (0.6293, 0.6750, 0.6750, 0.7207, 0.6464)

[Figure 4.6: IT2 FS models of Around 15% and Around 25%.]

We calculated the focal elements of $\tilde{\mathcal{B}}_{Travel}$ using the $\alpha$-cut decomposition theorem and fuzzy multiplication; the results are shown in Fig. 4.7.

[Figure 4.7: Focal elements of the belief structure $\tilde{\mathcal{B}}_{Travel}$ calculated by fuzzy multiplication. Nine panels: products of 30% to 40%, Around 25%, and Around 15% with the salary words, each on the domain [0, 10] ($\times$ $10k).]

For the calculation of the probability mass assignments of $\tilde{\mathcal{B}}_{Travel}$, we used $\alpha$-cut decomposition [90, 91], which translates the problem of calculating the DNLWA into the following statement: for each $\alpha$, with $x_1 \in \underline{U}(\alpha)$, $x_2 \in \underline{R}(\alpha)$, $x_3 \in \underline{S}(\alpha)$, $y_1 \in \underline{P}(\alpha)$, $y_2 \in \underline{T}(\alpha)$, $y_3 \in \underline{SI}(\alpha)$, the $\alpha$-cut of the LMF of each DNLWA $\tilde{Y}_i$ is $\underline{Y}_i(\alpha) = [\underline{y}_L(\alpha), \underline{y}_R(\alpha)]$, where

\underline{y}_L(\alpha) = \min \psi_{K_i}(x_1, x_2, x_3, y_1, y_2, y_3), \qquad \underline{y}_R(\alpha) = \max \psi_{K_i}(x_1, x_2, x_3, y_1, y_2, y_3)

and the $\alpha$-cut of the UMF, $\overline{Y}_i(\alpha) = [\overline{y}_L(\alpha), \overline{y}_R(\alpha)]$, is obtained analogously with $x_1 \in \overline{U}(\alpha)$, $x_2 \in \overline{R}(\alpha)$, $x_3 \in \overline{S}(\alpha)$, $y_1 \in \overline{P}(\alpha)$, $y_2 \in \overline{T}(\alpha)$, $y_3 \in \overline{SI}(\alpha)$.    (4.47)

Note that $\psi_{K_i}$ is calculated from (4.35). We solved the optimization problems in (4.47) using nonlinear optimization algorithms [28] [Footnote 4.9].

Footnote 4.9: We used the interior point algorithm.
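A sketch of one endpoint computation in (4.47) follows: the doubly normalized weighted average $\psi_K$ of (4.35) is minimized over the box given by the $\alpha$-cuts of the usuality and probability words. The bounds below are hypothetical placeholders, not the chapter's values, and we use SciPy's bound-constrained L-BFGS-B as an accessible stand-in for the interior-point solver mentioned in Footnote 4.9 (maximization would minimize $-\psi_K$):

```python
import numpy as np
from scipy.optimize import minimize

def psi(z, K, n):
    """The DNLWA kernel psi_K of (4.35); z stacks (x_1..x_n, y_1..y_n)."""
    x, y = z[:n], z[n:]
    return sum(x[i] * y[j] for i, j in K) / (np.sum(x) * np.sum(y))

n = 3
K = [(0, 0)]                                   # pairs (i, j) with G(A_i, B_j) = C_k
bounds = [(0.1, 0.3)] * n + [(0.4, 0.7)] * n   # alpha-cut endpoints (hypothetical)
z0 = np.array([lo for lo, _ in bounds])
res = minimize(psi, z0, args=(K, n), bounds=bounds, method='L-BFGS-B')
print(res.fun)                                 # left endpoint of this alpha-cut
```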
The FOUs of the resulting probability masses are depicted in Fig. 4.8; the words in the numerator of the expressive formula for each DNLWA are given in the title of each subfigure.

[Figure 4.8: Probability mass assignments of the belief structure $\tilde{\mathcal{B}}_{Travel}$. Nine panels, titled by the word pairs in the DNLWA numerators: Somewhat Improbable, Tossup, and Probable, each paired with Rarely, Sometimes, and Usually.]

The IT2 FS model of the hypothesis "spending around $10,000 for travel" ($\widetilde{10k}$) is depicted in Fig. 4.9.

[Figure 4.9: The IT2 FS modeling the hypothesis of spending around $10,000, on the domain [0, 10] ($\times$ $10k).]

Note that the probability mass assignments of $\tilde{\mathcal{B}}_{Travel}$ are highly inconsistent, so a linguistic normalized sum of the form (4.19) and (4.20) involving them does not exist. Therefore, we calculate the lower and upper probabilities using Yen's measures and LWAs. The lower and upper probabilities of $\widetilde{10k}$ are depicted in Fig. 4.10.

[Figure 4.10: The lower and upper probabilities, $\widehat{\mathrm{LProb}}^-(\widetilde{10k})$ and $\widehat{\mathrm{LProb}}^+(\widetilde{10k})$, of the hypothesis of spending around $10,000, computed using Yen's measures.]

We calculated the Jaccard similarities of the lower and upper probabilities with the members of the vocabulary of linguistic probabilities; the results are given in Table 4.6. Observe that the lower probability has the highest similarity with the word Extremely Improbable, and the upper probability has the highest similarity with Tossup. Therefore: "The probability that Cyrus spends around $10,000 for travel is between extremely improbable and tossup."

Table 4.6: Jaccard similarities between $\widehat{\mathrm{LProb}}^-(\widetilde{10k})$ and $\widehat{\mathrm{LProb}}^+(\widetilde{10k})$ and members of the vocabulary of linguistic probabilities $\tilde{P}_i$
Probability Word ($\tilde{P}_i$) | $s_J(\widehat{\mathrm{LProb}}^-(\widetilde{10k}), \tilde{P}_i)$ | $s_J(\widehat{\mathrm{LProb}}^+(\widetilde{10k}), \tilde{P}_i)$
Extremely Improbable | 0.6061 | 0
Very Improbable | 0.0268 | 0.0091
Improbable | 0 | 0.1592
Somewhat Improbable | 0 | 0.3190
Tossup | 0 | 0.3657
Somewhat Probable | 0 | 0.2058
Probable | 0 | 0.1353
Very Probable | 0 | 0.0008
Extremely Probable | 0 | 0

We also calculated the centroids and average centroids of the lower and upper probabilities of $\widetilde{10k}$; they are $[0.0090, 0.0264]$ with average centroid $0.0177$, and $[0.2850, 0.5567]$ with average centroid $0.4209$, respectively. Consequently, one can state that: "The probability that Cyrus spends around $10,000 for travel is from around 1.77% to around 42.09%."

4.8 Conclusions and Future Work

In this chapter, we first reviewed the foundations of evidence theory, starting from the concept of a random set, and the generalization of evidence theory to fuzzy focal elements and fuzzy probability masses. Then we extended the theory to belief structures whose focal elements and probability mass assignments are words modeled by interval type-2 fuzzy sets, and demonstrated how lower and upper probabilities can be calculated for a hypothesis from those belief structures. We showed that LWAs can be used to derive such probabilities, since normalized sums may fail to exist when words are modeled using data that are collected from subjects. Then we showed how an operation can be performed on two belief structures to obtain a belief structure that carries knowledge about a variable of interest. Finally, we presented examples to show how lower and upper probabilities for a hypothesis are calculated using LWAs, and how belief structures can be combined.

Our results need to be extended to applications involving the fusion of conflicting information [55, 65, 75, 76] using belief structures with fuzzy probability mass assignments, and to the calculation of their expected values. Moreover, applications of such Linguistic Belief Structures in rule-based systems that use evidential reasoning for inference [131, 132, 153, 239, 305] have to be investigated.

Chapter 5
Extension of Set Functions to Interval Type-2 Fuzzy Sets: Applications to Evidential Reasoning with Linguistic Belief Structures

Truth is so hard to tell, it sometimes needs fiction to make it plausible.
Francis Bacon

A method for extending set functions to Interval Type-2 Fuzzy Sets is proposed in this chapter. We start from the Extension Principle for extending set functions to Type-1 Fuzzy Sets. Then we extend set functions to embedded Type-1 Fuzzy Sets of Interval Type-2 Fuzzy Sets. Consequently, we construct the Interval Type-2 Fuzzy Set result of the extension process (which resides in the co-domain of the extended set function) from its embedded Type-1 Fuzzy Sets.
We show that the Extension Principle for set functions can be used to infer a single fuzzy probability measure from a linguistic belief structure, instead of lower and upper probabilities, and demonstrate how such an inference can be used to solve Advanced Computing with Words problems.

5.1 Introduction

The Extension Principle [316, 318] is the main tool for extending mathematical theories that deal with non-fuzzy sets to fuzzy sets. The original formulation of the Extension Principle is as follows: Assume that $f : U_1 \times U_2 \times \cdots \times U_n \to V$ is a function ($U_i, V \subseteq \mathbb{R}$). Its extension to Type-1 Fuzzy Sets (T1 FSs), $f : \mathcal{F}_{U_1} \times \mathcal{F}_{U_2} \times \cdots \times \mathcal{F}_{U_n} \to \mathcal{F}_V$, is derived as:

B = f(A_1, A_2, \ldots, A_n)    (5.1)

\mu_B(v) = \begin{cases} \sup_{v = f(u)} \min\big(\mu_{A_1}(u_1), \mu_{A_2}(u_2), \ldots, \mu_{A_n}(u_n)\big) & \exists\, v = f(u) \\ 0 & \nexists\, v = f(u) \end{cases}    (5.2)

where $f(u) = f(u_1, u_2, \ldots, u_n)$. The above Extension Principle was formulated for real functions. Set functions (e.g., number-valued set functions, set-valued set functions, and measures [13]) are of great interest in mathematics; for instance, a probability measure is a number-valued set function, and the pre-image and the forward image of a real function are set-valued set functions. They are also important concepts for analyzing the Extension Principle in (5.2) mathematically [11].

In the sequel, we study the Extension Principles for extending set functions to Type-1 Fuzzy Sets (T1 FSs) and try to generalize them to Interval Type-2 Fuzzy Sets (IT2 FSs). We also show how this theory can be used to infer an IT2 fuzzy belief measure from Linguistic Belief Structures, and how such a belief measure can be applied to solving Advanced Computing with Words (ACWW) problems.

5.2 The Extension Principle for Set-Valued Set Functions

The following Extension Principle was formulated for functions that map sets into sets by Zadeh [318] and was further studied in [11]:

Proposition 5.1. Assume that $G : \mathcal{P}_{U_1} \times \mathcal{P}_{U_2} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$, where $\mathcal{P}_{U_i}$ denotes the family of subsets of $U_i$. The domain of $G$ can be extended to $G : \mathcal{F}_{U_1} \times \mathcal{F}_{U_2} \times \cdots \times \mathcal{F}_{U_n} \to \mathcal{F}_V$ using the following formula:

B = G(A_1, A_2, \ldots, A_n), \qquad B(\alpha) = G(A_1(\alpha), A_2(\alpha), \ldots, A_n(\alpha))    (5.3)

where $A(\alpha)$ is the $\alpha$-cut of $A$. The fuzzy set $B$ can be constructed from its $\alpha$-cuts by the $\alpha$-cut decomposition theorem (Theorem 5.1) if $G$ is cutworthy [11, 127]:

Definition 5.1 (Cutworthy Set-Valued Function [11, 276]). Assume that $G : \mathcal{P}_{U_1} \times \mathcal{P}_{U_2} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$. $G$ is called cutworthy if it preserves the subsethood order, i.e.:

\forall i \in \{1, 2, \ldots, n\}: \quad C_i \subseteq D_i \;\Rightarrow\; G(C_1, C_2, \ldots, C_n) \subseteq G(D_1, D_2, \ldots, D_n)    (5.4)

Theorem 5.1. $B = \bigcup_{\alpha \in (0, 1]} \alpha B(\alpha)$, where $\mu_{\alpha B(\alpha)}(x) = \alpha\, \mathbf{1}_{B(\alpha)}(x)$. In other words, $\mu_B(x) = \sup_{\alpha \in (0, 1]} \alpha\, \mathbf{1}_{B(\alpha)}(x)$.
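The following sketch illustrates (5.3) for the forward-image operator that will be shown cutworthy in Proposition 5.2: each $\alpha$-cut of $B$ is the image of the $\alpha$-cuts of the $A_i$ under $g$. For a continuous $g$ on interval cuts the image is an interval, so gridding the cuts and taking min/max approximates $B(\alpha)$ (names, grid size, and the interval-cut assumption are ours):

```python
import numpy as np

def image_cut(g, cuts, samples=50):
    """Approximate B(alpha) = g(A_1(alpha), ..., A_n(alpha)) for interval cuts."""
    grids = np.meshgrid(*[np.linspace(lo, hi, samples) for lo, hi in cuts])
    values = g(*grids)
    return float(values.min()), float(values.max())

# Example: g(u1, u2) = u1 * u2 on the cuts [1, 2] and [3, 4] gives [3, 8]
print(image_cut(lambda u, v: u * v, [(1, 2), (3, 4)]))
```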
Proposition 5.2. Assume that $g : \mathbb{R}^n \to \mathbb{R}$, and that $G : \mathcal{P}_{\mathbb{R}}^n \to \mathcal{P}_{\mathbb{R}}$ is the extension of $g$ to sets [Footnote 5.1], defined as:

G(C_1, C_2, \ldots, C_n) = \{g(x_1, x_2, \ldots, x_n) \mid x_i \in C_i, \; \forall i \in \{1, 2, \ldots, n\}\}    (5.5)

Then $G$ is cutworthy.

Footnote 5.1: It is often called the forward image of $g$.

Proof. By the definition of $G$, $y \in G(C_1, C_2, \ldots, C_n) \Rightarrow \exists\, x_1 \in C_1, x_2 \in C_2, \ldots, x_n \in C_n$ such that $y = g(x_1, x_2, \ldots, x_n)$. But if $\forall i \in \{1, 2, \ldots, n\}$, $C_i \subseteq D_i$, then $x_i \in D_i$, and by the definition of $G(D_1, D_2, \ldots, D_n)$, $g(x_1, x_2, \ldots, x_n) \in G(D_1, D_2, \ldots, D_n)$, so $y \in G(D_1, D_2, \ldots, D_n)$; hence, $G(C_1, C_2, \ldots, C_n) \subseteq G(D_1, D_2, \ldots, D_n)$. ∎

The cutworthiness of $G$ guarantees that $B = G(A_1, A_2, \ldots, A_n)$ is a fuzzy set. It is a well-known fact that the $\alpha$-cuts of a fuzzy set $A$ are nested [189], i.e., $A \in \mathcal{F}_U \Leftrightarrow (\beta \geq \alpha \Rightarrow A(\beta) \subseteq A(\alpha))$. The cutworthiness of $G$ then implies that $\beta \geq \alpha \Rightarrow G(A_1(\beta), \ldots, A_n(\beta)) \subseteq G(A_1(\alpha), \ldots, A_n(\alpha)) \Rightarrow B(\beta) \subseteq B(\alpha) \Rightarrow B \in \mathcal{F}_V$. If $G$ is not cutworthy, the $\alpha$-cuts $G(A_1(\alpha), A_2(\alpha), \ldots, A_n(\alpha))$ are not nested, which means that they do not represent a legitimate fuzzy set.

Example 5.1 (See [276]). The set operation of difference $A \setminus B = A \cap B'$ is not cutworthy; therefore, it cannot be extended to T1 FSs using (5.3).

Next, assume that $G : \mathcal{P}_{U_1} \times \mathcal{P}_{U_2} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$ has to be extended to IT2 FSs, using its extension to T1 FSs. According to Mendel [169], the procedure of calculating $B = G(A_1, A_2, \ldots, A_n)$ must first be applied to all embedded T1 FSs of the $\tilde{A}_i$.

Definition 5.2. An embedded type-1 fuzzy set $A_e$ of an IT2 FS $\tilde{A}$ over $U$ is a T1 FS over $U$ for which $\forall t \in U$, $\underline{\mu}_{\tilde{A}}(t) \leq \mu_{A_e}(t) \leq \overline{\mu}_{\tilde{A}}(t)$.

Mendel suggests that for solving any problem that involves IT2 FSs, the problem first has to be solved for the embedded T1 FSs of the IT2 FSs involved in that problem; the membership value of the IT2 FS solution at each point is then just the union of the membership values of all embedded T1 FS solutions at that point. Therefore, we first apply $G : \mathcal{P}_{U_1} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$ to the $\alpha$-cuts of each of the embedded T1 FSs of the $\tilde{A}_i$'s, to obtain the $\alpha$-cut of one of the embedded T1 FSs of $\tilde{B}$, i.e., $B_e(\alpha)$:

B_e(\alpha) = G(A_{1,e}(\alpha), A_{2,e}(\alpha), \ldots, A_{n,e}(\alpha)), \qquad \forall A_{i,e}: \; \underline{\mu}_{\tilde{A}_i}(x) \leq \mu_{A_{i,e}}(x) \leq \overline{\mu}_{\tilde{A}_i}(x)    (5.6)

Then the embedded T1 FS $B_e$ can be constructed from its $\alpha$-cuts, and consequently, $\tilde{B}$ can be constructed from its embedded T1 FSs, i.e., the $B_e$'s:

\mu_{B_e}(x) = \sup_\alpha \mu_{\alpha B_e(\alpha)}(x), \qquad \mu_{\tilde{B}}(x) = \bigcup_{\underline{\mu}_{\tilde{A}_i} \leq \mu_{A_{i,e}} \leq \overline{\mu}_{\tilde{A}_i}} \{\mu_{B_e}(x)\}    (5.7)

where

\mu_{\alpha B_e(\alpha)}(x) = \begin{cases} \alpha & x \in B_e(\alpha) \\ 0 & x \notin B_e(\alpha) \end{cases}    (5.8)

and $B_e(\alpha)$ is calculated from (5.6). Next, we show that the problem of finding $\mu_{\tilde{B}}(x)$ stated in (5.7) can be reduced to the extension of $G$ to the LMF and the UMF of the IT2 FS $\tilde{A}$.

Lemma 5.1. If $A, B \in \mathcal{F}_U$, then $A \subseteq B \Leftrightarrow A(\alpha) \subseteq B(\alpha)$ for all $\alpha$.

Proof. See [127].

Lemma 5.2. If $G : \mathcal{P}_{U_1} \times \mathcal{P}_{U_2} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$ is cutworthy, then its extension to T1 FSs preserves the subsethood order, i.e., $\forall i \in \{1, 2, \ldots, n\}$, $C_i, D_i \in \mathcal{F}_{U_i}$: $C_i \subseteq D_i \Rightarrow G(C_1, C_2, \ldots, C_n) \subseteq G(D_1, D_2, \ldots, D_n)$.

Proof. According to Lemma 5.1, $\forall i \in \{1, 2, \ldots, n\}$, $C_i \subseteq D_i \Rightarrow C_i(\alpha) \subseteq D_i(\alpha)$. Since $G$ is cutworthy, this means that $G(C_1(\alpha), C_2(\alpha), \ldots, C_n(\alpha)) \subseteq G(D_1(\alpha), D_2(\alpha), \ldots, D_n(\alpha))$, which again, according to Lemma 5.1, means that $G(C_1, C_2, \ldots, C_n) \subseteq G(D_1, D_2, \ldots, D_n)$. ∎

Theorem 5.2. If $G : \mathcal{P}_{U_1} \times \mathcal{P}_{U_2} \times \cdots \times \mathcal{P}_{U_n} \to \mathcal{P}_V$ is cutworthy and $\tilde{B} = G(\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n)$, then $\underline{B} = G(\underline{A}_1, \underline{A}_2, \ldots, \underline{A}_n)$ and $\overline{B} = G(\overline{A}_1, \overline{A}_2, \ldots, \overline{A}_n)$.

Proof. Since $\underline{\mu}_{\tilde{A}_i}(x) \leq \mu_{A_{i,e}}(x) \leq \overline{\mu}_{\tilde{A}_i}(x)$, every embedded set satisfies $\underline{A}_i \subseteq A_{i,e} \subseteq \overline{A}_i$. Because $G$ is cutworthy, according to Lemma 5.2, $G(\underline{A}_1, \ldots, \underline{A}_n) \subseteq G(A_{1,e}, \ldots, A_{n,e}) \subseteq G(\overline{A}_1, \ldots, \overline{A}_n)$, which, according to (5.7), means $G(\underline{A}) \subseteq B_e \subseteq G(\overline{A})$. This is equivalent to $\underline{B} = G(\underline{A})$ and $\overline{B} = G(\overline{A})$, by the definition of an embedded T1 FS. ∎

Note that Theorem 5.2 may seem to be an obvious result, since it is already known that for extending a real function to IT2 FSs, it is sufficient to extend it to the LMF and the UMF of those fuzzy sets. However, one must be careful about drawing such conclusions for functions whose domains or co-domains are not real numbers. A very famous and important function is the centroid of a T1 FS, which is a function from T1 FSs to real numbers; it was shown that the centroid of an IT2 FS cannot be calculated by calculating the centroids of its LMF and UMF [121].
Therefore, before working out the mathematical details, it is premature to conclude that the extension of any function to IT2 FSs can be done by extending that function to the LMF and UMF of the IT2 FSs. In fact, as we showed, the extension of a real function to sets is cutworthy. Extension of a real function to T1 FSs involves the extension of that real function to the $\alpha$-cuts of those fuzzy sets. Therefore, when it is extended to IT2 FSs, it can be applied to their LMFs and UMFs. Hence, the result for extending real functions to IT2 FSs (extending the real function to the LMF and the UMF of the IT2 FS) is a special case of Theorem 5.2.

In [169], it is argued that the LMF and UMF of the IT2 FS $\tilde{B}$ can be found based on two optimization problems:

\underline{\mu}_{\tilde{B}}(x) = \min_{\underline{\mu}_{\tilde{A}_i} \leq \mu_{A_{i,e}} \leq \overline{\mu}_{\tilde{A}_i}} \mu_{B_e}(x), \qquad \overline{\mu}_{\tilde{B}}(x) = \max_{\underline{\mu}_{\tilde{A}_i} \leq \mu_{A_{i,e}} \leq \overline{\mu}_{\tilde{A}_i}} \mu_{B_e}(x)    (5.9)

Theorem 5.2 shows that the solutions to those optimization problems are $\underline{\mu}_{\tilde{B}}(x) = \mu_{G(\underline{A}_1, \ldots, \underline{A}_n)}(x)$ and $\overline{\mu}_{\tilde{B}}(x) = \mu_{G(\overline{A}_1, \ldots, \overline{A}_n)}(x)$.

5.3 Evidential Reasoning with an Interval Type-2 Fuzzy-Valued Measure

In this section we demonstrate how the extension of set functions can be used to infer an IT2 fuzzy probability for a belief structure with IT2 fuzzy focal elements and numeric or fuzzy mass assignments, instead of lower and upper probabilities. It is more meaningful to infer an IT2 fuzzy probability for an IT2 fuzzy event, instead of a lower and an upper probability. It is especially useful for solving Advanced Computing with Words (ACWW) problems that induce such belief structures, as will be seen in the sequel. To begin, we provide some background material on belief structures.

5.3.1 Background

Consider a set-valued mapping $\Gamma : \Omega' \to \mathcal{P}_\Omega$, where $\Omega'$ is a sample space. $\Omega$ is called the referential set or frame of discernment, and $\mathcal{P}_\Omega$ is the power set of $\Omega$. Assume that the sample space $\Omega'$ is discrete and finite ($\Omega$ can be uncountable and/or infinite) [Footnote 5.2]. Consider a probability measure $\mu(\cdot)$ on $\Omega'$. For all (non-fuzzy) $A \subseteq \Omega$, the mapping $m : \mathcal{P}_\Omega \to [0, 1]$:

m(A) = \frac{\mu(\{s \mid \Gamma(s) = A\})}{1 - \mu(\{s \mid \Gamma(s) = \emptyset\})}    (5.10)

is called a basic probability mass assignment. For $A \subseteq \Omega$, if $m(A) \neq 0$, $A$ is called a focal element, and $m(\cdot)$ is said to define a Dempster-Shafer model or Belief Structure on $\Omega$. We show the belief structure $\mathcal{B}$ by the set of pairs of focal elements and probability masses as:

\mathcal{B} = \{(A_1, m_1), (A_2, m_2), \ldots, (A_n, m_n)\}    (5.11)

We also denote the set of focal elements of $m(\cdot)$ by $\Omega_{m(\cdot)}$. Dempster showed that for each $B \subseteq \Omega$:

\mathrm{Prob}(B) \in J_B = [\mathrm{Prob}^-(B), \; \mathrm{Prob}^+(B)]    (5.12)

J_B = \Big[\sum_{A \subseteq B} m(A), \; \sum_{A \cap B \neq \emptyset} m(A)\Big]    (5.13)

Footnote 5.2: A continuous frame of discernment will not drastically change the theory, since the number of focal elements remains finite. The comparable theory for continuous and infinite sample spaces is out of the scope of this chapter.

Note that if $\Gamma(\cdot)$ is a classical random variable ($\Gamma : \Omega' \to \mathbb{R}$) [Footnote 5.3], then $m(A)$ is the probability of $A$, since $\{s \mid \Gamma(s) = \emptyset\} = \emptyset$ and $1 - \mu(\{s \mid \Gamma(s) = \emptyset\}) = 1$; therefore, $m(A)$ is just an extension of the concept of a probability distribution on $\Omega$ and is induced by the random set $\Gamma(\cdot)$, which can be viewed as a set-valued random variable.

Footnote 5.3: The random variable can be viewed as $\Gamma : \Omega' \to \mathcal{P}_\Omega$, where the range of $\Gamma(\cdot)$ consists only of the singletons $\{\omega_i\} \in \mathcal{P}_\Omega$.
The normalization by the term $1 - \mu(\{s \mid \Gamma(s) = \emptyset\})$ helps $m(\cdot)$ satisfy the following important property, which is analogous to the familiar normalization property satisfied by a probability distribution:

\sum_{A \subseteq \Omega} m(A) = \sum_{A \in \Omega_{m(\cdot)}} m(A) = 1    (5.14)

The lower and upper probabilities of $B$, defined in (5.12), are called the belief (or credibility) and the plausibility of $B$, respectively:

\mathrm{bel}(B) = \sum_{A \subseteq B} m(A)    (5.15)

\mathrm{pls}(B) = \sum_{A \cap B \neq \emptyset} m(A)    (5.16)

Alternatively, when $\mathcal{B}$ is represented as in (5.11), we can write:

\mathrm{bel}(B) = \sum_{A_i \subseteq B} m_i    (5.17)

\mathrm{pls}(B) = \sum_{A_i \cap B \neq \emptyset} m_i    (5.18)

The interval $J_B$ is sometimes called the belief interval or the ignorance interval for $B$.

5.3.2 Extension of Belief Structures to Fuzzy Focal Elements and Numeric Mass Assignments

Zadeh [319] notes that the lower and upper probabilities in (5.12) and (5.13) can be written as:

\mathrm{Prob}^-(B) = \mathrm{bel}(B) = \sum_{A \in \Omega_{m(\cdot)}} m(A)\, I(A, B), \qquad \mathrm{Prob}^+(B) = \mathrm{pls}(B) = \sum_{A \in \Omega_{m(\cdot)}} m(A)\, O(A, B)    (5.19)

where:

I : \mathcal{P}_\Omega \times \mathcal{P}_\Omega \to [0, 1], \qquad I(A, B) = \begin{cases} 1 & A \subseteq B \\ 0 & A \not\subseteq B \end{cases}    (5.20)

O : \mathcal{P}_\Omega \times \mathcal{P}_\Omega \to [0, 1], \qquad O(A, B) = \begin{cases} 1 & A \cap B \neq \emptyset \\ 0 & \text{otherwise} \end{cases}    (5.21)

$I(A, B)$ is an indicator function of inclusion of $A$ in $B$, and $O(A, B)$ is an indicator function of overlap (non-empty intersection) of $A$ and $B$. When the focal elements are fuzzy sets, i.e., when $m : \mathcal{F}_\Omega \to [0, 1]$, the concepts of indicators of inclusion and overlap can be relaxed to degrees of inclusion and overlap. Zadeh suggests the following measures for inclusion and overlap:

I : \mathcal{F}_\Omega \times \mathcal{F}_\Omega \to [0, 1], \qquad I(A, B) = \inf_{\omega \in \Omega} \max(1 - \mu_A(\omega), \; \mu_B(\omega))    (5.22)

O : \mathcal{F}_\Omega \times \mathcal{F}_\Omega \to [0, 1], \qquad O(A, B) = \sup_{\omega \in \Omega} \min(\mu_A(\omega), \; \mu_B(\omega))    (5.23)

He believes that in a fuzzy setting the concept of lower and upper probabilities is no longer valid [Footnote 5.4], and that it does not make sense to argue what the minimum and maximum probabilities of an event induced by the mapping $m$ are; therefore, he prefers to call belief and plausibility Expected Certainty, $EC$, and Expected Possibility, $E\Pi$, in the fuzzy setting. Assume that one has a belief structure with focal elements $\{A_1, A_2, \ldots, A_n\} \subseteq \mathcal{F}_\Omega$, whose probability masses are determined by $m : \mathcal{F}_\Omega \to [0, 1]$. Then, for the event $B \in \mathcal{F}_\Omega$, the Expected Certainty and Possibility are also calculated via (5.19) for lower and upper probabilities, replacing the indicator functions of inclusion and non-empty overlap with degrees of inclusion and non-empty overlap. Therefore, (5.19) can be used to calculate the expected certainty and expected possibility (which can be carelessly called lower and upper probabilities) for belief structures with fuzzy focal elements and numeric mass assignments by substituting the indicator functions for inclusion and non-empty overlap with their fuzzy extensions, which are shown in (5.22) and (5.23).

Footnote 5.4: This is because different inclusion and overlap measures yield different extensions of the lower and upper probabilities, which are not necessarily comparable, due to the lack of a linear order on the set of all fuzzy sets over a universe of discourse; therefore, none of the extensions can be called lower or upper probabilities.

When the belief structure has IT2 FS focal elements and numeric mass assignments, the above arguments are still valid, with the exception that the inclusion and overlap measures need to be calculated for IT2 FSs.
5.3.3 Belief Structures with Fuzzy Focal Elements and Fuzzy Probability Mass Assignments

The idea of fuzzifying the mass assignments was originally suggested by Zadeh in his paper on the granularity of information, where he extends Dempster-Shafer theory to fuzzy sets [319]. Interval-valued belief structures [190, 295] are special cases of such belief structures; they were used for reasoning [53] and employed in rule-based systems [9].

When both focal elements and probability mass assignments are fuzzy sets, i.e., when $m : \mathcal{F}_\Omega \to \mathcal{F}_{[0,1]}$, the Extension Principle can be used to generalize (5.19). Assume that $m(\cdot)$ induces the following belief structure:

\mathcal{B} = \{(A_1, M_1), (A_2, M_2), \ldots, (A_n, M_n)\}    (5.24)

where $A_i \in \mathcal{F}_\Omega$ are the focal elements and $M_i \in \mathcal{F}_{[0,1]}$ are fuzzy probability mass assignments. Then the lower and upper probabilities of a fuzzy event $B \in \mathcal{F}_\Omega$ become fuzzy sets themselves, so we can consider them as linguistic probabilities and denote them by $\mathrm{LProb}^-(B)$ and $\mathrm{LProb}^+(B)$. They can be calculated [319] as:

\mu_{\mathrm{LProb}^-(B)}(z) = \sup_{\substack{z = p_1 x_1 + \cdots + p_n x_n \\ p_1 + p_2 + \cdots + p_n = 1}} \min(\mu_{M_1}(p_1), \ldots, \mu_{M_n}(p_n))
\mu_{\mathrm{LProb}^+(B)}(z) = \sup_{\substack{z = p_1 y_1 + \cdots + p_n y_n \\ p_1 + p_2 + \cdots + p_n = 1}} \min(\mu_{M_1}(p_1), \ldots, \mu_{M_n}(p_n))    (5.25)

where $x_i = I(A_i, B)$ and $y_i = O(A_i, B)$. The above normalized sums [218] are interactive additions of type-1 fuzzy sets [62]. It was shown [218] that a necessary and sufficient condition for the existence of solutions to the above optimization problems is:

\forall \alpha \in [0, 1]: \qquad \sum_{i=1}^n a_i'(\alpha) \leq 1 \leq \sum_{i=1}^n b_i'(\alpha)    (5.26)

where $a_i'(\alpha)$ and $b_i'(\alpha)$ are the left and right endpoints of $M_i(\alpha)$, the $\alpha$-cut [Footnote 5.5] of $M_i$. Therefore, when the fuzzy probability mass assignments are postulated when building a Dempster-Shafer model, such a condition must be imposed so as to make the fuzzy probability mass assignments consistent and to have solutions for the optimization problems described in (5.25), i.e., so that lower and upper probabilities exist and can be calculated.

Footnote 5.5: Here, we assume that the $M_i$'s have the properties of fuzzy numbers (possibly except normality), i.e., they are convex, have bounded supports (by the definition of the probability mass assignment, their support is $[0, 1]$), and are upper semi-continuous, so that their $\alpha$-cuts are bounded intervals.

Unfortunately, when type-1 fuzzy set models of the probability mass assignments are extracted by collecting data from subjects [33, 37], [173, Ch. 3, Appendix 3.A], [192], it is difficult to imagine that one can impose any conditions on the data collected from the subjects to make the resulting membership functions satisfy the conditions of (5.26). Therefore, it was suggested in [218] to use Fuzzy Weighted Averages (FWAs) [59, 82, 150] instead of normalized sums, because FWAs involve an inherent normalization that removes the inconsistencies of the weights (fuzzy probability mass assignments), which guarantees that they always exist.
Therefore, instead of (5.25), one can compute $\mu_{\mathrm{LProb}^-(B)}(z)$ and $\mu_{\mathrm{LProb}^+(B)}(z)$ as:

\mu_{\mathrm{LProb}^-(B)}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))
\mu_{\mathrm{LProb}^+(B)}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min(\mu_{M_1}(p_1), \mu_{M_2}(p_2), \ldots, \mu_{M_n}(p_n))    (5.27)

(5.27) can be summarized by the following expressive formulas:

\mathrm{LProb}^-(B) = \frac{\sum_{i=1}^n M_i x_i}{\sum_{i=1}^n M_i}, \qquad \mathrm{LProb}^+(B) = \frac{\sum_{i=1}^n M_i y_i}{\sum_{i=1}^n M_i}    (5.28)

5.3.4 Linguistic Belief Structures

As we discussed, a first-order uncertainty model for a word is an IT2 FS; therefore, it is viable to envision Linguistic Belief Structures in which focal elements, probability masses, and even the knowledge about compatibility measures are allowed to be words, modeled by IT2 FSs. Note that a Linguistic Belief Structure is a general belief structure that allows some of the focal elements, probability masses, and compatibility measures to be modeled by numbers, intervals, or T1 FSs, since they are special cases of IT2 FSs.

When both focal elements and probability mass assignments are IT2 FSs, i.e., when $m : \tilde{\mathcal{F}}_\Omega \to \tilde{\mathcal{F}}_{[0,1]}$, the Extension Principle in Theorem 5.2 can be used to generalize (5.19). Assume that $m(\cdot)$ induces the following belief structure:

\tilde{\mathcal{B}} = \{(\tilde{A}_1, \tilde{M}_1), (\tilde{A}_2, \tilde{M}_2), \ldots, (\tilde{A}_n, \tilde{M}_n)\}    (5.29)

where $\tilde{A}_i \in \tilde{\mathcal{F}}_\Omega$ are the focal elements and $\tilde{M}_i \in \tilde{\mathcal{F}}_{[0,1]}$ are IT2 fuzzy probability mass assignments. Then the lower and upper probabilities of an interval type-2 fuzzy event $\tilde{B} \in \tilde{\mathcal{F}}_\Omega$ become interval type-2 fuzzy sets themselves, so we can consider them as linguistic probabilities and denote them by $\widehat{\mathrm{LProb}}^-(\tilde{B})$ and $\widehat{\mathrm{LProb}}^+(\tilde{B})$. They can be calculated as:

\overline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{\substack{z = p_1 x_1 + \cdots + p_n x_n \\ p_1 + \cdots + p_n = 1}} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{\substack{z = p_1 x_1 + \cdots + p_n x_n \\ p_1 + \cdots + p_n = 1}} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (5.30)

\overline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{\substack{z = p_1 y_1 + \cdots + p_n y_n \\ p_1 + \cdots + p_n = 1}} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{\substack{z = p_1 y_1 + \cdots + p_n y_n \\ p_1 + \cdots + p_n = 1}} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (5.31)

where $x_i$ and $y_i$ are measures of inclusion and overlap between $\tilde{A}_i$ and $\tilde{B}$. Assume that $I$ and $O$ are inclusion and overlap measures for T1 FSs. Then one way to derive inclusion and overlap measures for IT2 FSs is:

I(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(I(\underline{A}, \underline{B}) + I(\overline{A}, \overline{B})\big), \qquad O(\tilde{A}, \tilde{B}) = \tfrac{1}{2}\big(O(\underline{A}, \underline{B}) + O(\overline{A}, \overline{B})\big)    (5.32)

It was shown [218] that necessary and sufficient conditions for the existence of solutions to the above optimization problems are:

\forall \alpha \in [0, 1]: \; \sum_{i=1}^n \overline{a}_i'(\alpha) \leq 1 \leq \sum_{i=1}^n \overline{b}_i'(\alpha); \qquad \forall \alpha \in [0, h_{\min}]: \; \sum_{i=1}^n \underline{a}_i'(\alpha) \leq 1 \leq \sum_{i=1}^n \underline{b}_i'(\alpha)    (5.33)

where $\overline{a}_i'(\alpha)$ and $\overline{b}_i'(\alpha)$ are the left and right endpoints of $\overline{M}_i(\alpha)$, the $\alpha$-cut of $\overline{M}_i$; $\underline{a}_i'(\alpha)$ and $\underline{b}_i'(\alpha)$ are the left and right endpoints of $\underline{M}_i(\alpha)$, the $\alpha$-cut of $\underline{M}_i$; and $h_{\min}$ is the minimum of the heights of the $\underline{M}_i$'s. Therefore, when interval type-2 fuzzy probability mass assignments are postulated when building a Dempster-Shafer model, such conditions must be imposed, so as to make the IT2 fuzzy probability mass assignments consistent and to have solutions for the optimization problems described in (5.30) and (5.31), i.e., so that lower and upper probabilities exist and can be calculated.
As we explained for T1 FS word models, when IT2 FS models of the probability mass assignments are extracted by collecting data from subjects [151, 286], it is very difficult to imagine that one can impose any conditions on the data collected from the subjects so as to make the resulting membership functions satisfy the two sets of conditions in (5.33); therefore, it was suggested in [218] to use Linguistic Weighted Averages (LWAs) [277] instead of normalized sums, because LWAs involve an inherent normalization that removes the inconsistencies of the weights (IT2 fuzzy probability mass assignments), which guarantees that they always exist. Therefore, instead of (5.30) and (5.31), one can compute $\widehat{\mathrm{LProb}}^-(\tilde{B})$ and $\widehat{\mathrm{LProb}}^+(\tilde{B})$ as:

\overline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \overline{\mu}_{\tilde{M}_2}(p_2), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^-(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i x_i / \sum_{i=1}^n p_i} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \underline{\mu}_{\tilde{M}_2}(p_2), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (5.34)

\overline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min\big(\overline{\mu}_{\tilde{M}_1}(p_1), \overline{\mu}_{\tilde{M}_2}(p_2), \ldots, \overline{\mu}_{\tilde{M}_n}(p_n)\big)
\underline{\mu}_{\widehat{\mathrm{LProb}}^+(\tilde{B})}(z) = \sup_{z = \sum_{i=1}^n p_i y_i / \sum_{i=1}^n p_i} \min\big(\underline{\mu}_{\tilde{M}_1}(p_1), \underline{\mu}_{\tilde{M}_2}(p_2), \ldots, \underline{\mu}_{\tilde{M}_n}(p_n)\big)    (5.35)

(5.34) and (5.35) can be summarized by the following expressive formulas:

\widehat{\mathrm{LProb}}^-(\tilde{B}) = \frac{\sum_{i=1}^n \tilde{M}_i x_i}{\sum_{i=1}^n \tilde{M}_i}, \qquad \widehat{\mathrm{LProb}}^+(\tilde{B}) = \frac{\sum_{i=1}^n \tilde{M}_i y_i}{\sum_{i=1}^n \tilde{M}_i}    (5.36)

5.3.5 Extending the Concept of the Belief Interval Using the Extension Principle for Set Functions

In the previous sections, we saw that inferring the probability of an event from a belief structure with fuzzy mass assignments results in fuzzy lower and upper probabilities. In this section, we show how we can infer just one fuzzy set from such belief structures as the probability of a fuzzy event, using the Extension Principle for set functions.

To begin, note that the belief interval of an event inferred from a belief structure with non-fuzzy focal elements and numeric mass assignments can be viewed as a function that maps a set (an event) to another set (a belief interval). Therefore, from (5.12) and (5.13), $J_B$ is a set function of the focal elements and the event, i.e., it is the following set-to-set function:

J_B = \Lambda(A_1, A_2, \ldots, A_n; B)    (5.37)

Note that in (5.37), $J_B$ can also be written as a function of the $A_i$'s, $B$, and the $m_i$'s, i.e.:

J_B = \Lambda(A_1, A_2, \ldots, A_n; B; m_1, m_2, \ldots, m_n)    (5.38)

However, for numeric probability mass assignments this is not necessary, and the $m_i$'s can be treated as fixed parameters, because we only need to extend the $A_i$'s and $B$ to fuzzy sets. When we have set-valued or fuzzy probability mass assignments, the description of $J_B$ in (5.38) is useful, since with such a description the $m_i$'s can be extended to fuzzy or non-fuzzy sets.

Consequently, $J_B$ can be extended to the case when the focal elements and the events are T1 or IT2 FSs using the Extension Principle for set-to-set functions. It was shown in [157] that such a set function is cutworthy; hence, it can be extended to T1 FSs. By Theorem 5.2, we can extend it to IT2 FSs as well.
Assume that we have a linguistic belief structure with IT2 FS focal elements and numeric mass assignments:

\tilde{\mathcal{B}} = \{(\tilde{A}_1, m_1), (\tilde{A}_2, m_2), \ldots, (\tilde{A}_n, m_n)\}    (5.39)

Then define the following two belief structures with T1 FS focal elements:

\underline{\mathcal{B}} = \{(\underline{A}_1, m_1), (\underline{A}_2, m_2), \ldots, (\underline{A}_n, m_n)\}    (5.40)

\overline{\mathcal{B}} = \{(\overline{A}_1, m_1), (\overline{A}_2, m_2), \ldots, (\overline{A}_n, m_n)\}    (5.41)

Next, define the $\alpha$-cuts of $\underline{\mathcal{B}}$ and $\overline{\mathcal{B}}$ as:

\underline{\mathcal{B}}(\alpha) = \{(\underline{A}_1(\alpha), m_1), (\underline{A}_2(\alpha), m_2), \ldots, (\underline{A}_n(\alpha), m_n)\}    (5.42)

\overline{\mathcal{B}}(\alpha) = \{(\overline{A}_1(\alpha), m_1), (\overline{A}_2(\alpha), m_2), \ldots, (\overline{A}_n(\alpha), m_n)\}    (5.43)

Note that the $\alpha$-cut belief structures $\underline{\mathcal{B}}(\alpha)$ and $\overline{\mathcal{B}}(\alpha)$ in (5.42) and (5.43) have non-fuzzy focal elements. Next, assume that we wish to infer the probability of an IT2 fuzzy event $\tilde{B}$, namely $\tilde{P} = \widehat{\mathrm{LProb}}(\tilde{B})$, from (5.39). Since the belief interval is essentially an interval of all possible probabilities of an event, we can calculate $\Lambda(\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n; \tilde{B})$, the extension of the set-to-set function in (5.37). According to Theorem 5.2, we have to calculate $\underline{P} = \Lambda(\underline{A}_1, \ldots, \underline{A}_n; \underline{B})$ and $\overline{P} = \Lambda(\overline{A}_1, \ldots, \overline{A}_n; \overline{B})$. The $\alpha$-cuts of $\underline{P}$ and $\overline{P}$ are then the belief intervals associated with the events $\underline{B}(\alpha)$ and $\overline{B}(\alpha)$ that are respectively inferred from $\underline{\mathcal{B}}(\alpha)$ and $\overline{\mathcal{B}}(\alpha)$, i.e.:

\underline{P}(\alpha) = \Lambda(\underline{A}_1(\alpha), \ldots, \underline{A}_n(\alpha); \underline{B}(\alpha)) = J_{\underline{\mathcal{B}}(\alpha)} = \Big[\sum_{\underline{A}_i(\alpha) \subseteq \underline{B}(\alpha)} m_i, \; \sum_{\underline{A}_i(\alpha) \cap \underline{B}(\alpha) \neq \emptyset} m_i\Big], \quad \alpha \in [0, h_{\min}]    (5.44)

\overline{P}(\alpha) = \Lambda(\overline{A}_1(\alpha), \ldots, \overline{A}_n(\alpha); \overline{B}(\alpha)) = J_{\overline{\mathcal{B}}(\alpha)} = \Big[\sum_{\overline{A}_i(\alpha) \subseteq \overline{B}(\alpha)} m_i, \; \sum_{\overline{A}_i(\alpha) \cap \overline{B}(\alpha) \neq \emptyset} m_i\Big]    (5.45)

5.3.6 A Fuzzy-Valued Measure for Linguistic Belief Structures

The $\alpha$-cuts of a Linguistic Belief Structure of the form (5.29) can be defined as:

\underline{\mathcal{B}}(\alpha) = \{(\underline{A}_1(\alpha), \underline{M}_1(\alpha)), (\underline{A}_2(\alpha), \underline{M}_2(\alpha)), \ldots, (\underline{A}_n(\alpha), \underline{M}_n(\alpha))\}, \quad \alpha \in [0, h_{\min}]    (5.46)

where $h_{\min}$ is the minimum of the heights of the LMFs of the $\tilde{M}_i$'s and $\tilde{A}_i$'s, and

\overline{\mathcal{B}}(\alpha) = \{(\overline{A}_1(\alpha), \overline{M}_1(\alpha)), (\overline{A}_2(\alpha), \overline{M}_2(\alpha)), \ldots, (\overline{A}_n(\alpha), \overline{M}_n(\alpha))\}, \quad \alpha \in [0, 1]    (5.47)

The above $\alpha$-cuts of a Linguistic Belief Structure are belief structures whose mass assignments and focal elements are sets. They can be viewed as a special case of a belief structure with fuzzy focal elements and fuzzy mass assignments. As was mentioned in Section 5.3.3, FWAs can be used to derive a lower and an upper probability for such belief structures. When probability mass assignments and focal elements are $\alpha$-cuts of convex fuzzy sets, they are intervals; hence, the FWA becomes an Interval Weighted Average (IWA), and the lower and upper probabilities become intervals instead of T1 FSs.

Therefore, considering the belief structure $\mathcal{B} = \{(A_1, M_1), (A_2, M_2), \ldots, (A_n, M_n)\}$, whose focal elements and probability mass assignments are intervals, the lower and upper probabilities can be calculated as:

\mathrm{bel}(B) = [p_1, p_2] = \frac{\sum_{i=1}^n M_i\, I(A_i, B)}{\sum_{i=1}^n M_i}, \qquad \mathrm{pls}(B) = [q_1, q_2] = \frac{\sum_{i=1}^n M_i\, O(A_i, B)}{\sum_{i=1}^n M_i}    (5.48)

Since the lower and upper probabilities are no longer numbers, we cannot define the belief interval for $B$. Metaphorically, the belief interval in this case is blurred, i.e., its endpoints are uncertain and are known only to lie within intervals themselves.
This gives rise to the new concept of a broad belief interval K_B, which includes all the possible belief intervals for B:

K_B = \bigcup_{m_i \in M_i} J_B = \bigcup_{m_i \in M_i} \Psi(A_1, A_2, \ldots, A_n, B, m_1, m_2, \ldots, m_n)    (5.49)

Since K_B is the union of intervals whose left end-points lie in [p_1, p_2] and whose right end-points lie in [q_1, q_2],

K_B = [\min bel(B), \max pls(B)] = [p_1, q_2]    (5.50)

p_1 and q_2 are calculated as [173]:

\begin{cases}
p_1 = \min_{m_i \in M_i} \dfrac{\sum_{i=1}^{n} m_i\, I(A_i, B)}{\sum_{i=1}^{n} m_i}\\[2mm]
q_2 = \max_{m_i \in M_i} \dfrac{\sum_{i=1}^{n} m_i\, O(A_i, B)}{\sum_{i=1}^{n} m_i}
\end{cases}    (5.51)

Since the A_i and B are non-fuzzy sets, the inclusion and overlap measures I and O reduce to the indicator functions of inclusion and overlap [see (5.20) and (5.21)], so (5.51) can be rewritten as:

\begin{cases}
p_1 = \min_{m_i \in M_i} \dfrac{\sum_{i \,|\, A_i \subseteq B} m_i}{\sum_{i=1}^{n} m_i}\\[2mm]
q_2 = \max_{m_i \in M_i} \dfrac{\sum_{i \,|\, A_i \cap B \neq \emptyset} m_i}{\sum_{i=1}^{n} m_i}
\end{cases}    (5.52)

The broad belief interval K_B can be viewed as a set-to-set function \Phi(A_1, \ldots, A_n, B, M_1, \ldots, M_n) that maps the interval mass assignments M_i, the focal elements A_i, and the interval event B to the interval [p_1, q_2].

Theorem 5.3. K_B = \Phi(A_1, \ldots, A_n, B, M_1, \ldots, M_n) is cutworthy.

Proof. It was shown in [157] that the function \Psi in (5.38) is cutworthy. We wish to show that \Phi is cutworthy as well. Assume that A_i \subseteq C_i, B \subseteq D, and M_i \subseteq N_i. We want to show that \Phi(A_1, \ldots, A_n, B, M_1, \ldots, M_n) \subseteq \Phi(C_1, \ldots, C_n, D, N_1, \ldots, N_n); i.e., we have to show that if y \in \Phi(A_1, \ldots, A_n, B, M_1, \ldots, M_n), then y \in \Phi(C_1, \ldots, C_n, D, N_1, \ldots, N_n).

By the definition of \Phi:

y \in \Phi(A_1, \ldots, A_n, B, M_1, \ldots, M_n) \Rightarrow y \in \bigcup_{m_i \in M_i} \Psi(A_1, \ldots, A_n, B, m_1, \ldots, m_n)    (5.53)

Therefore, there exist m_1, m_2, \ldots, m_n for which y \in \Psi(A_1, \ldots, A_n, B, m_1, \ldots, m_n). On the other hand, since A_i \subseteq C_i and B \subseteq D, from the cutworthiness of \Psi it is concluded that \Psi(A_1, \ldots, A_n, B, m_1, \ldots, m_n) \subseteq \Psi(C_1, \ldots, C_n, D, m_1, \ldots, m_n). This immediately implies y \in \Psi(C_1, \ldots, C_n, D, m_1, \ldots, m_n), and because M_i \subseteq N_i, m_i \in M_i \Rightarrow m_i \in N_i; therefore, y \in \bigcup_{m_i \in N_i} \Psi(C_1, \ldots, C_n, D, m_1, \ldots, m_n), or in other words, y \in \Phi(C_1, \ldots, C_n, D, N_1, \ldots, N_n).

Since \Phi is cutworthy, it can be extended to T1 and IT2 FSs to yield a fuzzy-valued belief measure, so as to infer a single fuzzy set as the probability of an event from a Linguistic Belief Structure.

5.4 Conclusions and Future Work

In this chapter, we provided the Extension Principle for extending set functions to IT2 FSs. We showed that, under the assumption of cutworthiness, the extension of a set function to an IT2 FS reduces to its extension to the LMF and UMF of that fuzzy set, and that if a set-valued set function is not cutworthy, it cannot be extended to IT2 FSs at all. We also showed how the Extension Principle can be used to infer an IT2 fuzzy probability for an IT2 fuzzy event from a belief structure with IT2 FS focal elements and mass assignments. In the future, applications of the concept of a broadened belief interval in rule-based systems that use evidential reasoning for inference [131, 132, 153, 239, 305] have to be investigated.

Chapter 6
Probability Calculations Using the Generalized Extension Principle for Type-1 Fuzzy Sets: Applications to Advanced Computing with Words

From principles is derived probability, but truth or certainty is obtained only from facts.
Sir Tom Stoppard, British Playwright

In this chapter, we propose and demonstrate an effective methodology for implementing the Generalized Extension Principle to solve Advanced Computing with Words (ACWW) problems. Such problems involve implicit assignments of linguistic truth, probability, and possibility.
To begin, we establish vocabularies of the words involved in the problems, and then collect data from subjects about the words, after which fuzzy set models for the words are obtained by using the Interval Approach (IA) or the Enhanced Interval Approach (EIA). Next, the solutions of the ACWW problems, which involve the fuzzy set models of the words, are formulated using the Generalized Extension Principle. Because the solutions to those problems involve complicated functional optimization problems that cannot be solved analytically, we then develop a numerical method for their solution. Finally, the resulting fuzzy set solutions are decoded into natural language words using Jaccard's similarity measure. We explain how ACWW problems can solve some potential prototype Engineering problems, and we connect the methodology of this chapter with Perceptual Computing.

6.1 Introduction

Computing with Words (CWW or CW) is a methodology of computation whose objects are words rather than numbers [168, 175, 329, 333]. Words are drawn from natural languages, and are modeled by fuzzy sets. Basic Computing with Words deals with simple assignments of attributes through IF-THEN rules. Advanced Computing with Words (ACWW) involves problems in which the carriers of information are numbers, intervals, and words. Assignment of attributes may be implicit, and one generally deals with assignments of linguistic truth, probability, and possibility constraints through complicated natural language statements. Moreover, world knowledge is necessary for solving ACWW problems [170].

Modeling words is consequently an important task for solving ACWW problems. Mendel [168] argues that since words mean different things to different people, a first-order uncertainty model for a word should be an interval type-2 fuzzy set (IT2 FS). Interestingly, Zadeh [333] anticipates that in the future, fuzzy sets of higher type will play a central role in ACWW. Therefore, it is plausible to examine the solutions to ACWW problems when words are modeled by IT2 FSs. In this chapter, however, we use IT2 FS models of words only to establish T1 FS models of them, for reasons that are given in our Conclusions.

CWW has been applied successfully to hierarchical and distributed decision making [92, 172], Perceptual Reasoning and Perceptual Computing [171, 173, 282], and decision support [97, 163]. There have been extensive attempts to implement the approach in more realistic settings, e.g., [124, 174, 202], and attempts to formalize the paradigm of CWW [30, 31, 125]. Despite the extensive literature on CWW, to the best of our knowledge there have only been a few attempts to deal with ACWW problems [194, 210, 211, 215, 221].

Zadeh has introduced a set of challenge problems for ACWW in his recent works, and has proposed solutions to them [327, 333]. The solutions to many of the problems utilize the Generalized Extension Principle (GEP), which results in complicated optimization problems. In this chapter, we establish a methodology to carry out the computations of the GEP for solving ACWW problems. We focus on solving one of Zadeh's famous ACWW problems, but later demonstrate that some of his other famous ACWW problems can also be solved in exactly the same manner.

Historically, Zadeh's ACWW challenge problems have been formulated in terms of everyday reasoning examples involving attributes like height, age, distance, etc. (e.g., Probably John is tall. What is the probability that John is short?) (Footnote 6.1: This question can also be stated as: What is the probability that (the height of) John is short?)
In this chapter, we also demonstrate how ACWW can address more realistic problems dealing with subjective judgments. It has been demonstrated in [57, 66, 120, 179, 250, 261, 296, 353] that subjective, fuzzy, and linguistic probabilities can be used to model subjective assessments of safety, reliability, and risk. Some examples that address product safety, network security, and network trust are:

It is somewhat improbable that cars of model X are unsafe. What is the probability that they are safe?
It is very probable that the network provided by Company Y is highly secure. What is the probability that it is somewhat insecure?
Probably the online auction of website Z is pretty trustworthy. What is the probability that it is extremely trustworthy?
It is somewhat likely that the risk associated with investment in real estate is high. What is the average risk associated with investment in real estate?

These examples suggest that the methodologies for solving Zadeh's ACWW problems can also be applied to solve more realistic problems. We show how to do this in Section 6.5.

The rest of this chapter is organized as follows: In Section 6.2, we describe a famous ACWW problem, its variations, and their solutions that are obtained by using the GEP; in Section 6.3, we first establish fuzzy set models of the words that are involved in the ACWW problems of Section 6.2, and then implement numerical solutions to the problems when type-1 fuzzy set word models are used; in Section 6.4, we provide discussions about how to validate solutions of the ACWW problems that are solved in this chapter; in Section 6.5, we solve an Engineering ACWW problem, using the methodology of our earlier sections; in Section 6.6, we show how some of Zadeh's other ACWW problems can be solved using the same methodology as described in Section 6.3; in Section 6.7, we present a high-level discussion describing how ACWW problems can be formulated by investigating the linguistic description of the problems; in Section 6.8, we investigate the relationship between the methodology of this chapter and Perceptual Computing; and, finally, in Section 6.9, we present some conclusions, as well as some directions for future research.

6.2 Problem Description

Among Zadeh's many ACWW problems is the following famous Probability that John is short (PJS) problem:

Probably John is tall. What is the probability that John is short?

This problem involves a linguistic probability (probably) and an implicit assignment of that linguistic probability to the probability that John is tall. The probability that "John is tall", P_{Tall}, is calculated as [317] (Footnote 6.2: We assume that "Probably John is tall" is equivalent to "It is probable that John is tall."):

P_{Tall} = \int_{a}^{b} \mu_{Tall}(h)\, p_H(h)\, dh    (6.1)

in which a and b are the minimum and maximum possible heights of men, and p_H is the probability distribution function of heights, where:

\int_{a}^{b} p_H(h)\, dh = 1    (6.2)

The probability of the fuzzy event "Short" is calculated as:

P_{Short} = \int_{a}^{b} \mu_{Short}(h)\, p_H(h)\, dh    (6.3)

To derive the soft constraint imposed on P_{Short} by the fact that P_{Tall} is constrained by "Probably", one needs to use the framework of the GEP, which is an important tool for the propagation of possibilistic constraints, and was originally introduced in [325]. Assume that f(\cdot) and g(\cdot) are real functions:

f, g: U_1 \times U_2 \times \cdots \times U_n \to V    (6.4)
Moreover, assume that:

f(X_1, X_2, \ldots, X_n) is A
g(X_1, X_2, \ldots, X_n) is B

where A and B are T1 FSs. Then A induces B as follows:

\mu_B(v) =
\begin{cases}
\sup_{u_1, u_2, \ldots, u_n \,|\, v = g(u_1, u_2, \ldots, u_n)} \mu_A(f(u_1, u_2, \ldots, u_n)) & \exists\, v = g(u_1, u_2, \ldots, u_n)\\
0 & \nexists\, v = g(u_1, u_2, \ldots, u_n)
\end{cases}    (6.5)

The GEP basically extends the function g(f^{-1}(\cdot)): V \to V to T1 FSs, where f^{-1} is the pre-image of the function f(\cdot). In the PJS problem, f = P_{Tall}, and:

f: \mathcal{X}_{[a,b]} \to \mathbb{R}    (6.6)

where \mathcal{X}_{[a,b]} is the space of probability distribution functions on [a, b]. Also, g = P_{Short}, and:

g: \mathcal{X}_{[a,b]} \to \mathbb{R}    (6.7)

The GEP then implies that the soft constraint on the probability that "John is short" is:

\mu_{P_{Short}}(v) =
\begin{cases}
\sup_{\substack{v = \int_a^b p_H(h)\mu_{Short}(h)dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_{Tall}(h)dh\Big) & \exists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_{Short}(h)dh\\
0 & \nexists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_{Short}(h)dh
\end{cases}    (6.8)

Note that (6.8) cannot be solved using the α-cut decomposition theorem: although B(\alpha) = g(f^{-1}(A(\alpha))) (with f = P_{Tall} and g = P_{Short}), the relation f^{-1}(\cdot) cannot be derived explicitly.

In this chapter, we also study variations of the PJS problem, which include: Probably John is tall. What is the probability that John is W? This is called the "PJW problem". In this problem, W represents any height word. Similar to (6.8), the GEP yields the soft constraint P_W on the probability that "John is W":

\mu_{P_W}(v) =
\begin{cases}
\sup_{\substack{v = \int_a^b p_H(h)\mu_W(h)dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_{Tall}(h)dh\Big) & \exists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh\\
0 & \nexists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh
\end{cases}    (6.9)

Note that (6.8) and (6.9) are difficult functional optimizations that have to be carried out over \mathcal{X}_{[a,b]}, the space of probability distributions over [a, b]. In this chapter, a methodology for performing the optimizations is offered.

6.3 Implementation of the Solution to the PJW Problem

In this section, we implement the solution to the PJW problem, which includes the PJS problem. We model the words involved in the problem, and use a methodology to approximate the solution to the optimization problem of (6.9).

6.3.1 Modeling Words

To begin, we established the following vocabularies of linguistic heights and linguistic probabilities:

Heights = {Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall}

Probabilities = {Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable}

Next, we modeled all of these words as FSs. Recall that there are at least two types of uncertainty associated with a word [167, 168]: intra-uncertainty, the uncertainty an individual has about the meaning of a word (Footnote 6.3: This is related to the uncertainty associated with the unsharpness of classes [332].), and inter-uncertainty, the uncertainty a group of people have about the meaning of a word. In other words, words mean different things to different people, and this fact calls for (at least) using IT2 FSs as models of words [167, 168].

In order to synthesize IT2 FS models of words, we begin by collecting data from a group of subjects and then use the Interval Approach (IA) [151] or the Enhanced Interval Approach (EIA) [286]. We collected data from 48 subjects using the Amazon Mechanical Turk website [1] for the above vocabularies of linguistic heights and linguistic probabilities. We used the EIA [286] to obtain IT2 FS models of the words from those data. The IT2 FS footprints of uncertainty (FOUs) have nine parameters (see Fig. 6.1):
(q, r, s, t) determine the upper membership function (UMF) and (q_1, r_1, s_1, t_1, h) determine the lower membership function (LMF), where h is the height of the lower membership function.

[Figure 6.1: FOU of an IT2 FS described by nine parameters.]

The vocabulary of linguistic heights modeled by IT2 FSs is depicted in Fig. 6.2. We assumed that the minimum possible height is a = 139.7 cm and the maximum possible height is b = 221 cm. The parameters of the FOUs of the linguistic height words are given in Table 6.1.

Table 6.1: Membership function parameters of the height words depicted in Fig. 6.2

Word | UMF parameters | LMF parameters
Very short | (139.70, 139.70, 140.92, 153.17) | (139.70, 139.70, 140.28, 143.95, 1)
Short | (141.62, 150.00, 158.75, 168.11) | (151.35, 153.85, 153.85, 156.04, 0.45)
Moderately short | (149.24, 157.48, 162.56, 170.80) | (159.79, 160.84, 160.84, 161.21, 0.52)
Medium | (154.32, 163.83, 171.45, 178.42) | (166.06, 168.53, 168.53, 169.83, 0.46)
Moderately tall | (166.89, 175.00, 181.61, 190.59) | (173.96, 177.91, 177.91, 181.04, 0.59)
Tall | (174.12, 185.00, 193.04, 204.14) | (185.86, 188.99, 188.99, 192.07, 0.44)
Very tall | (176.58, 207.70, 221.00, 221.00) | (193.67, 219.07, 221.00, 221.00, 1)

[Figure 6.2: Vocabulary of IT2 FSs representing linguistic heights.]

To make sure that the vocabulary of height words provides an appropriate partitioning of the space of heights, we calculated the pairwise Jaccard similarities between the height words. The Jaccard similarity [46, 280] between IT2 FSs \tilde{A} and \tilde{B} is calculated as (a numerical sketch of (6.10) is given below, after Fig. 6.3):

s_J(\tilde{A}, \tilde{B}) = \dfrac{\int_U \min(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\,du + \int_U \min(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\,du}{\int_U \max(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\,du + \int_U \max(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\,du}    (6.10)

Pairwise similarities between the height words are shown in Table 6.2. Observe that the words have pairwise similarities that are less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse.

Table 6.2: Pairwise similarities between the height words depicted in Fig. 6.2
(rows and columns in the order: Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall)

Very short | 1.000 0.129 0.016 0 0 0 0
Short | 0.129 1.000 0.451 0.166 0.001 0 0
Moderately short | 0.016 0.451 1.000 0.336 0.015 0 0
Medium | 0 0.166 0.336 1.000 0.149 0.014 0.001
Moderately tall | 0 0.001 0.015 0.149 1.000 0.222 0.042
Tall | 0 0 0 0.014 0.222 1.000 0.165
Very tall | 0 0 0 0.001 0.042 0.165 1.000

Similar information for the vocabulary of linguistic probabilities is given in Fig. 6.3 and Tables 6.3 and 6.4. In Table 6.4, observe that the probability words also have pairwise similarities less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse.

[Figure 6.3: Vocabulary of IT2 FSs representing linguistic probabilities.]
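As an aside, (6.10) is easy to evaluate numerically once the UMFs and LMFs are sampled on a common grid. The following minimal Python sketch assumes the EIA FOUs are trapezoids whose LMF is a trapezoid scaled to height h (the shape implied by Table 6.1); the helper names are ours.

```python
import numpy as np

def trap(u, a, b, c, d, h=1.0):
    """Trapezoid with vertices a <= b <= c <= d, scaled to height h."""
    y = np.zeros_like(u)
    if b > a:
        m = (u >= a) & (u < b)
        y[m] = (u[m] - a) / (b - a)
    y[(u >= b) & (u <= c)] = 1.0
    if d > c:
        m = (u > c) & (u <= d)
        y[m] = (d - u[m]) / (d - c)
    return h * y

def jaccard_it2(A, B, u):
    """Jaccard similarity (6.10); A and B are (UMF, LMF) sample pairs."""
    num = np.trapz(np.minimum(A[0], B[0]), u) + np.trapz(np.minimum(A[1], B[1]), u)
    den = np.trapz(np.maximum(A[0], B[0]), u) + np.trapz(np.maximum(A[1], B[1]), u)
    return num / den

u = np.linspace(139.7, 221.0, 2001)
tall = (trap(u, 174.12, 185.00, 193.04, 204.14),
        trap(u, 185.86, 188.99, 188.99, 192.07, 0.44))
mod_tall = (trap(u, 166.89, 175.00, 181.61, 190.59),
            trap(u, 173.96, 177.91, 177.91, 181.04, 0.59))
print(jaccard_it2(tall, mod_tall, u))  # should land near the 0.222 of Table 6.2
```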
Table 6.3: Membership function parameters of the probability words depicted in Fig. 6.3

Word | UMF parameters | LMF parameters
Extremely improbable | (0, 0, 0.0183, 0.1316) | (0, 0, 0.0046, 0.0627, 1.0000)
Very improbable | (0.0293, 0.1000, 0.1250, 0.1707) | (0.0896, 0.1167, 0.1167, 0.1604, 0.7643)
Improbable | (0.0586, 0.1750, 0.2500, 0.3414) | (0.1896, 0.2200, 0.2200, 0.2604, 0.5757)
Somewhat improbable | (0.0982, 0.2500, 0.3000, 0.4518) | (0.2293, 0.2750, 0.2750, 0.3207, 0.6464)
Tossup | (0.3586, 0.5000, 0.5500, 0.6414) | (0.4896, 0.5083, 0.5083, 0.5141, 0.4107)
Somewhat probable | (0.4793, 0.5500, 0.6000, 0.6707) | (0.5293, 0.5750, 0.5750, 0.6207, 0.6464)
Probable | (0.5086, 0.6500, 0.7000, 0.7914) | (0.6293, 0.6750, 0.6750, 0.7207, 0.6464)
Very probable | (0.7189, 0.8250, 0.9000, 0.9811) | (0.8293, 0.8700, 0.8700, 0.9207, 0.5757)
Extremely probable | (0.8684, 0.9772, 1.0000, 1.0000) | (0.9405, 0.9954, 1.0000, 1.0000, 1.0000)

Table 6.4: Pairwise similarities between the probability words depicted in Fig. 6.3
(rows and columns in the order: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)

Extremely improbable | 1.0000 0.1502 0.0394 0.0063 0 0 0 0 0
Very improbable | 0.1502 1.0000 0.1432 0.0405 0 0 0 0 0
Improbable | 0.0394 0.1432 1.0000 0.4091 0 0 0 0 0
Somewhat improbable | 0.0063 0.0405 0.4091 1.0000 0.0382 0 0 0 0
Tossup | 0 0 0 0.0382 1.0000 0.3369 0.1150 0 0
Somewhat probable | 0 0 0 0 0.3369 1.0000 0.2179 0 0
Probable | 0 0 0 0 0.1150 0.2179 1.0000 0.0353 0
Very probable | 0 0 0 0 0 0 0.0353 1.0000 0.1242
Extremely probable | 0 0 0 0 0 0 0 0.1242 1.0000

Because Zadeh's solutions [333] involve type-1 FS models of words, we chose those models from our interval type-2 FS models by using two kinds of embedded T1 FSs.

Definition 6.1. An embedded type-1 fuzzy set A_e of an IT2 FS \tilde{A} over U is a T1 FS over U for which \forall t \in U, \underline{\mu}_{\tilde{A}}(t) \le \mu_{A_e}(t) \le \overline{\mu}_{\tilde{A}}(t).

Definition 6.2. A middle embedded type-1 fuzzy set A_{ne} of an IT2 FS \tilde{A} over U is an embedded T1 FS of \tilde{A} that is normal, i.e., \sup_t \mu_{A_{ne}}(t) = 1. A UMF is a middle embedded T1 FS.

Our first T1 FS is the UMF from each of the IT2 FS FOUs (see Figs. 6.2 and 6.3, and Tables 6.1 and 6.3). Our second T1 FS is obtained from the Fig. 6.1 and Fig. 6.3 FOU parameters, as shown in Fig. 6.4, and is also a middle embedded T1 FS, sort of an average T1 FS. In the sequel, we refer to this T1 FS as a "middle embedded T1 FS."

[Figure 6.4: Middle embedded type-1 fuzzy sets for left shoulder, interior, and right shoulder FOUs.]

The vocabulary of height words modeled by middle embedded T1 FSs is shown in Fig. 6.5, and their parameters are given in Table 6.5. We computed the pairwise similarities for the T1 FS models and observed that they were less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse for heights.

[Figure 6.5: Vocabulary of linguistic heights modeled by middle embedded type-1 fuzzy sets.]
Table 6.5: Membership function parameters of the height words depicted in Fig. 6.5

Word | Trapezoidal membership function parameters
Very short | (139.7000, 139.7000, 140.5964, 148.5572)
Short | (141.6237, 150.0000, 156.3023, 162.0711)
Moderately short | (149.2437, 157.4800, 161.6994, 166.0017)
Medium | (154.3237, 163.8300, 169.9909, 174.1224)
Moderately tall | (166.8934, 175.0000, 179.7609, 185.8129)
Tall | (174.1176, 185.0000, 191.0140, 198.1066)
Very tall | (185.1236, 213.3897, 221.0000, 221.0000)

Comparable MFs of the middle embedded T1 FS models of our nine probability words are given in Fig. 6.6 and Table 6.6, respectively. We also observed that our probability words have pairwise similarities less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse for probability.

[Figure 6.6: Vocabulary of middle embedded type-1 fuzzy set models of linguistic probabilities.]

Table 6.6: Membership function parameters of the probability words in Fig. 6.6

Word | Trapezoidal membership function parameters
Extremely improbable | (0, 0, 0.0114, 0.0972)
Very improbable | (0.0293, 0.1000, 0.1208, 0.1655)
Improbable | (0.0586, 0.1750, 0.2350, 0.3009)
Somewhat improbable | (0.0982, 0.2500, 0.2875, 0.3862)
Tossup | (0.3586, 0.5000, 0.5292, 0.5778)
Somewhat probable | (0.4793, 0.5500, 0.5875, 0.6457)
Probable | (0.5086, 0.6500, 0.6875, 0.7561)
Very probable | (0.7189, 0.8250, 0.8850, 0.9509)
Extremely probable | (0.9044, 0.9863, 1.0000, 1.0000)

6.3.2 Approximate Solution to the Optimization Problem

Next, we solve the optimization problem of (6.9). It is a functional optimization problem, and cannot be solved analytically. Instead, our approach is as follows (a short Python sketch of the whole procedure is given at the end of this subsection):

1. Choose the family (families) of probability distributions pertinent to the problem.
2. Choose the ranges of the parameters of the families of probability distributions.
3. Discretize the ranges of the parameters of the probability distributions.
4. Construct a pool of probability distributions having all possible combinations of parameters, and for all of its members:
 4.1. Choose a specific p_H from the pool (again, note that \int_a^b p_H(h)dh = 1, because p_H(h) is a probability distribution function on [a, b]).
 4.2. Compute v = \int_a^b p_H(h)\mu_W(h)dh.
 4.3. Compute \int_a^b p_H(h)\mu_{Tall}(h)dh.
 4.4. Compute \gamma(v) = \mu_{Probable}\big(\int_a^b p_H(h)\mu_{Tall}(h)dh\big).
5. Construct a scatter plot of \gamma(v) versus v.
6. Detect an envelope of \gamma(v), namely \mu_{P_W}(v).

The envelope detection plays the role of taking the sup. One can imagine different ways of detecting the envelope. We used the following algorithm:

(a) Divide the space of possible v's, which is [0, 1], into N bins.
(b) For each bin:
 i. Search for all the (v, \gamma(v)) pairs whose v value falls in the bin.
 ii. Compute \bar{\gamma}, the maximum of the \gamma(v)'s associated with the pairs found in the previous step. If there are no pairs whose v's fall in the bin, \bar{\gamma} = 0.
 iii. For v's that are members of the bin, set \mu_{P_W}(v) = \bar{\gamma}.

Implementation of solutions using a huge number of distributions involves an enormous computational burden, and it is impossible to carry out the optimization over all possible distributions. One needs to incorporate some additional world knowledge about the type of probability distribution of heights into the solution of the PJW problem, since one should only use probability distributions for heights of males that make sense. More generally, each ACWW problem has a real-world domain associated with it; hence, it is either explicitly or implicitly constrained by that domain. Therefore, when probability distributions are needed, they should be selected pertinent to that domain.

It is shown in [232] that the distribution of the heights of Americans is a mixture of Gaussian distributions, since the heights of both American men and women obey Gaussian distributions. This suggests that the optimization problem in (6.9) can be carried out on \mathcal{X}^N_{[a,b]}, the space of all normal distributions over [a, b]. The probability density function of a Gaussian probability distribution is:

f(x|\mu,\sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}    (6.11)

Because x \in [a, b], we normalize each probability distribution f(x|\mu,\sigma) by F(b|\mu,\sigma) - F(a|\mu,\sigma), where F(x|\mu,\sigma) = \int_{-\infty}^{x} f(\xi|\mu,\sigma)d\xi is the cumulative distribution function of f(x|\mu,\sigma), so as to make (6.11) a probability distribution on [a, b]; so, for each distribution, we construct:

g(x|\mu,\sigma) \equiv \dfrac{f(x|\mu,\sigma)}{F(b|\mu,\sigma) - F(a|\mu,\sigma)}\, I_{[a,b]}(x)    (6.12)

where I_{[a,b]}(\cdot) is the indicator function of the interval [a, b].

In this chapter, we chose 100 equally spaced points in the intervals [139.7, 221] and [0.01, 20], respectively, as candidates for \mu and \sigma, which led to 10,000 Gaussian distributions (normalized over [a, b]), and then implemented the above algorithm. The \gamma(v) versus v scatter plots for the case of using UMF T1 FSs are depicted in Fig. 6.7. It is not surprising that the scatter plots for the words Very short, Short, Moderately short, and Medium are left shoulders, because the a priori knowledge that probably John is tall intuitively suggests that the probability of short-sounding height words must be close to zero. It is not so obvious, however, before implementation of the solution, that the probability associated with the word Moderately tall is the whole unit interval, which means that such a probability can either be very small or very large. (Footnote 6.4: In the sequel, it will be seen that the solution maps to Somewhat improbable, which has the broadest support.) The scatter plot associated with Tall has points that are exactly on the MF of Probable. The probability associated with Very tall is an interior MF, and is non-zero on quite a wide range, which means that there is lots of uncertainty associated with the probability that John is very tall. Similar plots were obtained for middle embedded T1 FSs. The envelopes for these plots are depicted in Figs. 6.8 and 6.9.

[Figure 6.7: Scatter plots for each P_W using UMFs as T1 FS models of words, when distributions are Gaussian.]
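To make the procedure concrete, the following is a minimal, self-contained Python sketch for one height word W. The word models are the UMFs of Tables 6.1 and 6.3, and the 100 x 100 (\mu, \sigma) grid mirrors the one described above; the helper names (trap, mu_probable, mu_PW) and the height-grid size are our own illustrative choices, not code from the dissertation.

```python
import numpy as np
from scipy.stats import norm

a, b = 139.7, 221.0
h = np.linspace(a, b, 2001)   # height grid (cm); the smallest sigmas need a fine grid

def trap(u, p, q, r, s):
    """Trapezoidal T1 MF with vertices p <= q <= r <= s."""
    y = np.zeros_like(u)
    if q > p:
        m = (u >= p) & (u < q)
        y[m] = (u[m] - p) / (q - p)
    y[(u >= q) & (u <= r)] = 1.0
    if s > r:
        m = (u > r) & (u <= s)
        y[m] = (s - u[m]) / (s - r)
    return y

mu_tall = trap(h, 174.12, 185.00, 193.04, 204.14)   # UMF of Tall (Table 6.1)
mu_w    = trap(h, 166.89, 175.00, 181.61, 190.59)   # W = Moderately tall (Table 6.1)

def mu_probable(u):
    """UMF of Probable (Table 6.3)."""
    return np.interp(u, [0.5086, 0.65, 0.70, 0.7914], [0.0, 1.0, 1.0, 0.0])

# Steps 1-4: a pool of truncated Gaussians, i.e. g(x | mu, sigma) of (6.12)
vs, gs = [], []
for mean in np.linspace(a, b, 100):
    for sd in np.linspace(0.01, 20.0, 100):
        p = norm.pdf(h, mean, sd) / (norm.cdf(b, mean, sd) - norm.cdf(a, mean, sd))
        vs.append(np.trapz(p * mu_w, h))                     # step 4.2
        gs.append(mu_probable(np.trapz(p * mu_tall, h)))     # steps 4.3-4.4

# Steps 5-6: envelope detection with N bins over [0, 1]
N = 100
mu_PW = np.zeros(N)
for v, g in zip(vs, gs):
    k = min(int(v * N), N - 1)
    mu_PW[k] = max(mu_PW[k], g)   # the per-bin max plays the role of the sup
```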
[Figure 6.8: The detected envelopes \mu_{P_W}(v) for the plots in Fig. 6.7. UMFs were used as T1 FS models of words, and the probability distributions were Gaussian.]

[Figure 6.9: The detected envelopes \mu_{P_W}(v) when middle embedded type-1 fuzzy set models of words and Gaussian distributions were used.]

We then calculated the Jaccard similarity of the solutions with each linguistic probability, so as to translate the results into natural language words. Those similarities are summarized in Tables 6.7 and 6.8.

Table 6.7: Similarities between the T1 FSs depicted in Fig. 6.8 and UMFs for linguistic probability words
(columns in the order: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)

P_W when W is:
Very short | 0.015878 0 0 0 0 0 0 0 0
Short | 0.37504 0.02125 0 0 0 0 0 0 0
Moderately short | 0.64825 0.079516 0.0071497 0 0 0 0 0 0
Medium | 0.37724 0.41852 0.29086 0.13669 0 0 0 0 0
Moderately tall | 0.075095 0.083312 0.17914 0.20202 0.16662 0.12086 0.16662 0.16877 0.077245
Tall | 0 0 0 0 0.13138 0.27937 0.97344 0.043363 0
Very tall | 0.006329 0.042309 0.29054 0.36591 0.30181 0.21892 0.16074 0 0

Table 6.8: Similarities between the T1 FSs depicted in Fig. 6.9 and middle embedded T1 FSs for linguistic probability words
(columns in the same order as Table 6.7)

P_W when W is:
Very short | 0.013334 0 0 0 0 0 0 0 0
Short | 0.076762 0 0 0 0 0 0 0 0
Moderately short | 0.16 0 0 0 0 0 0 0 0
Medium height | 0.87674 0.23411 0.053564 0.0099949 0 0 0 0 0
Moderately tall | 0.090033 0.099885 0.21477 0.24221 0.19826 0.14354 0.19939 0.092259 0.025536
Tall | 0 0 0 0 0.14275 0.31181 0.85741 0.014082 0
Very tall | 0.23814 0.2642 0.56076 0.37451 0 0 0 0 0

All of our solutions to the problem "What is the probability that John is W?", given "Probably John is tall," are summarized in Table 6.9. They were obtained by choosing the word (in Tables 6.7 and 6.8) that has the largest similarity.
So, for example, when UMFs are used as T1 FS models of words, the linguistic solutions to the PJW problems are:

"It is extremely improbable that John is very short."
"It is extremely improbable that John is short."
"It is extremely improbable that John is moderately short."
"It is very improbable that John has medium height."
"It is somewhat improbable that John is moderately tall."
"It is probable that John is tall." (which means that the algorithm works correctly, since we have already assumed that "Probably John is tall".)
"It is somewhat improbable that John is very tall."

Observe from Table 6.9 that five of the linguistic solutions are the same, regardless of which T1 FS model was used for all the words; however, the fact that two of the solutions are different suggests that results for this ACWW problem can be sensitive to the T1 FS that is used to model a word.

Table 6.9: Summary of the solutions to the problem "What is the probability that John is W?", given "Probably John is tall"

W | UMF | Middle embedded T1 FSs
Very short | Extremely improbable | Extremely improbable
Short | Extremely improbable | Extremely improbable
Moderately short | Extremely improbable | Extremely improbable
Medium height | Very improbable | Extremely improbable
Moderately tall | Somewhat improbable | Somewhat improbable
Tall | Probable | Probable
Very tall | Somewhat improbable | Improbable

Before leaving this section, it is interesting to study the solutions to some of the problems more thoroughly.

Observe from Figs. 6.8(e) and 6.9(e) that the probability that "John is moderately tall" is a fuzzy set whose membership is equal to 1 for large subintervals of [0, 1]. To understand the reason for this, observe from (6.9) that in order for \mu_{P_{Moderately\ tall}}(v) = 1 to hold for a particular p_H and the value of v associated with it (v = \int_a^b \mu_{Moderately\ tall}(h)p_H(h)dh \in [0, 1]), one must have \mu_{Probable}(\int_a^b \mu_{Tall}(h)p_H(h)dh) = 1 for that particular p_H. Because \mu_{Probable}(u) = 1 for 0.65 \le u \le 0.7 (see the parameters of the UMF of Probable in Table 6.3), \mu_{Probable}(\int_a^b \mu_{Tall}(h)p_H(h)dh) = 1 occurs when:

0.65 \le \int_a^b \mu_{Tall}(h)\,p_H(h)\,dh \le 0.7    (6.13)

Those p_H that satisfy (6.13) can make v vary over a large range of values in [0, 1], because they can have either large or small overlaps with Moderately tall. Note from (6.13) that such a p_H(h) has a large overlap with \mu_{Tall}(h). If p_H happens to be to the right of \mu_{Tall}, it may have a very small overlap with \mu_{Moderately\ tall}, because \mu_{Moderately\ tall} lies to the left of \mu_{Tall}; this makes v = \int_a^b \mu_{Moderately\ tall}(h)p_H(h)dh small, so \mu_{P_{Moderately\ tall}}(v) can equal 1 for small values of v. On the other hand, if p_H(h) happens to be to the left of \mu_{Tall}, it will have a large overlap with \mu_{Moderately\ tall}(h), which makes v large; hence, \mu_{P_{Moderately\ tall}}(v) can equal 1 for large values of v. A similar argument applies to P_{Very\ tall} [although \mu_{P_{Moderately\ tall}}(v) can equal 1 for a wider range of v's; compare Figs. 6.8(g) and 6.9(g)].

Comparing Figs. 6.8 and 6.9, observe that middle embedded T1 FSs provide narrower MFs for the words involved in the PJW problem as compared to UMF T1 FSs. Because the GEP propagates uncertainty, solutions corresponding to middle embedded T1 FSs are therefore narrower (i.e., less uncertain) than solutions using UMF T1 FSs.
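The decoding step that produced Tables 6.7-6.9 (pick the vocabulary word most similar to the computed envelope) is straightforward to implement. A minimal sketch, with an abbreviated two-word vocabulary built from the UMFs of Table 6.3 and all MFs assumed to be sampled on a common grid of probability values; the function names are ours:

```python
import numpy as np

def jaccard_t1(mu_a, mu_b, v):
    """Jaccard similarity between two T1 FSs sampled on the grid v."""
    return np.trapz(np.minimum(mu_a, mu_b), v) / np.trapz(np.maximum(mu_a, mu_b), v)

def decode(mu_solution, vocab, v):
    """Return the word whose MF is most similar to the computed envelope."""
    return max(vocab, key=lambda w: jaccard_t1(mu_solution, vocab[w], v))

v = np.linspace(0.0, 1.0, 101)
vocab = {  # UMFs of two of the probability words of Table 6.3
    "Somewhat improbable": np.interp(v, [0.0982, 0.25, 0.30, 0.4518], [0, 1, 1, 0]),
    "Probable": np.interp(v, [0.5086, 0.65, 0.70, 0.7914], [0, 1, 1, 0]),
}
print(decode(vocab["Probable"], vocab, v))  # "Probable"
```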
6.4 On the Correctness of the Results

A natural question to ask at this point is: How does one know that the solution obtained for our ACWW problem is correct? We know of no general way to answer this question. Instead, we fall back on a very simple idea, namely, we consider one PJW problem for which we surely know what the correct answer is, and see if the GEP and our algorithm for solving (6.9) provide that answer. That PJW problem is: "Probably John is W. What is the probability that John is W?" Common sense tells us the answer is "Probable," i.e., it is probable that John is W.

The solution to this problem using T1 FSs and the GEP is [see (6.9)]:

\mu_{P_W}(v) =
\begin{cases}
\sup_{\substack{v = \int_a^b p_H(h)\mu_W(h)dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_W(h)dh\Big) & \exists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh\\
0 & \nexists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh
\end{cases}    (6.14)

Because

\sup_{\substack{v = \int_a^b p_H(h)\mu_W(h)dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_W(h)dh\Big) = \sup_{\substack{v = \int_a^b p_H(h)\mu_W(h)dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Probable}(v) = \mu_{Probable}(v)    (6.15)

(6.14) simplifies to:

\mu_{P_W}(v) =
\begin{cases}
\mu_{Probable}(v) & \exists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh\\
0 & \nexists\, p_H \in \mathcal{X}_{[a,b]} \text{ s.t. } v = \int_a^b p_H(h)\mu_W(h)dh
\end{cases}    (6.16)

From (6.16), it is obvious that the membership function of P_W is always equal to that of Probable, except for those v's for which no probability distribution is found that satisfies v = \int_a^b p_H(h)\mu_W(h)dh, which, by the second line of (6.16), means that \mu_{P_W}(v) = 0. However, this does not occur: since the W's have membership functions with large areas (Footnote 6.5: If they had small areas under them, the integral \int_a^b \mu_W(h)p_H(h)dh might have an upper bound less than 1.), it is conceivable that p_H's with different amounts of overlap with the \mu_W's could yield any possible v (i.e., the whole unit interval). Therefore, the solution using T1 FSs gives the word "Probable" as the solution (the similarity of the output with "Probable" is not exactly 1, due to the approximation enforced by the envelope detection algorithm).

We could stop with (6.16), but we want to verify that our computational method also leads to the right-hand side of (6.16). In order to demonstrate (6.16), we calculated the solution to the PJW problem "Probably John is W. What is the probability that John is W?" We used UMF T1 FS word models, and divided the intervals [139.7, 221] and [0, 20] into 200 equally spaced points each, so that there were 200 x 200 = 40,000 Gaussian probability distributions. The solutions for each W are shown in Fig. 6.10, and their similarities with the vocabulary of linguistic probabilities modeled by UMFs are given in Table 6.10. Observe that all of the solutions map to the word "Probable," which is the result that was expected. How to validate the GEP approach to solving ACWW problems in a more general way is an important and open research question, one that we leave to the readers.
Table 6.10: Similarities between the words in Fig. 6.10 and the UMFs of the linguistic probability words of Fig. 6.3
(columns in the order: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)

P_W when W is:
Very short | 0 0 0 0 0.13469 0.28436 0.94587 0.045339 0
Short | 0 0 0 0 0.13507 0.28474 0.94331 0.045816 0
Moderately short | 0 0 0 0 0.13497 0.28452 0.94432 0.045751 0
Medium | 0 0 0 0 0.13494 0.28461 0.94403 0.045652 0
Moderately tall | 0 0 0 0 0.13511 0.28482 0.94417 0.045546 0
Tall | 0 0 0 0 0.13505 0.28481 0.94398 0.045504 0
Very tall | 0 0 0 0 0.13495 0.28459 0.94329 0.045816 0

[Figure 6.10: The detected envelopes \mu_{P_W}(v), which are solutions to the PJW problem "Probably John is W. What is the probability that John is W?"]

6.5 An Engineering ACWW Problem

In this section, we solve a specific problem about product reliability to demonstrate how ACWW problems for real-world applications can be solved by the methodology of this chapter. Consider the following statement, which provides a subjective judgment about the reliability of a product, and a related question:

Probably product X is highly reliable. What is the probability that the reliability of X is R?

R represents a word describing reliability, and can be one of the words: None to very little, Very low, Low, More or less low, From fair to more or less high, More or less high, High, Extremely high. These words constitute the user-friendly eight-word vocabulary introduced in [173, Ch. 7]. Their FOUs are depicted in Fig. 6.11, and the parameters of the MFs are given in Table 6.11. The pairwise Jaccard similarities between those words are given in Table 6.12. Note that the threshold of pairwise similarity for an appropriate partitioning of the space was set to 0.6 in [173, Ch. 7]; therefore, this vocabulary represents a good partitioning of the space. We chose a [0, 10] scale, and interpreted reliability as "time to failure," so "high reliability" corresponds to a large time to failure. This scale is a "hypothetical scale," and can be re-scaled to any appropriate time scale.

[Figure 6.11: Vocabulary of IT2 FSs representing linguistic reliability [173, Fig. 7.5].]

As in Section 6.3, we used the UMFs of the IT2 FS models of the reliability words as the T1 FS models of those words (see Table 6.11 and Fig. 6.11).
Table 6.11: Membership functions of the words depicted in Fig. 6.11 [173, Table 7.13]

Word | UMF parameters | LMF parameters
None to very little | (0, 0, 0.22, 3.16) | (0, 0, 0.02, 0.33, 1)
Very low | (0, 0, 1.37, 3.95) | (0, 0, 0.14, 1.82, 1)
Low | (0.38, 1.63, 3.00, 4.62) | (1.90, 2.24, 2.24, 2.51, 0.31)
More or less low | (0.38, 2.25, 4.00, 5.92) | (2.99, 3.31, 3.31, 3.81, 0.32)
From fair to more or less high | (2.33, 5.11, 7.00, 9.59) | (5.79, 6.31, 6.31, 7.21, 0.43)
More or less high | (4.38, 6.25, 8.00, 9.62) | (6.90, 7.21, 7.21, 7.60, 0.29)
High | (4.73, 8.82, 10, 10) | (7.68, 9.82, 10, 10, 1)
Extremely high | (7.10, 9.80, 10, 10) | (9.74, 9.98, 10, 10, 1)

Table 6.12: Pairwise similarities between the words depicted in Fig. 6.11
(rows and columns in the order of Table 6.11)

None to very little | 1 0.50973 0.2415 0.1671 0.0090779 0 0 0
Very low | 0.50973 1 0.34288 0.2405 0.029751 0 0 0
Low | 0.2415 0.34288 1 0.59678 0.082946 0.0012906 0 0
More or less low | 0.1671 0.2405 0.59678 1 0.18822 0.044303 0.01457 0
From fair to more or less high | 0.0090779 0.029751 0.082946 0.18822 1 0.5464 0.2342 0.098566
More or less high | 0 0 0.0012906 0.044303 0.5464 1 0.35195 0.16251
High | 0 0 0 0.01457 0.2342 0.35195 1 0.37394
Extremely high | 0 0 0 0 0.098566 0.16251 0.37394 1

In the reliability literature, time to failure is modeled by a variety of distributions, including exponential, Gaussian, and Weibull. Here, we assume that the distribution of time to failure (as a measure of reliability) is Weibull, with shape parameter k and scale parameter \lambda, whose probability density function is:

f(x|\lambda,k) = \dfrac{k}{\lambda}\Big(\dfrac{x}{\lambda}\Big)^{k-1} e^{-(x/\lambda)^{k}}\, I_{[0,\infty)}(x)    (6.17)

where I_{[0,\infty)}(\cdot) is the indicator function of the interval [0, \infty). In all of the simulations of this section, we chose k \in [0, 10] and \lambda \in [0.1, 1500], and divided each of these intervals into 500 equally spaced points, obtaining 250,000 Weibull probability distributions. Solutions from the GEP using UMFs were obtained employing the same methodology as applied to the PJW problem; the detected envelopes are shown in Fig. 6.12.
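The text does not spell out how the Weibull densities are handled on the bounded [0, 10] reliability scale; the following minimal sketch assumes the same truncate-and-renormalize treatment that (6.12) applies to the Gaussians, and uses the UMF of High from Table 6.11. The function names are ours.

```python
import numpy as np
from scipy.stats import weibull_min

a, b = 0.0, 10.0
x = np.linspace(a, b, 500)
mu_high = np.interp(x, [4.73, 8.82], [0.0, 1.0])   # right-shoulder UMF of High

def trunc_weibull(x, k, lam):
    """Weibull pdf of (6.17) with shape k and scale lam,
    renormalized on [a, b] in the manner of (6.12)."""
    rv = weibull_min(k, scale=lam)
    return rv.pdf(x) / (rv.cdf(b) - rv.cdf(a))

# probability of the fuzzy event "High" under one member of the pool
p = trunc_weibull(x, k=3.0, lam=8.0)
print(np.trapz(p * mu_high, x))
```

Looping such densities over a (k, \lambda) grid and applying the envelope detection of Section 6.3.2 reproduces the procedure used for Fig. 6.12.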
We then calculated the Jaccard similarity of the solutions with each linguistic probability, so as to translate the results into natural language words. Those similarities are summarized in Table 6.13.

Table 6.13: Similarities between the words depicted in Fig. 6.12 and linguistic probability words
(columns in the order: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)

P_R when R is:
None to very little | 0.051538 0 0 0 0 0 0 0 0
Very low | 0.096336 0.009429 0 0 0 0 0 0 0
Low | 0.28419 0.054818 0.019226 0.000046534 0 0 0 0 0
More or less low | 0.68862 0.16345 0.064759 0.035634 0 0 0 0 0
From fair to more or less high | 0.020604 0 0 0 0.12018 0.21302 0.39502 0.37763 0.1118
More or less high | 0 0 0 0 0.030922 0.066089 0.2933 0.46496 0.22155
High | 0 0 0 0 0.12894 0.27585 0.99457 0.041807 0
Extremely high | 0.16345 0.2125 0.45715 0.51555 0.047485 0.0000069725 0 0 0

[Figure 6.12: The detected envelopes \mu_{P_R} when UMF fuzzy set models of words and Weibull distributions were used.]

All of our solutions to the problem "What is the probability that the reliability of product X is R?", given "Probably product X has high reliability," are summarized in Table 6.14. They were obtained by choosing the word (in Table 6.13) that has the largest similarity; thus, the linguistic solutions to this problem are:

"It is extremely improbable that X's reliability is none to very little."
"It is extremely improbable that X has very low reliability."
"It is extremely improbable that X has low reliability."
"It is extremely improbable that X has more or less low reliability."
"It is probable that X's reliability is from fair to more or less high."
"It is probable that X's reliability is more or less high."
"It is probable that X's reliability is high." (which means that the algorithm works correctly, since we have already assumed that "Probably X has high reliability".)
"It is somewhat improbable that X's reliability is extremely high."

Observe in Fig. 6.12(e) that there is a "spike" close to v = 0. Actually, when the distributions used for solving the reliability problems (or the PJW problems) have "tails" (like Gaussian and Weibull distributions), the membership functions of P_R (or P_W), at least for some R's (W's), are non-zero for v's that are very close to zero. (Footnote 6.6: The widths of the spikes are so narrow in the PJW problem that they may not be visible in the plots.) When R is far from High (e.g., when R = From fair to more or less high), and when a probability distribution has a large amount of overlap with \mu_{High}, \int_a^b \mu_{High}(\xi)p(\xi)d\xi can be large, so that \mu_{Probable}(\int_a^b \mu_{High}(\xi)p(\xi)d\xi) \neq 0. However, since \mu_R is in the region of the tail of the distribution, v = \int_a^b \mu_R(\xi)p(\xi)d\xi is very small, but non-zero; hence, \mu_{P_R}(v) is non-zero for very small values of v. One may therefore expect such spikes to appear in many solutions to the reliability and PJW problems that mainly look like right shoulders or interiors.
Those spikes are so small that, in many solutions, they are not visible. More importantly, they do not contribute very much to the similarity calculations, because they have very small areas.

Table 6.14: Summary of the solutions to the problem "What is the probability that the reliability of product X is R?", given "Probably product X has high reliability"

R | P_R
None to very little | Extremely improbable
Very low | Extremely improbable
Low | Extremely improbable
More or less low | Extremely improbable
From fair to more or less high | Probable
More or less high | Probable
High | Probable
Extremely high | Somewhat improbable

6.6 Other Zadeh ACWW Challenge Problems

Zadeh has proposed some other challenge problems for ACWW [333], some of which can also be solved by the GEP. In this section, we present three of them, so that the reader will see that the methodology we have presented for the PJW problems can, in principle, also be used to solve the other problems.

6.6.1 Tall Swedes Problem (AHS)

The tall Swedes problem (Footnote 6.7: We do not use the acronym "TSP" because it is already widely used for the famous Traveling Salesman Problem.) is about the average height of Swedes (AHS), and is:

Most Swedes are tall. What is the average height of Swedes?

This problem involves a linguistic quantifier (Most) and an implicit assignment of the linguistic quantifier to the portion of tall Swedes. The portion of Swedes who are tall is equivalent to the probability of the fuzzy event Tall, and is calculated as:

P_{Tall} = \int_{a}^{b} \mu_{Tall}(h)\, p_H(h)\, dh    (6.18)

in which a and b are the minimum and maximum possible heights, and p_H is the probability distribution function of heights, which clearly satisfies:

\int_{a}^{b} p_H(h)\, dh = 1    (6.19)

The soft constraint "Most" is assigned to P_{Tall}. On the other hand, the average height of Swedes is calculated as:

AH = \int_{a}^{b} p_H(h)\, h\, dh    (6.20)

To derive the soft constraint imposed on AH by the fact that P_{Tall} is constrained by "Most", one needs to use the framework of the GEP. In the Tall Swedes problem, f = P_{Tall}, where

f: \mathcal{X}_{[a,b]} \to \mathbb{R}    (6.21)

in which \mathcal{X}_{[a,b]} is the space of all possible probability distribution functions on [a, b]; and g = AH, where

g: \mathcal{X}_{[a,b]} \to \mathbb{R}    (6.22)

The GEP then implies that the soft constraint on the average height of Swedes is computed as:

\mu_{AH}(v) = \sup_{\substack{v = \int_a^b p_H(h)\,h\,dh\\ \int_a^b p_H(h)dh = 1}} \mu_{Most}\Big(\int_a^b p_H(h)\mu_{Tall}(h)\,dh\Big)    (6.23)

Comparing (6.23) and (6.9), it is clear that our methodology for solving (6.9) can also be used to solve (6.23). It is worth noting that other approaches to solving the tall Swedes problem have been offered in [194, 221].

6.6.2 Robert's Problem (RP)

The RP is:

Usually Robert leaves his office at about 5 pm. Usually it takes Robert about an hour to get home from work. What is the probability that Robert is at home at 6:15 pm?

Robert's time of arrival Z, his time of departure X, and his travel time Y are three random variables that satisfy:

Z = X + Y    (6.24)

Assuming that X and Y are independent, the probability density function of Z is the convolution of those of X and Y:

p_Z(v) = \int_{-\infty}^{+\infty} p_X(\xi)\, p_Y(v - \xi)\, d\xi \equiv (p_X * p_Y)(v)    (6.25)

The probability of the fuzzy event A \equiv About 5 pm, P_A, is:

P_A = \int_{a_1}^{b_1} p_X(\xi)\, \mu_A(\xi)\, d\xi    (6.26)

where a_1 and b_1 are respectively the earliest and latest possible times that Robert leaves the office.
The probability of the fuzzy event B \equiv About an hour, P_B, is:

P_B = \int_{a_2}^{b_2} p_Y(\xi)\, \mu_B(\xi)\, d\xi    (6.27)

where a_2 and b_2 are respectively the smallest and the largest possible amounts of time that it takes Robert to get home from work. The probability that Robert is at home before t = 6:15, P_{Home}, is:

P_{Home} = \int_{t_0}^{t} p_Z(\xi)\, d\xi    (6.28)

where t_0 is the earliest time for Robert to get home. P_A and P_B are both constrained by "usually". We want to find the constraint \Phi on P_{Home}:

\int_{a_1}^{b_1} p_X(\xi)\mu_A(\xi)d\xi is Usually
and
\int_{a_2}^{b_2} p_Y(\xi)\mu_B(\xi)d\xi is Usually
\Rightarrow P_{Home} = \int_{t_0}^{t} (p_X * p_Y)(\xi)d\xi is \Phi

Then:

\mu_\Phi(w) = \sup_{\substack{w = \int_{t_0}^{t} p_Z(\xi)d\xi\\ \int_{a_1}^{b_1} p_X(\xi)d\xi = 1\\ \int_{a_2}^{b_2} p_Y(\xi)d\xi = 1}} \min\Big(\mu_{Usually}\Big(\int_{a_1}^{b_1} p_X(\xi)\mu_A(\xi)d\xi\Big),\ \mu_{Usually}\Big(\int_{a_2}^{b_2} p_Y(\xi)\mu_B(\xi)d\xi\Big)\Big)    (6.29)

Devising the details is left to the reader, but the main steps parallel those of the PJW problem: two probability distributions p_X and p_Y are selected from corresponding pools of probability distributions; p_Z = p_X * p_Y is calculated from (6.25); w = P_{Home} is calculated from (6.28); and, for each w, \gamma(w), the argument of the sup in (6.29), is calculated, which requires calculating P_A and P_B from (6.26) and (6.27). The solution is the envelope of \gamma(w). In more detail (a Python sketch follows the list):

1. Choose the family (families) of probability distributions pertinent to p_X and p_Y.
2. Choose the ranges of the parameters of the families of probability distributions.
3. Discretize the ranges of the parameters of the probability distributions.
4. For p_X, construct a pool of probability distributions having all possible combinations of parameters.
5. For p_Y, construct a pool of probability distributions having all possible combinations of parameters. For all of the members of the two pools:
 5.1. Choose two specific probability distributions p_X and p_Y from the pools (note that \int_{a_1}^{b_1} p_X(\xi)d\xi = 1 and \int_{a_2}^{b_2} p_Y(\xi)d\xi = 1, because p_X and p_Y are probability distribution functions on [a_1, b_1] and [a_2, b_2], respectively).
 5.2. Compute p_Z = p_X * p_Y from (6.25).
 5.3. Compute w = P_{Home} from (6.28), and P_A and P_B from (6.26) and (6.27).
 5.4. Compute \gamma(w) = \min(\mu_{Usually}(P_A), \mu_{Usually}(P_B)).
6. Construct a scatter plot of \gamma(w) versus w.
7. Detect an envelope of \gamma(w), namely \mu_\Phi(w). The envelope detection plays the role of taking the sup, and can use the same binning algorithm that was described in Section 6.3.2.
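A minimal Python sketch of these steps; the time domains, the triangular models of About 5 pm and About an hour, the model of Usually, and the decision to vary only the means of the pools are all our own illustrative assumptions, not choices made in the dissertation.

```python
import numpy as np
from scipy.stats import norm

# Illustrative domains (hours on a 24 h clock): departure X on [a1, b1]
# around 5 pm, travel time Y on [a2, b2] around one hour.
a1, b1, a2, b2 = 16.5, 17.5, 0.5, 1.5
n = 201
tx = np.linspace(a1, b1, n)
ty = np.linspace(a2, b2, n)
dt = tx[1] - tx[0]          # equal for both grids (both spans are 1 h)

mu_A = np.interp(tx, [16.75, 17.00, 17.25], [0, 1, 0])       # About 5 pm (assumed)
mu_B = np.interp(ty, [0.75, 1.00, 1.25], [0, 1, 0])          # About an hour (assumed)
mu_usually = lambda u: np.interp(u, [0.6, 0.8], [0.0, 1.0])  # assumed model of Usually

def tpdf(t, m, s, lo, hi):
    """Gaussian pdf renormalized on [lo, hi], as in (6.12)."""
    return norm.pdf(t, m, s) / (norm.cdf(hi, m, s) - norm.cdf(lo, m, s))

ws, gammas = [], []
for mx in np.linspace(a1, b1, 15):       # pool of p_X (only the mean varied here)
    for my in np.linspace(a2, b2, 15):   # pool of p_Y
        px = tpdf(tx, mx, 0.15, a1, b1)
        py = tpdf(ty, my, 0.15, a2, b2)
        pz = np.convolve(px, py) * dt                    # (6.25) on the grid
        tz = a1 + a2 + dt * np.arange(len(pz))
        keep = tz <= 18.25                               # t = 6:15 pm
        ws.append(np.trapz(pz[keep], tz[keep]))          # (6.28)
        gA = mu_usually(np.trapz(px * mu_A, tx))         # (6.26)
        gB = mu_usually(np.trapz(py * mu_B, ty))         # (6.27)
        gammas.append(min(gA, gB))                       # argument of the sup in (6.29)
# The envelope of gammas versus ws over bins of [0, 1] gives the constraint on P_Home.
```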
6.6.3 Swedes and Italians Problem (SIP)

The SIP is:

Most Swedes are much taller than most Italians. What is the difference in the average height of Swedes and the average height of Italians?

Zadeh formulates the SIP using generalized constraints, as follows. Assume that the population of Swedes is represented by {S_1, S_2, \ldots, S_m} and the population of Italians is represented by {I_1, I_2, \ldots, I_n}. The height of S_i is denoted by x_i, i = 1, \ldots, m, and the height of I_j is denoted by y_j, j = 1, \ldots, n. Let:

x \equiv (x_1, x_2, \ldots, x_m), \quad y \equiv (y_1, y_2, \ldots, y_n)    (6.30)

Much taller is defined as a fuzzy relation on H_S^m \times H_I^n, in which H_S^m and H_I^n are respectively the spaces of all possible heights of Swedes and Italians, and the degree to which S_i is much taller than I_j is h_{ij}, i.e.:

h_{ij} \equiv \mu_{Much\ taller}(x_i, y_j)    (6.31)

The cardinality c_i of the set of Italians in relation to whom a Swede S_i is much taller can be calculated using the following \Sigma-count [320]:

c_i = \sum_{j=1}^{n} \mu_{Much\ taller}(x_i, y_j) = \sum_{j=1}^{n} h_{ij}    (6.32)

The proportion of Italians in relation to whom S_i is much taller, \delta_i, is then:

\delta_i \equiv \dfrac{c_i}{n}    (6.33)

Using a T1 FS model for the linguistic quantifier Most, the degree u_i to which a Swede S_i is much taller than most Italians is:

u_i = \mu_{Most}(\delta_i)    (6.34)

The proportion of the m Swedes who are much taller than most Italians can be derived via division of the \Sigma-count of those Swedes by m:

v = \dfrac{1}{m} \sum_{i=1}^{m} u_i    (6.35)

Consequently, the degree to which v belongs to the linguistic quantifier Most is determined by:

M(x, y) = \mu_{Most}(v)    (6.36)

in which the fact that v is a function of x and y is emphasized in the argument of M. The difference in the average height of Swedes and the average height of Italians, d, is calculated as:

d = \dfrac{1}{m} \sum_i x_i - \dfrac{1}{n} \sum_j y_j    (6.37)

To derive the linguistic constraint imposed on d by (6.36), one exploits the GEP. Zadeh's approach states that there is a soft constraint "Most" on v, the \Sigma-count of Swedes who are much taller than most Italians, given by (6.35), and requires the calculation of the soft constraint on d, given by (6.37). Therefore, in (6.5), f(x, y) = v and:

f: H_S^m \times H_I^n \to \mathbb{R}    (6.38)

Also, g(x, y) = d = (1/m)\sum_i x_i - (1/n)\sum_j y_j, and:

g: H_S^m \times H_I^n \to \mathbb{R}    (6.39)

The GEP implies that the soft constraint D on the difference in average heights, d, is characterized by the following membership function:

\mu_D(d) = \sup_{\substack{(x, y) \in H_S^m \times H_I^n\\ d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j}} \mu_{Most}\Big(\dfrac{1}{m}\sum_{i=1}^{m} \mu_{Most}\Big(\dfrac{1}{n}\sum_{j=1}^{n} \mu_{Much\ taller}(x_i, y_j)\Big)\Big) = \sup_{\substack{(x, y) \in H_S^m \times H_I^n\\ d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j}} M(x, y)    (6.40)

in which (x, y) belongs to H_S^m \times H_I^n, the space of all possible heights that Swedes and Italians can have. The sup is taken over this space since we have no information on the height distributions of these two nationalities. The problem stated in (6.40) can be solved using the following algorithm, which is in the spirit of our methodology for solving the PJW problem (a Python sketch follows):

1. Choose N_1 different x's and N_2 different y's for which x_i \in H_S and y_j \in H_I.
2. Construct a pool of all possible pairs (x, y), and for all of its members:
 2.1. Choose a specific (x, y) from the pool.
 2.2. Compute d = (1/m)\sum_i x_i - (1/n)\sum_j y_j.
 2.3. Compute \gamma(d) = \mu_{Most}\big(\frac{1}{m}\sum_{i=1}^{m} \mu_{Most}\big(\frac{1}{n}\sum_{j=1}^{n} \mu_{Much\ taller}(x_i, y_j)\big)\big).
3. Construct a scatter plot of \gamma(d) versus d.
4. Detect an envelope of \gamma(d), namely \mu_D(d), using the same envelope detection algorithm as before.
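A minimal Python sketch of this algorithm; the models of Most and Much taller, the population sizes, and the use of random sampling of the (x, y) pool (rather than an exhaustive grid) are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 20   # illustrative population sizes

mu_most = lambda u: np.interp(u, [0.5, 0.8], [0.0, 1.0])                 # assumed Most
mu_much_taller = lambda x, y: np.interp(x - y, [5.0, 15.0], [0.0, 1.0])  # assumed (cm)

ds, gammas = [], []
for _ in range(20000):                      # step 1: random members of the pool
    x = rng.uniform(140.0, 210.0, m)        # heights of Swedes
    y = rng.uniform(140.0, 210.0, n)        # heights of Italians
    ds.append(x.mean() - y.mean())          # step 2.2, eq. (6.37)
    delta = mu_much_taller(x[:, None], y[None, :]).mean(axis=1)   # (6.32)-(6.33)
    gammas.append(mu_most(mu_most(delta).mean()))   # step 2.3, eqs. (6.34)-(6.36)
# The envelope of gammas versus ds over bins of d yields mu_D(d) of (6.40).
```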
6.7 Discussion

In this section, we briefly discuss how one can solve an ACWW problem in general. This issue is not an easy one, because ACWW problems can conceivably be very complicated, due to the complicated nature of natural languages; hence, more research needs to be done to recognize different categories of ACWW problems involving implicit assignments of probability, truth, possibility, etc. Nevertheless, inspired by the ACWW problems that have been discussed in this chapter, we attempt to demonstrate a fairly general framework for solving ACWW problems that involve linguistic probabilities, namely:

(i) Determine the linguistic probability words, linguistic quantifiers, and linguistic usuality words in the problem.

(ii) From the linguistic description of the problem, determine which quantities (i.e., numeric probabilities) are constrained by the words that were found in the previous step. This may be a difficult task, since those words may be implicitly assigned to the quantity (e.g., Most is assigned to the portion of Swedes who are tall, Probable is assigned to the probability that John is tall, etc.).

(iii) Formulate the numeric probabilities of the previous step:
 (iii).1. Use definite integrals involving indicator functions of non-fuzzy events or membership functions of fuzzy events and the (unknown) probability distribution functions pertinent to those events (continuous case); or,
 (iii).2. Use the fraction of the cardinality of the fuzzy event over the cardinality of the whole population (discrete case). Like the continuous case, for which the probability density functions might be unknown, some quantities related to the population (e.g., the average height of a population or the variance of the height of a population) may be unknown.

(iv) Formulate the quantity on which a fuzzy constraint must be calculated, in terms of the unknowns (e.g., probability distributions) of Steps (iii).1 or (iii).2. This quantity may be an average, a numeric probability, a function of some averages, etc. To do this, the calculi of fuzzy sets are needed.

(v) Apply the GEP: knowing the soft constraints on the quantities formulated in Step (iii), determine the soft constraint on the quantity that was formulated in Step (iv). The sup of the GEP is taken over all possible unknowns of Steps (iii).1 or (iii).2. For example, if a probability distribution is not known, or if the heights of the individuals in a population are unknown, the sup is taken over all admissible probability distributions or admissible heights of individuals. The word "admissible" is crucial and implies that additional problem-specific world knowledge is needed.

We leave it to the reader to return to our earlier sections to see how these five steps were implemented for specific ACWW problems.

6.8 The Relationship between Perceptual Computing and Advanced Computing with Words using the GEP

In this section, we investigate the relationship between ACWW and Perceptual Computing [173]. The architecture of a Perceptual Computer (Per-C for short) is illustrated in Fig. 6.13 [164, 166, 167]. The Per-C has three major subsystems: the encoder, which constructs fuzzy set models of words that are used by the CWW engine; the CWW engine, which, based on the knowledge pertinent to a specific problem and the calculi of fuzzy sets, computes a fuzzy set at its output that is used by the decoder; and the decoder, which transforms the FS output of the engine into a recommendation (a word) that is comprehensible to humans. Data that support the recommendation may also be provided.

In this chapter, we used the Interval Approach (IA) [151] or the Enhanced Interval Approach (EIA) [286] for establishing IT2 FS models of words. We then used UMFs or middle embedded T1 FSs extracted from the IT2 FS models of words as T1 FS models of them; hence, we used the IA or EIA as encoders. To compute the solutions to ACWW problems using those fuzzy set models, we used the GEP; hence, the GEP acts as a new CWW engine (Footnote 6.8: Other CWW engines are: novel weighted averages and IF-THEN rules.). We then used Jaccard's similarity measure as a decoder to transform the fuzzy set provided by the GEP into a word.
Figure 6.13: The perceptual computer that uses FS models for words.

6.9 Conclusions and Future Work

Zadeh believes that the GEP is one of the main aggregation tools for CWW, especially when dealing with probability constraints. Unfortunately, analytic solutions involving the GEP are presently hopeless, and existing numerical algorithms (e.g., \alpha-cut decomposition) are not directly applicable. In this chapter, we solved some of Zadeh's challenge problems that involve linguistic probabilities using a novel algorithm for implementing the GEP. To the best of our knowledge, our algorithm is arguably the first attempt to actually implement Zadeh's solution to some of his ACWW problems. The applicability of our algorithm has been demonstrated by solving some specific ACWW problems.

An important prerequisite for using the GEP for problems involving linguistic probabilities is knowing the type of probability distributions for a specific problem. This additional "world knowledge" is a luxury that may not always be available.

A limitation of our algorithm for implementing the GEP is its "exhaustive search" of the space of the parameters involved in the problem, especially when the number of parameters (of the probability distributions) proliferates. (An example of this occurs when we need the distribution of the heights of people, men and women, which is a mixture of Gaussians, kf(x|\mu_1,\sigma_1) + (1-k)f(x|\mu_2,\sigma_2), and hence has five parameters.) Implementing the GEP without exhaustive search is certainly worthy of future research.

Although our solutions to the PJW problems used T1 FSs, those FSs were obtained from IT2 FS models for reasons that are given in Section 6.3. The reader may be wondering why we have not presented our solutions to the PJW problem entirely within the framework of IT2 FSs, since we are strong advocates of using such FSs in CWW problems. We have not done this for the following reasons:

(i). To date, although Zadeh formulated the PJT problem and gave the structure of its solution using the GEP for T1 FSs, no numerical or linguistic solutions have been provided, even for T1 FSs. We believe that we are the first to do so in this chapter.

(ii). All of the basic concepts for solving the PJW problem are more easily explained and understood using T1 FSs.

(iii). To date, how to state and solve the GEP for IT2 FSs is an open research topic, one that we are researching and will report on in the future.

(iv). We strongly believe that an IT2 solution to a problem needs to be compared with T1 solutions, so that one can (a) see whether an IT2 solution is needed, and (b) observe similarities and differences between the T1 and IT2 solutions. The results that we have presented in this chapter will serve as a useful baseline for such comparisons.

We also believe that there is more than one way to solve ACWW problems, and we have presented different solutions to Zadeh's ACWW problems in [210, 211, 215, 216] using syllogisms and Novel Weighted Averages. Those solutions do not need any world knowledge about probability distributions, but they do need world knowledge about the domains of the variables involved in the problems (e.g., time and height). Interestingly, they also use the Extension Principle, but in a different way from the GEP.

How to validate a particular solution to an ACWW problem is an important and interesting issue.
In this chapter, we established a method to investigate the correctness of solutions by checking whether the algorithm yields the correct solution to a problem whose solution is intuitively known. We believe that establishing methodologies for validating the solutions to ACWW problems is an open and very important research topic.

Chapter 7

Probability Calculations Using Variations of The Generalized Extension Principle for Interval Type-2 Fuzzy Sets: Applications to Advanced Computing with Words

There is no such uncertainty as a sure thing.
Robert Burns, Scottish Poet and Lyricist

In this chapter, we propose and demonstrate an effective methodology for extending the Generalized Extension Principle to solve Advanced Computing with Words (ACWW) problems when words are modeled by IT2 FSs. Such problems involve implicit assignments of linguistic truth, probability, and possibility. The solutions of the ACWW problems, which involve the fuzzy set models of the words, are formulated using variations of the Generalized Extension Principle for IT2 FSs.

7.1 Introduction

Advanced Computing with Words (ACWW) [175, 333] is a subarea of Computing with Words (CWW) [31, 142, 168, 170, 226, 262, 270] that deals with problems in which the assignment of constraints is allowed to be implicit, through intricate natural language statements. Those constraints may describe probability, truth, possibility, and usuality. Moreover, world knowledge plays a crucial role in solving ACWW problems. In contrast, Basic Computing with Words mainly deals with the assignment of attributes via IF-THEN rules [334].

Because fuzzy logic deals with the uncertainty associated with the unsharpness of the boundaries of classes that are described by natural language words, it might seem that its dissemination amounts to solving problems that are stated in natural languages for everyday reasoning; however, engineering applications of fuzzy logic [176] extensively involve the function approximation properties of fuzzy systems through the calculi of IF-THEN rules, rather than their semantic applications to natural language statements. Nevertheless, the past decade witnessed a resurgence of the idea of CWW and its applications to various real-world problems, including control [162], decision-making [163, 166, 284], classification [7], quality assessment [100], text processing [346], expert systems [124, 202], and natural language generation [117].

Moreover, various theoretical approaches have been proposed to formalize the CWW paradigm, among which are: ontologies [222], Turing machines [260], rough sets [199], formalization of the Generalized Constraint Language [125], which was proposed by Zadeh [326, 330, 338], fuzzy Petri nets [30], and Perceptual Computing (Per-C) [164, 166, 173, 194].

In his works, Zadeh has introduced some ACWW problems that involve everyday reasoning and decision making with linguistic probabilities [333]. To the best of our knowledge, there have been only a few attempts to computationally solve those problems [137, 210-212, 215, 216, 220].

The Generalized Extension Principle (GEP) [325] is an important tool for solving ACWW problems. The solutions of ACWW problems with the GEP may involve complicated functional optimization problems that are not easy to solve. In [220], an effective methodology to solve ACWW problems using the GEP for Type-1 Fuzzy Sets (T1 FSs) was introduced. On the other hand, Zadeh predicts that Type-2 Fuzzy Sets (T2 FSs) will play a central role in solving ACWW problems.
Therefore, a framework of ACWW with the GEP has to be developed for T2 FSs. According to Mendel [168], words mean different things to different people; therefore, a correct first-order uncertainty model for a word is an Interval Type-2 Fuzzy Set (IT2 FS). In this chapter, we propose an approach for extending the solutions of ACWW problems to IT2 FSs using variations of the GEP for IT2 FSs.

The rest of this chapter is organized as follows. In Section 7.2, we describe one of Zadeh's ACWW problems and show its solution methodology using the GEP for T1 FSs. In Section 7.3, we propose two versions of the GEP for IT2 FSs. In Section 7.4, we use the mathematical framework developed in Section 7.3 to find mathematical solutions to ACWW problems when words are modeled by IT2 FSs. In Section 7.5, we use the GEPs that are introduced in Section 7.3 to computationally implement the solutions to Zadeh's ACWW problems. Finally, in Section 7.6, we present some conclusions and directions for future research.

7.2 Problem Description

Among Zadeh's many ACWW problems is the following famous Probability that John is short (PJS) problem:

Probably John is tall. What is the probability that John is short?

This problem involves a linguistic probability (probably) and an implicit assignment of that linguistic probability to the probability that John is tall. (We assume that "Probably John is tall" is equivalent to "It is probable that John is tall.") The probability that "John is tall," P_{Tall}, is calculated as [317]:

P_{Tall} = \int_a^b p_H(h) \mu_{Tall}(h)\,dh    (7.1)

in which a and b are the minimum and maximum possible heights of men and p_H is the probability distribution function of heights, where:

\int_a^b p_H(h)\,dh = 1    (7.2)

The probability of the fuzzy event "Short" is calculated as:

P_{Short} = \int_a^b p_H(h) \mu_{Short}(h)\,dh    (7.3)

To derive the soft constraint imposed on P_{Short} by the fact that P_{Tall} is constrained by "Probably," one needs to use the framework of the GEP, which is an important tool for the propagation of possibilistic constraints, and was originally introduced in [325]. Assume that f(\cdot) and g(\cdot) are real-valued functions:

f, g : U_1 \times U_2 \times \cdots \times U_n \to V \subseteq \mathbb{R}    (7.4)

Moreover, assume that:

f(X_1, X_2, \ldots, X_n) is A
g(X_1, X_2, \ldots, X_n) is B

where A and B are T1 FSs. Then A induces B as follows:

\mu_B(v) = \sup_{u_1, \ldots, u_n : v = g(u_1, \ldots, u_n)} \mu_A(f(u_1, \ldots, u_n)) if there exists v = g(u_1, \ldots, u_n), and \mu_B(v) = 0 otherwise    (7.5)

The GEP basically extends the function g(f^{-1}(\cdot)) : V \to V to T1 FSs, where f^{-1} is the pre-image of the function f(\cdot). (Here g(f^{-1}) is understood as the composition of two relations, i.e., g \circ f^{-1}.) In the PJS problem, f = P_{Tall}, and:

f : X_{[a,b]} \to \mathbb{R}    (7.6)

where X_{[a,b]} is the space of probability distribution functions on [a, b]. Also, g = P_{Short}, and:

g : X_{[a,b]} \to \mathbb{R}    (7.7)

The GEP then implies that the soft constraint on the probability that "John is short" is:

\mu_{P_{Short}}(v) = \sup_{p_H :\, v = \int_a^b p_H(h)\mu_{Short}(h)dh,\; \int_a^b p_H(h)dh = 1} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_{Tall}(h)dh\Big)    (7.8)

if there exists p_H \in X_{[a,b]} such that v = \int_a^b p_H(h)\mu_{Short}(h)dh, and \mu_{P_{Short}}(v) = 0 otherwise.

Note that (7.8) cannot be solved by using the \alpha-cut decomposition theorem: although B(\cdot) = g(f^{-1}(A(\cdot))) (with f = P_{Tall} and g = P_{Short}), the relation f^{-1}(\cdot) cannot be derived explicitly.

In this chapter, we also study variations of the PJS problem that take the form: Probably John is tall. What is the probability that John is W? This is called the "PJW problem." In this problem, W represents any height word. A numerical sketch of the fuzzy-event probabilities (7.1) and (7.3) is given below.
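The following is a minimal numerical sketch of (7.1)-(7.3); the trapezoidal word models and the renormalized Gaussian height density are hypothetical stand-ins (their parameters are only loosely inspired by the models used later in this chapter).

import numpy as np

def trap_mf(h, a, b, c, d):
    # trapezoidal membership function with support [a, d] and core [b, c]
    return np.clip(np.minimum((h - a) / (b - a), (d - h) / (d - c)), 0.0, 1.0)

h = np.linspace(139.7, 221.0, 2000)                # height grid on [a, b]
mu_tall = trap_mf(h, 174.0, 185.0, 193.0, 204.0)   # hypothetical "Tall"
mu_short = trap_mf(h, 141.6, 150.0, 158.8, 168.1)  # hypothetical "Short"

p = np.exp(-0.5 * ((h - 178.0) / 7.0) ** 2)        # Gaussian density of heights
p /= np.trapz(p, h)                                # enforce Eq. (7.2) on [a, b]

print(np.trapz(p * mu_tall, h))                    # P_Tall,  Eq. (7.1)
print(np.trapz(p * mu_short, h))                   # P_Short, Eq. (7.3)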
Similar to (7.8), the GEP yields the soft constraint P_W on the probability that "John is W":

\mu_{P_W}(v) = \sup_{p_H :\, v = \int_a^b p_H(h)\mu_W(h)dh,\; \int_a^b p_H(h)dh = 1} \mu_{Probable}\Big(\int_a^b p_H(h)\mu_{Tall}(h)dh\Big)    (7.9)

if there exists p_H \in X_{[a,b]} such that v = \int_a^b p_H(h)\mu_W(h)dh, and \mu_{P_W}(v) = 0 otherwise.

Note that (7.8) and (7.9) are difficult functional optimizations that have to be carried out over X_{[a,b]}, the space of probability distributions over [a, b].

In this chapter, we solve the PJW problem using IT2 fuzzy models of words. Since the solution of the PJS problem relies on the GEP, we first extend the framework of the GEP to IT2 FSs, and then demonstrate how the PJW problem can be solved using the extended GEP. Because the GEP is one of the main tools used in ACWW [325, 333], its extension to IT2 FSs aids the use of IT2 FSs in solving ACWW problems.

7.3 Extensions of the GEP to IT2 FSs

7.3.1 Extension of the GEP for Real-Valued Functions to IT2 FSs

Next, we extend the GEP to IT2 FSs. In this section, we focus on real-valued functions constrained by IT2 FSs. Assume that f(\cdot) and g(\cdot) are real-valued functions:

f, g : U_1 \times U_2 \times \cdots \times U_n \to V \subseteq \mathbb{R}    (7.10)

Moreover, assume that:

f(X_1, X_2, \ldots, X_n) is Ã
g(X_1, X_2, \ldots, X_n) is B̃

where Ã and B̃ are IT2 FSs. Then Ã induces B̃ as follows:

\overline{\mu}_{\tilde B}(v) = \sup_{u_1, \ldots, u_n : v = g(u_1, \ldots, u_n)} \overline{\mu}_{\tilde A}(f(u_1, \ldots, u_n)) if there exists v = g(u_1, \ldots, u_n), and 0 otherwise    (7.11)

\underline{\mu}_{\tilde B}(v) = \sup_{u_1, \ldots, u_n : v = g(u_1, \ldots, u_n)} \underline{\mu}_{\tilde A}(f(u_1, \ldots, u_n)) if there exists v = g(u_1, \ldots, u_n), and 0 otherwise    (7.12)

The GEP for IT2 FSs can be written in the following compact form:

\mu_{\tilde B}(v) = \widetilde{\sup}_{u_1, \ldots, u_n : v = g(u_1, \ldots, u_n)} \mu_{\tilde A}(f(u_1, \ldots, u_n)) if there exists v = g(u_1, \ldots, u_n), and 0 otherwise    (7.13)

where \widetilde{\sup} should be understood as the extension of the sup function to intervals, which acts as two sup functions on the end-points of the intervals.

The above GEP basically extends the function g(f^{-1}(\cdot)) : V \subseteq \mathbb{R} \to V \subseteq \mathbb{R} to IT2 FSs, where f^{-1} is the pre-image of the function f(\cdot). This GEP is consistent with the ordinary Extension Principle for IT2 FSs [5], which extends the domain of g : U \to V to fuzzy sets:

B̃ = g(Ã)    (7.14)

\overline{\mu}_{\tilde B}(v) = \sup_{u : v = g(u)} \overline{\mu}_{\tilde A}(u) if there exists v = g(u), and 0 otherwise    (7.15)

\underline{\mu}_{\tilde B}(v) = \sup_{u : v = g(u)} \underline{\mu}_{\tilde A}(u) if there exists v = g(u), and 0 otherwise    (7.16)

7.3.2 Extension of the GEP to IT2 FSs Using the GEP for Embedded T1 FSs

When f and g are themselves determined by the membership functions of some fuzzy sets (as in the PJS problem, in which f and g are determined by a probability distribution p_H and the membership functions of the T1 FSs Tall and Short), the extension of the GEP to IT2 FSs can be performed by the methodology proposed by Mendel in [169]. That methodology needs the definition of an embedded T1 FS.

Definition 7.1. An embedded type-1 fuzzy set A_e of an IT2 FS Ã over U is a T1 FS over U for which, for all t \in U, \underline{\mu}_{\tilde A}(t) \le \mu_{A_e}(t) \le \overline{\mu}_{\tilde A}(t). A small sketch of sampling such embedded sets on a grid is given below.
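The following is a minimal sketch of Definition 7.1 on a discrete grid, assuming a toy triangular FOU. Note that such unconstrained sampling generally yields embedded sets that are neither normal nor convex; Section 7.5.3 restricts attention to constrained (normal and convex) embedded T1 FSs.

import numpy as np

rng = np.random.default_rng(0)

def random_embedded_t1(lmf, umf):
    """Sample one embedded T1 FS of an IT2 FS given on a discrete grid:
    pick a membership value in [LMF(t), UMF(t)] at every t (Definition 7.1)."""
    return lmf + (umf - lmf) * rng.random(lmf.shape)

t = np.linspace(0.0, 1.0, 11)
umf = np.clip(np.minimum(2 * t, 2 - 2 * t), 0, 1)   # toy triangular UMF
lmf = 0.5 * umf                                     # toy LMF inside the FOU
a_e = random_embedded_t1(lmf, umf)
assert np.all((lmf <= a_e) & (a_e <= umf))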
Assume that f, g : U_1 \times U_2 \times \cdots \times U_n \to V \subseteq \mathbb{R}, where some of the U_i's can be the space of MFs of T1 FSs over a universe of discourse (for example, the set of all MFs of height words over the set [139.7, 221]). For simplicity, from now on, we assume that U_n = U'_n is the set of MFs of T1 FSs over a universe of discourse H. Extension to the case in which other U_n's are spaces of MFs is straightforward.

Also assume that:

f(X_1, X_2, \ldots, X_{n-1}, \mu_{\tilde T}) : U_1 \times \cdots \times U_{n-1} \times U''_n \to V \subseteq \mathbb{R} is Ã
g(X_1, X_2, \ldots, X_{n-1}, \mu_{\tilde S}) : U_1 \times \cdots \times U_{n-1} \times U''_n \to V \subseteq \mathbb{R} is B̃

where U''_n is the space of all of the MFs of IT2 FSs over the universe of discourse H. (We are cautious about using F_{U'_n}, because f and g are not functions that were originally defined on a universe of discourse and were then extended to FSs; they are only functions of the MFs of some FSs. In the PJS problem that we are studying, f and g are probabilities of fuzzy events, which are obtained by relaxing the indicator function of a non-fuzzy event to the MF of a fuzzy event, not by applying the Extension Principle for extending the domain of a function to fuzzy sets.) \mu_{\tilde T} and \mu_{\tilde S} denote the MFs of the IT2 FSs T̃ and S̃, to emphasize the dependence of the functions f and g on the MFs of some T1 FSs. We would like to derive B̃, the soft constraint on g, when T and S are extended to be IT2 FSs T̃ and S̃, and f is constrained by an IT2 FS, Ã.

The methodology proposed in [169] suggests that every IT2 FS present in the problem is reduced to an embedded T1 FS, and the function (which is g(f^{-1}(\cdot))) is extended for each of those embedded T1 FSs. Therefore, considering arbitrary embedded T1 FSs of S̃, T̃, and Ã, and calculating g(f^{-1}(\cdot)) for those embedded T1 FSs, B_e, an embedded T1 FS for B̃, is obtained as:

\mu_{B_e}(v) = \sup_{u_1, \ldots, u_{n-1}, \mu_{S_e} :\, v = g(u_1, \ldots, u_{n-1}, \mu_{S_e})} \mu_{A_e}(f(u_1, \ldots, u_{n-1}, \mu_{T_e})) if there exists v = g(u_1, \ldots, u_{n-1}, \mu_{S_e}), and 0 otherwise    (7.17)

where A_e, B_e, S_e, and T_e denote the embedded T1 FSs of Ã, B̃, S̃, and T̃, respectively. Hence:

\mu_{\tilde B}(v) = \bigcup_{S_e, T_e} \{\mu_{B_e}(v)\}    (7.18)

where the union is over all embedded T1 FSs S_e and T_e, i.e., over all \mu_{S_e}, \mu_{T_e} satisfying \underline{\mu}_{\tilde S}(h) \le \mu_{S_e}(h) \le \overline{\mu}_{\tilde S}(h) and \underline{\mu}_{\tilde T}(h) \le \mu_{T_e}(h) \le \overline{\mu}_{\tilde T}(h) for all h \in H. According to [169], the LMF and UMF of the IT2 FS B̃ can be found using the following optimization problems:

\underline{\mu}_{\tilde B}(v) = \min_{A_e, S_e, T_e} \mu_{B_e}(v)    (7.19)

\overline{\mu}_{\tilde B}(v) = \max_{A_e, S_e, T_e} \mu_{B_e}(v)    (7.20)

where the min and max are taken over all embedded T1 FSs A_e, S_e, and T_e of Ã, S̃, and T̃.

Theorem 7.1. For the optimization problems stated in (7.19) and (7.20), the optimization over A_e can be carried out using only the LMF and the UMF of Ã, i.e.:

\underline{\mu}_{\tilde B}(v) = \min_{\mu_{A_e} = \underline{\mu}_{\tilde A};\; S_e, T_e} \mu_{B_e}(v)    (7.21)

\overline{\mu}_{\tilde B}(v) = \max_{\mu_{A_e} = \overline{\mu}_{\tilde A};\; S_e, T_e} \mu_{B_e}(v)    (7.22)

Proof. One has, for all u_1, \ldots, u_{n-1}, \mu_{S_e}:

\underline{\mu}_{\tilde A}(f(u_1, \ldots, u_{n-1}, \mu_{T_e})) \le \mu_{A_e}(f(u_1, \ldots, u_{n-1}, \mu_{T_e})) \le \overline{\mu}_{\tilde A}(f(u_1, \ldots, u_{n-1}, \mu_{T_e}))    (7.23)

Taking the sup of both sides of each inequality proves the theorem.

7.3.3 Two-Stage Extension of the GEP to IT2 FSs Using the Extension of the GEP to IT2 FSs for Real-Valued Functions and Embedded T1 FSs

Assume that:

f(X_1, X_2, \ldots, X_{n-1}, \mu_{\tilde T}) : U_1 \times \cdots \times U_{n-1} \times U''_n \to V \subseteq \mathbb{R} is Ã
g(X_1, X_2, \ldots, X_{n-1}, \mu_{\tilde S}) : U_1 \times \cdots \times U_{n-1} \times U''_n \to V \subseteq \mathbb{R} is B̃

where U''_n, \mu_{\tilde T}, and \mu_{\tilde S} are as defined above. We would like to derive B̃, the soft constraint on g, when T and S are extended to be IT2 FSs T̃ and S̃, and f is constrained by an IT2 FS, Ã. For the two-stage extension of the GEP to IT2 FSs, we first fix the embedded T1 FSs of T̃ and S̃ and apply (7.13); then, we account for the uncertainties associated with the membership functions of T̃ and S̃.
First, fix T_e and S_e:

f(X_1, X_2, \ldots, X_{n-1}, \mu_{T_e}) : U_1 \times \cdots \times U_{n-1} \times U'_n \to V \subseteq \mathbb{R} is Ã
g(X_1, X_2, \ldots, X_{n-1}, \mu_{S_e}) : U_1 \times \cdots \times U_{n-1} \times U'_n \to V \subseteq \mathbb{R} is B̃

f and g are real-valued functions, and since \mu_{T_e} and \mu_{S_e} are fixed, B̃ can be calculated using (7.13), i.e.:

\mu_{\tilde B(\mu_{S_e}, \mu_{T_e})}(v) = \widetilde{\sup}_{u_1, \ldots, u_{n-1} :\, v = g(u_1, \ldots, u_{n-1}, \mu_{S_e})} \mu_{\tilde A}(f(u_1, \ldots, u_{n-1}, \mu_{T_e})) if there exists v = g(u_1, \ldots, u_{n-1}, \mu_{S_e}), and 0 otherwise    (7.24)

where \mu_{S_e} and \mu_{T_e} are written as arguments of B̃ to emphasize its dependence on the choice of \mu_{S_e}, \mu_{T_e}. The second stage of the extension aggregates over all embedded T1 FSs S_e, T_e:

\tilde B = \bigcup_{S_e, T_e} \tilde B(\mu_{T_e}, \mu_{S_e})    (7.25)

where the union is over all embedded T1 FSs S_e and T_e of S̃ and T̃. Applying the definition of the union of IT2 FSs,

\tilde B = \tilde B_1 \cup \tilde B_2 \;\Rightarrow\; \overline{\mu}_{\tilde B}(t) = \max(\overline{\mu}_{\tilde B_1}(t), \overline{\mu}_{\tilde B_2}(t)), \quad \underline{\mu}_{\tilde B}(t) = \max(\underline{\mu}_{\tilde B_1}(t), \underline{\mu}_{\tilde B_2}(t))

to (7.24) and (7.25), and using the definition of (7.17) for B_e, one has:

\underline{\mu}_{\tilde B}(v) = \max_{\mu_{A_e} = \underline{\mu}_{\tilde A};\; S_e, T_e} \mu_{B_e}(v)    (7.26)

\overline{\mu}_{\tilde B}(v) = \max_{\mu_{A_e} = \overline{\mu}_{\tilde A};\; S_e, T_e} \mu_{B_e}(v)    (7.27)

Comparing (7.21) and (7.22) with (7.26) and (7.27), it is obvious that the difference between the two-stage GEP and the GEP that uses embedded T1 FSs lies in the calculation of the LMF of B̃: the former uses a max over all embedded T1 FSs, while the latter uses a min.

7.4 Extension of Zadeh's Solution to Interval Type-2 Fuzzy Sets

In this section, we extend Zadeh's solution to the PJS problem to the case in which all of the words are modeled by IT2 FSs. The statement Probably John is tall implicitly assigns the IT2 soft constraint "probably" to the probability that John is tall. Thus, the solution relies on the definition of the probability of an IT2 fuzzy event, something that, to the best of our knowledge, has not been developed for IT2 FSs, although type-2 fuzzy events have been discussed in [271].

7.4.1 The Probability of an IT2 Fuzzy Event

To solve the PJS problem, we need the definition of the probability of an IT2 fuzzy event. One can argue that the probability of a T1 fuzzy event is a numeric value [317] or a T1 FS [288, 290]. Consequently, one can argue that the probability of an IT2 fuzzy event is a number, a T1 FS, or an IT2 FS [297]. A similar approach of defining numeric values for similarity, subsethood, and uncertainty measures pertinent to IT2 FSs was pursued in [278, 280]. We define the numeric probability of an IT2 fuzzy event as follows:

Definition 7.2 (Numeric probability of an IT2 fuzzy event). The numeric probability of an IT2 fuzzy event Ã ∈ F̃_ℝ, for which \underline{\mu}_{\tilde A} and \overline{\mu}_{\tilde A} are both measurable, is defined as:

\mathrm{Prob}_\lambda(\tilde A) = \lambda \int_U \overline{\mu}_{\tilde A}(u) p(u)\,du + (1 - \lambda) \int_U \underline{\mu}_{\tilde A}(u) p(u)\,du    (7.28)

Note that assigning a numeric probability to an IT2 fuzzy event follows Zadeh's philosophy of assigning a numeric (as opposed to fuzzy) probability to a T1 fuzzy event. In Appendix B.1, we show that the numeric probability of an IT2 fuzzy event has the same properties as the probability of a T1 fuzzy event. A numerical sketch of (7.28) is given below.
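The following is a minimal numerical sketch of (7.28), assuming a hypothetical shoulder FOU for Tall and a Gaussian height density; both are stand-ins, not the word models or distributions used later in this chapter.

import numpy as np

def prob_it2(h, p, lmf, umf, lam=0.5):
    """Numeric probability of an IT2 fuzzy event, Eq. (7.28):
    lam * integral(UMF * p) + (1 - lam) * integral(LMF * p)."""
    return lam * np.trapz(umf * p, h) + (1 - lam) * np.trapz(lmf * p, h)

h = np.linspace(139.7, 221.0, 2000)
umf = np.clip((h - 174.0) / 11.0, 0.0, 1.0)   # hypothetical UMF of Tall
lmf = 0.6 * umf                               # hypothetical LMF of Tall
p = np.exp(-0.5 * ((h - 178.0) / 7.0) ** 2)
p /= np.trapz(p, h)
print(prob_it2(h, p, lmf, umf))               # Prob_{0.5}(Tall)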
Throughout this chapter, we assume that \mathrm{Prob}(\tilde A) = \mathrm{Prob}_{0.5}(\tilde A), hence fixing \lambda = 1/2.

7.4.2 Solution of the PJW Problem Using the Extension of the GEP for Real-Valued Functions to IT2 FSs

Considering the definition of the numeric probability of IT2 fuzzy events, we have:

f(p_H) = P_{Tall} = \frac{1}{2} \int_a^b p_H(h)\,(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))\,dh
g(p_H) = P_{\tilde W} = \frac{1}{2} \int_a^b p_H(h)\,(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))\,dh    (7.29)

The statement Probably John is tall constrains f(p_H) by the IT2 FS Ã = Probable. Using (7.13), the soft constraint on g(p_H), the probability that John is W̃, which is the solution to the PJW problem, becomes:

\mu_{\tilde P_{\tilde W}}(v) = \widetilde{\sup}_{p_H \in X_{[a,b]} :\, v = \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))dh,\; \int_a^b p_H(h)dh = 1} \mu_{Probable}\Big(\frac{1}{2} \int_a^b p_H(h)(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))dh\Big)    (7.30)

if there exists p_H \in X_{[a,b]} with v = \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))dh, and 0 otherwise; this can be represented more simply as:

\mu_{\tilde P_{\tilde W}}(v) = \widetilde{\sup}_{v = \mathrm{Prob}(\tilde W)} \mu_{Probable}(\mathrm{Prob}(Tall)) if there exists p_H \in X_{[a,b]} with v = \mathrm{Prob}(\tilde W), and 0 otherwise    (7.31)

where \mathrm{Prob}(\tilde W) = \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))dh.

7.4.3 Solution of the PJW Problem via Extension of the GEP to IT2 FSs Using the One-Stage and Two-Stage GEP for Embedded T1 FSs

Yet another solution to the PJW problem using IT2 FSs is to apply the GEP to all of the embedded T1 FSs of Tall, W̃, and Probable. Using (7.17), and knowing that f = \int_a^b \mu_{Tall_e}(h) p_H(h)dh and g = \int_a^b \mu_{Short_e}(h) p_H(h)dh, where Tall_e and Short_e respectively denote embedded T1 FSs of Tall and Short, and noting that A_e = Probable_e is an embedded T1 FS of the IT2 FS Probable, we have:

\mu_{P_{\tilde W, e}}(v) = \sup_{p_H :\, v = \int_a^b p_H(h)\mu_{Short_e}(h)dh} \mu_{Probable_e}\Big(\int_a^b p_H(h)\mu_{Tall_e}(h)dh\Big) if there exists v = \int_a^b p_H(h)\mu_{Short_e}(h)dh, and 0 otherwise    (7.32)

where P_{\tilde W, e} denotes an embedded T1 FS of P̃_{\tilde W}, the probability that John is W̃. According to Theorem 7.1:

\underline{\mu}_{\tilde P_{\tilde W}}(v) = \min_{\mu_{Probable_e} = \underline{\mu}_{Probable};\; Tall_e, Short_e} \mu_{P_{\tilde W, e}}(v)    (7.33)

\overline{\mu}_{\tilde P_{\tilde W}}(v) = \max_{\mu_{Probable_e} = \overline{\mu}_{Probable};\; Tall_e, Short_e} \mu_{P_{\tilde W, e}}(v)    (7.34)

where the optimizations are over all embedded T1 FSs Tall_e and Short_e. If one wants to apply the two-stage GEP to the problem, one uses the following equation instead of (7.33), while (7.34) remains unchanged:

\underline{\mu}_{\tilde P_{\tilde W}}(v) = \max_{\mu_{Probable_e} = \underline{\mu}_{Probable};\; Tall_e, Short_e} \mu_{P_{\tilde W, e}}(v)    (7.35)

7.4.4 An Interesting Special Case

In [220], we studied an interesting special case: when the probability distribution of the height of the population of which John is a member is known. In such a case, the PJW problem becomes a special problem. We study this special case in the framework of the GEP for IT2 FSs.

Assume that the probability distribution of heights is known to be p_H. Then the answer to the question "What is the probability that John is W̃?", when the GEP for real-valued functions is used, is a numerical value:

P_{\tilde W} = \frac{1}{2} \int_a^b p_H(h)\,(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))\,dh    (7.36)

and there is no need for the information "Probably John is tall." This means that this piece of information (Probably John is tall) in the PJW and PJS problems compensates for the imprecision of knowledge about the probability distribution of heights. However, if the problem is to be solved given that "Probably John is tall," we can still use (7.30) to derive a MF for P̃_{\tilde W}.
Since P_{Tall} = \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))dh is also a numerical value, (7.30) reduces to:

\mu_{\tilde P_{\tilde W}}(v) = \mu_{Probable}\Big(\frac{1}{2} \int_a^b p_H(h)(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))dh\Big) if v = \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{\tilde W}(h) + \overline{\mu}_{\tilde W}(h))dh, and 0 otherwise    (7.37)

Equation (7.37) says that P̃_{\tilde W} is an IT2 fuzzy singleton at P_{\tilde W}. Obviously, the membership interval of the fuzzy singleton depends on the compatibility of Probable with \frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))dh (which is exactly \mu_{Probable}(P_{Tall})). The lower this compatibility, the smaller the lower and upper membership values of this fuzzy singleton, which can be interpreted as less confidence in the solution. In the extreme case, when \mu_{Probable}(\frac{1}{2}\int_a^b p_H(h)(\underline{\mu}_{Tall}(h) + \overline{\mu}_{Tall}(h))dh) = 0, because \mu_{\tilde P_{\tilde W}}(v) = 0 for all v, we obtain the empty set as P̃_{\tilde W}, which reflects a total incompatibility of the human's world knowledge (assumed for the problem) with reality. This can be interpreted in two ways:

(i). The word Probable has a different meaning in this context. When the probability of being Tall under p_H has a zero membership value in Probable, it means that Probable has to be calibrated according to the context, probably allowing for a wider support.

(ii). Using the models that we use for Probable and W̃ and the distribution p_H of the height of the population, the probability that John is tall cannot be called Probable.

It is worth noting that the empty set can be interpreted as the word Undefined [335]. Using the one-stage and two-stage GEP for embedded T1 FSs, (7.32) becomes:

\mu_{P_{\tilde W, e}}(v) = \mu_{Probable_e}\Big(\int_a^b p_H(h)\mu_{Tall_e}(h)dh\Big) if v = \int_a^b p_H(h)\mu_{Short_e}(h)dh, and 0 otherwise    (7.38)

where P_{\tilde W, e} denotes an embedded T1 FS of P̃_{\tilde W}, the probability that John is W̃. Aggregation of the P_{\tilde W, e} yields an IT2 FS as P̃_{\tilde W}. However, if the incompatibility problem occurs for all embedded T1 FSs in the above equation (which is quite possible), an empty set is again obtained as the solution.

7.5 Implementation of the Solutions to the PJW Problem

In this section, we implement the solutions to the PJW problem, which includes the PJS problem. We use IT2 FS models of the words in the PJW problem, and provide computational methodologies to solve the optimization problems associated with the GEP.

7.5.1 Modeling Words

To begin, we established the following vocabularies of linguistic heights and linguistic probabilities:

Heights = {Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall}

and,

Probabilities = {Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable}.

Next, we modeled all of these words as FSs. Recall that there are at least two types of uncertainty associated with a word [168]: intra-uncertainty, the uncertainty an individual has about the meaning of a word (this is related to the uncertainty associated with the unsharpness of classes [332]), and inter-uncertainty, the uncertainty a group of people have about the meaning of a word. In other words, words mean different things to different people, and this fact calls for (at least) using IT2 FSs as models of words [168].

In order to synthesize IT2 FS models of words, we began by collecting data from a group of subjects and then used the Enhanced Interval Approach (EIA) [286]. We collected data from 48 subjects using the Amazon Mechanical Turk website [1] for the vocabulary of linguistic heights and from 111 subjects for the vocabulary of linguistic probabilities [213]. We used the EIA [286] to obtain IT2 FS models of the words from those data.
The IT2 FS footprints of uncertainty (FOUs) have nine parameters (see Fig. 7.1): (q, r, s, t) determine the upper membership function (UMF), and (q_1, r_1, s_1, t_1, h) determine the lower membership function (LMF), where h is the height of the LMF.

Figure 7.1: FOU of an IT2 FS described by nine parameters.

The vocabulary of linguistic heights modeled by IT2 FSs is depicted in Fig. 7.2. We assumed that the minimum possible height is a = 139.7 cm and the maximum possible height is b = 221 cm. (This range corresponds to [4'7", 6'9"], and is inspired by the range [140 cm, 220 cm] that Zadeh uses in [339].) The parameters of the FOUs of the linguistic height words are given in Table 7.1.

Figure 7.2: Vocabulary of IT2 FSs representing linguistic heights obtained by EIA.

To make sure that the vocabulary of height words provides an appropriate partitioning of the space of heights, we calculated the pairwise Jaccard similarities between the height words. The Jaccard similarity [46, 280] between IT2 FSs Ã and B̃ is calculated as:

s_J(\tilde A, \tilde B) = \frac{\int_U \min(\overline{\mu}_{\tilde A}(u), \overline{\mu}_{\tilde B}(u))du + \int_U \min(\underline{\mu}_{\tilde A}(u), \underline{\mu}_{\tilde B}(u))du}{\int_U \max(\overline{\mu}_{\tilde A}(u), \overline{\mu}_{\tilde B}(u))du + \int_U \max(\underline{\mu}_{\tilde A}(u), \underline{\mu}_{\tilde B}(u))du}    (7.39)

A numerical sketch of (7.39) is given below, after the tables. Pairwise similarities between the height words are shown in Table 7.2. Observe that the words have pairwise similarities that are less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse.

Table 7.1: Membership function parameters of the height words depicted in Fig. 7.2
  Word | UMF parameters | LMF parameters
  Very short | (139.70, 139.70, 140.92, 153.17) | (139.70, 139.70, 140.28, 143.95, 1)
  Short | (141.62, 150.00, 158.75, 168.11) | (151.35, 153.85, 153.85, 156.04, 0.45)
  Moderately short | (149.24, 157.48, 162.56, 170.80) | (159.79, 160.84, 160.84, 161.21, 0.52)
  Medium | (154.32, 163.83, 171.45, 178.42) | (166.06, 168.53, 168.53, 169.83, 0.46)
  Moderately tall | (166.89, 175.00, 181.61, 190.59) | (173.96, 177.91, 177.91, 181.04, 0.59)
  Tall | (174.12, 185.00, 193.04, 204.14) | (185.86, 188.99, 188.99, 192.07, 0.44)
  Very tall | (176.58, 207.70, 221.00, 221.00) | (193.67, 219.07, 221, 221, 1)

Table 7.2: Pairwise similarities between the height words depicted in Fig. 7.2
  (rows and columns ordered: Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall)
  Very short: 1.000, 0.129, 0.016, 0, 0, 0, 0
  Short: 0.129, 1.000, 0.451, 0.166, 0.001, 0, 0
  Moderately short: 0.016, 0.451, 1.000, 0.336, 0.015, 0, 0
  Medium: 0, 0.166, 0.336, 1.000, 0.149, 0.014, 0.001
  Moderately tall: 0, 0.001, 0.015, 0.149, 1.000, 0.222, 0.042
  Tall: 0, 0, 0, 0.014, 0.222, 1.000, 0.165
  Very tall: 0, 0, 0, 0.001, 0.042, 0.165, 1.000

Similar information for the vocabulary of linguistic probabilities [213] is given in Fig. 7.3 and Tables 7.3 and 7.4. In Table 7.4, observe that the probability words also have pairwise similarities less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse.
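The following is a minimal numerical sketch of (7.39) for trapezoidal FOUs, using the Tall and Moderately tall parameters of Table 7.1; on a uniform grid, the common grid spacing cancels in the ratio, so plain sums may replace the integrals.

import numpy as np

def trap(h, a, b, c, d, height=1.0):
    # trapezoid with support [a, d] and core [b, c], scaled to a given height
    with np.errstate(divide="ignore", invalid="ignore"):
        left = np.where(b > a, (h - a) / (b - a), 1.0)
        right = np.where(d > c, (d - h) / (d - c), 1.0)
    return height * np.clip(np.minimum(left, right), 0.0, 1.0)

def jaccard_it2(umf_a, lmf_a, umf_b, lmf_b):
    """Jaccard similarity of two IT2 FSs sampled on a common grid, Eq. (7.39)."""
    num = np.minimum(umf_a, umf_b).sum() + np.minimum(lmf_a, lmf_b).sum()
    den = np.maximum(umf_a, umf_b).sum() + np.maximum(lmf_a, lmf_b).sum()
    return num / den

h = np.linspace(139.7, 221.0, 4000)
tall = (trap(h, 174.12, 185.00, 193.04, 204.14),
        trap(h, 185.86, 188.99, 188.99, 192.07, 0.44))
mtall = (trap(h, 166.89, 175.00, 181.61, 190.59),
         trap(h, 173.96, 177.91, 177.91, 181.04, 0.59))
# should land near the 0.222 entry of Table 7.2
print(jaccard_it2(tall[0], tall[1], mtall[0], mtall[1]))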
Figure 7.3: Vocabulary of IT2 FSs representing linguistic probabilities obtained by EIA.

Table 7.3: Membership function parameters of the probability words depicted in Fig. 7.3
  Word | UMF parameters | LMF parameters
  Extremely improbable | (0, 0, 0.0183, 0.1316) | (0, 0, 0.0046, 0.0627, 1.0000)
  Very improbable | (0.0293, 0.1000, 0.1250, 0.1707) | (0.0896, 0.1167, 0.1167, 0.1604, 0.7643)
  Improbable | (0.0586, 0.1750, 0.2500, 0.3414) | (0.1896, 0.2200, 0.2200, 0.2604, 0.5757)
  Somewhat improbable | (0.0982, 0.2500, 0.3000, 0.4518) | (0.2293, 0.2750, 0.2750, 0.3207, 0.6464)
  Tossup | (0.3586, 0.5000, 0.5500, 0.6414) | (0.4896, 0.5083, 0.5083, 0.5141, 0.4107)
  Somewhat probable | (0.4793, 0.5500, 0.6000, 0.6707) | (0.5293, 0.5750, 0.5750, 0.6207, 0.6464)
  Probable | (0.5086, 0.6500, 0.7000, 0.7914) | (0.6293, 0.6750, 0.6750, 0.7207, 0.6464)
  Very probable | (0.7189, 0.8250, 0.9000, 0.9811) | (0.8293, 0.8700, 0.8700, 0.9207, 0.5757)
  Extremely probable | (0.8684, 0.9772, 1.0000, 1.0000) | (0.9405, 0.9954, 1.0000, 1.0000, 1.0000)

Table 7.4: Pairwise similarities between the probability words depicted in Fig. 7.3
  (rows and columns ordered: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)
  Extremely improbable: 1.0000, 0.1502, 0.0394, 0.0063, 0, 0, 0, 0, 0
  Very improbable: 0.1502, 1.0000, 0.1432, 0.0405, 0, 0, 0, 0, 0
  Improbable: 0.0394, 0.1432, 1.0000, 0.4091, 0, 0, 0, 0, 0
  Somewhat improbable: 0.0063, 0.0405, 0.4091, 1.0000, 0.0382, 0, 0, 0, 0
  Tossup: 0, 0, 0, 0.0382, 1.0000, 0.3369, 0.1150, 0, 0
  Somewhat probable: 0, 0, 0, 0, 0.3369, 1.0000, 0.2179, 0, 0
  Probable: 0, 0, 0, 0, 0.1150, 0.2179, 1.0000, 0.0353, 0
  Very probable: 0, 0, 0, 0, 0, 0, 0.0353, 1.0000, 0.1242
  Extremely probable: 0, 0, 0, 0, 0, 0, 0, 0.1242, 1.0000

It will be seen in the sequel that the word models that are obtained by the EIA cause some problems in solving ACWW problems. In fact, the amount of uncertainty determined by them for each member of the universe of discourse is sometimes excessive and counter-intuitive. For example, for interior FOUs like that of the word Tossup, the membership grade of the numeric probability 0.5083 in the IT2 FS yielded by the EIA is between 0.4107 and 1, which is counter-intuitive, because any probability close to 0.50 has to have a membership grade close to 1 in the word Tossup. The subnormality of the LMFs contributes a huge amount of uncertainty about the membership grades in IT2 FSs. Therefore, we also use IT2 fuzzy models of words whose LMFs are normal. Those vocabularies for height and probability words are respectively depicted in Figs. 7.4 and 7.5, and their parameters are respectively given in Tables 7.5 and 7.7. Pairwise similarities for them are shown in Tables 7.6 and 7.8.

Figure 7.4: Vocabulary of IT2 FSs with normal LMFs, representing linguistic heights.
Figure 7.5: Vocabulary of IT2 FSs with normal LMFs, representing linguistic probabilities.

Table 7.5: Membership function parameters of the height words depicted in Fig. 7.4
  Word | UMF parameters | LMF parameters
  Very short | (139.7, 139.7, 140.9175, 153.165) | (139.7, 139.7, 140.9175, 147.0412, 1)
  Short | (141.6237, 150, 158.75, 168.1066) | (145.8118, 150, 158.75, 163.4283, 1)
  Moderately short | (149.2437, 157.48, 162.56, 170.7963) | (153.3618, 157.48, 162.56, 166.6782, 1)
  Medium height | (154.3237, 163.83, 171.45, 178.4163) | (159.0768, 163.83, 171.45, 174.9332, 1)
  Moderately tall | (166.8934, 175, 181.61, 190.5903) | (170.9467, 175, 181.61, 186.1001, 1)
  Tall | (174.1176, 185, 193.04, 204.1421) | (179.5588, 185, 193.04, 198.5911, 1)
  Very tall | (176.5771, 207.7062, 221, 221) | (192.1416, 207.7062, 221, 221, 1)

Table 7.6: Pairwise similarities between the height words depicted in Fig. 7.4
  (rows and columns ordered as in Table 7.2)
  Very short: 1, 0.084735, 0.011167, 0, 0, 0, 0
  Short: 0.084735, 1, 0.4187, 0.11535, 0.00076749, 0, 0
  Moderately short: 0.011167, 0.4187, 1, 0.27811, 0.009684, 0, 0
  Medium: 0, 0.11535, 0.27811, 1, 0.11335, 0.008665, 0.00057879
  Moderately tall: 0, 0.00076749, 0.009684, 0.11335, 1, 0.1811, 0.033413
  Tall: 0, 0, 0, 0.008665, 0.1811, 1, 0.13807
  Very tall: 0, 0, 0, 0.00057879, 0.033413, 0.13807, 1

Table 7.7: Membership function parameters of the probability words depicted in Fig. 7.5
  Word | UMF parameters | LMF parameters
  Extremely improbable | (0, 0, 0.018258, 0.13165) | (0, 0, 0.018258, 0.074954, 1)
  Very improbable | (0.029289, 0.1, 0.125, 0.17071) | (0.064645, 0.1, 0.125, 0.14786, 1)
  Improbable | (0.058579, 0.175, 0.25, 0.34142) | (0.11679, 0.175, 0.25, 0.29571, 1)
  Somewhat improbable | (0.098223, 0.25, 0.3, 0.45178) | (0.17411, 0.25, 0.3, 0.37589, 1)
  Tossup | (0.35858, 0.5, 0.55, 0.64142) | (0.42929, 0.5, 0.55, 0.59571, 1)
  Somewhat probable | (0.47929, 0.55, 0.6, 0.67071) | (0.51464, 0.55, 0.6, 0.63536, 1)
  Probable | (0.50858, 0.65, 0.7, 0.79142) | (0.57929, 0.65, 0.7, 0.74571, 1)
  Very probable | (0.71893, 0.825, 0.9, 0.98107) | (0.77197, 0.825, 0.9, 0.94053, 1)
  Extremely probable | (0.86835, 0.97725, 1, 1) | (0.9228, 0.97725, 1, 1, 1)

Table 7.8: Pairwise similarities between the probability words depicted in Fig. 7.5
  (rows and columns ordered as in Table 7.4)
  Extremely improbable: 1, 0.12682, 0.02805, 0.0047249, 0, 0, 0, 0, 0
  Very improbable: 0.12682, 1, 0.11231, 0.029457, 0, 0, 0, 0, 0
  Improbable: 0.02805, 0.11231, 1, 0.40429, 0, 0, 0, 0, 0
  Somewhat improbable: 0.0047249, 0.029457, 0.40429, 1, 0.025224, 0, 0, 0, 0
  Tossup: 0, 0, 0, 0.025224, 1, 0.33866, 0.076549, 0, 0
  Somewhat probable: 0, 0, 0, 0, 0.33866, 1, 0.19006, 0, 0
  Probable: 0, 0, 0, 0, 0.076549, 0.19006, 1, 0.024133, 0
  Very probable: 0, 0, 0, 0, 0, 0, 0.024133, 1, 0.092095
  Extremely probable: 0, 0, 0, 0, 0, 0, 0, 0.092095, 1

7.5.2 Solving the PJS Problem by Extension of the GEP for Real-Valued Functions to IT2 FSs

Next, we solve the optimization problem of (7.30). In fact, it consists of two optimization problems, one for the LMF and one for the UMF of the word Probable. It is a functional optimization problem, and cannot be solved analytically. Instead, our approach is based on the method that was proposed in [220] for T1 FSs, namely:

1. Choose the family (families) of probability distributions pertinent to the problem.
2. Choose the ranges of the parameters of the families of probability distributions.

3. Discretize the ranges of the parameters of the probability distributions.

4. Construct a pool of probability distributions having all possible combinations of parameters, and for all of its members:

   4.1. Choose a specific p_H from the pool (again, note that \int_a^b p_H(h)dh = 1, because p_H(h) is a probability distribution function on [a, b]).

   4.2. Compute v = \frac{1}{2}(\int_a^b p_H(h)\underline{\mu}_{\tilde W}(h)dh + \int_a^b p_H(h)\overline{\mu}_{\tilde W}(h)dh).

   4.3. Compute t = \frac{1}{2}(\int_a^b p_H(h)\underline{\mu}_{Tall}(h)dh + \int_a^b p_H(h)\overline{\mu}_{Tall}(h)dh).

   4.4. Compute \underline{\mu}(v) = \underline{\mu}_{Probable}(t) and \overline{\mu}(v) = \overline{\mu}_{Probable}(t).

5. Construct scatter plots of \underline{\mu}(v) versus v and \overline{\mu}(v) versus v.

6. Detect envelopes of \underline{\mu}(v) and \overline{\mu}(v), namely \underline{\mu}_{\tilde P_{\tilde W}}(v) and \overline{\mu}_{\tilde P_{\tilde W}}(v). The envelope detection plays the role of taking the sup. One can imagine different ways of detecting the envelope; we used the following algorithm (a code sketch of it is given after the list):

   (a) Divide the space of possible v's, which is [0, 1], into N bins.

   (b) For each bin:

      i. Find all of the (v, \underline{\mu}(v)) and (v, \overline{\mu}(v)) pairs whose v value falls in the bin.

      ii. Compute \underline{\mu}, the maximum of the \underline{\mu}(v)'s associated with the pairs found in the previous step. If there are no pairs whose v's fall in the bin, \underline{\mu} = 0.

      iii. Compute \overline{\mu}, the maximum of the \overline{\mu}(v)'s associated with the pairs found in the previous step. If there are no pairs whose v's fall in the bin, \overline{\mu} = 0.

      iv. For v's that are members of the bin, set \underline{\mu}_{\tilde P_{\tilde W}}(v) = \underline{\mu} and \overline{\mu}_{\tilde P_{\tilde W}}(v) = \overline{\mu}.
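The following is a minimal sketch of the binned envelope detector that plays the role of the sup in Step 6; the bin count N and the toy scatter are illustrative assumptions only.

import numpy as np

def detect_envelope(v, mu, n_bins=100):
    """Binned envelope of a scatter of membership values mu versus v in [0, 1]:
    each bin keeps the maximum mu of the points falling into it, and empty
    bins keep the default value 0, per items (a)-(b) above."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    env = np.zeros(n_bins)
    idx = np.clip(np.digitize(v, edges) - 1, 0, n_bins - 1)
    for i, m in zip(idx, mu):
        env[i] = max(env[i], m)
    return edges, env

# toy usage: recover the upper envelope of a noisy triangular scatter
rng = np.random.default_rng(1)
v = rng.random(5000)
mu = np.minimum(2 * v, 2 - 2 * v) * rng.random(5000)
edges, env = detect_envelope(v, mu)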
Implementation of the solutions using a huge number of distributions involves an enormous computational burden, and it is impossible to carry out the optimization over all possible distributions. One needs to incorporate some additional world knowledge about the type of probability distribution of heights into the solution of the PJW problem, since one should only use probability distributions for the heights of males that make sense. More generally, each ACWW problem has a real-world domain associated with it; hence, it is either explicitly or implicitly constrained by that domain. Therefore, when probability distributions are needed, they should be selected pertinent to that domain.

It is shown in [232] that the distribution of the heights of Americans is a mixture of Gaussian distributions, since the heights of both American men and women obey Gaussian distributions. This suggests that the optimization problem in (7.30) can be carried out on X^N_{[a,b]}, the space of all normal distributions over [a, b]. The probability density function of a Gaussian probability distribution is:

f(x|\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}    (7.40)

Because x \in [a, b], we normalize each probability distribution f(x|\mu, \sigma) by F(b|\mu, \sigma) - F(a|\mu, \sigma), so as to make (7.40) a probability distribution on [a, b], where F(x|\mu, \sigma) = \int_{-\infty}^{x} f(\xi|\mu, \sigma)d\xi is the cumulative distribution function of f(x|\mu, \sigma); so, for each distribution, we construct:

g(x|\mu, \sigma) \equiv \frac{f(x|\mu, \sigma)}{F(b|\mu, \sigma) - F(a|\mu, \sigma)} I_{[a,b]}(x)    (7.41)

where I_{[a,b]}(\cdot) is the indicator function of the interval [a, b].

In this chapter, we chose 100 equally spaced points in the intervals [160, 183.8] and [5, 10], respectively, as candidates for \mu and \sigma. (The smallest average height among world countries is 160 cm and the largest is 183.8 cm [2]. For standard deviations, we could not find a source that reports the standard deviations of heights for each country; the standard deviation of the heights of Americans is 7-7.5 cm [232], so we intuitively chose the range [5, 10]. Note that this information constitutes additional world knowledge for solving the problem.) This led to 10,000 Gaussian distributions (normalized over [a, b]), to which we then applied the above algorithm.

Using the IT2 FS models of words with subnormal LMFs results in \overline{\mu}_{\tilde P_{\tilde W}} = \underline{\mu}_{\tilde P_{\tilde W}} = 0, i.e., the empty set is obtained as the probability that John is W̃ for every W̃ (we do not show the empty sets), which sounds counter-intuitive. This is due to excessive uncertainty in the IT2 FS models of words. When the LMF of the IT2 FS representing Tall is subnormal and has a small height, the value of \int_a^b p_H(h)\underline{\mu}_{Tall}(h)dh is small, making the argument of \mu_{Probable} in (7.30) small. Since \overline{\mu}_{Probable}(u) = 0 for u < 0.50858, and \frac{1}{2}(\int_a^b p_H(h)\underline{\mu}_{Tall}(h)dh + \int_a^b p_H(h)\overline{\mu}_{Tall}(h)dh) < 0.50858 for all the p_H that we have, we obtain the empty set as the solution to all of the problems. In other words, under those IT2 FS models of the words Tall and Probable and the probability distributions we have for heights, the probability of the IT2 fuzzy event Tall cannot be Probable; the probability of the IT2 fuzzy event Tall is inconsistent with Probable, and such inconsistency is reflected in obtaining the empty set as the probability of the other fuzzy sets. Thus, we used the modified word models with normal LMFs.

The envelopes of the scatter plots associated with the IT2 FS models of words with normal LMFs are depicted in Fig. 7.6. It is not surprising that the probabilities for the words Very short, Short, Moderately short, and Medium are left-shoulders, because the a priori knowledge that "Probably John is tall" intuitively suggests that the probability of short-sounding height words must be close to zero. The scatter plot associated with P̃_Tall has points that lie exactly on the MF of Probable, resulting in an envelope that resembles Probable. The envelope of P̃_Very tall has an interior shape and results in an interior MF.

We then calculated the Jaccard similarity of the solutions with each linguistic probability, so as to translate the results into natural language words. Those similarities are summarized in Table 7.9; for each P̃_{\tilde W}, the largest entry in its row identifies the word with the highest similarity.

Figure 7.6: The detected envelopes \mu_{\tilde P_{\tilde W}}(v) when the IT2 FS models of words in Fig. 7.5 and the GEP for real-valued functions were used (panels (a)-(g): P̃ for Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall).
Table 7.9: Similarities between the IT2 FSs depicted in Fig. 7.6 and the linguistic probability words depicted in Fig. 7.5
  (columns ordered: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable)
  Very short: 0.10244, 0, 0, 0, 0, 0, 0, 0, 0
  Short: 0.13125, 0, 0, 0, 0, 0, 0, 0, 0
  Moderately short: 0.22136, 0.0027184, 0, 0, 0, 0, 0, 0, 0
  Medium height: 0.60748, 0.25157, 0.056886, 0.01901, 0, 0, 0, 0, 0
  Moderately tall: 0, 0, 0, 0.0016326, 0.18899, 0.24974, 0.33284, 0.28148, 0.038728
  Tall: 0, 0, 0, 0, 0.079442, 0.19639, 0.95965, 0.025474, 0
  Very tall: 0.10985, 0.71131, 0.11809, 0.032309, 0, 0, 0, 0, 0

So, when the IT2 FS models of words with normal LMFs and the numeric probability of an IT2 fuzzy event are used, the linguistic solutions to the PJW problems are:

"It is extremely improbable that John is very short."
"It is extremely improbable that John is short."
"It is extremely improbable that John is moderately short."
"It is extremely improbable that John has medium height."
"It is probable that John is moderately tall."
"It is probable that John is tall." (This means that the algorithm works correctly, since we have already assumed that "Probably John is tall.")
"It is very improbable that John is very tall."

7.5.3 Solving the PJW Problem by Extension of the GEP to IT2 FSs Using the GEP for Embedded T1 FSs

The extension of the GEP to IT2 FSs using the GEP for embedded T1 FSs calls for solving a huge number of GEP optimization problems, one for each combination of embedded T1 FSs. It is known that if calculations related to IT2 FSs are carried out over their embedded T1 FSs, a huge number of embedded T1 FSs has to be considered when one is dealing with an IT2 FS over a discrete domain; over a continuous domain, there is an uncountable number of embedded T1 FSs for an IT2 FS. To overcome this problem, a constrained representation theorem for IT2 FSs was introduced in [283], which involves only normal and convex embedded T1 FSs, called constrained embedded T1 FSs [6]. Therefore, we performed the computations in (7.32)-(7.34) for constrained embedded T1 FSs. When dealing with trapezoidal IT2 FSs with normal LMFs, one can cover the FOU of the IT2 FS with a parametric family of normal and convex trapezoidal T1 FSs.

We present the following algorithm for implementing (7.32)-(7.34), which is based on the methodology that was proposed in [220] for solving the PJW problem using T1 FSs (a code sketch of the pool of embedded T1 FSs in Step 5 is given after the list):

1. Choose the family (families) of probability distributions pertinent to the problem.

2. Choose the ranges of the parameters of the families of probability distributions.

3. Discretize the ranges of the parameters of the probability distributions.

4. Construct a pool of probability distributions having all possible combinations of parameters.

5. Decompose each of the IT2 FSs Tall and W̃ into M embedded T1 FSs, Tall_{e,i} and W_{e,i} (i = 1, 2, ..., M). Construct a pool of all possible pairs of embedded T1 FSs: if W̃ ≠ Tall, then (Tall_{e,i}, W_{e,j}) (i, j = 1, 2, ..., M); if W̃ = Tall, then (Tall_{e,i}, W_{e,i}) = (Tall_{e,i}, Tall_{e,i}) (i = 1, 2, ..., M). (In the latter case, we are dealing only with the IT2 FSs Tall and Probable, so the same embedded T1 FS of Tall must be used in the calculation of v and t in Steps 5.2 and 5.3; otherwise, we would obtain wrong results.) For all of the possible pairs (Tall_{e,i}, W_{e,j}):

   5.1. Choose a specific p_H from the pool of possible probability distributions (again, note that \int_a^b p_H(h)dh = 1, because p_H(h) is a probability distribution function on [a, b]).

   5.2. Compute v = \int_a^b p_H(h)\mu_{W_{e,j}}(h)dh.

   5.3. Compute t = \int_a^b p_H(h)\mu_{Tall_{e,i}}(h)dh.

   5.4. Compute \underline{\mu}_{e,i,j}(v) = \underline{\mu}_{Probable}(t) and \overline{\mu}_{e,i,j}(v) = \overline{\mu}_{Probable}(t).

6. Construct scatter plots of \underline{\mu}_{e,i,j}(v) versus v and \overline{\mu}_{e,i,j}(v) versus v.

7. Detect envelopes of \underline{\mu}_{e,i,j}(v) and \overline{\mu}_{e,i,j}(v), namely \underline{\mu}_{P_{\tilde W, e}}(v) and \overline{\mu}_{P_{\tilde W, e}}(v).

8. Construct \underline{\mu}_{\tilde P_{\tilde W}}(v) = \min_e \underline{\mu}_{P_{\tilde W, e}}(v) (if the one-stage GEP is used) or \underline{\mu}_{\tilde P_{\tilde W}}(v) = \max_e \underline{\mu}_{P_{\tilde W, e}}(v) (if the two-stage GEP is used), and \overline{\mu}_{\tilde P_{\tilde W}}(v) = \max_e \overline{\mu}_{P_{\tilde W, e}}(v).

The envelope detection plays the role of taking the sup. Because we gave an algorithm for detecting the envelope of a scatter plot in the previous section, we do not repeat it here.
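One simple way to realize a parametric family of normal and convex trapezoids that covers the FOU is linear interpolation between the LMF and UMF parameter vectors; this is an illustrative choice of constrained embedded T1 FSs, not necessarily the parameterization that was used to produce Fig. 7.7.

import numpy as np

def embedded_trapezoids(lmf_params, umf_params, m=10):
    """Return m normal, convex trapezoids (as (a, b, c, d) parameter vectors)
    interpolated between the LMF and UMF of a trapezoidal IT2 FS with a
    normal LMF; each lies inside the FOU, so each is a constrained
    embedded T1 FS."""
    lo, hi = np.asarray(lmf_params, float), np.asarray(umf_params, float)
    return [lo + k / (m - 1) * (hi - lo) for k in range(m)]

# Tall (Table 7.5): LMF and UMF trapezoid parameters
tall_e = embedded_trapezoids((179.5588, 185.0, 193.04, 198.5911),
                             (174.1176, 185.0, 193.04, 204.1421))
# Step 5 with W = Tall: matched pairs only, to avoid wrong results
pairs = [(te, te) for te in tall_e]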
Table 7.10: Similarities between the IT2 FSs depicted in Fig. 7.7 and the linguistic probability words in Fig. 7.5
  (columns ordered as in Table 7.9)
  Very short: 0.20565, 0, 0, 0, 0, 0, 0, 0, 0
  Short: 0.2984, 0.0027534, 0, 0, 0, 0, 0, 0, 0
  Moderately short: 0.55227, 0.025836, 0.00017682, 0, 0, 0, 0, 0, 0
  Medium height: 0.28185, 0.43198, 0.17963, 0.069589, 0, 0, 0, 0, 0
  Moderately tall: 0, 0, 0, 0.012108, 0.40679, 0.31021, 0.41041, 0.015448, 0
  Tall: 0, 0, 0, 0, 0.083096, 0.206, 0.90812, 0.0080048, 0
  Very tall: 0.22603, 0.25533, 0.47723, 0.21243, 0, 0, 0, 0, 0

Figure 7.7: The detected envelopes \mu_{\tilde P_{\tilde W}}(v) when the IT2 FS models of words in Fig. 7.5 and the two-stage GEP were used (panels (a)-(g): P̃ for Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall).

We used the same family of height probability distributions as in Section 7.5.2 to implement the solution using embedded T1 FSs. We divided the ranges of the means and standard deviations of the height probability distributions into 100 points each, yielding a pool of 10,000 probability distributions. We also chose 10 embedded T1 FSs of each of the words W̃ and Tall, so that 10 × 10 = 100 pairs of embedded T1 FSs were used in the simulations (except when W̃ = Tall, where we only used 10 pairs of embedded T1 FSs). Using the one-stage GEP yields completely filled-in FOUs and hence extra uncertainty in the solutions, which is not desirable; we do not present those solutions here, for the sake of brevity. The solutions using the two-stage GEP and the IT2 FS models of words with normal LMFs are depicted in Fig. 7.7. Their similarities with the vocabulary of probability words in Fig. 7.5 are given in Table 7.10.

A summary of the solutions to the PJW problem obtained by the different methods is given in Table 7.11. The first and second columns give the solutions to the PJW problem obtained using, respectively, the UMFs and the normal embedded T1 FSs [220] of the IT2 FSs depicted in Fig. 7.3. The next two columns represent the solutions presented in this chapter.
Table 7.11: Summary of the solutions to the problem "What is the probability that John is W?", given "Probably John is tall"
  W | T1 UMF | Normal embedded T1 FSs | GEP for real-valued functions | Two-stage GEP
  Very short | Extremely improbable | Extremely improbable | Extremely improbable | Extremely improbable
  Short | Extremely improbable | Extremely improbable | Extremely improbable | Extremely improbable
  Moderately short | Extremely improbable | Extremely improbable | Extremely improbable | Extremely improbable
  Medium | Very improbable | Extremely improbable | Extremely improbable | Very improbable
  Moderately tall | Probable | Tossup | Probable | Probable
  Tall | Probable | Probable | Probable | Probable
  Very tall | Improbable | Extremely improbable | Very improbable | Improbable

7.6 Conclusions and Future Work

The GEP is one of the main aggregation engines for ACWW, especially when dealing with probabilistic soft constraints [325]. As we indicated in [220], the optimization problems related to the GEP for T1 FSs that arise in some of Zadeh's ACWW problems can be solved using a novel algorithm that involves exhaustive search. On the other hand, we showed that there is some uncertainty about selecting the MFs of the T1 FSs that model the words in ACWW problems, which may affect the solution to an ACWW problem. This calls for using models of words that reflect the inter- and intra-personal uncertainty about the membership values of the elements, i.e., IT2 FSs. Unfortunately, to the best of our knowledge there had been no research on the generalization of the GEP to IT2 FSs; therefore, we extended the framework of the GEP to IT2 FSs and introduced computational algorithms to implement it. The applicability of our algorithms has been illustrated by solving some prototype ACWW problems.

An important prerequisite for using the GEP for problems involving linguistic probabilities is knowing the type of probability distributions for a specific problem. This additional "world knowledge" is a luxury that may not always be available.

Like its T1 counterpart, a limitation of our algorithm for implementing the IT2 FS framework for the GEP is its "exhaustive search" of the space of the parameters involved in the problem, especially when the number of parameters (of the probability distributions) proliferates (an example occurs when we need the distribution of the heights of people, men and women, which is a mixture of Gaussians, rf(x|\mu_1, \sigma_1) + (1-r)f(x|\mu_2, \sigma_2), and hence has five parameters), and especially when we extend the GEP through the framework of embedded T1 FSs, because a problem involving T1 FSs has to be solved for all of the combinations of the embedded T1 FSs of the IT2 FSs involved in the problem. Obtaining computational algorithms that implement the GEP without exhaustive search is an interesting topic for future research.

As we have stated repeatedly in our previous research, we believe that there is more than one way to solve ACWW problems. To show this, we have presented different solution methodologies for Zadeh's ACWW problems in [210-212, 215, 216], which use syllogistic reasoning and Novel Weighted Averages [218]. Those solutions do not need any world knowledge about probability distributions, but they do need world knowledge about the domains of the variables involved in the problems (e.g., time and height). This is a desirable feature of those solutions, because they do not face any discrepancies between the IT2 FS models of words and the probability distributions.
(We showed in this chapter that such a discrepancy occurs when using IT2 FSs with subnormal LMFs; we also observed such a discrepancy when studying the correctness of the results in [220].) This is at the expense of obtaining lower and upper probabilities for a problem, instead of a single fuzzy probability. Interestingly, those solutions also use the Extension Principle, but in a different way from the GEP. They involve the challenge of choosing appropriate compatibility measures [210], and may require multiple computational methodologies for the fusion of inconsistent information [212].

The GEP is perceived to be a pivotal tool for manipulating Z-numbers [301, 343]; therefore, our methodology may also contribute to developing viable frameworks for computing with IT2 versions of Z-numbers.

Chapter 8

Syllogistic Reasoning for Advanced Computing with Words

[A syllogismos is] a discourse [logos] in which, certain things being stated, something other than what is stated follows of necessity from them.
Aristotle, Prior Analytics, I.1.24b18-20, Jenkinson.

In this chapter, we use syllogistic reasoning and Linguistic Belief Structures to solve ACWW problems, in particular the challenge problems proposed by Zadeh.

8.1 Our Solutions to the Tall Swedes Problem

This section presents our solutions to the Tall Swedes problem using novel weighted averages (weighted averages in which at least one subcriterion or weight is an interval, a T1 FS, or an IT2 FS) [173]; more specifically, the fuzzy weighted average (FWA) [59, 150] when the linguistic terms are modeled by T1 FSs [127, 316], and the linguistic weighted average (LWA) [173, 277, 279] when the linguistic terms are modeled by IT2 FSs [165, 318]. To begin, we need to translate the Tall Swedes problem into a form suitable for novel weighted averages. Hence, similar to Zadeh's solution, our solution relies on the extension principle, but in a very different way, namely in the derivation of the algorithms for computing the FWA and the LWA.

8.1.1 Translating the Tall Swedes Problem into a Novel Weighted Average

The heights of Swedes can be completely described by two categories: Tall and notTall, where notTall is the complement of Tall. It is intuitive that if Most Swedes are tall, then A few Swedes are notTall. Here, a linguistic probability (LProb) is assigned to Tall and notTall, i.e., LProb(Tall) = Most and LProb(notTall) = Few. The question is how to model the word Few, given that we have the FS model for the word Most.

Consider the analogy with classical probability theory. The domain of heights under consideration is [139.7, 221] cm. Assume that Tall is a crisp set, e.g., the set of all people whose heights are between 180 cm and 205 cm. Then notTall includes heights in [139.7, 180) cm and (205, 221] cm. If Prob(Tall) = p, then Prob(notTall) = 1 - p. When probabilities are linguistic values like "Most" (it is worth noting that Most is essentially a linguistic quantifier, but here it can be mathematically interpreted as a fuzzy probability; the discussion below is valid for both linguistic probabilities and linguistic quantifiers), we must exploit the extension principle [127, 316]. If LProb(Tall) = Q, in which Q is a linguistic probability [88, 89], then LProb(notTall) is 1 \ominus Q, in which \ominus represents the subtraction operation for FSs, i.e.:

\mu_{1 \ominus Q}(u) = \sup_{u = 1 - w} \mu_Q(w) = \mu_Q(1 - u) = \mu_{\neg Q}(u)    (8.1)

in which \neg Q is the antonym of the fuzzy quantifier Q. A small numerical sketch of (8.1) is given below.
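The following is a minimal numerical sketch of (8.1) on a sampled grid; the trapezoidal model of Most is a hypothetical stand-in (its parameters are only loosely inspired by Fig. 8.1(c)).

import numpy as np

def antonym(mu_q, u):
    """Antonym of a fuzzy quantifier per Eq. (8.1): mu_{1-Q}(u) = mu_Q(1-u).
    mu_q is sampled on the increasing grid u over [0, 1]."""
    return np.interp(1.0 - u, u, mu_q)

def trap(x, a, b, c, d):
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

u = np.linspace(0.0, 1.0, 101)
mu_most = trap(u, 0.71, 0.83, 0.90, 0.99)   # hypothetical "Most"
mu_few = antonym(mu_most, u)                # peaks where u is roughly 0.10-0.17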
The above argument is the basis for the following syllogism [320, 322], through which we can translate the problem so that it is solvable via novel weighted averages:

Q A's are B's
$\neg Q$ A's are B′'s

in which B′ is the complement of the FS B, characterized by:

$\mu_{B'}(u) = 1 - \mu_B(u)$   (8.2)

Exploiting the aforementioned rule, we have: "A Few Swedes are not tall," in which Few is the antonym of the linguistic quantifier Most. By analogy with probability theory, the average height of Swedes can be computed as

$AH = \dfrac{Most \times Tall + Few \times notTall}{Most + Few}$   (8.3)

Of course, the challenging part of our solution is how to compute AH in (8.3). We do this for two cases, when the words in (8.3) are modeled by T1 and IT2 FSs, respectively.

Zadeh's method for solving the Tall Swedes problem does not exploit a probability model obeying the above axioms. In his approach, he only considers the event Tall. Nevertheless, the above axiomatic approach requires the σ-algebra of events to contain the complement of this event, not Tall, as well. As will be seen in the sequel, using syllogisms proposed by Zadeh, we construct a model containing the event not Tall and the linguistic probability assigned to it.

It should be noted that neither our solution nor Zadeh's solution makes any assumption about the distribution of heights of Swedes. Zadeh's solution carries out the optimization over the space of all plausible height distributions X to aggregate the available linguistic information. Our solution also uses the extension principle and fuzzy logical syllogisms to calculate the average height of Swedes. The only information utilized is the linguistic probabilities assigned to the events Tall and not Tall. In other words, when one says "Most Swedes are tall," no assumption is made on the probability distribution of height among tall Swedes. This argument is similar to that of the statement "85% of Swedes have a height between 185 and 205 cm." One cannot infer from this statement how height is distributed in the interval [185, 205]. The uncertainty represented by an interval fundamentally differs from the uncertainty represented by a probability distribution (e.g., a uniform distribution) over such an interval. Similarly, the uncertainty represented by the fuzzy set Tall is fundamentally different from the one represented by any probability distribution.

8.1.2 Solution Using T1 FSs and the FWA

To solve the Tall Swedes problem using the FWA, first we need to establish the T1 FS models for Tall and Most. In practice, we can collect data from a group of subjects about all words and then use the Interval Approach [151] or the Enhanced Interval Approach [286] to construct the IT2 FS models. Our hypothetical T1 FSs could then correspond to the upper membership functions of these IT2 FSs. In this chapter, we constructed a seven-word vocabulary [220] for the height of Swedes, as shown in Fig. 8.1(a), based on our understanding of the words. Assume the words Tall and Most are modeled by the T1 FSs shown in Fig. 8.1(b) and Fig. 8.1(c), respectively. The corresponding membership functions for not Tall and Few are also shown in these figures.

For T1 FSs, (8.3) is an FWA; however, (8.3) is more complex than a traditional FWA because not Tall is a combination of two T1 FSs. In fact, not Tall is non-convex as long as Tall is modeled by an interior T1 FS.
[Figure 8.1: (a) The 7-word vocabulary. (b) T1 FS models for Tall and notTall. (c) T1 FS models for Most and Few. (d) Average height, computed by an FWA.]

The following theorem is needed to compute this special FWA; it relies on the α-cut decomposition theorem (see Appendix B.2):

Theorem 8.1. Let A, B and C be T1 FSs, and f be any function of two variables. Then, $f(A \cup B, C) = f(A, C) \cup f(B, C)$, provided that the union is carried out by the max t-conorm.

Proof. See Appendix B.3.

Note that Theorem 8.1 does not require the T1 FSs to be normal and/or convex; however, to compute $f(A, C)$ and $f(B, C)$ efficiently using α-cuts [127], A, B and C need to be convex. Note also that Theorem 8.1 can easily be extended to functions of more than two T1 FSs.

Corollary 8.1. Consider the expressive formula for the FWA: $Y = \sum_{i=1}^{N} W_i X_i / \sum_{i=1}^{N} W_i$, where $X_j = X_j^a \cup X_j^b$. Then:

$Y = Y^a \cup Y^b$   (8.4)

where:

$Y^a = \dfrac{W_1 X_1 + \cdots + W_j X_j^a + \cdots + W_N X_N}{W_1 + \cdots + W_j + \cdots + W_N}$
$Y^b = \dfrac{W_1 X_1 + \cdots + W_j X_j^b + \cdots + W_N X_N}{W_1 + \cdots + W_j + \cdots + W_N}$   (8.5)

Proof. Follows directly from Theorem 8.1.

Let

$AH1 = \dfrac{Most \times Tall + Few \times notTall1}{Most + Few}$   (8.6)

$AH2 = \dfrac{Most \times Tall + Few \times notTall2}{Most + Few}$   (8.7)

where notTall1 and notTall2 are the two convex and normal T1 FSs shown in Fig. 8.1(b), and notTall = notTall1 ∪ notTall2. Clearly, AH1 and AH2 are ordinary FWAs [150]. According to Corollary 8.1, AH = AH1 ∪ AH2. The resulting AH1, AH2 and AH are shown in Fig. 8.1(d). Their centroids are 186.47 cm, 190.70 cm, and 187.96 cm, respectively. So, one may accept the centroid of AH as a defuzzified solution. Therefore, a linguistic answer to the question "What is the average height of Swedes?" would be: the average height of Swedes is 187.96 cm.

Sometimes one may want to answer the question using words from a pre-selected vocabulary, e.g., the seven-word vocabulary in Fig. 8.1(a). In this case, one can use the Jaccard similarity measure to compute the similarity between AH and all words in that vocabulary, and then map AH into the word with the maximum similarity. For the exemplar vocabulary shown in Fig. 8.1(a), the similarities are given in Table 8.1. Therefore, the answer is: the average height of Swedes is tall. We could have different answers when different vocabularies are used.

Table 8.1: Similarities between AH in Fig. 8.1(d) and the seven words in Fig. 8.1(a)

Linguistic height $H_i$    $s_J(AH, H_i)$
Very short                 0
Short                      0
Moderately short           0.039
Medium height              0.0767
Moderately tall            0.4853
Tall                       0.6073
Very tall                  0.2033
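Each α-cut of an FWA such as (8.6) is an interval weighted average, and because the objective is monotonic in each weight, its extrema occur at endpoint combinations of the weight intervals. The following brute-force sketch illustrates one such α-cut; the interval values are hypothetical, not those of Fig. 8.1:

```python
from itertools import product

def iwa_bounds(x_intervals, w_intervals):
    """Interval weighted average: exact min/max of sum(w_i*x_i)/sum(w_i)
    with x_i and w_i ranging over their intervals. The minimum uses the
    left x endpoints, the maximum the right ones; the optimum over the
    weights occurs at interval endpoints, so endpoint enumeration is exact."""
    lo = min(sum(w * x[0] for w, x in zip(ws, x_intervals)) / sum(ws)
             for ws in product(*w_intervals))
    hi = max(sum(w * x[1] for w, x in zip(ws, x_intervals)) / sum(ws)
             for ws in product(*w_intervals))
    return lo, hi

# Hypothetical alpha-cuts of Tall and notTall1 (values) and Most and Few (weights):
print(iwa_bounds([(185.0, 205.0), (139.7, 176.0)], [(0.71, 0.83), (0.17, 0.29)]))
```

Sweeping α from 0 to 1 and stacking the resulting intervals reconstructs the membership function of AH1; the α-cut based FWA algorithms of [150] accomplish the same thing far more efficiently.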
8.1.3 Solution Using IT2 FSs and the LWA

Mendel [168] argues that a scientifically correct first-order model for a word is an IT2 FS, since it captures both the inter-person and intra-person uncertainties associated with the word. This is perhaps the main reason for Zadeh's prediction that type-2 FSs will play a pivotal role in advanced CWW [170]. Although he advocates type-2 (T2) FSs for advanced CWW, he does not provide a solution involving T2 FSs to the Tall Swedes problem.

In this section we propose a solution to the Tall Swedes problem using IT2 FSs. Again, one needs to establish the IT2 FS models for $\widetilde{Most}$ and $\widetilde{Tall}$ before applying the LWA [173, 277, 279]. In practice one can collect data from a group of subjects about all words and then use the Interval Approach [151] or the Enhanced Interval Approach [286] to construct the IT2 FS models [220].

Consider next the complement of an IT2 FS. When $\widetilde{Tall}$ is modeled as an interior FOU in Fig. 8.2(b), $\widetilde{notTall}$ is shown as the dark area in the same figure, which is quite different from traditional IT2 FSs; however, it can also be viewed as the union of $\widetilde{notTall1}$ and $\widetilde{notTall2}$, which are the two shoulder IT2 FSs in Fig. 8.2(b). Assume the words $\widetilde{Tall}$ and $\widetilde{Most}$ are modeled by the IT2 FSs shown in Fig. 8.2(b) and Fig. 8.2(c), respectively. The corresponding $\widetilde{notTall}$ and $\widetilde{Few}$ are also shown in these figures. Consequently, (8.3) becomes an LWA; however, (8.3) is more complex than a traditional LWA because $\widetilde{notTall}$ is the union of two IT2 FSs. In fact, $\widetilde{notTall}$ is non-convex as long as $\widetilde{Tall}$ is modeled by an interior IT2 FS. The following theorem is needed to compute this special LWA:

Theorem 8.2. Let $\tilde{A}$, $\tilde{B}$ and $\tilde{C}$ be IT2 FSs, and f be any function of two variables. Then, $f(\tilde{A} \cup \tilde{B}, \tilde{C}) = f(\tilde{A}, \tilde{C}) \cup f(\tilde{B}, \tilde{C})$, provided that the union is carried out by the max t-conorm.

Proof. See Appendix B.4.

[Figure 8.2: (a) The seven-word vocabulary. (b) IT2 FS models for $\widetilde{Tall}$ and $\widetilde{notTall}$. (c) IT2 FS models for $\widetilde{Most}$ and $\widetilde{Few}$. (d) Average height, computed by an LWA. Note that all upper membership functions in (a), (b), (c), and (d) are the same as the T1 FSs in Figs. 8.1(a), 8.1(b), 8.1(c), and 8.1(d), respectively.]

Note that Theorem 8.2 does not require the IT2 FSs to be normal and/or convex; however, to compute $f(\tilde{A}, \tilde{C})$ and $f(\tilde{B}, \tilde{C})$ efficiently using α-cuts, $\tilde{A}$, $\tilde{B}$ and $\tilde{C}$ need to be convex.^{8.2} Note also that Theorem 8.2 can easily be extended to functions of more than two IT2 FSs.

Corollary 8.2. Consider the expressive formula for the LWA: $\tilde{Y} = \sum_{i=1}^{N} \tilde{W}_i \tilde{X}_i / \sum_{i=1}^{N} \tilde{W}_i$. Assume that $\tilde{X}_j = \tilde{X}_j^a \cup \tilde{X}_j^b$. Then:

$\tilde{Y} = \tilde{Y}^a \cup \tilde{Y}^b$   (8.8)

where:

$\tilde{Y}^a = \dfrac{\tilde{W}_1 \tilde{X}_1 + \cdots + \tilde{W}_j \tilde{X}_j^a + \cdots + \tilde{W}_N \tilde{X}_N}{\tilde{W}_1 + \cdots + \tilde{W}_j + \cdots + \tilde{W}_N}$
$\tilde{Y}^b = \dfrac{\tilde{W}_1 \tilde{X}_1 + \cdots + \tilde{W}_j \tilde{X}_j^b + \cdots + \tilde{W}_N \tilde{X}_N}{\tilde{W}_1 + \cdots + \tilde{W}_j + \cdots + \tilde{W}_N}$   (8.9)

Proof. Follows directly from Theorem 8.2.

Let

$\widetilde{AH1} = \dfrac{\widetilde{Most} \times \widetilde{Tall} + \widetilde{Few} \times \widetilde{notTall1}}{\widetilde{Most} + \widetilde{Few}}$   (8.10)

$\widetilde{AH2} = \dfrac{\widetilde{Most} \times \widetilde{Tall} + \widetilde{Few} \times \widetilde{notTall2}}{\widetilde{Most} + \widetilde{Few}}$   (8.11)

where $\widetilde{notTall1}$ and $\widetilde{notTall2}$ are the two convex and normal IT2 FSs shown in Fig. 8.2(b), and $\widetilde{notTall} = \widetilde{notTall1} \cup \widetilde{notTall2}$. Clearly, $\widetilde{AH1}$ and $\widetilde{AH2}$ are ordinary LWAs. According to Corollary 8.2, $\widetilde{AH} = \widetilde{AH1} \cup \widetilde{AH2}$. The resulting $\widetilde{AH1}$, $\widetilde{AH2}$ and $\widetilde{AH}$ are shown in Fig. 8.2(d). Their centroids, computed by the Enhanced Karnik–Mendel algorithms [281], are [180.01, 193.1] cm, [184.16, 196.83] cm, and [183.41, 189] cm, respectively.

^{8.2} By a convex IT2 FS, it is meant that both the upper and lower membership functions of the IT2 FS are convex.
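For illustration, the centroid interval that the Enhanced Karnik–Mendel algorithms compute can also be obtained by enumerating the switch points that KM searches for iteratively; the sampled FOU below is an assumption for illustration, not the FOU of Fig. 8.2:

```python
import numpy as np

def it2_centroid(x, lmf, umf):
    """Centroid [c_l, c_r] of an IT2 FS with sampled LMF/UMF.
    KM theory: c_l uses the UMF to the left of a switch point and the
    LMF to its right; c_r does the opposite. Enumerate all switch points."""
    def cog(theta):
        return np.sum(x * theta) / np.sum(theta)
    c_l = min(cog(np.where(x <= s, umf, lmf)) for s in x)
    c_r = max(cog(np.where(x <= s, lmf, umf)) for s in x)
    return c_l, c_r

x = np.linspace(140.0, 220.0, 801)
umf = np.clip(np.minimum((x - 170) / 15, (215 - x) / 15), 0, 1)        # hypothetical UMF
lmf = 0.6 * np.clip(np.minimum((x - 180) / 10, (210 - x) / 10), 0, 1)  # hypothetical LMF
c_l, c_r = it2_centroid(x, lmf, umf)
print(c_l, c_r, 0.5 * (c_l + c_r))  # centroid interval and average centroid
```

The midpoint of [c_l, c_r] is the average centroid used throughout this chapter.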
The centroid of the solution provides uncertainty bands, and therefore one can state the solution to "What is the average height of Swedes?" as: the average height of Swedes is around 186.2 cm.

Note that 186.2 cm is the average centroid of the solution, and the term "around" reflects the uncertainty represented by the centroid, and hence reflects the inter-person and intra-person uncertainties about the words. Such uncertainties are propagated by the linguistic weighted average, and can be captured when reporting a numeric value for the average height, by calculating the centroid and average centroid. This cannot be done by any solution involving type-1 fuzzy sets, including Zadeh's solution.

To map $\widetilde{AH}$ into a word in the pre-defined vocabulary shown in Fig. 8.2(a), one computes the similarity between $\widetilde{AH}$ and the seven words using the Jaccard similarity measure. The results are shown in Table 8.2. So, we would again say: on average, the height of Swedes is tall.

Table 8.2: Similarities between $\widetilde{AH}$ in Fig. 8.2(d) and the seven words in Fig. 8.2(a)

Linguistic height $\tilde{H}_i$    $s_J(\widetilde{AH}, \tilde{H}_i)$
Very short                         0
Short                              0
Moderately short                   0.0031
Medium height                      0.0603
Moderately tall                    0.3849
Tall                               0.4893
Very tall                          0.1505

Although the same linguistic solution to the Tall Swedes problem has been obtained by T1 and IT2 FSs, it is when those solutions are translated into numbers that there is a difference. The center of gravity of the MF for the T1 FS Moderate is a numeric value, whereas the centroid of the IT2 FS $\widetilde{Moderate}$ is an interval. Such an uncertainty bound cannot be obtained by T1 FSs.

8.2 Zadeh's Methodology for Solving The Magnus Problem

Zadeh solves the Magnus problem by utilizing the following intersection-product syllogism [320, 325]:

$Q_1$ A's are B's
$Q_2$ (A and B)'s are C's
$Q_1 \otimes Q_2$ A's are (B and C)'s
At least $(Q_1 \otimes Q_2)$ A's are C's

in which $Q_1$ and $Q_2$ are linguistic quantifiers; A, B, C are linguistic attributes; and $\otimes$ denotes fuzzy multiplication, determined by:

$\mu_{Q_1 \otimes Q_2}(u) = \sup_{u = xy} \min(\mu_{Q_1}(x), \mu_{Q_2}(y))$   (8.12)

At least is an operator acting on a fuzzy quantifier Q, and can be seen as the order relation extended by the extension principle, as follows:

$\mu_{Atleast(Q)}(x) = \sup_{y \leq x} \mu_Q(y), \quad x, y \in [0, 1]$   (8.13)

The intersection-product syllogism is a fuzzy extension of a simple calculation involving numeric quantifiers. Consider the following example: 50% of the students of the EE Department at USC are graduate students; 80% of the graduate students of the EE Department at USC are on F1 visas. Therefore, 50% × 80% = 40% of the students of the EE Department at USC are graduate students on F1 visas. Consequently, at least 40% of the students of the EE Department at USC are on F1 visas (since some undergraduate students may also be on F1 visas, we use the term at least).

It can easily be observed that in the Magnus problem, $Q_1 = Most$, $Q_2 = Most$, A = Swede, B = tall, and C = blond; therefore, it can be inferred that At least $(Most \otimes Most)$ Swedes are both tall and blond. On the other hand, for a (non-decreasing) monotonic quantifier Q, i.e., one whose membership function $\mu_Q(u)$ is monotonically non-decreasing, $Atleast(Q) = Q$ [320]. For a proof of this proposition see Appendix B.5. Zadeh views Most as a monotonic quantifier, and hence $Most \otimes Most$ is monotonic. Therefore, the portion of Swedes who are both blond and tall is $Most \otimes Most$.
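For a trapezoidal Most with a non-negative support, the α-cuts of $Most \otimes Most$ in (8.12) are obtained by multiplying α-cut endpoints, and At least in (8.13) is a running maximum over the reconstructed membership function. A discretized sketch with assumed trapezoid parameters:

```python
import numpy as np

def trap_alpha_cut(alpha, a, b, c, d):
    """Alpha-cut [l, r] of a trapezoid with support [a, d] and core [b, c]."""
    return a + alpha * (b - a), d - alpha * (d - c)

alphas = np.linspace(0.0, 1.0, 101)
# Hypothetical trapezoidal Most on [0, 1]:
cuts = [trap_alpha_cut(al, 0.50, 0.71, 0.83, 0.95) for al in alphas]
# Interval product of an alpha-cut with itself (endpoints are non-negative):
prod_cuts = [(l * l, r * r) for (l, r) in cuts]  # alpha-cuts of Most (x) Most

# Rebuild the MF of Most (x) Most on a grid, then apply At least per (8.13):
u = np.linspace(0.0, 1.0, 1001)
mu = np.zeros_like(u)
for al, (l, r) in zip(alphas, prod_cuts):
    inside = (u >= l) & (u <= r)
    mu[inside] = np.maximum(mu[inside], al)
mu_at_least = np.maximum.accumulate(mu)  # sup over y <= x of mu(y)
```

For a quantifier whose membership function is non-decreasing, the running maximum leaves it unchanged, which is the identity $Atleast(Q) = Q$ invoked by Zadeh.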
Zadeh interprets a linguistic constraint on the portion of a population as a linguistic probability (LProb), and directly concludes that:

LProb(Magnus is blond) = $Most \otimes Most$   (8.14)

8.3 Implementation of Zadeh's Solution to The Magnus Problem

Zadeh's elaboration on the Magnus problem does not include some components of CWW engines. He does not provide the exact membership functions of the fuzzy quantifier Most, the linguistic approximation method, or the vocabulary of linguistic probabilities through which the result could be communicated to humans.

We believe that although a linguistic quantifier has the same mathematical nature as a linguistic probability (both are fuzzy subsets of the unit interval [0, 1]), they are semantically different: a linguistic quantifier imposes an elastic constraint on the portion of a population having a specific attribute, whereas a linguistic probability (e.g., Very likely) imposes a linguistic constraint on the more objective concept of the likelihood of a particular event. This issue can be viewed as analogous to the difference between two ways of defining probability: the relative frequency definition and the axiomatic definition. The relative frequency approach defines the probability of an event in an experiment as the limit of the ratio of the number of occurrences of that event to the total number of repetitions of the experiment, as the latter tends to infinity. The linguistic quantifier Most imposes a linguistic constraint on such a ratio. On the other hand, a set of linguistic probabilities (e.g., Unlikely, Likely, Very likely, Very unlikely) obeys a fuzzy version of the axioms of probability theory [27]. The relative frequency approach is based on heuristics, while the axiomatic approach bears more mathematical rigor.

In any solution of the Magnus problem (be it in a type-1 or a type-2 fuzzy logic framework), not only must one therefore construct a fuzzy set for modeling (precisiating) the quantifier Most, but one must also employ a vocabulary of fuzzy probabilities for linguistic approximation of the solution $Most \otimes Most$, containing linguistic probabilities that are understandable by human beings. To do the latter, we choose the member of the vocabulary of linguistic probabilities whose Jaccard similarity [46, 108] with $Most \otimes Most$ is the largest. Recall that the Jaccard similarity measure between two type-1 fuzzy sets A and B, $s_J(A, B)$, is:

$s_J(A, B) = \dfrac{\int_X \mu_{A \cap B}(x)\,dx}{\int_X \mu_{A \cup B}(x)\,dx}$   (8.15)

We chose a type-1 fuzzy set to model (precisiate) the linguistic quantifier Most, whose membership function is depicted in Fig. 8.1(c). We also constructed a nine-word vocabulary of type-1 fuzzy sets to model the linguistic probabilities. The members of the vocabulary are: Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable. Their membership functions are shown in Fig. 8.3. We also have the extreme words Absolutely improbable and Absolutely certain,^{8.3} which are naturally modeled via singletons, and hence are not shown in the figure.

Computation of Zadeh's solution to the Magnus problem involves a fuzzy multiplication. Since we have chosen a trapezoidal membership function for Most, it is easy to compute $Most \otimes Most$ by (8.12) and the α-cut decomposition theorem. The computed $Most \otimes Most$ corresponding to the membership function of Fig. 8.1(c) is shown in Fig. 8.4. Note that the center of gravity of the solution is 0.7436. Later, this will be compared to the average centroid of the type-2 solution(s).
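A discretized sketch of this decoding step via (8.15), with min for intersection and max for union; the vocabulary dictionary is a placeholder for the membership functions of Fig. 8.3:

```python
import numpy as np

def jaccard_t1(mu_a, mu_b):
    """Jaccard similarity (8.15) of two T1 FSs sampled on a common grid."""
    return np.sum(np.minimum(mu_a, mu_b)) / np.sum(np.maximum(mu_a, mu_b))

def decode(mu_solution, vocabulary):
    """Map a solution FS to the vocabulary word with maximum similarity."""
    return max(vocabulary, key=lambda w: jaccard_t1(mu_solution, vocabulary[w]))

# vocabulary = {"Tossup": mu_tossup, "Probable": mu_probable, ...}
# word = decode(mu_most_times_most, vocabulary)
```

Consistent with footnote 8.3 below, the (continuum) Jaccard similarity of a singleton with any fuzzy set is zero, so the extreme words can never win this similarity contest.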
^{8.3} Note that from (8.15), it is obvious that the Jaccard similarity of a singleton and any fuzzy set is always zero. Hence, no solution would map to the extreme words in the decoding procedure. The extreme words are included in the vocabulary because their presence is crucial for an axiomatic approach to fuzzy probabilities, as will be seen in the sequel.

[Figure 8.3: The membership functions of the vocabulary of type-1 linguistic probabilities.]

[Figure 8.4: The membership function of $Most \otimes Most$.]

In the next step of our interpretation of Zadeh's solution, we calculate the Jaccard similarity measure between $Most \otimes Most$ and each member of the vocabulary of type-1 linguistic probabilities $P_i$ (i = 1, ..., 9), and denote them by $s_J(Most \otimes Most, P_i)$. The results are shown in Table 8.3.

Table 8.3: Similarities between Zadeh's solution $Most \otimes Most$ and the linguistic probabilities

Linguistic probability $P_i$    $s_J(Most \otimes Most, P_i)$
Extremely improbable            0
Very improbable                 0
Improbable                      0
Somewhat improbable             0
Tossup                          0.0825
Somewhat probable               0.1546
Probable                        0.4840
Very probable                   0.3580
Extremely probable              0.0651

It can be concluded that decoding Zadeh's methodology yields the following solution to the Magnus problem, given the aforementioned fuzzy sets for the type-1 linguistic probabilities: "It is probable that Magnus (a Swede picked at random) is blond."

8.4 Critique of Zadeh's Solution to The Magnus Problem

As stressed by [88, 89], although Zadeh's treatment of fuzzy probabilities is very interesting and can be applied to data-centered applications, it needs to be mathematically rigorized so that it is more appropriate from the viewpoint of a probability theorist. As a result, the following linguistic probability measure is defined in [88, 89]:

Assume that Ω is a sample space, and $\mathcal{B}$ is the σ-algebra of events associated with Ω, i.e.:

(i). $\mathcal{B} \subseteq \mathcal{P}$
(ii). $\mathcal{B} \neq \emptyset$
(iii). $A \in \mathcal{B} \Rightarrow A' \in \mathcal{B}$
(iv). $\{A_i\}_{i=1}^{\infty} \subseteq \mathcal{B} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$

in which $\mathcal{P}$ is the family of subsets of Ω, and $A' = \Omega - A$ is the complement of A. A function LProb: $\mathcal{B} \rightarrow \mathcal{N}_{[0,1]}$ is called a linguistic probability measure if and only if, for any $A \in \mathcal{B}$:

(i). $0 \preceq \mathrm{LProb}(A) \preceq 1$
(ii). $\mathrm{LProb}(\emptyset) = 0$ and $\mathrm{LProb}(\Omega) = 1$
(iii). If $\{A_i\}_{i=1}^{\infty} \subseteq \mathcal{B}$ and $i \neq j \Rightarrow A_i \cap A_j = \emptyset$ (i.e., the $A_i$'s are mutually disjoint), then $\mathrm{LProb}(\bigcup_{i=1}^{\infty} A_i) = \hat{\bigoplus}_{i=1}^{\infty} \mathrm{LProb}(A_i)$
(iv). $\mathrm{LProb}(A') = 1 \ominus \mathrm{LProb}(A)$

in which $\mathcal{N}_{[0,1]}$ is the set of all fuzzy numbers over the unit interval, $\preceq$ is the order relation induced by the fuzzy minimum operator (and is equivalent to the α-cut ranking method), $\hat{\bigoplus}$ represents a special addition for fuzzy numbers [61, 62, 143], and $\ominus$ is the fuzzy subtraction operation.

Zadeh's approach to solving the Magnus problem does not obey the axioms of linguistic probability theory. In his approach, he only considers the events "tall" and "blond". Nevertheless, the above axiomatic approach requires the σ-algebra of events to contain the complements of those events (i.e., "not tall" and "not blond") as well. As will be seen in the sequel, using syllogisms proposed by Zadeh, we consider the events "not tall" and "not blond" and the linguistic probabilities assigned to them in our model. We solve the Magnus problem using type-2 fuzzy sets as models for linguistic probabilities, and we can show [217] that our treatment of linguistic probabilities obeys a set of axioms similar to those stated above for type-1 fuzzy sets.
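To see why axiom (iii) requires a special addition $\hat{\oplus}$ rather than plain extension-principle addition, note that ordinary α-cut (interval) addition of two probability words can push membership outside [0, 1]. A small sketch with two hypothetical trapezoidal words, not the vocabulary of Fig. 8.3:

```python
# Supports and cores of two hypothetical trapezoidal probability words,
# given as (a, b, c, d): support [a, d], core [b, c]:
probable = (0.55, 0.70, 0.80, 0.90)
tossup = (0.40, 0.45, 0.55, 0.60)

# Extension-principle addition at the alpha = 0 level is just the interval
# sum of the supports:
support_sum = (probable[0] + tossup[0], probable[3] + tossup[3])
print(support_sum)  # (0.95, 1.5): the right endpoint exceeds 1
```

The support of the plain sum leaves [0, 1]; the special addition of [61, 62, 143], like the intrinsic normalization of the weighted averages used later, keeps the result a fuzzy number over the unit interval.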
8.5 Fuzzy Reasoning and Calculation of Linguistic Upper and Lower Probabilities via Linguistic Weighted Averages for The Magnus Problem

Heuristically, when one assigns a linguistic attribute to Most of a population, it can be concluded that the rest of that population does not have that attribute. Such an intuition can be derived formally from the following rule, obtained from the entailment principle [320]:

Q A's are B's
$\neg Q$ A's are B′'s

in which $\neg Q$ is the antonym of the fuzzy quantifier Q, whose membership function [320] is given by:

$\mu_{\neg Q}(u) = \mu_Q(1 - u), \quad u \in [0, 1]$   (8.16)

and B′ is the complement of the fuzzy set B, characterized by:

$\mu_{B'}(u) = 1 - \mu_B(u)$   (8.17)

Exploiting the aforementioned rule, we have: "A few Swedes are not tall," in which Few is the antonym of the linguistic quantifier Most. Consequently, applying this rule to the statement "Most tall Swedes are blond," one concludes that: "A few tall Swedes are not blond."

It is worth noting that the semantics of the fuzzy quantifier Most suggest that it is more appropriately modeled by an interior fuzzy set. As we insisted earlier, when Most of a population has a linguistic attribute, it means that there are some members of that population who do not have that attribute. It is reasonable, therefore, to assume that $\mu_{Most}(1) \neq 1$ (i.e., the membership value of 100% in Most is not 1); thus, we choose an interior fuzzy set to model the quantifier Most, and its antonym to model the quantifier Few, as shown in Fig. 8.2(c).

One has linguistic information about the distribution of blond Swedes among those who are tall, i.e., one knows that most tall Swedes are blond. Unfortunately, one does not know anything about the distribution of blond Swedes among those (few) Swedes who are not tall. This situation is summarized in the tree of Fig. 8.5.

[Figure 8.5: Linguistic information about the distribution of blond people among Swedes: the population splits into Tall (Most) and not Tall (Few); the Tall branch splits into Blond (Most) and not Blond (Few), while the split of the not Tall branch into Blond and not Blond is unknown.]

Because of one's total ignorance about the distribution of blonds among the Swedes who are not tall, one can take two different approaches to calculating the linguistic probability that Magnus is blond.

The first approach assumes that the distribution of blonds among Swedes who are not tall is similar to the case of tall Swedes, i.e., Most Swedes who are not tall are blond, and few of them are not blond. However, such a methodology does not seem plausible, because it completely neglects our total ignorance about the situation and assumes additional world knowledge, which may not always be available.

The second approach calculates a linguistic lower and a linguistic upper probability, corresponding respectively to the pessimistic case, when none of the Swedes who are not tall are blond, and the optimistic case, when all of the Swedes who are not tall are blond. To calculate the linguistic lower and upper probabilities, we use a normalized version of the intersection-product syllogism, so that the problem can be solved using Linguistic Weighted Averages.
The linguistic lower probability LProb⁻(·) is determined as:

$\mathrm{LProb}^-(\text{Magnus is blond}) = \dfrac{Most \times Most + Few \times None}{Most + Few}$   (8.18)

Similarly, the linguistic upper probability LProb⁺(·) is determined as:

$\mathrm{LProb}^+(\text{Magnus is blond}) = \dfrac{Most \times Most + Few \times All}{Most + Few}$   (8.19)

In (8.18) and (8.19), All and None are singletons, respectively represented by:

$\mu_{All}(u) = \begin{cases} 1 & u = 1 \\ 0 & \text{otherwise} \end{cases}$   (8.20)

$\mu_{None}(u) = \begin{cases} 1 & u = 0 \\ 0 & \text{otherwise} \end{cases}$   (8.21)

A Linguistic Weighted Average $\tilde{Y}_{LWA}$ of interval type-2 fuzzy sets $\tilde{X}_i$ with interval type-2 fuzzy weights $\tilde{W}_i$ (i = 1, ..., n) is characterized by the following expressive formula:^{8.4}

$\tilde{Y}_{LWA} = \dfrac{\sum_{i=1}^{n} \tilde{W}_i \tilde{X}_i}{\sum_{i=1}^{n} \tilde{W}_i} = [\underline{Y}_{LWA}, \overline{Y}_{LWA}]$   (8.22)

in which:

$\underline{Y}_{LWA} = \min_{\forall W_i \in [\underline{W}_i, \overline{W}_i]} \dfrac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$   (8.23)

$\overline{Y}_{LWA} = \max_{\forall W_i \in [\underline{W}_i, \overline{W}_i]} \dfrac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$   (8.24)

in which the underlined and overlined quantities denote the lower and upper membership functions of $\tilde{Y}_{LWA}$. Observe that LProb⁻ and LProb⁺ in (8.18) and (8.19) are Linguistic Weighted Averages that can be computed by (8.23) and (8.24).

^{8.4} This means that although $\tilde{Y}_{LWA}$ can be expressed by (8.22), it is not computed by adding/multiplying interval type-2 fuzzy sets.

By analogy with classical probability theory, one can observe that the weighted average solutions in (8.18) and (8.19) can be derived from the following conditional fuzzy probability calculation, which is a generalization of the calculations of [88, 89] to type-2 fuzzy sets:

LProb(blond|Swede) = LProb(tall|Swede) × LProb(blond|tall and Swede) + LProb(¬tall|Swede) × LProb(blond|¬tall and Swede)   (8.25)

LProb⁻(Magnus is blond) is obtained by assuming in (8.25) that:

LProb(blond|¬tall and Swede) = None = 0   (8.26)

and LProb⁺(Magnus is blond) is obtained by assuming in (8.25) that:

LProb(blond|¬tall and Swede) = All = 1   (8.27)

It can be shown [214] that the intrinsic normalization present in the linguistic weighted average contributes to interactive addition of fuzzy probabilities, which avoids obtaining counterintuitive linguistic probabilities whose membership functions are non-zero outside [0, 1].

One can argue that when there is no assumption on the distribution of blonds among Swedes who are not tall, based on the principle of maximum entropy [45], one should assume that:

LProb(blond|¬tall and Swede) = LProb(not blond|¬tall and Swede) = 0.5   (8.28)

It can easily be shown that the linguistic probability calculated by (8.28) is exactly (LProb⁺ + LProb⁻)/2. This is a direct result of the fact that (All + None)/2 = 0.5, and of the nature of the average. The linguistic upper and lower probabilities provide more flexibility when such an assumption cannot be made.

We calculate LProb⁻ and LProb⁺ using a method based on α-cuts. The results are shown in Figs. 8.6 and 8.7, respectively. The centroid and average centroid of the lower probability are [0.7378, 0.8669] and 0.8023, respectively. The centroid and average centroid of the upper probability are [0.8181, 0.9152] and 0.8666, respectively. The centroid can be used to report a solution in terms of uncertain numeric upper and lower probabilities for the problem: "the probability that Magnus is blond is between around 80% and around 87%." Note that 80% and 87% are the average centroids of the type-2 fuzzy lower and upper probabilities, and the term "around" reflects the uncertainty represented by the centroids, and hence the inter-person and intra-person uncertainties about the words. Such uncertainties are propagated by the linguistic weighted average, and can be captured when reporting a numeric value for the lower and upper probabilities, by calculating the centroid and average centroid. This cannot be done by any solution involving type-1 fuzzy sets, including Zadeh's solution.
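At each α-level, (8.18) reduces to interval weighted averages, one at the UMF level and one at the LMF level, with the singleton None contributing the degenerate interval [0, 0]. The sketch below mirrors the endpoint-enumeration idea used for the FWA in Section 8.1.2; all interval values are hypothetical α-cuts, not those of Fig. 8.2:

```python
from itertools import product

def iwa_bounds(x_cuts, w_cuts):
    """Exact bounds of sum(w_i*x_i)/sum(w_i) over interval operands;
    the extrema over the weights occur at interval endpoints."""
    lo = min(sum(w * x[0] for w, x in zip(ws, x_cuts)) / sum(ws)
             for ws in product(*w_cuts))
    hi = max(sum(w * x[1] for w, x in zip(ws, x_cuts)) / sum(ws)
             for ws in product(*w_cuts))
    return lo, hi

most_cut = (0.71, 0.83)   # hypothetical alpha-cut of Most
few_cut = (0.17, 0.29)    # hypothetical alpha-cut of Few
none_cut = (0.0, 0.0)     # the singleton None, per (8.21)
# One alpha-cut of LProb^- in (8.18): values (Most, None), weights (Most, Few):
print(iwa_bounds([most_cut, none_cut], [most_cut, few_cut]))
```

Replacing none_cut with (1.0, 1.0), i.e., All per (8.20), gives the corresponding cut of LProb⁺ in (8.19); the division by sum(ws) is the intrinsic normalization that keeps the result inside [0, 1].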
[Figure 8.6: Linguistic lower probability that Magnus is blond.]

In order to communicate the results to people via linguistic probabilities, we computed the Jaccard similarities of the linguistic lower probability and the linguistic upper probability with the vocabulary of linguistic probabilities ($\tilde{P}_i$, i = 1, ..., 9), whose membership functions are shown in Fig. 8.17. The results are shown in Table 8.4. The vocabulary of type-2 linguistic probabilities contains the same words as the vocabulary of type-1 linguistic probabilities. As done previously, we modeled the extreme words Absolutely improbable and Absolutely certain as singletons, and hence they are not shown in the figure.

[Figure 8.7: Linguistic upper probability that Magnus is blond.]

Table 8.4: Similarities between the linguistic lower and upper probabilities and the members of the vocabulary of linguistic probabilities

Linguistic probability $\tilde{P}_i$    $s_J(\mathrm{LProb}^-, \tilde{P}_i)$    $s_J(\mathrm{LProb}^+, \tilde{P}_i)$
Absolutely improbable    0        0
Extremely improbable     0        0
Very improbable          0        0
Improbable               0        0
Somewhat improbable      0        0
Tossup                   0.0101   0
Somewhat probable        0.0295   0
Probable                 0.1837   0.0385
Very probable            0.4557   0.8455
Extremely probable       0.0803   0.1465
Absolutely certain       0        0

Since we have obtained a pessimistic and an optimistic probability, we can make the following statement: "It is very probable that Magnus is blond."

8.6 Our Solution to The Swedes and Italians Problem

This section presents our solution to the Swedes and Italians problem using Linguistic Weighted Averages (LWAs). To begin, we need to translate the problem into a form suitable for LWAs. We argue that "Most Swedes are much taller than most Italians" implies that "A few Swedes are not much taller than most Italians." Such an intuition can be derived formally from the following rule, obtained from the entailment principle and originally stated for fuzzy quantifiers in [320]:

Q A's are B's
$\neg Q$ A's are B′'s

in which $\neg Q$ is the antonym of the fuzzy quantifier Q, whose membership function is given by:

$\mu_{\neg Q}(u) = \mu_Q(1 - u), \quad u \in [0, 1]$   (8.29)

and B′ is the complement of the fuzzy set B, characterized by:

$\mu_{B'}(u) = 1 - \mu_B(u)$   (8.30)

This implies that we have the following belief structure for the problem:

$\tilde{B}_1 = \{(Much\ taller, Most), (not\ Much\ taller, Few)\}$   (8.31)

in which Much taller and not Much taller are focal elements, and Most and Few are probability mass assignments. The difference between this belief structure and the belief structures studied in the literature [157, 289, 306, 308] is that the probability assignments are words rather than numeric values. Belief structures with fuzzy-valued probability mass assignments were first introduced by Zadeh [319]; however, they have not been in the mainstream of research in the evidential reasoning community. In the past decade, there has been some research on belief structures with interval-valued probability mass assignments [53, 245, 274]. As a natural extension, some studies formulate fuzzy-valued probability mass assignments [44, 54, 295, 355].
In order to solve the Swedes and Italians problem, we are interested in the expected value (average) of the above belief structure. The expected value of traditional belief structures whose probability mass assignments are numeric was formulated by Yager [291]. Inspired by Yager's work and the methodology of Zadeh [319] for dealing with a belief structure with fuzzy probability mass assignments, the expected value of such a structure can be calculated as described next.

Assume that one has a belief structure B with focal elements $\{A_1, A_2, \ldots, A_n\} \subset F_U$, whose probability mass assignments are $\{M_1, M_2, \ldots, M_n\} \subset F_{[0,1]}$, in which $F_U$ represents the set of all type-1 fuzzy sets over the universe of discourse U. Then, the membership function of the expected value $E\{B\}$ of this belief structure is calculated as:

$\mu_{E\{B\}}(z) = \sup_{\substack{z = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n \\ p_1 + p_2 + \cdots + p_n = 1}} \min\big(\mu_{M_1}(p_1), \ldots, \mu_{M_n}(p_n), \mu_{A_1}(x_1), \ldots, \mu_{A_n}(x_n)\big)$   (8.32)

Unfortunately, as noted in [214], the above optimization problem may have no solution (see Appendix B.6); instead, one can use fuzzy weighted averages (FWAs), represented by the following expressive formula [150]:

$E\{B\} = \dfrac{\sum_{i=1}^{n} M_i A_i}{\sum_{i=1}^{n} M_i}$   (8.33)

Similarly, if the focal elements and the probability mass assignments are interval type-2 fuzzy sets, one can use Linguistic Weighted Averages (LWAs) to guarantee that there are solutions to the problem of determining the expected value. Assume that one has a belief structure $\tilde{B}$ with focal elements $\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_n\} \subset \tilde{F}_U$, whose probability mass assignments are $\{\tilde{M}_1, \tilde{M}_2, \ldots, \tilde{M}_n\} \subset \tilde{F}_{[0,1]}$, in which $\tilde{F}_U$ represents the set of all interval type-2 fuzzy sets over the universe of discourse U. Then, the expected value of $\tilde{B}$, $E\{\tilde{B}\}$, is calculated via the following LWA:

$E\{\tilde{B}\} = \dfrac{\sum_{i=1}^{n} \tilde{M}_i \tilde{A}_i}{\sum_{i=1}^{n} \tilde{M}_i}$   (8.34)

Consequently, we first use an LWA to remove the first linguistic quantifier "Most" in the problem statement, and determine, on average, how much taller Swedes are than most Italians. Because we next want to account for the "most" in "most Italians" by using the syllogism derived from the entailment principle, we need to bring "most Italians" to the front of this sentence. We do this by re-interpreting the previous sentence as determining how much shorter most Italians are than the average height of Swedes. The solution is then used by another LWA to remove the second quantifier, and determine how much shorter, on average, Italians are than the average height of Swedes.

In this framework, we calculate the following LWA to obtain the average amount, $\widetilde{AH}_1$, by which Swedes are taller than most Italians:

$\widetilde{AH}_1 \triangleq E\{\tilde{B}_1\} = \dfrac{Most \times Much\ taller + Few \times not\ Much\ taller}{Most + Few}$   (8.35)

This implies that, on average, Swedes are $\widetilde{AH}_1$ taller than most Italians. Using the same syllogism as was used to calculate $\widetilde{AH}_1$, we can obtain: On average, Swedes are not $\widetilde{AH}_1$ taller than a few Italians.
This argument, therefore, induces the following belief structure:

$\tilde{B}_2 = \{(\widetilde{AH}_1, Most), (not\ \widetilde{AH}_1, Few)\}$   (8.36)

Following the same methodology used to calculate the expected value of $\tilde{B}_1$, we can calculate the expected value of $\tilde{B}_2$ as:

$\widetilde{AH}_2 \triangleq E\{\tilde{B}_2\} = \dfrac{Most \times \widetilde{AH}_1 + Few \times not\ \widetilde{AH}_1}{Most + Few}$   (8.37)

This can be interpreted as "On average, Swedes are $\widetilde{AH}_2$ taller than Italians," or "The difference between the average height of Swedes and the average height of Italians is $\widetilde{AH}_2$."

8.7 Implementation of the Solution to The Swedes and Italians Problem

In this section, we solve the Swedes and Italians problem based on the theory provided in Section 8.6. First, we establish an interval type-2 fuzzy set model of the word "Much taller" on the universe of discourse of all "height differences," so that it is illustrative of one obtained by the Enhanced Interval Approach [286]. It is depicted in Fig. 8.8. Note that possible values of "height difference" can be positive or negative; therefore, "Much taller" is modeled as a fuzzy set over [−65, 65] cm.

[Figure 8.8: The fuzzy set model for "Much taller".]

The membership functions for a vocabulary of linguistic quantifiers are also established by the same method. The words are: A few, Most. The membership functions are depicted in Fig. 8.2(c). Note that in [320], linguistic quantifiers are mathematically treated as fuzzy probabilities; therefore, they are shown on a [0, 1] scale.

In the next step, we calculate $\widetilde{AH}_1$ according to (8.35). Note that the LWAs include the fuzzy set not Much taller, which is the complement of Much taller. Its membership function is shown in Fig. 8.9. Observe that not Much taller is a non-convex fuzzy set, but it can be written as the union of two convex interval type-2 fuzzy sets,^{8.5} not Much taller₁ and not Much taller₂. For calculating the LWA, we need the following:

Theorem 8.3. Let $\tilde{A}$, $\tilde{B}$ and $\tilde{C}$ be interval type-2 fuzzy sets, and f be any function of two variables. Then, $f(\tilde{A} \cup \tilde{B}, \tilde{C}) = f(\tilde{A}, \tilde{C}) \cup f(\tilde{B}, \tilde{C})$, provided that the union is carried out by the max t-conorm.

Proof. See [221].

Corollary 8.3. Consider the expressive formula for the LWA: $\tilde{Y} = \sum_{i=1}^{N} \tilde{W}_i \tilde{X}_i / \sum_{i=1}^{N} \tilde{W}_i$. Assume that^{8.6} $\tilde{X}_j = \bigcup_{r=1}^{m} \tilde{X}_j^r$. Then:

$\tilde{Y} = \bigcup_{r=1}^{m} \tilde{Y}^r$   (8.38)

where:

$\tilde{Y}^r = \dfrac{\tilde{W}_1 \tilde{X}_1 + \cdots + \tilde{W}_j \tilde{X}_j^r + \cdots + \tilde{W}_N \tilde{X}_N}{\tilde{W}_1 + \cdots + \tilde{W}_j + \cdots + \tilde{W}_N}$   (8.39)

Proof. Follows directly from Theorem 8.3, by induction.

Let

$\widetilde{AH}_{11} = \dfrac{Most \times Much\ taller + Few \times not\ Much\ taller_1}{Most + Few}$   (8.40)

$\widetilde{AH}_{12} = \dfrac{Most \times Much\ taller + Few \times not\ Much\ taller_2}{Most + Few}$   (8.41)

Since not Much taller₁ and not Much taller₂ are convex interval type-2 fuzzy sets, $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$ can easily be computed by the methodology stated in [277], and $\widetilde{AH}_1$ can be computed by taking their union, according to Corollary 8.3. $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$ are shown in Fig. 8.10; their union is shown in Fig. 8.11.

^{8.5} By a convex interval type-2 fuzzy set, we mean that both its lower and upper membership functions are convex. Accordingly, for a non-convex interval type-2 fuzzy set, the lower or the upper membership function (or both) is non-convex.

^{8.6} Note that, in general, it is not required that the $\tilde{X}_j^r$'s be convex. However, here we need to represent a non-convex $\tilde{X}_j$ as the union of convex $\tilde{X}_j^r$'s.
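The union in Corollary 8.3 is computed pointwise with the max t-conorm, applied separately to the upper and lower membership functions of the parts. A minimal sketch, in which the (lmf, umf) sample arrays are assumed to live on a common grid:

```python
import numpy as np

def it2_union(parts):
    """Union of IT2 FSs given as (lmf, umf) sample pairs on a common grid,
    using the max t-conorm on LMFs and UMFs separately (Theorem 8.3)."""
    lmf = np.maximum.reduce([p[0] for p in parts])
    umf = np.maximum.reduce([p[1] for p in parts])
    return lmf, umf

# Example with two hypothetical parts sampled on the same grid:
# ah1_lmf, ah1_umf = it2_union([(ah11_lmf, ah11_umf), (ah12_lmf, ah12_umf)])
```

The same call with three parts yields $\widetilde{AH}_2$ from $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$ in the sequel.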
[Figure 8.9: The fuzzy set not Much taller (the dark shaded FOU), the complement of Much taller, which is equal to not Much taller₁ ∪ not Much taller₂.]

[Figure 8.10: The fuzzy sets $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$, which are calculated by ordinary LWAs.]

[Figure 8.11: The fuzzy set $\widetilde{AH}_1$, which is the union of $\widetilde{AH}_{11}$ and $\widetilde{AH}_{12}$.]

Next, $\widetilde{AH}_2$ can be calculated according to (8.37). Note that, to do this, we need to compute not $\widetilde{AH}_1$, whose Footprint of Uncertainty (FOU) is shown in Fig. 8.12.

[Figure 8.12: The fuzzy set not $\widetilde{AH}_1$.]

Observe from Fig. 8.12 that not $\widetilde{AH}_1$ is the union of three interval type-2 fuzzy sets. We call them $\tilde{E}$, $\tilde{F}$, and $\tilde{G}$, as depicted in Fig. 8.13. $\tilde{E}$ and $\tilde{F}$ have normal upper membership functions, and $\tilde{G}$ (which is a fully filled-in rectangular FOU) has a subnormal upper membership function.

[Figure 8.13: The fuzzy sets $\tilde{E}$, $\tilde{F}$, and $\tilde{G}$, whose union yields not $\widetilde{AH}_1$.]

Let

$\widetilde{AH}_{21} = \dfrac{Most \times \widetilde{AH}_1 + Few \times \tilde{E}}{Most + Few}$   (8.42)

$\widetilde{AH}_{22} = \dfrac{Most \times \widetilde{AH}_1 + Few \times \tilde{F}}{Most + Few}$   (8.43)

$\widetilde{AH}_{23} = \dfrac{Most \times \widetilde{AH}_1 + Few \times \tilde{G}}{Most + Few}$   (8.44)

Since $\tilde{E}$, $\tilde{F}$, and $\tilde{G}$ are convex interval type-2 fuzzy sets, $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$ can also easily be computed by the methodology stated in [277]. Note that since $\tilde{G}$ is subnormal, and since the LWA is calculated as two Fuzzy Weighted Averages (FWAs), $\widetilde{AH}_{23}$ is also subnormal. Also note that the lower membership function of $\tilde{G}$ is zero everywhere; as a result, the lower membership function of $\widetilde{AH}_{23}$ is zero. Consequently, $\widetilde{AH}_2$ can be computed by taking the union of $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$, according to Corollary 8.3. $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$ are shown in Fig. 8.14; their union is shown in Fig. 8.15.

[Figure 8.14: The fuzzy sets $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$ and $\widetilde{AH}_{23}$.]

We calculate the centroid and the average centroid of $\widetilde{AH}_2$. The centroid is [15.5157, 28.7910], and the average centroid is 22.1534. The centroid can be used to report uncertain numeric solutions for the Swedes and Italians problem, and the term "around" reflects the uncertainty represented by the centroid; it demonstrates the inter-person and intra-person uncertainties about the words. Such uncertainties are propagated by the LWA, and can be captured when reporting a numeric value for the difference in average heights of Swedes and Italians, by calculating the centroid and the average centroid. This cannot be done by any solution involving type-1 fuzzy sets, including Zadeh's solution. We conclude that a fuzzy numeric solution to the Swedes and Italians problem is: "The difference between the average height of Swedes and the average height of Italians is around 22.1534 cm."

[Figure 8.15: The fuzzy set $\widetilde{AH}_2$, which is the union of $\widetilde{AH}_{21}$, $\widetilde{AH}_{22}$, and $\widetilde{AH}_{23}$.]
In the next step, in order to translate the results so that they are comprehensible by humans, we calculate the Jaccard similarity [280] of $\widetilde{AH}_2$ with the members of a vocabulary of interval type-2 fuzzy words that represent amounts of difference in height. The words of the vocabulary are: None to very little (which is a fuzzy set over both negative and positive values of difference); Small amount, Moderate amount, Substantial amount, and Huge amount (which represent positive values of difference); and the antonyms ¬Small amount, ¬Moderate amount, ¬Substantial amount, and ¬Huge amount (which represent negative values of difference). The membership function of the antonym of a word $\tilde{D}$ in the vocabulary of amounts of difference in height is calculated by:

$\mu_{\neg \tilde{D}}(x) = \mu_{\tilde{D}}(-x)$   (8.45)

The words are depicted in Fig. 8.16. The similarities are summarized in Table 8.5.

[Figure 8.16: The vocabulary of interval type-2 fuzzy set models for amounts of difference.]

Table 8.5: Similarities between $\widetilde{AH}_2$ and members of the vocabulary of linguistic height differences

Linguistic height difference $\tilde{D}_i$    $s_J(\widetilde{AH}_2, \tilde{D}_i)$
¬Huge Amount (¬HA)            0
¬Substantial Amount (¬SA)     0
¬Moderate Amount (¬MA)        0
¬Small Amount (¬SmA)          0.0125
None to Very Little (NVL)     0.1031
Small Amount (SmA)            0.3476
Moderate Amount (MA)          0.2866
Substantial Amount (SA)       0.0477
Huge Amount (HA)              0.0000

Observe that the word with the highest similarity to $\widetilde{AH}_2$ is Small amount. Therefore, we can conclude that a linguistic solution to the problem is: "The difference between the average height of Swedes and the average height of Italians is a small amount." Note that since the problem statement includes the linguistic quantifier Most, this solution cannot be inferred directly from the problem statement.

8.8 Syllogistic Reasoning Using The Fuzzy Belief Measure for Advanced Computing with Words

In this section, we show how the extension of the broad belief interval to IT2 FSs can be used to solve an Advanced Computing with Words (ACWW) problem.

It was shown in [211, 215, 216] that syllogistic reasoning can be used to solve some of Zadeh's ACWW problems. In particular, syllogistic reasoning results in inference of averages or probabilities from Linguistic Belief Structures. As we mentioned in Sections 5.3.3 and 5.3.4, lower and upper probabilities can be inferred from belief structures with fuzzy probability mass assignments and fuzzy focal elements using FWAs and LWAs. However, it might be more desirable to infer just one probability word as the solution of an ACWW problem, since an answer in the form of "It is very probable that John is tall" is clearer and less uncertain than one in the form of "The probability that John is tall is between probable and extremely probable." Hence, we use the concept of a broad belief interval to infer a single linguistic probability as the solution to ACWW problems, instead of lower and upper probabilities. To demonstrate our methodology, we consider the following problem (called the PJS problem) [210, 220, 333]:

Probably John is tall. What is the probability that John is short?
It was shown in [210] that, using syllogistic reasoning, the following belief structure is induced by the PJS problem, from which the probability of the event Short has to be inferred:

$\tilde{B}_{Height} = \{(Tall, Probable), (not\ Tall, Improbable)\}$   (8.46)

where the probability mass assignments are selected from the vocabulary of linguistic probability words modeled by IT2 FSs. Those IT2 FSs are synthesized using data collected from subjects [213], employing the Enhanced Interval Approach (EIA) [286]. The focal element Tall is selected from a vocabulary of linguistic heights that was also synthesized using the EIA [220]. not Tall is the complement of Tall and is derived from the formulas $\overline{\mu}_{notTall}(x) = 1 - \underline{\mu}_{Tall}(x)$ and $\underline{\mu}_{notTall}(x) = 1 - \overline{\mu}_{Tall}(x)$, in which overlines and underlines denote upper and lower membership functions, respectively. The vocabulary of linguistic probabilities is shown in Fig. 8.17. The focal elements Tall and not Tall and the event Short are shown in Fig. 8.18. Note that not Tall has a non-convex LMF and UMF, so it is shown as the union of two IT2 FSs, not Tall1 and not Tall2, that have convex LMFs and UMFs. Also, note that Short only has non-zero overlap with not Tall1. It is easier to see whether the α-cuts of Short have non-zero overlaps with those of the convex fuzzy set not Tall1 (in fact, all of those α-cuts overlap).

To infer the probability of the event Short, instead of calculating the lower and upper probabilities of Short, we use the concept of the broad belief interval. We calculate the α-cuts of $\tilde{B}_{Height}$ to obtain belief structures with interval focal elements and interval probability mass assignments. Then we calculate the broad belief interval for each α-cut. The probability of the fuzzy event Short can be synthesized from the broad belief intervals at each level, and is shown in Fig. 8.19.

[Figure 8.17: Vocabulary of IT2 FSs representing linguistic probabilities.]

[Figure 8.18: Vocabulary of IT2 FSs representing the focal elements of $\tilde{B}_{Height}$.]

In order to map the probability of the event Short (Fig. 8.19) into a linguistic probability, we calculate its Jaccard similarity with the members of the vocabulary of probability words in Fig. 8.17. The Jaccard similarity between IT2 FSs $\tilde{A}$ and $\tilde{B}$ is calculated as:

$s_J(\tilde{A}, \tilde{B}) = \dfrac{\int_U \min(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\,du + \int_U \min(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\,du}{\int_U \max(\overline{\mu}_{\tilde{A}}(u), \overline{\mu}_{\tilde{B}}(u))\,du + \int_U \max(\underline{\mu}_{\tilde{A}}(u), \underline{\mu}_{\tilde{B}}(u))\,du}$   (8.47)

where U is the universe of discourse.

[Figure 8.19: Probability of the fuzzy event Short.]

The word with the highest Jaccard similarity to the probability of Short is Somewhat improbable. Therefore, the solution to the PJS problem is: It is somewhat improbable that John is short.
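A discretized sketch of the two computations used in this section, the IT2 complement behind not Tall and the IT2 Jaccard similarity (8.47); the membership arrays are placeholders sampled on a common grid:

```python
import numpy as np

def it2_complement(lmf, umf):
    """Complement of an IT2 FS: the new UMF is 1 - LMF and the new LMF
    is 1 - UMF, matching the formulas used above for not Tall."""
    return 1.0 - umf, 1.0 - lmf  # (new LMF, new UMF)

def jaccard_it2(a, b):
    """Jaccard similarity (8.47) between IT2 FSs a = (lmf, umf), b = (lmf, umf)."""
    num = np.sum(np.minimum(a[1], b[1])) + np.sum(np.minimum(a[0], b[0]))
    den = np.sum(np.maximum(a[1], b[1])) + np.sum(np.maximum(a[0], b[0]))
    return num / den

# decoded_word = max(vocab, key=lambda w: jaccard_it2(prob_short, vocab[w]))
```

The decoding line mirrors the T1 case: the solution is mapped to the vocabulary word that maximizes (8.47).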
Note that the lower and upper probabilities of the event Short can be calculated using (5.34) and (5.35); this has been discussed in more detail in [210]. The linguistic solution using the concept of lower and upper probabilities is: The probability that John is short is between zero and very improbable.

8.9 Conclusions and Future Work

In this chapter, we have provided solutions to Zadeh's Tall Swedes problem, which is formulated as: "Swedes who are more than twenty years old range in height from 140 centimeters to 220 centimeters. Most are tall. What is the average height of Swedes over twenty?" We presented Zadeh's solution to this problem and proposed a new approach using novel weighted averages. A syllogism incorporating fuzzy quantifiers was first used to interpret the problem. Next, separate vocabularies were constructed for the quantifier and the linguistic terms on height; the vocabularies can consist of type-1 or interval type-2 fuzzy sets. The average height of Swedes was then computed by novel weighted averages. The final answer for the average height of Swedes can be a number, which is the centroid of an FWA or the average centroid of an LWA together with its uncertainty band, or it can be a word in the vocabulary, mapped from the fuzzy weighted average or the linguistic weighted average. Compared with Zadeh's solution, our approach does not require solving complicated mathematical programs; it solves the problem entirely based on the facts given in the problem statement and people's understanding of the words "Most" and "Tall." Our solution makes use of the extension principle, as does his, but in a very different way.

In the future, the "robustness" of our results to changes in MFs and FOUs might be investigated.

In this chapter, we also applied Linguistic Weighted Averages to the calculation of linguistic probabilities to solve Zadeh's "Magnus" challenge problem. We demonstrated that applying syllogisms related to linguistic quantifiers to this problem gives rise to the derivation of linguistic upper and lower probabilities. This is in accord with the existing landscape of the Dempster-Shafer theory of evidence, which has been used for manipulation of imprecise probabilities in decision making problems. Forthcoming efforts have to be devoted to developing many aspects of linguistic evidence theory.

The use of Linguistic Weighted Averages appears to be promising for solving more complicated CWW problems that involve implicit assignment of linguistic constraints to variables like probability (and truth). This is mainly due to their inherent normalization, which prevents them from producing fuzzy probabilities whose membership functions are non-zero outside the interval [0, 1].

In this chapter, we translated an Advanced Computing with Words problem into belief structures with interval type-2 focal elements (Much taller and not Much taller) and interval type-2 fuzzy mass probabilities (Most and Few), by using fuzzy syllogisms. We used the LWA to calculate the expected value $\widetilde{AH}_1$ of a belief structure, which is an interval type-2 fuzzy set, and applied the fuzzy syllogisms again to obtain a belief structure whose focal elements are $\widetilde{AH}_1$ and not $\widetilde{AH}_1$ and whose interval type-2 mass probabilities are again Most and Few. Then we calculated the expected value of this belief structure and argued that it is the solution to the Swedes and Italians problem.
We used the centroid of the solution to yield a fuzzy numeric solution to the problem, and we used Jaccard's similarity measure to map the solution to a word in a vocabulary of linguistic height differences. We also used the concept of the broad belief interval and its extension to IT2 FSs to infer a single probability from a Linguistic Belief Structure associated with an ACWW problem.

Future research should be devoted to the fusion of conflicting information [65] using such belief structures and to performing operations on them; the results can then be applied to more complicated Advanced Computing with Words problems.

Chapter 9
Epilogue: Advanced Computing with Words: Status, Challenges, and Future

A book is never finished; it's abandoned.
Gene Fowler, American Journalist and Author.

Basic Computing with Words continues to be a prosperous area of research (for example, see the following recent papers: [39, 68, 103, 156, 161, 188, 200, 229, 230, 252, 315]). On the other hand, ACWW is in its infancy; therefore, it is necessary to investigate its current status, challenges, and future.

Zadeh [333] gives three rationales for CWW. He begins with the following three premises:

(i). Words are less precise than numbers;
(ii). Precision carries a cost; and
(iii). Numbers are respected, words are not.

He then proceeds with the following rationales:
However, in these examples, all of those probability distributions can easily be obtained by collecting data, from data on the Internet, or are already available on the Internet. When ACWW problems were initially posed the Internet did not exist and so easy access to much World Knowledge did not exist. This has changed dramatically, and we feel it cannot be ignored. Ask someone a question today and if they don’t know the answer they go to the Internet. Of course, one can argue that some of the information on the Internet is wrong, but this is changing and arguably does not deter a person from using the Internet. The implications of using the huge amount of data available through the Internet are very profound on ACWW. Consider the Tall Swedes Problem as a simple example. Today it is possible to find the answer to the question ”What is the average height of Swedes” on the Internet (e.g., [58]). The fact that ”Most Swedes are tall” is no longer relevant to this question. Interestingly, the same may or may not be true for the engineering version of this problem (Table 1.2) because a particular company may not make the relevant data available on the Internet; but, that will not deter a person from looking for the answer on the Internet because for a person to answer such a question they need some sort of data, numerical or linguistic, e.g., to answer the question: ”What is the average lifetime of an ipad” see [159] ( there are many more sites that can also be used.) In some problems whose intents are to implement everyday decision making using human knowledge, the posed questions can be answered without referring to the provided World Knowl- edge, by collecting data or performing observations. Examples of such problems are about risk, reliability, or lifetime of products for which it is safer to rely on collecting data about those vari- ables than to rely solely on the knowledge of some experts. This argument does not totally refute the plausibility of using ACWW for assessing the reliability or lifetime of a product, because the 281 lifetime of a product may be so long, or the product may be so expensive and rare that collecting data is not practical, in which case the best one may be able to do is to rely on expert knowledge. In that case, the credibility of the results may then be a matter of question. It is true that Zadeh’s problems are only prototypes of what can be addressed by ACWW, but given the advancements in data collection and information retrieval methodologies, we question whether one can justify using expert knowledge about probabilities instead of obtaining the exact or approximate probability distributions. We have already shown that more World Knowledge is needed than is stated in order to solve an ACWW problem, e.g. the kind of pdf that is associ- ated with a specific problem. Having such probability distributions, the answer to some of the prototype problems reduces to calculating the probability of a (fuzzy) event e.g., ”What is the probability that Robert is home before 6:15” can be answered by collecting data about Robert’s arrival times. Today, one can easily find what the travel time is between two addresses during a particular hour on the Internet (e.g., on Google Maps), without the need for expert knowledge. Focusing again on the Tall Swedes Problem, in order to solve the optimization problem in (6.23) one needs the density functionp H (h). 
It is not reasonable to use arbitrary density functions, because p_H(h) should be tied to the distribution of the heights of Swedes. But are these Swedish men, women, or both? More World Knowledge is needed. In fact, height distributions may be bimodal (see [232]). In any event, by going to the Internet (there are many sites about the heights of Swedes) it is possible to arrive at a family of pdfs that can be used for p_H(h). The same is true for the engineering version of this problem, where one needs to choose pdfs that are commensurate with a reliability lifetime problem. The site [178] is a good starting point. This argument may leave us with fewer instances for which ACWW is applicable, e.g., prediction of the probabilities of a future event, or judgments about the probability of an event about which data collection is extremely difficult or impossible (a conceivable example is a social, economic, or political process taking place in a very closed country about which little information is available except expert knowledge/judgment).

Not using data (numerical or linguistic) to solve an ACWW problem is, to us, analogous to providing an a priori probability distribution about something, i.e., it provides a starting answer, but one that gets replaced by answers that are based on facts (a posteriori distributions). Hopefully, as one acquires enough data, the answers converge to a fixed answer. Solutions to ACWW problems should exhibit similar behavior. We therefore feel that the time is right for experimental work to be done to establish how people solve ACWW problems, and to modify or eliminate the ones whose solutions either can be found entirely on the Internet or can be found by collecting data very easily.

This discussion leads us to two more questions:

(i). What does one do with the answer to an ACWW problem?
(ii). How does one validate the answer to an ACWW problem?

The second question may be easier to answer than the first one, i.e., we believe that the answer to an ACWW problem can only be validated by acquiring more data, because without data the answer is speculative, even if it has been obtained by using the GEP or syllogistic reasoning. Speculative answers may be okay, but they lead to the first question, which may sound facetious, but it is not.

Mendel [168] claims that a Turing-type test can be used to validate a solution to a CWW problem. In a Turing-type test, a human is required to provide an answer to the CWW problem; however, ACWW problems are so complex that a human might be unable to provide an answer. It is clear from our discussions in this dissertation that there are many steps required to solve an ACWW problem, after which it seems natural to us to ask: Is the solution "correct"? We know of no simple way to answer this question. In [220] we have suggested a relatively simple way to check whether or not the numerical solution to an ACWW problem may be correct. Our suggestion is to formulate a special version of the ACWW problem for which a human can provide the answer, and to see if the numerical solution to that problem agrees with the human's answer. For example: "Probably John is tall. What is the probability that John is tall?" Common sense says that the answer is "Probable," i.e., "It is probable that John is tall." An ACWW methodology must yield the answer "Probable" for this problem.
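The decoding step that this validation test exercises can be sketched as follows; the vocabulary of probability words, their triangular membership functions, and the "computed" answer below are all invented for illustration only:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and apex at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def jaccard(mu_a, mu_b):
    """Jaccard similarity between two discretized T1 fuzzy sets."""
    return np.sum(np.minimum(mu_a, mu_b)) / np.sum(np.maximum(mu_a, mu_b))

p = np.linspace(0.0, 1.0, 1001)
vocab = {  # hypothetical vocabulary of probability words
    "Improbable": tri(p, 0.0, 0.1, 0.3),
    "Somewhat probable": tri(p, 0.3, 0.5, 0.7),
    "Probable": tri(p, 0.6, 0.8, 1.0),
}
# Suppose the ACWW machinery returned this fuzzy probability for 'John is tall':
answer = tri(p, 0.65, 0.8, 0.95)
best_word = max(vocab, key=lambda w: jaccard(answer, vocab[w]))
print(best_word)  # the validation test passes only if this is 'Probable'
```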
It was shown in [220] that sometimes the discrepancy between the ranges of the parameters of the family of distributions (which is World Knowledge) and the membership functions of the probability words leads to the failure of such a validation process. Further research is needed to address this issue.

One answer to the first question may be: "I don't really care, because what I am really interested in is learning how a machine can provide a solution to this problem." To us, this is where the present state of ACWW is (e.g., [220]). There is nothing wrong with this, because by learning how a machine can provide a solution to an ACWW problem one may discover new things, and that's what research should be about. Another answer to this question may be: "I need the answer to the ACWW question in order to make a decision." For example, if I am designing a home in Sweden, I will need to know how tall a doorway should be; or, if Robert's wife has prepared a surprise birthday dinner party for him, she will need to know around what time to start cooking so that the food is not over-cooked. So, the intended use of the answer to the ACWW question informs the solution to the question (feedback is present). The builder of the home in Sweden cannot accept the answer "The average height of Swedes is pretty tall;" he needs numbers. Similarly, Robert's wife may not be able to accept the answer "Robert will be home somewhat close to 6:15;" she also needs numbers. Of course, such numbers can also be provided to them, but this requires that the World Knowledge involving "pretty tall" and "somewhat close" be accurate enough. Also, it is still unclear why one needs to use the given World Knowledge when the answer to some questions can be found on the Internet.

Zadeh's second rationale, Words are good enough, also loses its importance when data can be collected about a fact. Collecting data about many real-life events is becoming cheaper and more accessible, leaving us with fewer instances for which relying on intervals and words is necessary; however, words may be good enough, or even unavoidable, for problems that deal with subjective issues, e.g., assessing the quality of an article submitted for publication, the desirability of an option, or the possibility that something will occur. On the other hand, when data can be collected about an event and the probability of that event is considered, we question whether it is practical to adhere to this rationale and rely only on subjective judgment about the probability of that event.

Zadeh's third rationale, (the need for) Linguistic summarization, is valid almost always when something has to be reported to a human. No matter how knowledgeable or expert someone is, one needs to communicate one's ideas to other people in natural language, which introduces the issue of the imprecision of words. This rationale suggests that the subject of linguistic summarization [115, 191, 285, 298] is an important topic for ICWW. How to formulate linguistic summarizations so that they are also a part of ACWW is also an important direction for future research.

We have already mentioned that the GEP is an essential aggregation tool for ACWW, especially when dealing with probability constraints. Nevertheless, analytic solutions for the optimization problem in the GEP are, at present, generally hopeless when probability constraints are involved, and existing numerical algorithms (e.g., α-cut decomposition) are not directly applicable.
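To see what is involved computationally, here is a deliberately brute-force sketch in the spirit of the GEP with a probability constraint; it is not the algorithm of [220], and the normal family for p_H(h) and the membership functions for "tall" and "most" are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

h = np.linspace(140.0, 210.0, 701)
dh = h[1] - h[0]
mu_tall = np.clip((h - 170.0) / 15.0, 0.0, 1.0)         # illustrative 'tall'
mu_most = lambda q: np.clip((q - 0.5) / 0.4, 0.0, 1.0)  # illustrative 'most'

# Exhaustive search over an assumed normal family for p_H(h).
means = np.linspace(165.0, 195.0, 121)
stds = np.linspace(4.0, 12.0, 33)

y = np.linspace(160.0, 200.0, 201)   # candidate values of the average height
mu_answer = np.zeros_like(y)         # fuzzy answer built up by the sup

for m in means:
    for s in stds:
        p = norm(loc=m, scale=s).pdf(h)
        q_tall = np.sum(mu_tall * p) * dh   # Prob(tall) under this pdf
        w = mu_most(q_tall)                 # degree to which 'most Swedes are tall' holds
        j = int(np.argmin(np.abs(y - m)))   # this pdf's average height
        mu_answer[j] = max(mu_answer[j], w) # sup over all admissible pdfs
```

Even in this toy version the cost is the product of the two grid sizes, and it grows multiplicatively with every additional parameter, which is why eliminating the exhaustive search is singled out below as a challenge.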
In [220], some of Zadeh's challenge problems that involve linguistic probabilities are solved using a novel algorithm for implementing the GEP. A limitation of this algorithm is that it includes "exhaustive search" on the space of the parameters involved in the problem. Implementing the GEP without exhaustive search is another challenge for future research.

As stated in the previous chapters, we also believe that there is more than one way to solve ACWW problems, e.g., the GEP and syllogistic reasoning. Syllogistic reasoning does not need World Knowledge about probability distributions, but it needs World Knowledge about the domains of the variables involved in the problems (e.g., time and height). It also uses the Extension Principle, but in a different way from the GEP. Moreover, it involves the challenge of choosing appropriate compatibility measures [210], and it may require multiple computational methodologies for the fusion of inconsistent information [212]. Much more research is needed on this approach to solving ACWW problems.

Because the GEP is the main tool for manipulating Z-numbers [15, 118, 119, 198, 301, 343], the connections between ACWW and computing with Z-numbers are also an interesting topic for future research.

The reader may not agree with many aspects of the discussions that we have just presented; but that's okay, because we have made them to provoke new thinking about ACWW. We believe that the time is right for shaking the tree of ACWW to see what golden apples fall from it.

Appendix A

List of Patents and Publications Related to The Dissertation

In the following, a list of papers that are related to the dissertation and are in preparation, published, accepted, or submitted is given.

[1] M. R. Rajati and J. M. Mendel, "On advanced computing with words using the generalized extension principle for type-1 fuzzy sets," IEEE Transactions on Fuzzy Systems, vol. 22, no. 5, pp. 1245–1261, 2014.

[2] J. M. Mendel and M. R. Rajati, "On computing normalized interval type-2 fuzzy sets," IEEE Transactions on Fuzzy Systems, vol. 22, no. 5, pp. 1335–1340, 2014.

[3] M. R. Rajati, J. M. Mendel, and A. S. Popa, "Fuzzy decision support based on exact rule matching for liquid lift optimization," presented at the 2015 SPE Western Regional Meeting, Garden Grove, California, USA, April 27–30, 2015.

[4] M. R. Rajati, J. M. Mendel, A. S. Popa, and L. Brenskelle, "Linguistic goal oriented decision making," US Patent Application, 2015.

[5] M. R. Rajati and J. M. Mendel, "Extension of set functions to interval type-2 fuzzy sets: Applications to evidential reasoning," in Proceedings of 2014 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS). IEEE, 2014, pp. 1–8.

[6] J. M. Mendel and M. R. Rajati, "Advanced computing with words: Status, challenges, and future," accepted for publication in Fuzzy Logic: Towards The Future.

[7] M. R. Rajati and J. M. Mendel, "Advanced computing with words using syllogistic reasoning and arithmetic operations on linguistic belief structures," in Proceedings of 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013). IEEE, 2013.

[8] ——, "Novel weighted averages versus normalized sums in computing with words," Information Sciences, vol. 235, pp. 130–149, 2013.

[9] M. R. Rajati and J. M. Mendel, "Modeling linguistic probabilities and linguistic quantifiers using interval type-2 fuzzy sets," in Proceedings of 2013 Joint IFSA World Congress and NAFIPS Annual Meeting. IEEE, 2013, pp. 327–332.
[10] ——, "Lower and upper probability calculations using compatibility measures for solving Zadeh's challenge problems," in Proceedings of 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2012, pp. 1–8.

[11] ——, "Solving Zadeh's Swedes and Italians challenge problem," in Proceedings of 2012 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS). IEEE, 2012, pp. 1–6.

[12] M. R. Rajati, J. M. Mendel, and D. Wu, "Solving Zadeh's Magnus challenge problem on linguistic probabilities via linguistic weighted averages," in Proceedings of 2011 IEEE International Conference on Fuzzy Systems (IEEE FUZZ 2011). IEEE, 2011, pp. 2177–2184.

[13] M. R. Rajati, D. Wu, and J. M. Mendel, "On solving Zadeh's tall Swedes problem," in Proceedings of 2011 World Conference on Soft Computing (WCSC 2011), 2011.

[14] M. R. Rajati and J. M. Mendel, "Interval type-2 fuzzy sets as linguistic probabilities," in preparation.

[15] ——, "Uncertainty modeling and reasoning with linguistic belief structures for computing with words," in preparation.

[16] ——, "Linguistic goal oriented decision making with rule based systems," in preparation.

Appendix B

Some Proofs and Important Theorems

B.1 Properties of The Numeric Probability of T1 and IT2 Fuzzy Events

In this section, we review some of the properties of the numeric probability of a T1 fuzzy event [317]. We begin with some definitions:

Definition B.1 (Probabilistic product of two T1 fuzzy events). Assume that A and B are two fuzzy events. Their probabilistic product, A ·̃ B, is determined as

\mu_{A\,\tilde{\cdot}\,B}(x) = \mu_A(x)\,\mu_B(x)    (B.1)

Definition B.2 (Probabilistic sum of two T1 fuzzy events). Assume that A and B are two fuzzy events. Their probabilistic sum, A +̃ B, is determined as

\mu_{A\,\tilde{+}\,B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x)    (B.2)

The following properties hold for a T1 fuzzy event [317]:

(i). A ⊆ B ⇒ Prob(A) ≤ Prob(B).
(ii). Prob(A ∪ B) = Prob(A) + Prob(B) − Prob(A ∩ B).
(iii). Prob(A +̃ B) = Prob(A) + Prob(B) − Prob(A ·̃ B).

The definitions of probabilistic sum and probabilistic product can be extended to IT2 FSs as:

Definition B.3 (Probabilistic product of two IT2 fuzzy events). Assume that Ã and B̃ are two fuzzy events. Their probabilistic product, Ã ·̃ B̃, is determined as

\overline{\mu}_{\tilde{A}\,\tilde{\cdot}\,\tilde{B}}(x) = \overline{\mu}_{\tilde{A}}(x)\,\overline{\mu}_{\tilde{B}}(x), \qquad \underline{\mu}_{\tilde{A}\,\tilde{\cdot}\,\tilde{B}}(x) = \underline{\mu}_{\tilde{A}}(x)\,\underline{\mu}_{\tilde{B}}(x)    (B.3)

Since Prob_λ(Ã) = λ Prob(A̅) + (1 − λ) Prob(A̲), it is easy to show that:

(i). Ã ⊆ B̃ ⇒ Prob_λ(Ã) ≤ Prob_λ(B̃), where Ã ⊆ B̃ means that μ̲_Ã(x) ≤ μ̲_B̃(x) and μ̄_Ã(x) ≤ μ̄_B̃(x).
(ii). Prob_λ(Ã ∪ B̃) = Prob_λ(Ã) + Prob_λ(B̃) − Prob_λ(Ã ∩ B̃).
(iii). Prob_λ(Ã +̃ B̃) = Prob_λ(Ã) + Prob_λ(B̃) − Prob_λ(Ã ·̃ B̃).

Therefore, Prob_λ(·) has properties that are similar to those of the numeric probability measure of a T1 fuzzy event.

B.2 α-cut Decomposition Theorem

A T1 FS A is convex if

\mu_A(\lambda x_1 + (1-\lambda)x_2) \geq \min\left(\mu_A(x_1), \mu_A(x_2)\right)    (B.4)

∀x_1, x_2 ∈ X and ∀λ ∈ [0, 1]. According to this definition, A in Fig. B.1 is convex, but B is not.

[Figure B.1: A convex T1 FS A and its complement, B, which is also convex. The plot shows membership grades between 0 and 1 over x ∈ [0, 10].]

According to Klir and Yuan [127], the α-cut of a T1 FS A, denoted A(α), is an interval of real numbers, defined as:

A(\alpha) = \{x \mid \mu_A(x) \geq \alpha\} = [a(\alpha), b(\alpha)]    (B.5)

where 0 ≤ α ≤ 1. The α-cut Decomposition Theorem [127] introduced below can be applied to any convex T1 FS.

Theorem B.1 (Decomposition Theorem). Let A and A(α) be T1 FSs in X with A(α) defined in (B.5). Then [127]:

A = \bigcup_{\alpha \in [0,1]} \alpha A(\alpha)    (B.6)

where ⋃ denotes the standard fuzzy union (i.e., the sup over α ∈ [0, 1]) and αA(α) is the T1 FS whose membership function equals α on A(α) and zero elsewhere.

Note that because a T1 FS is described by its membership function, (B.6) is a commonly-used short-hand for

\mu_A(x) = \sup_{\alpha \in [0,1]} \mu_{\alpha A(\alpha)}(x) \qquad \forall x \in X.    (B.7)

Observe from this theorem that, if the α-cuts of a T1 FS can be determined for all α ∈ [0, 1], the T1 FS itself can be specified; therefore, determining a T1 FS is equivalent to determining its α-cuts for all α ∈ [0, 1].

One important application of the α-cut Decomposition Theorem is to compute some function of a T1 FS, or between several T1 FSs [127]. It gives exactly the same result as the one obtained by using Zadeh's Extension Principle.
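As a concrete illustration of this equivalence, the following sketch computes the sum of two triangular T1 fuzzy numbers α-cut by α-cut using interval arithmetic; the triangles are arbitrary examples, and the monotonicity of f(x, y) = x + y is what makes computing only the α-cut endpoints valid:

```python
import numpy as np

def alpha_cut_tri(a, b, c, alpha):
    """Alpha-cut [left, right] of a triangular fuzzy number (a, b, c)."""
    return a + alpha * (b - a), c - alpha * (c - b)

A = (1.0, 2.0, 3.0)   # arbitrary triangular fuzzy numbers
B = (4.0, 5.0, 7.0)
for alpha in np.linspace(0.0, 1.0, 6):
    al, ar = alpha_cut_tri(*A, alpha)
    bl, br = alpha_cut_tri(*B, alpha)
    # Interval arithmetic on each alpha-cut; by the Decomposition Theorem the
    # union of these weighted alpha-cuts is the fuzzy number A + B.
    print(f"alpha={alpha:.1f}: (A+B)(alpha) = [{al + bl:.2f}, {ar + br:.2f}]")
```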
B.3 Proof of Theorem 8.1

When A, B, and C are T1 FSs, f(A,C), f(B,C), f(A,C) ∪ f(B,C), and f(A∪B,C) are also T1 FSs. To show f(A∪B,C) = f(A,C) ∪ f(B,C), we only need to show that any f(x,y), where x is from the universe of discourse of A∪B and y is from the universe of discourse of C, has the same membership grade on f(A∪B,C) and on f(A,C) ∪ f(B,C). [B.1]

For ease of understanding, first consider the simplest case, where f(x,y) is a one-to-one mapping for both x and y. Let μ_A(x) and μ_B(x) be the membership grades of x on A and B, respectively, and μ_C(y) be the membership grade of y on C. According to the Extension Principle [316], (x,y) is mapped into the point (f(x,y), max(μ_A(x), μ_C(y))) on f(A,C), and into (f(x,y), max(μ_B(x), μ_C(y))) on f(B,C). The membership grade of f(x,y) on f(A,C) ∪ f(B,C) is the maximum of max(μ_A(x), μ_C(y)) and max(μ_B(x), μ_C(y)), i.e., (x,y) is mapped into the point (f(x,y), max(μ_A(x), μ_B(x), μ_C(y))) on f(A,C) ∪ f(B,C).

For f(A∪B,C), the membership grade of x on A∪B is max(μ_A(x), μ_B(x)). By applying the Extension Principle to f(A∪B,C), it follows that (x,y) is mapped into the point (f(x,y), max(μ_A(x), μ_B(x), μ_C(y))) on f(A∪B,C), i.e., the same as that on f(A,C) ∪ f(B,C).

Next consider the general case, where f(x,y) is a many-to-one mapping of both x and y, i.e., {(x_1,y_1), (x_2,y_2), ..., (x_n,y_n)} may be mapped into the same value f(x,y). Then, the membership grade of f(x,y) on f(A,C) becomes max_{i=1,...,n} max(μ_A(x_i), μ_C(y_i)), and the membership grade of f(x,y) on f(B,C) becomes max_{i=1,...,n} max(μ_B(x_i), μ_C(y_i)). Consequently, the membership grade of f(x,y) on f(A,C) ∪ f(B,C) is

\max\left(\max_{i=1,\ldots,n} \max(\mu_A(x_i), \mu_C(y_i)),\; \max_{i=1,\ldots,n} \max(\mu_B(x_i), \mu_C(y_i))\right) = \max_{i=1,\ldots,n}\left(\max(\mu_A(x_i), \mu_B(x_i), \mu_C(y_i))\right)

By applying the Extension Principle to f(A∪B,C), the membership grade of f(x,y) on f(A∪B,C) is max_{i=1,...,n}(max(μ_A(x_i), μ_B(x_i), μ_C(y_i))), i.e., again the same as that on f(A,C) ∪ f(B,C).

In summary, any f(x,y) has the same membership grade on f(A,C) ∪ f(B,C) and on f(A∪B,C). So, f(A∪B,C) = f(A,C) ∪ f(B,C).

[B.1] This proof was mainly worked out by Dr. Dongrui Wu.

B.4 Proof of Theorem 8.2

When Ã, B̃, and C̃ are IT2 FSs, f(Ã,C̃), f(B̃,C̃), f(Ã,C̃) ∪ f(B̃,C̃), and f(Ã∪B̃,C̃) are also IT2 FSs. [B.2] Furthermore, according to the Representation Theorem for IT2 FSs [177],

f(\tilde{A}, \tilde{C}) = \bigcup_{\forall A \in \tilde{A},\, C \in \tilde{C}} f(A, C)    (B.8)

f(\tilde{B}, \tilde{C}) = \bigcup_{\forall B \in \tilde{B},\, C \in \tilde{C}} f(B, C)    (B.9)

f(\tilde{A} \cup \tilde{B}, \tilde{C}) = \bigcup_{\forall A \in \tilde{A},\, B \in \tilde{B},\, C \in \tilde{C}} f(A \cup B, C)    (B.10)

where A, B, and C are embedded T1 FSs [B.3] of Ã, B̃, and C̃, respectively. It follows from (B.8) and (B.9) that

f(\tilde{A}, \tilde{C}) \cup f(\tilde{B}, \tilde{C}) = \left(\bigcup_{\forall A \in \tilde{A},\, C \in \tilde{C}} f(A, C)\right) \cup \left(\bigcup_{\forall B \in \tilde{B},\, C \in \tilde{C}} f(B, C)\right) = \bigcup_{\forall A \in \tilde{A},\, B \in \tilde{B},\, C \in \tilde{C}} f(A, C) \cup f(B, C)    (B.11)

According to Theorem 8.1, f(A,C) ∪ f(B,C) = f(A∪B,C); hence, f(Ã∪B̃,C̃) = f(Ã,C̃) ∪ f(B̃,C̃).

[B.2] This proof was mainly worked out by Dr. Dongrui Wu.

[B.3] Assume that Ã is an interval type-2 fuzzy set. An embedded type-1 fuzzy set of Ã is a type-1 fuzzy set A_e that satisfies μ̲_Ã(x) ≤ μ_{A_e}(x) ≤ μ̄_Ã(x), in which μ̲_Ã(x) and μ̄_Ã(x) respectively represent the lower membership function and the upper membership function of Ã. For notational simplicity, in the above proof, an embedded type-1 fuzzy set of Ã is denoted by A.

B.5 Proof that Atleast(Q) = Q When Q Is Monotonically Non-decreasing

The operator At least is defined by:

\mu_{Atleast(Q)}(x) = \sup_{y \leq x} \mu_Q(y), \qquad x, y \in [0, 1]    (B.12)

Since μ_Q(x) is monotonically non-decreasing, ∀y ≤ x, μ_Q(y) ≤ μ_Q(x). Therefore, sup_{y≤x} μ_Q(y) = μ_Q(x), and μ_{Atleast(Q)}(x) = μ_Q(x).

B.6 Existence of Solutions to The Optimization Problems of Equation (8.32)

The first optimization problem stated in (8.32) can be translated into the following optimization problem for each α-cut of E{B} [214]:

M_i(\alpha) = [a'_i(\alpha), b'_i(\alpha)]

E\{B\}(\alpha) = [z_L(\alpha), z_R(\alpha)]

z_L(\alpha) = \min_{p_i \in [a'_i(\alpha), b'_i(\alpha)],\; \sum_{i=1}^{n} p_i = 1} \sum_{i=1}^{n} p_i x_i

z_R(\alpha) = \max_{p_i \in [a'_i(\alpha), b'_i(\alpha)],\; \sum_{i=1}^{n} p_i = 1} \sum_{i=1}^{n} p_i x_i    (B.13)

in which E{B}(α) and M_i(α) represent the α-cuts of E{B} and M_i, respectively. Observe that if Σ_i a'_i(α) > 1, then neither optimization problem in (B.13) has a solution, since the constraints p_i ∈ [a'_i(α), b'_i(α)] and Σ_i p_i = 1 cannot be satisfied simultaneously.
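Since z_L(α) and z_R(α) in (B.13) are linear programs, they can be computed with an off-the-shelf LP solver once the α-cut bounds are known. The following sketch uses scipy.optimize.linprog; the bounds a'_i, b'_i and the values x_i are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative alpha-cut bounds [a'_i, b'_i] on the linguistic probabilities
# and values x_i of the variable; all numbers are placeholders.
a = np.array([0.1, 0.2, 0.3])
b = np.array([0.4, 0.5, 0.6])
x = np.array([1.0, 2.0, 3.0])

# Feasibility requires sum(a) <= 1 <= sum(b), as noted in the text above.
if a.sum() > 1.0 or b.sum() < 1.0:
    raise ValueError("p_i in [a_i, b_i] and sum(p) = 1 cannot hold simultaneously")

A_eq = np.ones((1, a.size))   # constraint: sum_i p_i = 1
b_eq = np.array([1.0])
bounds = list(zip(a, b))

z_L = linprog(c=x, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
z_R = -linprog(c=-x, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
print(f"E{{B}}(alpha) = [{z_L:.3f}, {z_R:.3f}]")
```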
BIBLIOGRAPHY

[1] Amazon Mechanical Turk. [Online]. Available: https://www.mturk.com/mturk/

[2] Human height. [Online]. Available: https://en.wikipedia.org/wiki/Human_height

[3] Steam injection (oil industry). [Online]. Available: http://en.wikipedia.org/wiki/Steam_injection_(oil_industry)

[4] S. Abe and M.-S. Lan, “A method for fuzzy rules extraction directly from numerical data and its application to pattern classification,” IEEE Transactions on Fuzzy Systems, vol. 3, no. 1, pp. 18–28, 1995.

[5] J. R. Aguero and A. Vargas, “Calculating functions of interval type-2 fuzzy numbers for fault current analysis,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 1, pp. 31–40, 2007.

[6] J. Aisbett, J. T. Rickard, and D. Morgenthaler, “Multivariate modeling and type-2 fuzzy sets,” Fuzzy Sets and Systems, vol. 163, no. 1, pp. 78–95, 2011.

[7] S. Aja-Fernández, R. de Luis-García, M. A. Martín-Fernández, and C. Alberola-López, “A computational TW3 classifier for skeletal maturity assessment. A computing with words approach,” Journal of Biomedical Informatics, vol. 37, no. 2, pp. 99–107, 2004.

[8] J. Alvarez and S. Han, “Current overview of cyclic steam injection process,” Journal of Petroleum Science Research, vol. 2, no. 3, pp. 48–63, July 2013.

[9] F. Aminravan, M. Hoorfar, R. Sadiq, A. Fransicque, H. Najjaran, and M. Rodriguez, “Interval belief structure rule-based system using extended fuzzy Dempster-Shafer inference,” in 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2011, pp. 3017–3022.

[10] F. Aminravan, R. Sadiq, M. Hoorfar, M. Rodriguez, and H. Najjaran, “Multicriteria information fusion using a fuzzy evidential rule-based framework,” in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2012, pp. 1890–1895.

[11] B. Araabi, N. Kehtarnavaz, and C. Lucas, “Restrictions imposed by the fuzzy extension of relations and functions,” Journal of Intelligent and Fuzzy Systems: Applications in Engineering and Technology, vol. 11, no. 2, pp. 9–22, 2001.

[12] R. Araújo and A. T. de Almeida, “Learning sensor-based navigation of a real mobile robot in unknown worlds,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 2, pp. 164–178, 1999.
[13] Z. Artstein, “Set-valued measures,” Transactions of the American Mathematical Society, vol. 165, pp. 103–125, 1972.

[14] K. J. Åström and B. Wittenmark, Adaptive Control. Courier Dover Publications, 2013.

[15] A. Azadeh, M. Saberi, N. Z. Atashbar, E. Chang, and P. Pazhoheshfar, “Z-AHP: A Z-number extension of fuzzy analytical hierarchy process,” in Proceedings of 2013 7th IEEE International Conference on Digital Ecosystems and Technologies (DEST). IEEE, 2013, pp. 141–147.

[16] P. Baranyi, L. T. Kóczy, and T. D. Gedeon, “A generalized concept for fuzzy rule interpolation,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 6, pp. 820–837, 2004.

[17] C. Barker, “Vagueness,” in Encyclopedia of Language and Linguistics, P. W. Simpson, Ed.

[18] O. Basir and X. Yuan, “Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory,” Information Fusion, vol. 8, no. 4, pp. 379–386, 2007.

[19] R. E. Bellman and L. A. Zadeh, “Decision-making in a fuzzy environment,” Management Science, vol. 17, no. 4, pp. B-141, 1970.

[20] H. R. Berenji, “Fuzzy Q-learning: a new approach for fuzzy dynamic programming,” in Proceedings of the Third IEEE Conference on Fuzzy Systems. IEEE, 1994, pp. 486–491.

[21] F. Berzal, J. C. Cubero, N. Marín, M. A. Vila, J. Kacprzyk, and S. Zadrożny, “A general framework for computing with words in object-oriented programming,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 15, no. supp01, pp. 111–131, 2007.

[22] F. Berzal, M. J. Martin-Bautista, M.-A. Vila, and H. L. Larsen, “Computing with words in information retrieval,” in Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2001. IEEE, 2001, pp. 3088–3092.

[23] I. Bloch, “Information combination operators for data fusion: A comparative review with classification,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 26, no. 1, pp. 52–67, 1996.

[24] A. Bonarini, (Studies in Fuzziness, 8), Physica-Verlag, Berlin, pp. 447–466, 1996.

[25] ——, “Delayed reinforcement, fuzzy Q-learning and fuzzy logic controllers,” in Genetic Algorithms and Soft Computing, ser. Studies in Fuzziness, F. Herrera and J. L. Verdegay, Eds. Physica Verlag, 1996, vol. 8, pp. 447–466.

[26] B. Brown, M. Chui, and J. Manyika, “Are you ready for the era of ‘big data’?” McKinsey Quarterly, vol. 4, pp. 24–35, 2011.

[27] J. Buckley, Fuzzy Probabilities: New Approach and Applications. Springer Verlag, 2005.

[28] R. H. Byrd, J. C. Gilbert, and J. Nocedal, “A trust region method based on interior point techniques for nonlinear programming,” Mathematical Programming, vol. 89, no. 1, pp. 149–185, 2000.

[29] F. Campos and F. de Souza, “Extending Dempster-Shafer theory to overcome counter intuitive results,” in Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE’05). IEEE, 2005, pp. 729–734.

[30] Y. Cao and G. Chen, “A fuzzy Petri-nets model for computing with words,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 486–499, 2010.

[31] Y. Cao, M. Ying, and G. Chen, “Retraction and generalized extension of computing with words,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 6, pp. 1238–1250, 2007.

[32] C. Chakraborty and D. Chakraborty, “Fuzzy rule base for consumer trustworthiness in internet marketing: An interactive fuzzy rule classification approach,” Intelligent Data Analysis, vol. 11, no. 4, pp. 339–353, 2007.
[33] J. L. Chameau and J. C. Santamarina, “Membership functions I: Comparing methods of measurement,” International Journal of Approximate Reasoning, vol. 1, no. 3, pp. 287–301, 1987.

[34] B. Chen, X. P. Liu, S. S. Ge, and C. Lin, “Adaptive fuzzy control of a class of nonlinear systems by fuzzy approximation approach,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 6, pp. 1012–1021, 2012.

[35] C. P. Chen, Y.-J. Liu, and G.-X. Wen, “Fuzzy neural network-based adaptive control for a class of uncertain nonlinear stochastic systems,” IEEE Transactions on Cybernetics, vol. 44, no. 5, pp. 583–593, 2014.

[36] F.-C. Chen, “Back-propagation neural networks for nonlinear self-tuning adaptive control,” IEEE Control Systems Magazine, vol. 10, no. 3, pp. 44–48, 1990.

[37] J. Chen and K. Otto, “Constructing membership functions using interpolation and measurement theory,” Fuzzy Sets and Systems, vol. 73, no. 3, pp. 313–327, 1995.

[38] M.-Y. Chen and D. A. Linkens, “Rule-base self-generation and simplification for data-driven fuzzy models,” in The 10th IEEE International Conference on Fuzzy Systems, vol. 1, 2001, pp. 424–427.

[39] S.-M. Chen, Y.-C. Chang, and J.-S. Pan, “Fuzzy rules interpolation for sparse fuzzy rule-based systems based on interval type-2 Gaussian fuzzy sets and genetic algorithms,” IEEE Transactions on Fuzzy Systems, vol. 21, no. 3, pp. 412–425, 2013.

[40] S. Chen, S. Billings, and P. Grant, “Non-linear system identification using neural networks,” International Journal of Control, vol. 51, no. 6, pp. 1191–1214, 1990.

[41] S.-J. Chen, C.-L. Hwang, and F. P. Hwang, Fuzzy Multiple Attribute Decision Making. Springer, 1992.

[42] S.-M. Chen, M.-S. Yeh, and P.-Y. Hsiao, “A comparison of similarity measures of fuzzy values,” Fuzzy Sets and Systems, vol. 72, no. 1, pp. 79–89, 1995.

[43] O. Cordón, M. J. del Jesus, and F. Herrera, “A proposal on reasoning methods in fuzzy rule-based classification systems,” International Journal of Approximate Reasoning, vol. 20, no. 1, pp. 21–45, 1999.

[44] I. Couso and L. Sánchez, “Upper and lower probabilities induced by a fuzzy random variable,” Fuzzy Sets and Systems, vol. 165, no. 1, pp. 1–23, 2011.

[45] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 2012.

[46] V. Cross and T. Sudkamp, Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. Physica Verlag, 2002.

[47] A. de Soto and E. Trillas, “On antonym and negate in fuzzy logic,” International Journal of Intelligent Systems, vol. 14, no. 3, pp. 295–303, 1999.

[48] M. Delgado, O. Duarte, and I. Requena, “An arithmetic approach for the computing with words paradigm,” International Journal of Intelligent Systems, vol. 21, no. 2, pp. 121–142, 2006.

[49] A. Dempster, “Upper and lower probabilities induced by a multivalued mapping,” Annals of Mathematical Statistics, vol. 38, no. 2, pp. 325–339, 1967.

[50] ——, “A generalization of Bayesian inference,” Journal of the Royal Statistical Society, Series B (Methodological), pp. 205–247, 1968.

[51] ——, “The Dempster–Shafer calculus for statisticians,” International Journal of Approximate Reasoning, vol. 48, no. 2, pp. 365–377, 2008.

[52] Y. Deng and F. T. S. Chan, “A new fuzzy Dempster MCDM method and its application in supplier selection,” Expert Systems with Applications, vol. 38, no. 8, pp. 9854–9861, 2011.

[53] T. Denœux, “Reasoning with imprecise belief structures,” International Journal of Approximate Reasoning, vol. 20, no. 1, pp. 79–111, 1999.
[54] ——, “Modeling vague beliefs using fuzzy-valued belief structures,” Fuzzy Sets and Systems, vol. 116, no. 2, pp. 167–199, 2000.

[55] ——, “Conjunctive and disjunctive combination of belief functions induced by nondistinct bodies of evidence,” Artificial Intelligence, vol. 172, no. 2, pp. 234–264, 2008.

[56] ——, “A k-nearest neighbor classification rule based on Dempster-Shafer theory,” in Classic Works of the Dempster-Shafer Theory of Belief Functions. Springer, 2008, pp. 737–760.

[57] Y. Ding and A. Lisnianski, “Fuzzy universal generating functions for multi-state system reliability assessment,” Fuzzy Sets and Systems, vol. 159, no. 3, pp. 307–324, 2008.

[58] Disabled World. (2008, October) Height chart of men and women in different countries. [Online]. Available: http://www.disabled-world.com/artman/publish/height-chart.shtml

[59] W.-M. Dong and F. S. Wong, “Fuzzy weighted averages and implementation of the extension principle,” Fuzzy Sets and Systems, vol. 21, no. 2, pp. 183–199, 1987.

[60] H. Doukas, C. Karakosta, and J. Psarras, “Computing with words to assess the sustainability of renewable energy options,” Expert Systems with Applications, vol. 37, no. 7, pp. 5491–5497, 2010.

[61] D. Dubois and H. Prade, “Various kinds of interactive addition of fuzzy numbers, application to decision analysis in presence of linguistic probabilities,” in Proceedings of 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, vol. 18. IEEE, 1979, pp. 783–787.

[62] ——, “Additions of interactive fuzzy numbers,” IEEE Transactions on Automatic Control, vol. 26, no. 4, pp. 926–936, 1981.

[63] ——, “Representation and combination of uncertainty with belief functions and possibility measures,” Computational Intelligence, vol. 4, no. 3, pp. 244–264, 1988.

[64] ——, “Rough fuzzy sets and fuzzy rough sets,” International Journal of General System, vol. 17, no. 2-3, pp. 191–209, 1990.

[65] ——, “Evidence, knowledge, and belief functions,” International Journal of Approximate Reasoning, vol. 6, no. 3, pp. 295–319, 1992.

[66] J. Dunyak, I. Saad, and D. Wunsch, “A theory of independent fuzzy probability for system reliability,” IEEE Transactions on Fuzzy Systems, vol. 7, no. 3, pp. 286–294, 1999.

[67] M. J. Er and C. Deng, “Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 3, pp. 1478–1489, 2004.

[68] M. Fazzolari, R. Alcalá, and F. Herrera, “A multi-objective evolutionary method for learning granularities based on fuzzy discretization to improve the accuracy-complexity trade-off of fuzzy rule-based classification systems: D-MOFARC algorithm,” Applied Soft Computing, vol. 24, pp. 470–481, 2014.

[69] R. Felix, “Goal-oriented control of VLSI design processes based on fuzzy sets,” in Proceedings of the Twentieth International Symposium on Multiple-Valued Logic, 1990. IEEE, 1990, pp. 386–393.

[70] R. Fleshman and O. Lekic, “Artificial lift for high-volume production,” Oilfield Review, vol. 11, pp. 48–63, 1999.

[71] M. C. Florea, A. L. Jousselme, É. Bossé, and D. Grenier, “Robust combination rules for evidence theory,” Information Fusion, vol. 10, no. 2, pp. 183–197, 2009.

[72] K. L. Fox, R. R. Henning, J. T. Farrell, and C. C. Miller, “System and method for assessing the security posture of a network using goal oriented fuzzy logic decision rules,” US Patent 6,883,101, 2005.
[73] C. Fu and S. Yang, “The combination of dependence-based interval-valued evidential reasoning approach with balanced scorecard for performance assessment,” Expert Systems with Applications, vol. 39, no. 3, pp. 3717–3730, 2012.

[74] ——, “The conjunctive combination of interval-valued belief structures from dependent sources,” International Journal of Approximate Reasoning, 2012.

[75] ——, “An evidential reasoning based consensus model for multiple attribute group decision analysis problems with interval-valued group consensus requirements,” European Journal of Operational Research, vol. 223, no. 1, pp. 167–176, 2012.

[76] ——, “Group consensus based on evidential reasoning approach using interval-valued belief structures,” Knowledge-Based Systems, vol. 35, pp. 167–176, 2012.

[77] P. Y. Glorennec, “Fuzzy Q-learning and dynamical fuzzy Q-learning,” in Proceedings of the Third IEEE Conference on Fuzzy Systems. IEEE, 1994, pp. 474–479.

[78] F. Gomide and W. Pedrycz, “A relational framework for approximate reasoning in truth space,” in Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2001, vol. 3. IEEE, 2001, pp. 1604–1607.

[79] R. P. Grimaldi, Discrete and Combinatorial Mathematics, 5th ed. Pearson Education India, 2006.

[80] S. Guadarrama, “A contribution to computing with words and perceptions,” Ph.D. dissertation, Technical University of Madrid, Madrid, 2007.

[81] S. Guadarrama and M. Garrido, “Concept-analyzer: A tool for analyzing fuzzy concepts,” in Proceedings of IPMU, vol. 8, 2008, pp. 1084–1089.

[82] Y. Y. Guh, C. C. Hon, and E. S. Lee, “Fuzzy weighted average: The linear programming approach via Charnes and Cooper’s rule,” Fuzzy Sets and Systems, vol. 117, no. 1, pp. 157–160, 2001.

[83] R. Haenni, “Are alternatives to Dempster’s rule of combination real alternatives? Comments on ‘About the belief function combination and the conflict management problem’—Lefevre et al.,” Information Fusion, vol. 3, no. 4, pp. 237–239, 2002.

[84] ——, “Shedding new light on Zadeh’s criticism of Dempster’s rule of combination,” in 2005 8th International Conference on Information Fusion, vol. 2. IEEE, 2005, 6 pp.

[85] M. T. Hagan, H. B. Demuth, M. H. Beale, and O. De Jesús, Neural Network Design, 2nd ed., 2014.

[86] M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.

[87] Y. Hagmayer and M. Osman, “From colliding billiard balls to colluding desperate housewives: causal Bayes nets as rational models of everyday causal reasoning,” Synthese, pp. 1–12, 2012.

[88] J. Halliwell, “Linguistic probability theory,” Ph.D. dissertation, School of Informatics, University of Edinburgh, UK, 2007.

[89] J. Halliwell and Q. Shen, “Linguistic probabilities: theory and application,” Soft Computing, vol. 13, no. 2, pp. 169–183, 2009.

[90] H. Hamrawi, “Type-2 fuzzy alpha-cuts,” Ph.D. dissertation, De Montfort University, 2011.

[91] H. Hamrawi and S. Coupland, “Type-2 fuzzy arithmetic using alpha-planes,” in Proceedings of 2009 IFSA-EUSFLAT Conference, 2009, pp. 606–611.

[92] S. Han and J. Mendel, “A new method for managing the uncertainties in evaluating multi-person multi-criteria location choices, using a perceptual computer,” Annals of Operations Research, vol. 195, no. 1, pp. 277–309, 2012.

[93] T. C. Havens, J. M. Keller, and M. Popescu, “Computing with words with the ontological self-organizing map,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 473–485, 2010.
[94] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall Inc.

[95] T. Heath and C. Bizer, “Linked data: Evolving the web into a global data space,” Synthesis Lectures on the Semantic Web: Theory and Technology, vol. 1, no. 1, pp. 1–136, 2011.

[96] L. Hegarat-Mascle, I. Bloch, and D. Vidal-Madjar, “Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 4, pp. 1018–1031, 1997.

[97] F. Herrera, S. Alonso, F. Chiclana, and E. Herrera-Viedma, “Computing with words in decision making: foundations, trends and prospects,” Fuzzy Optimization and Decision Making, vol. 8, no. 4, pp. 337–364, 2009.

[98] F. Herrera and L. Martínez, “A 2-tuple fuzzy linguistic representation model for computing with words,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 6, pp. 746–752, 2000.

[99] ——, “A model based on linguistic 2-tuples for dealing with multigranular hierarchical linguistic contexts in multi-expert decision-making,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 31, no. 2, pp. 227–234, 2001.

[100] E. Herrera-Viedma, G. Pasi, A. G. Lopez-Herrera, and C. Porcel, “Evaluating the information quality of web sites: A methodology based on fuzzy computing with words,” Journal of the American Society for Information Science and Technology, vol. 57, no. 4, pp. 538–549, 2006.

[101] Y.-C. Hsueh and S.-F. Su, “Learning error feedback design of direct adaptive fuzzy control systems,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 536–545, 2012.

[102] C. Hwang and M. Yang, “Generalization of belief and plausibility functions to fuzzy sets based on the Sugeno integral,” International Journal of Intelligent Systems, vol. 22, no. 11, pp. 1215–1228, 2007.

[103] H. Ishibuchi, S. Mihara, and Y. Nojima, “Parallel distributed hybrid fuzzy GBML models with rule set migration and training data rotation,” IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 355–368, 2013.

[104] H. Ishibuchi, K. Nozaki, and H. Tanaka, “Distributed representation of fuzzy rules and its application to pattern classification,” Fuzzy Sets and Systems, vol. 52, no. 1, pp. 21–32, 1992.

[105] H. Ishibuchi and T. Yamamoto, “Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining,” Fuzzy Sets and Systems, vol. 141, no. 1, pp. 59–88, 2004.

[106] ——, “Rule weight specification in fuzzy rule-based classification systems,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 428–435, 2005.

[107] M. Ishizuka, K. S. Fu, and J. T. P. Yao, “Inference procedures under uncertainty for the problem-reduction method,” Information Sciences, vol. 28, no. 3, pp. 179–206, 1982.

[108] P. Jaccard, “Étude comparative de la distribution florale dans une portion des Alpes et du Jura,” Bulletin de la Société Vaudoise des Sciences Naturelles, pp. 547–579, 1901.

[109] ——, “The distribution of the flora in the alpine zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.

[110] J.-S. Jang, “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.

[111] N. Jing, R. Xuemei, and Z. Dongdong, “Adaptive control for nonlinear pure-feedback systems with high-order sliding mode observer,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 3, pp. 370–382, 2013.
[112] A. Jøsang, J. Diaz, and M. Rifqi, “Cumulative and averaging fusion of beliefs,” Information Fusion, vol. 11, no. 2, pp. 192–200, 2010.

[113] C.-F. Juang and C.-H. Hsu, “Reinforcement interval type-2 fuzzy controller design by online rule generation and Q-value-aided ant colony optimization,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 6, pp. 1528–1542, 2009.

[114] C.-F. Juang and C.-T. Lin, “An online self-constructing neural fuzzy inference network and its applications,” IEEE Transactions on Fuzzy Systems, vol. 6, no. 1, pp. 12–32, 1998.

[115] J. Kacprzyk and R. R. Yager, “Linguistic summaries of data using fuzzy logic,” International Journal of General Systems, vol. 30, no. 2, pp. 133–154, 2001.

[116] J. Kacprzyk and S. Zadrożny, “Computing with words in decision making through individual and collective linguistic choice rules,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 9, no. supp01, pp. 89–102, 2001.

[117] ——, “Computing with words is an implementable paradigm: fuzzy queries, linguistic data summaries, and natural-language generation,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 461–472, 2010.

[118] B. Kang, D. Wei, Y. Li, and Y. Deng, “Decision making using Z-numbers under uncertain environment,” Journal of Computational Information Systems, vol. 8, no. 7, pp. 2807–2814, 2012.

[119] B. Kang, D. Wei, Y. Li, and Y. Deng, “A method of converting Z-number to classical fuzzy number,” Journal of Information & Computational Science, no. 3, pp. 703–709, 2012.

[120] I. Karimi and E. Hüllermeier, “Risk assessment system of natural hazards: A new approach based on fuzzy probability,” Fuzzy Sets and Systems, vol. 158, no. 9, pp. 987–999, 2007.

[121] N. N. Karnik and J. M. Mendel, “Centroid of a type-2 fuzzy set,” Information Sciences, vol. 132, no. 1, pp. 195–220, 2001.

[122] A. Kaufmann, M. M. Gupta, and A. Kaufmann, Introduction to Fuzzy Arithmetic: Theory and Applications. Van Nostrand Reinhold Company, 1985.

[123] R. Keefe, “Vagueness: Philosophical aspects,” in Encyclopedia of Language and Linguistics, P. W. Simpson, Ed.

[124] E. S. Khorasani, P. Patel, S. Rahimi, and D. Houle, “An inference engine toolkit for computing with words,” Journal of Ambient Intelligence and Humanized Computing, pp. 1–20, 2012.

[125] E. S. Khorasani, S. Rahimi, and W. Calvert, “Formalization of generalized constraint language: A crucial prelude to computing with words,” IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 246–258, 2013.

[126] E. S. Khorasani, S. Rahimi, P. Patel, and D. Houle, “CWJess: Implementation of an expert system shell for computing with words,” in 2011 Federated Conference on Computer Science and Information Systems (FedCSIS), 2011, pp. 33–39.

[127] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, 1995.

[128] L. T. Kóczy and K. Hirota, “Interpolative reasoning with insufficient evidence in sparse fuzzy rule bases,” Information Sciences, vol. 71, no. 1, pp. 169–201, 1993.

[129] L. T. Kóczy and K. Hirota, “Size reduction by interpolation in fuzzy rule bases,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 27, no. 1, pp. 14–25, 1997.

[130] S. Koenig and R. G. Simmons, “The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms,” Machine Learning, vol. 22, no. 1-3, pp. 227–250, 1996.
[131] G. Kong, D.-L. Xu, R. Body, J.-B. Yang, K. Mackway-Jones, and S. Carley, “A belief rule-based decision support system for clinical risk assessment of cardiac chest pain,” European Journal of Operational Research, vol. 219, no. 3, pp. 564–573, 2012.

[132] G. Kong, D.-L. Xu, X. Liu, and J.-B. Yang, “Applying a belief rule-base inference methodology to a guideline-based clinical decision support system,” Expert Systems, vol. 26, no. 5, pp. 391–408, 2009.

[133] B. Kosko, “Fuzzy cognitive maps,” International Journal of Man-Machine Studies, vol. 24, no. 1, pp. 65–75, 1986.

[134] ——, “Fuzzy systems as universal approximators,” IEEE Transactions on Computers, vol. 43, no. 11, pp. 1329–1333, 1994.

[135] A. K. Kostarigka and G. A. Rovithakis, “Adaptive dynamic output feedback neural network control of uncertain MIMO nonlinear systems with prescribed performance,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 1, pp. 138–149, 2012.

[136] B. Kostek, “‘Computing with words’ concept applied to musical information retrieval,” Electronic Notes in Theoretical Computer Science, vol. 82, no. 4, pp. 141–152, 2003.

[137] B. Kovalerchuk, “Linguistic context spaces: necessary frames for correct approximate reasoning,” International Journal of General System, vol. 25, no. 1, pp. 61–80, 1996.

[138] M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic, Nonlinear and Adaptive Control Design. Wiley, 1995.

[139] F. Latif, S. Griston-Castrup, and A. Al Kalbani, “Field evaluation of MOV adjustable steam chokes,” in Proceedings of SPE Western Regional Meeting. Society of Petroleum Engineers, 2012.

[140] J. Lawry and Y. Tang, “Uncertainty modelling for vague concepts: A prototype theory approach,” Artificial Intelligence, vol. 173, no. 18, pp. 1539–1558, 2009.

[141] J. Lawry, “An alternative approach to computing with words,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 9, no. supp01, pp. 3–16, 2001.

[142] ——, “A methodology for computing with words,” International Journal of Approximate Reasoning, vol. 28, no. 2, pp. 51–89, 2001.

[143] E. Lee and Q. Zhu, “An interval Dempster-Shafer approach,” Computers & Mathematics with Applications, vol. 24, no. 7, pp. 89–95, 1992.

[144] K.-H. Lee, J.-H. Bang, I.-M. Lee, and Y.-J. Shin, “Use of fuzzy probability theory to assess spalling occurrence in underground openings,” International Journal of Rock Mechanics and Mining Sciences, vol. 64, pp. 60–67, 2013.

[145] E. Lefevre, O. Colot, and P. Vannoorenberghe, “Belief function combination and conflict management,” Information Fusion, vol. 3, no. 2, pp. 149–162, 2002.

[146] W.-X. Li, S.-J. Liu, J.-F. Li, Z.-H. Ji, Q. Wang, and X. Yin, “Ground movement analysis in deep iron mine using fuzzy probability theory,” Applied Mathematical Modelling, vol. 37, no. 1, pp. 345–356, 2013.

[147] W. Li, J. Liu, H. Wang, A. Calzada, R. M. Rodriguez, and L. Martinez, “A qualitative decision making model based on belief linguistic rule based inference methodology,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 20, no. supp01, pp. 105–118, 2012.

[148] C.-T. Lin, “A neural fuzzy control system with structure and parameter learning,” Fuzzy Sets and Systems, vol. 70, no. 2, pp. 183–212, 1995.

[149] C.-T. Lin and C.-P. Jou, “GA-based fuzzy reinforcement learning for control of a magnetic bearing system,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 30, no. 2, pp. 276–289, 2000.
[150] F. Liu and J. M. Mendel, “Aggregation using the fuzzy weighted average as computed by the Karnik–Mendel algorithms,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 1, pp. 1–12, 2008.

[151] ——, “Encoding words into interval type-2 fuzzy sets using an interval approach,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1503–1521, 2008.

[152] H. Liu, P. Hu, Y. Luo, and C. Li, “A goal-oriented fuzzy reactive control method for mobile robot navigation in unknown environment,” in IEEE International Symposium on Industrial Electronics, 2009 (ISIE 2009). IEEE, 2009, pp. 1950–1955.

[153] J. Liu, L. Martinez, D. Ruan, R. Rodriguez, and A. Calzada, “Optimization algorithm for learning consistent belief rule-base from examples,” Journal of Global Optimization, vol. 51, no. 2, pp. 255–270, 2011.

[154] J. Liu, L. Martínez, H. Wang, R. M. Rodríguez, and V. Novozhilov, “Computing with words in risk assessment,” International Journal of Computational Intelligence Systems, vol. 3, no. 4, pp. 396–419, 2010.

[155] Y.-J. Liu, S. Tong, and C. P. Chen, “Adaptive fuzzy control via observer design for uncertain nonlinear systems with unmodeled dynamics,” IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, pp. 275–288, 2013.

[156] V. López, S. del Río, J. M. Benítez, and F. Herrera, “Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data,” Fuzzy Sets and Systems, vol. 258, pp. 5–38, 2015.

[157] C. Lucas and B. N. Araabi, “Generalization of the Dempster-Shafer theory: a fuzzy-valued measure,” IEEE Transactions on Fuzzy Systems, vol. 7, no. 3, pp. 255–270, 1999.

[158] C. Lynch, “Big data: How do your data grow?” Nature, vol. 455, no. 7209, pp. 28–29, 2008.

[159] MacRumors: news and rumors you care about. (2012, March) What’s the lifespan of an iPad. [Online]. Available: http://forums.macrumors.com/showthread.php?t=1339480

[160] E. H. Mamdani and S. Assilian, “An experiment in linguistic synthesis with a fuzzy logic controller,” International Journal of Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.

[161] M. Mandal, A. Mukhopadhyay, and U. Maulik, “Fuzzy rule-based classifier for microarray gene expression data by using a multiobjective PSO-based approach,” in 2013 IEEE International Conference on Fuzzy Systems. IEEE, 2013, pp. 1–7.

[162] M. Margaliot and G. Langholz, “Fuzzy control of a benchmark problem: a computing with words approach,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 2, pp. 230–235, 2004.

[163] L. Martínez, D. Ruan, and F. Herrera, “Computing with words in decision support systems: An overview on models and applications,” International Journal of Computational Intelligence Systems, vol. 3, no. 4, pp. 382–395, 2010.

[164] J. M. Mendel, “The perceptual computer: An architecture for computing with words,” in Proceedings of The 10th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2001), vol. 1. IEEE, 2001, pp. 35–38.

[165] ——, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice-Hall PTR, 2001.

[166] ——, “An architecture for making judgments using computing with words,” International Journal of Applied Mathematics and Computer Science, vol. 12, no. 3, pp. 325–336, 2002.

[167] ——, “Computing with words and its relationships with fuzzistics,” Information Sciences, vol. 177, no. 4, pp. 988–1006, 2007.

[168] ——, “Computing with words: Zadeh, Turing, Popper and Occam,” IEEE Computational Intelligence Magazine, vol. 2, no. 4, pp. 10–17, 2007.
[169] ——, “On answering the question ‘Where do I start in order to solve a new problem involving interval type-2 fuzzy sets?’,” Information Sciences, vol. 179, no. 19, pp. 3418–3431, 2009.

[170] J. M. Mendel, J. Lawry, and L. A. Zadeh, “Foreword to the special section on computing with words,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 437–440, 2010.

[171] J. M. Mendel and D. Wu, “Perceptual reasoning for perceptual computing,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1550–1564, 2008.

[172] ——, “Computing with words for hierarchical and distributed decision making,” in Computational Intelligence in Complex Decision Systems. Paris, France: Atlantis, 2009.

[173] ——, Perceptual Computing: Aiding People in Making Subjective Judgments. Wiley-IEEE Press, 2010.

[174] ——, “Challenges for perceptual computer applications and how they were overcome,” IEEE Computational Intelligence Magazine, vol. 7, no. 3, pp. 36–47, 2012.

[175] J. M. Mendel, L. A. Zadeh, E. Trillas, R. R. Yager, J. Lawry, H. Hagras, and S. Guadarrama, “What computing with words means to me [discussion forum],” IEEE Computational Intelligence Magazine, vol. 5, no. 1, pp. 20–26, 2010.

[176] J. M. Mendel, “Fuzzy logic systems for engineering: a tutorial,” Proceedings of the IEEE, vol. 83, no. 3, pp. 345–377, 1995.

[177] J. M. Mendel and R. B. John, “Type-2 fuzzy sets made simple,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 117–127, 2002.

[178] Minitab. (2013) Technical support document: Distribution models for reliability data. [Online]. Available: http://www.minitab.com/support/documentation/Answers/Reliability Distribution.pdf

[179] B. Möller, W. Graf, and M. Beer, “Safety assessment of structures in view of fuzzy randomness,” Computers & Structures, vol. 81, no. 15, pp. 1567–1582, 2003.

[180] J. Mun, M. Shin, and M. Jung, “A goal-oriented trust model for virtual organization creation,” Journal of Intelligent Manufacturing, vol. 22, no. 3, pp. 345–354, 2011.

[181] J. Mun, M. Shin, K. Lee, and M. Jung, “Manufacturing enterprise collaboration based on a goal-oriented fuzzy trust evaluation model in a virtual enterprise,” Computers & Industrial Engineering, vol. 56, no. 3, pp. 888–901, 2009.

[182] C. K. Murphy, “Combining belief functions when evidence conflicts,” Decision Support Systems, vol. 29, no. 1, pp. 1–9, 2000.

[183] T. Nakama, E. Trillas, and I. García-Honrado, “Axiomatic investigation of fuzzy probabilities,” in Soft Computing in Humanities and Social Sciences, R. Seiding and V. S. González, Eds. Springer, 2012, vol. 273, pp. 125–140.

[184] S. Nanda and S. Majumdar, “Fuzzy rough sets,” Fuzzy Sets and Systems, vol. 45, no. 2, pp. 157–160, 1992.

[185] K. S. Narendra and S. Mukhopadhyay, “Adaptive control using neural networks and approximate models,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 475–485, 1997.

[186] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.

[187] D. Nauck and R. Kruse, “How the learning of rule weights affects the interpretability of fuzzy systems,” in Proceedings of The 1998 IEEE International Conference on Fuzzy Systems, vol. 2, 1998, pp. 1235–1240.

[188] C. H. Nguyen, W. Pedrycz, T. L. Duong, and T. S. Tran, “A genetic design of linguistic terms for fuzzy rule based classifiers,” International Journal of Approximate Reasoning, vol. 54, no. 1, pp. 1–21, 2013.
[189] H. T. Nguyen and V. Kreinovich, “Nested intervals and sets: concepts, relations to fuzzy sets, and applications,” in Applications of Interval Computations, ser. Applied Optimization, R. Baker Kearfott and V. Kreinovich, Eds. Kluwer Academic Publishers, Dordrecht, Netherlands, 1996, vol. 3, pp. 245–290.

[190] H. Nguyen, V. Kreinovich, and Q. Zuo, “Interval-valued degrees of belief: applications of interval computations to expert systems and intelligent control,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 5, no. 3, pp. 317–358, 1997.

[191] A. Niewiadomski, “A type-2 fuzzy approach to linguistic summarization of data,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 1, pp. 198–212, 2008.

[192] A. Norwich and I. Turksen, “A model for the measurement of membership and the consequences of its empirical implementation,” Fuzzy Sets and Systems, vol. 12, no. 1, pp. 1–25, 1984.

[193] V. Novak, “Antonyms and linguistic quantifiers in fuzzy logic,” Fuzzy Sets and Systems, vol. 124, no. 3, pp. 335–351, 2001.

[194] V. Novák and P. Murinová, “Intermediate quantifiers of natural language and their syllogisms,” in Proceedings of World Conference on Soft Computing, 2011, pp. 178–184.

[195] K. Nozaki, H. Ishibuchi, and H. Tanaka, “A simple but powerful heuristic method for generating fuzzy rules from numerical data,” Fuzzy Sets and Systems, vol. 86, no. 3, pp. 251–270, 1997.

[196] H. Ogawa, K. S. Fu, and J. T. P. Yao, “An inexact inference for damage assessment of existing structures,” International Journal of Man-Machine Studies, vol. 22, no. 3, pp. 295–306, 1985.

[197] N. R. Pal and T. Pal, “On rule pruning using fuzzy neural networks,” Fuzzy Sets and Systems, vol. 106, no. 3, pp. 335–347, 1999.

[198] S. K. Pal, R. Banerjee, S. Dutta, and S. S. Sarma, “An insight into the Z-number approach to CWW,” Fundamenta Informaticae, vol. 124, no. 1, pp. 197–229, 2013.

[199] S. K. Pal, L. Polkowski, and A. Skowron, Rough Neural Computing: Techniques for Computing with Words. Springer, 2004.

[200] D. P. Pancho, J. M. Alonso, O. Cordón, A. Quirin, and L. Magdalena, “Fingrams: visual representations of fuzzy rule-based inference for expert analysis of comprehensibility,” IEEE Transactions on Fuzzy Systems, vol. 21, no. 6, pp. 1133–1149, 2013.

[201] K. M. Passino, S. Yurkovich, and M. Reinfrank, Fuzzy Control. Addison Wesley, 1998.

[202] P. Patel, E. Khorasani, and S. Rahimi, “An API for generalized constraint language based expert system,” in Proceedings of 2012 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS). IEEE, 2012, pp. 1–6.

[203] Z. Pawlak, S. Wong, and W. Ziarko, “Rough sets: Probabilistic versus deterministic approach,” International Journal of Man-Machine Studies, 1988.

[204] W. Pedrycz, Fuzzy Control and Fuzzy Systems, 2nd ed. Research Studies Press Ltd., 1993.

[205] ——, Knowledge-Based Clustering: From Data to Information Granules. John Wiley & Sons, 2005.

[206] W. Pedrycz and S. H. Rubin, “Data compactification and computing with words,” Engineering Applications of Artificial Intelligence, vol. 23, no. 3, pp. 346–356, 2010.

[207] L. Polkowski and A. Skowron, “Calculi of granules based on rough set theory: approximate distributed synthesis and granular semantics for computing with words,” in New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. Springer, 1999, pp. 20–28.

[208] M. M. Polycarpou, “Stable adaptive neural control scheme for nonlinear systems,” IEEE Transactions on Automatic Control, vol. 41, no. 3, pp. 447–451, 1996.
[209] D. Qiu and H. Wang, “A probabilistic model of computing with words,” Journal of Computer and System Sciences, vol. 70, no. 2, pp. 176–200, 2005.

[210] M. R. Rajati and J. M. Mendel, “Lower and upper probability calculations using compatibility measures for solving Zadeh’s challenge problems,” in 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2012, pp. 1–8.

[211] ——, “Solving Zadeh’s Swedes and Italians challenge problem,” in Proceedings of 2012 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS). IEEE, 2012, pp. 1–6.

[212] ——, “Advanced computing with words using syllogistic reasoning and arithmetic operations on linguistic belief structures,” in Proceedings of 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013). IEEE, 2013.

[213] ——, “Modeling linguistic probabilities and linguistic quantifiers using interval type-2 fuzzy sets,” in Proceedings of 2013 Joint IFSA World Congress and NAFIPS Annual Meeting. IEEE, 2013.

[214] ——, “Novel weighted averages versus normalized sums in computing with words,” Information Sciences, vol. 235, pp. 130–149, 2013.

[215] M. R. Rajati, J. M. Mendel, and D. Wu, “Solving Zadeh’s Magnus challenge problem on linguistic probabilities via linguistic weighted averages,” in Proceedings of 2011 IEEE International Conference on Fuzzy Systems (IEEE FUZZ 2011). IEEE, 2011, pp. 2177–2184.

[216] M. R. Rajati, D. Wu, and J. M. Mendel, “On solving Zadeh’s tall Swedes problem,” in Proceedings of 2011 World Conference on Soft Computing, 2011.

[217] M. R. Rajati and J. M. Mendel, “Interval type-2 fuzzy sets as linguistic probabilities,” in preparation, 2013.

[218] ——, “Novel weighted averages versus normalized sums in computing with words,” Information Sciences, vol. 235, pp. 130–149, 2013.

[219] ——, “Uncertainty modeling and reasoning with linguistic belief structures for computing with words,” in preparation, 2013.

[220] ——, “On advanced computing with words using the generalized extension principle for type-1 fuzzy sets,” IEEE Transactions on Fuzzy Systems, vol. 22, no. 5, pp. 1245–1261, 2014.

[221] M. R. Rajati, D. Wu, and J. M. Mendel, “On solving Zadeh’s tall Swedes problem,” in Proceedings of 2011 World Conference on Soft Computing (WCSC 2011), 2011.

[222] M. Reformat and C. Ly, “Ontological approach to development of computing with words based systems,” International Journal of Approximate Reasoning, vol. 50, no. 1, pp. 72–91, 2009.

[223] H. Robbins, “On the measure of a random set,” The Annals of Mathematical Statistics, vol. 15, no. 1, pp. 70–74, 1944.

[224] F. Rottensteiner, J. Trinder, S. Clode, and K. Kubik, “Using the Dempster–Shafer method for the fusion of LIDAR data and multi-spectral images for building detection,” Information Fusion, vol. 6, no. 4, pp. 283–300, 2005.

[225] G. A. Rovithakis and M. A. Christodoulou, Adaptive Control with Recurrent High-Order Neural Networks. Springer, 2000.

[226] S. H. Rubin, “Computing with words,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 4, pp. 518–524, 1999.

[227] A. Saffiotti, E. H. Ruspini, and K. Konolige, “Blending reactivity and goal-directedness in a fuzzy controller,” in Second IEEE International Conference on Fuzzy Systems, 1993. IEEE, 1993, pp. 134–139.

[228] E. Sanchez, “Truth-qualification and fuzzy relations in natural languages, application to medical diagnosis,” Fuzzy Sets and Systems, vol. 84, no. 2, pp. 155–167, 1996.
Sanz, A. Fern´ andez, H. Bustince, and F. Herrera, “IVTURS: A linguistic fuzzy rule- based classification system based on a new interval-valued fuzzy reasoning method with tuning and rule selection.” IEEE Transactions on Fuzzy Systems, vol. 21, no. 3, pp. 399– 411, 2013. [230] J. A. Sanz, M. Galar, A. Jurio, A. Brugos, M. Pagola, and H. Bustince, “Medical diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based classification system,” Applied Soft Computing, vol. 20, pp. 103–111, 2014. [231] M. Sayyouh and M. Al-Blehed, “Using bacteria to improve oil recovery from arabian fields,” in Microbial Enhancement of Oil Recovery-Recent Advancements, ser. Develop- ments in Petroleum Science, W. A. Premuzic, E. T., Ed. Elsevier, 1993, vol. 39, pp. 397–416. [232] M. Schilling, A. Watkins, and W. Watkins, “Is human height bimodal?” The American Statistician, vol. 56, no. 3, pp. 223–229, 2002. 313 [233] R. Seising, “On the absence of strict boundaries, vagueness, haziness, and fuzziness in philosophy, science, and medicine,” Applied Soft Computing, vol. 8, no. 3, pp. 1232–1242, 2008. [234] M. Serrano and J. Sampaio do Prado Leite, “Dealing with softgoals at runtime: A fuzzy logic approach,” in Proceedings of 2011 2nd International Workshop on Requirements@ Run. Time (RE@ RunTime). IEEE, 2011, pp. 23–31. [235] M. Setnes, R. Babuˆ ska, U. Kaymak, and H. R. van Nauta Lemke, “Similarity measures in fuzzy rule base simplification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 376–386, 1998. [236] G. Shafer, A mathematical theory of evidence. Princeton University Press, 1976, vol. 76. [237] R. Shahnazi and M.-R. Akbarzadeh-T, “PI adaptive fuzzy control with large and fast dis- turbance rejection for a class of uncertain nonlinear systems,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 1, pp. 187–197, 2008. [238] T. Shaocheng, T. Jiantao, and W. Tao, “Fuzzy adaptive control of multivariable nonlinear systems1,” Fuzzy Sets and Systems, vol. 111, no. 2, pp. 153–167, 2000. [239] X.-S. Si, C.-H. Hu, J.-B. Yang, and Z.-J. Zhou, “A new prediction model based on belief rule base for system’s behavior prediction,” IEEE Transactions on Fuzzy Systems, vol. 19, no. 4, pp. 636–651, 2011. [240] P. Smets, “The combination of evidence in the transferable belief model,” IEEE Transac- tions on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 447–458, 1990. [241] ——, “Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem,” International Journal of Approximate Reasoning, vol. 9, no. 1, pp. 1–35, 1993. [242] ——, “Belief functions on real numbers,” International Journal of Approximate Reason- ing, vol. 40, no. 3, pp. 181–223, 2005. [243] ——, “Analyzing the combination of conflicting belief functions,” Information Fusion, vol. 8, no. 4, pp. 387–412, 2007. [244] J. T. Spooner and K. M. Passino, “Stable adaptive control using fuzzy systems and neural networks,” IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 339–359, 1996. [245] Z. Su, P. Wang, X. Yu, and Z. Lv, “Maximal confidence intervals of the interval-valued belief structure and applications,” Information Sciences, 2011. [246] H. Surmann, J. Huser, and L. Peters, “A fuzzy system for indoor mobile robot navigation,” in Proceedings of 1995 International Joint Conference of the Fourth IEEE International Conference on Fuzzy Systems and The Second International Fuzzy Engineering Sympo- sium, vol. 1. IEEE, 1995, pp. 83–88. 314 [247] W.-S. Tai and C.-T. 
Chen, “A new evaluation model for intellectual capital based on com- puting with linguistic variable,” Expert Systems with Applications, vol. 36, no. 2, pp. 3483– 3488, 2009. [248] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling and control,” IEEE Transactions on Systems, Man and Cybernetics, no. 1, pp. 116–132, 1985. [249] H. Tanaka, L. Fan, F. Lai, and K. Toguchi, “Fault-tree analysis by fuzzy probability,” IEEE Transactions on Reliability, vol. 32, no. 5, pp. 453–457, 1983. [250] F. Tatari, M. Akbarzadeh-T, and A. Sabahi, “Fuzzy-probabilistic multi agent system for breast cancer risk assessment and insurance premium assignment,” Journal of Biomedical Informatics, 2012. [251] M. E. Tennyson, “Growth history of oil reserves in major california oil fields during the twentieth century,” US Geological Survey, 2005. [252] B. Thomas and G. Raju, “A novel unsupervised fuzzy clustering method for preprocessing of quantitative attributes in association rule mining,” Information Technology and Manage- ment, vol. 15, no. 1, pp. 9–17, 2014. [253] D. Tikk and P. Baranyi, “Comprehensive analysis of a new fuzzy rule interpolation method,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 3, pp. 281–296, 2000. [254] E. Trillas, C. Moraga, S. Guadarrama, S. Cubillo, and E. Casti˜ neira, “Computing with antonyms,” in Forging New Frontiers: Fuzzy Pioneers I, ser. Studies in Fuzziness and Soft Computing, M. Nikravesh, J. Kacprzyk, and L. A. Zadeh, Eds. Springer Berlin Heidelberg New York, 2007, vol. 217, pp. 133–153. [255] D.-L. Tsay, H.-Y . Chung, and C.-J. Lee, “The adaptive control of nonlinear systems using the sugeno-type of fuzzy logic,” IEEE Transactions on Fuzzy Systems, vol. 7, no. 2, pp. 225–229, 1999. [256] I. B. T¨ urks ¸en, “Meta-linguistic axioms as a foundation for computing with words,” Infor- mation Sciences, vol. 177, no. 2, pp. 332–359, 2007. [257] I. Vlachos and G. Sergiadis, “Subsethood, entropy, and cardinality for interval-valued fuzzy sets–an algebraic derivation,” Fuzzy Sets and Systems, vol. 158, no. 12, pp. 1384–1396, 2007. [258] T. Wallsten and D. Budescu, “A review of human linguistic probability processing: General principles and empirical evidence,” The Knowledge Engineering Review, vol. 10, no. 1, pp. 43–62, 1995. [259] F.-Y . Wang, Y . Lin, and J. B. Pu, “Linguistic dynamic systems and computing with words for complex systems,” in IEEE International Conference on Systems, Man, and Cybernet- ics, vol. 4. IEEE, 2000, pp. 2399–2404. 315 [260] H. Wang and D. Qiu, “Computing with words via turing machines: A formal approach,” IEEE Transactions on Fuzzy Systems, vol. 11, no. 6, pp. 742–753, 2003. [261] J. Wang, “A subjective methodology for safety analysis of safety requirements specifica- tions,” IEEE Transactions on Fuzzy Systems, vol. 5, no. 3, pp. 418–430, 1997. [262] J.-H. Wang and J. Hao, “An approach to computing with words based on canonical char- acteristic values of linguistic labels,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 4, pp. 593–604, 2007. [263] L.-X. Wang, “Stable adaptive fuzzy control of nonlinear systems,” IEEE Transactions on Fuzzy Systems, vol. 1, no. 2, pp. 146–155, 1993. [264] L.-X. Wang and J. M. Mendel, “Fuzzy basis functions, universal approximation, and or- thogonal least-squares learning,” IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 807–814, 1992. [265] ——, “Generating fuzzy rules by learning from examples,” IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 6, pp. 
1414–1427, 1992. [266] L.-X. Wang and J. Mendel, “Generating fuzzy rules from numerical data, with appli- cations,” Signal and Image Processing Institute, Department of Electrical Engineering- Systems. University of South California, Tech. Rep. 169. [267] L.-X. Wang, Adaptive fuzzy systems and control: design and stability analysis. Prentice- Hall, Inc., 1994. [268] ——, A course in fuzzy systems and control. Prentice-Hall, 1997. [269] ——, “The WM method completed: A flexible fuzzy system approach to data mining,” IEEE Transactions on Fuzzy Systems, vol. 11, no. 6, 2003. [270] P. P. Wang, Computing with words. John Wiley & Sons, Inc., 2001. [271] Y . Wang, “Type-2 fuzzy event,” in 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2012, pp. 201–205. [272] Y . M. Wang, J. B. Yang, D. L. Xu, and K. S. Chin, “On the combination and normalization of interval-valued belief structures,” Information Sciences, vol. 177, no. 5, pp. 1230–1247, 2007. [273] Y . Wang, “On concept algebra for computing with words (cww),” International Journal of Semantic Computing, vol. 4, no. 03, pp. 331–356, 2010. [274] Y . Wang, J. Yang, D. Xu, and K. Chin, “The evidential reasoning approach for multiple attribute decision analysis using interval belief degrees,” European Journal of Operational Research, vol. 175, no. 1, pp. 35–66, 2006. 316 [275] Y . Wen and X. Ren, “Neural networks-based adaptive control for nonlinear time-varying delays systems with unknown control direction,” IEEE Transactions on Neural Networks, vol. 22, no. 10, pp. 1599–1612, 2011. [276] M. J. Wierman, “Extending set functions and relations,” International Journal of General System, vol. 26, no. 1-2, pp. 91–96, 1997. [277] D. Wu and J. M. Mendel, “Aggregation using the linguistic weighted average and interval type-2 fuzzy sets,” IEEE Transactions on Fuzzy Systems, vol. 15, no. 6, pp. 1145–1161, 2007. [278] ——, “Uncertainty measures for interval type-2 fuzzy sets,” Information Sciences, vol. 177, no. 23, pp. 5378–5393, 2007. [279] ——, “Corrections to “Aggregation using the linguistic weighted average and interval type- 2 fuzzy sets”,” IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, pp. 1664–1666, 2008. [280] ——, “A comparative study of ranking methods, similarity measures and uncertainty mea- sures for interval type-2 fuzzy sets,” Information Sciences, vol. 179, no. 8, pp. 1169–1192, 2009. [281] ——, “Enhanced Karnik–Mendel algorithms,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 4, pp. 923–934, 2009. [282] ——, “Perceptual reasoning for perceptual computing: A similarity-based approach,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 6, pp. 1397–1411, 2009. [283] D. Wu, “A constrained representation theorem for interval type-2 fuzzy sets using convex and normal embedded type-1 fuzzy sets, and its application to centroid computation,” in Proceedings of World Congress on Soft Computing, 2011. [284] D. Wu and J. M. Mendel, “Computing with words for hierarchical decision making applied to evaluating a weapon system,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 441–460, 2010. [285] ——, “Linguistic summarization using if–then rules and interval type-2 fuzzy sets,” IEEE Transactions on Fuzzy Systems, vol. 19, no. 1, pp. 136–151, 2011. [286] D. Wu, J. M. Mendel, and S. Coupland, “Enhanced interval approach for encoding words into interval type-2 fuzzy sets and its convergence analysis,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 499–513, 2012. [287] W. Z. Wu, Y . Leung, and J. S. 
Mi, “On generalized fuzzy belief functions in infinite spaces,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 385–397, 2009. [288] R. R. Yager, “A note on probabilities of fuzzy events,” Information Sciences, vol. 18, no. 2, pp. 113–129, 1979. 317 [289] ——, “Generalized probabilities of fuzzy events from fuzzy belief structures,” Information Sciences, vol. 28, no. 1, pp. 45–62, 1982. [290] ——, “A representation of the probability of a fuzzy subset,” Fuzzy Sets and Systems, vol. 13, no. 3, pp. 273–283, 1984. [291] ——, “Arithmetic and other operations on Dempster-Shafer structures,” International Jour- nal of Man-Machine Studies, vol. 25, no. 4, pp. 357–366, 1986. [292] ——, “On the Dempster-Shafer framework and new combination rules,” Information Sci- ences, vol. 41, no. 2, pp. 93–137, 1987. [293] ——, “Quasi-associative operations in the combination of evidence,” Kybernetes, vol. 16, no. 1, pp. 37–41, 1987. [294] ——, “On probabilities induced by multi-valued mappings,” Fuzzy Sets and Systems, vol. 42, no. 3, pp. 301–314, 1991. [295] ——, “Dempster–Shafer belief structures with interval valued focal weights,” International Journal of intelligent systems, vol. 16, no. 4, pp. 497–512, 2001. [296] ——, “Perception-based granular probabilities in risk modeling and decision making,” IEEE Transactions on Fuzzy Systems, vol. 14, no. 2, pp. 329–339, 2006. [297] ——, “Level sets and the extension principle for interval valued fuzzy sets and its appli- cation to uncertainty measures,” Information Sciences, vol. 178, no. 18, pp. 3565–3576, 2008. [298] ——, “A new approach to the summarization of data,” Information Sciences, vol. 28, no. 1, pp. 69–86, 1982. [299] ——, “Approximate reasoning as a basis for computing with words,” in Computing with Words in Information/Intelligent Systems 1, L. A. Zadeh and J. Kacprzyk, Eds. Springer, 1999, pp. 50–77. [300] ——, “OWA aggregation over a continuous interval argument with applications to deci- sion making,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 5, pp. 1952–1963, 2004. [301] ——, “On Z-valuations using Zadeh’s Z-numbers,” International Journal of Intelligent Systems, vol. 27, no. 3, pp. 259–278, 2012. [302] R. R. Yager and D. P. Filev, Essentials of fuzzy modeling and control. John Wiley, 1994. [303] ——, “Generation of fuzzy rules by mountain clustering,” Journal of Intelligent and Fuzzy Systems, vol. 2, no. 3, pp. 209–219, 1994. 318 [304] E. Yang and D. Gu, “Multiagent reinforcement learning for multi-robot systems: A survey,” University of Utrecht, Tech. Rep., 2004. [305] J.-B. Yang, J. Liu, J. Wang, H.-S. Sii, and H.-W. Wang, “Belief rule-base inference method- ology using the evidential reasoning approach-rimer,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 36, no. 2, pp. 266–285, 2006. [306] M.-S. Yang, T.-C. Chen, and K.-L. Wu, “Generalized belief function, plausibility func- tion, and Dempster’s combinational rule to fuzzy sets,” International Journal of Intelligent Systems, vol. 18, no. 8, pp. 925–937, 2003. [307] X. Yang, M. Moallem, and R. V . Patel, “A layered goal-oriented fuzzy motion planning strategy for mobile robot navigation,” IEEE Transactions on Systems, Man, and Cybernet- ics, Part B: Cybernetics, vol. 35, no. 6, pp. 1214–1224, 2005. [308] J. Yen, “Generalizing the Dempster–Shafer theory to fuzzy sets,” IEEE Transactions on Systems, Man, and Cybernetics, pp. 559–570, 1990. 
[309] ——, “Generalizing the dempster-schafer theory to fuzzy sets,” IEEE Transactions on Sys- tems, Man and Cybernetics, vol. 20, no. 3, pp. 559–570, 1990. [310] J. Yen and L. Wang, “Simplifying fuzzy rule-based models using orthogonal transformation methods,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 1, pp. 13–24, 1999. [311] E. Yesil and M. F. Dodurka, “Goal-oriented decision support using big bang-big crunch learning based fuzzy cognitive map: An erp management case study,” in 2013 IEEE Inter- national Conference on Fuzzy Systems. IEEE, 2013, pp. 1–8. [312] H. Ying, Fuzzy control and modeling: analytical foundations and applications. Wiley- IEEE Press, 2000. [313] M. Ying, “A formal model of computing with words,” IEEE Transactions on Fuzzy Sys- tems, vol. 10, no. 5, pp. 640–652, 2002. [314] W. Yuan, D. Guan, S. Lee, and Y .-K. Lee, “A reputation system based on computing with words,” in Proceedings of the 2007 international conference on Wireless communications and mobile computing. ACM, 2007, pp. 132–137. [315] M. G. Yunusoglu and H. Selim, “A fuzzy rule based expert system for stock evaluation and portfolio construction: An application to istanbul stock exchange,” Expert Systems with Applications, vol. 40, no. 3, pp. 908–920, 2013. [316] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965. [317] ——, “Probability measures of fuzzy events,” J. Math. Anal. Appl, vol. 23, no. 2, pp. 421– 427, 1968. 319 [318] ——, “The concept of a linguistic variable and its application to approximate reasoning–I,” Information sciences, vol. 8, no. 3, pp. 199–249, 1975. [319] ——, “Fuzzy sets and information granularity,” in Advances in Fuzzy Set Theory and Ap- plications, M. Gupta, R. K. Ragade, and R. R. Yager, Eds. Elsevier North-Holland, 1979, vol. 11, pp. 3–18. [320] ——, “A computational approach to fuzzy quantifiers in natural languages,” Computers & Mathematics with Applications, vol. 9, no. 1, pp. 149–184, 1983. [321] ——, “Fuzzy probabilities,” Information processing & management, vol. 20, no. 3, pp. 363–372, 1984. [322] ——, “Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions,” IEEE Transactions on Systems, Man and Cybernetics, vol. 15, pp. 754– 763, 1985. [323] ——, “A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination,” AI Magazine, vol. 7, no. 2, p. 85, 1986. [324] ——, “Discussion: Probability theory and fuzzy logic are complementary rather than com- petitive,” Technometrics, vol. 37, no. 3, pp. 271–276, 1995. [325] ——, “Fuzzy logic= computing with words,” IEEE Transactions on Fuzzy Systems, vol. 4, no. 2, pp. 103–111, 1996. [326] ——, “A New Direction in AI – Toward a Computational Theory of Perceptions,” AI Mag- azine, vol. 22, no. 1, pp. 73–84, 2001. [327] ——, “From computing with numbers to computing with words,” Annals of the New York Academy of Sciences, vol. 929, no. 1, pp. 221–252, 2001. [328] ——, “A new direction in AI: Toward a computational theory of perceptions,” AI Mgazine, vol. 22, no. 1, p. 73, 2001. [329] ——, “From computing with numbers to computing with words– from manipulation of measurements to manipulation of perceptions,” in Logic, Thought and Action, ser. Logic, Epistemology, and the Unity of Science, D. Vanderveken, Ed. Springer Netherlands, 2005, vol. 2, pp. 507–544. [330] ——, “Toward a generalized theory of uncertainty (GTU)—-an outline,” Information Sci- ences, vol. 172, no. 1, pp. 1–40, 2005. 
[331] ——, “Generalized theory of uncertainty (GTU)–principal concepts and ideas,” Computa- tional Statistics & Data Analysis, vol. 51, no. 1, pp. 15–46, 2006. [332] ——, “A summary and update of fuzzy logic,” in Proceedings of 2010 IEEE International Conference on Granular Computing (GrC). IEEE, 2010, pp. 42–44. 320 [333] ——, Computing with words: Principal concepts and ideas, ser. Studies in Fuzziness and Soft Computing. Springer Berlin Heidelberg New York, 2012, vol. 277. [334] ——, “Outline of a new approach to the analysis of complex systems and decision pro- cesses,” IEEE Transactions on Systems, Man and Cybernetics, no. 1, pp. 28–44, 1973. [335] ——, “The concept of a linguistic variable and its application to approximate reasoning– II,” Information sciences, vol. 8, no. 4, pp. 301–357, 1975. [336] ——, “Fuzzy sets versus probability,” Proceedings of the IEEE, vol. 68, no. 3, pp. 421–421, 1980. [337] ——, “Precisiation of meaning via translation into PRUF,” in Cognitive Constraints on Communication. Springer, 1984, pp. 373–401. [338] ——, “Toward a perception-based theory of probabilistic reasoning with imprecise proba- bilities,” Journal of statistical planning and inference, vol. 105, no. 1, pp. 233–264, 2002. [339] ——, “Precisiated natural language (PNL),” AI magazine, vol. 25, no. 3, p. 74, 2004. [340] ——, “From imprecise to granular probabilities,” Fuzzy Sets and Systems, vol. 154, no. 3, pp. 370–374, 2005. [341] ——, “Is there a need for fuzzy logic?” Information Sciences, vol. 178, no. 13, pp. 2751– 2779, 2008. [342] ——, “Precisiation of meaning–toward computation with natural language,” in 2010 IEEE International Conference on Information Reuse and Integration (IRI). IEEE, 2010, pp. 1–4. [343] ——, “A note on Z-numbers,” Information Sciences, vol. 181, no. 14, pp. 2923–2932, 2011. [344] ——, “Outline of a restriction-centered theory of reasoning and computation in an envi- ronment of uncertainty and imprecision,” in Proceedings of 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI). IEEE, 2012, pp. xxi–xxii. [345] L. A. Zadeh and J. Kacprzyk, Computing with words in Information/Intelligent Systems 1: Foundations. Springer, 1999, vol. 1. [346] S. Zadro˙ zny and J. Kacprzyk, “Computing with words for text processing: An approach to the text categorization,” Information Sciences, vol. 176, no. 4, pp. 415–437, 2006. [347] T.-P. Zhang, H. Wen, and Q. Zhu, “Adaptive fuzzy control of nonlinear systems in pure feedback form based on input-to-state stability,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 80–93, 2010. 321 [348] C. Zhou, “Fuzzy-arithmetic-based lyapunov synthesis in the design of stable fuzzy con- trollers: a computing-with-words approach,” International Journal of Applied Mathemat- ics and Computer Science, vol. 12, pp. 411–421, 2002. [349] ——, “Robot learning with ga-based fuzzy reinforcement learning agents,” Information Sciences, vol. 145, no. 1, pp. 45–68, 2002. [350] C. Zhou and Q. Meng, “Dynamic balance of a biped robot using fuzzy reinforcement learn- ing agents,” Fuzzy Sets and Systems, vol. 134, no. 1, pp. 169–187, 2003. [351] C. Zhou and D. Ruan, “Fuzzy control rules extraction from perception-based information using computing with words,” Information Sciences, vol. 142, no. 1, pp. 275–290, 2002. [352] C. Zhou, Y . Yang, and X. Jia, “Incorporating perception-based information in reinforce- ment learning using computing with words,” in Bio-Inspired Applications of Connection- ism. Springer, 2001, pp. 476–483. [353] J. 
Zhou, “Reliability assessment method for pressure piping containing circumferential defects based on fuzzy probability,” International Journal of Pressure Vessels and Piping, vol. 82, no. 9, pp. 669–678, 2005. [354] A. Zhu and S. X. Yang, “A goal-oriented fuzzy reactive control for mobile robots with automatic rule optimization,” in Proceedings of 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2010, pp. 3688–3693. [355] Y . Zhu, L. Bentabet, O. Dupuis, V . Kaftandjian, D. Babot, and M. Rombaut, “Automatic determination of mass functions in Dempster-Shafer theory using fuzzy c-means and spa- tial neighborhood information for image segmentation,” Optical Engineering, vol. 41, p. 760, 2002. 322
Abstract
In this dissertation, we focus on data-oriented uncertainty modeling, reasoning, and inference, and on their applications to intelligent systems that implement the paradigm of Computing with Words (CWW). Computing with Words problems have been classified into at least two categories: Basic Computing with Words and Advanced Computing with Words. Basic Computing with Words deals mainly with applications of rule-based systems, whereas Advanced Computing with Words deals with the implicit assignment of linguistic truth, probability, and possibility through intricate natural language statements.

We present a Linguistic Goal-Oriented Decision-Making method that uses rule-based systems. Unlike previous applications of rule-based systems, which employ them merely as function approximators or as models of relatively uncomplicated expert knowledge, our method determines the desired states of a system (described using words) by examining the linguistic rules that specify the conditions yielding each state, and then moves the system toward more desirable states by adjusting a decision variable and comparing the rules that describe the conditions yielding each state. This approach is another realization of decision making with words, or control with words, and can be seen as a bridge between applications of rule-based systems and Computing with Words. We also apply the method to the problem of enhanced oil recovery.

We then solve Advanced Computing with Words problems that involve implicit assignment of truth, probability, and possibility to various attributes through natural language, focusing on problems that involve linguistic probabilities. We demonstrate how Interval Type-2 Fuzzy Set (IT2 FS) models of probability words can be synthesized from data collected from subjects, and we establish a general framework, based on the Dempster-Shafer Theory of Evidence, for calculating probabilities and performing inference from natural language information that contains linguistic quantifiers and linguistic probabilities, via constructs called Linguistic Belief Structures. We demonstrate that Novel Weighted Averages and Doubly Normalized Weighted Averages are essential tools for inference with Linguistic Belief Structures.

We also develop an Extension Principle for extending set-valued functions to Interval Type-2 Fuzzy Sets and apply it to inference with Linguistic Belief Structures, and we use syllogistic reasoning together with the methodology of Linguistic Belief Structures to solve the Advanced Computing with Words challenge problems proposed by Zadeh. We also implement Zadeh's methodology for handling linguistic probabilities, which involves the Generalized Extension Principle. For Advanced Computing with Words problems, the Generalized Extension Principle yields functional optimization problems that are very difficult to solve analytically, so we devise a numerical method for solving them. Because Zadeh's methodology is based on Type-1 Fuzzy Sets (T1 FSs), we first implement it for T1 FSs and then extend it to Interval Type-2 Fuzzy Sets, since they are viable models of the various types of uncertainty associated with a word.

A critical review of the status, challenges, and future of Advanced Computing with Words is also presented.
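To make two of the computational ideas above concrete, two minimal sketches follow. The first is an illustration only, not the dissertation's algorithms: it assumes the linguistic scores and weights have already been reduced, at one alpha-cut, to closed intervals with nonnegative weights, and it finds the exact bounds of the interval weighted average (the building block of a Novel Weighted Average) by brute-force enumeration of the weight-interval endpoints, whereas a practical implementation would use Karnik-Mendel-type procedures. The function name and sample numbers are hypothetical.

```python
from itertools import product

def interval_weighted_average(x_bounds, w_bounds):
    """Exact bounds of y = sum(x_i * w_i) / sum(w_i) over interval inputs.

    x_bounds, w_bounds: lists of (lo, hi) pairs for the scores x_i and the
    nonnegative weights w_i. Because y increases in each x_i and is
    linear-fractional in each w_i, its extrema over the box of intervals
    occur at endpoint combinations, so enumerating the 2^n weight corners
    (with every x_i at its lower/upper endpoint) is exact for small n.
    """
    x_lo = [a for a, _ in x_bounds]
    x_hi = [b for _, b in x_bounds]
    y_lo, y_hi = float("inf"), float("-inf")
    for corner in product(*w_bounds):   # every corner of the weight box
        s = sum(corner)
        if s == 0:
            continue                    # skip the degenerate all-zero corner
        y_lo = min(y_lo, sum(a * w for a, w in zip(x_lo, corner)) / s)
        y_hi = max(y_hi, sum(b * w for b, w in zip(x_hi, corner)) / s)
    return y_lo, y_hi

# Two interval scores with interval weights at one alpha-cut:
print(interval_weighted_average([(6, 8), (2, 4)], [(0.5, 1.0), (0.2, 0.6)]))
# -> (approximately 3.818, approximately 7.333)
```

Similarly, after discretization, Generalized Extension Principle computations reduce to sup-min optimizations. The toy sketch below is an assumed setup, not the dissertation's problems (which optimize over spaces of probability density functions); it only shows the basic grid approximation of mu_Y(y) = sup{ min(mu_1(x_1), mu_2(x_2)) : f(x_1, x_2) = y } for two Type-1 Fuzzy Set inputs.

```python
import numpy as np

def extension_principle(f, x1_grid, mu1, x2_grid, mu2, y_grid):
    """Grid approximation of the extension principle for y = f(x1, x2):
    each sampled pair (x1, x2) votes min(mu1, mu2) into the nearest y cell."""
    mu_y = np.zeros(len(y_grid))
    for i, x1 in enumerate(x1_grid):
        for j, x2 in enumerate(x2_grid):
            k = int(np.argmin(np.abs(y_grid - f(x1, x2))))
            mu_y[k] = max(mu_y[k], min(mu1[i], mu2[j]))
    return mu_y

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# "About 5" and "about 7" as triangular T1 FSs; Y is their mean.
x_grid = np.linspace(0.0, 10.0, 201)
y_grid = np.linspace(0.0, 10.0, 201)
mu_y = extension_principle(lambda u, v: (u + v) / 2.0,
                           x_grid, tri(x_grid, 4, 5, 6),
                           x_grid, tri(x_grid, 6, 7, 8), y_grid)
# mu_y peaks (value 1.0) at y = 6, the mean of the two prototypes.
```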
Asset Metadata
Creator: Rajati, Mohammad Reza (author)
Core Title: Advances in linguistic data-oriented uncertainty modeling, reasoning, and intelligent decision making
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 05/13/2015
Defense Date: 12/02/2014
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: belief structures, computing with words, data oriented, Decision making, enhanced oil recovery, extension principle, fuzzy logic, fuzzy systems, intelligent systems, linguistic probability, novel weighted averages, OAI-PMH Harvest, reasoning
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Mendel, Jerry M. (committee chair); Aminzadeh, Fred (committee member); Ershaghi, Iraj (committee member)
Creator Email: mohammadreza.rajati@gmail.com, rajati@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-568987
Unique identifier: UC11298975
Identifier: etd-RajatiMoha-3448.pdf (filename); usctheses-c3-568987 (legacy record id)
Legacy Identifier: etd-RajatiMoha-3448.pdf
Dmrecord: 568987
Document Type: Dissertation
Rights: Rajati, Mohammad Reza
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA