Building a Knowledgebase for Deep Lexical Semantics

by Niloofar Montazeri

A Dissertation Presented to the Faculty of the USC Graduate School, University of Southern California, In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (Computer Science)

May 2016

To my husband, Mohsen, and my parents, Behzad and Atefeh

Acknowledgments

A great many people have directly and indirectly contributed to this work by sharing their knowledge, providing their support and encouragement, and even just by having a presence in my life.

First and foremost, I would like to express my deepest gratitude to my advisor, Jerry Hobbs. I learned so much from him about science, research, and ethics. From him, I learned how to organize and present my ideas and how to write research papers. He gave me complete freedom to consult with other researchers who sometimes had completely different or even opposing views about our work. Besides being a true scientist, Jerry is one of the most beautiful souls I have met. There was not a single meeting, phone call or conversation in which I didn't hear encouraging and motivating words from him. He let me move to the east coast with my husband and work remotely for almost a year. He made sure I had funding for the entire course of my PhD degree. I might not have survived my PhD if Jerry hadn't been my advisor.

I greatly benefited from discussions with Eduard Hovy, who challenged me to think independently, introduced me to the statistical world of NLP, provided me with valuable research ideas, and was available whenever I needed his guidance. I also learned so much from him in the NLP course at USC.

I am truly grateful to Andrew Gordon, who was always available to answer my questions, to meet, and to discuss my work. He generously shared his ideas for improving my work and encouraged me every single time.

I was very lucky to have Katya Ovchinnikova as a friend and collaborator during these years.
She provided me with so many ideas and so much practical advice. She is my living encyclopaedia of knowledge representation and reasoning, and I borrowed lots of material from her comprehensive dissertation. Her positive attitude and points of view, as well as her encouragement, had a deep influence on me.

I would also like to thank my proposal and dissertation committee members: Jose Luis Ambite, Louis Goldstein, Andrew Gordon, Elsi Kaiser, Kevin Knight and Dennis McLeod, who dedicated their valuable time to reading my dissertation and providing me with their feedback. A special thank you to Lizsl DeLeon at the Computer Science department, who answered my endless questions promptly and helped me with all the administrative issues in these 5 years.

Most of the experiments I did in this dissertation would not have been possible without the help of these people: Donald Metzler, who helped me get familiar with the HPC cluster; Avalon Johnson, Maureen Dougherty, Christopher Ho and John Mehringer from the HPCC support team; and David Chiang and Liang Huang, who introduced me to the Python programming language, making my life so much easier. I especially thank Kazeto Yamamoto, who provided the best customer service ever for his abduction engine, Henry. We exchanged about 70 emails, and every time I had an issue, I got an answer promptly.

I was really lucky to work at the Intelligent Systems Division at the Information Sciences Institute, where I met so many wonderful people. My special thanks go to the admin staff, Peter Zamar, Kary Lau, Alma Nava, Melissa Snearl-Smith and Lisa Winston, who helped me with all non-research issues with no delay or hesitation. I also thank Yigal Arens, the director of the Intelligent Systems Division, who keeps this scientific and extremely friendly community together.
I would like to thank all the ISI researchers, including Hans Chalupsky, with whom I consulted many times about my research and technical issues, as well as Gully Burns, Jose-Luis Ambite and Ulf Hermjakob, with whom I had very interesting and fun conversations.

I would like to thank all my friends who made my life in Los Angeles eventful and happy. I thank my lovely friends at ISI, George, Ashish, Katya and Shefali. Katya, Shefali and I were a gang and had the most fun and productive discussions and activities together. Shefali was always supportive of me, and Katya was so kind to share her window office with me. I would very much like to thank our family friends: Shima and Mohsen, the happy, playful and supportive couple; Zohreh and Misagh, with whom we had very fun times and travels; and Marjan and Nima, who were always ready to help. In the last year, when I lived in Boston, these lovely couples warmly hosted me during my visits to Los Angeles.

I thank my twin brothers, Farhad and Farshad, who were worried about my health and gave me advice on eating and sleeping well. I thank my lovely grandparents and my parents-in-law for their support and prayers. I sincerely thank Farnaz Foroud, who played the key role in my determination to pursue the PhD and deeply influenced me with her personality and spiritual teachings.

I am indebted to my mother, a true angel, for her endless love and support. Without her, I would not have survived the difficult times in my life, including the PhD course. I particularly owe my success to my father, who encouraged me to study computer science and who was my teacher during my Bachelor studies at Razi University. As the head and founder of the CS department, he made sure students received the most up-to-date education. He spent countless hours studying new books and preparing course materials according to what is offered at the top universities in the world.
He made sure we had the latest computers and software in the lab and carefully chose the instructors. He truly devoted himself to his students, including me. I deeply respect and love him.

I wholeheartedly thank my husband, Mohsen, for sharing all the tears and laughter. Whenever I was discouraged or lost track, he would patiently listen to me, motivate me and help me convert my galaxy of thoughts into a practical to-do list. Without his patience and support, I could never have finished this work.

January 2016, Los Angeles/Boston

Abstract

To enable deep understanding and reasoning over natural language, (Hobbs 2008) has proposed the idea of "Deep Lexical Semantics". In Deep Lexical Semantics, "principal" and "abstract" domains of commonsense knowledge are encoded into "core theories", and words are linked to these theories through axioms that use predicates from these theories. This research is concerned with the second of these two tasks: axiomatizing words in terms of predicates in core theories.

In the first part of this thesis we present a 3-step methodology for the manual axiomatization of words, which consists of analyzing the structure of a word's WordNet senses, writing axioms for the most general senses, and testing the axioms on hand-crafted textual entailment pairs. We have applied this method to axiomatizing all the change-of-state words in Core WordNet.

Since manual axiomatization is slow and hardly scalable, we looked at possible ways to automate it. In the second part of this thesis, we present a method for the automatic axiomatization of change-of-state verbs using text mining. In this method, we reformulate the task of axiomatizing a change-of-state event (e.g., "retire") as finding the states that this event can change (e.g., "being employed") from millions of web pages, using lexico-syntactic patterns that capture change of state.
We found that the extracted information results in change-of-state axioms with different levels of defeasibility and abstraction, and only a few of them are within the desired level of abstraction.

In the third and main part of this thesis, we propose a mixed approach in which we use concept relations in existing lexical semantics resources to systematically identify the optimum set of concepts that need to be axiomatized manually, and then axiomatize a large number of relevant concepts automatically. We have used this method to axiomatize concepts related to the domains of composite entities and sets and evaluated the precision of the resulting axioms. Furthermore, we have evaluated the usefulness of these axioms on the well-studied task of extracting part-of relations from text.

Table of Contents

Acknowledgments
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
  1. Natural Language Understanding and Knowledge
  2. Characteristics of Existing Knowledgebases
  3. Limitations of Existing Knowledgebases for Deep Understanding
  4. Deep Lexical Semantics
  5. Thesis Statement
  6. Proposed Approach
  7. Other Contributions
  8. Thesis Outline
Chapter 2 Background
  1. Notation
  2. Core Theories
Chapter 3 Manual Axiomatization of Concepts
  1. Analyzing the Structure of WordNet Senses
  2. Axiomatization
    2.1 Specificity of Details
    2.2 Metonymy vs. Lexical Disambiguation
    2.3 Metaphors
    2.4 Choosing Predicates
  3. Testing Axioms and Identifying Missing Information
  4. Statistics
Chapter 4 Automatic Extraction of Change-of-State Axioms from Text
  1. Methodology
  2. Assessing the Quality of the Results
  3. Comparison with Manually-Encoded Axioms
  4. Filtering Non-Change-of-State Pairs
  5. Categorizing Axioms with Mechanical Turk
  6. Conclusions
Chapter 5 The Mixed Approach
  1. Background
    1.1 FrameNet
    1.2 WordNet
    1.3 WordNet-FrameNet Mapping
    1.4 Disambiguating and Extending Word-Frame Mapping and Syntactic Axioms
  2. Using FrameNet and WordNet to Identify and Axiomatize Derived Concepts
    2.1 Word-Frame Mappings and Syntactic Axioms
    2.2 Frame Relations
    2.3 Synset Relations
  3. Identifying the Minimal Set of Basic Concepts
    3.1 Expansion
    3.2 Contraction
  4. Manual Axiomatization of Basic Frames and Synsets
  5. Automatic Axiomatization of Derived Concepts in Terms of Core Theory Predicates
  6. Evaluation
    6.1 Theories of Composite Entities and Sets
    6.2 Assumptions
    6.3 Identifying Basic Concepts in the Domains of Composite Entities and Sets
    6.4 Manual Axiomatization of BasicFrames
    6.5 Manual Axiomatization of BasicSynsets
    6.6 Automatic Axiomatization of Derived Concepts
Chapter 6 Evaluating the Usefulness of Our Knowledgebase
  1. Prerequisites
    1.1 Terminology
    1.2 Boxer
    1.3 Phillip
    1.4 Head Noun Detection Algorithm (HND algorithm)
  2. Extracting Part-Whole Relations Using Axioms
  3. Evaluation and Comparison with a State-of-the-Art Automatic Relation Learning Algorithm
    3.1 Previous Work on Extracting Part-whole Relations
    3.2 Chosen Method for Comparison
    3.3 Experiment Setup
    3.4 Evaluation
Chapter 7 Related Work
  1. Cyc
  2. Works Based on the Relational Approach
  3. Works Based on the Decompositional Approach
Chapter 8 Conclusion
References
Appendix I. Sample Change-of-State Axioms

List of Tables

Table 3-1: Axioms encoding compositions of fundamental predicates
Table 4-1: Verbal representation of lexico-syntactic patterns for change-of-state.
Table 4-2: Distribution of Pair Categories.
Table 4-3: Events, their M-STATEs and sample states in their ES-Set
Table 4-4: Final categories based on answers to questions
Table 4-5: Agreement and False Positives for different types of questions.
Table 5-1: FrameNet1 statistics
Table 5-2: Axiomatizing derived frames through frame relations.
Table 5-3: Axiomatizing derived words through WordNet relations.
Table 5-4: Seed words and sample synsets and frames from inDomainSynsets and inDomainFrames
Table 5-5: Number of new synsets inspected/selected in each step of expansion.
Table 5-6: BasicSynsets and BasicFrames.
Table 5-7: Manually created axioms for frames in BasicFrames.
Table 5-8: Axiomatization of manually-crafted predications.
Table 5-9: The mapping of basic synsets to manually-crafted predications and the corresponding syntactic axioms.
Table 5-10: Sample relations between frames in the composite entities and sets domains.
Table 5-11: Precision of automatically-generated axioms for derived frames.
Table 5-12: Precision of automatically-generated axioms for derived synsets.
Table 6-1: Additional Axioms
Table 6-2: Part-whole predications and their corresponding part and whole noun predications.
Table 6-3: Sample Predications and extracted (part, whole, pattern) tuples.
Table 6-4: Chosen seeds for different types of part-whole relations.
Table 6-5: (a) Top-15 reliable patterns and (b) Sample positive and negative patterns extracted in different iterations.
Table 6-6: The 10 most frequent patterns (according to the left-hand-side of the axioms).
Table 6-7: Comparison of patterns in A (the axioms) and patterns extracted by B
Table 6-8: Sample patterns in A∩B, A-B and B-A.
Table 7-1: Examples of ConceptNet's lexical-semantics knowledge
Table 7-2: FrameNet's frame relations with examples and number of occurrences
Table 7-3: Axiomatization of 'change', 'become' and 'different' by (Allen and Teng, 2013)

List of Figures

Figure 1-1: Categorization of several prominent works on encoding lexical and world knowledge, based on the types of knowledge they capture and their knowledge representation method.
Figure 1-2: Some core theories and axiomatization of words in terms of predicates in these theories.
Figure 3-1: Radial structure and axioms for the verb "enter"
Figure 5-1: A simple schema of the proposed method.
Figure 5-2: An example of a basic concept and concepts derived from it
Figure 5-3: The expansion procedure for finding basic word senses and frames.
Figure 5-4: Identifying basic frames and basic synsets.
Figure 5-5: The schema of the automatic axiomatization procedure.

Chapter 1 Introduction

1. Natural Language Understanding and Knowledge

Words describe the world, so if we are going to draw the appropriate inferences in understanding a text, we must have a prior explication of how we view the world (world knowledge) and how words and phrases map to this view (lexical semantic knowledge).
For example, for a machine to understand the sentence "Microsoft releases the update in 2015" deeply enough to reason that "the update will not be available before 2015", it needs to know that 1) "release" means "a change to being available" (lexical knowledge) and 2) if there is a change to x at time t, x does not hold before t (world knowledge).

For the last 40 years, AI researchers have been encoding knowledge about various aspects of the world into formal ontologies. These efforts range from descriptions of narrow areas such as space, time, psychology and beliefs (see Davis 1990 for a review), to large-scale projects encoding many domains of commonsense knowledge, the most notable of which is Cyc (Guha and Lenat 1990; Cycorp, 2008). Parallel to these efforts, computational linguists have developed high-quality and rich lexical semantic resources such as WordNet (Miller et al., 1990), FrameNet (Baker et al., 1998; Ruppenhofer et al. 2010) and VerbNet (Kipper et al., 2000, 2006) that capture different aspects of word meaning through different theories of lexical semantics. More recently, with the advances in Machine Learning techniques and the increasing availability of large text corpora, much attention has been given to the automatic extraction of both world knowledge (Mitchell et al. 2015; Etzioni et al. 2011; Schubert et al. 2011; Carlson et al. 2010; Clark and Harrison 2009; Banko et al. 2007; Schubert 2002; etc.) and lexical semantic knowledge (Hearst 1992, 1998; Girju 2003; Girju et al. 2002, 2006; Chklovski and Pantel 2004; etc.) from text. There have also been successful efforts in acquiring commonsense knowledge through crowdsourcing on the Web, examples of which are the Open Mind project (http://web.media.mit.edu/~push/Kurzweil.html) (OMCS; Singh et al. 2002) that resulted in ConceptNet (http://conceptnet5.media.mit.edu/), a semantic network of semi-structured natural language fragments that capture commonsense knowledge (Speer and Havasi 2013; Havasi et
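The two pieces of knowledge in the "release" example can be sketched in the reified logical notation introduced in Chapter 2. This is an illustrative paraphrase only; the predicate names available', atTime, before and holdsAt are our assumptions here, not necessarily those of the actual core theories:

  release'(e,x) → change'(e,e1,e2) & available'(e2,x)                   (lexical knowledge)
  change'(e,e1,e2) & atTime(e,t) & before(t1,t) → not holdsAt(e2,t1)    (world knowledge)

Chaining the two axioms on "Microsoft releases the update in 2015" then yields that the eventuality of the update being available does not hold before 2015, which is the desired inference.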
al. 2007; Liu and Singh 2004). Although the above-mentioned knowledge resources are quite useful in practical applications that require shallow understanding and reasoning, they fail to capture the kind of knowledge that is required for deeper understanding of text, as we show shortly.

2. Characteristics of Existing Knowledgebases

We have categorized the most prominent works in encoding word meaning and world knowledge in Figure 1-1. (We have omitted the distributional approaches to representing meaning, as they are not suitable for deep natural language understanding and reasoning.) In this figure, rows represent types of knowledge and columns represent knowledge representation methods.

Figure 1-1: Categorization of several prominent works on encoding lexical and world knowledge, based on the types of knowledge they capture and their knowledge representation method.

It is notable that lexical semantic resources have some small overlap with world knowledge. This is because some of their relations, like causation, enablement and happens-before, are more descriptive of rules governing the world than of what the words describe. Similarly, world knowledge resources have some overlap with lexical semantics, as they contain such information as "apples are fruits" and "food is for eating".

In lexical semantics, on the relational side we have WordNet, as well as automatically generated resources that extract different types of relations between words from text (Hearst 1992, 1998; Girju 2003; Girju et al. 2002, 2006; Chklovski and Pantel 2004; etc.). On the non-relational/formal side, we have FrameNet and VerbNet. FrameNet (in a way) decomposes the meaning of a frame into roles, and VerbNet decomposes meaning into a set of primitive predicates through axioms.
Moving on to world knowledge, on the relational side we have ConceptNet as well as automatically generated knowledgebases that use natural language to represent knowledge. ConceptNet contains a large amount of commonsense world knowledge in the form of binary relations between phrases; examples are "sun isA source of light", "set isA collection with strict rule of membership", "fall hasPrerequisite free space under you" and "hammer isCapableOf hurt toe". Automatically generated world knowledge resources such as Yago, Freebase, DBpedia and NELL represent vast amounts of factual knowledge through specific binary relations. This factual knowledge is mostly information about individual entities or events, such as "BBC has an office in London" and "World War II ended in 1945".

Finally, on the non-relational/formal side we have hand-coded ontologies like Cyc and many smaller-scale domain theories that use axioms and their own sets of predicates for encoding knowledge. For NLP applications, words should be mapped to the predicates used in the axioms.

3. Limitations of Existing Knowledgebases for Deep Understanding

We argue that the existing resources for lexical and world knowledge are not sufficient for deep reasoning. (For a more detailed discussion see Chapter 7, Related Work.) First, the relational approach (on which both the lexical and world knowledge resources in the left column of Figure 1-1 are based) is not powerful enough for capturing complex relations between predications. For example, the following axiom for "increase" (which says "an increase in x means a change from x being at some point p1, to x being at another point p2, where p1 is less than p2 on scale s") cannot be translated into binary relations:

  Increase'(e,x) → change'(e,e1,e2) & at'(e1,x,p1) & at'(e2,x,p2) & lessThan(p1,p2,s)

Similarly, commonsense knowledge such as "if x is smaller than y and y is smaller than z, then x is smaller than z" cannot be represented via binary relations.
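The smaller-than fact illustrates the same limitation in a different way: it is naturally a ternary relation over a scale, and its transitivity is a rule that shares variables across three predications. Using the lessThan predicate from the increase axiom above, a sketch of the rule in the same notation would be:

  lessThan(x,y,s) & lessThan(y,z,s) → lessThan(x,z,s)

A binary-relation format can directly express neither the three-argument predicate nor the variable-sharing across the three conjuncts.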
Second, although decompositional lexical resources like FrameNet and VerbNet are able to capture complex meaning structures, the concepts into which they decompose lexical meanings are not explicated in any coherent theories.

Third, the deep and powerful ontologies built by the AI community (the ones in the lower-right corner of Figure 1-1) are not suitable for NLP because of their complex knowledge representation languages and/or difficulties in mapping words to concepts in these ontologies (see the section on Cyc in Chapter 7, Related Work, for more details). This is because AI ontologies are not designed with language in mind and hence make such distinctions between entities as tangible vs. intangible, or physical vs. abstract. As (Hobbs 2009) notes, this distinction plays very little role in language:

"We can be in a room, in a social group, in the midst of an activity, in trouble, and in politics. We can move a chair from the desk to the table, move money from one bank account to another, move a discussion from religion to politics, and move an audience to tears. A fundamental distinction between tangibles and intangibles rules out the possibility of understanding the sense of "in" or "move" common to all these uses."

Motivated by the need for a theory of lexical semantics that is 1) anchored in theories of the world and 2) supported by linguistic insights, (Hobbs 2008) has proposed the idea of Deep Lexical Semantics.

4. Deep Lexical Semantics

In the theory of Deep Lexical Semantics (Hobbs 1995, 2008), words are defined or characterized by axioms that use predicates from "core theories". Core theories encode "principal" and "abstract" domains of commonsense knowledge. Each core theory consists of a set of very abstract concepts and axioms that describe the relationships among these concepts.
For example, the core theory of scales includes such concepts as low, high, lessThan, etc., as well as axioms such as "low is less than high on a scale" or "if x is less than y and y is less than z, then x is less than z". Other examples of core theories are the theories of causality (Hobbs 2005), space (Hobbs and Narayanan 2002), time (Pan and Hobbs 2005), commonsense psychology (Hobbs and Gordon 2010) and micro-sociology (Hobbs et al. 2012). (See http://www.isi.edu/~hobbs/csk.html for more theories and their specifications.)

Once we have established core theories that capture world knowledge at a sufficiently abstract level, we can write axioms that link word meaning to these theories. For example, using the theories of "Causality", "Change" and "Composite Entities", we can characterize one of the senses of the verb "cut" as "causing a change from being part of". Interestingly, thanks to the abstractness of the theory of Composite Entities, this definition applies to both "cutting a member from a group" and "cutting a branch from a tree". Figure 1-2 shows the concept of Deep Lexical Semantics through sample words, core theories and axioms relating them.

The Deep Lexical Semantics project therefore consists of two tasks: developing core theories and axiomatizing words in terms of predicates in these theories. This thesis is mainly concerned with the second task.

We started with a purely manual approach to identify and axiomatize the most general domain-independent word senses in terms of predicates in the core theories. Then we tried a purely automatic approach in which we tried to extract axioms from text; and finally, we came up with a mixed approach in which we use concept relations in existing lexical semantics resources to identify the minimal set of concepts that should be manually axiomatized and then axiomatize a large number of relevant concepts automatically.
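The characterization of "cut" above can be sketched as an axiom in the same notation. The predicate names cause', changeFrom' and partOf' are our illustrative stand-ins for whatever the actual core theories of causality, change and composite entities provide:

  cut'(e,a,y,z) → cause'(e,a,e1) & changeFrom'(e1,e2) & partOf'(e2,y,z)

Here a cuts y from z: a causes a change out of the state e2 of y being a part of z. Because partOf comes from the abstract theory of Composite Entities, y and z can equally be a member and a group or a branch and a tree.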
Figure 1-2: Some core theories and axiomatization of words in terms of predicates in these theories.

5. Thesis Statement

A large-scale lexical semantics knowledgebase for a domain can be developed by dividing the authoring task using the optimum mix of manual and automatic methods.

6. Proposed Approach

Focusing on one domain at a time, we first identify the set of synsets and frames in WordNet and FrameNet that are related to the domain. We then use concept relations such as frame-frame relations, synset-frame mappings and synset-synset relations to identify which synsets and frames are derived from (i.e., can be defined in terms of) others. Those synsets and frames that cannot be defined in terms of any other concepts in the domain are considered “basic” and are chosen for manual axiomatization. After axiomatizing these basic synsets and frames, we again use the above-mentioned concept relations in a backward manner to identify and automatically axiomatize a large set of synsets and frames that are derived from the manually axiomatized concepts.

We have used this method to axiomatize concepts related to the domains of composite entities and sets and evaluated the precision of the resulting axioms. Furthermore, we have evaluated the usefulness of these axioms on the well-studied task of extracting part-of relations from text.

7. Other Contributions

Manual axiomatization of change-of-state words: As the first step towards implementing the theory of Deep Lexical Semantics, we used a three-step methodology for manual axiomatization of words, which consists of analyzing the structure of a word’s WordNet senses, writing axioms for the most general senses, and testing the axioms on hand-crafted textual entailment pairs. We have axiomatized all the change-of-state words in CoreWordNet using this method.

Extracting axioms for change-of-state verbs from text: Since manual axiomatization is slow and hardly scalable, we looked at possible ways to automate it.
We investigated the possibility of using text mining to automatically extract axioms from text. In particular, we reformulated the task of axiomatizing a change-of-state event (e.g., “retire”) as finding the states that this event can change (e.g., “being employed”) from millions of web pages, using lexico-syntactic patterns that capture change of state. We found that the extracted change-of-state axioms exhibit different levels of defeasibility and abstraction, and we proposed a method for using Mechanical Turk to categorize axioms based on their level of defeasibility.

8. Thesis Outline

This thesis is structured as follows. In Chapter 2, we present a short introduction to our notation and the core theories. In Chapter 3, we describe our three-step manual axiomatization method, which we used for axiomatizing all the change-of-state verbs in CoreWordNet. In Chapter 4, we present our text mining method for automatic axiomatization of change-of-state verbs. In Chapter 5, we present our mixed approach for identifying basic concepts, manually axiomatizing them and automatically axiomatizing more concepts using concept relations. At the end of Chapter 5, we evaluate the mixed approach by applying it to the domains of composite entities and sets and measuring the precision and recall of the resulting axioms. In Chapter 6, we evaluate the usefulness of these axioms by applying them to the task of extracting part-whole relations from text and comparing the results with a state-of-the-art part-whole relation extraction system. Finally, we review related work (which we briefly mentioned in this introduction) in more detail.

Chapter 2
Background

1. Notation

We use a logical notation in which states and events (eventualities) are reified. Specifically, if the expression p(x) says that p is true of x, then p’(e,x) says that e is the eventuality of p being true of x.
In practice, e serves as a handle to this predication through which we can modify the predication or add more details about it. Eventuality e may exist in the real world (Rexist), in which case p(x) holds, or it may only exist in some modal context, in which case that is expressed simply as another property of the possible individual e. The logical form of a sentence is a flat conjunction of existentially quantified positive literals, with about one literal per morpheme. (For example, logical words like not and or are treated as expressing predications about possible eventualities.) We use a software tool 7 to translate Penn TreeBank-style trees (as well as other syntactic formalisms) into this notation 8 .

The underlying core theories are expressed as axioms in this notation (Hobbs 1985). Unlike deductive axioms, abductive axioms should be read “right to left”. For example, in the axiom p(x,y) → q(x,z), the proposition q(x,z) on the right-hand side implies the assumption of the proposition p(x,y) on the left-hand side, taking into account the identity of the first argument of p and the first argument of q, denoted by the same variable x. In addition, variables occurring in the antecedent of an implication are universally quantified and variables occurring only in the consequent are existentially quantified.

Most commonsense knowledge is defeasible, i.e., it can be defeated. This is represented in our framework by having a unique “et cetera” proposition in the antecedent of Horn clauses that cannot be proved but can be assumed at a cost corresponding to the likelihood that the conclusion is true 9 . For example, the axiom

bird(x) & etc-i(x) → fly(x)

says that if x is a bird and other unspecified conditions hold, etc-i, then x flies.

7 http://www.rutumulkar.com/nl-pipeline.html
8 The Boxer parser (Bos 2008) also produces this notation. Boxer can be downloaded at: http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
No other axioms enable proving etc-i(x), but it can be assumed and hence can participate in the lowest-cost proof. The index i is unique to this axiom. Since most of the axioms presented here are defeasible, we omit the “etc” predicate for the sake of simplicity.

2. Core Theories

A core theory is a set of predicates and axioms that describe the relationships among a set of very abstract concepts that govern or characterize many aspects of the world we live in. In fact, core theories capture the kind of commonsense knowledge that seems trivial because humans acquire it in early childhood. For example, a child understands such concepts as change, composite entity, scale, falling and moving.

Predicates in core theories can be English words, but in order to achieve elegance, which is the principal criterion for the adequacy of core theories, we also use predicates that are not lexically realized.

To give an idea of the kind of information encoded in core theories, we sketch two principal core theories, composite entities and change of state, in the following.

Composite Entities: A composite entity is a thing composed of other things. The concept is general enough to include complex physical objects (e.g., a telephone), complex events (e.g., the process of erosion) and complex information structures (e.g., a theory). A composite entity is characterized by a set of components, a set of properties, and a set of relations. Example predicates that are defined in this theory are compositeEntity(x), which simply says that x is a composite entity; componentsOf(s,x), which says that s is the set of x’s components; and componentOf(y,x), which says that y is a component of the composite entity x. The relations between these concepts are captured in several axioms.

9 This approach to defeasibility is similar to circumscription (McCarthy 1980)
For example, the following axiom defines the relationship between the above predicates:

componentOf(y, x) ↔ compositeEntity(x) & componentsOf(s, x) & member(y, s)

The above axiom says that y is a component of x if and only if x is a composite entity, s is the set of components of x and y is a member of s. In this axiom, the predication member(y,s) comes from our core theory of sets.

Change of State: An important predication in this theory is change’(e,e1,e2), which says that e is a change of state whose initial state is e1 and whose final state is e2. The chief properties of change are that there is some entity whose state is undergoing change, that change is defeasibly transitive, that e1 and e2 cannot be the same unless there has been an intermediate state that is different, and that change is consistent with the before relation from our core theory of time (Hobbs et al., 2004). An example axiom in this theory is:

change’(e, e1, e2) & change’(e0, e2, e3) → change’(e4, e1, e3)

which states that change is transitive: if there is a change from state e1 to state e2 and a change from state e2 to state e3, then we have a change from state e1 to state e3.

Since many lexical items focus only on the initial or the final state of a change, the predications changeFrom’(e,e1) and changeTo’(e,e2) are introduced for convenience. The first predication says that e is a change from state e1 to another state that is inconsistent with e1. In other words, after the change, state e1 no longer exists. Similarly, the second predication says that e is a change to state e2 that did not exist before the change.

Chapter 3
Manual Axiomatization of Concepts

In this chapter, we describe our methodology for axiomatizing words in terms of predications in the core theories, which consists of three steps: analyzing the structure of a word’s WordNet senses, writing axioms for the most general senses, and testing the axioms on textual entailment pairs.
We also describe common issues we faced and decisions we had to make during axiomatization. We used this method for axiomatizing all the change-of-state words in CoreWordNet in terms of predicates in the core theories of change of state, composite entities, scales and event structure. In the following, we first explain the method used for choosing and clustering words related to different core theories and then describe the three-step methodology for axiomatization of these words.

Choosing Words for Manual Axiomatization

WordNet 10 (Miller, 1990) contains tens of thousands of synsets referring to highly specific animals, plants, chemical compounds, French mathematicians, and so on. Most of these are rarely relevant to any particular natural language understanding application. To focus on the more central words in English, the Princeton WordNet group has compiled a CoreWordNet 11 , consisting of 4,979 synsets (corresponding to about 3,500 words) that express frequent and salient concepts. (Hobbs 2008) has classified these word senses manually into sixteen broad categories, listed here with rough descriptions and lists of sample words in the categories. Word senses are not indicated but should be obvious from the category.

10 http://wordnetweb.princeton.edu
11 CoreWordNet is downloadable from http://wordnet.cs.princeton.edu/downloads.html.
Scales: partial orderings and their fine-grained structure: step, degree, level, intensify, high, major, considerable, …
Events: concepts involving change and causality: constraint, secure, generate, fix, power, development, …
Space: spatial properties and relations: inside, top, list, direction, turn, enlarge, …
Time: temporal properties and relations: year, day, summer, recent, old, early, present, then, often, …
Cognition: concepts involving mental and emotional states: imagination, horror, rely, remind, matter, estimate, idea, …
Communication: concepts involving people communicating with each other: journal, poetry, announcement, gesture, charter, …
Persons: concepts involving persons and their relationships and activities: leisure, childhood, glance, cousin, jump, …
Micro-social: social phenomena other than communication that would be present in any society regardless of its level of technology: virtue, separate, friendly, married, company, name, …
Bio: living things other than humans: oak, shell, lion, eagle, shark, snail, fur, flock, …
Geo: geographical, geological and meteorological concepts: storm, moon, pole, world, peak, site, sea, island, …
Material World: other aspects of the natural world: smoke, stick, carbon, blue, burn, dry, tough, …
Artifacts: physical objects built by humans to fulfill some function: bell, button, van, shelf, machine, film, floor, glass, chair, …
Food: concepts involving things that are eaten or drunk: cheese, potato, milk, bread, cake, meat, beer, bake, spoil, …
Macrosocial: concepts that depend on a large-scale technological society: architecture, airport, headquarters, prosecution, …
Economic: having to do with money and trade: import, money, policy, poverty, profit, venture, owe, …

These categories of course have fuzzy boundaries and overlaps, but their purpose is only to group together concepts that need to be axiomatized together for coherent theories.
Each of these categories is then given a finer-grained structure. The internal structure of the category of event words is given below, with descriptions and examples of each subcategory.

State: having to do with an entity being in some state or not: have, remain, lack, still, …
Change: involving a change of state:
  Abstractly: incident, happen
  A change of real or metaphorical position: enter, return, take, leave, rise, …
  A change in real or metaphorical size or quantity: increase, fall, …
  A change in property: change, become, transition, …
  A change in existence: develop, revival, decay, break, …
  A change in real or metaphorical possession: accumulation, fill, recovery, loss, give, …
  The beginning of a change: source, start, origin, …
  The end of a change: end, target, conclusion, stop, …
  Things happening in the middle of a change: path, variation, repetition, [take a] break, …
  Participant in a change: participant, player, …
Cause: having to do with something causing or not causing a change of state:
  In general: effect, result, make, prevent, so, thereby, …
  Causes acting as a barrier: restriction, limit, restraint, …
  An absence of causes or barriers: chance, accident, freely, …
  Causing a change in position: put, pull, deliver, load, …
  Causing a change in existence: develop, create, establish, …
  Causing a change in real or metaphorical possession: obtain, deprive, …
Instrumentality: involving causal factors intermediate between the primary cause and the primary effect: way, method, ability, influence, preparation, help, somehow, …
Process: a complex of causally related changes of state:
  The process as a whole: process, routine, work, operational, …
  The beginning of the process: undertake, activate, ready, …
  The end of the process: settlement, close, finish, …
  Things that happen in the middle of a process: trend, steady, postpone, drift, …
Opposition:
  Involving factors acting against some causal flow: opposition, conflict, delay, block, bar, …
  Involving
resistance to opposition: resist, endure, …
Force: involving forces acting causally with greater or lesser intensity: power, strong, difficulty, throw, press, …
Functionality: a notion of functionality with respect to some human agent’s goals is superimposed on the causal structure; some outcomes are good and some are bad:
  Relative to achieving a goal: use, success, improve, safe, …
  Relative to failing to achieve a goal: failure, blow, disaster, critical, …
  Relative to countering the failure to achieve a goal: survivor, escape, fix, reform, …

We have axiomatized all the words in the Change cluster above.

To make manual axiomatization of words feasible, we only axiomatize domain-independent, ordinary words like “enter” and “deliver”. Interestingly, domain-independent words comprise 70-80% of the words in most texts, even technical texts. They have such wide utility because their basic meanings tend to be very abstract, and they acquire more specific meanings in combination with their context. This means that the underlying theories required for explicating the meanings of these words are going to be very abstract. In the following, we describe the three-step approach for manual axiomatization of words.

1. Analyzing the Structure of WordNet Senses

For each change-of-state word in our list, we analyze the structure of its WordNet senses. Typically, there will be pairs that differ only in, for example, constraints on their arguments or in that one is inchoative and the other causative. This analysis generally leads to a radial structure indicating how one sense leads by increments, logically and perhaps chronologically, to another word sense (Lakoff, 1987). The analysis also leads us to posit “supersenses” that cover two or more WordNet senses. Frequently, these supersenses correspond to senses in FrameNet (Baker et al. 1998) or VerbNet (Kipper et al. 2000, 2006), which tend to be coarser grained; sometimes the desired senses are in WordNet itself.
For example, for the verb “enter”, three WordNet senses involve a change into a state:

V2: enter, participate (become a participant; be involved in)
V4: figure, enter (be or play a part of or in)
V9: embark, enter (set out on (an enterprise or subject of study)) "she embarked upon a new career"

We group these three senses into supersense S1 12 . Two more senses specialize supersense S1 by restricting the target state to be “in a location”:

V1: enter, come in, get into, get in, go into, go in, move into (to come or go into) "the boat entered an area of shallow marshes"
V6: enter (come on stage)

Two other senses add a causal role to this:

V5: record, enter, put down (make a record of; set down in permanent form)
V8: insert, infix, enter, introduce (put or introduce into something) "insert a picture into the text"

One other sense adds a causal role to S1 and restricts the target state to be membership in a group:

V3: enroll, inscribe, enter, enrol, recruit (register formally as a participant or member) "The party recruited many new members"

12 In our framework, it is possible to assign a WordNet sense to two supersenses.

Figure 3-1 shows the radial structure of the senses for the word “enter” 13 , together with the axioms that characterize each sense. A link between two word senses means an incremental change in the axiom for one gives the axiom for the other. For example, the axioms for S2 and S2.1 are obtained by adding a causal role to the axioms for S1 and S1.1 respectively. Thus S2 is linked to S1 and S2.1 is linked to S1.1. More specifically, the expanded axiom for S1 says that if x1 enters eventuality e1, then there is a change to e1 where x1 is an argument of e1; and the axiom for S2 says that if x1 enters x2 in eventuality e1, then x1 causes a change into the state e1 where x2 is an argument of e1. So S2 and S1 are closely related and linked together.
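The radial structure just described can be represented as a simple parent map from each sense to the sense or supersense it specializes. The sketch below is our own illustrative encoding of the links described for Figure 3-1, not code from the thesis; the `abstraction_chain` helper is invented for the example.

```python
# Illustrative encoding of the radial sense structure for "enter" (Figure 3-1).
# Each sense maps to the (super)sense it specializes or is grouped under.

specializes = {
    "V2": "S1", "V4": "S1", "V9": "S1",  # grouped into supersense S1
    "S1.1": "S1",                        # restricts the target state to "in a location"
    "V1": "S1.1", "V6": "S1.1",
    "S2": "S1",                          # adds a causal role to S1
    "S2.1": "S1.1",                      # adds a causal role to S1.1
    "V5": "S2.1", "V8": "S2.1",
    "V3": "S2",                          # causal role plus membership in a group
}

def abstraction_chain(sense):
    """Walk from a sense up through its supersenses to the most abstract one."""
    chain = [sense]
    while chain[-1] in specializes:
        chain.append(specializes[chain[-1]])
    return chain

print(abstraction_chain("V3"))  # ['V3', 'S2', 'S1']
print(abstraction_chain("V8"))  # ['V8', 'S2.1', 'S1.1', 'S1']
```

Walking such chains makes explicit which axioms a specific sense inherits, by increments, from its supersenses.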
Abstraction is a particularly important kind of incremental change; one sense S1 is an abstraction of another sense S1.1 when S1.1 specializes S1, either by adding more predications to S1’s axiom or by specializing some of the predications in it. We represent abstractions via arrows pointing from the sub-senses to the supersenses. In Figure 3-1, S1.1 specializes S1 by adding an extra predication describing e1 as an “in” eventuality, and V3 specializes S2 by specializing e1 to “membership in a group”.

Figure 3-1: Radial structure and axioms for the verb “enter”

13 Throughout this thesis, we refer to a “WordNet sense” simply as a “sense”. We refer to the nth sense of a particular part of speech (POS) of a word as POSn (e.g., V3 or ADV2).

Knowing this radial structure of the senses helps enforce uniformity in the construction of the axioms. If the senses are close, their axioms should be almost the same. We are currently only constructing axioms for the most general or abstract senses or supersenses. In this way, although we are missing some of the implications of the more specialized senses, we are capturing the most basic topological structure in the meanings of the words. This decision has a number of advantages, which are listed below:

Feasibility: Axiomatizing the general senses is much more feasible than axiomatizing all the senses (e.g., axiomatizing 4 supersenses for the verb “give” rather than all 44 specific senses).

Delaying Disambiguation: Axiomatizing general senses allows us to make correct (although not precise) inferences when the disambiguation into fine-grained senses is not possible, either because no context is available or because the context doesn’t help in resolving the ambiguity. For example, in the sentence “she had a baby”, it is not clear which sense of “had” is being referred to. Is it referring to the “have/possess” sense or to the “giving birth” sense?
In any case, we can be sure there is some relation between “she” and the “baby”. Of course, it would always be good to have axioms for the specific senses too, so that we get more accurate inferences when context is strong enough to do lexical disambiguation. In fact, one can view the choice between fine-grained and coarse-grained axioms as the decision of where to place the unreliability. If we only employ fine-grained senses, we have less reliable lexical disambiguation, but more precise inferences after the right sense is determined. If we axiomatize coarse-grained senses, we get more reliable lexical disambiguation (since we have a smaller branching factor), but introduce uncertainty (and imprecision) into the axioms. In the latter case, it is the task of our abduction engine to find more precise inferences by searching for the lowest-cost proof of the text.

Handling Metaphors: In metaphors, it is often the topological properties captured in a supersense that are transferred from the source to the target (Lakoff 1980). For example, we can view “entering a course of study” as resting on a metaphor of “entering a room”. As shown in Figure 3-1, supersense S1 unpacks and makes explicit the “change of state” property from the source domain senses (V1 and V6) of “enter” that is transferred to the target domain senses (V2-V5, V8, V9).

Semantically-Oriented Word Senses: WordNet distinguishes senses not only by semantic differences but also by differences in argument structure, differences in sets of syntactic frames and/or differences in selectional restrictions (cf. Palmer 1998). We can make the distinction between senses more semantically oriented by grouping such non-semantically differentiated senses under one supersense, assigning them multiple argument realization patterns and explicating only one predicate axiomatically.
For example, we group senses 1 and 8 of the verb “begin” (which correspond to “We began working at dawn” and “begin a cigar” respectively) as one concept (supersense) that is realized by two different argument realization patterns.

In addition to the above-listed advantages, extracting the basic topological structure of words gives us a framework for investigating synonymy, near synonymy and other types of similarity between words, as we have shown in (Montazeri and Hobbs, 2010).

In constructing the axioms in the event domain, we are very much informed by the long tradition of work on lexical decomposition in linguistics (e.g., Gruber 1965 and Jackendoff 1972). Our work differs from this tradition in that our decompositions are done as logical inferences and not as tree transformations as in the earliest linguistic work, they are not obligatory but only inferences that may or may not be part of the lowest-cost abductive proof, and the “primitives” into which we decompose the words are explicated in theories that enable reasoning about the concepts.

2. Axiomatization

In this section we describe some common problems we faced and decisions we had to make for axiomatizing change-of-state words and generalizing over different senses. We describe how much information we choose to embed in a verb’s axiom and how we handle metonymy and metaphors.

2.1 Specificity of Details

An important question is: how much information should be encoded in an axiom? For example, one possible axiom for a generalization of senses 1, 16 and 32 of the verb “carry” is:

carry-s0'(e,x,y,p0,p1) ↔ and'(e,e1,e2) & hold'(e1,x,y) & move'(e2,x,p0,p1) & cause(e,e3) & move'(e3,y,p0,p1)

which means the eventuality e of x's carrying y from p0 to p1 is an eventuality e of both x's holding y (e1) and x's moving from p0 to p1 (e2), where e causes the eventuality e3 of y's moving from p0 to p1.
Another possibility for defining “carry” is:

carry-s0'(e,x,y,p0,p1) ↔ and'(e,e1,e2) & hold-v2'(e1,x,y) & move'(e2,x,p0,p1)

where the following axiom would be in a core theory of attachment and causality:

and'(e,e1,e2) & hold'(e1,x,y) & move'(e2,x,p0,p1) ↔ cause(e,e3) & move'(e3,y,p0,p1)

which means that x’s moving while holding y causes y’s moving. In fact, this is a fundamental property of attachment and hence should be part of the core theory of attachment. In general, we prefer to factor out all the information that can be inferred from world knowledge and explicate it in our core theories.

2.2 Metonymy vs. Lexical Disambiguation

WordNet is not quite consistent in handling metonymy. For example, consider the following senses of the verb “begin”:

V1: get down, begin, get, start out, start, set about, set out, commence (take the first step or steps in carrying out an action) "We began working at dawn"; "Who will start?"; ...
V8: begin, start (begin an event that is implied and limited by the nature or inherent function of the direct object) "begin a cigar"; "She started the soup while it was still hot"; ...

WordNet has assigned a separate sense (V8) to the use of the verb “begin” with a metonymical argument. However, despite different argument realization patterns, V1 and V8 are semantically equivalent: both mean starting an eventuality e. The only difference is that e is missing in V8, since it can be determined by the arguments.

Now consider another case where metonymy is not handled in separate senses. One sense of “cut” is:

V36: cut, cut off (cease, stop) "cut the noise"; "We had to cut short the conversation"

This sense semantically means “stopping an eventuality”. While “conversation” is an eventuality, “noise” is not. What we really mean by “cut the noise” is “cut making noise”, where “making” is omitted due to its recoverability.
One way to handle this case is to split the sense V36 into two sub-senses, V36-a and V36-b, and axiomatize them differently:

cut-v36-a’(e,x,e0) ↔ cause’(e,x,e1) & changeFrom’(e1,e0)
cut-v36-b’(e,x,z) ↔ cut-v36-a’(e,x,e0) & arg*(z,e0)

The first axiom states that x’s cutting eventuality e0 means that x causes a change from e0. The second axiom states that x’s cutting of z means x causes a change from some eventuality e0, where z is an argument of e0.

A better approach is to handle metonymy separately, using axioms such as:

cut-v36’(e,x,e0) → cut-v’(e,x,e0) & eventuality(e0)
cut-v36’(e,x,e0) → cut-v’(e,x,y) & arg*(y,e0)

and have only one predicate cut-v36 that we explicate as:

cut-v36’(e,x,e0) ↔ cause’(e,x,e1) & changeFrom’(e1,e0)

The logic behind this is that our predicates explicate situations; therefore, if two situations are the same, their predicates should also be the same. We also merge those senses of WordNet that differ only in argument realization patterns and assign them a unique axiom. E.g., for the “begin” case above, we generate a supersense that unifies V1 and V8:

begin-v1-8’(e,x,e0) ↔ changeTo’(e,e1) & eventSequence(e0,e1,e2) & arg*(x,e1)

which says that the eventuality e of x beginning an event sequence e0 is a change to an eventuality e1, where e1 has x as its argument and is the first eventuality in e0. We then add the following axioms to handle metonymy:

begin-v1-8’(e,x,e0) → begin-v’(e,x,e0) & eventuality(e0)
begin-v1-8’(e,x,e0) → begin-v’(e,x,y) & arg*(y,e0)

Ideally we would like to have a general mechanism to handle metonymy, but since such a mechanism is not implemented yet, we can handle some cases of metonymy using the above technique. Besides solving metonymy, this method reflects the fact that the same concept can be realized differently in text.

2.3 Metaphors

Sometimes a WordNet sense has a clear metaphorical origin, and that metaphorical interpretation has become conventionalized as one of the senses of the word.
Thus it is not inconsistent to say that an instance of a word is both a metaphor and an example of a conventional sense of a polysemous word. Consider the second sense of the verb “descend”:

V2: derive, come, descend (come from; be connected by a relationship of blood, for example) "She was descended from an old Italian noble family"; "he comes from humble origins"

Other senses of “descend” mean a change from being at a higher region to being at a lower region on a vertical scale 14 :

V1: descend, fall, go down, come down (move downward and lower, but not necessarily all the way) "The barometer is falling"; "The curtain fell on the diva"; ...
V3: condescend, deign, descend (do something that one considers to be below one's dignity)

V2 doesn’t seem to indicate a change on a scale, but it would if we consider a tree of life having its root at the top, with the children deriving from it downwards. In such cases, axiomatizing a word sense with a metaphorical interpretation allows us to fit those senses within our radial structure, making the structure more coherent. However, we won't do so if such abstract axioms are not practical for reasoning. In this example, we axiomatize sense 2 as

descend-v2'(e,d,o) ↔ ancestorOf'(e,o,d)

which falls into the kinship domain. Note, however, that we can often account for many conceptual metaphors (Lakoff 1980) by using abstract predicates.

14 A scale can be stipulated to be vertical for a variety of reasons.
For example, consider the following senses of the verb “cut”:

V32: cut (shorten as if by severing the edges or ends of) "cut my hair"
V37: abridge, foreshorten, abbreviate, shorten, cut, contract, reduce (reduce in scope while retaining essential elements) "The manuscript must be shortened"

A generalization of both these senses can be captured by the following axiom:

cut-s1'(e,x,y) ↔ cause'(e,x,e10) & changeFrom'(e10,e0) & connect’(e0,w,z) & componentOf(w,y) & componentOf(z,y)

which says that x’s cutting y is x’s causing a change out of a state in which w and z, two components of y, were connected. Although “cutting a manuscript” is very different from “cutting hair”, our abstract definition of composite entity allows us to handle them similarly. The difference comes from the nature of “hair” and “manuscript”, which differ in their 1) component types (parts of hair are also hair; parts of a manuscript are lines and paragraphs) and 2) component relations (physical attachment vs. the abstract relation between lines and paragraphs).

2.4 Choosing Predicates

While constructing axioms, there were predicates that we needed to use but that were not yet axiomatized in any core theory. We created such predicates and made a note about the theory they should belong to.

We also decided not to limit ourselves to using only basic predicates and axioms from core theories; instead, whenever we found a concept or piece of knowledge that could be reused in several axiomatizations, we axiomatized it separately so that it can be reused. This also reduces redundancy between axioms, a source of incompatibility.

3. Testing Axioms and Identifying Missing Information

The purpose of this round is to enforce uniformity in the way axioms are constructed, and also to expose missing inferences in the core theories, as we discuss later in this chapter. We chose textual entailment as our testing framework.
For each set of inferentially related words we construct textual entailment pairs, where the hypothesis (H) intuitively follows from the text (T), and use these for testing and evaluation. The person writing the axioms should not know what the pairs are, and the person constructing the pairs should not know what the axioms look like. The ideal test then is whether, given a knowledge base K consisting of all the axioms, H cannot be proven from K alone, but H can be proven from the union of K and the best interpretation of T. This is often too stringent a condition, since H may contain irrelevant material that doesn’t follow from T, so an alternative is to determine whether the lowest-cost abductive proof of H given K plus T is substantially lower than the lowest-cost abductive proof of H given K alone, where “substantially lower” is defined by a threshold that can be trained (Ovchinnikova et al., 2011).

To show how textual entailment reveals the gaps in our knowledgebase, we work through the following example 15 :

T: He cut the rope
H: The length of the rope decreased

We have the following axiom for “cut” 16 :

AX1: cut-s0'(e,x,y) ↔ cause'(e,x,e10) & changeFrom'(e10,e0) & connect'(e0,w,z) & componentOf(w,y) & componentOf(z,y)

which characterizes cutting y as disconnecting two components of y. With such a definition, we need to infer a decrease in the length of the rope from the disconnection of two pieces of the rope 17, 18 . However, there are some complementary axioms that are necessary to help our general axiom AX1 work.

15 In this example we do not show the costs, although they are used by our system.
16 This definition corresponds to sense V1 of “cut”, which is defined as “separate with or as if with an instrument”.
For example, the fact that the parts of a physical composite entity are smaller than the whole in particular dimensions:

componentOf(c,y) & dimensionOf(p,y) & dimensionOf(p,c) & valueOf'(v0,p,y) & valueOf'(v1,p,c) → lts(v1,v0,s) & scaleFor(s,p)

This axiom states that if c is a component of y; and y and c both have a physical dimension p; and their values for this dimension are v0 and v1 respectively, then v1 (e.g., the length of c) is less than v0 (e.g., the length of y) on the scale associated with p.

We also need to account for the fact that after separation, we refer to the parts of a rope as the rope itself (in contrast to the parts of a car). One implication of this is that the properties of either of the components can be assigned to the original entity:

changeFrom'(e10,e0) & connect'(e0,c1,c2) & componentOf(c1,y) & componentOf(c2,y) & dimensionOf(p,y) & dimensionOf(p,c1) & dimensionOf(p,c2) & valueOf'(e5,v0,p,y) & valueOf(v1,p,c1) & valueOf(v2,p,c2) ↔ change'(e10,e5,e2) & or'(e2,e3,e4) & valueOf'(e3,v1,p,y) & valueOf'(e4,v2,p,y)

This axiom means that if c1 and c2 are components of y; and y, c1 and c2 all have a dimension p with values v0, v1 and v2 respectively, then a change from c1 and c2 being connected means a change from y having the value v0 for dimension p (e.g., the rope having length v0) to y having either v1 or v2 for dimension p (e.g., the rope having length v1 or v2).

Our general axiom for decrease is:

decrease-v1'(e,p,x,v1,v2) ↔ change'(e,e1,e2) & valueOf'(e1,v1,p,x) & valueOf'(e2,v2,p,x) & lts(v2,v1,s) & scaleFor(s,p)

17. This inference may seem intuitive, but in fact it is defeasible: if we cut the rope along its length instead of its width, we get a narrower rope with the same length.
18. There is a separate sense for "cut" in WordNet, defined as "shorten as if by severing the edges or ends of".
But according to our decision explained in 2.1, this sense should be derivable from the type of argument (here "rope") that "cut" takes.

This axiom says that a decrease of x's value for its dimension p from v1 to v2 means a change from x's value for dimension p being v1 to its being v2, where v2 is less than v1 on the scale s associated with p.

With all these axioms we are now able to infer that when a rope is cut into two pieces c1 and c2, the size of the rope will change to the size of either c1 or c2, which is less than the size of the original rope. A change (along some scale) from a higher value to a lower value of a property is a "decrease" of that property. The only thing that remains is inferring that the property under discussion is "length". This would require axiomatizing the rough shapes of common objects.

This example shows how we can find gaps in the core theories by testing our lexical axioms on textual entailment pairs. As we have shown in (Montazeri and Hobbs, 2011), we should be able to identify the missing links in and between the core theories systematically. In that work we identified compositions of the fundamental predicates cause, change, not and possible, as shown in Table 3-1. In this table, rows and columns correspond to predicates p and q, which are combined through the following axiom to yield the predicate r:

p'(e1,e2) & q'(e2,x) → r'(e1,x)

Expanding the idea of systematically identifying related concepts, and axiomatizing their relationships and compositions, is beyond the scope of this thesis and remains future work.

Table 3-1: Axioms encoding compositions of fundamental predicates

4. Statistics

In Core WordNet, about 130 word senses are related to change of state. We analyzed the radial structure of all the senses of each word (a total of 2414 word senses) to determine and axiomatize the most general senses. We wrote a total of 720 axioms, samples of which are included in Appendix I.
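The chain of inferences in this worked example can be illustrated with a toy forward-chainer over bare predicate names (arguments, modalities and costs all dropped). The rule tokens below are simplified labels of our own for AX1 and the complementary axioms, not the actual logical forms, and forward chaining is only a crude stand-in for abduction.

```python
# Each rule: (body predicates, head predicates), all as bare names.
RULES = [
    # AX1: cutting causes a change out of a state where two components
    # of the patient were connected.
    ({"cut"}, {"cause", "changeFrom_connect"}),
    # Components are smaller than the whole, and a component's value can be
    # reassigned to the whole: the whole's dimension value changes downward.
    ({"changeFrom_connect", "componentOf", "dimensionOf"},
     {"change_valueOf", "lts"}),
    # The decrease axiom: a downward change of a value is a decrease.
    ({"change_valueOf", "lts"}, {"decrease"}),
]

def closure(facts):
    """Apply the rules to a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            if body <= facts and not head <= facts:
                facts |= head
                changed = True
    return facts

# From "cut" plus the composite-entity facts we reach "decrease".
assert "decrease" in closure({"cut", "componentOf", "dimensionOf"})
```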
The entire set of axioms is available at https://github.com/nmontazeri/Change-of-State-Axioms

In axiomatizing these senses, about 250 new predications are introduced, 110 of which are general, such as up, down, during, difficult, atTime, at, in, directionOf, sizeOf, back, front, etc. Some of these predications already exist in core theories, and some of them should be added to existing or new theories. Examples of specific predications that belong to less general theories are scoreOf, soundOf, expenseOf, liquid, long, childOf, function, etc.

Chapter 4
Automatic Extraction of Change-of-State Axioms from Text

Manual axiomatization of words is slow and subjective. It would be desirable to automate axiomatization, at least for a fraction of English words. We have tried extracting the meaning of change-of-state verbs from text (Montazeri and Hobbs, 2015). The idea is that the most important part of the meaning of a change-of-state verb is the state that changes; so if we can find this particular state for a verb, we have captured the essence of the meaning of that verb. For example, the verbs "fall" and "enter" are best characterized by the states that are changed, namely "being up" and "being out" respectively (19).

To formulate the problem more precisely, suppose that our manually encoded axiom for any change-of-state verb (referred to as EVENT) can be simplified into the following format (20):

(1) EVENT'(e,x,y) → changeFrom/to'(e,e0) & STATE'(e0,x/y)

which means that the occurrence of EVENT with agent x and patient y entails a change from or to x or y being in the state denoted by STATE. The task then is to extract from text a set of (STATE, EVENT) pairs that can be plugged into the above axiom. Note that the predicates in the resulting axioms are not restricted to predicates from core theories, but can be English words too. In other words, we are axiomatizing words in terms of other words as well as predicates in the core theories.
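As a minimal illustration of schema (1), an extracted (STATE, EVENT) pair can be mechanically plugged into the template. The string notation mimics the thesis's axiom syntax; the helper name is ours.

```python
# "direction" selects changeFrom vs. changeTo in schema (1).
def make_axiom(event, state, direction="changeFrom"):
    """Instantiate schema (1) for one (STATE, EVENT) pair."""
    return f"{event}'(e,x,y) -> {direction}'(e,e0) & {state}'(e0,x/y)"

# "enter" is characterized by a change from "being out".
assert make_axiom("enter", "be out") == \
    "enter'(e,x,y) -> changeFrom'(e,e0) & be out'(e0,x/y)"
```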
Before moving on, we introduce two terms that simplify the discussion in the rest of this chapter.

19. Of course, this is not the case for all change-of-state verbs. For example, a change to the state of "x being at p2" is only part of the meaning of "x returns to p2". The more characteristic part of the meaning of "return" is that "x has been at p2 before".
20. In these axioms, the only predicates that are anchored in core theories are changeFrom and changeTo.

We refer to the state predication that appears in our manual axiomatization of an event (21) as its M-STATE. For example, according to our hand-coded axiom for "enter", the M-STATE of enter(x,y) is in(x,y).

We refer to the set of states that are automatically extracted for an event as its ES-Set (standing for Extracted State Set).

This chapter is organized as follows. First we present our method for extracting candidate change-of-state pairs from text. The initial results are rather noisy, but this is not much of a concern at the beginning, since we would first like to study the relationship between the M-STATE and ES-Set of events and do not want to lose any data in an attempt to increase precision. Next we present our study of the extracted pairs and find that there are three types of pairs, which yield different types of axioms with different levels of abstractness and defeasibility. We then compare the extracted states, for several sample verbs, against the M-STATEs in our manual axiomatization of those verbs. Finally, we use machine learning to filter out non-change-of-state pairs and improve precision. In addition, we present a method for using Mechanical Turk to categorize the axioms according to their level of defeasibility.

1. Methodology

Data Set: We harvested information from the ClueWeb09 dataset (22), whose English portion contains just over 500 million web pages.
Patterns: We use lexico-syntactic patterns for extracting pairs of phrases (STATE, EVENT) which may be in a change-of-state relation, in that the occurrence of EVENT results in a change from/to STATE. Examples of such pairs are (not understand, realize), (be confused, realize) and (be happy, realize). Verbal representations of our patterns are listed in Table 4-1.

21. We use the terms "verb" and "event" interchangeably, but to be more precise, an "event" is an instantiation of a "verb" with its arguments.
22. http://lemurproject.org/clueweb09.php/

In our patterns, STATE and EVENT are two phrases that are in an adverbial-complement relation and have a common argument. EVENT is a verb phrase with verb VE, and STATE is a verb phrase with either (1) a verb VS in passive form (e.g., "was detained") or (2) a "being" verb VBS with a noun, adjective, or prepositional phrase (e.g., "remained successful", "was a hero", "was in the team"), where a "being" verb is a verb in the set {"be", "remain", "become", "get", "stay", "keep"}. To apply the constraint that STATE and EVENT have a common argument, the subject or the object of the second phrase (according to their order in the sentence) should be a pronoun that refers to the subject or object of the first phrase. Some examples are: "(John was detained) until March when (the authorities released him)", "(John was happy) until (he heard the news)" and "if (John hears the news), (he will get upset)".

  Pattern                           Pattern
  if EVENT, no longer STATE         became/got STATE when/after EVENT
  became/got STATE after EVENT      used to STATE before EVENT
  no longer STATE because EVENT     STATE until t when EVENT
  how can STATE if EVENT            although EVENT, still/continued STATE
  stopped STATE because EVENT       no longer STATE if EVENT
  STATE until EVENT

Table 4-1: Verbal representation of lexico-syntactic patterns for change-of-state.
Parsing the Corpus and Applying Patterns: Since our patterns require syntactic information, we parsed the sentences using the fast dependency parser described in (Tratz and Hovy, 2011). Before parsing the corpus, we first filter out sentences that cannot match any of our patterns, using a set of regular expressions derived from the patterns. An example of such a regular expression is

if .*, *\(it\|he\|she\|they\|you\|we\|I\) will no longer .*

Next, we parse these sentences and apply our syntactic dependency patterns to extract STATE and EVENT from each sentence, along with several features such as pronoun resolution (whether the common theme is the object or subject of the event), the pattern that was matched against the sentence, the state and event verbs' voice (passive/active), and whether the state is represented by an adjective or a noun.

In the next step, we aggregate the features extracted for the same (STATE, EVENT) pairs into (STATE, EVENT, f, P, F) tuples, where f is the frequency of the pair (how many times it was extracted by any change-of-state pattern), P is a dictionary structure that shows, for each pattern, how many times it was matched against the (STATE, EVENT) pair, and F is a dictionary structure that keeps the most frequent value of each feature. We then drop the "being" verbs for nouns, adjectives and prepositional phrases, and construct the argument structures for states and events based on the most frequent pronoun-resolution case. After removing tuples with empty events like "be", "have", "get", "do", etc., we get about 68,000 instances, examples of which are (lost(y), find(x,y)) and (teacher(x), retire(x)).

2. Assessing the Quality of the Results

One of the authors annotated 870 (STATE, EVENT) pairs, which are a combination of random and high-frequency pairs.
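The pre-filtering and aggregation steps above can be sketched as follows. The regular expressions and feature names are illustrative stand-ins for the full pattern set of Table 4-1, not our actual pipeline code.

```python
import re
from collections import Counter, defaultdict

# Two illustrative pre-filters derived from the patterns.
PREFILTERS = [
    re.compile(r"if .*, .*(it|he|she|they|you|we|I) will no longer "),
    re.compile(r" until "),
]

def might_match(sentence):
    """Cheap regex pre-filter applied before parsing."""
    return any(p.search(sentence) for p in PREFILTERS)

def aggregate(extractions):
    """Fold per-sentence extractions into (STATE, EVENT) -> (f, P, F).
    extractions: iterable of (state, event, pattern, feature-dict)."""
    freq = Counter()
    per_pattern = defaultdict(Counter)               # P: counts per pattern
    feat_votes = defaultdict(lambda: defaultdict(Counter))
    for state, event, pattern, feats in extractions:
        key = (state, event)
        freq[key] += 1
        per_pattern[key][pattern] += 1
        for name, value in feats.items():
            feat_votes[key][name][value] += 1
    return {key: (freq[key],
                  dict(per_pattern[key]),
                  {n: c.most_common(1)[0][0]          # F: most frequent value
                   for n, c in feat_votes[key].items()})
            for key in freq}

rows = [("be lost", "find", "STATE until EVENT", {"voice": "active"}),
        ("be lost", "find", "if EVENT, no longer STATE", {"voice": "active"})]
f, P, F = aggregate(rows)[("be lost", "find")]
assert f == 2 and P["STATE until EVENT"] == 1 and F["voice"] == "active"
assert might_match("John was happy until he heard the news")
```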
While annotating these pairs, we found three types of change-of-state relationships between STATE and EVENT, and hence we used the following fine-grained tags for annotating the results:

Category-1: STATE is a precondition of EVENT and will no longer hold after EVENT. Examples are (lost(y), find(x,y)) and (alive(x), die(x)). Pairs in this category result in axioms with the format:

EVENT'(e,x,y) → changeFrom'(e,e0) & STATE'(e0,x/y)

Category-2: STATE is not a precondition of EVENT, but if STATE holds before EVENT, the occurrence of EVENT will surely put an end to it. Examples are (teacher(x), retire(x)) and (married(x), die(x)). Pairs in this category result in axioms with the format:

EVENT'(e,x,y) & STATE'(e0,x/y) → changeFrom'(e,e0)

Category-3: Similar to Category-2, but EVENT only sometimes puts an end to STATE. Examples are (happy(x), realize(x,y)) and (confused(x), read(x,y)). Such pairs result in defeasible axioms with the format:

EVENT'(e,x,y) & STATE'(e0,x/y) & etc → changeFrom'(e,e0)

We refer to pairs that belong to any of the above categories as "change-of-state" pairs and to the rest as "non-change-of-state" pairs. Table 4-2 shows the distribution of change-of-state pairs and the finer-grained categories for all annotated pairs. In total, 69% of the annotated pairs are change-of-state pairs. We consider this the baseline precision for our method.

  Change-of-state: 69% (Cat1: 12%, Cat2: 16%, Cat3: 41%)
  Non-change-of-state: 31%

Table 4-2: Distribution of pair categories.

The Nature of Non-Change-of-State Pairs: Aside from parsing errors, many of the pairs tagged with category 4 (non-change-of-state) represent a process-result relationship, examples of which are (soak(x,y), absorb(y,z)) and (press(x,y), break(y)). Also, many of the pairs in category 4 could be in categories 1-3 if we considered more context. For example, the pair (happy, receive) is too general to represent a meaningful relation.
However, if we considered more context and got (happy, receive a letter), it would fall into Category-3.

We will show how to improve on the 69% baseline using machine learning. Before that, however, we study the annotated pairs further and compare them against the M-STATEs in our manually created axioms for change-of-state verbs.

3. Comparison with Manually-Encoded Axioms

Table 4-3 shows the M-STATEs and ES-Sets for three different events. First, we have the verb "disappear", which we axiomatized with the M-STATE "see". The states "see" and "look at" found in the ES-Set perfectly match the M-STATE. The same holds for "retire": we used the word "work" as its M-STATE, and it appears among the Category-1 states in the ES-Set. There are additional states in the ES-Set, such as "serve" and "hold (position)" (23), which are essentially paraphrases of each other.

  disappear'(e,x) -- M-STATE: see'(e0,y,x)
    Category-1: look at/see'(e0,y,x)

  retire'(e,x) -- M-STATE: work'(e0,x,y)
    Category-1: work'(e0,x), serve'(e0,x,y), hold'(position)(e,x,y), stay'(with company)(e0,x,y)
    Category-2: run(company)(x), teach(x), minister(x)
    Category-3: owner/responsible/eligible'(e0,x)

  move'(e,x,p1,p2) -- M-STATE: at'(e0,x,p1)
    Category-1: in'(place/state/location/position/home/area)(e0,x)
    Category-2: reside/live/stay'(e0,x); hold'(position)(x,y), work'(at company)(e0,x), play'(for team)(e0,x)

Table 4-3: Events, their M-STATEs and sample states in their ES-Sets.

States in Category-2 for "retire" include "run (company)", "teach" and "minister", which refer to specific occupations. These states cannot be used for axiomatizing the word "retire" because they apply only to those "retire" events in which the agent has a particular occupation. Category-3 includes such states as "being owner" or "being eligible". These states are not useful for creating axioms, as the resulting axioms would be highly defeasible: retirement does not necessarily change ownership or eligibility.
We axiomatized "move" via the abstract predicate "at" which, according to our core theory, is general enough to describe such diverse concepts as "being at a place", "being in a position at a company" or "living in a place". Two types of states were extracted for "move".

23. We do not extract subjects, objects or other contexts for state verbs; however, for illustration purposes we show them in parentheses.

The "in" state in Category-1 can be taken as a synonym of the M-STATE "at". There are two groups of states in Category-2: one group corresponds to residence, and the other corresponds to a metaphorical use of "move" for describing changing one's work (place). Again, these states do not define the general event "move"; rather, they require the initial or final position of the agent to be a location or an occupation.

Using Categories 2 and 3 for Axiomatization: An interesting observation is that, in addition to axiomatizing change-of-state verbs using Category-1 pairs, we can also axiomatize the state predicates in Category-2 using the states in Category-1. For example, we can create the following axioms for "reside" and "teach":

reside'(e,x,y) → at'(e,x,y)
teach'(e,x) → work'(e,x)

This is because states from Category-2 are specializations of states in Category-1. Finally, although states from Category-3 are not useful for axiomatization, they can capture such implicit knowledge as "retirement can change ownership" through the following defeasible axiom:

retire'(e,x) & owner'(e0,x) & etc → changeFrom(e0)

4. Filtering Non-Change-of-State Pairs

In order to improve on the 69% baseline precision of the simple pattern-matching method, we used the C4.5 decision tree learning algorithm (Quinlan, 1986) to classify the extracted pairs into change-of-state and non-change-of-state categories.
The features we used are pronoun resolution, matched patterns, the state and event verbs' voice (passive/active), and whether the state is represented by an adjective or a noun, plus the following statistical information: 1) the number of distinct patterns that have extracted the pair, and 2) the Pointwise Mutual Information (PMI) between (STATE, EVENT) and the set of change-of-state patterns that have extracted it (PT). We compute this mutual information using the following formula:

PMI((STATE, EVENT), PT) = log [ Σ_{pt_i ∈ PT} P(STATE, EVENT, pt_i) / (P(STATE, EVENT, *) × P(*, *, PT)) ]

where pt_i is a pattern in PT, P(STATE, EVENT, pt_i) represents the probability that pattern pt_i extracts (STATE, EVENT), and * is a wildcard. We normalized the PMI values using the discounting factor presented in (Pantel and Ravichandran, 2004) to moderate the bias towards rare cases.

We achieved a precision of 78% and a recall of 90% in a 10-fold cross-validation test on our 870 annotated pairs, which means about a 10% improvement over the 69% random-selection baseline.

5. Categorizing Axioms with Mechanical Turk

In order to use the extracted (STATE, EVENT) pairs for axiomatization, we need to know which category each belongs to. We examined the idea of using Mechanical Turk to identify the categories of pairs. For each (STATE, EVENT) pair, we asked the annotator two questions. Here is an instantiated version of the two questions for the pair (lost'(e0,y), find'(e,x,y)):

1. If I hear "something/someone is found":
   a. I can tell that it/she was lost before being found
   b. I cannot tell whether it/she was lost before being found
2. If something/someone is lost:
   a. finding will surely put an end to it.
   b. finding will sometimes put an end to it.
   c. finding will rarely put an end to it.

From our set of annotated pairs, we randomly selected 20 pairs per each of the 4 categories, a total of 80 pairs. We divided them into 8 assignments, each containing 10 pairs (for each pair, 2 questions, and hence 20 questions per assignment). We required that each assignment be answered by 4 subjects.
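A count-based version of this score, with a discount of the kind described in (Pantel and Ravichandran, 2004), might look as follows. This is an illustrative sketch with hypothetical counts, not the implementation used in the thesis.

```python
import math

def discounted_pmi(c_pair_pt, c_pair_any, c_any_pt, total):
    """PMI between a (STATE, EVENT) pair and a pattern set PT, from counts.
    c_pair_pt: times the pair was extracted by patterns in PT;
    c_pair_any / c_any_pt: marginal counts; total: all extractions."""
    pmi = math.log((c_pair_pt * total) / (c_pair_any * c_any_pt))
    # Discount in the style of Pantel & Ravichandran (2004): shrinks the
    # score for rare pairs and rare pattern sets.
    m = min(c_pair_any, c_any_pt)
    return pmi * (c_pair_pt / (c_pair_pt + 1.0)) * (m / (m + 1.0))

score = discounted_pmi(c_pair_pt=10, c_pair_any=20, c_any_pt=50, total=1000)
assert 1.9 < score < 2.0   # undiscounted PMI would be log(10), about 2.30
```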
After collecting the results, we used MACE (24) (Hovy et al., 2013) to aggregate the answers provided by all 4 annotators and obtain the final answer for each question. MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that learns in an unsupervised fashion to a) identify which annotators are trustworthy and b) predict the correct underlying labels. It is possible to have MACE produce answers only where its confidence is above 90%; we used this feature in our experiment.

24. MACE can be downloaded from: http://www.isi.edu/publications/licensed-sw/mace/

Since we had two types of questions, we ran MACE on each set of answers separately. As a result, we got two answers for each pair: a yes/no answer for question 1 and a 3-choice answer (surely/sometimes/rarely) for question 2. We then aggregated the answers to obtain the final category of the pair according to Table 4-4.

        surely   sometimes   rarely
  yes   Cat1     Cat3        None
  no    Cat2     Cat3        None

Table 4-4: Final categories based on answers to the two questions.
In the following, we refer to MACE as M and to the author who annotated the pairs as A. We measure agreement only on those cases where MACE was sure about its answer and hence produced one. In evaluating the performance of Mechanical Turk, we are particularly sensitive to false positives, as they reduce precision, while false negatives only reduce recall. We consider the following cases as false positives. For the binary yes/no question: M said "yes" but A said "no". For the 3-choice question: 1) M said "surely" but A said "sometimes" or "rarely"; 2) M said "sometimes" but A said "rarely". Table 4-5 shows the agreement between M and A (which is above 80%) and the percentage of false positives (which is less than 8%) for the different types of questions.

                 Agreement   False Positives
  Yes/No         86%         6%
  3-Choice       85%         7.5%
  Final Answer   82%         -

Table 4-5: Agreement and false positives for the different types of questions.

We should mention that the reason we did not compute agreement between the annotators using common measures like the Kappa score is that 1) since we had split the 80 questions into sets of 10, in many cases a particular annotator had answered only 10 questions, making measurement of Kappa over all 80 cases unreliable, and 2) many of the annotators are spammers, which significantly lowers the agreement.

6. Conclusions

We used lexico-syntactic patterns to extract candidate pairs of (STATE, EVENT) words from millions of web pages.
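The aggregation of the two MACE answers into a final category (Table 4-4) amounts to a small lookup; the function name below is ours.

```python
# Table 4-4: question 1 (yes/no: was STATE a precondition?) crossed with
# question 2 (surely/sometimes/rarely: does EVENT end STATE?).
TABLE_4_4 = {
    ("yes", "surely"):    "Cat1",
    ("yes", "sometimes"): "Cat3",
    ("yes", "rarely"):    None,   # not a change-of-state pair
    ("no",  "surely"):    "Cat2",
    ("no",  "sometimes"): "Cat3",
    ("no",  "rarely"):    None,
}

def final_category(q1_answer, q2_answer):
    return TABLE_4_4[(q1_answer, q2_answer)]

assert final_category("yes", "surely") == "Cat1"   # e.g., (lost, find)
assert final_category("no", "surely") == "Cat2"    # e.g., (teacher, retire)
assert final_category("no", "rarely") is None
```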
Our analysis of the results showed that only a small fraction of the extracted pairs (12% of all the pairs and 17% of the change-of-state pairs) yield axioms at our desired level of generality; examples are ("steady", "accelerate") and ("not official", "announce"). In most cases, however, STATE is a specialization of the general state changed by EVENT and hence cannot be used for representing the meaning of EVENT. An example of such a pair is ("be teacher", "retire"), where "be teacher" is a specialization of "be employed" and is too specific to capture the meaning of "retire". Still, these specialized pairs yield other types of axioms that are quite useful for reasoning.

We used machine learning to filter out non-change-of-state pairs and improve the quality of the extractions by 10%. Finally, we showed that Mechanical Turk can be used for differentiating between (STATE, EVENT) pairs that yield different types of axioms.

Chapter 5
The Mixed Approach

As our experiment in the previous chapter suggests, it is hard, if not impossible, to dispense with manual axiomatization of words altogether, especially for the most general and basic words. In this chapter we describe another approach to tackling the scale issue: limiting manual axiomatization to the most basic and general concepts (25) and then using concept relations in WordNet and FrameNet to identify and axiomatize additional concepts that can be (directly or indirectly) defined in terms of these basic concepts. Throughout this chapter, we refer to the first group of concepts, i.e., those that need to be axiomatized manually, as "Basic Concepts", and to the second group, i.e., those that can be axiomatized in terms of the basic concepts, as "Derived Concepts". Figure 5-1 shows a simplified schema of this approach.

Figure 5-1: A simple schema of the proposed method.
Focusing on one domain at a time, we first identify the set of synsets and frames in WordNet and FrameNet that are related to the domain. We then use frame-frame relations, synset-frame mappings and synset-synset relations to identify which synsets and frames are derived from (i.e., can be defined in terms of) others. Those synsets and frames that cannot be defined in terms of any other concepts in the domain are considered basic and are chosen for manual axiomatization. The above-mentioned relations between synsets and frames are then used to identify and axiomatize all possible synsets and frames that can be defined in terms of the manually axiomatized concepts.

25. In this chapter, we refer to frames, synsets and words collectively as concepts.

Figure 5-2: An example of a basic concept (the frame Change_position_on_a_scale) and concepts derived from it (the frame Proliferate_in_number and all the words on the left-hand side of the figure).

Figure 5-2 shows a basic frame, Change_position_on_a_scale, and the concepts derived from it. Words such as "increase", "decrease" and "jump" (26) are mapped to (and hence are a kind of) Change_position_on_a_scale. The word "increase" itself has several more specific children (hyponyms) in WordNet, such as "jump", "accrue" and "snowball". All these words are a kind of "increase", which itself is a kind of Change_position_on_a_scale. On the other hand, the frame Proliferate_in_number inherits from (and hence is a kind of) Change_position_on_a_scale. This frame has several words mapped to it: "dwindle", "multiply" and "proliferate" are mapped to (and hence are a kind of) Proliferate_in_number, which itself is a kind of Change_position_on_a_scale.

26. In our examples, we may use the terms "words" and "synsets" interchangeably.

According to these observations, if we manually axiomatize the frame Change_position_on_a_scale in terms of the predicates in the core theories, we should
be able to adopt the same axioms for all the mentioned words that are directly or indirectly derived from it.

Note that in the above example, the derived concepts were all related to the basic concept through is-a relation chains and therefore simply inherit the parent concept's axioms. For other relations, such as causation and antonymy, the derived concept modifies the basic concept's axiom by adding such predicates as CAUSE and NOT.

We evaluated this approach on the domain of Composite Entities and its neighboring domain, Sets. We identified 21 basic concepts related to these domains and manually axiomatized them in terms of predicates in our core theories. Using WordNet and FrameNet concept relations, we could identify and automatically axiomatize 486 derived synsets. Our evaluation of the automatically generated axioms showed an average precision of 0.89 without considering the syntactic structures and an average precision of 0.78 when syntactic structures are considered. We dedicate a separate chapter to evaluating the usefulness of our axioms.

This chapter is organized as follows. First we present a short introduction to FrameNet, WordNet, the mapping between them, and their axiomatization. Then we show how WordNet's and FrameNet's concept relations can be used to identify and axiomatize derived concepts. Next we show how to identify the minimal set of basic concepts (i.e., synsets and frames) for a given domain. Then we explain how we manually axiomatize these basic concepts. We then show how to automatically axiomatize more synsets and frames by chaining the manually encoded axioms with axioms for frame-frame, synset-frame and synset-synset relations. Finally, we present a quantitative and qualitative evaluation of this approach by applying it to the domains of composite entities and sets.

1. Background

1.1 FrameNet

FrameNet (Ruppenhofer et al., 2010) is an electronic lexical semantics resource based on Fillmore's frame semantics (Fillmore, 1976).
A frame can be viewed as a conceptual structure that describes an event, relation, or object and the participants in it. The participants of a frame are called roles, which are divided into core and non-core roles. For example, the frame Giving has the core roles Donor, Recipient and Theme, and such non-core roles as Circumstances, Manner, Means, etc.

According to FrameNet, the sentence "John sold a car to Mary" essentially describes the same basic situation (frame) as "Mary bought a car from John", just from a different perspective (27). Therefore, in FrameNet, words with similar semantics are mapped to the same frame. For example, "give", "hand over", "advance", "donate" and even "charity" and "donor" are mapped to the Giving frame. FrameNet also provides information about the syntactic realization patterns of frame elements. For example, the role Recipient in the frame Giving is most frequently filled by a noun phrase in the indirect object position or by a prepositional phrase headed by the preposition "to" in the complement position.

27. Examples and descriptions are taken from FrameNet's Wikipedia page.

  (a) Entities
    Words mapped to frames: Verb 4605, Noun 4742, Adjective 2122, Adverb 167, Preposition 143
    Frames: 1019
    Frame roles: 8884

  (b) Frame relations
    Relation         Example                               Frequency
    Inheritance      GIVING – COMMERCE_SELL                617
    Using            OPERATE_VEHICLE – MOTION              490
    Sub-frame        SENTENCING – CRIMINAL_PROCESS         117
    Perspective-on   OPERATE_VEHICLE – USE_VEHICLE         99
    Precedence       FALL_ASLEEP – SLEEP                   79
    Causative-of     KILLING – DEATH                       48
    Inchoative-of    COMING_TO_BE – EXISTENCE              16
    See also         LIGHT_MOVEMENT – LOCATION_OF_LIGHT    41

Table 5-1: FrameNet 1.5 statistics.

Coherence of meaning elements in FrameNet is realized through semantic relations between frames. For example, the Giving and Getting frames are connected by the causation relation. Roles of the connected frames are also linked, e.g., Donor in Giving is linked to Source in Getting.
Table 5-1 (a, b) shows statistics about FrameNet 1.5, including the number of words mapped to frames, the number of frames and the number of roles (a), as well as the frequencies of and examples for each type of frame relation (b).

1.1.1 Axiomatization of FrameNet

(Ovchinnikova, 2012; Ovchinnikova et al., 2013) have converted FrameNet's syntactic patterns and frame relations into axioms. In their framework, a frame F with the roles r1, r2, r3, ..., rn is represented by the predication F'(e,r1,r2,r3,...,rn), where each role is assigned a fixed argument position and the first argument is a handle for the predication. For example, the frame Giving with the roles Donor, Recipient and Theme is represented as Giving'(e,d,r,t), where the argument positions of d, r and t correspond to the roles Donor, Recipient and Theme respectively.

We call the axioms capturing syntactic patterns syntactic axioms. An example of a syntactic axiom that maps a construction like "x1 gave x2 to x3" to the frame Giving is:

give-vb'(e0,x1,u,x2) & to-in'(e1,e0,x3) ⟶ Giving'(e0,x1,x3,x2)

The left-hand side of this axiom corresponds to the logical representation (logical form) of "x1 gave x2 to x3". The axiom maps the arguments x1, x2 and x3 to the roles Donor, Theme and Recipient respectively.

We call the axioms capturing frame relations frame-relation axioms. An example of a frame-relation axiom that captures the Causative-of relation between the frames Giving and Getting is:

Giving'(e0,x1,x2,x3) → CAUSE'(e2,e0,e1) & Getting'(e1,x2,x3,x1)

This axiom says that the Giving event causes a Getting event, where the donor (x1), recipient (x2) and theme (x3) of the first event are equal to the source (x1), recipient (x2) and theme (x3) of the second event respectively.

1.2 WordNet

WordNet (Miller et al., 1990) is the most widely used lexical semantic resource in the NLP community, mostly due to its large lexical coverage and variety of semantic relations.
WordNet groups synonymous words into synsets (which can be viewed as concepts) and establishes lexical-semantic relations between these synsets, with the main focus on such paradigmatic relations as synonymy, hypernymy, and meronymy. We represent a synset SYN by the tuple (WS1, WS2, ..., WSn), where each WSi is a word sense that belongs to the synset. For example, the synset (incorporate-2, contain-1, comprise-2) consists of the second sense of the verb "incorporate", the first sense of the verb "contain" and the second sense of the verb "comprise". In this work, we used version 2.0 of WordNet.

1.3 WordNet-FrameNet Mapping

FrameNet provides mappings between words and frames. For example, the noun "part" is mapped to the frames Part_whole, Membership and Performers_and_roles, and the verb "part" is mapped to the frame Separating. The problem with this mapping is that the words are not disambiguated into the right senses. Fortunately, the MapNet project 28 (Tonelli and Pighin, 2009) has solved this problem by mapping WordNet 1.6's synsets to FrameNet 1.3's frames using a supervised learning algorithm. We use version 0.1 of MapNet, which is the only version available at the time of writing this thesis. We used WordNet's version mapping data to convert MapNet's synset ids from version 1.6 to version 2.0.

1.4 Disambiguating and Extending Word-Frame Mapping and Syntactic Axioms

FrameNet's word-frame mappings and syntactic realization patterns (and consequently the syntactic axioms) are sparse. We extend the word-frame mappings and syntactic axioms using the above-mentioned synset-frame mappings. In addition, we disambiguate words in word-frame mappings and syntactic axioms by assigning sense numbers to them.

To disambiguate a word-frame mapping W ⟶ F, we check the synset-frame mappings to see whether any of the synsets that W appears in are mapped to F.

28 http://danielepighin.net/cms/research/MapNet
If W appears in synset SYN with the sense WS and SYN is mapped to F, then for all the word senses WSi in SYN we create a disambiguated mapping from WSi to F. Similarly, for a given syntactic axiom AX for frame F and word W, such as:

W'(e0,x1,u,x2) & P'(e1,e0,x3) ⟶ F'(e0,x1,x3,x2)

if W appears in synset SYN with the sense WS and SYN is mapped to F, we create a disambiguated copy of AX for each of the word senses WSi in SYN by replacing W with WSi:

WSi'(e0,x1,u,x2) & P'(e1,e0,x3) ⟶ F'(e0,x1,x3,x2)

2. Using FrameNet and WordNet to Identify and Axiomatize Derived Concepts

As we explained in the beginning of this chapter, we would like to limit manual axiomatization to the most basic and general concepts and then use concept relations in WordNet and FrameNet to identify and axiomatize concepts that can be (directly or indirectly) defined in terms of these basic concepts. In this section, we show how we can use synset-frame mappings, frame-frame relations and synset-synset relations in WordNet to identify pairs of concepts (C1, C2) where C1 can be axiomatized in terms of C2 (i.e., C1 is derived from C2). This information will be used later both for identifying the basic concepts (which should be manually axiomatized) and for automatic axiomatization of additional concepts (by chaining WordNet and FrameNet axioms with the manually-created axioms).

2.1 Word-Frame Mappings and Syntactic Axioms

If a synset SYN that consists of the word senses (WS1, WS2, ..., WSn) is mapped to a frame F, then SYN and all word senses in it are derived from F and can be axiomatized in terms of F:

SYN'(e,e0) ⟶ F'(e,x1,x2,...,xn)
WSi'(e,e0) ⟶ F'(e,x1,x2,...,xn)

In addition, if syntactic axioms are available for any of the word senses WSi in SYN, we will have more detailed axioms that construct argument structures as well.
For example:

WSi'(e0,x1,u,x2) & P'(e1,e0,x3) ⟶ F'(e0,x1,x3,x2)

2.2 Frame Relations

We use the following frame relations in FrameNet to identify and axiomatize derived frames: Inheritance, Causative-of, Inchoative-of and Perspective-on. Table 5-2 shows, for each type of relation between two frames (F1, F2), which frame is derived and which one is the base, an example for (F1, F2), and how F1 is axiomatized in terms of F2. Note that the bi-directional Perspective-on relation does not specify which frame is derived from the other and yields a bi-directional axiom. The axioms are the same as the frame-relation axioms described in section 1.1.1. The predicates CAUSE and CHANGE_TO in these axioms belong to the core theories of causality and change of state respectively. As mentioned earlier, FrameNet provides mappings between the roles of related frames, and these mappings are reflected in the identity of the arguments (representing roles) of the predicates (representing frames) on the left-hand side and the right-hand side of these axioms.

Relation: Inheritance (derived, base)
Example: (Supply, Giving)
Axiom: Supply(e0,y0,y1,y2,...,y13) ⟶ Giving(e0,y0,y1,...,y10,x11,x12)

Relation: Causative-of (derived, base)
Example: (Separating, Becoming_separated)
Axiom: Separating(e0,y0,y1,y2,y3,...,x15) ⟶ CAUSE'(e3,e0,e1) & Becoming_separated(e1,y0,y1,y2,y3,...)

Relation: Inchoative-of (derived, base)
Example: (Come_together, Aggregate)
Axiom: Come_together'(e0,y0,y1,x2,x3,...,x13) ⟶ CHANGE_TO'(e0,e1) & Aggregate'(e1,y0,y1,...,y5)

Relation: Perspective-on (F1, F2)
Example: (Giving, Transfer), (Transfer, Giving)
Axiom: Giving'(e0,y0,y1,y2,y3,y4,y5,...,y12) ⟷ Transfer(e0,y1,y2,y0,y3,x4,y5,x6,...,x9)

Table 5-2: Axiomatizing derived frames through frame relations. Bold predications are from core theories.

2.3 Synset Relations

We use hyponymy, entailment and causation relations between synsets in WordNet to identify and axiomatize derived synsets.
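Both the frame relations above and the synset relations discussed next reduce to the same mechanical pattern: the relation type fixes the derivation direction and the core-theory predication (if any) that connects the two sides. A minimal sketch in Python, illustrative rather than the authors' implementation:

```python
# Illustrative sketch, not the authors' implementation: each relation type
# fixes which side is derived and which core-theory predication (if any)
# connects the two sides. Predications are built as plain strings.
CONNECTIVES = {
    "Inheritance":   None,                 # derived implies base directly
    "Hyponymy":      None,
    "Entailment":    None,
    "Causative-of":  "CAUSE'(e1,e,e2)",
    "Inchoative-of": "CHANGE_TO'(e,e2)",
    "Antonymy":      "NOT'(e1,e,e2)",
}

def relation_axiom(kind, derived, base):
    """Build "derived'(e,e0) -> [connective &] base'(e2,e3)" as a string."""
    connective = CONNECTIVES[kind]
    lhs = f"{derived}'(e,e0)"
    rhs = f"{base}'(e2,e3)"
    return f"{lhs} -> {connective} & {rhs}" if connective else f"{lhs} -> {rhs}"

print(relation_axiom("Causative-of", "divide", "separate"))
# divide'(e,e0) -> CAUSE'(e1,e,e2) & separate'(e2,e3)
```

The argument positions here are schematic; for frame relations the real axioms additionally align roles via FrameNet's role mappings, which synset relations lack.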
Table 5-3 shows, for each type of relation between two synsets (S1, S2), which one is the base and which one is derived (for hypernymy, causation and entailment only) and how one synset is axiomatized in terms of the other 29. For antonymy, however, one cannot decide which synset is derived and which is the base, and it is possible to axiomatize each of them in terms of the other.

Relation: Hyponymy (derived, base)
Example: (subset, set)
Axiom: derived'(e,e0) ⟶ base'(e,e1)

Relation: Entailment (derived, base)
Example: (weld, combine)
Axiom: derived'(e,e0) ⟶ base'(e,e1)

Relation: Causation (derived, base)
Example: (divide, separate)
Axiom: derived'(e,e0) ⟶ CAUSE'(e1,e,e2) & base'(e2,...)

Relation: Antonymy (S1, S2)
Example: (separate, attached)
Axiom: S1'(e1,e0) ⟶ NOT'(e4,e1,e2) & S2'(e2,e3)
       S2'(e1,e0) ⟶ NOT'(e4,e1,e2) & S1'(e2,e3)

Table 5-3: Axiomatizing derived words through WordNet relations. Bold predications are from core theories.

Note that unlike frame relations, synset relations don't capture any argument structure mappings.

3. Identifying the Minimal Set of Basic Concepts

Recall that our main goal is reducing the manual axiomatization work by identifying the minimal set of basic concepts, in terms of which other concepts can be automatically axiomatized. In the previous section, we showed how we can use various types of relations (i.e., synset-frame, frame-frame and synset-synset) between two concepts to identify the concept C1 that is derived from the other concept C2 (and axiomatize C1 in terms of C2). In this section, we use this information to identify the minimal set of basic concepts that we need to axiomatize manually. For coherence, we focus on one domain at a time.

29 For readability, synsets are shown by one of their representative words instead of their synset ID or all participating word senses.
If we had all the concepts (i.e., synsets and frames) related to a given domain, then we could run a simple algorithm to find the relation graph of these concepts and return the terminal nodes (concepts not derived from other concepts) as the set of basic concepts that should be axiomatized manually. Since there is no book of concepts, we need a methodology to identify the synsets and frames associated with a given domain. We are especially interested in capturing as many basic concepts as possible. We propose a method in which we start with central examples of words in the domain and find new word senses and frames using FrameNet's word-frame mappings, frame names and role names, as well as WordNet's synsets, glosses, and synset relations. We do this step by step and in a supervised manner. The result of this procedure, which we call expansion, is a set of word senses and a set of frames related to the domain, which we call inDomainSynsets and inDomainFrames respectively. These two sets will hopefully cover most (if not all) of the basic domain concepts.

The next step, which we call contraction, is to identify the set of concepts in inDomainSynsets and inDomainFrames that are not derived from any other concept in these two sets. As we already mentioned, we can easily do this by finding the relation graph for concepts in inDomainSynsets and inDomainFrames using the relations discussed in section 2. In the following, we describe expansion and contraction in more detail.

3.1 Expansion

The expansion procedure is shown in Figure 5-3. Given a domain D, we start with a list of seed words that are clearly related to it. For example, for the domain of Composite Entities, we start with words such as "combine", "compose", "part", "composite", "split", "divide", etc. We call this set of words wordList1. We also create two initially empty sets, inDomainWordSenses and inDomainFrames, and populate them using the following procedure:

1.
Disambiguate and expand wordList1 using WordNet: For each word W1 in wordList1, manually tag it with its part of speech and the sense numbers that are relevant to the domain. Then find new words using the following relations (a-d), inspect them and add only the general domain-related words 30 to wordList1. Repeat these steps recursively for the newly introduced word senses until no more word senses are added to wordList1. Then add the word senses in wordList1 to inDomainWordSenses.

a. Synonyms and other word forms (e.g., "combine" and "combination") of W1.

b. Hypernyms, entailments, causes, meronyms and antonyms of W1. For example, for W1 = "include", we get the antonym "exclude" and the entailment "have". We exclude any word sense that is a hyponym (i.e., specialization) of previous word senses (or synsets).

c. Words W2 in whose definition W1 appears. Each W2 should meet these conditions: 1) W1 should be among the first 4 words of W2's gloss; and 2) W2 should have occurred with high frequency (above 50) in the entirety of WordNet's glosses. For example, given W1 = "compose", W2 = "constitute" meets both conditions, as it has "compose" within the first 4 words of its gloss ("form or compose") and it has occurred in 560 different WordNet glosses.

d. Words W2 that occur in W1's gloss. For example, for W1 = "combine", we get the words "whole" and "mix", which occur in its definition "combine so as to form a whole; mix".

2. Find frames (FSet) related to wordList1 and expand them using FrameNet relations:

a. Find all the frames Fi for which 1) a word in wordList1 is mapped to Fi, or 2) a word in wordList1 appears in Fi's name, or 3) a word in wordList1 appears in the name of one of Fi's core roles. Then inspect these frames and add those related to the domain 31 to FSet. Add the frames in FSet to inDomainFrames.
30 For example, the word "society" is too specific for the domain of composite entities, while "group" is abstract and is applicable to many specific domains.
31 For example, the frame Operating_a_system (which has the word "system" from wordList1 in its name) is not relevant to the domain of composite entities, as opposed to Cause_to_be_included, which belongs to this domain.

b. Expand FSet by recursively adding frames that are in a relation with frames in FSet. Filter these new frames and add the domain-relevant ones to inDomainFrames.

3. Find new words from frame and role names: create a new wordList1 containing any new general domain-related words appearing in frame names and role names.

4. Repeat steps 1-4 until no more word senses or frames are introduced to inDomainWordSenses and inDomainFrames.

The above procedure results in two sets of concepts, inDomainWordSenses and inDomainFrames, which should contain most of the concepts related to the domain. We convert inDomainWordSenses into inDomainSynsets by taking the synsets that the word senses in inDomainWordSenses belong to, and we use this set henceforth.

Figure 5-3: The expansion procedure for finding basic word senses and frames.

The next step is to identify the set of concepts in inDomainSynsets and inDomainFrames that are not derived from any other concept in these two sets. We call this procedure contraction.

3.2 Contraction

The concepts in inDomainSynsets and inDomainFrames are not necessarily minimal for axiomatization purposes: many of them may be derivable from others through the relations discussed in section 2. If we eliminate such derived concepts from inDomainSynsets and inDomainFrames, what remains will be the minimal set (at least according to the identified concepts and relations) of frames and synsets that need to be axiomatized manually.
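A minimal sketch of this elimination step: given "derived-from" links among the candidate concepts, the basic concepts are exactly those with no outgoing derivation link. The concepts and links below are illustrative:

```python
# Sketch of the elimination at the heart of contraction: given "derived-from"
# links among candidate concepts, the basic concepts are exactly those with
# no outgoing derivation link. Concepts and links here are illustrative.
def basic_concepts(concepts, derived_from):
    """derived_from maps a concept to the set of concepts it derives from."""
    return {c for c in concepts if not derived_from.get(c)}

concepts = {"Supply", "Giving", "Separating", "Becoming_separated"}
derived_from = {
    "Supply": {"Giving"},                  # Inheritance
    "Separating": {"Becoming_separated"},  # Causative-of
}
print(sorted(basic_concepts(concepts, derived_from)))
# ['Becoming_separated', 'Giving']
```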
Figure 5-4 shows how we identify basic frames and synsets using these relations. Starting with frame relations (on the right), we find all the frames in inDomainFrames that are not derived from any other frames. We call this set of frames BasicFrames. Then we find all the frames that can be derived from BasicFrames (with any number of hops and not restricted to inDomainFrames) and call them DerivedFrames. We then use the synset-frame mappings to identify all the synsets (not restricted to inDomainSynsets) that are mapped to any frame in either BasicFrames or DerivedFrames and call them DerivedSynsets1. Then we use synset relations to identify all the synsets (with any number of hops and not restricted to inDomainSynsets) that are derived from synsets in DerivedSynsets1. We call them DerivedSynsets2. Now if we subtract DerivedSynsets1 and DerivedSynsets2 from inDomainSynsets, the remaining synsets (call them inDomainSynsets1) have no link to BasicFrames. Again using synset relations, we identify the following subsets of inDomainSynsets1: BasicSynsets, which are synsets in inDomainSynsets1 that are not derived from any other synset in it; and DerivedSynsets3, which are synsets in inDomainSynsets1 that are derived from BasicSynsets.

The concepts that should be axiomatized manually are the frames in BasicFrames and the synsets in BasicSynsets. Our manual axioms for BasicFrames can be used to axiomatize DerivedFrames, DerivedSynsets1 and DerivedSynsets2, and the manual axioms for BasicSynsets can be used to axiomatize DerivedSynsets3 and any other synsets derived from DerivedSynsets3. In the next section, we show how to axiomatize the basic concepts, and in section 5 we show how we can automatically axiomatize synsets and frames derived from BasicFrames and BasicSynsets.

Figure 5-4: Identifying basic frames and basic synsets.

4.
Manual Axiomatization of Basic Frames and Synsets

Manual axiomatization of basic frames is straightforward, as we already know the semantics of each frame and its roles from FrameNet's description. For axiomatizing a basic synset SYN, we first need to map it to a predication with an argument structure. If we can find a frame F in FrameNet that captures the meaning of SYN, we create a mapping from SYN to F and create a syntactic axiom for constructing the argument structure of F. On the other hand, if SYN cannot be mapped to any frame in FrameNet, we create a predicate-argument structure Pred'(e0,x1,x2,...,xn) with a unique predicate name and create a mapping between SYN and Pred, as well as a syntactic axiom to construct the argument structure of Pred. We call the set of manually-created predication names ManualBasicPreds.

Once we map a basic synset to a frame or a manually-crafted predication, we axiomatize that frame or predication in terms of the predications in the core theories. Note that if we have mapped the synset to a frame, we might already have axiom(s) for this frame, and there is no need to axiomatize it again. We will see examples of manual axiomatization of frames and manually-crafted predications later in sections 6.4 and 6.5.

5. Automatic Axiomatization of Derived Concepts in Terms of Core Theory Predicates

We have shown in section 2 how to axiomatize a derived concept in terms of another one, according to the relation between them. Let D denote derived concepts and B denote basic concepts (axiomatized manually). If D1 is axiomatized in terms of D2, D2 in terms of D3, D3 in terms of B1, and B1 is manually axiomatized in terms of C1, C2, ..., Cn, which are predicates in the core theory, then we can chain these axioms to get a new axiom that anchors D1 to C1, ..., Cn.
More specifically, suppose that the above-mentioned concepts have the following axioms:

Ax1] D1 ⟶ R1 & D2
Ax2] D2 ⟶ R2 & D3
Ax3] D3 ⟶ R3 & B1
Ax4] B1 ⟶ C1 & C2 & ... & Cn

Chaining axioms Ax1-Ax4 results in the following axiom for D1:

Ax5] D1 ⟶ R1 & R2 & R3 & C1 & C2 & ... & Cn

In the above axioms, R1-R3 are predications such as CAUSE'(e,e0,e1), CHANGE_TO'(e,e0) and NOT'(e,e0) that capture the frame-frame or synset-synset relationships between the concepts on the left-hand side and the right-hand side of the axiom (see Table 5-2 and Table 5-3). Note that there might be several other chains for D1, such as (D1 ⟶ D4 ⟶ B1) and (D1 ⟶ D4 ⟶ B2); as such, D1 can get several axioms (some of which are identical and some are different).

We use a backward search to identify and axiomatize all possible derived synsets in terms of predicates in the core theories. The schema of this procedure is shown in Figure 5-5. We start with the frames in BasicFrames and back-chain on frame-relation axioms (Section 2.2) to find and axiomatize all possible derived frames, DerivedFrames. We merge BasicFrames, DerivedFrames, and ManualBasicPreds into the set AxiomatizedConcepts and find the synsets that are mapped to concepts in this set (through either synset-frame mappings or our manually-created mappings between synsets and manually-crafted predications). We call this set DerivedSynsets1.

Figure 5-5: The schema of the automatic axiomatization procedure.

Next we use synset-synset relations in WordNet and back-chain on synset-relation axioms (Section 2.3) to find and axiomatize the second set of derived synsets, DerivedSynsets2. Note that the automatically-generated axioms for synsets or word senses have the format:

SYN'(e,e0) ⟶ C1'(e,...,e1) & C2'(e1,...,e2) & ... & Cn'(em,...)
WS'(e,e0) ⟶ C1'(e,...,e1) & C2'(e1,...,e2) & ... & Cn'(em,...)
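The Ax1-Ax5 chaining can be sketched as a walk down the derivation chain that accumulates the relation predications until a manual core-theory axiom is reached. The encoding below is ours, not the thesis code:

```python
# Sketch of the Ax1-Ax5 chaining above (our encoding, not the thesis code):
# each axiom is stored as (relation predications, consequent), where the
# consequent is either the next concept or, for a manual axiom, the list of
# core-theory predicates. Chaining walks down and accumulates R1, R2, ...
def chain(concept, axioms):
    collected = []
    while concept in axioms:
        extras, consequent = axioms[concept]
        collected += extras
        if isinstance(consequent, list):   # reached a manual core-theory axiom
            return collected + consequent
        concept = consequent
    return collected

axioms = {
    "D1": (["R1"], "D2"),                  # Ax1] D1 -> R1 & D2
    "D2": (["R2"], "D3"),                  # Ax2] D2 -> R2 & D3
    "D3": (["R3"], "B1"),                  # Ax3] D3 -> R3 & B1
    "B1": ([], ["C1", "C2", "Cn"]),        # Ax4] B1 -> C1 & C2 & ... & Cn
}
print(chain("D1", axioms))
# ['R1', 'R2', 'R3', 'C1', 'C2', 'Cn']
```

A concept with several outgoing derivations would need a branching search over all chains rather than this single-path walk.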
If syntactic axioms are available for word senses in DerivedSynsets1, we leverage these axioms when back-chaining from frames to these synsets, to construct argument-rich axioms such as:

WS'(e0,x1,u,x2) & P'(e1,e0,x3) & ... ⟶ C1'(e,x1,x2,...,e1) & C2'(e1,x3,...,e2) & ... & Cn'(em,...,xn,...)

In addition, we can use the syntactic axioms mapping a word sense WS to a frame F for all the hyponyms of WS in DerivedSynsets2 that are also derived from F. We only have to modify the syntactic axiom by replacing WS with the word senses in the hyponym's synset.

6. Evaluation

In this section we present a quantitative and qualitative evaluation of our mixed approach to deep lexical semantics by applying it to the domain of composite entities and sets. First, we find the BasicFrames and BasicSynsets in this domain (using expansion and contraction) and manually axiomatize them. Then we automatically axiomatize additional synsets and frames. We show that by manually axiomatizing only 21 basic concepts, we obtain axioms for 486 derived synsets with an average precision of 0.89 without considering the syntactic structures and an average precision of 0.78 if syntactic structures are considered.

6.1 Theories of Composite Entities and Sets

According to the theory of composite entities 32, a composite entity is a thing composed of other things. The concept is general enough to include complex physical objects (e.g., a telephone), complex events (e.g., the process of erosion) and complex information structures (e.g., a theory). A composite entity is characterized by a set of components, a set of properties, and a set of relations. The predicates defined in this theory are COMPOSITE_ENTITY'(e,x), which simply says x is a composite entity; COMPONENTS_OF'(e,s,x), which says that s is the set of x's components; and COMPONENT_OF'(e,y,x), which says that y is a component of the composite entity x. The relations between these concepts are captured in several axioms.
For example, the following axiom defines the relationship between the above predicates:

COMPONENT_OF'(e,y,x) ↔ COMPOSITE_ENTITY'(e0,x) & COMPONENTS_OF'(e1,s,x) & MEMBER'(e,y,s)

32 http://www.isi.edu/~hobbs/bgt-composite-entities.text

The above axiom says that y is a component of x if and only if x is a composite entity, s is the set of components of x, and y is a member of s. In this axiom, the predication MEMBER'(e,y,s) comes from the core theory of sets.

Another predication in this theory is AT'(e,x,y,z) for the figure-ground relation, which places an external entity x "at" some component y in a composite entity z. This predication can abstractly describe a wide variety of situations, including "John is at the back of the store" (spatial location), "Nuance closed at 57" (location on a scale), "John is at a competing company" (membership in an organization), etc. We used this predicate only for axiomatizing the concept "removing".

Two other predications that we will use in axiomatizing basic concepts are REL1'(e,s) and EXTERNAL_TO'(e,x,y). The predication REL1'(e,s) says that all the elements in the set s are related to each other. The predication EXTERNAL_TO'(e,x,y) says that entity y is external to the composite entity x, in that neither it nor any of its components is equal to x or one of x's components.

An important characteristic of the COMPONENT_OF predication is transitivity, which can be represented by the following axiom:

COMPONENT_OF'(e,x,y) & COMPONENT_OF'(e,y,z) ⟶ COMPONENT_OF'(e,x,z)

This axiom says that if x is a component of y and y is a component of z, then x is a component of z.

There is a very close relationship between composite entities and sets 33. In fact, we consider sets as composite entities that have distinct elements with no relations between them. Important predications in the set theory that we use in this work are SET, SUBSET and MEMBER.
These predications can be associated with the predications COMPONENT_OF and COMPONENTS_OF in the theory of composite entities using the following axioms:

SUBSET'(e,x,y) ⟶ COMPONENTS_OF'(e0,x,y)
MEMBER'(e,x,y) ⟶ COMPONENT_OF'(e0,x,y)

33 http://www.isi.edu/~hobbs/bgt-settheory.text

6.2 Assumptions

We have made the following assumptions when deciding whether a concept belongs to the core theory of composite entities or sets, as well as when axiomatizing the basic concepts:

- We assume that sets, amalgams and systems are subtypes of composite entity.
  o In sets, components are distinct and there are no relations between them.
  o In amalgams, components are no longer identifiable and distinct.
  o In systems, elements are identifiable and distinct and there are certain relations between them.
- Concepts such as attaching, detaching, connection, separation, etc., that focus on establishing or abandoning a relation between two entities are not specific to the theory of composite entities. We therefore ignore them.
- Since our first-order axioms don't have the power necessary for explicating concepts like completeness (i.e., if s is the set of necessary parts of y, then all elements of s are in y), we ignore such concepts.

6.3 Identifying Basic Concepts in the Domains of Composite Entities and Sets

6.3.1 Expansion

Table 5-4 shows the seed words we used and random examples of the synsets and frames that resulted from the expansion procedure (i.e., inDomainSynsets and inDomainFrames), while Table 5-5 shows detailed statistics about the expansion procedure. Each cell in this table consists of two numbers separated by a slash. The first number shows how many new concepts were introduced in a particular step (via a particular type of relation) and the second number shows how many of these new concepts were judged as being domain-relevant and were chosen for the next step 34.
Each step of the expansion algorithm is assigned a row in Table 5-5 and is marked by the iteration number and the type of operation performed at that step (column 1). Columns 2-6 in this table correspond to the different types of relations used to find the new concepts: synset-synset relations, synset-frame or word-frame mappings, frame-frame relations, is-in-gloss (i.e., the new synsets appeared in the gloss of previous synsets) and has-in-gloss (i.e., the new synsets have synsets from the previous level in their gloss). The last two columns represent total synsets and total word senses (across all relations) respectively.

Seed Words:
combine, complex, component, compose, composite, divide, group, include, member, mix, part, remove, separate, set, system, whole

Sample Synsets from inDomainSynsets:
(set), (part, portion), (component, constituent, element), (mix, mingle, commix, unify, amalgamate), (separate, disunite, divide, part), (composite), (incorporate, contain, comprise), (division2, partition, partitioning, segmentation, sectionalization, sectionalisation), (separate, divide), (whole), (mix, mix_in), (append, add_on, supplement, affix), (part, portion, component_part, component), (consist, comprise), (composition), (aggregate, aggregated, aggregative, mass), (break_up, fragment, fragmentize, fragmentise), (compose), (part, section, division)

Sample Frames from inDomainFrames:
Set_relation, Being_included, Building_subparts, Becoming_a_member, Shaped_part, Fragmentation_scenario, Wholes_and_parts, Connecting_architecture, Cause_to_amalgamate, Aggregate, Cause_to_fragment, Breaking_off, Part_orientational, Be_subset_of, Breaking_apart, Removing

Table 5-4: Seed words and sample synsets and frames from inDomainSynsets and inDomainFrames.

Starting with 16 seed words (presented in the first row of Table 5-4), in the first step we disambiguated these words into 169 word senses corresponding to 157 distinct synsets.
From these, 44 word senses (corresponding to 38 synsets) were domain-relevant and we kept them for the next step.

In step 2, we used WordNet's synset relations and glosses to find 375 new word senses (corresponding to 138 new synsets). After filtering domain-irrelevant cases, we were left with 50 word senses (23 synsets). In step 3, we repeated the same procedure (i.e., using WordNet relations

34 Judgment was done by the author.
We got 40 word senses (corresponding to 26 synsets), from which 6 were domain-relevant (and they correspond to 6 different synsets). Finally, in step 7, we used WordNet relations again to find synsets related to the previous 6 syn- sets and got 15 new word senses (8 synsets) two of which were domain relevant. Repeating this 60 procedure didn’t yield any new domain relevant synsets and we exit the expansion algorithm with a total of 73 synsets and 45 frames, examples of which were presented in Table 5-4 . We found no specific synset or frame for the concept of “removing a component from a whole”. Almost all of the words and frames that capture this concept (e.g., the words “remove”, “with- draw” and “eliminate” and the frames Removing, Removing_scenario and Emptying) are also used to describe such concepts as “leaving a location”. Since we have no other choice to capture this rather important concept, we considered these words and frames as concepts in the domains of composite entities and sets. 6.3.2 Contraction After running the contraction algorithm and inspecting the results, we noticed that many syn- sets that are quite related to inDomainFrames are not mapped to these frames. This is because our synset-frame mappings data was sparse. We therefore, extended these synset_frame map- pings in the following way: if a word w1 is mapped to frame F (in FrameNet’s original word- frame mapping) and w1 occurs in synset SYN, we create a mapping from SYN to F. The downside of using this extended synset_frame mappings, is that it might not be accurate enough. There- fore, in the contraction algorithm, we keep track of the frames that each synset is mapped to and at the end, manually check and correct these mappings 35 . 
BasicSynsets (append, add_on, supplement, affix), (built-in, constitutional, …), (compose), (compose, compile), (constitution, composition, makeup), (disconnected, disunited, fragmented, split), (division), (exclude), (form, constitute, make), (remove, take, take_away, withdraw), (aggregate, aggregated, aggregative, mass), (aggregate), (amalgamate, amalgamated, coalesced, consolidated, fused), (constituent, constitutional, constitutive, organic), (detached, sepa- rated), (divided), (inclusion, comprehension), (joint), (separate) BasicFrames Aggregate, Amalgamation, Becoming_separated, Being_included, Break- ing_apart, Building, Cause_to_be_included, Creating, Cutting, Ex- clude_member, Inclusion, Ingredients, Membership, Part_piece, Part_whole, Removing, Set_of_interrelated_entities Table 5-6: BasicSynsets and BasicFrames. 35 Of course this step is not needed if richer synset-frame mappings are available. 61 We found 8 erroneous synset-frame mappings one of which was originally in FrameNet (map- ping the verb “exclude” to the frame Inclusion), one error originated from the synset-frame mappings (bunch-up, clump, cluster) to the frame inclusion; and 6 were from the mapping extension (e.g., mapping the synset (form, constitute, make) to the frame Creating). After correcting the synset-frame mappings, we ran the contraction algorithm again which resulted in 19 basic synsets and 17 basic frames (i.e., BasicSynsets and BasicFrames), which are shown in Table 5-6. 6.4 Manual Axiomatization of BasicFrames Our manually-encoded axioms for the basic frames are shown in Table 5-7. In the following, we explain some of the axioms in this table. As the first example, consider the frame Amalgama- tion which is defined in FrameNet as: “These words refer to Parts merging to form a Whole. (The Parts may also be encoded as Part_1 and Part_2.) 
There is a symmetrical relationship between the components that undergo the process, and afterwards the Parts are consumed and are no longer distinct entities that are easily discernable or separable in the Whole." We cannot capture all the information in this definition (e.g., that after amalgamation, the parts are no longer distinct entities); however, as far as our predications in the core theory of composite entities are concerned, we can axiomatize the frame Amalgamation with the main roles parts, prt1 and prt2 as:

(1) Amalgamation'(e0,parts,whole,prt1,prt2,y4,y5,y6,y7,y8,y9) ⟶ CHANGE_TO'(e0,e300) & COMPONENTS_OF'(e300,parts,whole)

(2) Amalgamation'(e0,parts,whole,prt1,prt2,y4,y5,y6,y7,y8,y9) ⟶ CHANGE_TO'(e0,e300) & COMPONENT_OF'(e300,prt1,whole) & COMPONENT_OF'(e300,prt2,whole)

The first axiom says that an amalgamation of parts into whole is a change to parts being the components of whole. The second axiom says that an amalgamation of prt1 and prt2 into whole is a change to each of them being a component of whole.
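Axioms such as (1) and (2) can be represented programmatically as pairs of predication lists. The sketch below is one hypothetical encoding (the `Predication` and `Axiom` names are ours, not part of the thesis infrastructure); a structured representation of this kind makes later checks, such as scanning which arguments an axiom binds, straightforward.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Predication:
    name: str               # e.g. "Amalgamation'" or "COMPONENTS_OF'"
    args: Tuple[str, ...]   # argument names

@dataclass
class Axiom:
    lhs: List[Predication]  # antecedent predications
    rhs: List[Predication]  # consequent predications

# Axiom (1): an amalgamation of parts into whole is a change to the
# parts being the components of the whole.
amalgamation_1 = Axiom(
    lhs=[Predication("Amalgamation'",
                     ("e0", "parts", "whole", "prt1", "prt2",
                      "y4", "y5", "y6", "y7", "y8", "y9"))],
    rhs=[Predication("CHANGE_TO'", ("e0", "e300")),
         Predication("COMPONENTS_OF'", ("e300", "parts", "whole"))],
)
```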
Aggregate:
Aggregate'(e0,indiv,aggr,y2,y3,y4,y5) ⟶ COMPONENTS_OF'(e0,indiv,aggr)

Amalgamation:
Amalgamation'(e0,parts,whole,prt1,prt2,y4,y5,y6,y7,y8,y9) ⟶ CHANGE_TO'(e0,e300) & COMPONENTS_OF'(e300,parts,whole)
Amalgamation'(e0,parts,whole,prt1,prt2,y4,y5,y6,y7,y8,y9) ⟶ CHANGE_TO'(e0,e300) & COMPONENT_OF'(e300,prt1,whole) & COMPONENT_OF'(e300,prt2,whole)

Becoming_separated:
Becoming_separated'(e0,whole,parts,prt1,prt2,y4,y5,y6,y7,x1,y9,y10,y11) ⟶ CHANGE_FROM'(e0,e2) & COMPONENTS_OF'(e2,parts,whole)

Being_included:
Being_included'(e0,whole,part) ⟶ COMPONENT_OF'(e0,part,whole)

Breaking_apart:
Breaking_apart'(e0,whole,pieces,y2,y3,y4,y5,y6) ⟶ CHANGE_FROM'(e0,e1) & REL1'(e1,pieces)
Breaking_apart'(e0,whole,pieces,y2,y3,y4,y5,y6) ⟶ CHANGE_FROM'(e0,e1) & COMPONENTS_OF'(e1,pieces,whole)

Building:
Building'(e0,agent,…,y5,components,created_entity,result,y9,y10,…,y15) ⟶ CAUSE'(e0,agent,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,components,created_entity)

Cause_to_be_included:
Cause_to_be_included'(e0,agent,existingMember,newMember,group,cause,…) ⟶ CAUSE'(e0,agent,e1) & CHANGE_TO'(e1,e2) & MEMBER'(e2,newMember,group) & MEMBER'(e3,existingMember,group)

Creating:
Creating'(e0,components,created_entity,y2…,cause,y8,y9,creator,y11,…) ⟶ CAUSE'(e0,cause,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,components,created_entity)

Cutting:
Cutting'(e0,agent,y1,y2,y3,y4,y5,item,y7,y8,pieces) ⟶ CAUSE'(e0,agent,e1) & CHANGE_FROM'(e1,e2) & COMPONENTS_OF'(e2,pieces,item)

Exclude_member:
Exclude_member'(e0,member,group,authority,y3,…) ⟶ CAUSE'(e0,authority,e1) & CHANGE_FROM'(e1,e2) & MEMBER'(e2,member,group)

Inclusion:
Inclusion'(e0,total,part,y2,y3,y4,y5,y6,y7,y8,y9) ⟶ COMPONENT_OF'(e0,part,total)

Ingredients:
Ingredients'(e0,material,y1,y2,y3,y4,product) ⟶ COMPONENTS_OF'(e0,material,product)

Membership:
Membership'(e0,member,group,y2,y3,y4,y5) ⟶ MEMBER'(e0,member,group)

Part_piece:
Part_piece'(e0,piece,y1,substance) ⟶ COMPONENT_OF'(e0,piece,substance)

Part_whole:
Part_whole'(e0,part,whole,y2,x3) ⟶ COMPONENT_OF'(e0,part,whole)

Removing:
Removing'(e0,agent,theme,source,y3,y4,y5,y6,y7,…) ⟶ CAUSE'(e0,agent,e1) & CHANGE_FROM'(e1,e2) & AT'(e2,theme,source)

Set_of_interrelated_entities:
Set_of_interrelated_entities'(e1,components,complex,y3,y4) ⟶ COMPONENTS_OF'(e1,components,complex)

Table 5-7: Manually created axioms for frames in BasicFrames.

As the second example, consider our axiom for the frame Removing, which is defined in FrameNet as "An Agent causes a Theme to move away from a location, the Source". We axiomatized it as:

(3) Removing'(e0,agent,theme,source,y3,y4,y5,...) ⟶ CAUSE'(e0,agent,e1) & CHANGE_FROM'(e1,e2) & AT'(e2,theme,p,source)

This axiom says that the act of an agent removing theme from source is the agent's causing a change from theme being at point p on the ground source.

6.5 Manual Axiomatization of BasicSynsets

As explained earlier, in order to axiomatize a basic synset SYN, we either map it to a frame (and axiomatize that frame if it is not axiomatized already), or we create a new predicate-argument structure, axiomatize it, and map SYN to it. In either case, we also need to manually create syntactic axioms for the words in SYN. Of the 19 synsets in BasicSynsets, 12 could be mapped to axiomatized frames. For the remaining 7 synsets, we created 4 new predications: ComponentsOf, Disunited, Exclude and ExternalTo. Table 5-8 shows our axiomatization of these concepts, and Table 5-9 shows our mapping of synsets to these predications and the corresponding syntactic axioms. Note that the total number of manual axioms we wrote for the basic synsets is 21 (for 17 frames and 4 additional predications). The predication ComponentsOf'(e,p,w), which we used to describe such words as "compose" (e.g., x is composed of y), "form" (e.g., x and y form z) and "constitution" (e.g., x is the constitution of y), simply represents the set of components (p) of a composite entity (w).
We didn't find an equivalent frame in FrameNet: the frames Ingredients, Amalgamation, Aggregate, Be_subset_of, Being_included, Membership, Part_piece and Part_whole are all specializations of this general concept. In fact, there is no mapping from the words "compose", "form", "constitute", "make", "constitution", "composition" and "makeup" to any frames, and this indicates a missing concept. Note that we could bypass this intermediate synset-predication mapping and simply axiomatize the above-mentioned words (or synsets) in terms of the predication COMPONENTS_OF'(e,x,y) from the core theory. However, we used this intermediate step for uniformity: synsets are mapped to intermediate predicate-argument structures (frames or manual constructs) via a syntactic axiom, and these predicate-argument structures are axiomatized in terms of the core-theory predicates.

ComponentsOf: ComponentsOf'(e,p,w) ⟶ COMPONENTS_OF'(e0,p,w)
Disunited: Disunited'(e,p,w) ⟶ NOT'(e,e1) & REL1(e1,p) & COMPONENTS_OF'(e0,p,w)
Exclude: Exclude'(e,w,p) ⟶ NOT'(e,e0) & COMPONENT_OF'(e0,p,w)
ExternalTo: ExternalTo'(e0,w,x) ⟶ EXTERNAL_TO'(e0,w,x)

Table 5-8: Axiomatization of manually-crafted predications.

(compose)-v → ComponentsOf: ComponentsOf'(e,p,w) => W-vb'(e0,u1,u2,w) & of-in'(e1,e0,p)
(form, constitute, make)-v → ComponentsOf: ComponentsOf'(e,p,w) => W-vb'(e0,p,u2,w)
(constitution, composition, makeup)-n → ComponentsOf: ComponentsOf'(e,p,w) => W-nn'(e0,p) & of-in'(e1,p,w)
(disconnected, disunited, fragmented, split)-a → Disunited: Disunited'(e,p,w) => W-adj'(e,w); Disunited'(e,p,w) => W-adj'(e,p)
(divided)-a → Disunited: Disunited'(e,w) => W-adj'(e,w); Disunited'(e,w) => W-adj'(e,p)
(exclude)-v → Exclude: Exclude'(e,w,p) => W-vb'(e,w,u,p)
(separate)-a → ExternalTo: ExternalTo'(e0,w,p) => W-adj'(e0,p)

Table 5-9: The mapping of basic synsets to manually-crafted predications and the corresponding syntactic axioms.
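Instantiating the W-placeholder templates of Table 5-9 for each word in a synset amounts to a simple string substitution. The sketch below uses a hypothetical string-based template format (not the thesis's actual machinery) to show the idea:

```python
def instantiate_templates(synset_words, templates):
    """Replace the placeholder 'W' with each word in the synset,
    producing one syntactic axiom per (word, template) pair."""
    return [t.replace("W-", w + "-") for w in synset_words for t in templates]

# The (form, constitute, make)-v row of Table 5-9 expands into three
# word-specific syntactic axioms.
axioms = instantiate_templates(
    ["form", "constitute", "make"],
    ["ComponentsOf'(e,p,w) => W-vb'(e0,p,u2,w)"],
)
```

A word with irregular syntactic behavior would simply get a hand-written axiom instead of the template expansion, as noted in the text.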
The predication Disunited'(e,p,w), which we define as a lack of relation between components (p) of the whole (w), is used to describe such adjectives as "disconnected", "disunited" and "divided". Note that we have two syntactic axioms for adjectives mapped to this concept: one assumes that the adjective describes the components (p) (as in "a league of disunited nations") and the other assumes that the adjective describes the whole (w) (as in "a disunited nation"). Also, note that we didn't write syntactic axioms for all of the words in a synset. Instead, we used the placeholder W, which can be instantiated by any word in the synset. If a word in the synset has a different syntactic behavior, we should create a specific axiom for it (this was not the case for the synsets in Table 5-9).

6.6 Automatic Axiomatization of Derived Concepts

Axioms for Derived Frames

Using the backward searching and axiom chaining method described in Section 5, starting from the 17 manually axiomatized frames, we obtained 54 axioms for 44 derived frames. Sample frame relation chains and the resulting axioms are shown in rows 1-2 of Table 5-10.

Row 1 — Relation chain: Gathering_up —Causative-of— Come_together —Inchoative-of— Aggregate
Resulting axiom: Gathering_up(e0,Aggregate,Agent,...,Individuals,...) ⟶ CAUSE'(e3,e0,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,Individuals,Aggregate)

Row 2 — Relation chain: Knot_creation_scenario —Perspective-on— Knot_creation —Inheritance— Intentionally_create —Inheritance— Creating
Resulting axiom: Knot_creation_scenario(e0,Agent,...,Rope,Knot,...)
⟶ CAUSE'(e0,c,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,u,Knot)

Row 3 — Relation chain: "arm-nn" → Observable_body_parts —Inheritance— Shaped_part —Inheritance— Part_whole
Resulting axiom: arm-nn'(e0,part) & nn'(e1,y2,part) ⟶ COMPONENT_OF'(e0,part,u)

Row 4 — Relation chain: "separate-vb" → Separating —Causative-of— Becoming_separated
Resulting axiom: separate-vb'(e3,x4,x3,Whole) & into-in'(e1,e3,Parts) ⟶ CAUSE'(e4,e3,e0) & CHANGE_FROM'(e0,e2) & COMPONENTS_OF'(e2,Parts,Whole)

Row 5 — Relation chain: "vertex" —hyponym— "intersection" —hyponym— "point" —hyponym— "component" —hyponym— "part" → Part_whole
Resulting axiom: vertex-nn'(e0,Part) & of-in'(e10,Part,Whole) ⟶ COMPONENT_OF'(e0,Part,Whole)

Row 6 — Relation chain: "disunite" —cause— "divide" → Separating —Causative-of— Becoming_separated
Resulting axiom: disunite-v2'(e,e0) ⟶ CAUSE'(e3,e,e1) & CAUSE'(e2,e1,e2) & CHANGE_FROM'(e2,e3) & COMPONENTS_OF'(e3,x,y)

Table 5-10: Sample relations between frames in the composite entities and sets domains.

Table 5-11 shows the precision and number of automatically-generated axioms for derived frames, categorized by the types of relations used for their derivation. All 54 axioms were judged by the author.

            Inheritance | Perspective-on | Causative-of | Inchoative-of | All
#Axioms     35          | 16             | 7            | 4             | 54
Precision   0.76        | 0.77           | 1.0          | 1.0           | 0.8

Table 5-11: Precision of automatically-generated axioms for derived frames.

An example of an incorrectly axiomatized frame is Knot_creation_scenario, which, according to row 2 in Table 5-10, inherits (in several hops) from the frame Creating. The frame Creating is defined in FrameNet as "A Cause leads to the formation of a Created_entity", with the core roles being Created_entity and Creator and several non-core roles, including the role Components.
We had axiomatized Creating as:

(AX1) Creating'(e0,Components,Created_entity,…,Cause,...,Creator,...) ⟶ CAUSE'(e3,e0,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,Components,Created_entity)

On the other hand, Knot_creation_scenario, defined as "An Agent manipulates a long thin object (the Rope) and creates a Knot", has the roles Agent, Rope and Knot and is not about putting some components together to form a whole. Still, it is a type of Creating and inherits from it, with the role Knot mapped to the role Created_entity in the parent frame, resulting in the incorrect axiom:

(AX2) Knot_creation_scenario(e0,Agent,...,Rope,Knot,...) ⟶ CAUSE'(e0,cause,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,u,Knot)

which defines Knot_creation_scenario as causing a change to some u being the components of the knot (or creating the knot out of some u). Note that we can identify such flawed axioms easily by checking whether any of the critical arguments (i.e., the part or the whole) in the right-hand side of the axiom does not appear in the left-hand side (as is the case with u in axiom AX2 above). We therefore eliminated the flawed frame axioms (a total of 8 axioms) before axiomatizing synsets, so that these axioms don't result in more incorrect axioms for the synsets mapped to their corresponding frames.

Axioms for Derived Synsets

After obtaining axioms for derived frames, we identified and automatically axiomatized the synsets that are mapped to either the basic or the derived frames. Sample relation chains (word-frame mappings + frame-frame relations) and the resulting axioms are shown in rows 3-4 of Table 5-10. Note that axioms with unbound arguments can occur here too, as a result of incomplete syntactic axioms.
For instance, in example #3, the second argument (i.e., the whole) of the predication COMPONENT_OF is not bound to any argument in the left-hand side of the axiom:

arm-1-nn'(e0,part) & nn'(e1,y2,part) ⟶ COMPONENT_OF'(e0,part,u)

As before, we filter these axioms by checking the crucial arguments of the predicates in the right-hand side of the axioms. Finally, we used synset-synset relations to identify and axiomatize the last set of synsets. To avoid semantic drift, we limited the number of hops to 5 for hyponymy and 1 for the other relations. In addition, to avoid adding too many synsets through explosive branches (such as all members of a genus of a plant), we cut off branches with more than 5 children. Sample relation chains (synset-synset relations + synset-frame mapping + frame-frame relations) and the resulting axioms are shown in rows 5-6 of Table 5-10. Note that in example #5, the noun "vertex" is derived from the noun "part" through hyponymy (in several hops) and thus could inherit its syntactic axiom, resulting in an argument-rich final axiom; while in example #6, the verb "disunite" is derived through a causation relation with the verb "divide" and hence doesn't inherit its syntactic axiom. As a result, it has a simple axiom with no argument structure. Table 5-12 shows how many word senses and synsets are derived and axiomatized through synset-frame mappings and different synset-synset relations. In addition, the precision of the axioms obtained through each type of relation is shown via two numbers: Precision1 and Precision2. Precision1 disregards syntactic structures and evaluates the semantics of the word only, while Precision2 also considers the syntactic structure. Correctness of axioms was judged by the author.
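The unbound-argument filter described above (used to discard axioms like AX2, and axioms with unbound part or whole arguments such as the arm-nn example) can be sketched as follows. The representation of axioms as lists of (predicate, argument-tuple) pairs, and the `CRITICAL` table of argument positions, are our own illustrative assumptions:

```python
def is_flawed(lhs, rhs, critical_positions):
    """lhs and rhs are lists of (predicate, args) pairs.
    critical_positions maps a right-hand-side predicate name to the
    indices of its part/whole arguments.  An axiom is flawed if any
    such critical argument never appears on the left-hand side."""
    bound = {a for _, args in lhs for a in args}
    return any(args[i] not in bound
               for name, args in rhs
               for i in critical_positions.get(name, ()))

# Positions 1 and 2 hold the part and whole arguments in these predications.
CRITICAL = {"COMPONENT_OF'": (1, 2), "COMPONENTS_OF'": (1, 2)}

# AX2: the part argument u is never bound on the left, so the axiom is
# discarded before it can propagate to synsets.
ax2_lhs = [("Knot_creation_scenario", ("e0", "Agent", "Rope", "Knot"))]
ax2_rhs = [("CAUSE'", ("e0", "cause", "e1")),
           ("CHANGE_TO'", ("e1", "e2")),
           ("COMPONENTS_OF'", ("e2", "u", "Knot"))]
```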
             Synset-Frame Mapping | Hyponymy | Antonymy | Entailment | Causation | Total
#word senses 715                  | 309      | 46       | 25         | 6         | 1101
#synsets     246                  | 206      | 20       | 11         | 3         | 486
Precision1   0.9                  | 0.81     | 0.62     | 0.75       | 1.0       | 0.89
Precision2   0.87                 | 0.57     | -        | -          | -         | 0.78

Table 5-12: Precision of automatically-generated axioms for derived synsets.

The first column of this table corresponds to axiomatized word senses and synsets that were directly mapped to a frame (i.e., synsets in DerivedSynsets1 in Figure 5-5). We obtained axioms for 715 word senses (corresponding to 246 synsets). Precision1 on a sample of 100 axioms from this set was 0.9, and Precision2 on this set was 0.87. An example of an incorrect axiom resulting from an incorrect synset-frame mapping is the following axiom for the noun "predecessor":

predecessor-n1'(e0,material) & to-in'(e1,material,product) ⟶ COMPONENTS_OF'(e0,material,product)

This axiom is the result of MapNet's incorrect synset-frame mapping between the second sense of "precursor" (from which "predecessor" inherits in WordNet) and the frame Ingredients.36 The second, third, fourth and fifth columns of Table 5-12 show the number of word senses and synsets axiomatized through different synset-synset relations. While 309 word senses (corresponding to 206 synsets) are derived through hyponymy, the number of word senses derived through the other relations is much smaller. Precision1 is good for all the relations except antonymy. This is because the antonymy relation doesn't necessarily negate the aspect of meaning we have axiomatized. For example, while "interior" and "exterior" are antonyms, they should both be axiomatized as being a component:

interior-n1'(e0,part) & of-in'(e1,part,whole) ⟶ COMPONENT_OF'(e0,part,whole)

36 In fact, the first sense of "precursor" in WordNet corresponds to a substance and should have been mapped to the frame Ingredients.
exterior-n1'(e0,part) & of-in'(e1,part,whole) ⟶ COMPONENT_OF'(e0,part,whole)

However, in our axioms, "exterior" is axiomatized by negating "interior"'s axiom:

exterior-n1'(e0,part) & of-in'(e1,part,whole) ⟶ NOT'(e3,e0,e4) & COMPONENT_OF'(e4,part,whole)

We computed Precision2 for synsets derived through hyponymy (as they adopt their parent synset's syntactic axioms); it turned out to be much lower than their Precision1, and also lower than the Precision2 of their parents (i.e., synsets directly mapped to frames, with the original syntactic axioms). An example of an incorrect axiom that resulted from adopting the parent synset's ("removal") syntactic axiom is:

simplification-nn'(e0,x1) & of-in'(e2,x1,theme) & from-in'(e1,x1,source) ⟶ CAUSE'(e0,agent,e1) & CHANGE_FROM'(e1,e2) & AT'(e2,theme,source)

While "simplification" is a kind of "removal", it doesn't appear in the same syntactic patterns that "removal" appears in. Therefore, the above axiom, while being perfect for "removal", will never apply to "simplification". Finally, the last column of Table 5-12 shows the overall precision of all the automatically-generated axioms, which we computed by taking the weighted average of Precision1 and Precision2, where the weights are the number of synsets (for Precision1) and the number of word senses (for Precision2) in each group/column. The average for Precision1 is 0.89 and the average for Precision2 is 0.78.

Chapter 6

Evaluating the Usefulness of Our Knowledgebase

Perhaps the ideal evaluation framework for our lexical-semantics knowledgebase would be the well-defined Recognizing Textual Entailment (RTE) task, as it captures major semantic inference needs required by many NLP applications such as question answering, information retrieval, information extraction, and document summarization. In the RTE task, the system is given a text and a hypothesis and must decide whether the hypothesis is entailed by the text plus background knowledge.
The organized RTE challenges that took place between 2008 and 2011 facilitate evaluation and comparison to other systems. However, there are a number of problems with using RTE as an evaluation framework. The first problem is that a fully functional RTE system that could compete with existing systems would have to handle many natural language phenomena (such as anaphora and nominal reference, event coreference, modality, negation, numerals, quantifiers, metonymy, syntactic variation, ambiguity, etc.), and implementing such a system is beyond the scope of this thesis. The second problem is that the datasets associated with the RTE challenges are too broad in terms of the domains and words that need to be axiomatized. Meanwhile, our current set of axiomatized words and core theories is very limited, and hence there would be only a few cases to which our axioms could be applied. Due to the above-mentioned problems, we decided to evaluate our axioms on another well-studied task: relation extraction from text. In particular, in line with the domains we have axiomatized, namely composite entities and sets, we chose the task of extracting part-whole relations from text. We show how we can extract implicit part-whole relations from text using the automatically and manually created axioms for such words as "add", "cut", "remove", "make", etc. We compare our method to a state-of-the-art system that automatically learns patterns capturing part-whole relations from large corpora. This chapter is organized as follows: first, we present some prerequisites, including terminology, tools and algorithms that we will use in this chapter; then we describe our system for extracting (part, whole) relations from text, using our manually and automatically generated axioms for words related to the domains of composite entities and sets (previous chapter).
Then we review previous work on extracting part-whole relations from text and describe the method chosen for comparison, which is a state-of-the-art algorithm that learns part-whole patterns automatically from text. Finally, we present our experimental setup, evaluation, and comparison of the two methods.

1. Prerequisites

1.1 Terminology

Noun-axioms and verb-axioms: We refer to axioms that explicate a noun or a verb as noun-axioms and verb-axioms respectively. An example of a noun-axiom is:

variety-nn(e27,x8) & of-in(e22,x8,x5) => COMPOSITE_ENTITY(_u5,x8) & COMPONENTS_OF(_u6,x5,x8)

Noun-referring arguments: In the logical form of a sentence, noun-referring arguments are those that appear as the second argument of a noun predication. For example, if argument x appears in the predication apple-nn'(e,x), x is a noun-referring argument.

Part-whole predications: We refer to any core-theory predication in the set {COMPONENT_OF'(e,p,w), COMPONENTS_OF'(e,p,w), MEMBER'(e,p,w), SUBSET'(e,p,w)} as a part-whole predication. We refer to the arguments p and w as the part-arg and the whole-arg respectively.

1.2 Boxer

In order to produce logical forms for English, we use the Boxer semantic parser37 (Bos et al., 2004). As one of the possible formats, Boxer outputs logical forms of sentences in the style of (Hobbs, 1985). In the following, we describe some of Boxer's features and behaviors that we used or that are relevant to this discussion.

37 Boxer can be downloaded from http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer

- We used Boxer's anaphora resolution feature, which represents equality of entities by introducing additional equal(x,y) predications in the logical form of the sentence. There is an additional option to have Boxer remove these equal predications and unify the arguments that were deemed equal by these predications. We used this option too.
Therefore, in the final logical form for a sentence like "Clearly, for a fruit to be suitable for jam production it has to contain some sugar", the nouns "fruit" and "it" will be represented by the same argument.

- Boxer represents copula with an equal(x,y) predication too. Unlike the previous case, Boxer does not provide an option to eliminate these equal predications and unify their corresponding arguments. We do this step in post-processing. Therefore, in the logical form of a sentence like "LaPlace is the southern terminus of Interstate 55", the words "LaPlace" and "terminus" will be represented by the same argument.

- All nouns in a compound proper name are represented by the same argument. For example, in the logical form of the sentence "Kitchener joined Khartoum University College", the nouns "Khartoum", "University" and "College" are represented by the same argument.

- Boxer represents conjunctions and disjunctions using the predication subset_of(e0,x,y) in the logical form. For example, here is the logical form for the sentence "Kismet is made of waffle and nougat":

kismet-nn(e4,x1) & make-vb(e1,u18,u19,x1) & of-in(e6,e1,x2) & subset_of(E16,x3,x2) & subset_of(E7,x4,x2) & waffle-nn(e17,x3) & nougat-nn(e15,x4)

Argument x2, which represents the conjunction of "waffle" (represented by x3) and "nougat" (represented by x4), is used in the predication of-in(e6,e1,x2), and two separate predications, subset_of(E16,x3,x2) and subset_of(E7,x4,x2), are introduced to show that x2 is the conjunction of x3 and x4.

1.3 Phillip

We used Phillip (Yamamoto et al. 2015) for applying axioms to logical forms and inferring new predications that capture part-whole relationships. Phillip is the successor of Henry-Tacitus (Inoue and Inui, 2012), which uses Integer Linear Programming (ILP) to solve weighted abduction and is very fast and scalable.
We use Phillip in a specific mode that applies all applicable axioms to the given input without pruning the inferences for lower costs, as we are interested in "detecting" possible part-whole relations in the input rather than "disambiguating" possible interpretations of it.

1.4 Head Noun Detection Algorithm (HND algorithm)

We will use the head noun detection algorithm (HND algorithm henceforth) in our axiom-based relation-extraction algorithm (which follows shortly) as well as in the bootstrapping algorithm (which we later compare our method against). Given an argument x and an ordered list of predications, the HND algorithm finds the list of head nouns that represent x. This is a non-trivial task, as nouns associated with the same argument may be close to each other and part of a multi-word expression (e.g., "Barack Obama"); or be separate noun phrases linked by a copula (e.g., "Kulen is a flavoured sausage made of minced pork"); or a copula and a pronoun (e.g., "Nelson joined Hartman-Cox Architects where he was the Project Architect"). Given an argument x and an ordered list of predications, the HND algorithm first finds the sequence of all the nouns associated with x (which we call the noun-sequence), checks whether any other words have occurred in between these nouns in the logical form and, if so, splits the noun-sequence into chunks, and returns the last noun of each chunk. For example, given the noun-sequence <Krya, Vrysi, centre, host>, which all refer to the same argument in the logical form of the sentence "Krya Vrysi, being the centre for many smaller villages, is the host of various annual events", the HND algorithm finds three chunks <Krya Vrysi>, <centre> and <host> and returns the last word of each chunk, i.e., Vrysi, centre and host, as the head nouns.

2. Extracting Part-Whole Relations Using Axioms

Our algorithm for extracting (part, whole) relations from text consists of the following steps:

1.
Selecting high-quality axioms: First, we identify a high-quality subset of axioms that yield fully instantiated part-whole relationships. Such axioms anchor words or phrases (such as "include" or "participate in") to at least one core-theory predication in the part-whole predication set (see the terminology section) and subcategorize both the part-arg and the whole-arg of these predications. For example, of the following axioms, (1) is a high-quality axiom that fills in both the part-arg and the whole-arg of COMPONENTS_OF, while (2) fails to construct the part-arg for the predicate COMPONENTS_OF:

(1) bake-vb'(e0,creator,x3,whole) & with-in'(e1,e0,parts) ⟶ CAUSE'(e0,cause,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,parts,whole)

(2) make-vb'(e0,agent,x3,whole) ⟶ CAUSE'(e0,agent,e1) & CHANGE_TO'(e1,e2) & COMPONENTS_OF'(e2,parts,whole)

We found 973 high-quality axioms and converted them to the format required by Phillip (the abductive reasoner). In addition to the lexical axioms, we added six additional axioms, shown in Table 6-1, that capture the transitivity of the part-whole relation (trans1), relationships between predications from the core theories of sets and composite entities (core1, core2), the meaning of Boxer's subset_of predication (boxer0), and the meaning of the preposition "in" when combined with the predication COMPONENT_OF (lex-1). Axiom core3 is added to convert all COMPONENTS_OF predications to COMPONENT_OF; its sole purpose is to avoid duplicating trans1, boxer0 and lex-1 for the predication COMPONENTS_OF.

trans1: COMPONENT_OF'(e0,x,y) & COMPONENT_OF'(e1,y,z) -> COMPONENT_OF'(e,x,z)
core1: SUBSET'(e,x,y) -> COMPONENTS_OF'(e,x,y)
core2: MEMBER'(e,x,y) -> COMPONENT_OF'(e,x,y)
core3: COMPONENTS_OF'(e,x,y) -> COMPONENT_OF'(e,x,y)
boxer0: COMPONENT_OF'(e2,x,z) -> subset_of'(e0,x,y) & COMPONENT_OF'(e1,y,z)
lex-1: in-in'(e1,x,y) & COMPONENT_OF'(e2,y,z) -> COMPONENT_OF'(e,x,z)

Table 6-1: Additional axioms.

2.
Parsing the corpus: We parse the corpus with Boxer to generate a first-order logical form for each sentence. We assign each sentence and its logical form a unique ID for later reference. We also convert these logical forms into the format required by Phillip.

3. Applying axioms and identifying (part, whole) pairs: Next, we run Phillip to apply the axioms to the logical forms. The result is an XML file with a separate node for each input logical form that shows which axioms, with which argument bindings, are applicable to it. For example, for the sentence "Kid Rock's genre-spanning sound incorporates a wide variety of musical styles and influences." with the following logical form:

kid-nn(e9,x2) & rock-nn(e6,x2) & of-in(e5,x3,x2) & genre-spanning-adj(s1,x3) & sound-nn(e4,x3) & incorporate-vb(e2,x3,u11,x8) & wide-adj(s2,x8) & variety-nn(e27,x8) & of-in(e22,x8,x5) & subset_of(E25,x6,x5) & subset_of(E23,x7,x5) & musical-adj(s3,x6) & style-nn(e26,x6) & influence-nn(e24,x7)

the following instantiated axioms were applied:

ax^351: incorporate-vb(e2,x3,u11,x8) ⟶ COMPOSITE_ENTITY(_u452,x3) & COMPONENT_OF(_u453,x8,x3)
ax^502: variety-nn(e27,x8) & of-in(e22,x8,x5) => COMPOSITE_ENTITY(_u5,x8) & COMPONENTS_OF(_u6,x5,x8)
boxer1: subset_of(E25,x6,x5) & COMPONENTS_OF(_u437,x5,x8) ⟶ COMPONENTS_OF(_u454,x6,x8)
boxer1: subset_of(E23,x7,x5) & COMPONENT_OF(_u6,x5,x8) ⟶ COMPONENT_OF(_u18,x7,x8)
core-trans-1: COMPONENTS_OF(_u23,x6,x8) & COMPONENT_OF(_u22,x8,x3) ⟶ COMPONENTS_OF(_u29,x6,x3)
core-trans-1: COMPONENTS_OF(_u18,x7,x8) & COMPONENT_OF(_u16,x8,x3) ⟶ COMPONENT_OF(_u21,x7,x3)

These axioms introduce a set of part-whole predications that we use for identifying (part, whole) noun pairs: for each part-whole predication Pred'(e,p,w), where p is the part argument and w is the whole argument, we look in the logical form for sets of nouns that represent p or w.
Table 6-2 lists all the part-whole predications and their corresponding part and whole noun predications for the above example. We exclude (part, whole) combinations in which either the part or the whole has been back-chained on. This eliminates items 1-4 in Table 6-2, because variety-nn(e27,x8) has been back-chained on with axiom ax^502. We also exclude cases in which no noun predication is found for the part or the whole argument. This is the case for item #2 in Table 6-2, where argument x5 (which refers to the conjunction "styles and influences") has no corresponding noun predication in the logical form.

Item# | Part-whole predication      | Part noun(s)         | Whole noun(s)
1     | COMPONENT_OF(_u453,x8,x3)   | variety-nn(e27,x8)   | sound-nn(e4,x3)
2     | COMPONENTS_OF(_u6,x5,x8)    | -                    | variety-nn(e27,x8)
3     | COMPONENTS_OF(_u454,x6,x8)  | style-nn(e26,x6)     | variety-nn(e27,x8)
4     | COMPONENT_OF(_u18,x7,x8)    | influence-nn(e24,x7) | variety-nn(e27,x8)
5     | COMPONENTS_OF(_u29,x6,x3)   | style-nn(e26,x6)     | sound-nn(e4,x3)
6     | COMPONENT_OF(_u21,x7,x3)    | influence-nn(e24,x7) | sound-nn(e4,x3)

Table 6-2: Part-whole predications and their corresponding part and whole noun predications.

Oftentimes, more than one noun predication is associated with a single part or whole argument. This happens when we have noun compounds or copula. In such cases, we use the HND algorithm (i.e., the Head Noun Detection algorithm described earlier) to find the head nouns associated with the part argument and the head nouns associated with the whole argument, and then create all possible combinations of (part, whole) nouns. For example, suppose that for a given part-whole predication pred(e,p,w), three nouns {N1'(e1,p), N2'(e2,p), N3'(e3,p)} describe the part argument p, and four nouns {N4'(e6,w), N5'(e7,w), N6'(e8,w), N7'(e8,w)} describe the whole argument w.
If our HND algorithm finds that {N1'(e1,p), N3'(e3,p)} are head nouns describing p and {N4'(e6,w), N7'(e8,w)} are head nouns describing w, then we will have the following combinations of (part, whole) pairs: (N1, N4), (N1, N7), (N3, N4) and (N3, N7). For each (part, whole) pair extracted from a sentence sentenceId, we save the following tuple: (part, whole, predicatesBackChainedOn, sentenceId), where predicatesBackChainedOn is the left-hand side of the instantiated axiom that extracted this tuple (e.g., variety-nn(e27,x8) & of-in(e22,x8,x5)). We use these tuples later in the evaluation.

3. Evaluation and Comparison with a State-of-the-Art Automatic Relation Learning Algorithm

3.1 Previous Work on Extracting Part-whole Relations

One of the early works on automatic discovery of semantic relations from text was by (Hearst 1992), who used a fixed set of lexico-syntactic patterns for discovering hypernymy in discourse. Hearst adopted the same technique for extracting part-whole relations from discourse with little success, as the lexico-syntactic patterns were ambiguous and often expressed other meanings besides part-whole relations. (Berland and Charniak 1999) also use a small set of manually-crafted patterns and initial "whole" instances (e.g., "building") to harvest the corresponding "part" instances (e.g., "room"). They achieve an accuracy of 55% over the top-50 results and 70% over the top-20 results. (Girju et al. 2003) used supervised learning techniques for learning part-whole relations from the TREC 9 corpus. They used 100 seeds from WordNet's part-whole relations, extracted 20,000 sentences containing these seeds from two separate corpora, and found 54 patterns, of which they used only the 3 most frequent ones ("x of y", "x's y" and "x verb y"). Since these patterns were ambiguous, they used supervised machine learning on 54,000 manually annotated instances to learn the semantic types of x and y.
The final system had an average precision of 83.57% and a recall of 98.31%. The recall was computed only on those sentences that matched one of the 3 patterns, which were a total of 119 cases.

(Van Hage et al. 2006) use a two-step method for 1) finding phrase patterns for both explicit and implicit part-whole relations, and 2) applying these patterns to find part-whole relation instances. Their focus is on ingredient-food relations, and they particularly aim for high recall. They assemble 503 part-whole pairs from a list of food additives and the food product types they can occur in. Google was then queried with these seeds and, after trimming the retrieved snippets, inspecting them manually and filtering bad patterns, 91 patterns were obtained. To find part-whole relations, they substitute the "part" slots in the patterns with known carcinogens and formulate web-search queries to extract the "whole" entities. These "whole" entities were then filtered, and only those that occur in agriculture thesauri were kept. The average precision of this method was 74%. To compute recall, they take 4 carcinogens (parts) that are well studied and for which there is documentation on which foods (wholes) may contain them. They measure how many of those known foods can be extracted for each carcinogen and report a recall of 73-86%.

(Pantel and Pennacchiotti, 2006) introduced a weakly-supervised, general-purpose and accurate bootstrapping algorithm called Espresso for extracting arbitrary relations from text. Espresso starts with a set of seed instances of the desired relation to be learned, and iterates through 1) pattern induction, 2) pattern ranking/selection and 3) instance extraction. A principled measure of pattern and instance reliability plays the key role in this algorithm. Espresso uses the Web to overcome sparsity whenever there isn't enough evidence in the corpus to rely on an instance.
Espresso also exploits generic patterns (patterns with high recall and low precision) with the help of the Web to filter incorrect instances. Experiments were conducted on a sample of the ACQUAINT (TREC-9) newswire corpus with 6 million words. For part-whole relations, without using the generic patterns, Espresso extracts only 132 instances with a precision of 80%. Exploiting generic patterns dropped the precision to about 70% but dramatically increased extractions to about 87,000 instances.

(Ittoo and Bouma 2010, 2013) use Espresso with some modifications to learn part-whole relation patterns from a 2007 dump of Wikipedia and then use these patterns to extract part-whole instances from smaller-sized domain-specific text. They pay particular attention to seed selection and choose seeds from different types of part-whole relations (e.g., place-area, component-integral, member-collection, etc.). Their system achieves 80% precision on Wikipedia and 79.4% on domain-specific text. They didn't compare their method against Espresso on Wikipedia, but report that running Espresso on the domain-specific text (without using the Web-expansion feature) resulted in a precision of only 44.1%. To compute recall, they manually inspected a subset of the target corpus and identified 500 valid part-whole relations (equally distributed among all types of part-whole relations). Their proposed framework achieved a recall of 79.6% while Espresso had a recall of 45%.

3.2 Chosen Method for Comparison

As a baseline for comparison, we chose the bootstrapping part-whole relation extraction algorithm by (Ittoo and Bouma 2010, 2013), which is based on the Espresso algorithm (Pantel and Pennacchiotti, 2006).
We implemented their system (which we refer to as Ittoo's system henceforth) with the following modifications to make it more comparable to our system:

- Linguistic pre-processing: Instead of syntactic parse trees of the sentences, we used the logical forms produced by Boxer with the pronoun-resolution feature enabled.
- Term identification: Instead of using the term-identification algorithm Termight (Dagan and Church 1994) to identify instances, we use our HND algorithm to identify the head nouns.

3.2.1 The Algorithm

The bootstrapping algorithm starts with a set of seed (part, whole) instances and finds patterns that connect them. It then assigns a reliability score to each pattern and chooses the top-k patterns with the highest reliability scores to find new (part, whole) instances. Next, it assigns a reliability score to each new instance and selects the top-n instances for finding new patterns that connect them in the next iteration. The steps of pattern selection and instance selection are repeated until the quality of the patterns doesn't change significantly. In Ittoo's system, k=10 and n=100 in the first iteration, and in each subsequent iteration, k increases by 5 and n increases by 20 (i.e., in each iteration, 5 new patterns and 20 new instances are extracted). They stopped the algorithm when the pattern quality (the system's performance) became almost constant (this happened after 15 iterations) and obtained 102 reliable patterns.

In each iteration of this algorithm, patterns and instances are scored by the reliability measures that were originally introduced in (Pantel and Pennacchiotti, 2006). The instance_pair_purity measure introduced by (Ittoo and Bouma 2010, 2013) is also used in scoring instances to avoid semantic drift. We describe these measures below.
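The loop just described can be sketched as follows. This is a simplified illustration: the function and variable names are our own, and reliability is approximated here by raw co-occurrence counts over (part, whole, pattern) triples, whereas the actual system uses the PMI-based measures defined below.

```python
from collections import Counter

def bootstrap(seeds, triples, iters=3, k0=10, n0=100):
    """Espresso-style bootstrapping over (part, whole, pattern) triples,
    with Ittoo's growing cutoffs: k=10 and n=100 in the first iteration,
    then k+5 and n+20 per iteration."""
    instances, patterns = set(seeds), []
    k, n = k0, n0
    for _ in range(iters):
        # 1) induce patterns connecting the current instances, keep top-k
        pat_counts = Counter(p for x, y, p in triples if (x, y) in instances)
        patterns = [p for p, _ in pat_counts.most_common(k)]
        # 2) extract instances matched by those patterns, keep top-n
        inst_counts = Counter((x, y) for x, y, p in triples if p in patterns)
        instances = {i for i, _ in inst_counts.most_common(n)}
        k, n = k + 5, n + 20  # 5 new patterns, 20 new instances per iteration
    return patterns, instances
```

In the real system the stopping criterion is pattern quality becoming almost constant rather than a fixed iteration count.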
Instance and Pattern Reliability Measures

Both the instance and the pattern reliability measures use the pointwise mutual information (PMI) (Cover and Thomas 1991) between a pattern and an instance to measure the strength of their association. The pointwise mutual information between an instance i, consisting of the pair (x,y), and a pattern p is defined as:

(1)   pmi(i,p) = log( |x,p,y| / (|x,*,y| * |*,p,*|) )

where |x,p,y| is the frequency of pattern p instantiated with the pair (x,y) and * represents a wildcard.

Pattern Reliability: Equation (2) defines r_π(p), the reliability of a pattern p in expressing a part-whole relation, as its average strength of association (PMI) with the (part, whole) pair instances i that instantiate it, weighted by the reliability r_ι(i) of these pairs. The reliability of the initializing seeds is set to 1. In this equation, max_pmi is the maximum PMI over all instances and patterns, and |I| represents the total number of instances.

(2)   r_π(p) = ( Σ_{i∈I} (pmi(i,p) / max_pmi) * r_ι(i) ) / |I|

Instance Reliability: Instance reliabilities are computed very similarly to pattern reliabilities. Equation (3) defines r_ι(i), the reliability of instance i, as its average strength of association (PMI) with the patterns p that it instantiates, weighted by the reliability r_π(p) of these patterns. In this equation, |P| represents the total number of patterns and max_pmi is as before.

(3)   r_ι(i) = ( Σ_{p∈P} (pmi(i,p) / max_pmi) * r_π(p) ) / |P|

Instance_pair_purity Measure

Semantic drift is the phenomenon whereby the relations extracted by a minimally-supervised technique differ from the target relations instantiated by the initializing seeds. Semantic drift can happen due to ambiguous patterns or noisy patterns that might be the result of a parser error. Such patterns introduce noisy instances, which in turn result in the extraction of additional irrelevant patterns and instances in the next iterations.
To reduce the effect of semantic drift, (Ittoo and Bouma 2010, 2013) define an instance_pair_purity measure that estimates the purity of an instance pair i as its likelihood (probability) of being connected by unambiguous part-whole patterns (that also subcategorize the initializing seeds). For example, if an instance pair i appears 100 times in a corpus, and 63 of these occurrences are subcategorized by an unambiguous pattern, then the purity of i, purity(i), is 0.63. Ittoo's system uses the patterns "consist-of" and "contain" as unambiguous patterns, while we added the additional patterns "member-of", "part-of", "component-of" and "constitute", as we believe they, too, strongly indicate a part-whole relationship.

3.2.2 Pre-Processing the Corpus and Extracting (part, whole, pattern, sentenceId) Tuples

As we stated before, we use logical forms instead of syntactic parse trees. Similar to Ittoo's method, we first extract and count all possible (part, whole, pattern) triples from the corpus, and then use them for the bootstrapping algorithm. In addition to these 3 items, we also keep the ID of the sentence from which each triple is extracted.

For each logical form LF with the ID sentenceId, we first identify all noun-referring arguments and their corresponding head nouns (using our HND algorithm). Then we search the logical form for any of the following combinations of predications that relate two noun-referring arguments (arg1, arg2):

(1) verb only: p1-vb'(e,arg1,*,arg2) or p1-vb'(e,*,arg1,arg2) or p1-vb'(e,arg1,arg2,*)
(2) verb + preposition: p1-vb'(e,arg1,*,*) & p2-in'(e2,e,arg2) or p1-vb'(e,*,arg1,*) & p2-in'(e2,e,arg2) or p1-vb'(e,*,*,arg1) & p2-in'(e2,e,arg2)
(3) noun + preposition: p1-nn'(e,arg1) & p2-in'(e2,arg1,arg2)

Note that in the second and third cases, the noun/verb and the preposition predications are related by a common argument.
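Searching a logical form for these three predication combinations can be sketched as follows. This is a simplified illustration in which a logical form is a list of (predicate, args) pairs; the list representation and the function name are our own, while the predicate-name suffixes (-vb for verbs, -nn for nouns, -in for prepositions) follow Boxer's convention as used in the text.

```python
from itertools import combinations

def find_icps(lf, noun_args):
    """Find instantiated candidate patterns (ICPs) relating two
    noun-referring arguments in a logical form lf."""
    icps = []
    for pred, args in lf:
        if pred.endswith("-vb"):
            e = args[0]  # the verb's eventuality argument
            nouns = [(i + 1, a) for i, a in enumerate(args[1:])
                     if a in noun_args]
            # (1) verb only: two noun-referring arguments of one verb
            for (i1, a1), (i2, a2) in combinations(nouns, 2):
                icps.append((a1, a2, f"{pred},{i1},{i2}"))
            # (2) verb + preposition attached to the verb's eventuality e
            for pred2, args2 in lf:
                if pred2.endswith("-in") and args2[1] == e \
                        and args2[2] in noun_args:
                    for i1, a1 in nouns:
                        icps.append((a1, args2[2],
                                     f"{pred},{i1} & {pred2},2"))
        elif pred.endswith("-nn"):
            # (3) noun + preposition sharing the noun's argument
            for pred2, args2 in lf:
                if pred2.endswith("-in") and args2[1] == args[1] \
                        and args2[2] in noun_args:
                    icps.append((args[1], args2[2],
                                 f"{pred},1 & {pred2},2"))
    return icps
```

For example, the predications live-vb'(e1,x,u,y) & in-in'(e2,e1,z), with x and z noun-referring, yield the ICP (x, z, "live-vb,1 & in-in,2").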
We chose these combinations of predications because they match 1) the left-hand sides of our axioms and 2) the top (and hopefully most) patterns extracted by Ittoo's system. We refer to the above predication combinations as instantiated candidate patterns, or ICPs.

We convert these ICPs to (noun1, noun2, pattern, sentenceId) tuples, where noun1 and noun2 are the head nouns associated with arg1 and arg2 (respectively) and pattern has one of the following formats:

p1-vb, argIndex1, argIndex2 (e.g., make-vb,1,3)
p1-vb, argIndex1 & p2-in, argIndex2 (e.g., make-vb,3 & from-in,2)
p1-nn, argIndex1 & p2-in, argIndex2 (e.g., part-nn,1 & of-in,2)

In the above patterns, argIndex1 and argIndex2 indicate the indices of arg1 and arg2, respectively. Table 6-3 shows sample predication combinations and the corresponding tuples (we omitted the sentenceId from the tuples for readability).

Note that each of arg1 and arg2 may be associated with more than one head noun (see the HND algorithm). For example, arg1 might refer to two head nouns, N1 and N2, and arg2 may refer to three head nouns, N3, N4 and N5. As with our own algorithm, we consider all possible noun pairs (e.g., (N1, N3), (N1, N4), (N1, N5), ..., (N2, N5)) and create separate tuples for them.

ID | Predication Combination | Extracted Tuple
1  | make-vb'(e1,x,u,y) & actor-nn'(e2,x) & appearance-nn'(e3,y) | (actor, appearance, make-vb,1,3)
2  | part-nn'(e1,x) & of-in'(e2,x,y) & DNA-nn'(e2,x) & apple-nn'(e3,y) | (DNA, apple, part-nn,1 & of-in,2)
3  | live-vb'(e1,x,u,y) & in-in'(e2,e1,z) & child-nn'(e2,x) & area-nn'(e3,z) | (child, area, live-vb,1 & in-in,2)

Table 6-3: Sample predications and extracted (part, whole, pattern) tuples.
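The pair-expansion step described above (one tuple for every combination of part and whole head nouns) is a simple Cartesian product; a minimal sketch, with a function name of our own:

```python
from itertools import product

def make_pair_tuples(part_nouns, whole_nouns, pattern, sentence_id):
    """Create one (part, whole, pattern, sentenceId) tuple per
    combination of head nouns describing the two arguments."""
    return [(p, w, pattern, sentence_id)
            for p, w in product(part_nouns, whole_nouns)]
```

So two part head nouns and three whole head nouns yield six tuples, mirroring the (N1, N3) ... (N2, N5) example in the text.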
We also take care of conjunctions and disjunctions of nouns, which Boxer represents by the predicate subset_of in the logical form: if arg3 represents the conjunction or disjunction of two noun-referring arguments arg4 and arg5 (i.e., we have the predications subset_of'(e,arg4,arg3) & subset_of'(e,arg5,arg3) in the logical form), we consider arg3 as a noun-referring argument when searching for ICPs, and after finding a matching ICP, we decode arg3 back to arg4 and arg5 and consider all the head nouns associated with both of them for constructing tuples.

3.2.3 Seed Selection

(Keet and Artale, 2007) define a taxonomy of 8 part-whole relations:

1) structural part-of, between integrals and their functional components, e.g., "engine-car";
2) member-of, between a physical object (or role) and an aggregation, e.g., "player-team";
3) constituted-of, between a physical object and an amount of matter, e.g., "clay-statue";
4) sub-quantity-of, between amounts of matter, e.g., "oxygen-water";
5) located-in, between an entity and its 2-dimensional region, e.g., "city-region";
6) contained-in, between an entity and its 3-dimensional region, e.g., "tool-trunk";
7) participates-in, between an entity and a process, e.g., "enzyme-reaction";
8) involved-in, between a phase and a process, e.g., "chewing-eating".

Omitting the relations participates-in, involved-in and constituted-of, (Ittoo and Bouma 2010, 2013) chose the 5 most frequent (part, whole) instances for each of the 5 remaining types of part-whole relations. The reason for omitting those three types of relations was the lack of sufficient instance pairs to initialize them.

3.3 Experiment Setup

3.3.1 Dataset

We chose 12 million sentences, with a length greater than 6 words, from folders enwp00 and enwp03 in the Wikipedia section of ClueWeb09. We parsed all 12 million sentences with Boxer, making use of the parallel computing nodes on USC's HPC high-performance cluster. This process took several days to complete.
We used the resulting logical forms as the input to both our system and Ittoo's system.

3.3.2 Ittoo's System Results

Extracted Tuples: Using the method we described in 3.2.2, we got about 841,000 (part, whole, pattern, sentenceId) tuples, in which there were about 80,000 unique (part, whole, pattern) triples, 110,000 unique patterns and 300,000 unique (part, whole) instances after discarding the ones with frequencies less than 5.

Chosen Seeds: Considering the frames and words that we have axiomatized, we are interested in the following part-whole relations: structural part-whole, member-of, constituted-of and sub-quantity-of. We chose five high-frequency pairs that have occurred with general part-whole relation patterns like "contain", "consist of", "member of", "part of", "component of" and "constitute". We didn't find enough high-frequency instances for the sub-quantity-of relation in our data and hence omitted it. The seeds we have used are shown in Table 6-4.

Structural      | Member-of            | Constituted-of
seed, fruit     | fighter, squadron    | wood, frame
cell, system    | minister, parliament | substance, body
room, building  | people, group        | metal, instrument
hall, building  | team, league         | wood, building
route, system   | player, team         | sand, castle

Table 6-4: Chosen seeds for different types of part-whole relations.

Bootstrapping Algorithm: We ran the bootstrapping algorithm with the same parameters as Ittoo's system; i.e., in the first iteration, we extracted 10 patterns and 100 instances, and in each subsequent iteration, we added 5 new patterns and 20 new instances. We ran the algorithm for 60 iterations and, for each pattern and instance, kept track of the iteration at which that pattern or instance was introduced. Table 6-5 (a) shows the top-10 reliable patterns and the iterations in which they were extracted; Table 6-5 (b) shows sample good and bad patterns that were introduced in different iterations. (The POS tags and argument indices are omitted from the patterns for readability.)
Pattern       | Iteration
contain       | 0
part-of       | 0
member-of     | 0
make-of       | 0
include       | 0
constitute    | 12
consist-of    | 0
have          | 0
house (verb)  | 0
locate-in     | 0

Iteration | Good Patterns                                      | Bad Patterns
0-5       | part-of, member-of, contain, conclude-with         | see, serve, know-as, conclude-after, elect-as, air-over
5-15      | capital-of, play-at, take-from, equip-with         | see-for, write-with, air-during, provide
15-30     | play-for, incorporate, center-of, present-in       | become, build, obtain-on, authority-of
30-60     | split-into, play-in, use-in, sign-into, go-into    | need, depend-on, pass-on, affiliate-of, borrow-from

Table 6-5: (a) Top-10 reliable patterns and (b) sample positive and negative patterns extracted in different iterations.

According to the examples in Table 6-5 (b), many good patterns are still extracted after the 15th iteration (which was the final iteration in Ittoo's original system).

3.3.3 Our System's Results

Our system extracted 562,883 (part, whole) instances by applying 182 distinct axioms to the same logical forms used by Ittoo's system.

Recall from section 2 that we saved extracted instances as (part, whole, predicatesBackChainedOn, sentenceId) tuples, where predicatesBackChainedOn was the left-hand side of the instantiated axiom that extracted (part, whole). In order to compare our axioms against the patterns of Ittoo's system, we translate predicatesBackChainedOn to patterns. For example, the predications part-nn'(e,p) & of-in'(e1,p,w) are translated to the pattern part-nn,1 & of-in,2. Table 6-6 shows the distribution of the 10 most frequent patterns in our system. The POS tags and argument indices are omitted for readability.

Pattern     | Frequency
include     | 299809
contain     | 41021
consist of  | 34762
make in     | 33784
join        | 29445
represent   | 27759
part of     | 21658
member of   | 13619
include in  | 11444
build in    | 11321

Table 6-6: The 10 most frequent patterns (according to the left-hand sides of the axioms).

3.4 Evaluation

In this section, we compare the performance of our axioms against the patterns learned by Ittoo's system.
We call the set of patterns corresponding to our axioms A and the set of automatically-learned patterns B. To compare these sets, we study and compare the properties of their intersection (A∩B) and their differences (A-B and B-A). In the following sections, we try to answer the following questions:

1. Can most of our axioms be learned automatically? In other words, how big is A∩B?
2. How good are the axioms/patterns that could not be learned (A-B), and how good are the learned patterns that our axioms miss (B-A)?

Table 6-7 summarizes the key findings about each group of patterns, which we use for answering the above questions. We explain these findings in the following sections and will refer to this table frequently.

Pattern Set | #Patterns | Precision1 | Precision2 | #Extractions | Rel. Recall
A∩B         | 28        | 0.94       | 0.88       | 556445       | 100%
A-B         | 407       | 0.92       | 0.67       | 209539       | 28%
B-A         | 268       | 0.46       | 0.35       | 3041617      | 217%

Table 6-7: Comparison of patterns in A (the axioms) and patterns extracted by B.

3.4.1 Pattern Overlap

The first column of Table 6-7 shows the number of distinct patterns in each set of patterns. Only 28 patterns are shared between systems A and B. This is a very small fraction of the total patterns in A; therefore the answer to our first question is negative: most of our axioms could not be learned automatically. Table 6-8 shows sample patterns that are shared between A and B, as well as patterns that are present in only one system.

Pattern Group | Examples
A∩B | include, contain, join, member of, divide into
A-B | make with, combine with, form of, create with, incorporate into, division of, family of
B-A | use, character of, have, use in, produce, find in, locate in, serve as

Table 6-8: Sample patterns in A∩B, A-B and B-A.

3.4.2 Performance of Patterns

To measure how good each group of patterns is, we need to measure the precision and relative recall of each set of patterns.

Precision

We estimate the precision of a set of patterns by two different measures: Precision1 and Precision2.
Precision1 measures how precise the patterns are by themselves (i.e., without considering the instances they extract), while Precision2 estimates the precision of the patterns by measuring the precision of the instances they extract. Annotation of the patterns and instances was done by the author. However, to make sure the author was not biased, a fraction of the annotated cases was annotated by a second judge, and the inter-annotator agreement is reported.

Precision1: To compute Precision1, we need to measure how well a pattern captures any of the part-whole relationship types we are interested in. To facilitate annotation, patterns are translated into natural language phrases. For example, the patterns "house-vb,1,3", "make-vb,3 & of-in,2" and "part-nn,1 & of-in,2" are translated to "x houses y", "x is made of y" and "x is part of y". Given a natural language representation of a pattern, the judge has to decide how often this statement holds: "either x is/was a part/member/substance of y, or y is/was a part/member/substance of x". Possible answers are 1) almost always, 2) sometimes, 3) rarely and 4) don't know/skip. We assign a score of 1 to the first answer, a score of 0.5 to the second answer and a score of 0 to the third one. We discard cases that were answered "don't know/skip".

The author annotated 168 patterns: 70 taken randomly from each of A-B and B-A, and the 28 patterns in A∩B. Of these, 51 random patterns (chosen equally from those answered 0, 1 and 0.5) were annotated by a second judge. The judges agreed on 74% of the cases, and the Cohen Kappa score was 0.6 over all labels (0, 0.5, 1) and 0.73 over labels 0 and 1 only. Precision1, which is shown in the third column of Table 6-7, is estimated for each set of patterns (A∩B, A-B and B-A) separately, by taking the sum of all the scores and dividing it by the total number of annotated cases.
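The Precision1 score is thus a simple average over the kept annotations; a minimal sketch (the answer labels are our own shorthand for the four options above):

```python
def precision1(answers):
    """Average the per-pattern annotation scores:
    'almost always' -> 1, 'sometimes' -> 0.5, 'rarely' -> 0;
    'skip' (don't know) answers are discarded before averaging."""
    score = {"almost always": 1.0, "sometimes": 0.5, "rarely": 0.0}
    kept = [score[a] for a in answers if a in score]
    return sum(kept) / len(kept)
```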
The patterns chosen from A∩B and A-B have the highest precision (0.94 and 0.92, respectively), while the patterns from B-A have the lowest precision (0.46).

Precision2: To compute Precision2, we need to check whether the pairs of nouns that a pattern has extracted are in a part-whole relationship. Given a pattern pt and a pair (noun1, noun2) that it has extracted from a sentence S, we instantiate pt with noun1 and noun2, convert it to a natural language phrase PH (as we explained earlier) and ask the annotator whether the statement PH in the sentence S implies a part-whole relationship between noun1 and noun2. For instance, given the pattern provide-vb,1,3 and the pair (copyright, protection) it has extracted from the sentence "copyright provides the protection of expression of ideas", the phrasal version of the instantiated pattern would be "copyright provides protection", and the following question will be formulated for the annotator:

Does the statement "copyright provides protection" in the sentence "copyright provides the protection of expression of ideas" imply that copyright is/was a part/member/substance of protection, or protection is/was a part/member/substance of copyright?

We chose 70 random patterns from each of the A-B and B-A sets, and all 28 patterns from A∩B. Then for each pattern pt, we took 10 random (noun1, noun2, pt, sentenceId) tuples from the processed corpus (sections 2 and 3.2.2). For each tuple, using the sentenceId, we pulled the original sentence and its logical form from the raw and parsed corpus, respectively. The author then inspected the tuples (noun1, noun2, pt, LF, sentence) to filter out incorrect instances resulting from parser errors (about 25% of the inspected cases were filtered). Once an instance (with no parsing problem) was found for a pattern, the rest of the tuples for that pattern were discarded. At the end, we had 70 (noun1, noun2, pattern, sentence) tuples^38, one for each pattern.
Since there are only 28 patterns in A∩B, we initially chose 15 tuples per pattern for parsing-error inspection and kept 3 tuples per pattern. These tuples were then converted to natural language sentences (as explained above) and judged by the author. Of these, 50 cases were chosen randomly (25 from those answered "yes" and 25 from those answered "no") and judged by a second person. There was agreement on 83% of the cases, and the Cohen Kappa score was 0.66.

Precision2 is shown in the fourth column of Table 6-7 for each pattern set. Again, patterns in A∩B and A-B have the highest precision, while patterns in B-A have the lowest precision.

Relative Recall

To measure recall, we use the relative recall measure introduced by (Pantel et al. 2004), which measures the recall of a system relative to another system's recall. The relative recall of system A given system B, R_{A|B}, is defined as:

R_{A|B} = R_A / R_B = (C_A / C) / (C_B / C) = C_A / C_B = (P_A * |A|) / (P_B * |B|)

where R_A is the recall of A, C_A is the number of correct instances extracted by A, C is the (unknown) total number of correct instances in the corpus, P_A is A's precision and |A| is the total number of instances discovered by A.

^38 We don't need the logical form for the annotation.

We compute the relative recall of each pattern group in relation to group A∩B. For precision, we use Precision2. The total number of instances extracted by each group of patterns is shown in the fifth column of Table 6-7. Compared to the patterns in A∩B, the patterns in B-A have a much higher recall (217%), while the patterns in A-B have a much lower recall (28%).

We can now answer our second question by considering Precision1, Precision2 and relative recall. The patterns that our axioms miss (i.e., patterns in B-A) have a high recall but very low precision, while the axioms that could not be learned automatically (i.e., patterns in A-B) have much higher precision and much lower recall.
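The relative recall figures in Table 6-7 can be reproduced from the Precision2 and #Extractions columns; a sketch (the small deviations from the reported 28% and 217% come from using the rounded precision values):

```python
def relative_recall(p_a, n_a, p_b, n_b):
    """R_{A|B} = C_A / C_B = (P_A * |A|) / (P_B * |B|): the correct
    extractions of system A relative to those of system B."""
    return (p_a * n_a) / (p_b * n_b)

# Relative to A∩B (Precision2 = 0.88, 556445 extractions):
base = (0.88, 556445)
print(round(relative_recall(0.67, 209539, *base), 2))   # A-B: ~0.29
print(round(relative_recall(0.35, 3041617, *base), 2))  # B-A: ~2.17
```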
3.4.3 Analysis of Incorrect Extractions by Our System

Except for parsing errors, most of the incorrect (part, whole) extractions by our axioms are due to ambiguous verbs, nouns or phrases that instantiate the left-hand sides of the axioms. Examples of ambiguous verb phrases are "x merged with y" and "x joined y". We noted that ambiguity is much higher among noun phrases like "x is a/the body of y" or "x is a/the team of y". This higher rate of ambiguity can be attributed to the ambiguity of the preposition "of". For example, while the phrase "x is a team of ys" can indicate that the ys are members of the team x, it doesn't convey such a meaning in the sentences "Forde was the GPA team of the year" and "The Indomitable Lions is the national team of Cameroon".

To solve the ambiguity problem, one needs to consider the semantics of a pattern's arguments. For example, in "x merged with y" and "x joined y", a part-whole relation exists between x and y only if y is a composite entity. Syntactic information might help with this situation as well: for "x is a team of y" to indicate a part-whole relationship between x and y, y needs to be a plural noun. As another example, "x is a body of y" usually indicates that x is the whole and y is the material (e.g., "a lake is a body of water"), while "x is the body of y" indicates that x is part of y.

Chapter 7
Related Work

In this chapter we review related work on developing lexical semantic resources and compare it against our work. We have already mentioned most of these resources in the introduction of this thesis and even used some of them (like WordNet and FrameNet) in the previous chapters. The purpose of this chapter is to compare these resources to our Deep Lexical Semantics enterprise in more detail.

We start with Cyc^39, which, due to its rich knowledgebase and support for mapping English to its underlying predicates, is quite relevant to our work.
Next we review works following the relational approach, such as WordNet, VerbOcean and ConceptNet, which capture lexical or world knowledge using predefined relations between words or phrases. We then focus on works that, like us, choose a decompositional approach to meaning representation. We review past projects like VerbNet and FrameNet as well as the recent works by Len Schubert and James Allen on manually and/or automatically axiomatizing concepts/words.

1. Cyc

Cyc (Lenat 1995) is the biggest project aiming to build a resource of scientific and commonsense knowledge. Cyc is relevant to our Deep Lexical Semantics effort in two ways: 1) it has a rich axiomatization of commonsense knowledge, and 2) it has axioms for mapping words to its predicates (and mapping the syntactic arguments of a word to the semantic arguments of the corresponding concept).

Cyc's approach of mapping words to very specialized concepts has severe shortcomings. According to the example given by (Mahesh et al. 1996), Cyc makes fine-grained distinctions between different senses of 'take' by mapping this word to concepts such as takingABath and takingAShower, leaving out other senses at the same granularity level, such as 'taking a course'. Cyc has axioms for these fine-grained senses that are not relevant to the general meaning of 'take' at all:

(implies
  (and (isa ?BATH TakingABath)
       (isa ?WATER (LiquidFn Water-Fresh))
       (stuffUsed ?BATH ?WATER))
  (objectOfStateChange ?BATH ?WATER))

More interestingly, Cyc has another concept, Bathing, that should essentially be the same as takingABath, but it is not, and rather has its own set of axioms. We view 'take a bath' and 'take a shower' as compositions of 'take' and 'bath' or 'shower'.

^39 Cyc is rarely referred to as a lexical semantic resource. However, due to its support for the English language (both syntax and semantics), it contains lexical semantic knowledge as well.
In all these compositions, the word 'take' appears in its first WordNet sense, defined as 'carry out'. We axiomatize this sense of 'take' as:

take-v1'(e, x, e0) → changeTo'(e, e0) & agentOf(x, e0)

This axiom applies equally to 'take a bath' and 'take a shower', in addition to many other combinations, such as 'take a test' or 'take a rest', which only specify the argument e0. We leave the detailed semantics of these compositions to be captured in the meanings of their objects 'bath', 'shower', etc. Therefore, we are less concerned with the sparsity problem, since we are able to axiomatize and compose the meanings of very general words.

In addition to sparsity, Cyc suffers from suboptimal axiomatizations of words, as demonstrated in (Cox 2005)'s report on assessing Cyc for recognizing textual entailment. They examined whether 'Z sold Y to X' can be inferred from 'X bought Y from Z' using ResearchCyc's axioms. This inference failed because the relationship between the 'OfferingForSale' and 'Buying' concepts in Cyc was not well-defined.

It seems that in Cyc, there isn't much cohesion between predicates related to the same theory of the world. We didn't find axioms that relate such relevant and generic concepts as increase, scale or more. In fact, we didn't find any predicate for more, but rather found its specializations, such as moreLikelyThan and moreSkilledThan. We also didn't find any generic concept for change, and hence no mapping of the word 'change' either. Instead there were different specific concepts such as TemperatureChangingProcess, condensation and cleaningAnObject under the 'change-of-state' topic.

2. Works Based on the Relational Approach

Rather than decomposing word meaning into more abstract concepts, relational approaches capture one or several aspects of a concept/word/phrase's meaning through a set of predefined relations between that concept/word/phrase and other concepts/words/phrases.
Pairwise relations between words, especially when the relation types are limited to a small set, have very limited capacity for capturing knowledge. Here we give two illustrative examples. In these examples, we assume that argument structures are available, although in most resources this is not the case.

In the first example, we try to model the meaning of 'move' in terms of binary relations. In Deep Lexical Semantics, we define x's moving from p1 to p2 via the following axiom:

move'(e, x, p1, p2) → change'(e, e1, e2) & at'(e1, x, p1) & at'(e2, x, p2)

This axiom can be partially represented using the following two binary relations:

before[at(x,p1), move(e,x,p1,p2)]
after[at(x,p2), move(e,x,p1,p2)]

The reason for considering this translation incomplete is that the extra information in the predicate change - which states that at(x,p1) and at(x,p2) are inconsistent - is missing.

In the second example, the situation is worse. We cannot translate the following axiom for 'increase' into binary relations at all:

increase'(e, x) → (there exist some p1 and p2) change'(e, e1, e2) & at'(e1, x, p1) & at'(e2, x, p2) & lessThan(p1, p2)

Despite these limitations, it is worth mentioning the following relation-based resources, especially because of some of their verb-verb relations (e.g., temporal relation and entailment).

WordNet^40: WordNet (Miller et al., 1990) is the most widely used lexical semantic resource in the NLP community, mostly due to its large lexical coverage and variety of semantic relations. WordNet groups synonymous words into synsets - which can be viewed as concepts - and establishes lexical-semantic relations between these synsets. Examples of these relations are hypernymy (a cat is an animal), member (a policeman is a member of the police), meronymy (a hand is a part of the body), entailment (license entails approve) and causation (kill causes die).
In addition to semantic information, WordNet also provides some syntactic information about verbs under its sentence frame entries.

Extended WordNet KB (XWN-KB): (Moldovan et al. 2006) describe a system that extends the knowledge in WordNet by leveraging semantic relations between word senses in Extended WordNet's glosses (in Extended WordNet (Moldovan and Rus, 2001), glosses are sense-disambiguated and translated into logical forms). They used a semantic parser to transform the glosses into semantic triples. Thirty semantic relations are introduced in the resulting resource, XWN-KB, including IsA (ISA), Location (LOC), Part-Whole (PW), Property Type (PRO), Temporal relation (TMP) and Theme (THM). For example, the WordNet gloss for window#2, which is

a transparent opening in a vehicle that allow vision out of the sides or back; usually is capable of being opened.

is transformed into the following semantic relations:

ISA(window, opening) & PW(opening, vehicle) & TMP(usually, capable) & PRO(transparent, opening) & THM(vision, allow)

Using hand-coded meta-rules such as 'if X is-a Y and Y has property Z, then X has property Z', the system can infer from ISA(window, opening) and PRO(transparent, opening) that PRO(transparent, window).

VerbOcean: VerbOcean (Chklovski and Pantel, 2004) is a semantic network of verbs in terms of verb-verb relations including, but not limited to, enablement and happens-before. Chklovski and Pantel used different lexico-syntactic patterns to query the web for such relations. For example, 'V1 is accomplished by V2' is a pattern for the enablement relation. Examples of happens-before relations are ('increase' happens-before 'double') and ('increase' happens-before 'exceed').
VerbOcean, like other automatically generated resources, is not a reliable source of entailment rules; for example, it contains tuples like ('shrink' happens-before 'increase'), which without additional context (e.g., shrinking of a material increases its density) is misleading. Examples of basic definitional knowledge found in VerbOcean are ('have' happens-before 'put') and ('have' happens-before 'regain'). However, such higher-quality tuples are not prevalent in VerbOcean.

ConceptNet: ConceptNet (http://conceptnet5.media.mit.edu/; Liu and Singh, 2004; Havasi et al., 2007; Speer and Havasi, 2013) is a machine-readable commonsense knowledge resource automatically mined from the Open Mind Common Sense corpus (OMCS; Singh et al., 2002), which is the result of a collaborative effort of thousands of anonymous non-expert users over the internet. ConceptNet nodes are natural language fragments, which are semi-structured according to preferred syntactic patterns. Although ConceptNet is not considered a lexical-semantic resource by the NLP community, it contains relations that, when applied to word pairs, are quite similar to relational lexical semantics. Examples of such relations are IsA, PartOf, HasA, MadeOf, DefinedAs, HasProperty, UsedFor, DerivedFrom, SimilarTo, CapableOf, Causes, HasSubevent, HasPrerequisite, HasFirstSubevent, ReceivesAction, HasLastSubevent and CreatedBy. Other relations are AtLocation, RelatedTo, MotivatedByGoal, Desires, CausesDesire, LocatedNear and SymbolOf. Some of the interesting lexical knowledge found in ConceptNet is listed in Table 7-1.
adult -hasProperty- older than child
number -partOf- scale
scale -relatedTo- measure
set -madeOf- member
set -isA- collection with strict rule of membership
whole -madeOf- part
zero -isA- lower number than one
zero plus zero -hasProperty- zero
zero -isA- set containing one element
fall -causes- go down
fall -hasSubevent- hit ground
fall -hasFirstSubevent- accelerate downward
fall -hasPrerequisite- free space under you
fall -hasSubevent- break bone
run -causes- get somewhere fast
run -motivatedByGoal- go fast
run -relatedTo- fast
run -relatedTo- fast walk
run -hasProperty- walk

Table 7-1: Examples of ConceptNet's lexical-semantics knowledge

As the examples in Table 7-1 suggest, ConceptNet contains interesting deep lexical knowledge (set -isA- collection with strict rule of membership) and commonsense knowledge (fall -hasPrerequisite- free space under you). However, the words and phrases participating in the relations are not grounded in any theory that systematically relates them. For example, there is no relation between 'more' and 'scale'. In addition, there are complex phrases, such as 'collection with strict rule of membership', that are not related to any other words or phrases that explicate them. There are also many relations that make no sense without more context; an example from Table 7-1 is fall -hasSubevent- break bone. Another drawback of ConceptNet is its limited set of relations, which has resulted in many concept relations being underspecified with the very general relation RelatedTo.

Although the knowledge in ConceptNet seems sparse and incoherent, its supplementary reasoning methodology, which works with semi-structured linguistic representations of knowledge, can compute similarity between concepts (e.g., 'buy food' and 'purchase groceries') and also create inference chains, both of which can alleviate the sparsity and coherence problems.

3.
Works Based on the Decompositional Approach

VerbNet: VerbNet (Kipper et al., 2000, 2006; http://verbs.colorado.edu/~mpalmer/projects/verbnet.html) contains syntactic and semantic information about verb classes. Verbs are classified according to (Levin, 1993), and each class can be seen as a conceptual entity. Each verb class is assigned several syntactic frames as well as a semantic axiom consisting of a conjunction of semantic predicates. An example of the semantic representation of a verb is the 'Agent Event Patient' frame for the hit-18.1 class, which is:

cause(Agent, Event) & manner(during(Event), directedmotion, Agent) & ~contact(during(Event), Agent, Patient) & manner(end(Event), forceful, Agent) & contact(end(Event), Agent, Patient)

The main problem with VerbNet's axioms is that the predicates into which verb meanings are decomposed are not axiomatized or linked to any formal theory; they are just arbitrary labels. In addition, VerbNet's grouping, and hence its axiomatization of verbs, can be too general. For example, VerbNet groups 'increase' and 'decrease' (which are opposites), along with other verbs such as 'dip' and 'swell', under the same class calibratable_cos-45.6. VerbNet also doesn't have enough coverage of verb senses. For example, we could not find the verb 'compose' in the sense of 'putting together' in VerbNet.

FrameNet: FrameNet (Ruppenhofer et al. 2010) is based on Fillmore's frame semantics (Fillmore, 1976) and supported by corpus evidence. The lexical meaning of predicates in FrameNet is expressed in terms of frames, which are supposed to describe prototypical situations spoken about in natural language. Every frame contains a set of roles corresponding to the participants of the described situation, e.g., DONOR, RECIPIENT and THEME for the GIVING frame. Predicates with similar semantics evoke the same frame; e.g., "give" and "hand over" both refer to the GIVING frame. Coherence of meaning elements in FrameNet is captured by semantic relations, which are defined on frames. For example, the GIVING and GETTING frames are connected by the causation relation. Roles of the connected frames are also linked; e.g., DONOR in GIVING is linked to SOURCE in GETTING. Table 7-2 shows the types of frame relations, their frequencies and an example of each relation.

Frame Relation   Example                              Frequency
Inheritance      GIVING - COMMERCE_SELL               617
Using            OPERATE_VEHICLE - MOTION             490
Sub-frame        SENTENCING - CRIMINAL_PROCESS        117
Perspective      OPERATE_VEHICLE - USE_VEHICLE        99
Precedence       FALL_ASLEEP - SLEEP                  79
Causative-of     KILLING - DEATH                      48
Inchoative-of    COMING_TO_BE - EXISTENCE             16
See also         LIGHT_MOVEMENT - LOCATION_OF_LIGHT   41

Table 7-2: FrameNet's frame relations with examples and number of occurrences

Although these kinds of relations make FrameNet useful for some kinds of shallow inferences, FrameNet does not support inferences that require deeper decomposition, such as axioms relating 'increase' to its meaning components change, scale, less or more. In addition, FrameNet's frames are sometimes too general, and mapping words to these frames leaves out some important implications. For example, the words 'increase' and 'decrease' are both mapped to the frame Change_position_on_a_scale (similar to VerbNet's grouping of these words under the calibratable_cos-45.6 class). Despite these limitations, FrameNet is a valuable resource and a complement to our Deep Lexical Semantics enterprise (as we have shown in chapter 5) for the following reasons:

1) It groups many similar words under more abstract concepts, i.e., frames.
2) The roles proposed for a frame are suggestive of the meaning elements that need to be captured in the frame's axiomatization.
3) Its syntactic patterns facilitate mapping phrases to frames and constructing the right argument structures.

(Schubert et al.
2011) try to find the meanings of eventive words, such as 'dressing oneself involves putting on clothing', 'picking up an object involves grasping and lifting it' and 'requesting people to do something conveys to them the requester's desire that they do it'. They begin with an assembly of about 150 'primitive' verb senses, which they describe as senses corresponding to concepts that 1) even small children understand and 2) numerous verbal concepts have entailments involving. Examples of such primitives are 'grasping', 'letting go of an object' and 'asking someone to do something'. In addition to these intuitions, they also use about 20 semantic predicates found in VerbNet (e.g., begin, exist, force) and 65 VerbNet class names (e.g., break, carry, fill, learn, own, pour, and stop). They have axiomatized about 100 of the primitive verb senses in terms of other primitives (which aren't necessarily eventive). Examples of axioms for the primitive verbs 'lift' and 'ask-of' are:

(all x (all y (all e: [[x lift y] ** e]
  [[x ((adv-a upwards) (move-trans y))] * e])))

(all_pred p (all x (all y: (all e1 [[x ask-of y (Ka p)] ** e1]
  [[x convey-info-to y (that [[x want-tbt (that (some e2 [e2 right-after e1]
  [[y p] ** e2]))] @ e1])] * e1]))))

The first axiom says that lifting something entails moving it upward, where this moving event is part of the lifting event. The second axiom says that asking someone to do something entails conveying to them that one wants them to do it. The goal of this project is to axiomatize all verb classes in VerbNet in terms of primitives as much as possible. The authors create (one or more) schemas for each class, with parameters that can be instantiated for different members of that class.
For example, the verb class create is axiomatized with the following schema, where VERB and A (the type of the object of VERB) are parameters:

(all x (all y: [y A] (all e: [[x VERB y] ** e] [[x make.v y] ** e])))

In the above schema, the predicate make.v is a primitive axiomatized as cause to exist. Different members of the create class, such as 'compute', can be axiomatized by simply instantiating VERB and A with 'compute' and 'information' respectively. Schubert and colleagues also propose to take an object-oriented approach to further specializing their 'generic' axioms. The example they give is the verb 'open', which can be axiomatized by knowing the object of 'open' (door, book, briefcase, etc.). Although this method seems very similar to our axiomatization effort, it is silent about the nature of the primitives and whether they are related through any coherent theory, which is the main point of Deep Lexical Semantics.

(Allen et al. 2013): Recently, James Allen and colleagues have started to build knowledgebases automatically by reading definitions of words. In one work, (Allen et al. 2013), they extended the hierarchical organization of WordNet verbs (the troponym hierarchy) by categorizing the 559 top-level verb senses under more general concepts. For example, kill%2:35:00 has the superclass concept cause%2:36:00 ⊓ ∃_effect.die%2:30:00, which itself has the superclass concept cause%2:36:00. After completing the first iteration of classification, they recursively continue this procedure on the definitions of newly introduced concepts until no more new concepts are introduced. At this stage the definitions of certain verbs become quite abstract and/or circular. To tackle this problem, as reported in (Allen, 2014), they manually axiomatized a small set of aspectual verbs (e.g., 'start', 'end', and 'continue') and causal verbs (e.g., 'cause', 'prevent' and 'stop') in their temporal logic (described in (Allen and Teng, 2013)).
Once the definition of a verb reaches the point of including one of these verbs, they create a 'temporal map' of entailments from the event. This allows them, for example, to infer from the definition of 'keep up' that it does not occur over the time period of occurrence of 'going to bed' (Allen, 2014). Initial attempts to generate entailments from definitions of verbs are reported in (Allen and Teng 2013), in which they try to learn the meaning of 'change' (which according to WordNet is 'become different') by combining their hand-coded definitions of 'become' and 'different'. They found that a special formalization of scales is essential for obtaining the meaning of 'change' by combining the meanings of 'become' and 'being different'. Table 7-3 shows their (manual) axiomatization of 'change', 'become' and 'different' based on predicates in their theory of scales. They then prove that the full meaning of 'change' can be derived from its definition: 'become different'. According to the work reported in (Allen and Teng, 2013), the authors only axiomatize theories of change-of-state and scales, which they require for learning entailments from verb definitions. They have compared their work to Deep Lexical Semantics and describe their knowledgebase as 'a messy knowledge base that covers as much of the subtleties of language and word senses as possible, rather than developing a more minimal, but more abstract, theory'.

Table 7-3: Axiomatization of 'change', 'become' and 'different' by (Allen and Teng, 2013)

Chapter 8
Conclusion

In this dissertation, we argued that there is a lack of connection between lexical and world knowledge in existing resources and that a deep lexical semantics knowledgebase fills this gap by anchoring word meaning to predicates from language-motivated core theories of commonsense knowledge. This enables inference to go deeper than the word level, into the world level.
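To make this concrete, a toy forward chainer can apply a lexical axiom such as the one given for 'take' in Chapter 7 (take-v1'(e, x, e0) → changeTo'(e, e0) & agentOf(x, e0)) to ground facts. The Python encoding, the ground constants and the bath-specific rule are our illustrative assumptions, not part of the knowledgebase itself:

```python
# Toy forward chainer over ground atoms (tuples). Each rule pairs one
# body atom with the atoms it licenses. The predicate names follow the
# 'take' axiom in Chapter 7; the bath-specific rule is an illustrative
# stand-in for the object-supplied semantics of 'take a bath'.

def chain(facts, rules):
    """Apply single-premise rules to a fact set until fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, heads in rules:
            if body in facts:
                new = set(heads) - facts
                if new:
                    facts |= new
                    changed = True
    return facts

rules = [
    # take-v1'(e, x, e0) -> changeTo'(e, e0) & agentOf(x, e0)
    (("take-v1", "e", "john", "e0"),
     [("changeTo", "e", "e0"), ("agentOf", "john", "e0")]),
    # the object 'bath' supplies the content of e0 (hypothetical rule)
    (("bath", "e0", "john"), [("clean", "john")]),
]

# "John takes a bath": the verb contributes only the general change.
facts = chain({("take-v1", "e", "john", "e0"), ("bath", "e0", "john")}, rules)
```

The verb axiom pushes the inference down to core-theory predicates (changeTo, agentOf), while the object fills in the specific content, which is the division of labor argued for above.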
We presented three approaches to building a deep lexical semantics knowledgebase: a purely manual approach, automatic extraction from text, and a mixed approach in which concept relations in existing lexical-semantic resources are used to 1) identify the set of basic concepts that need manual axiomatization and 2) automatically axiomatize more specific concepts.

In the purely manual approach, we aimed to manually axiomatize the 5000 most frequent English word senses (corresponding to 3500 words). These words were first grouped according to the underlying core theories required for their axiomatization. We worked on one group at a time; in particular, in this dissertation, we axiomatized the change-of-state group. We used a three-step methodology for axiomatizing words: for each word, we first analyze the radial structure of its WordNet senses to identify the most general senses; we then manually axiomatize these general senses in terms of predicates in the core theories; finally, we evaluate these axioms on textual entailment pairs to debug the axioms and find holes in the core theories. We axiomatized all the change-of-state words in Core WordNet (100 words with a total of 2500 word senses), resulting in 720 axioms for the most general senses.

Since manual axiomatization is slow and requires a lot of effort (especially for analyzing the radial structure of word senses), we looked at possible ways to automate axiomatization. The first idea we tried was automatic extraction of axioms from text. In particular, we tried to extract the meaning of change-of-state event verbs (such as "retire") from text by finding the states that are changed by the events (e.g., "be employed") and plugging them into axiom templates. We used lexico-syntactic patterns to extract candidate (STATE, EVENT) pairs from millions of web pages and then used machine learning to filter out non-change-of-state pairs.
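The candidate-extraction step can be sketched as follows. The surface pattern and the sentence below are hypothetical illustrations, not the actual patterns used in our experiments; they only show the shape of the computation (match a pattern, emit a (STATE, EVENT) candidate, hand it to a filter):

```python
import re

# Sketch of lexico-syntactic candidate extraction of (STATE, EVENT)
# pairs. The pattern "was STATE until PRONOUN EVENT" is a hypothetical
# illustration of the kind of pattern applied to web text.
PATTERN = re.compile(r"was (\w+(?: \w+)?) until (?:he|she|they) (\w+)")

def extract_pairs(sentence):
    """Return candidate (STATE, EVENT) pairs matched in a sentence."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(sentence)]

pairs = extract_pairs("She was employed until she retired last spring.")
# each candidate would then go to a learned filter that discards
# pairs not expressing a change of state
```

Run on the example sentence, this yields the candidate ("employed", "retired"), which a classifier would then accept or reject.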
Our analysis of the change-of-state pairs showed that they yield three different types of axioms, and only a small fraction of them yield axioms at our desired level of generality; examples are ("steady", "accelerate") and ("not official", "announce"). In most cases, however, STATE is a specialization of the general state changed by EVENT and hence cannot be used for representing the meaning of EVENT. An example of such a pair is ("be teacher", "retire"), where "be teacher" is a specialization of "be employed" and is too specific to capture the meaning of "retire". We also showed that crowdsourcing tools like Mechanical Turk can be used for differentiating between (STATE, EVENT) pairs that yield different types of axioms.

Automatic extraction of axioms from text has its own shortcomings. First of all, not all words fit into simple axiom templates. For example, the verb "return" has a complex meaning that cannot be described with a single word: it means "a change to a state S where there has been a change from S in the past". Second, as we mentioned already, only a small fraction of the extracted data yields axioms at our desired level of generality, so using crowdsourcing to separate this small fraction is not economical. Finally, the axioms obtained from this method consist of both predicates from core theories (here, changeFrom or changeTo) and simple words which themselves need to be anchored in core theories (e.g., "employed", "in", "at").

Assuming that we cannot do away with manual axiomatization of words altogether, we came up with a third approach, which is an optimal mix of manual and automatic axiomatization. In the mixed approach, which is the main contribution of this dissertation, we use concept relations in existing lexical-semantic resources to identify the minimal set of concepts that should be manually axiomatized, and then axiomatize a large number of relevant concepts automatically.
We evaluated this approach on the domain of Composite Entities and its neighboring domain, Sets. We identified 21 basic concepts related to these domains and manually axiomatized them in terms of predicates in our core theories. Using WordNet and FrameNet concept relations, we were able to identify and automatically axiomatize 486 derived synsets. Our evaluation of the automatically generated axioms showed an average precision of 0.78-0.89.

Finally, we showed one application of our lexical semantics knowledgebase in relation extraction from text, in particular, part-whole relation extraction. We showed how we can extract implicit part-whole relations from text using the automatically and manually created axioms for such words as "add", "cut" and "remove", as well as core-theory axioms such as the one that captures the transitivity of part-whole relations. We compared our method to a state-of-the-art system that automatically learns patterns capturing part-whole relations from large corpora and showed that 1) most of our axioms were not learned from text by the automatic system, and 2) our axioms have a much higher precision (but lower recall) than the patterns learned from text by the automatic system.

The mixed approach can be extended along the following dimensions. First, we have so far evaluated this method only on the domains of composite entities and sets; our evaluation of its performance will be more accurate if we apply it to several other domains of commonsense knowledge. Second, this method greatly benefits from FrameNet's word-frame mappings and syntactic patterns, which are currently sparse; adding more word-frame mappings and syntactic patterns to FrameNet will increase the coverage and precision of the automatically generated axioms. Third, we can use dictionary definitions (i.e., glosses) as another source for identifying and axiomatizing derived concepts.
Indeed, we have implemented this method, but did not present it in this work due to its currently low-quality results. Fourth, we can potentially benefit from other resources, such as VerbNet, that also provide a hierarchy of concepts, word-concept mappings, syntactic structures and even semantic axioms. Currently, the semantic axioms in VerbNet use dangling predicates that need to be anchored in core theories.

Although we evaluated our axioms on the simple task of relation extraction from text, they have application in any task that benefits from deep understanding and reasoning. As we argued before, our ideal evaluation framework is the well-defined Recognizing Textual Entailment (RTE) task, as it captures major semantic inference needs required by many NLP applications such as question answering, information retrieval, information extraction, and document summarization. Developing a fully functional RTE system was beyond the scope of this work. However, it should be possible to integrate our lexical semantics knowledgebase with existing RTE systems, especially those that use inference. In order to do a fair evaluation, however, current RTE datasets from organized challenges are not recommended, as they 1) rarely require deep reasoning and 2) are too broad in terms of the domains and words that need to be axiomatized. We instead recommend developing domain-specific textual entailment datasets that require deeper understanding.

References

Allen, J. F. "Learning a Lexicon for Broad-Coverage Semantic Parsing." ACL 2014 (2014): 1.

Allen, J. F., De Beaumont, W., Galescu, L., Orfan, J., Swift, M., & Teng, C. M. (2013). "Automatically Deriving Event Ontologies for a CommonSense Knowledge Base". In Proceedings of the International Conference for Computational Semantics.

Baker, C. F., Charles J. Fillmore, and John B. Lowe. "The Berkeley FrameNet project."
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1. Association for Computational Linguistics, 1998.

Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open information extraction from the web. In IJCAI (Vol. 7, pp. 2670-2676).

Becker, J., and Dominik Kuropka. "Topic-based vector space model." Proceedings of the 6th International Conference on Business Information Systems. 2003.

Berland, Matthew, and Eugene Charniak. "Finding parts in very large corpora." Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1999.

Bos, J. "Wide-coverage semantic analysis with Boxer." Proceedings of the 2008 Conference on Semantics in Text Processing. Association for Computational Linguistics, 2008.

Carlson, Andrew, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr, and Tom M. Mitchell. "Toward an Architecture for Never-Ending Language Learning." In AAAI, vol. 5, p. 3. 2010.

Chklovski, T. and P. Pantel (2004). "VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations." In D. Lin and D. Wu (Eds.), Proceedings of EMNLP 2004, pp. 33-40. Association for Computational Linguistics.

Clark, P., and Phil Harrison. "Large-scale extraction and use of knowledge from text." Proceedings of the Fifth International Conference on Knowledge Capture. ACM, 2009.

Cox, C. "Assessing the utility of ResearchCyc in recognizing textual entailment". Technical report, Department of Computer Science, Stanford University, 2005.

Cycorp, I. (2008). The Cyc project home page. Available online at: http://www.cyc.com. Retrieved on December 10th.

Dagan, I., Oren Glickman, and Bernardo Magnini. "The PASCAL recognising textual entailment challenge." Machine Learning Challenges.
Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. Springer Berlin Heidelberg, 2006. 177-190.

Dagan, I., and Ken Church. "Termight: Identifying and translating technical terminology." Proceedings of the Fourth Conference on Applied Natural Language Processing. Association for Computational Linguistics, 1994.

Davis, E. (1990). Representations of Commonsense Knowledge. Morgan Kaufmann Publishers Inc.

Etzioni, O., Fader, A., Christensen, J., Soderland, S., & Mausam, M. (2011). Open Information Extraction: The Second Generation. In IJCAI (Vol. 11, pp. 3-10).

Girju, R., and Dan I. Moldovan. "Text mining for causal relations." FLAIRS Conference. 2002.

Girju, R. "Automatic detection of causal relations for question answering." Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, Volume 12. Association for Computational Linguistics, 2003.

Girju, R., Adriana Badulescu, and Dan Moldovan. "Automatic discovery of part-whole relations." Computational Linguistics 32.1 (2006): 83-135.

Gordon, J., and Lenhart K. Schubert. "Discovering commonsense entailment rules implicit in sentences." Proceedings of the TextInfer 2011 Workshop on Textual Entailment. Association for Computational Linguistics, 2011.

Gruber, J. C. Studies in Lexical Relations. Unpublished Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, Massachusetts. (1965)

Guha, R. V., and Douglas B. Lenat. "Cyc: a mid-term report." Applied Artificial Intelligence: An International Journal 5.1 (1991): 45-86.

Havasi, C., Robert Speer, and Jason Alonso. "ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge." Recent Advances in Natural Language Processing. 2007.

Hearst, M. A. "Automatic acquisition of hyponyms from large text corpora." Proceedings of the 14th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1992.

Hobbs, Jerry R.
"Ontological promiscuity." Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1985.

Hobbs, Jerry R. "World knowledge and word meaning." Proceedings of the 1987 Workshop on Theoretical Issues in Natural Language Processing. Association for Computational Linguistics, 1987.

Hobbs, J. R., Stickel, M., Martin, P., & Edwards, D. (1988). Interpretation as abduction. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (pp. 95-103). Association for Computational Linguistics.

Hobbs, Jerry R. Metaphor and Abduction. Springer Berlin Heidelberg, 1992.

Hobbs, Jerry R. "Monotone decreasing quantifiers in a scope-free logical form." Semantic Ambiguity and Underspecification (1996): 55-76.

Hobbs, Jerry R., and Srini Narayanan. "Spatial representation and reasoning." Encyclopedia of Cognitive Science (2002).

Hobbs, Jerry R. "Toward a useful concept of causality for lexical semantics." Journal of Semantics 22.2 (2005): 181-209.

Hobbs, Jerry R. "Deep lexical semantics." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2008. 183-193.

Hobbs, Jerry R., and Andrew S. Gordon. "Goals in a Formal Theory of Commonsense Psychology." FOIS. 2010.

Hobbs, Jerry R., and Alicia Sagae. "A Commonsense Theory of Microsociology: Interpersonal Relationships." AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning. 2011.

Hobbs, Jerry R., and Andrew S. Gordon. "The Deep Lexical Semantics of Emotions." Affective Computing and Sentiment Analysis. Springer Netherlands, 2011. 27-34.

Hobbs, Jerry R. "Case, Constructions, FrameNet, and the Deep Lexicon." ACL 2014 1929 (2014): 10-12.

Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., & Hovy, E. H. (2013). Learning Whom to Trust with MACE. In HLT-NAACL (pp. 1120-1130).

Inoue, N. and Kentaro Inui. "ILP-Based Reasoning for Weighted Abduction." Plan, Activity, and Intent Recognition. 2011.
Ittoo, A., and Gosse Bouma. "On learning subtypes of the part-whole relation: do not mix your seeds." Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.

Ittoo, A., and Gosse Bouma. "Minimally-supervised extraction of domain-specific part-whole relations using Wikipedia as knowledge-base." Data & Knowledge Engineering 85 (2013): 57-79.

Jackendoff, R. S. Semantic Interpretation in Generative Grammar. Cambridge, MA: The MIT Press. (1972)

Keet, C. M. and Alessandro Artale. "Representing and reasoning over a taxonomy of part-whole relations". Applied Ontology 3 (1) (2008): 91-110.

Kipper, K., Hoa Trang Dang, and Martha Palmer. "Class-based construction of a verb lexicon." AAAI/IAAI. 2000.

Kipper, K., Korhonen, A., Ryant, N., & Palmer, M. (2006). Extending VerbNet with novel verb classes. In Proceedings of LREC (Vol. 2006, No. 2.2, p. 1).

Lakoff, G. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. Chicago: University of Chicago Press, 1990.

Lakoff, G. and Mark Johnson. Metaphors We Live By. University of Chicago Press, 2008.

Lapata, M., and Frank Keller. "Web-based models for natural language processing." ACM Transactions on Speech and Language Processing (TSLP) 2.1 (2005): 3.

Lenat, Douglas B. "CYC: A large-scale investment in knowledge infrastructure." Communications of the ACM 38.11 (1995): 33-38.

Liu, H., and Push Singh. "ConceptNet: a practical commonsense reasoning tool-kit." BT Technology Journal 22.4 (2004): 211-226.

Mahesh, K., Nirenburg, S., Cowie, J., & Farwell, D. (1996). An assessment of CYC for natural language processing.

McCarthy, J. "Circumscription: a form of non-monotonic reasoning." Artificial Intelligence 13.1 (1980): 27-39.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235-244.

Mitchell, T.
M., Cohen, W., Hruschka, E., Talukdar, P., Betteridge, J., Carlson, A., Mishra, B. D., Gardner, M., …, "Never-Ending Learning". Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15).

Moldovan, Dan I., Mitchell Bowden, and Marta Tatu. "A Temporally-Enhanced PowerAnswer in TREC 2006." TREC. 2006.

Montazeri, N. and J. R. Hobbs. "Synonymy and Near-Synonymy in Deep Lexical Semantics". Proceedings of the Workshop on Computational Approaches to Synonymy, Helsinki, Finland, October 2010.

Montazeri, N., and J. R. Hobbs. "Elaborating a knowledge base for deep lexical semantics." Proceedings of the Ninth International Conference on Computational Semantics. Association for Computational Linguistics, 2011.

Montazeri, N., and J. R. Hobbs. "Axiomatizing Change-of-State Words." FOIS. 2012.

Montazeri, N., Hobbs, J. R. and Hovy, E. (2013). "How Text Mining Can Help Lexical and Commonsense Knowledgebase Construction." Proceedings of the 11th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 2013).

Montazeri, N. and J. R. Hobbs. "Which States Can Be Changed by Which Events?" 12th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 2015).

Ovchinnikova, Ekaterina. Integration of World Knowledge for Natural Language Understanding. Vol. 3. Springer Science & Business Media, 2012.

Ovchinnikova, E., Montazeri, N., Alexandrov, T., Hobbs, J. R., McCord, M. C., & Mulkar-Mehta, R. (2014). Abductive reasoning with a large knowledge base for discourse processing. In Computing Meaning (pp. 107-127). Springer Netherlands.

Palmer, M. "Are WordNet sense distinctions appropriate for computational lexicons?" In Advanced Papers of the SENSEVAL Workshop, Sussex, UK. (1998)

Pan F., and J. R. Hobbs, 2005.
``Temporal Aggregates in OWL-Time", Proceedings, Workshop on Natural Language-based Knowledge Representations: New Perspectives, Florida Artificial Intelligence Research Society International Conference (FLAIRS 2005), Clearwater Beach, FL, May 2005. [pdf, 127K] Pantel, P., and Marco Pennacchiotti. "Espresso: Leveraging generic patterns for automatically harvesting semantic relations." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006. Pantel, P.; Ravichandran, D.; Hovy, E.H. 2004. Towards terascale knowledge acquisition. In Proceedings of COLING-04. pp. 771-777. Geneva, Switzerland. Ravichandran, D. and Hovy, E.H. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL-2002. pp. 41-47. Philadelphia, PA. Ruppenhofer, J., M. Ellsworth, M. Petruck, C. Johnson, and J. Scheffczyk (2010). FrameNet II: Extended Theory and Practice. Technical report, Berkeley, USA. Schubert, L. K.,"Can we derive general world knowledge from texts?." Proceedings of the second inter- national conference on Human Language Technology Research. Morgan Kaufmann Publishers Inc., 2002. Schubert, L. K., Gordon, J., Stratos, K., & Rubinoff, A. (2011, November). Towards Adequate Knowledge and Natural Inference). In AAAI Fall Symposium: Advances in Cognitive Systems. Shinyama, Yusuke and Satoshi Sekine. 2006. Preemptive information extraction using unrestricted rela- tion discovery. In HLT-NAACL-06, pages 304–311,New York, NY. Singh, P., Lin, T., Mueller, E. T., Lim, G., Perkins, T., & Zhu, W. L. (2002). Open Mind Common Sense: Knowledge acquisition from the general public. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE (pp. 1223-1237). Springer Berlin Heidelberg. 110 Speer, R., and Catherine Havasi. "ConceptNet 5: A large semantic network for relational knowledge." 
The People’s Web Meets NLP. Springer Berlin Heidelberg, 2013. 161-176. Szpektor, I., Tanev, H., Dagan, D., & Coppola, B. (2004). “Scaling web -based acquisition of entailment relations”. Szpektor, I., and Ido Dagan. "Augmenting wordnet-based inference with argument mapping." Proceed- ings of the 2009 Workshop on Applied Textual Inference. Association for Computational Linguistics, 2009. Thomas, J. A., and T. M. Cover. “Elements of information theory”. Vol. 2. New York: Wiley, 2006. Tonelli, S., and Daniele Pighin. "New features for FrameNet: WordNet mapping." Proceedings of the Thir- teenth Conference on Computational Natural Language Learning. Association for Computational Linguis- tics, 2009. Tratz, S., and Eduard Hovy. “A fast, accurate, non-projective, semantically-enriched parser”, Proceedings of the Conference on EMNLP.2011 Yamamoto, K., Inoue, N., Inui, K., Arase, Y., & Tsujii, J. I. (2015). Boosting the Efficiency of First-Order Abductive Reasoning Using Pre-estimated Relatedness between Predicates. International Journal of Ma- chine Learning and Computing, 5(2), 114. Yates, A., Cafarella, M., Banko, M., Etzioni, O., Broadhead, M., & Soderland, S. (2007, April). Textrunner: open information extraction on the web. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstra- tions (pp. 25-26). Association for Computational Linguistics. Zanzotto, F.M. and M. Pennacchiotti. “Expanding textual entailment corpora from Wikipedia using co- training.” Proceedings of the 2nd Workshop on Collaboratively Constructed Semantic Resources, COLING 2010. 111 Appendix I. 
Sample Change-of-State Axioms

Move
4] move-s0'(e,x,p0,p1) → changeFrom'(e,e0,e1) & at'(e0,x,p0) & at'(e1,x,p1)
5] move-v7'(e,x,p0,p1) → move-s0'(e,x,p0,p1) & subject'(ha0,p0) & subject'(ha1,p1)
6] move-v1'(e,x,p0,p1) → move-v7'(e,x,p0,p1) & location'(ha0,p1) & location'(ha1,p2)
7] move-v4'(e,x,p0,p1) → move-v7'(e,x,p0,p1) & affiliation'(ha0,p1) & affiliation'(ha1,p2)
8] move-v68'(e,x) → act'(e,x)
9] move-v23'(e,x,y,p0,p1) → cause'(e,x,e0) & move-v7'(e0,y,p0,p1) & location'(ha0,p0) & location'(ha1,p1)
10] move-v12'(e,x,y) → move-v23'(e,x,y,p0,p1) & sell'(e,x,y)
11] move-v911'(e,x,y,e0) → cause'(e,x,e0) & mental-event'(ha0,e0) & arg'(ha1,y,e0)
12] move-v10'(e,x,y) → move-v911'(e,x,y,e0) & move-v68'(e0,y)
13] move-n1'(e0,e,x) → move-v68'(e,x)
14] move-n2'(e0,e,p0,p1) → move-v4'(e,x,p0,p1)
15] move-n3'(e,x) → move-v1'(e,y,p0,p1) & physicalPartOf'(ha0,y,x) & agentOf'(ha1,x,e)
16] move-n4'(e,e0,x) → move-v1'(e0,x,p0,p1)
17] movement-n9'(e,e0,p0,p1) → change'(e,x,e2,e3) & hasOpinion'(e2,x,p0) & hasOpinion'(e3,x,p1)
19] movement-n4'(e,x) → movement-n6'(e,e0,x)
20] movement-n10'(e,x) → cause'(e0,x,e1) & move-s0'(e1,y,p0,p1)

Retire
21] retire-v2'(e,x,e1) → decrease'(e,e2) & levelOf'(e2,e1) & activityOf'(ha0,e1,x)
22] retire-v1'(e,x) → retire-v2'(e,x,e1) & work'(ha0,e1)
23] retire-v6'(e0,y,x,e1) → cause'(e0,y,e) & retire-v1'(e,x)
24] retire-v7'(e,x,y) → retire-v6'(e0,y,x,e1) & use'(e1,x,y)
25] retire-v11'(e,x) → retire-v2'(e,x,e1) & daily-activity'(e1)
26] retire-v5'(e,x) → retire-v2'(e,x,e1) & intend'(ha0,x,e2) & temporary'(e2,e) & gathering'(ha1,e1)
27] retire-v8'(e,x,y) → retire-v2'(e,x,e1) & interestedIn'(ha0,e1,x,y)
28] retire-v3'(e,x,y) → changeFrom'(e,e1) & at'(e1,x,y) & timeSpanOf'(ha0,t,e1) & relatively-long??'(ha1,t)

Kick
29] kick-n1'(e0,e) → kick-s1'(e,x,y)
30] kick-n3'(e0,e) → kick-v5'(e,x)
31] kick-n5'(e,e0) → affect'(e0,x,y) & sudden'(ha0,e0)
32] kick-s0'(e,x,y) → cause'(e,x,e1) & changeFrom'(e1,e2) & rel'(e2,y,x)
33] kick-v1'(e,x,y) → kick-s0'(e,x,y) & hit-s11'(e1,z,y) & foot'(ha0,z,x) & gen'(ha1,e,e1)
34] kick-s1'(e,x,y) → cause'(e,x,e1) & hit-s11'(e1,z,y) & foot'(ha0,z,x)

Pass
35] pass-s0'(e,e3) → and'(e,e1,e2) & changeTo'(e1,e3) & changeFrom'(e2,e3) & before'(ha0,e1,e2)
36] pass-s5'(e,x,y,s) → pass-s0'(e,e3) & rel'(e3,x,y,s)
37] pass-s1'(e,x,y,s) → pass-s0'(e,e3) & at'(e3,x,p,s) & at'(ha0,y,p,s)
38] pass-v4'(e,t) → pass-s0'(e,e3) & at'(e3,t,t0,s) & time'(ha0,s)
39] pass-v10'(e,x,t) → cause'(e,x,e1) & feel'(e1,x,e2) & pass-v4'(e2,t)
40] pass-s4'(e,x,p) → changeFrom'(e,e1) & at'(e1,x,p) & changeTo'(e0,e1) & before'(ha0,e0,e)
41] pass-v25'(e,x,y) → cause'(e,x,e0) & pass-s4'(e0,y,p)
42] pass-s2'(e,x,y) → cause'(e,x,e1) & pass-s4'(e,y,p)
43] pass-s3'(e,x,y,z) → cause'(e,x,e1) & and'(e,e1,e2) & before'(ha0,e1,e2) & changeTo'(e1,e3) & have'(e3,x,y) & change'(e2,e3,e4) & have'(e4,z,y) & not'(e5,e6) & use'(e6,x,y)
44] pass-v18-21'(e,y,z) → change'(e,e3,e4) & have'(e3,x,y) & changeTo'(e0,e1) & have'(e4,z,y) & not'(e5,e6) & use'(e6,x,y)
45] pass-s6'(e,x,y,z) → cause'(e,x,e1) & change'(e1,e3,e4) & have'(e3,x,y) & changeTo'(e0,e1) & have'(e4,z,y) & not'(e5,e6) & use'(e6,x,y)
46] pass-v20'(ha0,e,x,z) → pass-s6'(e,x,y,z) & ball'(ha1,y) & player'(ha2,x) & player'(ha3,z)
47] pass-v6'(e,x,p0) → at'(e,x,p0,s)
48] pass-v19'(e,x,p0) → changeTo'(e,e1) & at'(e1,x,p0,s)
49] pass-adj1'(e,x) → pass-v20'(e0,x) & arg'(ha0,x,p)
50] pass-n3'(e,e0) → pass-v20'(ha0,e,x,z)
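The axioms above are Horn-clause definitions: each word-sense predicate unfolds, directly or through intermediate senses, into core-theory predicates such as changeFrom', at', and cause'. As a minimal illustration only (this is not the reasoning engine used in the thesis, which relies on weighted abduction, and variable bindings are ignored for brevity), the following Python sketch expands a word sense by following its defining chain until only core predicates remain; the AXIOMS table is a hand-simplified rendering of axioms 4-6.

```python
# Hand-simplified axiom bodies (predicate names only, no arguments):
# each key is a defined word-sense predicate, each value the conjuncts
# of its definition.  Predicates with no entry are treated as core.
AXIOMS = {
    # move-s0'(e,x,p0,p1) -> changeFrom' & at' & at'      (axiom 4)
    "move-s0": ["changeFrom", "at", "at"],
    # move-v7' -> move-s0' & subject' & subject'          (axiom 5)
    "move-v7": ["move-s0", "subject", "subject"],
    # move-v1' -> move-v7' & location' & location'        (axiom 6)
    "move-v1": ["move-v7", "location", "location"],
}

def unfold(pred):
    """Recursively expand a predicate into core-theory predicates."""
    if pred not in AXIOMS:          # no defining axiom: core predicate
        return [pred]
    expanded = []
    for body_pred in AXIOMS[pred]:
        expanded.extend(unfold(body_pred))
    return expanded

print(unfold("move-v1"))
# -> ['changeFrom', 'at', 'at', 'subject', 'subject', 'location', 'location']
```

So unfolding move-v1 (changing location) bottoms out in the change-of-state core theory, which is what licenses inferences shared by every change-of-state word.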
Abstract
Words describe the world, so if we are going to draw the appropriate inferences in understanding a text, we must have a prior explication of how we view the world (world knowledge) and how words and phrases map to this view (lexical semantics knowledge).

Existing world knowledge and lexical semantics resources are not particularly suitable for deep reasoning, either because their elements are not connected to one another or because their knowledge representation is too simple (binary relations between natural language phrases).

To enable deep understanding of and reasoning over natural language, Hobbs (2008) proposed the idea of "Deep Lexical Semantics." In Deep Lexical Semantics, principal and abstract domains of commonsense knowledge are encoded in "core theories," and words are linked to these theories through axioms that use predicates from them. This research is concerned with the second task: axiomatizing words in terms of predicates in the core theories.

We show that a large-scale lexical semantics knowledgebase for a given domain can be developed by dividing the authoring task between an optimum mix of manual and automatic methods. We use concept relations in existing lexical semantics resources to systematically identify the optimum set of concepts that need to be axiomatized manually, and we axiomatize a large number of related concepts automatically. We have used this method to axiomatize concepts related to the domain of composite entities and evaluated the quality of the resulting axioms. Furthermore, we have evaluated the usefulness of these axioms on the well-studied task of extracting part-of relations from text.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar:
Learning semantic types and relations from text
Representing complex temporal phenomena for the semantic web and natural language
Learning the semantics of structured data sources
Neural creative language generation
Lexical complexity-driven representation learning
Syntactic alignment models for large-scale statistical machine translation
Generating psycholinguistic norms and applications
Modeling, searching, and explaining abnormal instances in multi-relational networks
Learning distributed representations from network data and human navigation
Exploiting web tables and knowledge graphs for creating semantic descriptions of data sources
Neural networks for narrative continuation
Decipherment of historical manuscripts
Word, sentence and knowledge graph embedding techniques: theory and performance evaluation
Beyond parallel data: decipherment for better quality machine translation
Deep learning models for temporal data in health care
Syntax-aware natural language processing techniques and their applications
Grounding language in images and videos
From matching to querying: A unified framework for ontology integration
Enriching spoken language processing: representation and modeling of suprasegmental events
Memorable, secure, and usable authentication secrets
Asset Metadata
Creator: Montazeri, Niloofar (author)
Core Title: Building a knowledgebase for deep lexical semantics
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 03/31/2016
Defense Date: 01/19/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: AI, knowledge representation, lexical semantics, natural language processing, natural language understanding, NLP, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Hobbs, Jerry R. (committee chair), Gordon, Andrew S. (committee member), Kaiser, Elsi (committee member), Knight, Kevin (committee member)
Creator Email: montazeri.niloofar@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-223556
Unique identifier: UC11276923
Identifier: etd-MontazeriN-4216.pdf (filename), usctheses-c40-223556 (legacy record id)
Legacy Identifier: etd-MontazeriN-4216.pdf
Dmrecord: 223556
Document Type: Dissertation
Rights: Montazeri, Niloofar
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA