Exploiting structure in the Boolean weighted constraint satisfaction problem: a constraint composite graph-based approach
(USC Thesis Other)
EXPLOITING STRUCTURE IN THE BOOLEAN WEIGHTED CONSTRAINT SATISFACTION PROBLEM: A CONSTRAINT COMPOSITE GRAPH-BASED APPROACH

by Hong Xu

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (PHYSICS)

May 2019

Copyright 2019 Hong Xu

Acknowledgments

I would like to thank my advisors Dr. T. K. Satish Kumar and Dr. Sven Koenig for their persistent guidance and support on research, academic principles, and especially their willingness to accept me as their student, who had approached them with a very different academic background. I also would like to thank them for helping me revise this document.

I also would like to thank my family members for their understanding, support, and encouragement. Specifically, I would like to thank my wife for her extraordinary patience.

I also would like to thank junior PhD, master's, and undergraduate students who have been advised by me, including Cheng Cheng, Dylan Johnke, Masaru Nakajima, Kexuan Sun, Zhi Wang, and Ka Wa Yip. Many ideas that excited me would not have been published without their help.

I also would like to thank other group members and collaborators, including Liron Cohen, Ferdinando Fioretto, Itay Hen, Craig A. Knoblock, Anoop Kumar, Jiaoyang Li, Hang Ma, Craig Milo Rogers, Tansel Uras, and Tang Zheng, for their intriguing discussions from time to time.

I also would like to thank qualifying and defense committee members Gene Bickers, Stephan Haas, Itay Hen, and Aiichiro Nakano for their valuable comments and suggestions. In particular, I would like to thank Dr. Aiichiro Nakano for his extra patience and help with my qualifying exam and defense.

I would like to thank the National Science Foundation (NSF). The research at the University of Southern California was supported by NSF under grant numbers 1724392, 1409987, and 1319966.
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies or the U.S. government.

The order in which I have acknowledged everyone is no indication of any order in which I feel thankful to them.

Contents

List of Figures
List of Tables
Abstract
1 Introduction
  1.1 Motivation
    1.1.1 Why Studying COPs is Important
    1.1.2 Specificity of COPs
    1.1.3 Generality of the Weighted Constraint Satisfaction Problem
    1.1.4 Solving the WCSP
    1.1.5 The Constraint Composite Graph
  1.2 Hypothesis and Research Questions
  1.3 Overview and Contributions of This Dissertation
2 Background
  2.1 Basics in Graph Theory
  2.2 The Weighted Constraint Satisfaction Problem
  2.3 The Constraint Composite Graph
    2.3.1 Theoretical Properties of the CCG
3 The Nemhauser-Trotter Reduction on the CCG
  3.1 Introduction
  3.2 Experimental Evaluation
  3.3 Conclusion
4 The Min-Sum Message Passing Algorithm on the CCG
  4.1 Introduction
  4.2 The Min-Sum Message Passing Algorithm Applied Directly on the Boolean WCSP
  4.3 The Min-Sum Message Passing Algorithm Applied on the CCG
  4.4 Experimental Evaluation
  4.5 Discussion
  4.6 Conclusion
5 Integer Linear Programming Encoding of the Boolean WCSP via the CCG
  5.1 Introduction
  5.2 ILP Encodings of the WCSP
    5.2.1 Direct ILP Encoding
    5.2.2 Improved Direct ILP Encoding
    5.2.3 CCG-Based ILP Encoding
    5.2.4 Comparison
  5.3 Experimental Evaluation
  5.4 A Theoretical Property of the CCG-Based ILP Encoding
  5.5 Conclusion
6 Quantum Annealing for the WCSP via the Constraint Composite Graph
  6.1 Introduction
  6.2 Quantum Annealing
  6.3 Polynomial-based HQCA for the Binary Boolean WCSP
  6.4 ILP-based HQCA
  6.5 CCG-Based HQCA
    6.5.1 An HQCA for the MWVC Problem
  6.6 Experimental Evaluation
  6.7 Conclusion
7 Promising Direction: Extending the Concept of the Constraint Composite Graph to the WCSP with Non-Boolean Variables
  7.1 Formal Definitions
  7.2 Construction of the CCG for the WCSP with Non-Boolean Variables
    7.2.1 High-Degree Polynomial-Based Encoding
    7.2.2 Binary Number-Based Encoding
    7.2.3 Direct Symmetric Encoding
    7.2.4 Clique-Based Encoding
  7.3 Experimental Evaluation of the CCG-Based HQCA
  7.4 Conclusion
8 Conclusion
  8.1 Conclusion of Contributions
  8.2 Further Discussion
  8.3 Future Work

List of Figures

1.1 Potts Model and WCSP
1.2 RNA Motif Localization
2.1 Minimum Weighted Vertex Cover
2.2 An Example Binary Constraint
2.3 Example Projection of MWVCs on an IS
2.4 Example Lifted Graphical Representations
2.5 Construction of an Example CCG
3.1 Kernelization Algorithms
3.2 The NT Reduction
3.3 Effectiveness of the NT Reduction
4.1 Factor Graph of a Boolean WCSP Instance
4.2 Comparison of Solution Qualities of the Original and Lifted MSMP Algorithms with Optimal Solutions
4.3 Comparison of Solution Qualities of the Original and Lifted MSMP Algorithms
5.1 Comparison of ILP Encodings
6.1 Polynomial Form of an Example Binary Constraint (Ising Formulation)
6.2 Chimera Graph
6.3 Comparison of Solution Qualities of HQCAs
7.1 Polynomial Form of an Example Constraint with Non-Boolean Variables
7.2 Non-Boolean Variable Encoding: Example High-Degree Polynomial-Based Encoding
7.3 Comparison of Solution Qualities of HQCAs (with Non-Boolean variables)

List of Tables

4.1 Numbers of Benchmark Instances on which the Original and Lifted MSMP Algorithms Converged
4.2 Number of Iterations and Running Time for the Benchmark Instances on which both the Original and Lifted MSMP Algorithms Terminated
4.3 Comparison of the Suboptimalities Produced by MSMP Algorithms on Small Random Instances
4.4 Comparison of the MSMP Algorithms on Small Random Instances
5.1 Comparison of ILP Encodings
5.2 Comparison of the Termination Statuses of the Direct/Improved Direct and CCG-Based Algorithms
5.3 Two Types of WCSP Constraints for the MWVC Problem
7.1 Comparison of the Sizes of CCG Gadgets Using Different Non-Boolean Variable Encodings
7.2 Trade-Off Between Non-Boolean Variable Encodings

Abstract

What is "structure"? And how can we exploit it in combinatorial optimization? These are the fundamental questions addressed in this dissertation for many reasoning tasks on complex physical and non-physical systems.
Reasoning tasks involving system design, state estimation, and prediction can be cast as combinatorial optimization problems (COPs). Traditionally, different kinds of COPs have been solved using dedicated algorithms. While such algorithms are certainly valuable, they have some important drawbacks. First, algorithms developed for very specific subclasses are not applicable to real-world instances that do not belong to these subclasses. Second, different research communities working on very specific COPs may be unaware of each other's work and end up developing different terminologies and techniques for solving the same problem. Therefore, a general mathematical framework that captures a wide variety of COPs keeps different research communities informed of one another's work, encourages cross-fertilization of different perspectives, and offers wider applicability to real-world domains.

The weighted constraint satisfaction problem (WCSP) is a general mathematical framework for COPs. It not only subsumes important COPs studied in many different research communities but also has a strong representational power useful for reasoning about complex physical and non-physical systems. Such systems include classical spin glass systems, percolation theory-characterized systems, and social networks, among many other examples.

Yes, the WCSP is representationally very powerful. But how can we design algorithms for solving it efficiently? Isn't the very generality of the WCSP a curse? Are we up against a very general and intractable problem? While these questions are certainly valid for the general WCSP, the proposal in this dissertation is to exploit "structure." Imagine two subclasses of COPs, Class A and Class B. If Class B is more general than Class A, it is likely that we can build specialized algorithms for solving instances from Class A more efficiently.
But the hallmark of a good algorithm for solving instances from Class B is its ability to imitate the specialized algorithm if the input is in fact from Class A. Such an algorithm is said to exploit "structure." Although the formal definition of "structure" is elusive, the idea is to create a general-purpose algorithm for solving the WCSP that automatically simulates more specialized algorithms for subclasses of the WCSP.

We can talk about two types of structure in the WCSP. The macro structure or the graphical structure represents which variables interact with each other. The micro structure or the numerical structure represents how the variables interact with each other. Two completely different schools of thought have led to algorithmic techniques that exploit either the macro structure or the micro structure, but not both simultaneously.

In 2008, the quest for a unifying mathematical framework that represents the macro structure as well as the micro structure of a WCSP was settled by the novel idea of the constraint composite graph (CCG). The CCG of a WCSP instance is an undirected graph that uses the same variables as the WCSP instance and an auxiliary set of variables to capture structure. Solving the WCSP is equivalent to solving the minimum weighted vertex cover (MWVC) problem on its associated CCG. Although the CCG can be constructed very efficiently, not much work was done until now in exploiting this transformation for theoretical or practical gains.

In this dissertation, we raise and answer three research questions: Are there any theoretical advantages of the CCG other than for identifying tractable classes of the WCSP? Is there any practical usefulness of the CCG? Is it promising to extend the CCG to the WCSP with non-Boolean variables? We answer all three questions affirmatively. We answer the first question by proving new theoretical properties of the CCG.
We answer the second question by efficiently implementing the CCG construction procedure and conducting experiments. We answer the third question by proposing new encodings for non-Boolean variables and presenting preliminary evidence that they are promising.

On the one hand, the generality of the WCSP is intended to make it widely applicable and bring together researchers from different research communities. On the other hand, our theory of the CCG reduces it to a very specific COP, i.e., the MWVC problem. This transformation not only holds the remarkable promise of a general-purpose algorithm that can exploit structure in the WCSP but also emphasizes the importance of the MWVC problem as a substrate COP. Specifically, in this dissertation, we show how the CCG-based transformation can be used to: (a) kernelize a WCSP instance, i.e., fix the optimal values of a subset of its variables using a max-flow procedure even before search is initiated, (b) improve the efficiency of the min-sum message passing algorithm, (c) make use of integer linear programming (ILP) solvers, and (d) solve COPs on quantum annealers more effectively. In addition, because our algorithms solve general COPs in the WCSP framework more efficiently on classical computers, we provide better baselines for comparison against quantum computers, whose true efficiency over classical computers is still debated.

Chapter 1 Introduction

1.1 Motivation

Combinatorial optimization problems (COPs) use discrete variables and the interactions between them to characterize real-world and abstract systems. They usually cannot be solved scalably by simply applying exhaustive search, since the amount of time required by such search increases exponentially as the number of variables increases, and therefore intelligent algorithms are usually required for solving them. Fortunately, after decades of effort by researchers, we know that there is a rich class of COPs in P, meaning that there are known algorithms to solve them in polynomial time.
However, we also know that there is another rich class of COPs that are NP-hard, meaning that no algorithm can solve them in polynomial time under the assumption that $P \neq NP$. Yet, in practice, we can still often solve many of them quickly, thanks to intelligent algorithms that exploit their structure.

What is "structure"? And how can we exploit it in combinatorial optimization? These are the fundamental questions addressed in this dissertation for many reasoning tasks on complex physical and non-physical systems. In this section, we present the motivation to study these questions. We first present two reasons for studying COPs: (a) many problems in complex physical and non-physical systems have a deep nexus to COPs, and (b) improving algorithms for solving COPs helps advance the debate on whether the quantum annealer has a true advantage over classical computers. After that, we discuss the specificity of COPs, i.e., how COPs have been categorized and studied individually. Then we discuss the generality of the weighted constraint satisfaction problem (WCSP) and why we should study it. We later discuss the ways to solve the WCSP and the two types of structure in the WCSP, namely the macro and the micro structure. Finally, we discuss the constraint composite graph (CCG) for the WCSP and how it simultaneously captures these two types of structure.

1.1.1 Why Studying COPs is Important

Many classical complex physical and non-physical systems, such as classical spin glass systems, percolation theory-characterized systems, and social networks, have a deep nexus to COPs due to their discrete nature. The task of computing ground energy states of a spin glass system can be modeled as a COP in which each discrete variable represents a spin and the optimization goal characterizes interactions between them.
In fact, many discrete physics models, such as the random Ising model, have been frequently used as test beds for techniques that solve COPs (De Simone et al. 1995). COPs such as the minimum spanning tree problem are common tools to study percolation theory (Alexander 1995; Bezuidenhout, Grimmett, and Löffler 1998). In social networks and citation networks, two categories of complex systems that are commonly studied in physics (Golosovsky 2017; Wu et al. 2018), many problems such as the maximum influence problem (Kempe, Kleinberg, and Tardos 2003) and community detection (Kanawati 2014) can be modeled as COPs. Therefore, improving COP solving in general may help us solve and understand these systems.

On the other hand, quantum annealers, the only commercially available physical realization of quantum computers nowadays, by their nature solve quadratic unconstrained binary optimization (QUBO) problems, a subset of COPs. A common way to use quantum annealers for solving COPs in general is via hybrid quantum-classical algorithms (HQCAs), a class of algorithms that interleave the use of quantum annealers and classical computers. However, despite their quantum nature, their true efficiency is controversial: Shor's algorithm (Shor 1994), the best-known quantum algorithm with a super-polynomial speedup over known classical algorithms (for integer factorization, which is not known to be NP-complete), cannot be implemented on quantum annealers. It is doubtful whether there exists an HQCA that has a super-polynomial speedup compared to existing algorithms on classical computers. Indeed, no such algorithm has been confirmed to exist. For this reason, a lot of research has been put into solving COPs more efficiently using algorithms on classical computers, which strengthens the baseline against which the efficiency of quantum annealers is judged.

1.1.2 Specificity of COPs

Traditionally, different COPs have been solved independently using dedicated algorithms.
For example, the (weighted) Max-Cut problem, which is equivalent to the problem of finding a minimum energy state of an Ising system, has dedicated algorithms (Gao, Zeng, and Dong 2008; Kochenberger et al. 2013; Krishnan and Mitchell 2006; Rendl, Rinaldi, and Wiegele 2008); the Max-SAT problem, which can be used to model problems in logic with uncertainty, has dedicated algorithms such as EvaSolver (Narodytska and Bacchus 2014), OpenWBO (Martins, Manquinho, and Lynce 2014), and LMHS (Saikko, Berg, and Järvisalo 2016); and the maximum (weighted) clique problem, which has been used to help solve many important problems, such as graph coloring, has dedicated algorithms such as Cliquer (Niskanen and Östergård 2003), FastWClq (Cai and Lin 2016), MWCLQ (Fang, Li, and K. Xu 2016), and OTClique (Shimizu et al. 2017).

While such algorithms are valuable, they have important drawbacks. First, these algorithms are designed to solve a specific problem; they are often not applicable to a different problem, even if it is only slightly different. Second, different research communities working on different specific COPs may be unaware of each other's work and often end up developing different terminologies and techniques for solving the same or similar problems. Therefore, a mathematical framework that captures a wide variety of COPs not only leads to wider applicability, but also keeps different research communities informed of one another's work and encourages cross-fertilization of different perspectives.

1.1.3 Generality of the Weighted Constraint Satisfaction Problem

The weighted constraint satisfaction problem (WCSP) is a general mathematical framework for COPs. It subsumes many important COPs such as the aforementioned (weighted) Max-SAT, (weighted) Max-Cut, and maximum (weighted) clique problems. The constraint satisfaction problem (CSP) is a classic combinatorial problem.
It consists of a set of discrete variables with finite domains and a set of constraints, each of which allows or forbids certain assignments of values to a subset of the variables. The WCSP can be viewed as an optimization variant of the CSP in which constraints are no longer "hard" but are associated with non-negative costs (weights). The goal of the WCSP is to find an assignment of values to the variables that minimizes the sum of the costs (Bistarelli et al. 1999).

Studying the WCSP brings together different research communities. While the terminology was proposed in the constraint programming (CP) community, the problem itself is also known in different communities by different names and has been solved and understood using different algorithms. In CP, branch-and-bound (BnB) search has traditionally been used to solve the WCSP (Hurley et al. 2016; Marinescu and Dechter 2006; Marinescu and Dechter 2007). In physics, the WCSP can be seen as a general framework that characterizes classical systems with n-body interactions. Simulated annealing and Monte Carlo algorithms, due to their embodiment of physical principles, are commonly used to understand physical systems (e.g., Ferrenberg, J. Xu, and Landau 2018; Heim et al. 2015; Wauters et al. 2017). In probabilistic reasoning, the WCSP is known to be equivalent to the maximum-a-posteriori (MAP) problem on a Markov random field. Message passing algorithms are commonly used to solve this problem (Koller and Friedman 2009). In multi-agent systems, the WCSP is known as the distributed constraint optimization problem (DCOP). Here, distributed versions of BnB (e.g., Yeoh, Felner, and Koenig 2010) and message passing algorithms (Cohen and Zivan 2018; Farinelli et al. 2008) have been used to solve this problem. Studies on the WCSP, therefore, are of interest to many different fields and strengthen the bond between physics and computer science.
The WCSP has a strong representational power: it is a general and powerful tool that has been used to model many important COPs in many complex physical and non-physical systems. For example, in condensed matter physics, the WCSP can be used to find the ground state in the Potts model and its generalizations such as the multi-body p-spin model (Mézard and Montanari 2009, p. 155) (as illustrated in Figure 1.1); in biophysics, it can be used to locate motifs in RNA sequences (Zytnicki, Gaspin, and Schiex 2008) (as illustrated in Figure 1.2); in information theory, it can be used to reconstruct a message sent through a noisy channel using error correcting codes (Yedidia, Freeman, and Weiss 2003); in social science, it can be used to solve Schelling's model of segregation (Easley and Kleinberg 2010); and in computer vision applications, it can be used to solve energy minimization problems towards tasks such as image restoration, total variation minimization, and panoramic image stitching (Kolmogorov 2005).

[Figure 1.1: five spins labeled $X_1$ to $X_5$; legend: $X_i = 0$ corresponds to $s_i = -1$ and $X_i = 1$ corresponds to $s_i = +1$.]
Figure 1.1: Illustrates the problem of finding the ground state energy of a random Potts model seen as a WCSP instance. Each solid circle represents a spin. Each black edge represents a two-body interaction. The three red edges represent a three-body interaction.

[Figure 1.2: (a) Some Elements of Structure, parts a to e (Zytnicki, Gaspin, and Schiex 2008, Fig. 1); (b) RNA Motif Localization as the WCSP (Zytnicki, Gaspin, and Schiex 2008, Fig. 2a).]
Figure 1.2: Illustrates locating motifs in RNA sequences. In (a), part a shows an example word in an RNA sequence, and parts b to e show some elements of structure that have different stabilities. (b) illustrates the formulation of the RNA motif localization problem as the WCSP. In the WCSP formulation, each variable is the location of a key position in an element of structure in an RNA sequence. Costs in constraints characterize the stability associated with different elements of structure.

1.1.4 Solving the WCSP

How can we design algorithms for solving the WCSP efficiently? Isn't the very generality of the WCSP a curse? Are we up against a hard intractable problem? While these questions are certainly valid, in this dissertation, we exploit "structure." Imagine two subclasses of COPs, Class A and Class B. If Class B is more general than Class A, it is likely that we can build specialized algorithms for solving instances from Class A more efficiently. But the hallmark of a good algorithm for solving instances from Class B is its ability to imitate the specialized algorithm if the input is in fact from Class A. Such an algorithm is said to exploit "structure." Although the formal definition of "structure" is elusive, the idea is to create a general-purpose algorithm for solving the WCSP that automatically simulates more specialized algorithms for subclasses of the WCSP.

By its nature, the WCSP has two types of structure: the macro structure, or the graphical structure, and the micro structure, or the numerical structure. The graphical structure characterizes which variables interact locally and the numerical structure characterizes the details of each local interaction. For example, if the WCSP is used to characterize a spin glass system, then the graphical structure is a fully connected graph under a mean-field assumption and is sparse under a nearest-neighbor assumption; the numerical structure for spin interactions is symmetric for an Ising system and is asymmetric for a system with more than one type of spin.

Unfortunately, traditional algorithms do not exploit both types of structure simultaneously. Rather, they either focus on one of them or exploit them individually. For example, one traditional way in which this has been done is by studying the underlying variable-interaction graphs (Dechter 1992). The variable-interaction graph incorporates basic information about which variables are constrained with which other variables in the problem instance, and this "locality" information can be exploited in solution procedures that employ dynamic programming. Despite its apparent usefulness, the variable-interaction graph does not capture information about the costs in the weighted constraints, and therefore cannot be used to characterize or exploit any important combinatorial structure that might be present in them. In fact, there are many fundamental combinatorial problems (such as the hypergraph min-st-cut problem) that can be formulated as the Boolean WCSP, and that are tractable not by virtue of the graphical structure in their associated variable-interaction graphs, but by virtue of the numerical structure in their associated weighted constraints.

1.1.5 The Constraint Composite Graph

In 2008, the quest for a unifying mathematical framework that represents the macro structure as well as the micro structure of a WCSP was settled by the novel idea of the constraint composite graph (CCG). The CCG of a WCSP instance is an undirected graph that uses the same variables as the WCSP instance and an auxiliary set of variables to capture structure. Solving a WCSP instance is equivalent to solving the minimum weighted vertex cover (MWVC) problem on its CCG. The CCG has many interesting properties: it can be constructed in polynomial time; it is always tripartite; and it can be constructed for individual constraints and then merged (meaning that its construction can be easily made parallel and incremental). It has also been used to discover some tractable subclasses of the WCSP. However, little work has since been done to exploit this transformation for theoretical or practical gains. For these reasons, developing the CCG and discovering its usefulness for the WCSP have become important and interesting.
1.2 Hypothesis and Research Questions

In this dissertation, we hypothesize that the CCG can help algorithms discover structure in the Boolean WCSP and therefore help solve it faster and with better theoretical guarantees. Under this hypothesis, we propose three research questions in this section.

While the CCG has been used to identify some tractable subclasses of the Boolean WCSP, its theoretical usefulness, like that of other algorithmic techniques, may extend far beyond that. Hence, the first research question is:

Q1: Is there any theoretical usefulness of the CCG other than for identifying tractable subclasses of the Boolean WCSP?

Although the CCG has demonstrated some theoretical usefulness for the Boolean WCSP, there is no known implementation of it, and its practical usefulness remains unexplored. Hence, the second research question is:

Q2: Is there any practical usefulness of the CCG for the Boolean WCSP?

While the CCG is promising, its applicability has been limited to the Boolean WCSP. However, many real-world problems can be more easily modeled as the WCSP with non-Boolean variables. The existing extension of the CCG to the WCSP with non-Boolean variables is understudied and inefficient (Kumar 2008b). Hence, the third research question is:

Q3: Can the CCG be efficiently extended to the WCSP with non-Boolean variables?

1.3 Overview and Contributions of This Dissertation

In this dissertation, we attempt to answer the three aforementioned research questions:

1. We answer Q1 by demonstrating the theoretical benefits that the CCG brings. In particular, we show that it enables the Nemhauser-Trotter reduction (NT reduction) and prove that it improves the ILP encoding of the Boolean WCSP.

2. We answer Q2 by implementing the CCG construction algorithm as described in (Kumar 2008a), along with various improvements, and experimentally evaluating various algorithms for solving the (Boolean) WCSP with the help of the CCG.
In particular, we experimentally demonstrate that several CCG-based algorithms are more advantageous than their counterparts that work directly on the (Boolean) WCSP, including the NT reduction, the min-sum message passing (MSMP) algorithm, and the HQCA.

3. We answer Q3 by proposing three new non-Boolean variable encodings, namely the binary number-based encoding, the direct symmetric encoding, and the clique-based encoding, for the WCSP with non-Boolean variables. We also compare them using theoretical arguments and preliminary experimental results. We pose this extension as promising future work.

This dissertation is organized as follows. In Chapter 2, we introduce background material, including that on the CCG. In Chapter 3, we demonstrate that the CCG enables the use of the NT reduction, a polynomial-time procedure that reduces problem sizes for the MWVC problem, on the Boolean WCSP. In Chapter 4, we experimentally demonstrate that the MSMP algorithm is more efficient when applied to the CCG of a Boolean WCSP instance than on the Boolean WCSP instance itself. In Chapter 5, we demonstrate the theoretical advantage of the CCG-based ILP encoding of a Boolean WCSP instance over other ILP encodings, and experimentally compare three different ILP encodings. In Chapter 6, we show that the CCG-based HQCA for solving the Boolean WCSP is more advantageous than a few other baseline HQCAs. In Chapter 7, we point out that extending the CCG to the WCSP with non-Boolean variables can be promising. We do this by proposing three non-Boolean variable encodings and demonstrating the usefulness of the CCG on the WCSP with non-Boolean variables in preliminary experiments. Finally, in Chapter 8, we draw our conclusions and discuss other potential future research directions. In each of Chapters 3 to 7, we also point out the potential impact of improving the specific algorithm presented in that chapter.
In summary, in this dissertation, we address Q1 and Q2 in Chapters 3 to 6, and Q3 in Chapter 7.

Chapter 2
Background

2.1 Basics in Graph Theory

We denote an undirected graph using $G = \langle V, E \rangle$, where $V$ is a set of vertices and $E$ is a set of edges. A vertex-weighted undirected graph is an undirected graph with a non-negative weight (integer or real number) associated with each vertex. We denote a vertex-weighted undirected graph using $G = \langle V, E, w \rangle$, where $V$ and $E$ have the same meaning as before and $w$ is a function that maps a vertex to a non-negative integer or real number. (For notational simplicity, we also write $w_i$ as shorthand for $w(v_i)$, where $v_i$ is a vertex in $V$.) The weight of a subset of vertices $S \subseteq V$ is the sum of the weights of all vertices in $S$.

A set of vertices $S \subseteq V$ is an independent set (IS) of an undirected graph $G = \langle V, E \rangle$ if and only if no two vertices in $S$ are connected by an edge, i.e., $\forall u, v \in S : (u, v) \notin E$. A set of vertices $S \subseteq V$ is a vertex cover (VC) of a graph $G = \langle V, E \rangle$ if and only if every edge has at least one endpoint vertex in $S$, i.e., $\forall (u, v) \in E : u \in S \lor v \in S$. A VC $S$ of an undirected graph $G$ is a minimum VC (MVC) if and only if $|S|$ is no greater than the cardinality of any other VC of $G$. A VC $S$ of a vertex-weighted undirected graph $G$ is a minimum weighted VC (MWVC) if and only if the weight of $S$ is no greater than the weight of any other VC of $G$. Figure 2.1 illustrates the concept of MWVCs.

[Figure 2.1: four panels (a) to (d), each showing the same vertex-weighted graph with a different vertex set $S$ highlighted; panels (a), (b), and (c) are marked ✗ and panel (d) is marked ✓.]

Figure 2.1: Illustrates MWVCs. Each circle represents a vertex. The number in each circle represents the weight of the corresponding vertex. The red circles represent vertices in $S$. ✓ and ✗ mean that $S$ in the corresponding panel is and is not an MWVC, respectively. $S$ in (a) and (b) are not MWVCs because their weights are not minimized. $S$ in (c) is not an MWVC because it is not a VC. $S$ in (d) is an MWVC, although it is not an MVC.
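The definitions above can be checked on a tiny graph by exhaustive enumeration. The following is a minimal sketch, not part of the original text: the function names and the three-vertex path graph are hypothetical, and the brute-force search is only meant to make the difference between an MVC and an MWVC concrete (it is exponential in the number of vertices).

```python
from itertools import combinations


def is_vertex_cover(edges, subset):
    """Return True if every edge has at least one endpoint in `subset`."""
    return all(u in subset or v in subset for u, v in edges)


def minimum_weighted_vertex_cover(weights, edges):
    """Enumerate all vertex subsets and return (best_weight, best_cover)."""
    vertices = sorted(weights)
    best_weight, best_cover = float("inf"), None
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            if not is_vertex_cover(edges, chosen):
                continue
            total = sum(weights[v] for v in chosen)
            if total < best_weight:
                best_weight, best_cover = total, chosen
    return best_weight, best_cover


# Hypothetical example: the path a - b - c with a heavy middle vertex.
weights = {"a": 1, "b": 3, "c": 1}
edges = [("a", "b"), ("b", "c")]
mwvc_weight, mwvc = minimum_weighted_vertex_cover(weights, edges)
```

On this path, `{"b"}` is a VC of minimum cardinality, whereas the cover returned by the weighted search is `{"a", "c"}`, so an MWVC need not be an MVC.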
2.2 The Weighted Constraint Satisfaction Problem

Formally, the weighted constraint satisfaction problem (WCSP) is a triplet $\langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$, where $\mathcal{X} = \{X_1, \ldots, X_N\}$ is a set of variables, $\mathcal{D} = \{D_1, \ldots, D_N\}$ is the set of discrete-valued domains that specify the set of values that each variable can take, and $\mathcal{C} = \{C_1, \ldots, C_M\}$ is a set of constraints. Each constraint $C_i$ is defined on a subset of variables $S_i \subseteq \mathcal{X}$ and specifies a non-negative cost for each possible assignment of values to the variables in $S_i$. An optimal solution is an assignment of values to all variables such that the sum of the costs is minimized. If $\forall D_i \in \mathcal{D} : |D_i| = 2$, then the WCSP is called a Boolean WCSP (Kumar 2008a).

Algorithm 2.1: Solve the WCSP using branch-and-bound search.

    Function SolveWCSP($P$)
        Input: $P$: A WCSP instance.
        Output: The optimal solution of $P$ and its total weight.
        return BranchAndBound($P$, $\emptyset$, $0$, $\emptyset$, $+\infty$);

    Function BranchAndBound($P = \langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$, $a$, $w_a$, $a^\dagger$, $w^\dagger$)
        Input: $P = \langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$: A WCSP instance.
        Input: $a$: A partial or complete assignment of values to variables.
        Input: $w_a$: The total weight associated with $a$.
        Input: $a^\dagger$: The current best solution.
        Input: $w^\dagger$: The weight of the current best solution.
        Output: Updated current best solution and its total weight.
        if $\mathcal{X} = \emptyset$ then
            if $w_a < w^\dagger$ then
                return $a, w_a$;
        else
            $(P' = \langle \mathcal{X}', \mathcal{D}', \mathcal{C}' \rangle)$, global_consist := EnforceLocalConsistency($P$, $w^\dagger - w_a$);
            if $\neg$global_consist then
                return $a^\dagger, w^\dagger$;
            $X$ := ChooseVariable($\mathcal{X}'$);
            $D$ := OrderDomain($\mathcal{D}'(X)$);
            foreach $x \in D$ do
                $a' := a \cup \{X = x\}$;
                $w_{a'} := w_a + E_{C'_X}(\{X = x\})$;
                $P''$ := ConstructWCSPSubInstance($X$, $x$, $P'$);
                $a^\dagger, w^\dagger$ := BranchAndBound($P''$, $a'$, $w_{a'}$, $a^\dagger$, $w^\dagger$);
        return $a^\dagger, w^\dagger$;

                $X_2 = 0$   $X_2 = 1$
    $X_1 = 0$      0.5         0.6
    $X_1 = 1$      0.7         0.3

Figure 2.2: Shows an example binary constraint.
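To make the definition of the WCSP and the overall shape of Algorithm 2.1 concrete, here is a minimal branch-and-bound sketch. It is a simplification under stated assumptions, not the procedure of Algorithm 2.1 itself: the call to EnforceLocalConsistency is replaced by a much weaker bound (the cost of the constraints whose variables are all assigned, which is a valid lower bound only because costs are non-negative), variables are chosen in a fixed order, and the instance, the data layout, and all function names are hypothetical.

```python
import math


def assigned_cost(constraints, assignment):
    """Sum the costs of all constraints whose scope is fully assigned."""
    total = 0.0
    for scope, table in constraints:
        if all(var in assignment for var in scope):
            total += table[tuple(assignment[var] for var in scope)]
    return total


def branch_and_bound(domains, constraints):
    """Return (best_assignment, best_cost) for a small WCSP instance."""
    variables = list(domains)
    best = {"assignment": None, "cost": math.inf}

    def search(index, assignment):
        # Costs are non-negative, so the cost of the fully assigned
        # constraints is a lower bound for every completion.
        bound = assigned_cost(constraints, assignment)
        if bound >= best["cost"]:
            return  # prune this subtree
        if index == len(variables):
            best["assignment"], best["cost"] = dict(assignment), bound
            return
        var = variables[index]
        for value in domains[var]:
            assignment[var] = value
            search(index + 1, assignment)
            del assignment[var]

    search(0, {})
    return best["assignment"], best["cost"]


# Hypothetical Boolean WCSP instance: one binary and one unary constraint.
domains = {"X1": [0, 1], "X2": [0, 1]}
constraints = [
    (("X1", "X2"), {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 3}),
    (("X2",), {(0,): 0, (1,): 4}),
]
solution, cost = branch_and_bound(domains, constraints)
```

In this instance the binary constraint alone would prefer `X1 = X2 = 1`, but the unary cost on `X2` shifts the optimum to `X1 = X2 = 0` with total cost 5.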
The most mainstream class of algorithms for solving the WCSP is based on branch-and-bound search, which explores a search tree with each node representing an assignment of values to a subset of variables (Larrosa and Schiex 2004). In a search tree, internal nodes represent partial assignments, whereas leaf nodes represent complete assignments. During search, a currently known best solution $a^\dagger$, which we refer to as the current best solution, is maintained along with its total weight $w^\dagger$. At each node, the search algorithm computes the total weight $w_a$ corresponding to the assignment of that node. If $w_a > w^\dagger$, the subtree below this node is pruned. The details of this algorithm are depicted in Algorithm 2.1. While this approach works well in practice, it does not (intend to) explicitly discover structure in WCSP instances.

A representative state-of-the-art solver that falls into this class of algorithms is toulbar2 (Hurley et al. 2016). It is centralized, single-threaded, and CPU-based. It is known to solve all 715 benchmark instances of CVPR/Scene Decomposition, which have at most 208 variables and a maximum domain size of 8, within 0.07 seconds (Hurley et al. 2016).

2.3 The Constraint Composite Graph

The constraint composite graph (CCG) for a Boolean WCSP instance $\langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$ is defined using a construction procedure (Kumar 2008a). It proceeds in 3 stages:

[Figure 2.3: a three-vertex graph in which the auxiliary vertex $Y_1$ (weight 0.5) is connected to the variable vertices $X_1$ (weight 0.2) and $X_2$ (weight 0.1).]

Figure 2.3: Shows that the projection of MWVCs on the IS $\{X_1, X_2\}$ of this vertex-weighted undirected graph leads to Figure 2.2. The weights on $X_1$, $X_2$, and $Y_1$ are 0.2, 0.1, and 0.5, respectively. The entry 0.6 in cell $(X_1 = 0, X_2 = 1)$ in Figure 2.2, for example, indicates that, when $X_1$ is necessarily excluded from the MWVC but $X_2$ is necessarily included in it, the weight of the MWVC, $\{X_2, Y_1\}$, is 0.6.

1. Expressing Constraints as Polynomials

In this stage, each constraint $C \in \mathcal{C}$ is converted into a polynomial $p_C$ using standard Gaussian elimination.
Consider the example constraint in Figure 2.2, which involves the variables $X_1$ and $X_2$. It can be written as a polynomial $p_C(X_1 = x_1, X_2 = x_2)$ in $x_1$ and $x_2$ of degree 1 each:

\[ p_C(X_1 = x_1, X_2 = x_2) = c_{00} + c_{01} x_1 + c_{10} x_2 + c_{11} x_1 x_2. \tag{2.1} \]

The coefficients $c_{00}$, $c_{01}$, $c_{10}$, and $c_{11}$ of the polynomial can be computed by solving a system of linear equations, where each equation corresponds to an entry in the constraint table, using standard Gaussian elimination. In our example, we have

\[ \begin{aligned} p_C(0, 0) &= 0.5 \\ p_C(1, 0) &= 0.6 \\ p_C(0, 1) &= 0.7 \\ p_C(1, 1) &= 0.3 \end{aligned} \quad \Longrightarrow \quad \begin{aligned} c_{00} &= 0.5 \\ c_{01} &= 0.1 \\ c_{10} &= 0.2 \\ c_{11} &= -0.5. \end{aligned} \]

2. Decomposing the Terms of the Polynomials

In this stage, for the polynomial constructed from each constraint, we construct a CCG gadget, a subgraph of the CCG. Before describing this procedure, we describe the projection of MWVCs on an IS, a cornerstone concept for the notion of the CCG.

[Figure 2.4: three gadgets. (a) $w \cdot x_i$: variable vertex $X_i$ with weight $w_1$ connected to auxiliary vertex $Y_1$ with weight $w_2$. (b) $-w \cdot (X_i \cdot X_j \cdot X_k)$: variable vertices $X_i$, $X_j$, $X_k$ connected to auxiliary vertex $Y_2$ with weight $w$. (c) $w \cdot (X_i \cdot X_j \cdot X_k)$: variable vertices $X_i$, $X_j$, $X_k$ together with auxiliary vertices $Z$ with weight $L$ and $Y_3$ with weight $w$.]

Figure 2.4: Shows the lifted graphical representations of (a) linear, (b) negative nonlinear, and (c) positive nonlinear terms in a polynomial. We assume that $w > 0$ in (b) and (c) (but not necessarily in (a)). A vertex has a zero weight if no weight is shown. In (a), $w_1$ and $w_2$ satisfy $w_1 - w_2 = w$.

For a given graph $G$, one can project MWVCs on a given IS $U \subseteq V$. The input to such a projection is the graph $G$ as well as an IS $U = \{u_1, u_2, \ldots, u_k\}$ on $G$. The output is a table of $2^k$ numbers. Each entry in this table corresponds to a $k$-bit vector. We say that a $k$-bit vector $t$ imposes the following restrictions: (a) if the $i$-th bit $t_i$ is 0, then the vertex $u_i$ has to be excluded from the MWVC; and (b) if the $i$-th bit $t_i$ is 1, then the vertex $u_i$ has to be included in the MWVC. The projection of an MWVC on the IS $U$ is then defined to be a table with entries corresponding to each of the $2^k$ possible $k$-bit vectors $t^{(1)}, t^{(2)}, \ldots, t^{(2^k)}$.
The value of the entry that corresponds to $t^{(j)}$ is the weight of the MWVC conditioned on the restrictions imposed by $t^{(j)}$. Figure 2.3 illustrates this projection for the subgraph of our example constraint in Figure 2.2.

The table produced by projecting an MWVC on the IS $U$ can be viewed as a constraint over $|U|$ Boolean variables. Conversely, given a constraint (consisting of Boolean variables), we design a lifted representation for it so as to be able to view it as the projection of an MWVC on an IS for some intelligently constructed vertex-weighted undirected graph (Kumar 2008a; Kumar 2008b). The lifted graphical representation of a constraint depends on the nature of the terms in the polynomial that describes the constraint. We distinguish three classes of terms: linear terms, negative nonlinear terms, and positive nonlinear terms.

[Figure 2.5: three panels. (a) The WCSP instance, given as cost tables over the variables $X_1$, $X_2$, and $X_3$; (b) CCG gadgets, with auxiliary vertices $A_1$ to $A_6$; (c) The constructed CCG.]

Figure 2.5: Illustrates the construction of the CCG for a given WCSP instance. Green vertices represent variable vertices and red vertices represent auxiliary vertices. The green tick marks represent the vertices in an MWVC. In this example, the constructed CCG is a bipartite graph, meaning that the original WCSP instance falls in a tractable subclass and can be efficiently solved.

We can construct a lifted graphical representation, i.e., a CCG gadget, for each term in the polynomial of each constraint as follows.
- A linear term is represented with the two-vertex graph shown in Figure 2.4(a) by connecting the variable vertex with an auxiliary vertex.
- A negative nonlinear term is represented with the "flower" structure as shown in Figure 2.4(b). Consider the term $-w \cdot (X_i \cdot X_j \cdot X_k)$ where $w > 0$. Projecting an MWVC of the "flower" structure on the variable vertices represents $w - w \cdot (X_i \cdot X_j \cdot X_k)$. The constant term $w$ does not affect the optimality of the solution.
- A positive nonlinear term is represented using the "flower+thorn" structure as shown in Figure 2.4(c). Consider the term $w \cdot (X_i \cdot X_j \cdot X_k)$ where $w > 0$. The projection of an MWVC of the "flower+thorn" structure on the variable vertices represents $L \cdot (1 - X_k) + w - w \cdot (X_i \cdot X_j \cdot (1 - X_k))$, where $L > w + 1$ is a large real number. By constructing CCG gadgets that cancel out the lower-order terms as shown before, we arrive at a lifted graphical representation of the positive nonlinear term.

3. Merging CCG Gadgets into a CCG

Finally, we construct the CCG by merging the CCG gadgets: We merge vertices representing the same variable by adding their weights and keep all edges connecting them to all other vertices.

Computing the MWVC for the CCG yields a solution for the Boolean WCSP: If variable $X \in \mathcal{X}$ is in the MWVC, then it is assigned the value 1 in the Boolean WCSP; otherwise, it is assigned the value 0. This construction procedure is also illustrated in Figure 2.5.

2.3.1 Theoretical Properties of the CCG

The CCG has the following known theoretical properties:

- It can always be constructed in polynomial time. This is due to the fact that each CCG gadget can be constructed in polynomial time and that the total number of CCG gadgets is polynomial with respect to the problem size of the WCSP instance. Efficient CCG construction makes the applicability of the CCG more practical.
- It is always tripartite and can be bipartite for a subclass of the Boolean WCSP.
Since the MWVC problem on a bipartite graph is tractable, a Boolean WCSP instance can be solved efficiently if its CCG is bipartite. This can be used to discover tractable subclasses of the Boolean WCSP.
- It is decomposable. The construction procedure (a) constructs a CCG gadget for each individual constraint and then (b) assembles them together to form the CCG. This construction procedure is decomposable: If we add a constraint to or remove a constraint from the original Boolean WCSP instance, we can obtain the CCG of the new Boolean WCSP instance by modifying the CCG of the original Boolean WCSP instance using the CCG gadget of the added or removed constraint.

Chapter 3
The Nemhauser-Trotter Reduction on the CCG

3.1 Introduction

Many interesting combinatorial problems are NP-hard. Despite many sophisticated search algorithms dedicated to solving them, the search spaces still remain intractable for large instances. Therefore, a polynomial-time procedure that reduces the sizes of problem instances and identifies a combinatorial core can be beneficial as a preprocessing step. Such a procedure is called a kernelization algorithm, and the combinatorial core is called a kernel (illustrated in Figure 3.1).

The Nemhauser-Trotter reduction (NT reduction) is one such algorithm for the MWVC problem (Nemhauser and Trotter 1975). Hence, it can be applied to the MWVC problem on the CCG as well. It is based on the observation that the MWVC problem is a half-integral problem. This means that its Integer Linear Programming (ILP) formulation exhibits the following property.

[Figure 3.1: a large box labeled "N Variables" is reduced, by an arrow labeled "Polynomial Time", to a small box labeled "Small Kernel".]

Figure 3.1: Illustrates kernelization algorithms. After a polynomial-time procedure, the problem instance of $N$ variables is reduced to a small combinatorial kernel.

We consider a vertex-weighted graph $G = \langle V, E, w \rangle$. In the ILP formulation of the MWVC problem instance on $G$, a Boolean decision variable $Z_i$ is first associated with the presence of vertex $v_i$ in the MWVC.
Then, the ILP formulation is

\[ \begin{aligned} \text{minimize} \quad & \sum_{i=1}^{|V|} w_i Z_i \\ \text{subject to} \quad & \forall v_i \in V : Z_i \in \{0, 1\}, \\ & \forall (v_i, v_j) \in E : Z_i + Z_j \geq 1. \end{aligned} \tag{3.1} \]

If we relax the integrality constraints $Z_i \in \{0, 1\}$ for all $i \in \{1, 2, \ldots, |V|\}$ and solve the relaxed LP, the LP is guaranteed to have an optimal solution that is half-integral, i.e., $\forall i \in \{1, 2, \ldots, |V|\} : Z_i \in \{0, \frac{1}{2}, 1\}$. There then exists an MWVC on $G$ that includes $v_i$ if $Z_i = 1$ and excludes $v_i$ if $Z_i = 0$. Therefore, one can kernelize the MWVC problem instance on $G$ to an MWVC problem instance on a subgraph of $G$ by retaining only those vertices whose decision variables in an optimal solution of the LP are $\frac{1}{2}$.

The half-integrality property can be further exploited to solve the LP relaxation of the MWVC problem with a maxflow algorithm instead of a general LP solver (Kumar 2003). We first transform $G$ to a vertex-weighted undirected bipartite graph $G_b = \langle V^L_{G_b}, V^R_{G_b}, E_{G_b}, w \rangle$ as follows. For each vertex $v_i \in V$, we create two vertices $v^L_i \in V^L_{G_b}$ and $v^R_i \in V^R_{G_b}$, both with weight $w_i$. For each edge $(v_i, v_j) \in E$, we create two edges $(v^L_i, v^R_j) \in E_{G_b}$ and $(v^L_j, v^R_i) \in E_{G_b}$.

[Figure 3.2: four panels on a graph with vertices $A$, $B$, $C$, and $D$ of weights $w_1$, $w_2$, $w_3$, and $w_4$ and their copies $A'$, $B'$, $C'$, and $D'$ in the bipartite graph. Call-outs: "A is in the minimum weighted VC", "B is not in the minimum weighted VC", "C and D are in the Kernel".]

Figure 3.2: Illustrates the NT reduction. The upper-left panel shows the graph $G$. The upper-right panel shows its corresponding vertex-weighted bipartite graph $G_b$, where each vertex in $G$ has a corresponding vertex in each partition of $G_b$. The NT reduction then computes an MWVC on $G_b$. The lower-left panel illustrates one possible computed MWVC of $G_b$. The call-outs in the lower-right panel show the result of the NT reduction obtained by inspecting the presence of vertices in the computed MWVC of $G_b$.
The MWVC problem can be solved in polynomial time on the bipartite graph $G_b$ using a maxflow algorithm (Kumar 2003), and the half-integral solution of the above LP relaxation can be retrieved as follows. If both $v^L_i$ and $v^R_i$ are in the MWVC of $G_b$, then $Z_i = 1$ and $v_i$ can be safely included in the MWVC of $G$; if neither $v^L_i$ nor $v^R_i$ is in the MWVC of $G_b$, then $Z_i = 0$ and $v_i$ can be safely excluded from the MWVC of $G$; if exactly one of $v^L_i$ and $v^R_i$ is in the MWVC of $G_b$, then $Z_i = \frac{1}{2}$ and $v_i$ is retained in the kernel of the MWVC problem instance posed on $G$. Figure 3.2 illustrates this procedure.

[Figure 3.3: two histograms. x-axis: Fraction (0.0 to 1.0); y-axis: Number of Instances. (a) Benchmark instances from the UAI 2014 Inference Competition: 19 out of 160 benchmark instances solved by the NT reduction. (b) Benchmark instances from (Hurley et al. 2016): 53 out of 410 benchmark instances solved by the NT reduction.]

Figure 3.3: Shows the effectiveness of the NT reduction. The x-axes show the fraction of variables that are eliminated by the NT reduction. The y-axes show the number of benchmark instances on which this happens for a fraction range.

3.2 Experimental Evaluation

In this section, we present experimental evaluations of the NT reduction. We used two sets of Boolean WCSP benchmark instances for our experiments. The first set of benchmark instances is from the UAI 2014 Inference Competition¹. Here, maximum-a-posteriori (MAP) inference queries with no evidence on the PR and MMAP benchmark instances can be reformulated as Boolean WCSP instances by first taking the negative logarithms of the probabilities in each factor and then normalizing them. The second set of benchmark instances is from (Hurley et al. 2016)².
This set includes the Probabilistic Inference Challenge 2011, the Computer Vision and Pattern Recognition OpenGM2 benchmark, the Weighted Partial MaxSAT Evaluation 2013, the MaxCSP 2008 Competition, the MiniZinc Challenge 2012 & 2013, and the CFLib (a library of cost function networks). The experiments were performed on those benchmark instances that have only Boolean variables.

The optimal solutions of the benchmark instances in (Hurley et al. 2016) were computed using toulbar2 (Hurley et al. 2016). Since toulbar2 cannot solve WCSP instances with non-integral weights, the optimal solutions of the benchmark instances from the UAI 2014 Inference Competition were computed by finding MWVCs on their CCGs. For each benchmark instance, the MWVC problem was solved by first kernelizing it using the NT reduction, then reformulating it as an ILP (H. Xu, Kumar, and Koenig 2016), and finally solving the ILP using the Gurobi optimizer (Gurobi Optimization, Inc. 2018) with a running time limit of 5 minutes.

¹ http://www.hlt.utdallas.edu/~vgogate/uai14-competition/index.html
² http://genoweb.toulouse.inra.fr/~degivry/evalgm/

For our experiments, we implemented the NT reduction using the Gurobi optimizer (Gurobi Optimization, Inc. 2018) as the LP solver. The CCG construction algorithm was implemented in C++ using the Boost graph library (Siek, Lee, and Lumsdaine 2002) and was compiled by gcc 4.9.2 with the "-O3" option. Our experiments were performed on a GNU/Linux workstation with an Intel Xeon processor E3-1240 v3 (8MB Cache, 3.4GHz) and 16GB RAM.

Figure 3.3 shows the effectiveness of the NT reduction on the benchmark instances. The polynomial-time NT reduction solved about one eighth of these benchmark instances outright, yielding empty kernels. Being able to solve this many benchmark instances without search is indicative of the potential usefulness of the NT reduction for solving structured real-world problems.
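For readers who want to experiment with the reduction evaluated here, the following is a minimal sketch of the maxflow-based variant described in Section 3.1. It is not the implementation used in the experiments (which relied on the Gurobi optimizer as the LP solver): it builds the bipartite graph $G_b$, computes an MWVC of $G_b$ from a minimum cut found with a plain Edmonds-Karp maxflow, and reads off the half-integral values $Z_i$. The function name `nt_reduction`, the node labels, and the two small test graphs are hypothetical, and the graph is assumed to have at least one vertex.

```python
from collections import deque


def nt_reduction(weights, edges):
    """Return {vertex: 0.0, 0.5 or 1.0}, a half-integral LP optimum for MWVC."""
    vertices = list(weights)
    source, sink = ("s",), ("t",)
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    # Source -> left copy and right copy -> sink carry the vertex weights.
    for v in vertices:
        add_edge(source, ("L", v), weights[v])
        add_edge(("R", v), sink, weights[v])
    # Each edge (u, v) of G yields (u^L, v^R) and (v^L, u^R) in G_b.
    for u, v in edges:
        add_edge(("L", u), ("R", v), float("inf"))
        add_edge(("L", v), ("R", u), float("inf"))

    def bfs_tree():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is cut off.
    while True:
        parent = bfs_tree()
        if sink not in parent:
            break
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # `parent` now holds the source side of a minimum cut. A left copy is in
    # the cover if it is cut off from the source; a right copy is in the
    # cover if it is still reachable from the source.
    z = {}
    for v in vertices:
        in_left = ("L", v) not in parent
        in_right = ("R", v) in parent
        z[v] = (in_left + in_right) / 2
    return z


# Hypothetical examples: a star (fully decided) and a triangle (all kernel).
star = nt_reduction({"c": 1, "l1": 1, "l2": 1, "l3": 1},
                    [("c", "l1"), ("c", "l2"), ("c", "l3")])
triangle = nt_reduction({"a": 1, "b": 1, "c": 1},
                        [("a", "b"), ("b", "c"), ("a", "c")])
```

On the star, the center receives the value 1 and every leaf the value 0, so the kernel is empty; on the unit-weight triangle, every vertex receives the value 1/2, so the reduction has no effect.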
We also observed that the NT reduction had little or no effect on most of the remaining benchmark instances. Here, we give an intuitive explanation of why the NT reduction is ineffective on many benchmark instances.

First, $G_b = \langle V^L_{G_b}, V^R_{G_b}, E_{G_b}, w \rangle$ cannot have an MWVC with weight larger than $W = \sum_{v^L_i \in V^L_{G_b}} w_i$, since the VC consisting of all vertices in either partition of $G_b$ has weight $W$. If the NT reduction has no effect, then the weight of an MWVC of $G_b$ must equal $W$. This is because, for each $v_i \in V$, exactly one of $v^L_i$ and $v^R_i$ is in the computed MWVC of $G_b$.

Second, if the weight of an MWVC of $G_b$ is $W$, then the NT reduction has the option of simply choosing all vertices in one partition as the computed MWVC. The probability that an alternative MWVC consisting of vertices in both partitions exists is low if the weights and their sums are mostly unique. In fact, this is the case for many real-world applications and, in practice, we can achieve it by adding a small random number to each weight.

Combining the two points above, although it does not hold in theory, in practice the ineffectiveness of the NT reduction is mostly caused by the fact that the weight of an MWVC of $G_b$ is $W$. Therefore, if $G_b$ of a benchmark instance has an MWVC of weight $W$, the NT reduction is likely to be ineffective on it.

3.3 Conclusion

We showed that the NT reduction, popularly used for kernelization of the MWVC problem, can also be applied to the CCG of the WCSP. This leads to a polynomial-time preprocessing algorithm that fixes the optimal values of a subset of variables in a WCSP instance. This subset is often the set of all variables: We observed that the NT reduction could determine the optimal values of all variables for about one eighth of the benchmark instances without search.
Enabling the NT reduction for the WCSP can be potentially useful for improving branch-and-bound search for the WCSP: In principle, we can replace EnforceLocalConsistency in Algorithm 2.1 with the NT reduction and develop a new variant of the branch-and-bound search algorithm for the WCSP. In other words, thanks to the enabling of the NT reduction for the WCSP, we can now conceptually view the NT reducibility of the WCSP as a new implicit form of local consistency. It is well known that local consistency is an important segue towards understanding the WCSP, and our work in this chapter can therefore potentially help researchers deepen their understanding of the WCSP.

Chapter 4
The Min-Sum Message Passing Algorithm on the CCG

4.1 Introduction

Belief propagation (BP) is a well-known technique for solving many combinatorial problems across a wide range of fields such as probabilistic reasoning, artificial intelligence, and information theory. It can be used to solve hard inference problems that arise in statistical physics, computer vision, error-correcting coding theory, or, more generally, on graphical models such as Bayesian networks and Markov random fields (Yedidia, Freeman, and Weiss 2003). BP is an efficient algorithm that is based on local message passing. Although a complete theoretical analysis of its convergence and correctness is elusive, it works well in practice on many important combinatorial problems.

While BP performs message passing for the objective of marginalization over probabilities, the min-sum message passing (MSMP) algorithm is a variant of BP that is used to find an assignment of values to all variables in $\vec{X}$ that minimizes functions of the form

\[ E(\vec{X}) = \sum_i E_i(\vec{X}_i), \tag{4.1} \]

where $\vec{X}$ is the set of all variables in the global function $E$; $E_i$ is a local function constituting the $i$-th term of $E$; and $\vec{X}_i$ is a subset of $\vec{X}$ containing all variables that participate in $E_i$.
To minimize the function $E(\vec{X})$, the MSMP algorithm first builds a factor graph, i.e., an undirected bipartite graph with one partition containing vertices that represent the variables in $\vec{X}$ and the other partition containing vertices that represent the local functions $E_i$ for all $i$. An edge represents the participation of a variable in a local function. Furthermore, a message is a vector associated with each direction of each edge. Intuitively, messages represent interactions between individual variables and local functions. The value of $E_i$ is the potential of its corresponding vertex because it is indicative of its "potential" to affect other vertices. Messages are updated iteratively until convergence. In each iteration, the message from vertex $u$ to vertex $v$ is influenced by the incoming messages to $u$ as well as $u$'s potential if it represents a local function. Upon convergence, a solution can be extracted from the messages.

The MSMP algorithm converges and produces an optimal solution if the factor graph is a tree (Mézard and Montanari 2009). This is, however, not necessarily the case if the factor graph is loopy (Mézard and Montanari 2009). Although the clique tree algorithm alleviates this problem to a certain extent by first converting loopy graphs to trees (Koller and Friedman 2009), the technique only scales to graphs with low treewidths. If the MSMP algorithm operates directly on loopy graphs, the theoretical underpinnings of its convergence and optimality properties still remain poorly understood. Nonetheless, it works well in practice on a number of important combinatorial problems in artificial intelligence, statistical physics, and signal processing (Mézard and Montanari 2009; Moallemi and Roy 2010).

[Figure 4.1: a factor graph with variable vertices $X_1$, $X_2$, and $X_3$ and constraint vertices $C_{12}$, $C_{23}$, and $C_{13}$; the edge between $X_1$ and $C_{12}$ is annotated with the messages $\nu_{X_1 \to C_{12}}$ and $\hat{\nu}_{C_{12} \to X_1}$.]

Figure 4.1: Illustrates the factor graph of a Boolean WCSP instance with 3 variables $\{X_1, X_2, X_3\}$ and 3 constraints $\{C_{12}, C_{13}, C_{23}\}$. Here, $X_1, X_2 \in C_{12}$, $X_1, X_3 \in C_{13}$, and $X_2, X_3 \in C_{23}$.
The circles are variable vertices, and the squares are constraint vertices. $\nu_{X_1 \to C_{12}}$ and $\hat{\nu}_{C_{12} \to X_1}$ are the messages from $X_1$ to $C_{12}$ and from $C_{12}$ to $X_1$, respectively. Such a pair of messages annotates each edge (not all are explicitly shown).

Examples of problems on which the MSMP algorithm works well in practice include the CSP (Montanari, Ricci-Tersenghi, and Semerjian 2007), K-satisfiability (Mézard and Zecchina 2002), and the MVC problem (Weigt and Zhou 2006). In this chapter, we show how we improve the MSMP algorithm for the Boolean WCSP by using the CCG.

4.2 The Min-Sum Message Passing Algorithm Applied Directly on the Boolean WCSP

We now describe how the MSMP algorithm can be applied directly to solving the Boolean WCSP defined by $\langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$. We refer to this as the original MSMP algorithm. As explained before, we first construct its factor graph. We create a vertex for each variable in $\mathcal{X}$ (variable vertex) and for each constraint in $\mathcal{C}$ (constraint vertex). A variable vertex $X_i$ and a constraint vertex $C_j$ are connected by an edge if and only if $C_j$ contains $X_i$. Figure 4.1 shows an example.

After the factor graph is constructed, a message (two real numbers) for each of the two directions along each edge is initialized, for instance, to zeros. A pair of messages $\nu_{X_1 \to C_{12}}$ and $\hat{\nu}_{C_{12} \to X_1}$ is illustrated in Figure 4.1.
The messages are then updated iteratively by using the min-sum update rules given by

\[ \nu^{(t)}_{X_i \to C_j}(X_i = x_i) = \sum_{C_k \in \partial X_i \setminus \{C_j\}} \left[ \hat{\nu}^{(t-1)}_{C_k \to X_i}(X_i = x_i) \right] + c^{(t)}_{X_i \to C_j} \tag{4.2} \]

\[ \hat{\nu}^{(t)}_{C_j \to X_i}(X_i = x_i) = \min_{a \in \mathcal{A}(\partial C_j \setminus \{X_i\})} \left[ E_{C_j}(a \cup \{X_i = x_i\}) + \sum_{X_k \in \partial C_j \setminus \{X_i\}} \nu^{(t)}_{X_k \to C_j}(a|\{X_k\}) \right] + \hat{c}^{(t)}_{C_j \to X_i} \tag{4.3} \]

for all $X_i \in \mathcal{X}$, $C_j \in \mathcal{C}$, and $x_i \in \{0, 1\}$ until convergence (Mézard and Montanari 2009), where

- $\hat{\nu}^{(t)}_{C_j \to X_i}(X_i = x_i)$ for both $x_i \in \{0, 1\}$ are the two real numbers of the message that is passed from the constraint vertex $C_j$ to the variable vertex $X_i$ in the $t$-th iteration,
- $\nu^{(t)}_{X_i \to C_j}(X_i = x_i)$ for both $x_i \in \{0, 1\}$ are the two real numbers of the message that is passed from the variable vertex $X_i$ to the constraint vertex $C_j$ in the $t$-th iteration,
- $\partial X_i$ and $\partial C_j$ are the sets of neighboring vertices of $X_i$ and $C_j$, respectively, and
- $c^{(t)}_{X_i \to C_j}$ and $\hat{c}^{(t)}_{C_j \to X_i}$ are normalization constants such that

\[ \min \left[ \nu^{(t)}_{X_i \to C_j}(X_i = 0), \nu^{(t)}_{X_i \to C_j}(X_i = 1) \right] = 0 \tag{4.4} \]

\[ \min \left[ \hat{\nu}^{(t)}_{C_j \to X_i}(X_i = 0), \hat{\nu}^{(t)}_{C_j \to X_i}(X_i = 1) \right] = 0. \tag{4.5} \]

The message update rules can be understood as follows. Each message from a variable vertex $X_i$ to a constraint vertex $C_j$ is updated by summing up all of $X_i$'s incoming messages from its other neighboring vertices. Each message from a constraint vertex $C_j$ to a variable vertex $X_i$ is updated by finding the minimum of the constraint function $E_{C_j}$ plus the sum of all of $C_j$'s incoming messages from its other neighboring vertices. The messages can be updated in various orders.

We use the superscript $\infty$ to indicate the values of messages upon convergence. The final assignment of values to variables in $\mathcal{X} = \{X_1, X_2, \ldots, X_N\}$ can then be found by computing

\[ E_{X_i}(X_i = x_i) \equiv \sum_{C_k \in \partial X_i} \hat{\nu}^{(\infty)}_{C_k \to X_i}(X_i = x_i) \tag{4.6} \]

for all $X_i \in \mathcal{X}$ and $x_i \in \{0, 1\}$. Here, $E_{X_i}(X_i = 0)$ and $E_{X_i}(X_i = 1)$ can be proven to be equal to the minimum values of the total weights conditioned on $X_i = 0$ and $X_i = 1$, respectively.
By selecting the value of $x_i$ that leads to a smaller value of $E_{X_i}(X_i = x_i)$, we obtain the final assignment of values to all variables in $\mathcal{X}$.

4.3 The Min-Sum Message Passing Algorithm Applied on the CCG

To solve a given WCSP instance, we can first transform it to an MWVC problem instance on its CCG. We can then apply the MSMP algorithm on the CCG. We refer to this procedure as the lifted MSMP algorithm.

The MWVC problem on the vertex-weighted graph $\langle V, E, w \rangle$ is a subclass of the Boolean WCSP. Throughout this section, we use the variable $X_i$ to represent the $i$-th vertex in $V$: $X_i = 1$ means that the $i$-th vertex is selected in the MWVC, and $X_i = 0$ means that the $i$-th vertex is not selected in the MWVC. The MWVC problem can therefore be rewritten as a subclass of the Boolean WCSP with only the following two types of constraints:

- Unary weighted constraints: Each of these weighted constraints corresponds to a vertex in the MWVC problem. We use $C^V_i$ to denote the weighted constraint that corresponds to the $i$-th vertex. $C^V_i$ therefore has only one variable $X_i$. In the weighted constraint $C^V_i$, the tuple in which $X_i = 1$ has weight $w_i \geq 0$ and the other tuple has weight zero. This type of weighted constraint represents the minimization objective of the MWVC problem.
- Binary weighted constraints: Each of these weighted constraints corresponds to an edge in the MWVC problem. We use $C^E_j$ to denote the weighted constraint that corresponds to the $j$-th edge. The indices of the endpoint vertices of this edge are denoted by $j(+1)$ and $j(-1)$. $C^E_j$ therefore has two variables $X_{j(+1)}$ and $X_{j(-1)}$. In the weighted constraint $C^E_j$, the tuple in which $X_{j(+1)} = X_{j(-1)} = 0$ has weight infinity and the other tuples have weight zero. This type of weighted constraint represents the requirement that at least one endpoint vertex must be selected for each edge.

Given that the MWVC problem is a subclass of the Boolean WCSP, Equations (4.2), (4.3), and (4.6) can be reused for the MSMP algorithm on it.
For the MWVC problem, these equations can be further simplified. For notational convenience, we omit normalization constants in the following derivation.

For each of the unary weighted constraints $C^V_i$, we have

- the added weight for selecting a vertex:

\[ E_{C^V_i}(X_i = x_i) = \begin{cases} w_i & x_i = 1 \\ 0 & x_i = 0, \end{cases} \tag{4.7} \]

- and exactly one variable in $C^V_i$:

\[ \partial C^V_i \setminus \{X_i\} = \emptyset. \tag{4.8} \]

By plugging Equations (4.7) and (4.8) into Equation (4.3) for $t = \infty$, we have

\[ \hat{\nu}^{(\infty)}_{C^V_i \to X_i}(X_i = x_i) = \begin{cases} w_i & x_i = 1 \\ 0 & x_i = 0 \end{cases} \tag{4.9} \]

for all $C^V_i$. Note that here we do not need Equation (4.2) for $C^V_i$ since it has only one variable and thus the message passed to it does not affect the final solution.

For each of the binary weighted constraints $C^E_j$, we have

- the requirement that at least one endpoint vertex must be selected for each edge:

\[ E_{C^E_j}(X_{j(+1)} = x_{j(+1)}, X_{j(-1)} = x_{j(-1)}) = \begin{cases} +\infty & x_{j(+1)} = x_{j(-1)} = 0 \\ 0 & \text{otherwise}, \end{cases} \tag{4.10} \]

- and exactly two variables in $C^E_j$:

\[ \partial C^E_j \setminus \{X_{j(\ell)}\} = \{X_{j(-\ell)}\} \quad \forall \ell \in \{+1, -1\}. \tag{4.11} \]

By plugging Equations (4.9) to (4.11) into Equations (4.2) and (4.3) along with the fact that there exist only unary and binary weighted constraints, we have

\[ \nu^{(t)}_{X_{j(\ell)} \to C^E_j}(X_{j(\ell)} = 1) = \sum_{C_k \in \partial X_{j(\ell)} \setminus \{C^V_{j(\ell)}, C^E_j\}} \left[ \hat{\nu}^{(t-1)}_{C_k \to X_{j(\ell)}}(X_{j(\ell)} = 1) \right] + w_{j(\ell)} \tag{4.12} \]

\[ \nu^{(t)}_{X_{j(\ell)} \to C^E_j}(X_{j(\ell)} = 0) = \sum_{C_k \in \partial X_{j(\ell)} \setminus \{C^V_{j(\ell)}, C^E_j\}} \hat{\nu}^{(t-1)}_{C_k \to X_{j(\ell)}}(X_{j(\ell)} = 0) \tag{4.13} \]

\[ \hat{\nu}^{(t)}_{C^E_j \to X_{j(\ell)}}(X_{j(\ell)} = 1) = \min_{a \in \{0, 1\}} \nu^{(t)}_{X_{j(-\ell)} \to C^E_j}(X_{j(-\ell)} = a) \tag{4.14} \]

\[ \hat{\nu}^{(t)}_{C^E_j \to X_{j(\ell)}}(X_{j(\ell)} = 0) = \nu^{(t)}_{X_{j(-\ell)} \to C^E_j}(X_{j(-\ell)} = 1) \tag{4.15} \]

for all $C^E_j$ and both $\ell \in \{+1, -1\}$.
By plugging Equations (4.12) and (4.13) into Equations (4.14) and (4.15), we have

\[ \hat{\nu}^{(t)}_{C^E_j \to X_{j(\ell)}}(X_{j(\ell)} = 1) = \min_{a \in \{0, 1\}} \left[ \sum_{C_k \in \partial X_{j(-\ell)} \setminus \{C^V_{j(-\ell)}, C^E_j\}} \left[ \hat{\nu}^{(t-1)}_{C_k \to X_{j(-\ell)}}(X_{j(-\ell)} = a) \right] + w_{j(-\ell)} \cdot a \right] \tag{4.16} \]

\[ \hat{\nu}^{(t)}_{C^E_j \to X_{j(\ell)}}(X_{j(\ell)} = 0) = \sum_{C_k \in \partial X_{j(-\ell)} \setminus \{C^V_{j(-\ell)}, C^E_j\}} \left[ \hat{\nu}^{(t-1)}_{C_k \to X_{j(-\ell)}}(X_{j(-\ell)} = 1) \right] + w_{j(-\ell)} \tag{4.17} \]

for all $C^E_j$ and both $\ell \in \{+1, -1\}$, where $\hat{\nu}^{(t)}_{C^E_j \to X_{j(\ell)}}(X_{j(\ell)} = b)$ for both $b \in \{0, 1\}$ are the two real numbers of the message that is passed from the $j$-th edge to the $j(\ell)$-th vertex. Since each edge has exactly two endpoint vertices, the message from an edge to one of its endpoint vertices can be viewed as a message from the other endpoint vertex to it. Formally, for the $j$-th edge, we define the message from the $j(+1)$-th vertex to the $j(-1)$-th vertex in the $t$-th iteration as

\[ \mu^{(t)}_{j(+1) \to j(-1)} \equiv \hat{\nu}^{(t)}_{C^E_j \to X_{j(-1)}}. \tag{4.18} \]

By plugging in Equation (4.18) and substituting $j(\ell)$ with $i$ and $j(-\ell)$ with $j$, Equations (4.16) and (4.17) can be rewritten (with normalization constants) in the form of messages between vertices as

\[ \mu^{(t)}_{j \to i}(X_i = 1) = \min_{a \in \{0, 1\}} \left[ \sum_{k \in N(j) \setminus \{i\}} \mu^{(t-1)}_{k \to j}(X_j = a) + w_j \cdot a \right] + c^{(t)}_{j \to i} \tag{4.19} \]

\[ \mu^{(t)}_{j \to i}(X_i = 0) = \sum_{k \in N(j) \setminus \{i\}} \mu^{(t-1)}_{k \to j}(X_j = 1) + w_j + c^{(t)}_{j \to i} \tag{4.20} \]

for all $i$ and $j$ such that the $i$-th and $j$-th vertices are connected by an edge in $E$. Here, $N(j)$ is the set of neighboring vertices of the $j$-th vertex in $V$ and $c^{(t)}_{j \to i}$ represents the normalization constant such that $\min \left[ \mu^{(t)}_{j \to i}(X_i = 1), \mu^{(t)}_{j \to i}(X_i = 0) \right] = 0$. Equations (4.19) and (4.20) are the message update rules of the MSMP algorithm adapted to the MWVC problem.

If the messages converge, by plugging Equations (4.9) and (4.18) into Equation (4.6), the final assignment of values to variables can be found by computing

\[ E_{X_i}(X_i = x_i) = \sum_{j \in N(i)} \left[ \mu^{(\infty)}_{j \to i}(X_i = x_i) \right] + w_i x_i, \tag{4.21} \]

where the meaning of $E_{X_i}(X_i = x_i)$ is similar to that in Equation (4.6).
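The update rules in Equations (4.19) to (4.21) are compact enough to sketch directly. The code below is an illustrative sketch under stated assumptions rather than the implementation evaluated in this chapter: all messages are updated synchronously in each iteration (the text leaves the update order open), messages start at zero, convergence is declared when no message changes by more than a tolerance `tol`, and the function name `lifted_msmp`, the iteration cap `max_iters`, and the example graph are hypothetical.

```python
def lifted_msmp(weights, edges, tol=1e-6, max_iters=1000):
    """Run min-sum message passing for MWVC; return (assignment, converged)."""
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # msg[(j, i)] = [value for X_i = 0, value for X_i = 1], all zeros at first.
    msg = {(j, i): [0.0, 0.0] for i in neighbors for j in neighbors[i]}
    converged = False
    for _ in range(max_iters):
        new_msg = {}
        for (j, i) in msg:
            others = [k for k in neighbors[j] if k != i]
            sum0 = sum(msg[(k, j)][0] for k in others)  # X_j = 0
            sum1 = sum(msg[(k, j)][1] for k in others)  # X_j = 1
            to_one = min(sum0, sum1 + weights[j])       # Equation (4.19)
            to_zero = sum1 + weights[j]                 # Equation (4.20)
            shift = min(to_zero, to_one)                # normalization constant
            new_msg[(j, i)] = [to_zero - shift, to_one - shift]
        change = max(
            (abs(new_msg[e][b] - msg[e][b]) for e in msg for b in (0, 1)),
            default=0.0,
        )
        msg = new_msg
        if change <= tol:
            converged = True
            break

    # Equation (4.21): pick the value with the smaller conditioned weight.
    assignment = {}
    for i in neighbors:
        e0 = sum(msg[(j, i)][0] for j in neighbors[i])
        e1 = sum(msg[(j, i)][1] for j in neighbors[i]) + weights[i]
        assignment[i] = 1 if e1 < e0 else 0
    return assignment, converged


# Hypothetical example: the path a - b - c, a tree, so MSMP is exact here.
assignment, converged = lifted_msmp({"a": 1, "b": 3, "c": 1},
                                    [("a", "b"), ("b", "c")])
```

On this path the messages stop changing after three iterations, and the extracted assignment selects the two light endpoints and leaves out the heavy middle vertex. On loopy graphs the same code may fail to converge within `max_iters`, in which case `converged` is `False` and the returned assignment need not be a vertex cover.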
[Figure 4.2 plots: two bar charts. Panel (a): Benchmark instances from the UAI 2014 Inference Competition. Panel (b): Benchmark instances from (Hurley et al. 2016). x-axis: (MSMP solution - optimal solution) / optimal solution, in the bins 0; < 10%; 10% to < 20%; 20% to < 30%; > 30%. y-axis: Number of Instances. Series: Lifted MSMP, Original MSMP.]

Figure 4.2: Shows the qualities of the solutions (total weights) produced by the original and the lifted MSMP algorithms in comparison to the optimal solutions (for benchmark instances with known optimal solutions). The x-axes show the suboptimality of the MSMP solutions. The y-axes show the number of benchmark instances for a range of suboptimality. Higher bars on the left are indicative of better solutions.

[Figure 4.3 plots: two scatter plots with logarithmic axes. x-axis: The Lifted MSMP Solution Quality. y-axis: The Original MSMP Solution Quality. Panel (a): Benchmark instances from the UAI 2014 Inference Competition: 126/9/18 above/below/close to the diagonal dashed line. Panel (b): Benchmark instances from (Hurley et al. 2016): 222/68/19 above/below/close to the diagonal dashed line.]

Figure 4.3: Shows the qualities of the solutions produced by the original MSMP algorithm in direct comparison to those produced by the lifted MSMP algorithm for both sets of benchmark instances. Each point in these plots represents a benchmark instance. The x and y coordinates of a benchmark instance represent the solution qualities produced by the lifted MSMP algorithm and the original MSMP algorithm, respectively. Benchmark instances above (red)/below (blue) the diagonal dashed line have better/worse solution qualities when using the lifted MSMP algorithm instead of the original MSMP algorithm.
Benchmark instances whose MSMP solution qualities differ by only 1% are considered close (green) to the diagonal dashed line.

4.4 Experimental Evaluation

In this section, we present experimental evaluations of the lifted MSMP algorithm. We used two sets of Boolean WCSP benchmark instances for our experiments. The first set of benchmark instances is from the UAI 2014 Inference Competition¹. Here, maximum a posteriori (MAP) inference queries with no evidence on the PR and MMAP benchmark instances can be reformulated as Boolean WCSP instances by first taking the negative logarithms of the probabilities in each factor and then normalizing them. The second set of benchmark instances is from (Hurley et al. 2016)². This set includes the Probabilistic Inference Challenge 2011, the Computer Vision and Pattern Recognition OpenGM2 benchmark, the Weighted Partial MaxSAT Evaluation 2013, the MaxCSP 2008 Competition, the MiniZinc Challenge 2012 & 2013, and the CFLib (a library of cost function networks). The experiments were performed on those benchmark instances that have only Boolean variables.

¹ http://www.hlt.utdallas.edu/~vgogate/uai14-competition/index.html

The optimal solutions of the benchmark instances in (Hurley et al. 2016) were computed using toulbar2 (Hurley et al. 2016). Since toulbar2 cannot solve WCSP instances with non-integral weights, the optimal solutions of the benchmark instances from the UAI 2014 Inference Competition were computed by finding MWVCs on their CCGs. For each benchmark instance, the MWVC problem was solved by first kernelizing it using the NT reduction, then reformulating it as an ILP (H. Xu, Kumar, and Koenig 2016), and finally solving the ILP using the Gurobi optimizer (Gurobi Optimization, Inc. 2018) with a running time limit of 5 minutes.

For the MSMP algorithms, we set the initial values of all messages to zeros. If no message changed by an amount more than $10^{-6}$ in any iteration, we declared convergence.
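The reformulation of a probabilistic factor as a weighted constraint, mentioned earlier in this section, can be sketched as follows. This is only an illustration under stated assumptions: the function name `factor_to_weights` is hypothetical, and "normalizing" is interpreted here as shifting the weights so that the smallest finite weight is zero, which the text does not spell out. Zero probabilities are mapped to an infinite weight.

```python
import math


def factor_to_weights(table):
    """Convert a factor table into a table of non-negative weights.

    table: dict mapping assignment tuples to non-negative potentials.
    Returns a dict with the same keys and weights -log(p), shifted so that
    the smallest finite weight is 0 (assumed meaning of "normalizing").
    """
    # Negative logarithm; a zero potential becomes an infinite weight.
    raw = {a: (math.inf if p == 0 else -math.log(p)) for a, p in table.items()}
    finite = [w for w in raw.values() if w != math.inf]
    shift = min(finite) if finite else 0.0
    return {a: w - shift for a, w in raw.items()}
```

Because the shift adds the same constant to every tuple of a constraint, it changes the total weight of every assignment by the same amount and therefore does not change which assignments are optimal.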
We used the synchronous message updating order, i.e., messages were updated in parallel in each iteration. This standardized the comparison between the two MSMP algorithms, factoring out the effects of different message updating orders within each iteration. In case of failure to converge within the time limit (5 minutes) for any benchmark instance, we reported the solution produced by the MSMP algorithm on that benchmark instance at the end of that time limit.

² http://genoweb.toulouse.inra.fr/~degivry/evalgm/

Table 4.1: Shows the number of benchmark instances on which each MSMP algorithm converged. The column "Neither"/"Both" indicates the number of benchmark instances on which neither/both of the MSMP algorithms converged within the time limit of 5 minutes. The column "Original"/"Lifted" indicates the number of benchmark instances on which only the original/lifted MSMP algorithm converged.

Benchmark Instance Set            Neither  Both  Original  Lifted
UAI 2014 Inference Competition         25     4       124       0
(Hurley et al. 2016)                  258     7        44       0

The CCG construction algorithm and the MSMP algorithms were implemented in C++ using the Boost graph library (Siek, Lee, and Lumsdaine 2002) and were compiled by gcc 4.9.2 with the "-O3" option. Our experiments were performed on a GNU/Linux workstation with an Intel Xeon processor E3-1240 v3 (8MB Cache, 3.4GHz) and 16GB RAM.

Figure 4.2 shows the qualities of the solutions (total weights) produced by the original MSMP algorithm versus the lifted MSMP algorithm in comparison to the optimal solutions. A significant fraction of the solutions produced by the lifted MSMP algorithm are very close to the optimal solutions. However, on a number of benchmark instances, both MSMP algorithms produced highly suboptimal solutions (in the > 30% suboptimality range). Therefore, Figure 4.3 presents a direct comparison of the qualities of the solutions produced by the two MSMP algorithms.
From this figure, it is evident that the solution qualities of the lifted MSMP algorithm are significantly better than those of the original MSMP algorithm.

Table 4.1 shows the number of benchmark instances on which each MSMP algorithm converged within the time limit. Table 4.2 shows the convergence time and number of iterations for those benchmark instances on which both algorithms converged. Although the original MSMP algorithm converged more frequently and faster, the lifted MSMP algorithm produced better solutions in general. In addition, both MSMP algorithms are anytime and can be easily implemented in distributed settings. Therefore, the comparison of the qualities of the solutions produced is more important than that of the frequency and speed of convergence.

Table 4.2: Shows the number of iterations and running time for each of the benchmark instances on which both MSMP algorithms converged within the time limit of 5 minutes. The column "Benchmark Instance" indicates the name of each benchmark instance. The "U:" and "T:" at the beginning of the names indicate that they are from the UAI 2014 Inference Competition and (Hurley et al. 2016), respectively. The columns "Iterations" and "Running Time" under "The Original MSMP" and "The Lifted MSMP" indicate the number of iterations and running time (in seconds) after which the original MSMP algorithm and the lifted MSMP algorithm converged, respectively. With a few exceptions, the number of iterations and running time for the original MSMP algorithm are in general smaller than those of the lifted MSMP algorithm.

                                     The Original MSMP           The Lifted MSMP
Benchmark Instance                   Iterations  Running Time    Iterations  Running Time
U:PR/relational_2                             5          0.84             9          4.00
U:PR/ra.cnf                                   1          0.35             6          0.34
U:PR/relational_5                             5          1.18             3          0.76
U:PR/Segmentation_12                          9          0.04            44          0.14
T:MRF/Segmentation/4_30_s.binary             31          0.10            60          0.13
T:MRF/Segmentation/2_28_s.binary              9          0.05            44          0.11
T:MRF/Segmentation/18_10_s.binary            15          0.07           102          0.18
T:MRF/Segmentation/12_20_s.binary            31          0.13            50          0.14
T:MRF/Segmentation/11_3_s.binary             47          0.15           176          0.24
T:MRF/Segmentation/1_28_s.binary             35          0.11            60          0.14
T:MRF/Segmentation/3_20_s.binary             31          0.12            54          0.14

Table 4.3: Comparison of suboptimalities of the two MSMP algorithms on small random benchmark instances. Each row represents the experimental results of one group. "Original" indicates the median suboptimality of the solutions produced by the original MSMP algorithm. "Lifted" indicates the median suboptimality of the solutions produced by the lifted MSMP algorithm.

p      Original  Lifted
0.05       0.63    0.00
0.1        0.39    0.08
0.15       0.30    0.21
0.2        0.26    0.23
0.25       0.22    0.21
0.3        0.19    0.20
0.5        0.16    0.16
0.7        0.12    0.13
0.9        0.11    0.10

To further understand the lifted MSMP algorithm, we also experimented on small random benchmark instances. We generated 9 groups of benchmark instances. In each group, we generated 100 Boolean WCSP instances with 50 variables. For every pair of variables, we added a constraint between them with probability p (referred to as the constraint density), which equals 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.7, and 0.9, respectively, in the 9 groups. In each tuple of each constraint, we assigned as the weight an integer randomly chosen between 0 and 100. We set a running time limit of 30 seconds for each benchmark instance.
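The generation of the small random benchmark instances can be sketched in Python as follows. This is an illustrative sketch, not the generator used in the experiments. The function name `random_boolean_wcsp`, the dictionary representation, the inclusive endpoints of the weight range, and the absence of unary constraints are assumptions, since the text specifies only the number of variables, the constraint density, and the weight range.

```python
import itertools
import random


def random_boolean_wcsp(num_vars=50, density=0.1, max_weight=100, seed=None):
    """Generate a random Boolean WCSP instance with binary constraints.

    Returns a dict mapping each constrained pair (i, j), with i < j, to a
    weight table {(x_i, x_j): weight} over all four Boolean tuples.
    """
    rng = random.Random(seed)
    constraints = {}
    for i, j in itertools.combinations(range(num_vars), 2):
        # Add a constraint on this pair with probability `density`.
        if rng.random() < density:
            constraints[(i, j)] = {
                (a, b): rng.randint(0, max_weight)  # integer in [0, max_weight]
                for a in (0, 1)
                for b in (0, 1)
            }
    return constraints
```

For example, `random_boolean_wcsp(density=0.05, seed=0)` produces one instance of the sparsest group. With 50 variables there are 1225 variable pairs, so the expected number of constraints at density p is 1225p.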
Table 4.3 shows the suboptimalities of the solutions produced by the original and lifted MSMP algorithms. On these benchmark instances, the lifted MSMP algorithm in general produced solutions that are closer to optimal than the original MSMP algorithm, especially for p ≤ 0.1. In addition, the lifted MSMP algorithm produced solutions closer to optimal when p is closer to either 0 or 1. On the other hand, the original MSMP algorithm produced solutions closer to optimal as p increases.

Table 4.4 shows our experimental results of comparing the original and lifted MSMP algorithms directly. The lifted MSMP algorithm works better in terms of solution qualities when the constraint density is small, but this difference becomes smaller as the constraint density increases. In terms of running times, the original MSMP algorithm overall has an advantage (for p ≥ 0.1) over the lifted MSMP algorithm due to the fact that the original MSMP algorithm converged on more benchmark instances. This advantage increases as p increases. This is consistent with the experimental results shown in Tables 4.1 and 4.2.

Table 4.4: Comparison of solution qualities and running times of the two MSMP algorithms on small random benchmark instances. In both tables, each row represents the experimental results of one group. (Upper Panel): "Original" indicates the number of benchmark instances on which the original MSMP algorithm won. "Lifted" indicates the number of benchmark instances on which the lifted MSMP algorithm won. "Tie" indicates the number of benchmark instances on which the two MSMP algorithms reached a tie. (Lower Panel): "Original" indicates the number of benchmark instances on which the original MSMP algorithm converged within the running time limit. "Lifted" indicates the number of benchmark instances on which the lifted MSMP algorithm converged within the running time limit.

       Solution Quality            Running Time
p      Original  Lifted  Tie      Original  Lifted  Tie
0.05          0     100    0            24      22   54
0.1           1      99    0            56       4   40
0.15         13      87    5            60       0   40
0.2          19      76    5            58       0   42
0.25         25      54   21            54       0   46
0.3          28      35   37            55       0   45
0.5          28      29   43            44       0   56
0.7          28      25   47            54       0   46
0.9          26      28   46            50       0   50

p      Original  Lifted
0.05         92      89
0.1          67      32
0.15         60       4
0.2          58       0
0.25         54       0
0.3          55       0
0.5          44       0
0.7          54       0
0.9          50       0

4.5 Discussion

Similar to other message passing algorithms, a theoretical analysis of the original and lifted MSMP algorithms is hard to carry out. Therefore, we provide some intuitions that might explain why the lifted MSMP algorithm works better than the original MSMP algorithm.

Direct effects: Equation (4.21) is much simpler than Equations (4.19) and (4.20). First, the lifted MSMP algorithm needs only one equation, instead of two equations, for each message update. Second, the size of each message is only one real number in the lifted MSMP algorithm instead of two real numbers. These significantly increase the efficiency of updating messages.

Structural effects: The CCG provides a much simpler topological structure for message updating. The original MSMP algorithm creates an additional constraint vertex in the factor graph for each constraint, which needs to handle its internal constraint table during message updates. In contrast, the lifted MSMP algorithm creates auxiliary vertices for each constraint as in the CCG, which do not themselves have internal constraint tables. In other words, the lifted MSMP algorithm breaks the complexity of constraint vertices into multiple vertices in a vertex-weighted graph.

4.6 Conclusion

We revived the MSMP algorithm for solving the Boolean WCSP by applying it on its CCG instead of its original form.
We observed not only that the lifted MSMP algorithm produced solutions that are close to optimal for a large fraction of benchmark instances, but also that, in general, it produced significantly better solutions than the original MSMP algorithm. Although the lifted MSMP algorithm requires slightly more work in each iteration since the CCG is constructed using auxiliary variables, the size of the CCG is only linear in the size of the tabular representation of the WCSP (Kumar 2008a; Kumar 2008b; Kumar 2016), and the lifted MSMP algorithm has the benefit of producing better solutions. Finally, we experimentally compared the two MSMP algorithms on small random benchmark instances with different constraint densities. We found that the lifted MSMP algorithm is more advantageous on benchmark instances with smaller constraint densities, and has almost the same effectiveness as the original MSMP algorithm when the constraint density becomes larger.

There are a number of implications of a better MSMP algorithm for the (Boolean) WCSP. (a) In distributed optimization such as DCOPs, the MSMP algorithm is a state-of-the-art algorithm³. While there exist a number of improved variants of the MSMP algorithm such as splitting (Ruozzi and Tatikonda 2013) and damping (Cohen and Zivan 2017), to the best of our knowledge, their changes to the standard MSMP algorithm are relatively minor. In contrast, the idea of applying the MSMP algorithm on the CCG of a given WCSP instance effects a major change to the standard MSMP algorithm. If it can be further proved to be more useful than the standard MSMP algorithm, we will not only create a new direction for improving and understanding the MSMP algorithm, but also potentially advance the MSMP algorithm to a new paradigm. (b) Even in a centralized setting, unlike branch-and-bound search, the MSMP algorithm has the advantage of being able to easily parallelize WCSP solving.
Due to the simplicity of message update rules, they can be further implemented on GPUs as well (Grauer-Gray, Kambhamettu, and Palaniappan 2008). In an era when GPUs are quickly becoming popular, a substantial improvement of the MSMP algorithm can potentially revolutionize solving the WCSP and other optimization problems where the MSMP algorithm is applicable.

³ The MSMP algorithm is referred to as the "Max-sum" algorithm in the DCOP community.

Chapter 5
Integer Linear Programming Encoding of the Boolean WCSP via the CCG

5.1 Introduction

There are many ways to solve a given WCSP instance. The state-of-the-art methods include best-first AND/OR search (Marinescu and Dechter 2007) and branch-and-bound search algorithms that exploit soft arc consistencies (Hurley et al. 2016). Unfortunately, none of these WCSP solvers makes use of the power of integer linear programming (integer LP, ILP) solvers, such as the Gurobi Optimizer (Gurobi Optimization, Inc. 2018) and lp_solve (Berkelaar, Eikland, and Notebaert 2004). ILP solvers are highly optimized and are extensively used for solving problems in operations research. An efficient ILP encoding of the WCSP would therefore create a connection between constraint programming and operations research.

An ILP encoding of the WCSP can be borrowed from the probabilistic reasoning community. Here, the WCSP arises as the maximum a posteriori (MAP) problem.¹ Although this ILP encoding is popularly used in probabilistic reasoning (Koller and Friedman 2009, Section 13.5), it does not scale to large instances since it creates an unwieldy number of variables and constraints.

¹ A MAP problem instance on a probabilistic graphical model, such as a Belief Network, can be formulated as a WCSP instance by taking the negative logarithm of the individual probabilities.

In this chapter, we introduce a new ILP encoding of the WCSP that is based on the idea of the CCG. We refer to this encoding as the CCG-based ILP encoding.
We compare it with the ILP encoding in (Koller and Friedman 2009, Section 13.5) and an improved version thereof for the Boolean WCSP. We refer to these two ILP encodings as the direct and improved direct ILP encodings, respectively. We first derive and compare the theoretical bounds on the number of variables, the number of constraints, and the number of variables in each constraint in the ILPs generated by these three ILP encodings. We show that the CCG-based ILP encoding is more advantageous in terms of these theoretical bounds. In addition, experimentally, we found that the CCG-based ILP encoding is often more efficient than the direct and improved direct ILP encodings. Finally, we establish an important theoretical property of the CCG-based ILP encoding.

5.2 ILP Encodings of the WCSP

In this section, we describe three methods to encode a given WCSP instance as an ILP: the direct ILP encoding, the improved direct ILP encoding, and our proposed CCG-based ILP encoding. For notational convenience, throughout this section, we consider the WCSP instance $\mathcal{B} = \langle \mathcal{X} = \{X_1, X_2, \ldots, X_N\}, \mathcal{D} = \{D(X_1), D(X_2), \ldots, D(X_N)\}, \mathcal{C} = \{C_1, C_2, \ldots, C_M\} \rangle$.

5.2.1 Direct ILP Encoding

For each $C \in \mathcal{C}$ and $a \in A(S(C))$, we introduce an ILP variable $q^C_a$. Here, $A(S(C))$ is the set of all assignments of values to variables in constraint $C$ (therefore $|A(S(C))| = \prod_{X \in S(C)} |D(X)|$). $q^C_a$ is either 0 or 1: If $q^C_a = 1$, then the assignment $a$ to the variables in $C$ is part of the to-be-determined optimal solution $a^*$, i.e., $a^*|_{S(C)} = a$; otherwise it is not. The direct ILP encoding of $\mathcal{B}$ is

\[
\underset{q^C_a :\, q^C_a \in q}{\text{minimize}} \quad \sum_{C \in \mathcal{C}} \sum_{a \in A(S(C))} w^C_a q^C_a \tag{5.1}
\]

\[
\text{s.t.} \quad q^C_a \in \{0, 1\} \quad \forall q^C_a \in q \tag{5.2}
\]

\[
\sum_{a \in A(S(C))} q^C_a = 1 \quad \forall C \in \mathcal{C} \tag{5.3}
\]

\[
\sum_{a \in A(S(C)) :\, a|_{S(C) \cap S(C')} = s} q^C_a = \sum_{a' \in A(S(C')) :\, a'|_{S(C) \cap S(C')} = s} q^{C'}_{a'} \quad \forall C, C' \in \mathcal{C} \text{ and } s \in A(S(C) \cap S(C')), \tag{5.4}
\]

where $q = \{q^C_a \mid C \in \mathcal{C} \wedge a \in A(S(C))\}$, $w^C_a$ denotes the weight of assignment $a$ specified by constraint $C$, and $a|_{S(C) \cap S(C')}$ is the projection of the complete assignment $a$ onto the set of common variables in $C$ and $C'$. The cardinality of $q$ is $\sum_{C \in \mathcal{C}} \prod_{X \in S(C)} |D(X)|$. Here,

Equation (5.2) represents the ILP constraints that enforce the Boolean property for all $q^C_a$'s. It consists of $\sum_{C \in \mathcal{C}} \prod_{X \in S(C)} |D(X)| = O\left(|\mathcal{C}| \hat{D}^{\hat{C}}\right)$ ILP constraints, where $\hat{C} = \max_{C \in \mathcal{C}} |S(C)|$ and $\hat{D} = \max_{X \in \mathcal{X}} |D(X)|$.

Equation (5.3) represents the ILP constraints that enforce a unique assignment of values to variables in each WCSP constraint. It consists of $|\mathcal{C}|$ ILP constraints, each of which has $|A(S(C))| = \prod_{X \in S(C)} |D(X)| = O\left(\hat{D}^{\hat{C}}\right)$ variables.

Equation (5.4) represents the ILP constraints which enforce that every two assignments in two WCSP constraints must be consistent on their shared variables. It consists of $O\left(|\mathcal{C}|^2 \hat{D}^{\hat{C}}\right)$ ILP constraints. Each of these ILP constraints has $O\left(\hat{D}^{\hat{C}}\right)$ variables.

Therefore, if $\mathcal{B}$ is a Boolean WCSP instance, the direct ILP encoding has $|q| = O\left(|\mathcal{C}| \hat{D}^{\hat{C}}\right) = O\left(|\mathcal{C}| 2^{\hat{C}}\right)$ variables and $O\left(|\mathcal{C}|^2 \hat{D}^{\hat{C}}\right) = O\left(|\mathcal{C}|^2 2^{\hat{C}}\right)$ ILP constraints. Each of these ILP constraints has $O\left(\hat{D}^{\hat{C}}\right) = O\left(2^{\hat{C}}\right)$ variables.

5.2.2 Improved Direct ILP Encoding

The improved direct ILP encoding is similar to the direct ILP encoding, except that Equation (5.4) is replaced by

\[
\sum_{a \in A(S(C)) :\, a|_{S(C')} = a'} q^C_a = q^{C'}_{a'} \quad \forall C \in \mathcal{C},\ \forall C' : |S(C')| = 1 \wedge S(C') \subseteq S(C),\ \forall a' \in A(S(C')), \tag{5.5}
\]

with a dummy unary constraint (a constraint that has zero weights in all its tuples) imposed on each variable on which there is no unary constraint.

Similar to Equation (5.4), Equation (5.5) also represents the ILP constraints which enforce that every two assignments in two WCSP constraints must be consistent on their shared variables.
It consists of $O\left(|\mathcal{C}||\mathcal{X}| \hat{D}^{\hat{C}}\right)$ ILP constraints. Each of these ILP constraints has $O\left(\hat{D}^{\hat{C}}\right)$ variables. However, unlike Equation (5.4), which enforces this ILP constraint by considering every pair of WCSP constraints, here, it only considers each WCSP constraint with all its relevant unary WCSP constraints. This improvement effectively reduces the number of ILP constraints from $O\left(|\mathcal{C}|^2 \hat{D}^{\hat{C}}\right)$ to $O\left(|\mathcal{C}||\mathcal{X}| \hat{D}^{\hat{C}}\right)$.

In summary, if $\mathcal{B}$ is a Boolean WCSP instance, the improved direct ILP encoding has $|q| = O\left(|\mathcal{C}| \hat{D}^{\hat{C}}\right) = O\left(|\mathcal{C}| 2^{\hat{C}}\right)$ variables and $O\left(|\mathcal{C}||\mathcal{X}| \hat{D}^{\hat{C}}\right) = O\left(|\mathcal{C}||\mathcal{X}| 2^{\hat{C}}\right)$ ILP constraints. Each of these ILP constraints has $O\left(\hat{D}^{\hat{C}}\right) = O\left(2^{\hat{C}}\right)$ variables.

5.2.3 CCG-Based ILP Encoding

We can encode a WCSP instance as an ILP after transforming it to an equivalent MWVC problem instance on its CCG $G = \langle V, E, w \rangle$. The resulting CCG-based ILP encoding is

\[
\underset{x_i :\, v_i \in V}{\text{minimize}} \quad \sum_{i=1}^{|V|} w_i x_i \tag{5.6}
\]

\[
\text{s.t.} \quad x_i \in \{0, 1\} \quad \forall v_i \in V \tag{5.7}
\]

\[
x_i + x_j \geq 1 \quad \forall (v_i, v_j) \in E, \tag{5.8}
\]

where variable $x_i$ represents the presence of $v_i$ in the MWVC, i.e., $x_i = 1$ and $x_i = 0$ indicate that $v_i$ is and is not in the MWVC, respectively (H. Xu, Kumar, and Koenig 2016).

The numbers of ILP variables and constraints are determined by the CCG. We now assume that $\mathcal{B}$ is a Boolean WCSP instance. We can compute the number of vertices and edges in the CCG by following the CCG construction procedure in (Kumar 2008a). A constraint $C$ can be represented by the multivariate polynomial

\[
\sum_{T \in P(S(C))} \left[ c_T \prod_{X \in T} X \right], \tag{5.9}
\]

where $P(S(C))$ is the power set of $S(C)$ and the $c_T$'s are constants. The CCG gadget corresponding to term $c_T \prod_{X \in T} X$ has at most 2 auxiliary vertices and $O(|T|)$ edges. The CCG gadget corresponding to constraint $C$ has at most $|S(C)|$ variable vertices. Therefore, it has an upper bound of

\[
O\left(|S(C)| + 2|P(S(C))|\right) = O\left(2^{|S(C)|+1}\right) \tag{5.10}
\]

vertices and

\[
O\left( \sum_{T \in P(S(C))} |T| \right) = O\left( \sum_{|T|=0}^{|S(C)|} \binom{|S(C)|}{|T|} |T| \right) = O\left(2^{|S(C)|-1} |S(C)|\right) \tag{5.11}
\]

edges. Therefore, if $\mathcal{B}$ is a Boolean WCSP instance, the CCG has $O\left(|\mathcal{C}| 2^{\hat{C}}\right)$ vertices and $O\left(|\mathcal{C}| 2^{\hat{C}} \hat{C}\right)$ edges constituting the ILP variables (Equation (5.7)) and constraints (Equation (5.8)), respectively, with each of these ILP constraints having at most 2 variables.

Table 5.1: Shows the numbers of variables, constraints, and variables per constraint in the three ILP encodings of Boolean WCSP instance $\mathcal{B} = \langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$.

Encoding                             Direct                               Improved Direct                               CCG-Based
Number of Variables                  $O(|\mathcal{C}| 2^{\hat{C}})$       $O(|\mathcal{C}| 2^{\hat{C}})$                $O(|\mathcal{C}| 2^{\hat{C}})$
Number of Constraints                $O(|\mathcal{C}|^2 2^{\hat{C}})$     $O(|\mathcal{C}||\mathcal{X}| 2^{\hat{C}})$   $O(|\mathcal{C}| 2^{\hat{C}} \hat{C})$
Number of Variables per Constraint   $O(2^{\hat{C}})$                     $O(2^{\hat{C}})$                              2

5.2.4 Comparison

We compare various parameters of the three ILP encodings for the Boolean WCSP in Table 5.1. For any non-trivial Boolean WCSP instances, the CCG-based ILP encoding has a huge advantage over the other two ILP encodings with respect to the number of variables per constraint. This is true even if $\hat{C}$ is bounded because, in the other two ILP encodings, the number of variables in an ILP constraint corresponding to a WCSP constraint $C$ in Equation (5.3) is $2^{|S(C)|} \geq 2$. For the number of constraints, while different values of the parameters lead to different trade-offs, the most interesting real-world applications of the WCSP have a large number $|\mathcal{C}|$ of constraints and a bounded arity $\hat{C}$ of the individual constraints. Under such assumptions, the CCG-based ILP encoding is more advantageous than the other two ILP encodings with respect to the number of constraints as well. The CCG-based ILP encoding has the same asymptotic number of variables as the other two ILP encodings. In general, when $\hat{C}$ is bounded, the CCG-based ILP encoding retains the same order of the number of variables as the other two ILP encodings and significantly wins on the number of constraints and the number of variables per constraint.
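To make the shape of the CCG-based ILP in Equations (5.6) to (5.8) concrete, the following Python sketch builds the objective and the edge constraints for a vertex-weighted graph and solves tiny instances by exhaustive enumeration. It is an illustration only: the experiments in this chapter use the Gurobi Optimizer, whereas the functions `mwvc_ilp` and `solve_by_enumeration` and the plain-data representation of the ILP are hypothetical and are practical only for a handful of vertices.

```python
from itertools import product


def mwvc_ilp(weights, edges):
    """Build the MWVC ILP of Equations (5.6)-(5.8) as plain data.

    Returns (objective, constraints): objective[i] is the coefficient w_i of
    x_i, and each constraint is (coeffs, rhs), meaning
    sum(coeffs[i] * x_i) >= rhs. Every constraint has at most 2 variables.
    """
    objective = list(weights)
    constraints = [({i: 1, j: 1}, 1) for i, j in edges]  # x_i + x_j >= 1
    return objective, constraints


def solve_by_enumeration(objective, constraints):
    """Solve a 0/1 ILP by trying all assignments (tiny instances only)."""
    best = None
    for x in product((0, 1), repeat=len(objective)):
        feasible = all(
            sum(c * x[i] for i, c in coeffs.items()) >= rhs
            for coeffs, rhs in constraints
        )
        if feasible:
            value = sum(w * xi for w, xi in zip(objective, x))
            if best is None or value < best[0]:
                best = (value, x)
    return best
```

For a triangle with vertex weights 1, 2, and 3, `solve_by_enumeration(*mwvc_ilp([1, 2, 3], [(0, 1), (1, 2), (0, 2)]))` selects the two lightest vertices, with a total weight of 3. The ILP has one variable per vertex and one two-variable constraint per edge, which matches the counts for the CCG-based encoding in Table 5.1.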
5.3 Experimental Evaluation

In this section, we experimentally evaluate the efficiencies of solving the Boolean WCSP using the three ILP encodings. We refer to the three algorithms that use these ILP encodings as the direct algorithm, the improved direct algorithm, and the CCG-based algorithm.

We used two sets of Boolean WCSP benchmark instances for our experiments. The first set of benchmark instances is from the UAI 2014 Inference Competition². Here, maximum a posteriori (MAP) inference queries with no evidence on the PR and MMAP benchmark instances can be reformulated as Boolean WCSP instances by first taking the negative logarithms of the probabilities in each factor and then normalizing them. The second set of benchmark instances is from (Hurley et al. 2016)³. This set includes the Probabilistic Inference Challenge 2011, the Computer Vision and Pattern Recognition OpenGM2 benchmark, the Weighted Partial MaxSAT Evaluation 2013, the MaxCSP 2008 Competition, the MiniZinc Challenge 2012 & 2013, and the CFLib (a library of cost function networks). The experiments were performed on those benchmark instances that have only Boolean variables. We set a running time limit of 120 seconds for each algorithm on the first set of benchmark instances and 15 seconds on the second set of benchmark instances.

² http://www.hlt.utdallas.edu/~vgogate/uai14-competition/index.html

We used the Gurobi Optimizer (Gurobi Optimization, Inc. 2018) as the ILP solver. All default settings of the Gurobi Optimizer were kept except that it was configured to use only one CPU thread. The ILP encoding procedures and the CCG construction algorithm were implemented in C++ and were compiled by the GNU Compiler Collection (GCC) 6.3.0 with the "-O3" option. We used the Boost graph library (Siek, Lee, and Lumsdaine 2002) to implement the graph representations and operations. We performed our experiments on a GNU/Linux workstation with an Intel Xeon processor E3-1240 v3 (8MB Cache, 3.4GHz) and 16GB RAM.
Table 5.2 shows the numbers of benchmark instances on which the algorithms terminated within the running time limits. We compare the CCG-based encoding with the other two ILP encodings individually on each set of benchmark instances. In all four comparisons, the number of benchmark instances on which only the CCG-based algorithm terminated is larger, often by a wide margin, than the number of benchmark instances on which only the direct or improved direct algorithm terminated. We also examine the benchmark instances on which both the CCG-based algorithm and the direct algorithm (or the improved direct algorithm) terminated.

³ http://genoweb.toulouse.inra.fr/~degivry/evalgm/

Table 5.2: Shows the number of benchmark instances on which the direct algorithm/improved direct algorithm and the CCG-based algorithm terminated within the running time limits.

(a) CCG-based versus Direct on UAI
Termination Status               Total  CCG-Based Only  Direct Only  Neither  Both
Number of Benchmark Instances      160              23            5       14   118

(b) CCG-based versus Improved Direct on UAI
Termination Status               Total  CCG-Based Only  Improved Direct Only  Neither  Both
Number of Benchmark Instances      160              12                     5       14   129

(c) CCG-based versus Direct on (Hurley et al. 2016)
Termination Status               Total  CCG-Based Only  Direct Only  Neither  Both
Number of Benchmark Instances      510             283            0      173    54

(d) CCG-based versus Improved Direct on (Hurley et al. 2016)
Termination Status               Total  CCG-Based Only  Improved Direct Only  Neither  Both
Number of Benchmark Instances      510             167                    27      146   170

Figure 5.1 compares the efficiencies of the direct, improved direct, and CCG-based algorithms on the benchmark instances on which both algorithms in each comparison (the CCG-based algorithm and the direct or improved direct algorithm) terminated within the running time limits. The two left panels of Figure 5.1 compare the efficiencies of the CCG-based and direct algorithms on the benchmark instances on which both of them terminated within the running time limits.
On the UAI benchmark instances, the CCG-based algorithm was more efficient on 110 benchmark instances (red points), and the direct algorithm was more efficient on 8 benchmark instances (blue points). On the (Hurley et al. 2016) benchmark instances, the CCG-based algorithm was more efficient on 54 benchmark instances (red points), and the direct algorithm was more efficient on no benchmark instance. For both sets of benchmark instances, most red points are far from the dashed diagonal line, meaning that the gap between the running times of the two algorithms was very large for those benchmark instances on which the CCG-based algorithm was more efficient. On the other hand, all blue points are close to the dashed diagonal line, meaning that the direct algorithm only marginally outperformed the CCG-based algorithm on these benchmark instances in terms of running time.

[Figure 5.1 plots: four scatter plots. x-axis: Running Time of the CCG-Based Algorithm. y-axis: Running Time of the Direct Algorithm (panels (a) and (c)) or Running Time of the Improved Direct Algorithm (panels (b) and (d)). Panel (a): CCG-based versus Direct on UAI: 110/8. Panel (b): CCG-based versus Improved Direct on UAI: 36/93. Panel (c): CCG-based versus Direct on (Hurley et al. 2016): 54/0. Panel (d): CCG-based versus Improved Direct on (Hurley et al. 2016): 165/5.]

Figure 5.1: Compares the efficiencies of the direct, improved direct, and CCG-based algorithms on the benchmark instances on which both algorithms, the CCG-based and the direct or improved direct algorithms, terminated within the running time limits. Each point represents a benchmark instance. The x and y coordinates of each point show the running times of the CCG-based and (improved) direct algorithms on its corresponding benchmark instance, respectively. The dashed diagonal line represents equal running times. Points above and below this line are colored red and blue, respectively. Red and blue points represent benchmark instances on which the CCG-based and (improved) direct algorithms terminated more quickly, respectively. The caption of each plot shows the number of red/blue points.

The two right panels of Figure 5.1 compare the efficiencies of the CCG-based and improved direct algorithms on the benchmark instances on which both of them terminated within the running time limit. On the UAI benchmark instances, the CCG-based algorithm was more efficient on 36 benchmark instances (red points), and the improved direct algorithm was more efficient on 93 benchmark instances (blue points). Here, contrary to the theoretical results, experimentally, the CCG-based algorithm was less efficient than the improved direct algorithm. Nevertheless, on the (Hurley et al. 2016) benchmark instances, the CCG-based algorithm was more efficient on 165 benchmark instances (red points), and the improved direct algorithm was more efficient on 5 benchmark instances (blue points). Here, the CCG-based algorithm was significantly more efficient than the improved direct algorithm.

5.4 A Theoretical Property of the CCG-Based ILP Encoding

Since an ILP itself can be interpreted as a WCSP instance with an infinite weight marking the violation of an ILP constraint and unary constraints representing the ILP objective function, the concept of the CCG is well defined for ILPs. It can be constructed in polynomial time for an ILP and can be used to generate the CCG-based ILP encoding of the given ILP. A desirable property of the CCG-based ILP encoding is therefore its ability to preserve the integrality of the vertices of the feasible region of its LP relaxation.

ILPs can be relaxed to LPs by removing all integrality constraints on their variables.
LPs have convex feasible regions and can therefore be solved efficiently (in polynomial time). If the feasible region of the LP relaxation of an ILP has only integer vertices (which is guaranteed, for example, when the ILP has a totally unimodular (TUM) constraint matrix and integral right-hand sides (Sierksma 2001)), an optimal vertex solution of the LP is also an optimal solution of the ILP.

An ILP can be viewed as a WCSP instance as follows. Each ILP constraint translates to a WCSP constraint with weights of values zero or infinity. The ILP objective function translates to a set of unary WCSP constraints. The CCG-based ILP encoding of an ILP produces a new ILP. If the original ILP has only integer vertices in the feasible region of its LP relaxation, it is desirable for the new ILP to also have the same property. This would mean that, if the original ILP is solvable through LP relaxation, the new ILP is also solvable through LP relaxation. In this section, we show that this property of the CCG-based ILP encoding in fact holds for an important subclass of such ILPs, namely, ILPs that model MWVC problem instances on bipartite graphs.

The MWVC problem on a given vertex-weighted graph $G = \langle V, E, w \rangle$ is formulated as an ILP of the same form as Equations (5.6) to (5.8), where we simply associate a 0/1 variable $x_i$ with each vertex $v_i \in V$ of non-negative weight $w_i$ indicating the presence of $v_i$ in the MWVC. If $G$ is bipartite, its constraint matrix is TUM. Therefore, the LP relaxation of this ILP has only integer vertices in its feasible region (Sierksma 2001). We can formulate this ILP as a WCSP instance with the two types of constraints shown in Table 5.3.

(a) The binary constraint that represents the requirement of covering each edge $(v_i, v_j) \in E$

              $x_j = 0$    $x_j = 1$
$x_i = 0$     $+\infty$    0
$x_i = 1$     0            0

(b) The unary constraint for each vertex $v_i$ that represents a term in the objective function of minimizing the total weight of the vertex cover

$x_i$     0    1
Value     0    $w_i$

Table 5.3: Shows the two types of WCSP constraints for the MWVC problem.
Now we show that the CCG created for the MWVC problem on any given bipartite graph is also bipartite, which establishes that the LP relaxation of the CCG-based ILP encoding has only integer vertices in its feasible region. Consider an edge $(v_i, v_j) \in E$. The CCG gadget that represents the constraint of covering this edge involves auxiliary vertices $A$ and $A'$ (Kumar 2008a). The CCG gadget itself has the edges $(v_i, A)$, $(A, A')$, and $(A', v_j)$. If the original graph is bipartite, then each of its vertices can be colored with one of two colors, red and blue, such that every edge connects a red vertex and a blue vertex. Without loss of generality, we assume that $v_i$ is colored red and $v_j$ is colored blue. We then color $A$ blue and $A'$ red. Such a coloring of the vertices ensures that the edges of the CCG gadgets also always connect a red vertex and a blue vertex. This means that the CCG is also bipartite. Hence, we establish the desired property of the CCG-based ILP encoding for the MWVC problem on any given bipartite graph.

5.5 Conclusion

In this chapter, we introduced the CCG-based ILP encoding of the WCSP. We compared it to the direct and improved direct ILP encodings adapted from the probabilistic reasoning community. We showed that the CCG-based ILP encoding has several theoretical advantages over the direct and improved direct ILP encodings. We experimentally showed that the CCG-based ILP encoding was more efficient than the direct ILP encoding. While it was less efficient than the improved direct ILP encoding on the UAI benchmark instances, it was more efficient on the benchmark instances from (Hurley et al. 2016). Finally, we showed that MWVC problem instances on bipartite graphs, whose corresponding ILPs have only integer vertices in the feasible regions of their LP relaxations, preserve this property in their CCG-based ILP encodings as well.

In theory, the CCG-based ILP encoding is asymptotically more efficient than the (improved) direct ILP encoding.
This may be beneficial when growing problem scales in the future require us to solve extremely large WCSP instances: The difference in the asymptotic expressions may be better reflected in practice as the problem sizes increase. Furthermore, branch-and-bound search algorithms for large-scale WCSP instances do not exist yet, whereas ILP solvers have been and will continue to be actively researched in large-scale settings. Therefore, reformulating a large-scale WCSP instance as an ILP and solving it can potentially become a more viable approach than branch-and-bound search for the WCSP.

Furthermore, improving the ILP encoding of the WCSP can have implications for developing branch-and-bound search algorithms for the WCSP. As for the WCSP, the current mainstream algorithms for solving ILPs are based on branch-and-bound search; they have been studied for decades and have a much richer literature than branch-and-bound search for the WCSP. By building a bridge between the ILP and the WCSP, we introduced a new ILP-based perspective on WCSP solving, which can potentially inspire researchers to borrow branch-and-bound search techniques from ILP solving to solve the WCSP.

Chapter 6

Quantum Annealing for the WCSP via the Constraint Composite Graph

6.1 Introduction

Theoretical studies in physics suggest that quantum computers are inherently more efficient than classical computers for some problems, due to the unique features of quantum processes, such as superposition, interference, and entanglement. For example, the integer factorization problem can be solved in polynomial time by Shor's algorithm (Shor 1994) on quantum computers, but is not known to admit an efficient classical algorithm.

Among all types of quantum computer hardware, the quantum annealer is perhaps the most widely used type nowadays due to the commercial availability of its physical realization.
The quantum annealer solves combinatorial optimization problems using a quantum process called quantum annealing. It has been shown that quantum annealing is more advantageous than certain classical algorithms on certain classes of problems (Rieffel and Polak 2014).

In reality, quantum annealing processors have only been built by D-Wave Systems Inc. These so-called "D-Wave processors" (D-Wave Systems Inc. 2017) can only take the Ising problem, equivalent to the quadratic unconstrained binary¹ optimization (QUBO) problem, as their input. Therefore, to solve a combinatorial optimization problem other than the Ising problem, such as the WCSP, a reformulation process on classical computers is required. Such an algorithm is called a hybrid quantum-classical algorithm (HQCA).

Currently, there exist many HQCAs for constraint optimization problems, such as the maximum weighted independent set (MWIS) problem (Choi 2008), the graph partition problem (Hen and Spedalieri 2016), the graph isomorphism problem (Hen and Sarandy 2016), and the set cover problem (Lucas 2014) as well as its generalization (Cao et al. 2016). However, due to the short history of the quantum annealer, HQCAs for constraint optimization problems remain understudied in general. Therefore, developing HQCAs for the WCSP, a very general type of constraint optimization problem, not only facilitates WCSP solving, but also introduces HQCAs to other constraint optimization problems. In addition, HQCAs for the WCSP can enhance branch-and-bound search algorithms for solving combinatorial optimization problems (Tran et al. 2016).

In this chapter, we propose the first three HQCAs for approximately solving the WCSP. One HQCA is specifically for the binary Boolean WCSP based on the polynomial forms of constraints.
The other two are for the general WCSP (where there may exist non-Boolean variables and non-binary constraints), one based on integer linear programming (ILP) and the other based on the CCG (Kumar 2008a; Kumar 2008b; Kumar 2016). We experimentally compare these approaches and show that, while the simple polynomial reformulation works well on the binary Boolean WCSP, the CCG-based HQCA works better than the ILP-based HQCA on the non-binary Boolean WCSP. We note that these HQCAs are still far behind solvers on classical computers in terms of both runtime and solution optimality. Therefore, this chapter serves as a feasibility study on HQCAs for the Boolean WCSP, and we hope that they become more useful as the quantum annealer evolves and that they stimulate future studies in this direction.

¹ Here, binary means Boolean in our terminology.

6.2 Quantum Annealing

Quantum annealing is a physical process that can be used to approximately solve combinatorial optimization problems. Naively, it can be understood as a metaheuristic algorithm that makes use of features of quantum processes, such as superposition, interference, and entanglement. The expected solution optimality of quantum annealing for a given problem can be theoretically analyzed, although the required methods and derivations are beyond the scope of this chapter. In particular, the minimum gap between the energies of the ground state and the first excited state during quantum annealing is indicative of its solution optimality.

In practice, the D-Wave processor, a physical realization (and perhaps the most widely used realization) of the quantum annealer, solves the Ising problem, i.e., computes

$$\operatorname*{arg\,min}_{\mathbf{x} = \langle x_1, \ldots, x_n \rangle} E(\mathbf{x}) = \sum_{x_i \in \mathbf{x}} h_i x_i + \sum_{\{x_i, x_j\} \in \mathcal{J}} J_{ij} x_i x_j, \tag{6.1}$$

where $h_i$ and $J_{ij}$ are input parameters, $\mathbf{x} \in \{-1, +1\}^n$, and $\mathcal{J}$ is a subset of the set of all pairs of variables in $\mathbf{x}$ determined by the D-Wave processor.
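To make the objective in Equation (6.1) concrete, the following Python sketch evaluates the Ising energy of a spin assignment and finds a ground state of a tiny instance by exhaustive enumeration. The parameter values and function names are hypothetical, and the enumeration merely stands in for the annealer; it says nothing about how the D-Wave processor itself operates.

```python
from itertools import product


def ising_energy(x, h, J):
    """E(x) = sum_i h_i x_i + sum_{(i, j) in J} J_ij x_i x_j, with x_i in {-1, +1}."""
    linear = sum(h_i * x_i for h_i, x_i in zip(h, x))
    quadratic = sum(J_ij * x[i] * x[j] for (i, j), J_ij in J.items())
    return linear + quadratic


def brute_force_ground_state(h, J):
    """Enumerate all 2^n spin assignments and return one of minimum energy."""
    return min(product((-1, +1), repeat=len(h)),
               key=lambda x: ising_energy(x, h, J))


# Hypothetical three-spin instance; the keys of J play the role of the pair set.
h = [1.0, -0.5, 0.25]
J = {(0, 1): 1.0, (1, 2): -1.0}

ground = brute_force_ground_state(h, J)
ground_energy = ising_energy(ground, h, J)
print(ground, ground_energy)
```

The dictionary keys restrict the quadratic terms to a chosen set of pairs, mirroring the role of $\mathcal{J}$ in Equation (6.1).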
Variables $\mathbf{x}$ are mapped to qubits in the processor, and parameters $h_i$ and $J_{ij}$ are mapped to the interactions of each qubit with the external field and with every other qubit, respectively.

Binary constraint $C_{ij}$:

| $X_i \backslash X_j$ | 0 | 1 |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 6 |

Polynomial form:

$$E_{C_{ij}}(\{X_i = (x'_i + 1)/2, X_j = (x'_j + 1)/2\}) = c_{-1,-1} + c_{+1,-1} x'_i + c_{-1,+1} x'_j + c_{+1,+1} x'_i x'_j$$

System of linear equations:

$$E_{C_{ij}}(\{X_i = 0, X_j = 0\}) = 1, \quad E_{C_{ij}}(\{X_i = 0, X_j = 1\}) = 3, \quad E_{C_{ij}}(\{X_i = 1, X_j = 0\}) = 2, \quad E_{C_{ij}}(\{X_i = 1, X_j = 1\}) = 6$$

Solution:

$$c_{-1,-1} = 3.0, \quad c_{+1,-1} = 1.0, \quad c_{-1,+1} = 1.5, \quad c_{+1,+1} = 0.5$$

Figure 6.1: Shows the polynomial form of the binary constraint $C_{ij}$ shown on the left. The top-right panel shows the polynomial form, where $c_{-1,-1}$, $c_{+1,-1}$, $c_{-1,+1}$, and $c_{+1,+1}$ are to-be-determined coefficients. The middle-right panel shows the system of linear equations that determines all coefficients. The bottom-right panel shows the coefficients after solving the system of linear equations.

6.3 Polynomial-based HQCA for the Binary Boolean WCSP

We can reduce the binary Boolean WCSP to the Ising problem through the construction of polynomial forms of unary and binary constraints. A unary constraint $C_i$ involving one variable $X_i$ can be rewritten in the polynomial form

$$E_{C_i}(\{X_i = (x'_i + 1)/2\}) = \frac{k_1 + k_0}{2} + \frac{k_1 - k_0}{2} x'_i, \tag{6.2}$$

where $k_0 = E_{C_i}(\{X_i = 0\})$, $k_1 = E_{C_i}(\{X_i = 1\})$, and $x'_i \in \{-1, +1\}$. A binary constraint $C_{ij}$ involving two variables $X_i$ and $X_j$ can be rewritten in the polynomial form

$$E_{C_{ij}}(\{X_i = (x'_i + 1)/2, X_j = (x'_j + 1)/2\}) = c_{-1,-1} + c_{+1,-1} x'_i + c_{-1,+1} x'_j + c_{+1,+1} x'_i x'_j \tag{6.3}$$

by simply solving a system of linear equations, where $x'_i, x'_j \in \{-1, +1\}$. Figure 6.1 illustrates this procedure. Finding an assignment of values to all $x'_i$'s so as to minimize the sum of the polynomial forms of all constraints is an Ising problem that is equivalent to solving the binary Boolean WCSP.

6.4 ILP-based HQCA

As shown in Equations (5.1) to (5.3) and (5.5), we have an ILP encoding of the WCSP improved upon (Koller and Friedman 2009, Section 13.5).
Since a hybrid quantum-classical approach for a special class of ILPs is known (Lucas 2014), this yields a possible HQCA for the WCSP. For notational convenience, in this section, we assume that, for each variable $X \in \mathcal{X}$, there exists a unary constraint $C$ such that $S(C) = \{X\}$.

We adapt the Ising formulation of a special class of ILPs (Lucas 2014) to our case as follows. The Ising formulation is divided into two parts

$$\min_{p^C_a \,:\, p^C_a \in \mathbf{p}} H = \alpha H_\alpha + \beta H_\beta, \tag{6.4}$$

where $p^C_a = 2 q^C_a - 1 \in \{-1, +1\}$ and $\mathbf{p} = \{p^C_a \mid q^C_a \in \mathbf{q}\}$. Here, $H_\alpha$ represents the ILP constraints and $H_\beta$ represents the ILP optimization goal, and $\alpha$ and $\beta$ are positive numbers.

For each ILP constraint, we add a squared term to $H_\alpha$ to represent it. The value of $H_\alpha$ is zero if all constraints are satisfied and positive otherwise:

$$H_\alpha = \sum_{C \in \mathcal{C}} \left[ \sum_{a \in A(S(C))} q^C_a - 1 \right]^2 + \sum_{\substack{C, C' \in \mathcal{C} \,:\, |S(C')| = 1 \,\wedge\, S(C') \subseteq S(C) \\ a' \in A(S(C'))}} \left[ \sum_{a \in A(S(C)) \,:\, a|_{S(C')} = a'} q^C_a - q^{C'}_{a'} \right]^2. \tag{6.5}$$

Here, after polynomial expansion, the quadratic terms $(q^C_a)^2$ can be merged into linear terms due to their Boolean nature, i.e., $c (q^C_a)^2 = c\, q^C_a$.

To characterize the ILP optimization goal (objective function), we have

$$H_\beta = \sum_{C \in \mathcal{C}} \sum_{a \in A(S(C))} w^C_a q^C_a \tag{6.6}$$

and only need to satisfy

$$\frac{\alpha}{\beta} > \sum_{C \in \mathcal{C}} \left[ \sum_{a \in A(S(C))} w^C_a \right]. \tag{6.7}$$

This guarantees that the minimum positive value of $\alpha H_\alpha$ is greater than the maximum value of $\beta H_\beta$, and therefore any assignment leading to a non-zero $H_\alpha$, i.e., violating at least one ILP constraint, cannot be optimal.

Combining Equations (6.4) to (6.7) and making the substitution $q^C_a = (p^C_a + 1)/2$, we have an Ising formulation of the WCSP.

6.5 CCG-Based HQCA

The outline of the CCG-based HQCA is as follows: We first (a) convert the WCSP to the MWVC problem on its CCG, and then (b) approximately solve this MWVC problem using an HQCA as follows.

6.5.1 An HQCA for the MWVC Problem

An Ising formulation of the MWVC problem on a vertex-weighted graph $G = \langle V, E, w \rangle$ is as follows (Choi 2008).
This formulation is adapted from that of the MWIS problem. For each vertex $v_i$, we associate a variable $x_i$ with it. $x_i = 0$ and $x_i = 1$ represent the presence and absence of $v_i$ in the MWVC, respectively. Then the QUBO formulation is to minimize

$$H(x_1, \ldots, x_{|V|}) = -\sum_{v_i \in V} w_i x_i + \sum_{(v_i, v_j) \in E} J_{ij} x_i x_j, \tag{6.8}$$

where $w_i$ is the weight associated with vertex $v_i$, and the $J_{ij}$'s satisfy $\forall (v_i, v_j) \in E : J_{ij} > \min\{w_i, w_j\}$. The minimum of $H(x_1, \ldots, x_{|V|})$ (denoted by $H^*$) is the negative total weight of the MWIS on $G$; that is, $\sum_{v_i} w_i + H^*$ is the total weight of the MWVC on $G$. By making the substitution $x_i = (x'_i + 1)/2$ and $x_j = (x'_j + 1)/2$, where $x'_i, x'_j \in \{-1, +1\}$, we have an Ising formulation:

$$H'(x'_1, \ldots, x'_{|V|}) = -\sum_{v_i \in V} \frac{w_i}{2} (x'_i + 1) + \sum_{(v_i, v_j) \in E} \frac{J_{ij}}{4} (x'_i x'_j + x'_i + x'_j + 1). \tag{6.9}$$

6.6 Experimental Evaluation

We experimentally evaluated the efficiency and effectiveness of these three HQCAs using a D-Wave 2X processor. It is based on a physical lattice of qubits (variables in the Ising problem) and the couplers (coefficients in the Ising problem) that connect them. These qubits and couplers together are called the Chimera graph, as illustrated in Figure 6.2. In a D-Wave 2X processor (or any other currently available D-Wave processor), the Chimera graph is sparse. Therefore, it may not be possible to feed many Ising problem instances with dense connectivity directly into the D-Wave 2X processor. In this case, the process of embedding is necessary, which is to find an equivalent Ising problem instance that can be directly fed into the D-Wave 2X processor. In our experiments, for a proof of concept, we simply used the D-Wave library (D-Wave Systems Inc. 2017) to find embeddings.

Figure 6.2: Shows the Chimera graph in a D-Wave 2X processor. The Chimera graph consists of a lattice of "imperfect" $K_{4,4}$ bipartite graph units. The green dots represent qubits and edges represent couplers.
The red dots represent missing qubits in the $K_{4,4}$ units.

In our experiments, we selected real-world benchmark instances from the industrial weighted partial Max-SAT category of the Eleventh Max-SAT Evaluation² and reformulated them as the Boolean WCSP. We selected benchmark instances that have fewer than 30 variables to accommodate the limited number of qubits of the D-Wave 2X processor. Of these benchmark instances, only two (wcsp/spot5/dir/8.wcsp.dir.wcnf and wcsp/spot5/log/8.wcsp.log.wcnf) satisfy this criterion. The polynomial-based HQCA is not applicable to them because they have non-binary constraints. In addition, our experiments showed that the ILP-based HQCA could not embed any of them within the 5-minute time limit. The solutions produced by the CCG-based HQCA are 96 and 5, respectively, while the optimal solutions for both benchmark instances are 2.

² http://www.maxsat.udl.cat/16/benchmarks/index.html

[Figure 6.3: two histograms. Upper panel: x-axis "(Solution of an Algorithm - Optimal Solution) / Optimal Solution" from 0% to 100%, y-axis "Number of benchmark instances", series "CCG-based" and "Polynomial-based". Lower panel: x-axis "(Solution of CCG - Optimal Solution) / Optimal Solution" from 0% to 100%, y-axis "Number of benchmark instances".]

Figure 6.3: Compares suboptimalities of solutions produced by HQCAs on the two benchmark instance sets. The x-axes show the suboptimalities of the solutions produced by the CCG-based and the polynomial-based HQCAs. The y-axes show the number of benchmark instances in a range of suboptimality. The upper figure compares qualities of solutions produced by the CCG-based and the polynomial-based HQCAs with optimal solutions on the first benchmark instance set with only binary constraints. The polynomial-based HQCA produced optimal solutions on 23 out of 50 benchmark instances and the suboptimalities on all other benchmark instances are less than 10%.
The lower figure compares suboptimalities of solutions produced by the CCG-based HQCA on WCSP benchmark instances from the second benchmark instance set.

Popular real-world benchmark instances, such as those in the Eleventh Max-SAT Evaluation as well as those used in (Hurley et al. 2016), are too large to be embedded into a D-Wave 2X processor. For this reason, we also generated two random Boolean WCSP benchmark instance sets. (HQCAs that work on the D-Wave 2X processor will also work on larger benchmark instances on more advanced quantum annealing processors in the future.) In each benchmark instance in the first benchmark instance set, for every pair of variables, we generated a binary constraint between them with probability $p = 0.1$. We assigned a random integer weight between 0 and 100 to each tuple in these constraints. In each benchmark instance in the second benchmark instance set, we generated both binary and ternary constraints. Binary constraints were generated in the same way as in the first benchmark instance set, except with $p = 0.12$. For every triplet of variables, we also generated a ternary constraint between them with probability 0.0001. The number of variables in all benchmark instances is 50. Given the way the benchmark instances were generated, the average number of constraints that each variable participates in is about 3. We used the functions find_embedding and unembed_answer from the D-Wave Python library to find embeddings and restore solutions to the original benchmark instances, respectively. For find_embedding, we set the timeout limit to 1000 seconds and turned on the fast embedding option, which trades embedding quality for embedding speed. For each benchmark instance, we requested the D-Wave 2X processor to run 1000 times³. For all benchmark instances, we also obtained optimal solutions using toulbar2 (Hurley et al. 2016), a state-of-the-art WCSP solver. The process of solving Ising instances was performed on a D-Wave 2X processor, while other processes, including finding embeddings and unembedding solutions, were performed on a GNU/Linux workstation with an Intel Xeon processor E3-1240 v3 (8MB Cache, 3.4GHz) and 16GB RAM.

³ While this may seem odd for algorithms on classical computers, it is common practice to run the quantum annealing procedure thousands of times (and the runs normally terminate very quickly).

Figure 6.3 compares the qualities of solutions produced by the polynomial-based and CCG-based HQCAs with optimal solutions on the benchmark instances from both benchmark instance sets. The ILP-based HQCA could not embed any benchmark instances into the Chimera graph within the time limit and is thus not shown. The CCG-based HQCA took between 4 and 85 seconds on all benchmark instances. Despite the 1000 runs, the running time of the D-Wave 2X processor on each benchmark instance is within 450 milliseconds. The Ising formulation processes in both HQCAs also cost insignificant amounts of time (within 60 milliseconds). The majority of the time was consumed by the functions find_embedding and unembed_answer.

In fact, if find_embedding and unembed_answer are not required, the efficiency and effectiveness of HQCAs can be outstanding for approximately solving the Boolean WCSP. To verify this, we generated 50 Ising problem instances, which can be seen as special cases of Boolean WCSP instances, by randomly selecting 50% of the edges of the Chimera graph as constraints of Boolean WCSP instances with random integer weights. We used the D-Wave 2X processor and toulbar2 to solve them. We also reformulated them as weighted Max-SAT and solved them using open-wbo (Martins, Manquinho, and Lynce 2014).
The experimental results showed that the quantum annealer produced solutions within 0.4 seconds and that the qualities of these solutions were better than those produced by toulbar2 and open-wbo within a 5-minute time limit.

6.7 Conclusion

In this chapter, we proposed the first three HQCAs for solving the WCSP: the polynomial-based, ILP-based, and CCG-based HQCAs. We evaluated them on the Boolean WCSP using experiments on a D-Wave 2X processor, a physical realization of the quantum annealer. We showed that the polynomial-based HQCA works well on the binary Boolean WCSP, but the CCG-based HQCA is not only more widely applicable, but also works better than the ILP-based HQCA on the general Boolean WCSP (where the polynomial-based HQCA is not applicable). While these HQCAs are still far behind solvers such as toulbar2 on classical computers in terms of both runtime and solution optimality, we hope that these HQCAs become more useful as the quantum annealer evolves, and that they can serve as a starting point for future developments in using the quantum annealer for solving the WCSP.

Chapter 7

Promising Direction: Extending the Concept of the Constraint Composite Graph to the WCSP with Non-Boolean Variables

While we have demonstrated the advantages of the CCG in the previous few chapters, many real-world problems are hard to model efficiently as the Boolean WCSP. Unfortunately, the existing literature has only studied the CCG for the Boolean WCSP. This primarily stems from the fact that it is easy to reduce the Boolean WCSP to the MWVC problem on the CCG: The presence/absence of a vertex in the MWVC is used to represent a Boolean variable.

In this chapter, we extend the concept of the CCG to the WCSP with non-Boolean variables. We first give a formal definition of the CCG for the WCSP with non-Boolean variables. We then review the non-Boolean variable encoding from (Kumar 2008b), which we refer to as the high-degree polynomial-based encoding.
We then propose three new (and more efficient) non-Boolean variable encodings, i.e., the binary number-based encoding, the direct symmetric encoding, and the clique-based encoding. Finally, we present preliminary experimental evidence of the promise of the CCG for the WCSP with non-Boolean variables via quantum annealers.

Constraint $C$:

| $Y_i \backslash Y_j$ | 0 | 1 |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 3 | 6 |
| 2 | 7 | 1 |

Polynomial form:

$$E_C(\{Y_i = y_i, Y_j = y_j\}) = c_{0,0} + c_{1,0} y_i + c_{2,0} y_i^2 + c_{0,1} y_j + c_{1,1} y_i y_j + c_{2,1} y_i^2 y_j$$

System of linear equations:

$$E_C(\{Y_i = 0, Y_j = 0\}) = 1, \quad E_C(\{Y_i = 1, Y_j = 0\}) = 3, \quad E_C(\{Y_i = 2, Y_j = 0\}) = 7,$$
$$E_C(\{Y_i = 0, Y_j = 1\}) = 3, \quad E_C(\{Y_i = 1, Y_j = 1\}) = 6, \quad E_C(\{Y_i = 2, Y_j = 1\}) = 1$$

Solution:

$$c_{0,0} = 1, \quad c_{1,0} = 1, \quad c_{2,0} = 1, \quad c_{0,1} = 2, \quad c_{1,1} = 6, \quad c_{2,1} = -5$$

Figure 7.1: Shows the polynomial form of the constraint $C$ shown on the left. The top-right panel shows the polynomial form, where $c_{0,0}$, $c_{1,0}$, $c_{2,0}$, $c_{0,1}$, $c_{1,1}$, and $c_{2,1}$ are to-be-determined coefficients. The middle-right panel shows the system of linear equations that determines all coefficients. The bottom-right panel shows the coefficients after solving the system of linear equations.

7.1 Formal Definitions

We now define the CCG for the WCSP with non-Boolean variables.

Definition 7.1. A vertex-weighted undirected graph $G = \langle V, E, w \rangle$ is a CCG of a WCSP instance $\langle \mathcal{X}, \mathcal{D}, \mathcal{C} \rangle$ (with non-Boolean variables) if and only if there exists a subset $S \subseteq V$, to whose elements we refer as variable vertices, such that their presences and absences in any VC of $G$ correspond to an assignment of values to variables in the WCSP, and there exists a function $f : \mathbb{R} \to \mathbb{R}$ that maps the minimum possible weight of all VCs respecting these presences and absences to the weight of the corresponding assignment of values to variables in the original WCSP.

7.2 Construction of the CCG for the WCSP with Non-Boolean Variables

In this section, we review and formally describe one non-Boolean variable encoding and propose three new and more efficient non-Boolean variable encodings.

[Figure 7.2 panels: (a) $wY_i$, showing vertices of weights $w_1$ and $w_2$ attached to the variable vertices $v_{Y_i,0}, v_{Y_i,1}, \ldots$; (b) $-w(Y_1 Y_2 Y_3)$, showing the variable vertices $v_{Y_1,0}, v_{Y_1,1}, v_{Y_1,2}, v_{Y_2,0}, v_{Y_2,1}, v_{Y_2,2}, v_{Y_3,0}, v_{Y_3,1}$ connected to 18 auxiliary vertices of weight $w$ labeled with the triplets $(0,0,0)$ through $(2,2,1)$; (c) $w(Y_1 Y_2 Y_3)$, showing the same vertices with additional vertices of weight $L$.]

Figure 7.2: Illustrates the high-degree polynomial-based encoding. In (b) and (c), the variables $Y_1$, $Y_2$, and $Y_3$ are assumed to have domain sizes 4, 4, and 3, respectively. Circles represent variable vertices. Their weights are 0 in (b) and (c) (not explicitly shown). Empty and filled squares represent the auxiliary vertices that encode the coefficients and negation of variables, respectively. The triplet of numbers below each empty square indicates the variable vertices that it connects to.

7.2.1 High-Degree Polynomial-Based Encoding

The high-degree polynomial-based encoding was first proposed in (Kumar 2008b). It uses a high-degree polynomial to represent a constraint with non-Boolean variables (as illustrated in Figure 7.1). In this subsection, we let $d_i$ denote the domain size of a non-Boolean variable $Y_i$. Each non-Boolean variable $Y_i$ is represented by $d_i$ vertices $V_{Y_i} = \{v_{Y_i,0}, v_{Y_i,1}, \ldots, v_{Y_i,d_i-1}\}$, referred to as variable vertices, in the CCG gadgets. The number of these vertices in the computed MWVC indicates the value of $Y_i$.

A linear term $wY_i$, where $w$ may be either positive or negative, can be represented by $d_i - 1$ connected components, where each connected component consists of 2 connected vertices with weights of $w_1$ and $w_2$, respectively, such that $w_1 - w_2 = w$.
The vertices with weight $w_1$ represent $Y_i$. Figure 7.2a illustrates this.

For a negative non-linear term $-w(Y_1 Y_2 \cdots Y_m)$, where $w > 0$, we construct the CCG gadget as follows. We create a bipartite graph. The first partition consists of all and only variable vertices. In the second partition, we add $\prod_{i=1}^{m} (d_i - 1)$ auxiliary vertices with weight $w$, with each of these vertices representing an assignment of values to the variables in the term. Each auxiliary vertex connects to exactly one variable vertex of each variable, namely, the variable vertices that constitute its corresponding assignment. For example, for the term $-w(Y_1 Y_2 Y_3)$, an auxiliary vertex connected to $v_{Y_1,0}$, $v_{Y_2,2}$, and $v_{Y_3,1}$ corresponds to the assignment $\{Y_1 = 0, Y_2 = 2, Y_3 = 1\}$. This CCG gadget represents the term $w \left( \prod_{i=1}^{m} (d_i - 1) - \prod_{i=1}^{m} Y_i \right)$. Figure 7.2b illustrates this. Intuitively, this can be seen as follows. The $i$-th variable leaves a fraction $\frac{Y_i}{d_i - 1}$ of all auxiliary vertices to be potentially excluded from the MWVC, i.e., at least one adjacent edge is already covered. This leaves a fraction $\prod_{i=1}^{m} \frac{Y_i}{d_i - 1}$ of all auxiliary vertices, i.e., $\prod_{i=1}^{m} Y_i$ auxiliary vertices, to be excluded from the MWVC. That is, the total weight of the vertices selected in the MWVC is $w \left( \prod_{i=1}^{m} (d_i - 1) - \prod_{i=1}^{m} Y_i \right)$.

For a positive non-linear term $w (Y_1 Y_2 \cdots Y_m)$, where $w > 0$, we construct the CCG gadget as follows. We first create the CCG gadget as if the coefficient were negative. Then, to accommodate the positive coefficient, we split each edge that is adjacent to any variable vertex of $Y_1$ into two parts by inserting a vertex of a large weight $L$. These newly introduced vertices form a third partition and are meant to represent the negation of the variable $Y_1$. For this reason, $L$ should be chosen such that it is greater than the sum of the weights of the vertices that it connects to, i.e., $L > w \prod_{i=2}^{m} (d_i - 1)$.
This CCG gadget represents $w \left[ \prod_{i=1}^{m} (d_i - 1) - (d_1 - 1 - Y_1) \prod_{i=2}^{m} Y_i \right] + L (d_1 - 1 - Y_1)$, in which the highest-degree term is the non-linear term of interest and the CCG gadgets for lower-degree terms are recursively constructed (Kumar 2008b). An illustration is shown in Figure 7.2c, which represents $w (18 - (3 - Y_1) Y_2 Y_3) + L (3 - Y_1)$.

7.2.2 Binary Number-Based Encoding

For each non-Boolean variable $Y$ with domain size $d$, the binary number-based encoding uses $\lceil \log_2 d \rceil$ Boolean variables $\mathcal{X}_Y = \{X_{Y,1}, X_{Y,2}, \ldots, X_{Y,\lceil \log_2 d \rceil}\}$ to represent it. This converts any constraint involving this variable into a Boolean constraint, i.e., a constraint with only Boolean variables. The binary representation of the value of $Y$ corresponds to the values of these Boolean variables. For example, if $d = 6$, then $Y = 3$ corresponds to $X_{Y,1} = 1$, $X_{Y,2} = 1$, and $X_{Y,3} = 0$. If $\log_2 d$ is not an integer, then some assignments of values to variables in $\mathcal{X}_Y$ are forbidden, since they may represent values larger than what $Y$ can take. Continuing the above example, $\{X_{Y,1} = 1, X_{Y,2} = 1, X_{Y,3} = 1\}$ is forbidden since $Y = 7$ is not allowed. To forbid such assignments, we impose a high weight corresponding to them in the Boolean constraint. The binary number-based encoding is similar to the "log encoding" used in converting the CSP to SAT (Walsh 2000).

7.2.3 Direct Symmetric Encoding

For each non-Boolean variable $Y$ with domain size $d$, the direct symmetric encoding uses $d$ Boolean variables $\mathcal{X}_Y = \{X_{Y,0}, X_{Y,1}, \ldots, X_{Y,d-1}\}$ to represent it. $X_{Y,i} = 1$ and $\forall j \in \{0, 1, \ldots, d - 1\} \setminus \{i\} : X_{Y,j} = 0$ together indicate $Y = i$. All other assignments of values to $X_{Y,0}, X_{Y,1}, \ldots, X_{Y,d-1}$ are forbidden via a global constraint on these $d$ variables. Constraints over non-Boolean variables are converted to constraints over these Boolean variables. This encoding is similar to the "direct encoding" used in converting the CSP to SAT (Walsh 2000).
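The two variable encodings above can be sketched in a few lines of Python. The function names are hypothetical, and the bit order (least significant bit in $X_{Y,1}$) is an assumption inferred from the example $d = 6$, $Y = 3$ in the text; the sketch only illustrates how values map to Boolean assignments and does not build any CCG gadget.

```python
from itertools import product


def num_bits(d):
    """ceil(log2(d)) for a domain size d >= 2, using integer arithmetic only."""
    return (d - 1).bit_length()


def binary_number_encode(y, d):
    """Return (X_{Y,1}, ..., X_{Y,ceil(log2 d)}), least significant bit first."""
    return tuple((y >> k) & 1 for k in range(num_bits(d)))


def binary_number_forbidden(d):
    """Boolean assignments whose binary value lies outside {0, ..., d-1}."""
    return [bits for bits in product((0, 1), repeat=num_bits(d))
            if sum(b << k for k, b in enumerate(bits)) >= d]


def direct_symmetric_encode(y, d):
    """Return (X_{Y,0}, ..., X_{Y,d-1}) with exactly X_{Y,y} set to 1."""
    return tuple(1 if j == y else 0 for j in range(d))


print(binary_number_encode(3, 6))     # assignment for Y = 3 with d = 6
print(binary_number_forbidden(6))     # assignments that must receive a high weight
print(direct_symmetric_encode(3, 6))  # one-hot assignment for Y = 3
```

For $d = 6$, the forbidden list contains the two assignments that would represent $Y = 6$ and $Y = 7$, matching the example in Section 7.2.2.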
7.2.4 Clique-Based Encoding

The clique-based encoding exploits the unique structure of the MWVC problem. To the best of our knowledge, this encoding does not have a counterpart in SAT encodings of the CSP. For each non-Boolean variable $Y$ with domain size $d$, the clique-based encoding uses $(d - 1)$ Boolean variables $\mathcal{X}_Y = \{X_{Y,1}, X_{Y,2}, \ldots, X_{Y,d-1}\}$ to represent it. Similar to the binary number-based and direct symmetric encodings, the clique-based encoding converts any constraint $C$ involving non-Boolean variables into a Boolean constraint $C'$. $Y = 0$ corresponds to all these Boolean variables being equal to 1, and $Y = y$, where $y \in \{1, 2, \ldots, d - 1\}$, corresponds to $X_{Y,y} = 0$ and all other Boolean variables being equal to 1. All other possible assignments of values to variables in $S(C')$ are forbidden. Since they are forbidden for representational reasons, they are referred to as being variable-representationally forbidden. We impose zero weight on variable-representationally forbidden assignments in $C'$, but forbid them with additional edges in the CCG gadget. In particular, in the CCG gadget, for each non-Boolean variable $Y$, we connect every pair of vertices representing Boolean variables in $\mathcal{X}_Y$ to form a clique. It is easy to see that all variable-representationally forbidden assignments of values to variables in $\mathcal{X}_Y$ correspond to only invalid VCs of the CCG gadget, but all other assignments have valid corresponding VCs.

Consider the polynomial form of $C'$ as illustrated in Figure 7.1. Since all variables in $C'$ are Boolean, this polynomial is only multi-linear, i.e., the highest degree of any variable in $C'$ is 1. Furthermore, the polynomial form of $C'$ has only terms with degrees no less than $|S(C')| - |S(C)|$. This significantly simplifies the construction of the CCG gadget, since the procedure to construct the CCG gadget, as shown in (Kumar 2008a), considers each term of the polynomial one at a time.
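The clique argument of this subsection can be checked by enumeration on small domains. The following Python sketch, with hypothetical function names, assumes that a Boolean variable equal to 1 corresponds to its vertex being present in the vertex cover; under that assumption it lists all vertex covers of the clique on $d - 1$ vertices and decodes each of them to a value of $Y$.

```python
from itertools import combinations, product


def clique_edges(k):
    """All edges of the complete graph on vertices 0, ..., k-1."""
    return list(combinations(range(k), 2))


def is_vertex_cover(selected, edges):
    """selected[v] == 1 means vertex v is in the cover."""
    return all(selected[u] == 1 or selected[v] == 1 for u, v in edges)


def valid_clique_covers(d):
    """Vertex covers of the clique that encodes a variable of domain size d."""
    k = d - 1
    edges = clique_edges(k)
    return [s for s in product((0, 1), repeat=k) if is_vertex_cover(s, edges)]


def decode(selected):
    """Map a cover to the value of Y: all present -> 0, only X_{Y,y} absent -> y."""
    absent = [index + 1 for index, bit in enumerate(selected) if bit == 0]
    if not absent:
        return 0
    if len(absent) == 1:
        return absent[0]
    raise ValueError("variable-representationally forbidden assignment")


covers = valid_clique_covers(4)
values = sorted(decode(s) for s in covers)
print(covers, values)
```

Every vertex cover of a clique omits at most one vertex, so the covers are in one-to-one correspondence with the $d$ values of $Y$, and no forbidden assignment survives as a valid cover.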
This property of the polynomial form of $C'$ is further leveraged in the clique-based encoding as follows. While the construction procedure in (Kumar 2008a) is straightforward for linear and negative non-linear terms, for a positive non-linear term $T$, we need to introduce a lower-order term $T'$ by removing a variable from $T$. To minimize the size of the resulting CCG gadget, we always choose the variable to remove according to a preset order on all variables. We create this preset order by (a) fixing an order on all variables in the WCSP instance and on all Boolean variables representing each of them, and (b) concatenating these groups of ordered Boolean variables according to the order of variables in $S(C)$. Compared to arbitrary choices of the variable to remove, this scheme usually decreases the number of introduced lower-order terms.

Comparing Non-Boolean Variable Encodings

We consider a constraint consisting of $n$ variables $\{Y_1, Y_2, \ldots, Y_n\}$ in which in general no two weights are equal and all variables have domain size $d$. For the sake of theoretical analysis, we examine the asymptotic size of the CCG gadget for it with respect to either large $n$ or large $d$ (with $n \geq 2$). The number of vertices, dominated by the number of auxiliary vertices, and the number of edges can be counted as follows (for generality, we assume that lower-order terms introduced during the construction of the CCG gadget do not cancel existing lower-order terms):

High-Degree Polynomial-Based Encoding. The high-degree polynomial has $n$ types of terms, where the $i$-th type, for $i = 1, 2, \ldots, n$, involves $i$ variables. The type of term that involves $i$ variables has $\binom{n}{i}$ combinations of participating variables. For $i$ given variables, there are $(d - 1)^i$ terms, among which each term corresponds to $(d - 1)^i$ auxiliary vertices and $i (d - 1)^i$ edges.
Therefore, the numbers of vertices and edges of the CCG gadget produced by the high-degree polynomial-based encoding are $\Theta\left(\sum_{i=1}^{n} (d-1)^{2i} \binom{n}{i}\right) = \Theta\left(\left((d-1)^2 + 1\right)^n\right)$ and $\Theta\left(\sum_{i=1}^{n} i(d-1)^{2i} \binom{n}{i}\right) = \Theta\left(n(d-1)^2 \left((d-1)^2 + 1\right)^{n-1}\right)$, respectively.

Binary Number-Based Encoding. There are $\lceil \log_2 d \rceil$ Boolean variables representing each variable, and therefore there are $n \lceil \log_2 d \rceil$ variables in total. By enumerating the presence and absence of these variables in each term, we have $\binom{n \lceil \log_2 d \rceil}{i}$ terms that involve $i$ variables, each of which corresponds to $\Theta(1)$ auxiliary vertices and $\Theta(i)$ edges. Therefore, there are $\Theta\left(\sum_{i=1}^{n \lceil \log_2 d \rceil} \binom{n \lceil \log_2 d \rceil}{i}\right) = \Theta\left(\bar{d}^n\right)$ vertices and $\Theta\left(\sum_{i=1}^{n \lceil \log_2 d \rceil} i \binom{n \lceil \log_2 d \rceil}{i}\right) = \Theta\left(n \lceil \log_2 d \rceil \bar{d}^n\right)$ edges in total, where $\bar{d} \ge d$ is the smallest integer such that $\log_2 \bar{d}$ is an integer.

Direct Symmetric Encoding. There are $d$ Boolean variables representing each non-Boolean variable, and each Boolean constraint consists of exactly one Boolean variable representing each non-Boolean variable. Therefore, there are $d^n$ Boolean constraints of arity $n$. Now we consider the worst case, i.e., all these Boolean constraints have positive terms in their polynomial forms. For each of these Boolean constraints, we have $O(n)$ auxiliary vertices and $O(n^2)$ edges. Therefore, we have $O(n d^n)$ vertices and $O(n^2 d^n)$ edges. We note that the number of vertices and edges introduced by the global constraint on the Boolean variables representing each non-Boolean variable can be neglected, since they are only polynomial with respect to $d$. This is because the global constraint can be seen as an exact 1-out-of-$d$ function, which in turn can be converted to $O(d^2)$ binary Boolean constraints (Anthony et al. 2016). This leads to only $O(n d^2)$ auxiliary vertices and edges.

Clique-Based Encoding. There are $(d-1)$ Boolean variables representing each non-Boolean variable, and therefore there are $n(d-1)$ Boolean variables in total. Now we consider the worst case, i.e., all terms have positive coefficients.
We follow the recursive algorithm for introducing lower-order terms described in Section 7.2.4 and, without loss of generality, assume that non-Boolean variables are in the order $Y_n, Y_{n-1}, \ldots, Y_1$. There are $O\left((j+1) d^{i-1}\right)$ terms that consist of exactly $j$ Boolean variables representing $Y_i$, where $1 \le j \le d-1$, and no Boolean variable representing $Y_{i'}$, where $i' > i$. Each term corresponds to $\Theta(1)$ auxiliary vertices and $\Theta\left((i-1)(d-1) + j\right)$ edges. Therefore, the total numbers of vertices and edges equal $O\left(\sum_{i=1}^{n} \sum_{j=1}^{d-1} (j+1) d^{i-1}\right) = O\left(d^{n+1}\right)$ and $O\left(\sum_{i=1}^{n} \sum_{j=1}^{d-1} \left((i-1)(d-1) + j\right)(j+1) d^{i-1}\right) = O\left(n d^{n+2}\right)$, respectively. We note that we neglect the edges connecting Boolean variables representing each non-Boolean variable, since the number $n(d-1)(d-2)/2$ of these edges is far less than $n d^{n+2}$. We also note that, in practice, since it is common that some terms have negative coefficients, the number of vertices and edges can be much lower.

Table 7.1: Compares the sizes of the CCG gadget using different non-Boolean variable encodings. We construct the CCG gadget corresponding to a constraint with $n \ge 2$ variables $\{Y_1, Y_2, \ldots, Y_n\}$ in which, in general, no two weights are equal and all variables have domain size $d$.

| Encoding | Vertices | Edges |
|---|---|---|
| High-Degree Polynomial-Based Encoding | $\Theta\left(\left((d-1)^2 + 1\right)^n\right)$ | $\Theta\left(n(d-1)^2 \left((d-1)^2 + 1\right)^{n-1}\right)$ |
| Binary Number-Based Encoding | $\Theta\left(\bar{d}^n\right) = O\left(2^n d^n\right)$ | $\Theta\left(n \lceil \log_2 d \rceil \bar{d}^n\right) = O\left(n (\log_2 d) 2^n d^n\right)$ |
| Direct Symmetric Encoding | $O\left(n d^n\right)$ | $O\left(n^2 d^n\right)$ |
| Clique-Based Encoding | $O\left(d^{n+1}\right)$ | $O\left(n d^{n+2}\right)$ |

Table 7.2: Shows the most advantageous non-Boolean variable encoding for different bounding on $n$ or $d$ in terms of the asymptotic numbers of vertices and edges of the produced CCG gadgets.

| Bounded | Vertices | Edges |
|---|---|---|
| $n$ but not $d$ | Binary Number-Based; Direct Symmetric | Direct Symmetric |
| $d$ but not $n$ | Clique-Based | Clique-Based |

Table 7.1 summarizes the numbers of vertices and edges in the CCG gadget for each of the four non-Boolean variable encodings.
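The counts above can be checked numerically. The following Python sketch (with hypothetical function names) evaluates the two sums for the high-degree polynomial-based encoding term by term and tabulates the leading-order expressions of Table 7.1 for given $n$ and $d$. Since the table entries are asymptotic, constant factors are dropped, so the numbers indicate growth trends only and are not exact gadget sizes.

```python
from math import comb


def high_degree_sums(n, d):
    """Evaluate the vertex and edge sums of the high-degree encoding term by term."""
    x = (d - 1) ** 2
    vertices = sum(x ** i * comb(n, i) for i in range(1, n + 1))
    edges = sum(i * x ** i * comb(n, i) for i in range(1, n + 1))
    return vertices, edges


def table_7_1(n, d):
    """Leading-order (vertices, edges) expressions of Table 7.1, constants dropped."""
    bits = (d - 1).bit_length()  # equals ceil(log2(d)) for d >= 1, without floats
    d_bar = 2 ** bits            # smallest power of two that is >= d
    x = (d - 1) ** 2
    return {
        "high-degree polynomial": ((x + 1) ** n, n * x * (x + 1) ** (n - 1)),
        "binary number": (d_bar ** n, n * bits * d_bar ** n),
        "direct symmetric": (n * d ** n, n ** 2 * d ** n),
        "clique": (d ** (n + 1), n * d ** (n + 2)),
    }


if __name__ == "__main__":
    for name, (vertices, edges) in table_7_1(3, 4).items():
        print(f"{name}: vertices ~ {vertices}, edges ~ {edges}")
```

For $n = 3$ and $d = 4$, `high_degree_sums` returns 999 vertices and 2700 edges. The vertex count differs from the closed form $((d-1)^2 + 1)^n = 1000$ only by the missing $i = 0$ term, while the edge count matches $n(d-1)^2((d-1)^2 + 1)^{n-1}$ exactly.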
If either $d$ or $n$ is bounded, the high-degree polynomial-based encoding is the least favorable. Table 7.2 shows which of the other three non-Boolean variable encodings is the most advantageous (in terms of asymptotic numbers of vertices and edges of the produced CCG gadgets) under different settings. We see that none of them is always the best: There are trade-offs among them.

The clique-based encoding has another advantage over the binary number-based and direct symmetric encodings: It does not introduce very large weights. This can potentially help avoid numerical accuracy issues in practice.

All these non-Boolean variable encodings except for the direct symmetric encoding reduce to the same encoding for the Boolean WCSP. Despite the numbers being exponential with respect to $n$, they are all polynomial with respect to the actual input size, since the input size of the constraint, i.e., the number of input bits required to represent it, is $\Theta(d^n)$.

Figure 7.3: Compares suboptimalities of solutions produced by the CCG-based HQCA. The x-axis shows the suboptimalities of the solutions produced by the CCG-based HQCA, i.e., (Solution of CCG − Optimal Solution) / Optimal Solution. The y-axis shows the number of benchmark instances in a range of suboptimality.

7.3 Experimental Evaluation of the CCG-Based HQCA

We now repeat the experiments in Chapter 6 with the clique-based encoding enabled in the CCG-based HQCA on a benchmark instance set of WCSP instances with non-Boolean variables. For similar reasons as in Chapter 6, we generated a random WCSP benchmark instance set. In each benchmark instance, the number of variables is 20 and the domain size of each variable is randomly set to be 2 or 3. Constraints are generated in the same way as in Chapter 6, except with $p = 0.2$. Given the way the benchmark instances were generated, the average number of constraints that each variable participates in is about 3.
Figure 7.3 compares the qualities of solutions produced by the CCG-based HQCA with optimal solutions on the benchmark instances. Once again, within the time limit, the ILP-based HQCA could not embed any benchmark instance into the Chimera graph and is thus not shown. The polynomial-based HQCA is not shown since it is not applicable. The CCG-based HQCA successfully embedded 38 out of 50 benchmark instances, taking between 3 and 292 seconds. Since its advantage over the ILP-based HQCA carries over to the non-Boolean WCSP, we believe that the CCG for the non-Boolean WCSP can be promising.

7.4 Conclusion

In this chapter, we extended the concept of the CCG to the WCSP with non-Boolean variables. We reviewed one non-Boolean variable encoding and proposed three new non-Boolean variable encodings. We compared all of them, and we concluded that, in theory, the binary number-based encoding, the direct symmetric encoding, and the clique-based encoding are more advantageous than the high-degree polynomial-based encoding, while there are trade-offs among the former three non-Boolean variable encodings under different settings. In practice, the clique-based encoding can potentially be the most preferable non-Boolean variable encoding due to its better numerical accuracy. Finally, we presented preliminary experimental evidence of the promise of the CCG for the WCSP with non-Boolean variables. We reran the experiments in Chapter 6 on WCSP benchmark instances with non-Boolean variables by introducing the clique-based encoding to the CCG construction procedure.

Chapter 8 Conclusion

8.1 Conclusion of Contributions

The WCSP is a general mathematical framework for COPs. It is known by different names in different research communities, and therefore studying the WCSP brings them together. In this dissertation, we demonstrated that the CCG can be useful in both theory and practice for improving some algorithms for solving the Boolean WCSP.
We also extended the concept of the CCG to the WCSP with non-Boolean variables. This dissertation makes the following contributions:

In Chapter 3, we experimentally studied the effects of enabling the Nemhauser-Trotter reduction (NT reduction) on the Boolean WCSP via the CCG. This leads to a polynomial-time preprocessing algorithm that fixes the optimal values of a subset of variables in a WCSP instance. This subset can often be the set of all variables: We observed that the NT reduction could determine the optimal values of all variables for about 1/8 of the benchmark instances without search. The enabling of the NT reduction can also be potentially meaningful for improving branch-and-bound search for the WCSP if we view the NT reducibility as a kind of implicit local consistency.

In Chapter 4, we experimentally studied the advantages of applying the min-sum message passing (MSMP) algorithm to the CCG of the Boolean WCSP. We observed not only that the lifted MSMP algorithm produced solutions that are close to optimal for a large fraction of benchmark instances, but also that, in general, it produced significantly better solutions than the original MSMP algorithm. Although the lifted MSMP algorithm requires slightly more work in each iteration since the CCG is constructed using auxiliary variables, the size of the CCG is only linear in the size of the tabular representation of the Boolean WCSP (Kumar 2008a; Kumar 2008b; Kumar 2016), and the lifted MSMP algorithm has the benefit of producing better solutions. We experimentally compared the two MSMP algorithms on small random benchmark instances with different constraint densities. We found that the lifted MSMP algorithm is more advantageous on benchmark instances with smaller constraint densities, and has almost the same effectiveness as the original MSMP algorithm when the constraint density becomes larger.
Furthermore, the lifted MSMP algorithm non-trivially alters the standard MSMP algorithm and may inspire, or even directly advance, a new generation of message passing algorithms in the future. In addition, due to the parallel nature of the MSMP algorithm, it has the advantage of being able to make use of GPUs, which is harder for branch-and-bound search.

In Chapter 5, we compared the CCG-based ILP encoding with the direct and improved direct ILP encodings adapted from the probabilistic reasoning community. We showed that the CCG-based ILP encoding has several theoretical advantages over the direct and improved direct ILP encodings. We experimentally showed that the CCG-based ILP encoding was more efficient than the direct ILP encoding. While it is less efficient than the improved direct ILP encoding on the UAI benchmark instances, it is more efficient on the benchmark instances from (Hurley et al. 2016). Finally, we showed that MWVC problem instances on bipartite graphs, whose corresponding ILPs have only integer vertices in the feasible regions of their LP relaxations, preserve this property in their CCG-based ILP encodings as well. Having an efficient ILP encoding for the WCSP may potentially lead to more viable large-scale WCSP solving, since ILP solvers have been actively developed and advanced for decades for large problem instances. Having an efficient ILP encoding for the WCSP may also facilitate the development of branch-and-bound search algorithms for the WCSP by introducing an ILP perspective, since the major class of algorithms for solving the ILP is also based on branch-and-bound search and has been actively advanced for decades.

In Chapter 6, we demonstrated the advantages of solving the Boolean WCSP using the quantum annealer via the CCG. We evaluated the CCG-based HQCA on the Boolean WCSP using experiments on a D-Wave 2X processor, a physical realization of the quantum annealer.
We showed that the polynomial-based HQCA works well on the binary Boolean WCSP, but the CCG-based HQCA is not only more widely applicable, but also works better than the ILP-based HQCA on the general Boolean WCSP (where the polynomial-based HQCA is not applicable). While these HQCAs are still far behind solvers such as toulbar2 on classical computers in terms of both runtime and solution optimality, we hope that these HQCAs become more useful as the quantum annealer evolves, and that they can serve as a starting point for future developments in using the quantum annealer for solving the WCSP.

In Chapter 7, we extended the concept and the construction of the CCG to the WCSP with non-Boolean variables and showed that it can be a potentially promising future direction. We reviewed one non-Boolean variable encoding and proposed three new non-Boolean variable encodings. We compared all of them, and we concluded that, in theory, the binary number-based encoding, the direct symmetric encoding, and the clique-based encoding are more advantageous than the high-degree polynomial-based encoding, while there are trade-offs among the former three non-Boolean variable encodings under different settings. In practice, the clique-based encoding can potentially be the most preferable non-Boolean variable encoding due to its better numerical accuracy. Experimentally, we presented preliminary evidence of the promise of the CCG for the WCSP with non-Boolean variables. We reran the experiments in Chapter 6 on WCSP benchmark instances with non-Boolean variables by introducing the clique-based encoding to the CCG construction procedure. If we are able to present more evidence supporting the usefulness of the CCG for the WCSP with non-Boolean variables, we would have a handy tool for exploiting the structure of the WCSP with non-Boolean variables.
8.2 Further Discussion

We have demonstrated that the CCG can be useful in both theory and practice for improving some algorithms for solving the Boolean WCSP. We discuss answers to the following natural questions:

Is the CCG the holy grail for solving the Boolean WCSP? In terms of the bipartiteness of the CCG, the short answer is no. For example, the maximum matching problem is in P, but the corresponding CCG is not necessarily bipartite and the MWVC problem on it cannot be easily identified as belonging to P.

There exists a polynomial-time factor-2 approximation algorithm for solving the MWVC problem. Does the CCG carry over this approximability property to the WCSP? In general, the CCG does not. The reason lies in the procedure that constructs CCG gadgets. CCG gadgets represent WCSP constraints but usually with an additional constant (such as $w$ in the case of negative nonlinear terms). This additional constant prevents the approximability property from being carried over to the WCSP.

For the CSP, does the CCG have any advantages or disadvantages? Advantages: The CCG can also be used for discovering tractable subclasses of the CSP, and all methods discussed in this dissertation are also applicable. Disadvantages: The CSP is a satisfaction problem. However, using the CCG would effectively cast the problem as an optimization problem, which is in general much harder than a satisfaction problem.

How do the various aforementioned CCG-based algorithms experimentally scale with the problem size? This dissertation does not discuss how the various aforementioned CCG-based algorithms experimentally scale with the problem size. While this looks simple on the surface, we argue that such experiments face multiple difficulties. The availability of real-world WCSP instances with different sizes from the same application domain is quite limited.
Therefore, studying this scalability experimentally on real-world problem instances is quite challenging, and requires systematic compilation of more real-world WCSP instances for meaningful results. When real-world problem instances are unavailable, many researchers would turn to random problem instances. However, this practice has been criticized in experimental algorithm studies in general, because random problem instances usually bear properties that are very different from those of real-world problem instances, and applying a specific algorithm to these problem instances also exhibits different scalabilities. For example, different ways to generate random Boolean satisfiability problem (SAT) instances lead to very different runtimes for each type of algorithm (Balyo and Chrpa 2018). Unless the random problem instances can be generated in a way that simulates real-world problem instances, which by itself commonly requires dedicated research, this disparity between real-world and random problem instances often diminishes the meaningfulness of this practice. This is also one of the major reasons why researchers carefully compile and build benchmarks for various problems.

8.3 Future Work

In addition to Chapter 7, concerning the essence of the CCG, in the future, we can also do the following:

Further experiment with and improve non-Boolean variable encodings. Currently, as shown in Chapter 7, we have only preliminarily experimented with our newly proposed non-Boolean variable encodings. To further study non-Boolean variable encodings, we should perform more thorough experiments on them. In addition, there may be new non-Boolean variable encodings that are more advantageous in one or more aspects.

Explore the usefulness of constructing the CCG recursively, i.e., constructing the CCG of the MWVC problem instance on the CCG associated with a WCSP instance, and so on. From the dissertation, we have already seen that the CCG offers benefits regarding the structure of COPs.
Since the MWVC problem is also a kind of COP, constructing the CCG for the MWVC problem may lead to more interesting results and may potentially introduce new concepts such as a new kind of higher-order structure.

Explore the crown reduction (Chlebík and Chlebíková 2008), another kernelization algorithm for the MWVC problem. In this dissertation, we have only discussed how the CCG helps solve and understand the WCSP via the NT reduction, but other kernelization algorithms for the MWVC problem can also be interesting and are worth further attention.

Develop efficient CCG representations dedicated to more specialized types of COPs such as weighted Max-SAT and weighted Max-Cut problems. The current CCG construction procedure is made for the general mathematical framework of COPs, i.e., the WCSP, and we have already demonstrated its usefulness for it. Therefore, it would be interesting to see whether specific new structure can be understood using the CCG for specialized types of COPs.

For specific algorithms applied to the WCSP, we can further:

Develop a distributed version of the lifted MSMP algorithm using grid/cloud computing facilities. As discussed in Chapter 4, one advantage of the MSMP algorithm is that it can be easily adapted to distributed settings. Therefore, it would be interesting to study and experiment with the lifted MSMP algorithm in real-world distributed settings.

Prove properties of the CCG-based ILP encoding for ILPs with TUM constraint matrices. ILPs with TUM constraint matrices have a number of nice properties, which make them tractable. Therefore, it would be useful to discover tractable subclasses of the WCSP by finding those whose CCG-based ILP encoding leads to ILPs with TUM constraint matrices.

Use our techniques to make ILP-based approaches competitive with other approaches for solving the WCSP.
As discussed in Chapter 5, the ILP-based approaches have several potential advantages over the currently mainstream branch-and-bound search algorithms. Therefore, it would be interesting to study and experiment with the ILP-based approaches in an attempt to demonstrate their true advantages.

Combine the powers of state-of-the-art WCSP solvers on classical computers and the quantum annealer. For solving the WCSP, algorithms on classical computers are often inefficient but have theoretically guaranteed solution qualities, and HQCAs often produce low-quality solutions but are fast. It would be interesting if we could bring the best of both worlds together by combining these two kinds of algorithms.

Theoretically understand HQCAs for the WCSP. While we have already experimentally shown that the CCG-based HQCA has an advantage over other HQCAs, a theoretical study of them is certainly useful for further understanding them.

Bibliography

Alexander, Kenneth S. (1995). “Percolation and Minimal Spanning Forests in Infinite Graphs”. In: The Annals of Probability 23.1, pp. 87–104. doi: 10.1214/aop/1176988378.

Anthony, Martin, Endre Boros, Yves Crama, and Aritanan Gruber (2016). “Quadratization of symmetric pseudo-Boolean functions”. In: Discrete Applied Mathematics 203, pp. 1–12. doi: 10.1016/j.dam.2016.01.001.

Balyo, Tomáš and Lukáš Chrpa (2018). “Using Algorithm Configuration Tools to Generate Hard SAT Benchmarks”. In: Proceedings of the International Symposium on Combinatorial Search, pp. 133–137.

Berkelaar, Michel, Kjell Eikland, and Peter Notebaert (2004). lp_solve 5.5 Open Source (Mixed Integer) Linear Programming Software. url: http://lpsolve.sourceforge.net/5.5/.

Bezuidenhout, Carol, Geoffrey Grimmett, and Armin Löffler (1998). “Percolation and Minimal Spanning Trees”. In: Journal of Statistical Physics 92.1, pp. 1–34. doi: 10.1023/A:1023092317419.

Bistarelli, S., U. Montanari, F. Rossi, T. Schiex, G. Verfaillie, and H. Fargier (1999).
“Semiring-Based CSPs and Valued CSPs: Frameworks, Properties, and Comparison”. In: Constraints 4.3, pp. 199–240. doi: 10.1023/A:1026441215081.

Cai, Shaowei and Jinkun Lin (2016). “Fast Solving Maximum Weight Clique Problem in Massive Graphs”. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 568–574.

Cao, Yudong, Shuxian Jiang, Debbie Perouli, and Sabre Kais (2016). “Solving Set Cover with Pairs Problem using Quantum Annealing”. In: Scientific Reports 6.33957. doi: 10.1038/srep33957.

Chlebík, Miroslav and Janka Chlebíková (2008). “Crown Reductions for the Minimum Weighted Vertex Cover Problem”. In: Discrete Applied Mathematics 156.3, pp. 292–312. doi: 10.1016/j.dam.2007.03.026.

Choi, Vicky (2008). “Minor-embedding in adiabatic quantum computation: I. The parameter setting problem”. In: Quantum Information Processing 7.5, pp. 193–209. doi: 10.1007/s11128-008-0082-9.

Cohen, Liel and Roie Zivan (2017). “Max-sum Revisited; The Real Power of Damping”. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 1505–1507.

Cohen, Liel and Roie Zivan (2018). “Balancing Asymmetry in Max-sum Using Split Constraint Factor Graphs”. In: Proceedings of the International Conference on Principles and Practice of Constraint Programming, pp. 669–687. doi: 10.1007/978-3-319-98334-9_43.

De Simone, C., M. Diehl, M. Jünger, P. Mutzel, G. Reinelt, and G. Rinaldi (1995). “Exact ground states of Ising spin glasses: New experimental results with a branch-and-cut algorithm”. In: Journal of Statistical Physics 80.1, pp. 487–496. doi: 10.1007/BF02178370.

Dechter, Rina (1992). “Constraint Networks”. In: Encyclopedia of Artificial Intelligence, pp. 276–285.

D-Wave Systems Inc. (2017). Developer Guide for Python (09-1024A-F).

Easley, David and Jon Kleinberg (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. isbn: 978-0-52-119533-1.
Fang, Zhiwen, Chu-Min Li, and Ke Xu (2016). “An Exact Algorithm Based on MaxSAT Reasoning for the Maximum Weight Clique Problem”. In: Journal of Artificial Intelligence Research 55, pp. 799–833. doi: 10.1613/jair.4953.

Farinelli, A., A. Rogers, A. Petcu, and N. R. Jennings (2008). “Decentralised Coordination of Low-Power Embedded Devices Using the Max-Sum Algorithm”. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 639–646.

Ferrenberg, Alan M., Jiahao Xu, and David P. Landau (2018). “Pushing the Limits of Monte Carlo Simulations for the Three-Dimensional Ising Model”. In: Physical Review E 97.4, p. 043301. doi: 10.1103/PhysRevE.97.043301.

Gao, Lin, Yan Zeng, and Anguo Dong (2008). “An Ant Colony Algorithm for Solving Max-Cut Problem”. In: Progress in Natural Science 18.9, pp. 1173–1178. url: http://www.sciencedirect.com/science/article/pii/S1002007108002219.

Golosovsky, Michael (2017). “Power-law Citation Distributions are not Scale-Free”. In: Physical Review E 96.3, p. 032306. doi: 10.1103/PhysRevE.96.032306.

Grauer-Gray, Scott, Chandra Kambhamettu, and Kannappan Palaniappan (2008). “GPU Implementation of Belief Propagation Using CUDA for Cloud Tracking and Reconstruction”. In: Proceedings of the IAPR Workshop on Pattern Recognition in Remote Sensing, pp. 1–4. doi: 10.1109/PRRS.2008.4783169.

Gurobi Optimization, Inc. (2018). Gurobi Optimizer Reference Manual. url: http://www.gurobi.com.

Heim, Bettina, Troels F. Rønnow, Sergei V. Isakov, and Matthias Troyer (2015). “Quantum Versus Classical Annealing of Ising Spin Glasses”. In: Science 361.6406, pp. 215–217. doi: 10.1126/science.aaa4170.

Hen, Itay and Marcelo S. Sarandy (2016). “Driver Hamiltonians for Constrained Optimization in Quantum Annealing”. In: Physical Review A 93.6, p. 062312. doi: 10.1103/PhysRevA.93.062312.

Hen, Itay and Federico M. Spedalieri (2016). “Quantum Annealing for Constrained Optimization”. In: Physical Review Applied 5.3, p. 034007.
doi: 10.1103/PhysRevApplied.5.034007.

Hurley, Barry, Barry O'Sullivan, David Allouche, George Katsirelos, Thomas Schiex, Matthias Zytnicki, and Simon de Givry (2016). “Multi-language evaluation of exact solvers in graphical model discrete optimization”. In: Constraints 21.3, pp. 413–434. doi: 10.1007/s10601-016-9245-y.

Kanawati, Rushed (2014). “YASCA: An Ensemble-Based Approach for Community Detection in Complex Networks”. In: Proceedings of the International Conference on Computing and Combinatorics, pp. 657–666.

Kempe, David, Jon Kleinberg, and Eva Tardos (2003). “Maximizing the Spread of Influence Through a Social Network”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146. doi: 10.1145/956750.956769.

Kochenberger, Gary A., Jin-Kao Hao, Zhipeng Lü, Haibo Wang, and Fred Glover (2013). “Solving Large Scale Max Cut Problems via Tabu Search”. In: Journal of Heuristics 19.4, pp. 565–571. doi: 10.1007/s10732-011-9189-8.

Koller, Daphne and Nir Friedman (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press. isbn: 978-0262258357.

Kolmogorov, Vladimir (2005). Primal-dual Algorithm for Convex Markov Random Fields. Tech. rep. MSR-TR-2005-117. Microsoft Research.

Krishnan, Kartik and John E. Mitchell (2006). “A Semidefinite Programming Based Polyhedral Cut and Price Approach for the Maxcut Problem”. In: Computational Optimization and Applications 33.1, pp. 51–71. doi: 10.1007/s10589-005-5958-3.

Kumar, T. K. Satish (2003). “Incremental Computation of Resource-Envelopes in Producer-Consumer Models”. In: Proceedings of the International Conference on Principles and Practice of Constraint Programming, pp. 664–678. doi: 10.1007/978-3-540-45193-8_45.

Kumar, T. K. Satish (2008a). “A Framework for Hybrid Tractability Results in Boolean Weighted Constraint Satisfaction Problems”. In: Proceedings of the International Conference on Principles and Practice of Constraint Programming, pp.
282–297. doi: 10.1007/978-3-540-85958-1_19.

Kumar, T. K. Satish (2008b). “Lifting Techniques for Weighted Constraint Satisfaction Problems”. In: Proceedings of the International Symposium on Artificial Intelligence and Mathematics.

Kumar, T. K. Satish (2016). “Kernelization, Generation of Bounds, and the Scope of Incremental Computation for Weighted Constraint Satisfaction Problems”. In: Proceedings of the International Symposium on Artificial Intelligence and Mathematics.

Larrosa, Javier and Thomas Schiex (2004). “Solving weighted CSP by maintaining arc consistency”. In: Artificial Intelligence 159.1, pp. 1–26. doi: 10.1016/j.artint.2004.05.004.

Lucas, Andrew (2014). “Ising formulations of many NP problems”. In: Frontiers in Physics 2, p. 5. doi: 10.3389/fphy.2014.00005.

Marinescu, Radu and Rina Dechter (2006). “Dynamic Orderings for AND/OR Branch-and-Bound Search in Graphical Models”. In: Proceedings of the European Conference on Artificial Intelligence, pp. 138–142.

Marinescu, Radu and Rina Dechter (2007). “Best-First AND/OR Search for Graphical Models”. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1171–1176.

Martins, Ruben, Vasco Manquinho, and Inês Lynce (2014). “Open-WBO: A Modular MaxSAT Solver”. In: Proceedings of the International Conference on Theory and Applications of Satisfiability Testing, pp. 438–445. doi: 10.1007/978-3-319-09284-3_33.

Mézard, Marc and Andrea Montanari (2009). Information, Physics, and Computation. Oxford University Press. isbn: 978-0-19-857083-7.

Mézard, Marc and Riccardo Zecchina (2002). “Random K-satisfiability problem: From an analytic solution to an efficient algorithm”. In: Physical Review E 66.5, p. 056126. doi: 10.1103/PhysRevE.66.056126.

Moallemi, C. C. and B. Van Roy (2010). “Convergence of Min-Sum Message-Passing for Convex Optimization”. In: IEEE Transactions on Information Theory 56.4, pp. 2041–2050. doi: 10.1109/TIT.2010.2040863.
Montanari, Andrea, Federico Ricci-Tersenghi, and Guilhem Semerjian (2007). “Solving Constraint Satisfaction Problems through Belief Propagation-guided decimation”. In: Proceedings of the Annual Allerton Conference, pp. 352–359.

Narodytska, Nina and Fahiem Bacchus (2014). “Maximum Satisfiability Using Core-Guided MaxSAT Resolution”. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2717–2723.

Nemhauser, G. L. and L. E. Trotter (1975). “Vertex packings: Structural properties and algorithms”. In: Mathematical Programming 8.1, pp. 232–248. doi: 10.1007/BF01580444.

Niskanen, Sampo and Patric R. J. Östergård (2003). Cliquer User's Guide, Version 1.0. Tech. rep. T48. Communications Laboratory, Helsinki University of Technology, Espoo, Finland.

Rendl, Franz, Giovanni Rinaldi, and Angelika Wiegele (2008). “Solving Max-Cut to Optimality by Intersecting Semidefinite and Polyhedral Relaxations”. In: Mathematical Programming 121.2, pp. 307–335. doi: 10.1007/s10107-008-0235-8.

Rieffel, Eleanor G. and Wolfgang H. Polak (2014). Quantum Computing: A Gentle Introduction. MIT Press. isbn: 978-0262526678.

Ruozzi, Nicholas and Sekhar Tatikonda (2013). “Message-Passing Algorithms: Reparameterizations and Splittings”. In: IEEE Transactions on Information Theory 59.9, pp. 5860–5881. doi: 10.1109/TIT.2013.2259576.

Saikko, Paul, Jeremias Berg, and Matti Järvisalo (2016). “LMHS: A SAT-IP Hybrid MaxSAT Solver”. In: Proceedings of the International Conference on Theory and Applications of Satisfiability Testing, pp. 539–546. doi: 10.1007/978-3-319-40970-2_34.

Shimizu, Satoshi, Kazuaki Yamaguchi, Toshiki Saitoh, and Sumio Masuda (2017). “Fast Maximum Weight Clique Extraction Algorithm: Optimal Tables for Branch-and-Bound”. In: Discrete Applied Mathematics 223, pp. 120–134. doi: 10.1016/j.dam.2017.01.026.

Shor, Peter W. (1994). “Algorithms for Quantum Computation: Discrete Logarithms and Factoring”.
In: Proceedings of the Annual Symposium on Foundations of Computer Science, pp. 124–134. doi: 10.1109/SFCS.1994.365700.

Siek, Jeremy, Lie-Quan Lee, and Andrew Lumsdaine (2002). The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley. isbn: 978-0201729146.

Sierksma, Gerard (2001). Linear and Integer Programming: Theory and Practice. 2nd ed. CRC Press. isbn: 978-0824706739.

Tran, Tony T., Minh Do, Eleanor G. Rieffel, Jeremy Frank, Zhihui Wang, Bryan O'Gorman, Davide Venturelli, and J. Christopher Beck (2016). “A Hybrid Quantum-Classical Approach to Solving Scheduling Problems”. In: Proceedings of the International Symposium on Combinatorial Search, pp. 98–106.

Walsh, Toby (2000). “SAT v CSP”. In: Proceedings of the International Conference on Principles and Practice of Constraint Programming, pp. 441–456. doi: 10.1007/3-540-45349-0_32.

Wauters, Matteo M., Rosario Fazio, Hidetoshi Nishimori, and Giuseppe E. Santoro (2017). “Direct Comparison of Quantum and Simulated Annealing on a Fully Connected Ising Ferromagnet”. In: Physical Review A 96.2, p. 022326. doi: 10.1103/PhysRevA.96.022326.

Weigt, Martin and Haijun Zhou (2006). “Message passing for vertex covers”. In: Physical Review E 74.4, p. 046110. doi: 10.1103/PhysRevE.74.046110.

Wu, Xin-Zeng, Peter G. Fennell, Allon G. Percus, and Kristina Lerman (2018). “Degree Correlations Amplify the Growth of Cascades in Networks”. In: Physical Review E 98.2, p. 022321. doi: 10.1103/PhysRevE.98.022321.

Xu, Hong, T. K. Satish Kumar, and Sven Koenig (2016). “A New Solver for the Minimum Weighted Vertex Cover Problem”. In: Proceedings of the International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming, pp. 392–405. doi: 10.1007/978-3-319-33954-2_28.

Yedidia, Jonathan S., William T. Freeman, and Yair Weiss (2003). “Understanding belief propagation and its generalizations”. In: Exploring Artificial Intelligence in the New Millennium 8, pp. 239–269.
Yeoh, William, Ariel Felner, and Sven Koenig (2010). \BnB-ADOPT: An Asyn- chronous Branch-and-Bound DCOP Algorithm". In: Journal of Articial Intel- ligence Research 38, pp. 85{133. doi: 10.1613/jair.2849. Zytnicki, Matthias, Christine Gaspin, and Thomas Schiex (2008). \DARN! A Weighted Constraint Solver for RNA Motif Localization". In: Constraints 13.1, pp. 91{109. doi: 10.1007/s10601-007-9033-9. 113
Abstract
What is “structure”? And how can we exploit it in combinatorial optimization? These are the fundamental questions addressed in this dissertation for many reasoning tasks on complex physical and non-physical systems.

Reasoning tasks involving system design, state estimation, and prediction can be cast as combinatorial optimization problems (COPs). Traditionally, different kinds of COPs have been solved using dedicated algorithms. While such algorithms are certainly valuable, they have some important drawbacks. First, algorithms developed for very specific subclasses are not applicable to real-world instances that do not belong to these subclasses. Second, different research communities working on very specific COPs may be unaware of each other’s work and end up developing different terminologies and techniques for solving the same problem. Therefore, a general mathematical framework that captures a wide variety of COPs facilitates awareness across research communities, cross-fertilization of different perspectives, and wider applicability to real-world domains.

The weighted constraint satisfaction problem (WCSP) is a general mathematical framework for COPs. It not only subsumes important COPs studied in many different research communities but also has a strong representational power useful for reasoning about complex physical and non-physical systems. Such systems include classical spin glass systems, percolation theory-characterized systems, and social networks, among many other examples.

Yes, the WCSP is representationally very powerful. But how can we design algorithms for solving it efficiently? Isn’t the very generality of the WCSP a curse? Are we up against a very general and intractable problem? While these questions are certainly valid for the general WCSP, the proposal in this dissertation is to exploit “structure”. Imagine two subclasses of COPs, Class A and Class B. If Class B is more general than Class A, it is likely that we can build specialized algorithms for solving instances from Class A more efficiently. But the hallmark of a good algorithm for solving instances from Class B is its ability to imitate the specialized algorithm if the input is in fact from Class A. Such an algorithm is said to exploit “structure”. Although the formal definition of “structure” is elusive, the idea is to create a general-purpose algorithm for solving the WCSP that automatically simulates more specialized algorithms for subclasses of the WCSP.

We can talk about two types of structure in the WCSP. The macro structure, or graphical structure, represents which variables interact with each other. The micro structure, or numerical structure, represents how the variables interact with each other. Two completely different schools of thought have led to algorithmic techniques that exploit either the macro structure or the micro structure, but not both simultaneously. In 2008, the quest for a unifying mathematical framework that represents the macro structure as well as the micro structure of a WCSP was settled by the novel idea of the constraint composite graph (CCG). The CCG of a WCSP instance is an undirected graph that uses the same variables as the WCSP instance and an auxiliary set of variables to capture structure. Solving the WCSP is equivalent to solving the minimum weighted vertex cover (MWVC) problem on its associated CCG. Although the CCG can be constructed very efficiently, not much work was done until now in exploiting this transformation for theoretical or practical gains.

In this dissertation, we raise and answer three research questions: Are there any theoretical advantages of the CCG other than for identifying tractable classes of the WCSP? Is there any practical usefulness of the CCG? Is it promising to extend the CCG to the WCSP with non-Boolean variables? We answer all three questions affirmatively. We answer the first question by proving new theoretical properties of the CCG. We answer the second question by efficiently implementing the CCG construction procedure and conducting experiments. We answer the third question by proposing new encodings for non-Boolean variables and providing preliminary evidence of their promise.

On the one hand, the generality of the WCSP is intended to make it widely applicable and bring together researchers from different research communities. On the other hand, our theory of the CCG reduces it to a very specific COP, i.e., the MWVC problem. This transformation not only holds the remarkable promise of a general-purpose algorithm that can exploit structure in the WCSP but also emphasizes the importance of the MWVC problem as a substrate COP. Specifically, in this dissertation, we show how the CCG-based transformation can be used to: (a) kernelize a WCSP instance, i.e., fix the optimal values of a subset of its variables using a maxflow procedure even before search is initiated, (b) improve the efficiency of the min-sum message passing algorithm, (c) make use of integer linear programming (ILP) solvers, and (d) solve COPs on quantum annealers more effectively. In addition, because our algorithms solve general COPs in the WCSP framework more efficiently on classical computers, we provide better baselines for comparison against quantum computers, whose true efficiency over classical computers is still debated.
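As a small illustration of the two problems named in the abstract, the following Python sketch encodes an MWVC instance as a Boolean WCSP and checks by brute force that both formulations have the same optimum. This is only the easy direction of the relationship (every MWVC instance is a Boolean WCSP); it is not the CCG construction, which goes from a WCSP to an MWVC instance and is not reproduced here. All function names, the constraint representation, and the example graph are assumptions made for illustration.

```python
from itertools import product

INF = float("inf")


def solve_wcsp(num_vars, constraints):
    """Brute-force a Boolean WCSP.

    constraints: list of (scope, table), where scope is a tuple of variable
    indices and table maps each 0/1 tuple over the scope to a cost.
    """
    best_cost, best_assignment = INF, None
    for assignment in product((0, 1), repeat=num_vars):
        total = sum(
            table[tuple(assignment[v] for v in scope)]
            for scope, table in constraints
        )
        if total < best_cost:
            best_cost, best_assignment = total, assignment
    return best_cost, best_assignment


def solve_mwvc(weights, edges):
    """Brute-force a minimum weighted vertex cover (1 = vertex in cover)."""
    best_cost, best_cover = INF, None
    for chosen in product((0, 1), repeat=len(weights)):
        if all(chosen[u] or chosen[v] for u, v in edges):
            cost = sum(w for w, c in zip(weights, chosen) if c)
            if cost < best_cost:
                best_cost, best_cover = cost, chosen
    return best_cost, best_cover


def mwvc_as_wcsp(weights, edges):
    """Encode MWVC as a Boolean WCSP (illustrative encoding, not the CCG).

    Unary constraints charge a vertex's weight when it is selected; binary
    constraints assign infinite cost to an edge with neither endpoint selected.
    """
    constraints = [((i,), {(0,): 0, (1,): w}) for i, w in enumerate(weights)]
    for u, v in edges:
        table = {(0, 0): INF, (0, 1): 0, (1, 0): 0, (1, 1): 0}
        constraints.append(((u, v), table))
    return constraints


# Hypothetical example: a path 0 - 1 - 2 - 3 with vertex weights 3, 1, 2, 4.
weights = [3, 1, 2, 4]
edges = [(0, 1), (1, 2), (2, 3)]

mwvc_result = solve_mwvc(weights, edges)
wcsp_result = solve_wcsp(len(weights), mwvc_as_wcsp(weights, edges))
print(mwvc_result, wcsp_result)
```

Both solvers enumerate all 2^n assignments, so the sketch is only suitable for tiny instances; it is meant to make the problem definitions concrete, not to reflect the algorithms developed in the dissertation.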
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Advancements in understanding the empirical hardness of the multi-agent pathfinding problem
Speeding up path planning on state lattices and grid graphs by exploiting freespace structure
Understanding physical quantum annealers: an investigation of novel schedules
Analysis and algorithms for distinguishable RNA secondary structures
Imposing classical symmetries on quantum operators with applications to optimization
Target assignment and path planning for navigation tasks with teams of agents
Plasmons in quantum materials
Neural network for molecular dynamics simulation and design of 2D materials
Topological protection of quantum coherence in a dissipative, disordered environment
Improving decision-making in search algorithms for combinatorial optimization with machine learning
Error correction and quantumness testing of quantum annealing devices
Advancing the state of the art in quantum many-body physics simulations: Permutation Matrix Representation Quantum Monte Carlo and its Applications
Speeding up multi-objective search algorithms
Efficient and effective techniques for large-scale multi-agent path finding
Architecture design and algorithmic optimizations for accelerating graph analytics on FPGA
Explorations in the use of quantum annealers for machine learning
Algorithms and landscape analysis for generative and adversarial learning
Applications in optical communications: quantum communication systems and optical nonlinear device
Essays on revenue management with choice modeling
Out-of-equilibrium dynamics of inhomogeneous quantum systems
Asset Metadata
Creator
Xu, Hong (author)
Core Title
Exploiting structure in the Boolean weighted constraint satisfaction problem: a constraint composite graph-based approach
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Physics
Publication Date
02/15/2019
Defense Date
12/03/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
combinatorial optimization, constraint composite graph, OAI-PMH Harvest, quantum annealing
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Koenig, Sven (committee chair), Kumar, T. K. Satish (committee chair), Haas, Stephan (committee member), Hen, Itay (committee member), Nakano, Aiichiro (committee member)
Creator Email
hongx@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-122601
Unique identifier
UC11675799
Identifier
etd-XuHong-7076.pdf (filename), usctheses-c89-122601 (legacy record id)
Legacy Identifier
etd-XuHong-7076.pdf
Dmrecord
122601
Document Type
Dissertation
Rights
Xu, Hong
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
combinatorial optimization
constraint composite graph
quantum annealing