LOCAL OPTIMIZATION IN COOPERATIVE AGENT NETWORKS

by

Jonathan P. Pearce

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2007

Copyright 2007 Jonathan P. Pearce

Acknowledgments

I would like to thank my advisor, Milind Tambe, for his constant support and encouragement in all aspects of my research and career. I also thank my collaborator, Rajiv Maheswaran, for helping to guide me in my first two years at USC and for many helpful discussions and ideas about k-optimality. I thank Victor Lesser, Fernando Ordóñez, Sven Koenig, and Sarit Kraus, both for their insightful thoughts and discussions on the work in this thesis and for their support in my career. I also would like to thank the late Jay Modi for helpful discussions on this research and on navigating graduate school. Thanks also go to Gal Kaminka, Bhaskar Krishnamachari, Paul Scerri, and Makoto Yokoo for helpful insights on the material in this thesis. I also thank Amos Freedy and Gershon Weltman of Perceptronics Solutions, Inc. very much for their help in supporting this work and obtaining real-world data.

I also thank members and alumni of the TEAMCORE research group for their support and for being good friends: Emma Bowring, Tapana Gupta, Hyuckchul Jung, Janusz Marecki, Ranjit Nair, Praveen Paruchuri, Nathan Schurr, Zvi Topol, and Pradeep Varakantham. Additional thanks are due to Zvi Topol for his assistance in implementing the MGM-3 algorithm.

I thank my parents, brothers, and sister for supporting my graduate school adventure, and finally, I thank my wife Liang for all the help and support she has given me during this time.

Contents

Acknowledgments
List of Figures
List of Algorithms
Abstract

Chapter 1: Introduction and Related Work
  1.1 Introduction
  1.2 Related Work
    1.2.1 Constraint Reasoning
    1.2.2 Distributed Markov Decision Processes
    1.2.3 Local Search and Game Theory
    1.2.4 Local Search in Combinatorial Optimization
    1.2.5 Multi-Linked Contracts between Self-Interested Agents
  1.3 Guide

Chapter 2: k-Optimality and Distributed Constraint Optimization
  2.1 The Distributed Constraint Optimization Problem (DCOP)
  2.2 Properties of k-Optimal DCOP Solutions

Chapter 3: Lower Bounds on Solution Quality for k-optima
  3.1 Quality Guarantees on k-Optima
  3.2 Graph-Based Quality Guarantees
  3.3 Quality Guarantees for DCOPs with Hard Constraints
  3.4 Domination Analysis of k-Optima
  3.5 Experimental Results

Chapter 4: Upper Bounds on the Number of k-Optima in a DCOP
  4.1 Upper Bounds on k-Optima
  4.2 Graph-Based Analysis: k-Optima
  4.3 Application to Nash Equilibria
  4.4 Graph-Based Bounds
  4.5 Experimental Results

Chapter 5: Algorithms
  5.1 1-Optimal Algorithms
  5.2 2-Optimal Algorithms
  5.3 3-Optimal Algorithms
  5.4 Experiments
    5.4.1 Medium-Sized DCOPs
    5.4.2 Large DCOPs

Chapter 6: k-Optimality and Team Formation
  6.1 Problem Formulation
  6.2 Sample Domain 1: Task Force for Homeland Security Exercise
    6.2.1 Team Roles
    6.2.2 Data Sources
    6.2.3 DCOP Constraints
      6.2.3.1 Soft Unary Constraints
      6.2.3.2 Soft Binary Constraints
      6.2.3.3 Hard Constraints
      6.2.3.4 Weighting the Constraints
    6.2.4 Formation of k-Optimal Teams
  6.3 Sample Domain 2: Second Homeland Security Exercise

Chapter 7: Other Work
  7.1 Privacy in DCOPs
  7.2 Solving DCOPs Efficiently

Chapter 8: Conclusions and Future Work
  8.1 Conclusions
  8.2 Future Work

Bibliography

List of Figures

2.1 DCOP example
2.2 1-optima vs. assignment sets chosen using other metrics
3.1 Quality guarantees for k-optima with respect to the global optimum for DCOPs of various graph structures
3.2 Quality guarantees for k-optima in DCOPs containing hard constraints
3.3 Quality guarantees for k-optima in a multiply-constrained ring DCOP
3.4 Quality guarantees for k-optima with respect to the space of dominated assignments for various graph structures
4.1 Hypothetical example illustrating the advantages of tighter bounds
4.2 A visual representation of the effect of Proposition 8
4.3 Exclusivity graphs for 1-optima for Example 1 with MIS shown in gray, (a) not using Proposition 8 and (b) using it
4.4 Computation of β_SRP for Example 1
4.5 1-optima vs. assignment sets chosen using other metrics
4.6 β_SRP vs. β_HSP for DCOP graphs from Figure 2.2
4.7 Comparisons of β_SRP vs. β_HSP
4.8 Comparisons of β_SRP, β_HSP, β_FCLIQUE
4.9 Improvement of β_MH on min{β_H, β_S, β_P}
5.1 Sample trajectories of MGM and DSA for a high-stakes scenario
5.2 Comparison of the performance of MGM and DSA
5.3 Comparison of the performance of MGM and MGM-2
5.4 Comparison of the performance of DSA and SCA-2
5.5 Comparison of the performance of MGM, MGM-2, and MGM-3
5.6 Comparison of the performance of DSA, SCA-2, and SCA-3
5.7 Distributions of solution quality for X, X_E, X_2E and cardinality of X_2E as a proportion of X_E
5.8 Results for MGM and MGM-3 for large DCOPs: graph coloring
5.9 Results for MGM and MGM-3 for large DCOPs: random rewards
6.1 Tabulated results of Experience section of Check-In Form
6.2 2-optimal teams formed according to various criteria
6.3 2-optimal teams
8.1 A depiction of three key regions in the assignment space, given a k-optimum, that allowed for the discovery of various theoretical properties about k-optima
8.2 Graphical game example

List of Algorithms

1 Algorithm for Symmetric Region Packing (SRP) bound
2 DSA(myNeighbors, myValue)
3 MGM(myNeighbors, myValue)
4 MGM2(myNeighbors, myConstraints, myValue)
5 MGM3(myNeighbors, myConstraints, myValue)

Abstract

My research focuses on constructing and analyzing systems of intelligent, autonomous agents. These agents may include people, physical robots, or software programs acting as assistants, teammates, opponents, or trading partners. In a large class of multi-agent scenarios, the effect of local interactions between agents can be compactly represented as a network structure, such as a distributed constraint optimization problem (DCOP) for cooperative domains. Collaboration between large groups of agents, given such a network, can be difficult to achieve; often agents can only manage to collaborate in smaller subgroups of a certain size in order to find a workable solution in a timely manner. The goal of my thesis is to provide algorithms to enable networks of agents that are bounded in this way to quickly find high-quality solutions, as well as theoretical results to understand key properties of these solutions. Relevant domains for my work include personal assistant agents, sensor networks, and teams of autonomous robots.
In particular, this thesis considers the case in which agents optimize a DCOP by forming groups of one or more agents until no group of k or fewer agents can possibly improve the solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. In this document, I present four key contributions related to k-optimality. The first set of results consists of worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results consists of upper bounds on the number of k-optima that can exist in a DCOP. Because each joint action consumes resources, knowing the maximal number of k-optimal joint actions that could exist for a given DCOP allows us to allocate sufficient resources for a given level of k or, alternatively, to choose an appropriate level of k-optimality given fixed resources. The third contribution is a set of 2-optimal and 3-optimal algorithms and an experimental analysis of the performance of 1-, 2-, and 3-optimal algorithms on several types of DCOPs. The final contribution of this thesis is a case study of the application of k-optimal DCOP algorithms and solutions to the problem of the formation of human teams spanning multiple organizations. Given a particular specification of a human team (such as a task force to respond to an emergency) and a pool of possible team members, a DCOP can be formulated to match this specification. A set of k-optimal solutions to the DCOP represents a set of diverse, locally optimal options from which a human commander can choose the team that will be used.

Chapter 1: Introduction and Related Work

1.1 Introduction

In a large class of multi-agent scenarios, a set of agents chooses a joint action as a combination of individual actions.
Often, the locality of agents' interactions means that the utility generated by each agent's action depends only on the actions of a subset of the other agents. In this case, the outcomes of possible joint actions can be compactly represented by graphical models, such as a distributed constraint optimization problem (DCOP) [Modi et al., 2005; Mailler and Lesser, 2004; Zhang et al., 2003] for cooperative domains, or by a graphical game [Kearns et al., 2001; Vickrey and Koller, 2002] for noncooperative domains. Each of these models can take the form of a graph in which each node is an agent and each edge denotes a subset of agents whose actions, when taken together, incur costs or rewards, either to the agent team (in DCOPs) or to individual agents (in graphical games). This thesis focuses primarily on the team setting, using DCOP, whose applications include multi-agent plan coordination [Cox et al., 2005], sensor networks [Modi et al., 2005], meeting scheduling [Petcu and Faltings, 2006], and RoboCup soccer [Vlassis et al., 2004], but also draws connections to noncooperative settings when applicable.

Traditionally, researchers have focused on obtaining a single, globally optimal solution to DCOPs, introducing complete algorithms such as Adopt [Modi et al., 2005], OptAPO [Mailler and Lesser, 2004], and DPOP [Petcu and Faltings, 2005]. However, because DCOP has been shown to be NP-hard [Modi et al., 2005], as the scale of these domains becomes large, current complete algorithms can incur large computation or communication costs. For example, a large-scale network of personal assistant agents might require global optimization over hundreds of agents and thousands of variables. However, incomplete algorithms, in which agents form small groups and optimize within these groups, can lead to a system that scales up easily and is more robust to dynamic environments.
In existing incomplete algorithms, such as DSA [Fitzpatrick and Meertens, 2003] and DBA [Yokoo and Hirayama, 1996; Zhang et al., 2005a], agents are bounded in their ability to aggregate information about one another's constraints; in these algorithms, each individual agent optimizes based on its individual constraints, given the actions of all its neighbors, until a local optimum is reached where no single agent can improve the overall solution. Unfortunately, no guarantees on solution quality currently exist for these types of local optima. This thesis considers a generalization of this approach, in which agents optimize by forming groups of one or more agents until no group of k or fewer agents can possibly improve the solution; we define this type of local optimum as a k-optimum. According to this definition, for a DCOP with n agents, DSA and DBA are 1-optimal, while all complete algorithms are n-optimal. In this thesis I focus on k-optima and k-optimal algorithms for 1 < k < n.

The k-optimality concept provides an algorithm-independent classification for local optima in a DCOP that allows for quality guarantees. In addition to the introduction of k-optimality itself, I present four sets of results about k-optimality. The first set of results consists of worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. If increasing the value of k will provide a large increase in guaranteed solution quality, it may be worth the extra computation or communication required to reach a higher k-optimal solution. For example, consider a team of autonomous underwater vehicles (AUVs) [Zhang et al., 2005b] that must quickly choose a joint action in order to observe some transitory underwater phenomenon.
The combination of individual actions by nearby AUVs may generate costs or rewards to the team, and the overall utility of the joint action is determined by the sum of these costs and rewards. If this problem were represented as a DCOP, nearby AUVs would share constraints in the graph, while far-away AUVs would not. However, the actual rewards on these constraints may not be known until the AUVs are deployed, and in addition, due to time constraints, an incomplete, k-optimal algorithm, rather than a complete algorithm, must be used to find a solution. In this case, worst-case quality guarantees for k-optimal solutions for a given k, that are independent of the actual costs and rewards in the DCOP, are useful to help decide which algorithm to use. Alternatively, the guarantees can help to choose between different AUV formations, i.e., different constraint graphs.

The second set of results consists of upper bounds on the number of k-optima that can exist in a DCOP. These bounds are necessitated by two key features of the typical domains where a k-optimal set is applicable. First, each k-optimum in the set consumes some resources that must be allocated in advance. Such resource consumption arises because: (i) a team actually executes each k-optimum in the set, or (ii) the k-optimal set is presented to a human user (or another agent) as a list of options to choose from, requiring time. In each case, resources are consumed based on the k-optimal set size. Second, while the existence of the constraints between agents is known a priori, the actual rewards and costs on the constraints depend on conditions that are not known until runtime, and so resources must be allocated before the rewards and costs are known and before the agents generate the k-optimal set.
To see this, consider another domain involving a team of disaster rescue agents that must generate a set of k-optimal joint actions in order to present a set of diverse options to a human commander, where each option represents the best joint action within a neighborhood of similar joint actions. The commander will choose one joint action for the team to actually execute. Constraints exist between agents whose actions must be coordinated (i.e., members of subteams), but their costs and rewards depend on conditions on the ground that are unknown until the time when the agents must be deployed. Here, the resource is the time available to the commander to make the decision. Presenting too many options will cause the commander to run out of time before considering them all, and presenting too few may cause high-quality options to be omitted; in either case, the commander's decision could be impaired. Because each joint action consumes resources, knowing the maximal number of k-optimal joint actions that could exist for a given DCOP allows us to allocate sufficient resources for a given level of k.

The third contribution is a set of 2- and 3-optimal algorithms and an experimental analysis of the performance of 1-, 2-, and 3-optimal algorithms on several types of DCOPs. Although we now have theoretical lower bounds on solution quality of k-optima, experimental results are useful to understand average-case performance on common DCOP problems.

The final contribution of this thesis is a case study of the application of k-optimal DCOP algorithms and solutions to the problem of the formation of human teams spanning multiple organizations. Given a particular specification of a human team (such as a task force to respond to an emergency) and a pool of possible team members, a DCOP can be formulated to match this specification. A set of k-optimal solutions to the DCOP represents a set of diverse, locally optimal options from which a human commander can choose the team that will be used.
This thesis will show how DCOP can be mapped to real data from different domains and how k-optimality can be used in order to generate these options quickly and intuitively.

1.2 Related Work

Although k-optimality is a fundamentally new concept in distributed constraint reasoning, we can draw parallels to related work.

1.2.1 Constraint Reasoning

An active area of recent research has been the development of complete "n-optimal" algorithms, including Adopt [Modi et al., 2005], OptAPO [Mailler and Lesser, 2004], and DPOP [Petcu and Faltings, 2005].

The Asynchronous Distributed Optimization (Adopt) algorithm [Modi et al., 2005] is the first known algorithm for DCOP; it is an asynchronous algorithm in which a Depth-First Search (DFS) tree is extracted from the DCOP graph, and small messages are passed up and down the tree. The runtime of Adopt was significantly improved using preprocessing techniques in [Maheswaran et al., 2004b] and [Ali et al., 2005]. A second algorithm for DCOP, Optimal Asynchronous Partial Overlay (OptAPO) [Mailler and Lesser, 2004], requires no DFS tree. Instead, agents partially centralize the problem by forming subgroups; one agent in the subgroup receives the constraints on the other agents and solves the subproblem for that subgroup; in most cases, complete centralization is not necessary to reach an optimal solution. A third algorithm, DPOP [Petcu and Faltings, 2005], is a dynamic-programming-based algorithm in which a DFS tree is used, as in Adopt. In DPOP, each agent passes just one message up the tree. The agent considers every possible combination of values that could be chosen by the higher-priority agents that have constraints with this agent, and finds the best value it can choose for each of these combinations; the value and reward for every combination are included in this message. Multiply-constrained DCOPs, to which the bounds on solution quality for k-optima in this thesis can also be applied, were introduced in [Bowring et al., 2006], along with a modification of Adopt to solve these problems optimally.
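DPOP's one-message-per-agent UTIL propagation is easiest to see on a chain-structured constraint graph. The sketch below is a simplification (real DPOP operates on a DFS pseudotree and handles back-edges; the chain restriction and the function names here are mine, not the thesis's):

```python
def chain_dpop(domains, edge_rewards):
    """Dynamic programming on a chain DCOP: agents 0..n-1, where
    edge_rewards[i][(v, w)] is the reward when agent i takes v and agent i+1 takes w.

    UTIL phase: each agent, from the leaf up, sends one table mapping each
    possible parent value to the best achievable reward of the sub-chain below.
    VALUE phase: the root picks its best value; choices propagate back down.
    """
    n = len(domains)
    choice = [dict() for _ in range(n)]     # choice[i][v_parent] -> best value for agent i
    below = {v: 0 for v in domains[n - 1]}  # utility strictly below the leaf agent
    for i in range(n - 1, 0, -1):           # UTIL messages travel toward agent 0
        R = edge_rewards[i - 1]
        up = {}
        for v_parent in domains[i - 1]:
            utils = {v: R[(v_parent, v)] + below[v] for v in domains[i]}
            best = max(utils, key=utils.get)
            choice[i][v_parent] = best
            up[v_parent] = utils[best]
        below = up
    root = max(domains[0], key=lambda v: below[v])
    assignment = [root]
    for i in range(1, n):                   # VALUE phase
        assignment.append(choice[i][assignment[-1]])
    return below[root], assignment

# The 3-agent chain DCOP of Example 1 in Chapter 2 (rewards taken from the text):
R12 = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 5}
R23 = {(0, 0): 20, (0, 1): 0, (1, 0): 0, (1, 1): 11}
print(chain_dpop([(0, 1), (0, 1), (0, 1)], [R12, R23]))  # (30, [0, 0, 0])
```

On a chain, each UTIL message has one entry per parent value; in general DPOP the message size is exponential in the induced width of the pseudotree, which is the price paid for completeness.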
Previous work in incomplete, 1-optimal algorithms for DCOP is described in detail in Chapter 5. The aim of this thesis is to consider the large space in between these two classes of algorithms and solutions.

This thesis also provides a theoretical complement to the experimental analysis of local minima (1-optima) and landscapes in centralized constraint satisfaction problems (CSPs) [Yokoo, 1997] as well as incomplete DCOP algorithms [Zhang et al., 2003]. While this thesis is primarily concerned with the effects of varying k and graph structure, the cited works provide insight into the effects of the choice between k-optimal algorithms of the same k-level on solution quality and convergence time. Note that k-optimality can also apply to centralized constraint reasoning as a measure of the relative quality and diversity of local optima. However, examining properties of solutions that arise from coordinated value changes of small groups of variables is especially useful in distributed settings, given the computational and communication expense of large-scale coordination.

Finally, despite the seeming similarity of k-optimality to k-consistency [Freuder, 1978] in centralized constraint satisfaction, the two concepts are entirely different: k-consistency refers to reducing the domains of subsets of variables to maintain internal consistency in a satisfaction framework, while k-optimality refers to comparing fixed solutions where subsets of variables optimize with respect to an external context.

1.2.2 Distributed Markov Decision Processes

It should also be noted that the bounds on solution quality of k-optima in this thesis also apply to networked distributed partially-observable Markov decision processes (ND-POMDPs) as defined in [Nair et al., 2005]. These structures can be viewed as DCOPs where each agent's domain represents its entire policy space, and rewards arise from the combinations of policies chosen by subsets of agents.
Since the solution quality bounds do not depend on domain size, they can be directly applied to k-optimal ND-POMDP policies; in fact, the locally optimal algorithm presented in [Nair et al., 2005] is a 1-optimal algorithm.

1.2.3 Local Search and Game Theory

My research on upper bounds on the number of k-optima in DCOPs is related to work in landscape analysis in local search and evolutionary computing. In particular, [Caruana and Mullin, 1999] and [Whitley et al., 1998] have provided techniques for estimating the number of local optima in these problems. In contrast, my work provides worst-case bounds on the number of k-optima in a DCOP, rather than an estimate based on sampling the solution space, and also exploits the structure of the agent network (constraint graph) to obtain these bounds, which was not done in previous work.

Given that counting the number of Nash equilibria in a game with known payoffs is #P-hard [Conitzer and Sandholm, 2003], upper bounds on this number have indeed been investigated for particular types of games [McLennan and Park, 1999; Keiding, 1995]. Additionally, graph structure is utilized in several different algorithms to expedite finding Nash equilibria for a given graphical game with known payoffs [Kearns et al., 2001; Vickrey and Koller, 2002; Blum et al., 2003; Littman et al., 2002]. However, using graph structure to find bounds on pure-strategy Nash equilibria over all possible games on a given graph (i.e., reward-independent bounds) remained an open problem; this thesis contains the first known upper bounds on Nash equilibria in such games.

1.2.4 Local Search in Combinatorial Optimization

The definition of k-optimality in DCOPs is analogous to the k-opt family of algorithms for the canonical Traveling Salesman Problem (TSP) [Lin, 1965]. The objective of the TSP is, given a complete graph in which each edge is associated with a weight, to choose a set of edges that forms a cycle (route) over all nodes in the graph with the lowest total weight.
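As a concrete point of reference for this analogy, a minimal 2-opt local search for the TSP objective can be sketched as follows (illustrative code; the city coordinates are hypothetical, not data from the thesis):

```python
import math
from itertools import combinations

def tour_length(cities, tour):
    """Total weight of the closed route visiting every city once."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(cities, tour):
    """Repeatedly reverse a segment while doing so shortens the tour.
    The result is 2-optimal: no replacement of two edges can improve it."""
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(1, len(tour)), 2):
            candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
            if tour_length(cities, candidate) < tour_length(cities, tour):
                tour, improved = candidate, True
    return tour

cities = [(0, 0), (0, 2), (3, 0), (3, 2), (1, 1)]  # hypothetical coordinates
tour = two_opt(cities, list(range(len(cities))))
print(tour, round(tour_length(cities, tour), 2))
```

Each accepted move strictly decreases the tour length, so the loop terminates at a 2-optimal route; the parallel to DCOP k-optimality is made precise in the next paragraph.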
In TSP, a route is considered k-optimal if it cannot be improved by replacing any k or fewer edges with new edges in the route graph. In contrast, in this thesis, k-optimality refers to a DCOP assignment that cannot be improved if k or fewer agents change their values. A quality guarantee on 2-optimal TSP tours has been given in [Lin, 1965].

1.2.5 Multi-Linked Contracts between Self-Interested Agents

We note that the definition of k-optimality in DCOPs is also analogous (but different) to k-optimality as given in [Sandholm, 1996]. In that work, agents make contracts with each other to optimally allocate tasks; a k-optimal assignment is one in which no cluster of k tasks can be beneficially transferred between any two agents.

1.3 Guide

The rest of this document is arranged as follows. Chapter 2 gives the background and motivation for the work and introduces the concept of k-optimality. Chapter 3 includes the lower bounds on solution quality for k-optima, and Chapter 4 includes the upper bounds on the number of k-optima in a DCOP. Chapter 5 includes new 2-optimal DCOP algorithms and an experimental analysis of these algorithms, as well as existing 1-optimal algorithms, in several domains. Chapter 6 presents a case study applying k-optimality to the formation of human teams. Other work that I have done in DCOPs and game theory, not directly related to k-optimality, is briefly detailed in Chapter 7. Finally, conclusions and areas for future work are given in Chapter 8.

Chapter 2: k-Optimality and Distributed Constraint Optimization

This chapter formally introduces the concept of k-optimality and provides background on the distributed constraint optimization formalism.

2.1 The Distributed Constraint Optimization Problem (DCOP)

A Distributed Constraint Optimization Problem (DCOP) consists of a set of variables, each assigned to an agent which must assign a value to the variable; these values correspond to individual actions that can be taken by the agents. Constraints exist between subsets of these variables that determine costs and rewards to the agent team based on the combinations of values chosen by their respective agents.
Because we assume that each agent controls a single variable, we will use the terms "agent" and "variable" interchangeably. Formally, a DCOP is a set of variables (one per agent) N := {1, ..., n} and a set of domains A := {A_1, ..., A_n}, where the i-th variable takes value a_i ∈ A_i. We denote the assignment of a subgroup of agents S ⊂ N by a_S := ×_{i∈S} a_i ∈ A_S, where A_S := ×_{i∈S} A_i, and the assignment of the multi-agent team by a = [a_1 ··· a_n]. Valued constraints exist on various minimal subsets S ⊂ N of these variables. By minimality, we mean that the reward component R_S cannot be decomposed further through addition. Mathematically:

∀S ∈ θ, R_S(a_S) ≠ R_{S_1}(a_{S_1}) + R_{S_2}(a_{S_2})

for any R_{S_1}(·) : A_{S_1} → ℝ and R_{S_2}(·) : A_{S_2} → ℝ, with S_1, S_2 ⊂ N such that S_1 ∪ S_2 = S and S_1, S_2 ≠ ∅. It is important to express the constraints minimally to accurately represent dependencies among agents.

A constraint on S is expressed as a reward function R_S(a_S). This function represents the reward to the team generated by the constraint on S when the agents take assignment a; costs are expressed as negative rewards. θ is the set of all such subsets S on which a constraint exists. For convenience, we will refer to these subsets S as "constraints" and the functions R_S(·) as "constraint reward functions." The solution quality for a particular complete assignment a is the sum of the rewards for that assignment from all constraints in the DCOP:

R(a) = Σ_{S∈θ} R_S(a) = Σ_{S∈θ} R_S(a_S).

Several algorithms exist for solving DCOPs: complete algorithms, which are guaranteed to reach a globally optimal solution, and incomplete algorithms, which reach a local optimum and do not provide guarantees on solution quality. In this thesis, we provide k-optimality as an algorithm-independent classification of local optima, and show how their solution quality can be guaranteed.

Before formally introducing the concept of k-optimality, we must define the following terms. For two assignments, a and ã, the deviating group is

D(a, ã) := {i ∈ N : a_i ≠ ã_i},

i.e., the set of agents whose actions in assignment ã differ from their actions in a. The distance is

d(a, ã) := |D(a, ã)|,

where |·| denotes the cardinality of the set. The relative reward of an assignment a with respect to another assignment ã is

Δ(a, ã) := R(a) − R(ã) = Σ_{S∈θ : S∩D(a,ã) ≠ ∅} [R_S(a_S) − R_S(ã_S)].

In this summation, only the rewards on constraints incident on deviating agents are considered, since the other rewards remain the same. We classify an assignment a as a k-optimal assignment or k-optimum if

Δ(a, ã) ≥ 0 ∀ã such that d(a, ã) ≤ k.

That is, a has a reward higher than or equal to that of any assignment a distance of k or less from a. Equivalently, if the set of agents have reached a k-optimum, then no subgroup of cardinality ≤ k can improve the overall reward by choosing different actions; every such subgroup is acting optimally with respect to its context.

If no ties occur between the rewards of DCOP assignments that are a distance of k or less apart, then a collection of k-optima must be mutually separated by a distance ≥ k + 1, as they each have the highest reward within a radius of k. Thus, a higher k-optimality of assignments in a collection of assignments, or assignment set, implies a greater level of relative reward and diversity. Let

A_q(n, k) = {a ∈ A : Δ(a, ã) > 0 ∀ã such that d(a, ã) ≤ k}

be the set of all k-optima for a team of n agents with domains of cardinality q. It is straightforward to show A_q(n, k+1) ⊆ A_q(n, k).

Figure 2.1: DCOP example. [Reconstructed from the figure: a chain of three agents 1-2-3 with constraints S_{1,2} and S_{2,3}; R_{12}(0,0) = 10, R_{12}(1,1) = 5, and R_{12} = 0 otherwise; R_{23}(0,0) = 20, R_{23}(1,1) = 11, and R_{23} = 0 otherwise.]

Example 1. Figure 2.1 is a binary DCOP in which agents choose actions from {0, 1}, with rewards shown for the two constraints (minimal subgroups) S_{1,2} = {1, 2} and S_{2,3} = {2, 3}. The assignment a = [1 1 1] (with a total reward of 16) is 1-optimal because any single agent that deviates reduces the team reward. For example, if agent 1 changes its value from 1 to 0, the reward on S_{1,2} decreases from 5 to 0. If agent 2 changes its value from 1 to 0, the rewards on S_{1,2} and S_{2,3} decrease from 5 to 0 and from 11 to 0, respectively.
If agent 3 changes its value from 1 to 0, the reward on S_{2,3} decreases from 11 to 0. However, [1 1 1] is not 2-optimal because if the group {2, 3} deviated, making the assignment ã = [1 0 0], team reward would increase from 16 to 20. The globally optimal solution, a* = [0 0 0], with a total reward of 30, is k-optimal for all k ∈ {1, 2, 3}.

2.2 Properties of k-Optimal DCOP Solutions

We now show, in an experiment, the advantages of k-optimal assignment sets in capturing both diversity and high reward, compared with assignment sets chosen by other metrics. Diversity is important in domains where many k-optimal assignments are presented as choices to a human, so that all the choices are not essentially the same with very minor discrepancies. The lower half of Figure 2.2(a) shows a DCOP graph representing a team of 10 patrol robots, each of whom must choose one of two routes to patrol in its region. The nodes are agents and the edges represent binary constraints between agents assigned to overlapping regions. The actions (i.e., the chosen routes) of these agents combine to produce a cost or reward to the team. For each of 20 runs, the edges were initialized with rewards from a uniform random distribution. The set of all 1-optima was enumerated. Then, for the same DCOP, we used two other metrics to produce equal-sized sets of assignments. For one metric, the assignments with highest reward were included in the set, and for the other metric, assignments were included in the set by the following method, which selects assignments purely based on diversity (expressed as distance): we repeatedly cycled through all possible assignments in lexicographic order, and included an assignment in the set if the distance between it and every assignment already in the set was not less than a specified distance, in this case 2. The average reward and the diversity (expressed as the minimum distance between any pair of assignments in the set) for the sets chosen using each of the three metrics over all 20 runs are shown in the upper half of Figure 2.2(a).
While the sets of 1-optima come close to the reward level of the sets chosen purely according to reward, they are clearly more diverse (T-tests for this claim showed significance within .0001%). If a minimum distance of 2 is required in order to guarantee diversity, then using reward alone as a metric is insufficient; in fact, the assignment sets generated using that metric had an average minimum distance of 1.21, compared with 2.25 for 1-optimal assignment sets (which guarantee a minimum distance of k + 1 = 2). The 1-optimal assignment sets also provide significantly higher average reward than the sets chosen to maintain a given minimum distance, which had an average reward of 0.037 (T-test significance within .0001%). Similar results with equal significance were observed for the 10-agent graph in Figure 2.2(b) and the nine-agent graph in Figure 2.2(c).

Figure 2.2: 1-optima vs. assignment sets chosen using other metrics

                          1-optima   reward only   dist. of 2
    (a) avg. reward         .850        .950          .037
        avg. min. distance  2.25        1.21          2.00
    (b) avg. reward         .809        .930         -.101
        avg. min. distance  2.39        1.00          2.00
    (c) avg. reward         .832        .911          .033
        avg. min. distance  2.63        1.21          2.00

Note also that this experiment used k = 1, the lowest possible k. Increasing k would, by definition, increase the diversity of the k-optimal assignment set as well as the neighborhood size for which each assignment is optimal.

In addition to categorizing local optima in a DCOP, k-optimality provides a natural classification for DCOP algorithms. Many known algorithms are guaranteed to converge to k-optima for k = 1, including DBA [Zhang et al., 2003], DSA [Fitzpatrick and Meertens, 2003], and coordinate ascent [Vlassis et al., 2004]. Complete algorithms such as Adopt [Modi et al., 2005], OptAPO [Mailler and Lesser, 2004], and DPOP [Petcu and Faltings, 2005] are k-optimal for k = n. New 2- and 3-optimal algorithms will be presented in Chapter 5.
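Before moving on, the k-optimality conditions of Example 1 can be verified by brute force. A minimal sketch, with the reward tables of Figure 2.1 encoded as dictionaries:

```python
# Brute-force k-optimality check for the three-agent DCOP of Example 1:
# R12(0,0)=10, R12(1,1)=5, R23(0,0)=20, R23(1,1)=11, and 0 otherwise.
from itertools import product

R12 = {(0, 0): 10, (1, 1): 5}
R23 = {(0, 0): 20, (1, 1): 11}

def reward(a):
    return R12.get((a[0], a[1]), 0) + R23.get((a[1], a[2]), 0)

def is_k_optimal(a, k):
    # a is k-optimal iff no assignment within Hamming distance k beats it.
    return all(reward(a) >= reward(b)
               for b in product((0, 1), repeat=3)
               if sum(x != y for x, y in zip(a, b)) <= k)

print(is_k_optimal((1, 1, 1), 1))  # True:  no single deviation improves on 16
print(is_k_optimal((1, 1, 1), 2))  # False: {2,3} deviating to [1 0 0] yields 20
print(is_k_optimal((0, 0, 0), 3))  # True:  the global optimum, reward 30
```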
15 Chapter3 LowerBoundsonSolutionQualityfork-optima In this chapter we introduce the first known guaranteed lower bounds on the solution quality of k-optimal DCOP assignments. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations wherethecostofcoordinationbetweenagentsmustbeweighedagainstthequalityofthesolution reached. If increasing the value of k will provide a large increase in guaranteed solution quality, it may be worth the extra computation or communication required to reach a higher k-optimal solution. Forexample,considerateamofautonomousunderwatervehicles(AUVs)[Zhangetal., 2005b] that must quickly choose a joint action in order to observe some transitory underwater phenomenon. The combination of individual actions by nearby AUVs may generate costs or rewards to the team, and the overall utility of the joint action is determined by their sum. If this problem were represented as a DCOP, nearby AUVs would share constraints in the graph, while far-away AUVs would not. However, the actual rewards on these constraints may not be known until the AUVs are deployed, and in addition, due to time constraints, an incomplete, k-optimal algorithm, rather than a complete algorithm, must be used to find a solution. In this case, worst- case quality guarantees for k-optimal solutions for a given k, that are independent of the actual 16 costs and rewards in the DCOP, are useful to help decide which algorithm to use. Alternatively, the guarantees can help to choose between different AUV formations, i. e. different constraint graphs. We present two distinct types of guarantees for k-optima. The first, in Sections 3.1 and 3.2, is a lower bound on the quality of any k-optimum, expressed as a fraction of the quality of the optimal solution. The second, in Section 3.4, is a lower bound on the proportion of all DCOP assignments that a k-optimum must dominate in terms of quality. 
This type is useful in approximating the difficulty of finding a better solution than a given k-optimum. For both, we provide general bounds that apply to all constraint graph structures, as well as tighter bounds madepossibleifthegraphisknowninadvance. 3.1 QualityGuaranteesonk-Optima Thissectionprovidesreward-independentguaranteesonsolutionqualityforany k-optimalDCOP assignment. If we must choose a k-optimal algorithm for agents to use, it is useful to see how muchrewardwillbegainedorlostintheworstcasebychoosingahigherorlowervaluefork. We assume the actual costs and rewards on the DCOP are not known a priori (otherwise the DCOP could be solved centrally ahead of time; or all k-optima could be found by brute force, with the lowest-quality k-optimum providing an exact guarantee for a particular problem instance). We provide a guarantee for a k-optimal solution as a fraction of the reward of the optimal solution, assuming that all rewards in the DCOP are non-negative (the reward structure of any DCOP can benormalizedtoonewithallnon-negativerewardsaslongasnoinfinitelylargecostsexist). 17 Proposition1 For any DCOP of n agents, with maximum constraint arity of m, where all con- straint rewards are non-negative, and where a ∗ is the globally optimal solution, then, for any k-optimalassignment,a,whereR (a) < R(a ∗ )andm≤ k < n, R(a)≥ n−m k−m n k − n−m k R(a ∗ ). (3.1) Proof: Bythedefinitionofk-optimality,anyassignment ˜asuchthatd(a,˜ a)≤ kmusthavereward R(˜ a) ≤ R(a). We call this set of assignments ˜ A. Now consider any non-null subset ˆ A ⊂ ˜ A. For anyassignment ˆ a∈ ˆ A,theconstraintsθintheDCOPcanbedividedintothreediscretesets,given aand ˆ a: • θ 1 (a,ˆ a)⊂ θ suchthat∀S ∈ θ 1 (a,ˆ a),S ⊂ D(a,ˆ a). • θ 2 (a,ˆ a)⊂ θ suchthat∀S ∈ θ 2 (a,ˆ a),S ∩D(a,ˆ a)=∅. • θ 3 (a,ˆ a)⊂ θ suchthat∀S ∈ θ 3 (a,ˆ a),S < θ 1 (a,ˆ a)∪θ 2 (a,ˆ a). 
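This three-way partition can be sketched directly; a minimal illustration (reading the containment in θ_1 as ⊆, with constraints represented as sets of variable indices; names are illustrative):

```python
# Partition the constraint set theta with respect to a deviating set D:
#   theta_1: constraints whose variables all deviate (S contained in D)
#   theta_2: constraints with no deviating variables (S disjoint from D)
#   theta_3: constraints with at least one deviating and one fixed variable
def partition_constraints(theta, D):
    t1 = [S for S in theta if S <= D]
    t2 = [S for S in theta if not (S & D)]
    t3 = [S for S in theta if (S & D) and not (S <= D)]
    return t1, t2, t3

# Example: a five-variable chain of binary constraints, deviating set {1, 2}.
theta = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]
print(partition_constraints(theta, {1, 2}))
# ([{1, 2}], [{3, 4}, {4, 5}], [{2, 3}])
```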
θ 1 (a,ˆ a)containstheconstraintsthatincludeonlythevariablesin ˆ awhichhavedeviatedfrom their values in a; θ 2 (a,ˆ a) contains the constraints that include only the variables in ˆ a which have not deviated from their values in a; and θ 3 (a,ˆ a) contains the constraints that include at least one ofeach. Thus: R(ˆ a)= X S∈θ 1 (a,ˆ a) R S (ˆ a)+ X S∈θ 2 (a,ˆ a) R S (ˆ a)+ X S∈θ 3 (a,ˆ a) R S (ˆ a). 18 Thesumofrewardsofallassignments ˆ ain ˆ Ais: X ˆ a∈ ˆ A R(ˆ a)= X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)+ X S∈θ 2 (a,ˆ a) R S (ˆ a)+ X S∈θ 3 (a,ˆ a) R S (ˆ a) = X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)+ X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a)+ X ˆ a∈ ˆ A X S∈θ 3 (a,ˆ a) R S (ˆ a). Giventhepreviousassumptionthatallrewardsarenonnegative,weobtain: X ˆ a∈ ˆ A R(ˆ a)≥ X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)+ X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a). SinceR(a)≥ R(ˆ a),∀ˆ a∈ ˆ A, R(a)≥ P ˆ a∈ ˆ A P S∈θ 1 (a,ˆ a) R S (ˆ a)+ P ˆ a∈ ˆ A P S∈θ 2 (a,ˆ a) R S (ˆ a) | ˆ A| . (3.2) Now,forany ˆ A,ifthetwonumeratortermsandthedenominatorcanbeexpressedintermsof R(a ∗ )andR(a),thenwehaveaboundonR(a)intermsofR(a ∗ ). Todothis,weconsidertheparticular ˆ A,denoted ˆ A a,k ,whichcontainsallpossibleassignments ˆ asuchthat • d(a,ˆ a)= k • ˆ a i = a ∗ i ,∀i∈ D(a,ˆ a). Thismeansthat ˆ A a,k containsallassignments ˆ awhereexactlykvariablesin ˆ ahavedeviatedfrom their values in a and these variables are taking the same values that they take in a ∗ . Note that, 19 giventhedefinitionof ˆ A a,k ,thati∈ D(a,ˆ a)⇒ a i , a ∗ i (novariablethathasthesamevalueinaas ina ∗ canbein D(a,ˆ a)). There are d(a,a ∗ ) variables whose values are different between a and a ∗ , and ˆ A a,k contains each assignment where exactly k of these variables are deviating from a to take the same values they take in a ∗ . Thus, there are d(a,a ∗ ) k such assignments ˆ a ∈ ˆ A a,k , and so the denominator term fromEquation3.2isequalto d(a,a ∗ ) k . Notethatd(a,a ∗ ) > k,otherwiseawouldnotbek-optimal. 
We can now re-express the first numerator term in Equation 3.2 in terms of R(a ∗ ). Given any k ∈ N, any ˆ A a,k , any S ∈ θ, and any assignment a, let Γ a,k 1 (S) denote the set of all ˆ a ∈ ˆ A a,k where S ∈ θ 1 (a,ˆ a). Now, we have ∀ˆ a ∈ Γ a,k 1 (S),d(a,ˆ a) = k (exactly k variables are in the deviatinggroup D(a,ˆ a)). Bydefinitionof θ 1 (a,ˆ a),S ⊂ D(a,ˆ a).Thus,thereareexactlyd(a,a ∗ )− |S| variables not in S for which it is possible to be an element of D(a,ˆ a), where |S| represents the cardinality (arity) of S. Therefore, there are exactly d(a,a ∗ )−|S| k−|S| possible deviating groups D(a,ˆ a), and therefore d(a,a ∗ )−|S| k−|S| assignments in Γ a,k 1 (S). Finally, note that ∀a,∀S ∈ θ,∀ˆ a ∈ Γ a,k 1 (S),R S (ˆ a)= R S (a ∗ ). Therefore, X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)= X ˆ a∈ ˆ A a,k X S∈θ 1 (a,ˆ a) R S (a ∗ )= X S∈θ X ˆ a∈Γ a,k 1 (S) R S (a ∗ )= X S∈θ d(a,a ∗ )−|S| k−|S| ! R S (a ∗ ). Because|S|≤ m,weobtainaninequalityforthefirstnumeratorterminEquation3.2interms ofR(a ∗ ): X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)≥ d(a,a ∗ )−m k−m ! R(a ∗ ). 20 Similarly,wenowexpressthesecondnumeratorterminEquation3.2intermsofR(a). Given any k ∈ N, any ˆ A a,k , any S ∈ θ, and any assignment a, let Γ a,k 2 (S) denote the set of all ˆ a ∈ ˆ A a,k whereS ∈ θ 2 (a,ˆ a). Now,justasforΓ a,k 1 (S),wehave∀ˆ a∈Γ a,k 2 (S),d(a,ˆ a)= k(exactlykvariables are in the deviating group D(a,ˆ a)). However, by definition of θ 2 (a,ˆ a),S ∩ D(a,ˆ a) = ∅. (no variables in S are in the deviating group D(a,ˆ a)). Thus, there are exactly d(a,a ∗ )−|S| variables notinS forwhichitispossibletobeanelementofD(a,ˆ a).Therefore,thereareexactly d(a,a ∗ )−|S| k possible deviating groups D(a,ˆ a) and therefore d(a,a ∗ )−|S| k assignments in Γ a,k 2 (S). Finally, note that∀a,∀S ∈ θ,∀ˆ a∈Γ a,k 2 (S),R S (ˆ a)= R S (a). Therefore, X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a)= X ˆ a∈ ˆ A a,k X S∈θ 2 (a,ˆ a) R S (a)= X S∈θ X ˆ a∈Γ a,k 2 (S) R S (a)= X S∈θ d(a,a ∗ )−|S| k ! R S (a). 
Because |S| ≤ m, we obtain an inequality for the second numerator term in Equation 3.2 in termsofR(a ∗ ): X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a)≥ d(a,a ∗ )−m k ! R(a). Therefore,fromEquation3.2, R(a)≥ d(a,a ∗ )−m k−m R(a ∗ )+ d(a,a ∗ )−m k R(a) d(a,a ∗ ) k ⇒ R(a)≥ d(a,a ∗ )−m k−m d(a,a ∗ ) k − d(a,a ∗ )−m k R(a ∗ ). The ratio of R(a) to R(a ∗ ) is minimized when d(a,a ∗ ) = n, representing the case where a has no variable assignments in common with a ∗ , so Equation 3.1 holds as a guarantee for a k- optimum in any DCOP. We note that it is possible for k > n− m; in this case the term n−m k in Equation 3.1 is technically undefined. In this case, we take this term to be zero, representing the 21 factthattherearezeroassignmentsinΓ a,k 2 (S)foranyS ∈ θ,sinceenumeratingtheseassignments wouldrequireselectingk variablesfromapoolofn−mvariables. ForbinaryDCOPs(m= 2),Equation3.1simplifiesto: R(a)≥ n−2 k−2 n k − n−2 k R(a ∗ )= k−1 2n−k−1 R(a ∗ ). Finally, we note that no lower-bound guarantee is possible when k < m, such as for a binary DCOP(m= 2)wherek= 1. InthesecasesEquation3.1givesalowerboundofR(a)≥ 0·R(a ∗ ). ThefollowingexampleillustratesProposition1: Example2 Consider a DCOP with five variables numbered 1 to 5, with domains of{0,1}. Sup- pose that this DCOP is a fully connected binary DCOP with constraints between every pair of variables (i.e. θ = {S : |S| = 2}). Suppose that a = [0 0 0 0 0] is a 3-optimum, and that a ∗ = [11111]istheglobaloptimum. Thend(a,a ∗ )= 5,and ˆ A a,k contains d(a,a ∗ ) k = 10assign- ments,listedbelow: [11100],[11010],[11001],[10110],[10101], [10011],[01110],[01101],[01011],[00111]. Since R(a) ≥ ˆ a,∀ˆ a ∈ ˆ A a,k , then 10·R(a) ≥ P ˆ A a,k R(ˆ a). Now,∀S ∈ θ,∀i ∈ S,a ∗ i = ˆ a i for exactly n−2 k−2 = 3 assignments in ˆ A a,k . For example, for S = {1,2},a ∗ 1 = ˆ a 1 = 1 and a ∗ 2 = ˆ a 2 = 1 for ˆ a = [1 1 1 0 0], [1 1 0 1 0], and [1 1 0 0 1]. Therefore, ∀S ∈ θ,R S (a ∗ ) = R S (ˆ a) for these n−2 k−2 = 3 assignments. 
Similarly, ∀S ∈ θ,∀i ∈ S,a i = ˆ a i for exactly n−2 k = 1 as- signment in ˆ A a,k . For example, for S = {1,2},a 1 = ˆ a 1 = 0 and a 2 = ˆ a 2 = 0 only for ˆ a = [0 0 1 1 1]. Therefore, ∀S ∈ θ,R S (a) = R S (ˆ a) for n−2 k = 1 assignment in ˆ A a,k . Thus, 10·R(a)≥ P ˆ A a,k R(ˆ a)≥ 3·R(a ∗ )+1·R(a),andsoR(a)≥ 3 10−1 R(a ∗ )= 1 3 R(a ∗ ). 22 WenowshowthatProposition1istight,i.e. thatthereexistDCOPswithk-optimaofquality equaltothebound. Proposition2 ∀n,m,k such that m ≤ k < n, there exists some DCOP with n variables, with maximumconstraintaritymwithak-optimalassignment,a,suchthat,ifa ∗ isthegloballyoptimal solution, R(a)= n−m k−m n k − n−m k R(a ∗ ). (3.3) Proof: Consider a fully-connected m-ary DCOP where the domain of each variable contains at leasttwovalues{0,1}andeveryconstraintR S containsthefollowingrewardfunction: R S (a)= ( n−m k−m ) ( n k )−( n−m k ) a i = 0,∀i∈ S 1 a i = 1,∀i∈ S 0 otherwise The optimal solution a ∗ is a ∗ i = 1,∀i. If a is defined such that a i = 0,∀i, then Equation 3.3 is true. Nowweshowthataisk-optimal. Foranyassignment ˆa,suchthatd(a,ˆ a)= k, R(ˆ a)= X S∈θ 1 (a,ˆ a) R(ˆ a S )+ X S∈θ 2 (a,ˆ a) R(ˆ a S )+ X S∈θ 3 (a,ˆ a) R(ˆ a S ). 23 Giventhechosenrewardfunction,∀S ∈ θ 1 (a,ˆ a),R(ˆ a S )= 1,and∀S ∈ θ 2 (a,ˆ a),R(ˆ a S )= ( n−m k−m ) ( n k )−( n−m k ) . Note that|θ 1 (a,ˆ a)| = k m and|θ 2 (a,ˆ a)| = n−k m because|S| = m and must be chosen from a set of eitherk orn−k variables,respectively. Finally,forallotherS,R(ˆ a S )= 0.Then,wehave: R(ˆ a)= k m ! + n−k m ! n−m k−m n k − n−m k +0 = k m h n k − n−m k i + n−k m n−m k−m n k − n−m k = n! m!(k−m)!(n−k)! !, n!(n−m−k)!−(n−m)!(n−k)! k!(n−k)!(n−m−k)! ! = n!k!(n−m−k)! m!(k−m)![n!(n−m−k)!−(n−k)!(n−m)!] = n m ! (n−m)!k!(n−m−k)! (k−m)![n!(n−m−k)!−(n−m)!(n−k)!] = n m ! n−m k−m k!(n−m−k)! n!(n−m−k)! (n−k)! −(n−m)! = n m ! n−m k−m n k − n−m k = R(a) because in a, each of the n m constraints in the DCOP are producing the same reward. Since this canbeshownford(a,ˆ a)= j,∀jsuchthat1≤ j≤ k,aisk-optimal. 
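The guarantee of Proposition 1 is straightforward to compute numerically. A sketch (the function name is illustrative; Python's math.comb already returns 0 when k > n − m, matching the convention noted above):

```python
# Proposition 1 bound:
#   R(a) >= C(n-m, k-m) / (C(n, k) - C(n-m, k)) * R(a*),   m <= k < n,
# where C(.,.) is the binomial coefficient and C(n-m, k) = 0 if k > n-m.
from math import comb

def k_optimal_bound(n, m, k):
    assert m <= k < n
    return comb(n - m, k - m) / (comb(n, k) - comb(n - m, k))

# For binary DCOPs (m = 2) this simplifies to (k-1)/(2n-k-1); Example 2's
# fully connected case (n = 5, k = 3) gives 1/3 either way:
n, k = 5, 3
print(k_optimal_bound(n, 2, k))   # 0.3333333333333333
print((k - 1) / (2 * n - k - 1))  # 0.3333333333333333
```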
3.2 Graph-BasedQualityGuarantees The guarantee for k-optima in Section 3.1 applies to all possible DCOP graph structures. How- ever, knowledge of the structure of constraint graphs can be used to obtain tighter guarantees. This is done by again expressing the two numerator terms in Equation 3.2 as multiples of R(a ∗ ) 24 and R(a). If the graph structure of the DCOP is known, we can exploit it by refining our defini- tion of ˆ A a,k . We can take ˆ A from Proposition 1, i.e. ˆ A which contains all ˆ a such that d(a,ˆ a) = k and ∀ˆ a ∈ ˆ A,∀ˆ a i ∈ D(a,ˆ a),ˆ a i = a ∗ i . Then, we restrict this ˆ A further, so that ∀ˆ a ∈ ˆ A, the vari- ables in D(a,ˆ a) form a connected subgraph of the DCOP graph (or hypergraph), meaning that any two variables in D(a,ˆ a) must be connected by some chain of constraints. Since the previous section dealt with fully-connected DCOP graphs, this last restriction was actually already being used; however, it only becomes relevant to express it when working with non-fully-connected graph structures, as we will do in this section. With this refinement, we transform Equation 3.2 to express R(a) in terms of R(a ∗ ) as before; however, we can now produce tighter guarantees for k-optima in sparse graphs. As an illustration, provably tight guarantees for binary DCOPs on ring graphs (each variable has two constraints) and star graphs (each variable has one constraint exceptthecentralvariable,whichhasn−1)aregivenbelow. Proposition3 For any binary DCOP of n agents with a ring graph structure, where all con- straint rewards are non-negative, and a ∗ is the globally optimal solution, then, for any k-optimal assignment,a,wherek < n, R(a)≥ k−1 k+1 R(a ∗ ). (3.4) Proof: ReturningtoEquation3.2,| ˆ A|= nbecauseD(a,ˆ a)couldconsistofanyofthenconnected subgraphs of k variables in a ring. For any constraint S ∈ θ, there are k− 1 assignments ˆ a ∈ ˆ A 25 for which S ∈ θ 1 (a,ˆ a) because there are k−1 connected subgraphs of k variables in a ring that containS. 
Therefore, X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)= (k−1)R(a ∗ ). Also, there are n− k− 1 assignments ˆ a ∈ ˆ A for which S ∈ θ 2 (a,ˆ a) because there are n− k− 1 waystochooseS inaringsothatitdoesnotincludeanyvariableinagivenconnectedsubgraph ofk variables. Therefore, X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a)= (n−k−1)R(a). So,fromEquation3.2, R(a)≥ (k−1)R(a ∗ )+(n−k−1)R(a) n andthereforeEquation3.4holds. Proposition4 For any binary DCOP of n agents with a star graph structure, where all con- straint rewards are non-negative, and a ∗ is the globally optimal solution, then, for any k-optimal assignment,a,wherek < n, R(a)≥ k−1 n−1 R(a ∗ ). (3.5) Proof: The proof is similar to the previous proof. In a star graph, there are n−1 k−1 connected subgraphs of k variables, and therefore | ˆ A| = n−1 k−1 . Every constraint S ∈ θ includes the central 26 variable and one other variable, and thus there are n−2 k−2 connected subgraphs of k variables that containS,andtherefore X ˆ a∈ ˆ A X S∈θ 1 (a,ˆ a) R S (ˆ a)= n−2 k−2 ! R(a ∗ ). Finally,therearenowaystochooseS sothatitdoesnotincludeanyvariableinagivenconnected subgraphofk variables. Therefore, X ˆ a∈ ˆ A X S∈θ 2 (a,ˆ a) R S (ˆ a)= 0·R(a). So,fromEquation3.2, R(a)≥ n−2 k−2 R(a ∗ )+0·R(a) n−1 k−1 andthereforeEquation3.5holds. Tightness can be proven by constructing DCOPs on ring and star graphs with the same re- wards as in Proposition 2, except with R S (a) = (k − 1)/(k + 1) in the first case for rings, and R S (a) = (k − 1)/(n− 1) in the first case for stars. The bound for rings can also be applied to chains,sinceanychaincanbeexpressedasaringwhereallrewardsononeconstraintarezero. Finally,boundsforDCOPswitharbitrarygraphsandnon-negativeconstraintrewardscanbe found using a linear-fractional program (LFP). In this method, key rewards on the constraints in the DCOP are treated as variables in the LFP. 
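Before turning to the LFP method, the closed-form binary-DCOP guarantees derived so far can be compared side by side. A small numeric sketch (fractions of the optimal reward R(a*); the fully connected case is the m = 2 simplification of Proposition 1):

```python
# Closed-form worst-case guarantees for binary DCOPs:
#   fully connected: (k-1)/(2n-k-1)    ring: (k-1)/(k+1)    star: (k-1)/(n-1)
n = 5
for k in range(2, n):
    full = (k - 1) / (2 * n - k - 1)
    ring = (k - 1) / (k + 1)
    star = (k - 1) / (n - 1)
    print(f"k={k}: full={full:.2f}  ring={ring:.2f}  star={star:.2f}")
# k=2: full=0.14  ring=0.33  star=0.25
# k=3: full=0.33  ring=0.50  star=0.50
# k=4: full=0.60  ring=0.60  star=0.75
```

Note that the sparser graphs admit stronger guarantees for small k, consistent with the graph-based refinement of this section.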
When the LFP is solved optimally, these variables are instantiated with values that, if used as constraint rewards, would produce the DCOP whose k-optimahavelowestrewardwithrespecttotheoptimalsolution. Thus,thismethodgivesatight 27 bound for any graph, since it instantiates the rewards for all constraints, but requires a globally optimal solution to the LFP, in contrast to the constant-time guarantees of Equations 3.1, 3.4 and 3.5. An LFP such as this is reducible to a linear program (LP) [Boyd and Vandenberghe, 2004] which is solvable in polynomial time with respect to the number of variables in the LP (the same as the number of variables in the LFP). The objective is to minimize R(a) R(a ∗ ) such that ∀˜ a ∈ ˜ A,R(a)− R(˜ a) ≥ 0, given ˜ A as defined in Proposition 1. Note that R(a ∗ ) and R(a) can be expressed as P S∈θ R S (a ∗ ) and P S∈θ R S (a). We can now transform the DCOP so that every R(˜ a) canalsobeexpressedintermsofsumsofR S (a ∗ )andR S (a),withoutchangingorinvalidatingthe guarantee on R(a). Therefore, the LFP will contain only two variables for each S ∈ θ, one for R S (a ∗ ) andone for R S (a), where the domainof each one isthe set ofnon-negative real numbers. The transformation is to set all reward functions R S (·) for all S ∈ θ to 0, except for two cases: when all variables i ∈ S have the same value as in a ∗ , or when all i ∈ S have the same value as in a. This has no effect on R(a ∗ ) or R(a), because R S (a ∗ ) and R S (a) will be unchanged for all S ∈ θ. Italsohasnoeffectontheoptimalityofa ∗ orthek-optimalityof a,sincetheonlychange is to reduce the global reward for assignments other than a ∗ and a. Thus, the tight lower bound on R(a) R(a ∗ ) stillappliestotheoriginalDCOP. 3.3 QualityGuaranteesforDCOPswithHardConstraints TheprevioussectionsinthischapterpresentedqualityguaranteesforDCOPsinwhichallrewards wererestrictedtobeingnon-negative. 
ADCOPwithbothcostsandrewardscouldbenormalized to one that met this restriction as long as it contained no hard constraints (constraints with an infinitelylargecost). However,inmanydomains,hardconstraintsexist,andsolutionsthatviolate 28 a hard constraint are not useful. For example, a schedule where two people disagree on the time of their meeting with each other may be considered useless, no matter how good the schedule is for other people. This section shows how, in some cases, we can guarantee the solution quality ofk-optimalDCOPsolutionsevenwhentheDCOPcontainshardconstraints. To obtain these guarantees we will assume that, in the DCOPs with hard constraints that we areconsidering,therealwaysexistsasolutionthatdoesnotviolateanyhardconstraints(i.e. that the globally optimal solution is a feasible solution). Given this assumption, we will show how the methods for obtaining guarantees given in the previous sections can be modified to allow for the existence of hard constraints. We define a hard constraint as a constraint in which at least onecombinationofvaluesproducesasufficientlylargecost(anegativerewardmanytimeslarger thanthesumofallpositiverewardsintheDCOP).Finally,wewillassumethatweknowapriori whichconstraintsintheDCOPgrapharehardconstraints(butnotwhichcombinationsofvalues ontheseconstraintscausethemtobeviolated). OnecomplicatingissueisthatinsomeDCOPswithhardconstraints,ak-optimalsolutionmay beinfeasible;thatis,thek-optimalsolutionviolatesatleastonehardconstraint,andanydeviation of k or fewer agents also results in an infeasible solution. If a k-optimum is infeasible, then it is impossible to guarantee its solution quality with respect to the global optimum. Therefore, to avoid these cases we will restrict our analysis to the following kind of DCOP: Consider a subgraphH oftheDCOPconstraintgraphthatconsistsallthehardconstraintsintheDCOPonly. We will only consider DCOPs where the largest connected subgraph of H contains k or fewer agents (nodes). 
That is to say, we will not consider any DCOP for which a connected subgraph of H exists that contains more than k agents. In the DCOPs we are considering, no k-optimum couldbeinfeasible. Toseethis,supposeaninfeasiblek-optimumdidexistinsuchaDCOP.This 29 means that at least one hard constraint in one of these such subgraphs is being violated. Since thenumberofagentsinthesubgraphisk orless,alltheseagentscouldchangetheirvaluestothe valuesthattheytakeintheoptimalsolution. Thischangewouldensurethatnoconstraintsinthe subgraph were being violated, improving the overall reward. However, if k or fewer agents can improvetherewardofagivenassignment,itcannotbeak-optimuminthefirstplace. Wenowshowhowourmethodsforcalculatingqualityguaranteesintheprevioussectioncan be adapted to handle DCOPs with hard constraints. Previously, in the proof for the quality guar- anteesinProposition1,itwasshownthat,givenak-optimalsolution aandanyofthedominated assignments ˆ a∈ ˆ A a,k (inwhichk agentsaredeviatingfromthek-optimum),thesetofconstraints θ in the DCOP could be divided into the sets θ 1 (a,ˆ a), θ 2 (a,ˆ a), and θ 3 (a,ˆ a). As before, we can expressthesumoftheglobalrewardsofallassignments ˆ a∈ ˆ A a,k as: X ˆ a∈ ˆ A a,k X S∈θ 1 (a,ˆ a) R S (ˆ a)+ X ˆ a∈ ˆ A a,k X S∈θ 2 (a,ˆ a) R S (ˆ a)+ X ˆ a∈ ˆ A a,k X S∈θ 3 (a,ˆ a) R S (ˆ a). Weknowfromourassumptionsthat X ˆ a∈ ˆ A a,k X S∈θ 1 (a,ˆ a) R S (ˆ a) > 0and X ˆ a∈ ˆ A a,k X S∈θ 2 (a,ˆ a) R S (ˆ a) > 0 becauseotherwisethegloballyoptimalsolutionorthek-optimalsolutionwouldcontainaviolated hardconstraint,andthatisnotpossibleintheDCOPsweareconsidering. However,itisnolonger certainthat P ˆ a∈ ˆ A a,k P S∈θ 3 (a,ˆ a) R S (ˆ a) > 0andsowecannotobtainEquation3.2asbefore. However, ifwemodifythedescriptionof ˆ A a,k fromProposition1toexcludeallassignments ˆ a∈ ˆ A a,k where 30 S hard ∈ θ 3 (a,ˆ a) for any hard constraint S hard , then we can apply the methods of the previous section to obtain quality guarantees. 
Excluding all assignments ˆ a ∈ ˆ A a,k where S hard ∈ θ 3 (a,ˆ a) means including only the assignments ˆ a ∈ ˆ A a,k where S hard ∈ θ 1 (a,ˆ a)∪ θ 2 (a,ˆ a), meaning only theassignmentswhereallvariablesinvolvedinallhardconstraintsaretakingthesamevaluesas intheoptimalsolution,orwhereallvariablesinvolvedinallhardconstraintsaretakingthesame valuesasinthek-optimalsolution. To see this, first consider a DCOP of six variables with domains of {0,1} on a star-shaped graph with agent1 at the center(i.e. θ ={S : 1∈ S,|S|= 2}). Suppose that a= [0 0 00 0 0] isa 4-optimum, and that a ∗ = [1 1 1 1 1 1] is the global optimum. In the case of no hard constraints, d(a,a ∗ )= 6,and ˆ A a,k contains d(a,a ∗ )−1 k−1 = 10assignments,listedbelow: [111100],[111010],[111001],[110110],[110101], [110011],[101110],[101101],[101011],[100111]. ByProposition4,R(a)= 3 5 R(a ∗ ). NowsupposethatS hard ={1,2}isahardconstraint. Asmentionedabove,wemodify ˆ A a,k to exclude all assignments ˆ a ∈ ˆ A a,k where S hard ∈ θ 3 (a,ˆ a). This means removing all assignments where a 1 = 0 and a 2 = 1 and all assignments where a 1 = 1 and a 2 = 0. Now, we are left with 6 assignmentsin ˆ A a,k = [111100],[111010],[111001],[110110],[110101],[110011]. Since R(a) ≥ ˆ a,∀ˆ a ∈ ˆ A a,k , then 6·R(a) ≥ P ˆ A a,k R(ˆ a). Now, ∀S , S hard ∈ θ,∀i ∈ S,a ∗ i = ˆ a i for exactly n−2 k−2 = 3 assignments in ˆ A a,k . For example, for S = {1,3},a ∗ 1 = ˆ a 1 = 1 and a ∗ 2 = ˆ a 2 = 1 for ˆ a= [111100],[111010],and[111001]. Therefore,∀S ∈ θ,R S (a ∗ )= R S (ˆ a)forthese n−2 k−2 = 3 assignments. Since this is a star graph,∀S ∈ θ,∀i ∈ S,a i = ˆ a i for zero assignments in ˆ A a,k . Thus,6·R(a)≥ P ˆ A a,k R(ˆ a)≥ 3·R(a ∗ )+0·R(a),andsoR(a)≥ 3 6−0 R(a ∗ )= 1 2 R(a ∗ ). 31 For other graph types, we can apply an modification to the linear fractional program (LFP) method in the previous section. The original LFP was a minimization of R(a) R(a ∗ ) such that ∀˜ a ∈ ˜ A,R(a)−R(˜ a)≥ 0,given ˜ AasdefinedinProposition1. 
With the existence of hard constraints in the DCOP, which can carry an infinitely large negative reward, this constraint in the LFP no longer holds for all ã. Instead, it holds only for those ã ∈ Ã where S_hard ∈ θ_1(a, ã) ∪ θ_2(a, ã) for all hard constraints S_hard. This occurs because any assignment ã that does not meet this condition could violate a hard constraint, resulting in a large negative value for R(ã). Not including these assignments produces a smaller Ã and thus a smaller set of constraints in the LFP, resulting in a lower guarantee in the presence of hard constraints. This can be implemented by omitting, when constructing the LFP, any LFP constraint that corresponds to an assignment ã for which S_hard ∈ θ_3(a, ã) for some hard constraint S_hard.

The following proposition gives a more general proof for the case of more than one hard constraint in a star-shaped graph.

Proposition 5 For any binary DCOP of n agents with a star graph structure, where all constraint rewards are non-negative except for h hard constraints, and a* is the globally optimal solution, then, for any k-optimal assignment a, where k < n and 0 < h < n − 1,

R(a) ≥ [(k − h − 1) / (n − h − 1)] R(a*).    (3.6)

Proof: The proof is similar to the proof of Proposition 4. In a star graph, there are C(n−1, k−1) connected subgraphs of k variables, and so for the case of no hard constraints, |Â_{a,k}| = C(n−1, k−1), because each assignment â ∈ Â_{a,k} corresponds to one of those connected subgraphs. However, with hard constraints, we only include assignments in Â_{a,k} where S_hard ∈ θ_1(a, â) ∪ θ_2(a, â) for all hard constraints S_hard. In a star graph, there are C(n−h−1, k−h−1) connected subgraphs of k variables that include
Therefore,| ˆ A a,k | = n−h−1 k−h−1 . For every non-hard constraint S, there are n−h−2 k−h−2 connectedsubgraphsofk variablesthatcontainS,andtherefore, X ˆ a∈ ˆ A a,k X S∈θ 1 (a,ˆ a) R S (ˆ a)= n−h−2 k−h−2 ! R(a ∗ ). Finally,justasinthecasewithouthardconstraints,therearenowaystochooseaconstraintS so thatitdoesnotincludeanyvariableinagivenconnectedsubgraphofk variables. Therefore, X ˆ a∈ ˆ A a,k X S∈θ 2 (a,ˆ a) R S (ˆ a)= 0·R(a). andthereforeEquation3.6holds. When h = n− 1 (all constraints are hard constraints), it is easy to see that only k = n will guaranteeoptimality;otherwise,noguaranteeispossible. Finally,wenotethat,theseguaranteesfork-optimainstandardDCOPswithhardconstraints also apply to multiply-constrained DCOPs as introduced in [Bowring et al., 2006]. In these problems, in addition to the standard DCOP constraint cost functions, there exist additional cost functions on certain agents, known as g-constraints. A g-constraint on an agent contains a func- tion, whose inputs are the agent’s value and all its neighbors’ values, and whose output is a cost 33 known as a g-cost (which is not related to the standard costs and rewards in the DCOP). Any as- signment where an agent’s g-cost exceeds its preassigned g-budget is infeasible. In other words, ag-constraintonanagentwith mneighborsisashorthandexpressionforthemoregeneralnotion of a hard m+ 1-ary constraint between the agent and all its neighbors (where any assignment where the g-budget is exceeded is considered infeasible). Thus, the methods of this section can beappliedtoobtainguaranteesonk-optimainmultiply-constrainedDCOPsaswell;thisisshown intheexperimentalresultslaterinthechapter. 3.4 DominationAnalysisofk-Optima In this section we now provide a different type of guarantee: lower bounds on the proportion of allpossibleDCOPassignmentswhichanyk-optimummustdominateintermsofsolutionquality. 
This proportion, called a domination ratio, provides a guide for how difficult it may be to find a solution of higher quality than a k-optimum; this metric is commonly used to evaluate heuristics forcombinatorialoptimizationproblems[GutinandYeo,2005]. For example, suppose for some k, the solution quality guarantee from Section 3.2 for any k-optimum was 50% of optimal, but, additionally, it was known that any k-optimum was guar- anteed to dominate 95% of all possible assignments to the DCOP. Then, at most only 5% of the other assignments could be of higher quality, indicating that it would likely be computationally expensivetofindabetterassignment,eitherwithahigherk algorithm,orbysomeothermethod, and so a k-optimal algorithm should be used despite the low guarantee of 50% of the optimal solution quality. Now suppose instead for the same problem, the k-optimum was guaranteed to 34 dominate only 20% of all assignments. Then it becomes more likely that a better solution could befoundquickly,andsothek-optimalalgorithmmightnotberecommended. To find the domination ratio, observe that any k-optimum a must be of the same or higher qualitythanall ˜ a∈ ˜ AasdefinedinProposition1. So,theratiois: 1+| ˜ A| Q i∈N |A n | . (3.7) If the constraint graph is fully connected (or not known, and so must be assumed to be fully connected),andeachvariablehasqvalues,then| ˜ A|= P k j=1 n j (q−1) j and Q i∈N |A n |= q n . If the graph is known to be not fully connected, then the set ˜ A from Equation 3.7 can be expandedtoincludeassignmentsofdistancegreaterthankfroma,providingastrongerguarantee ontheratiooftheassignmentspacethatmustbedominatedbyanyk-optimum. 
Specifically,if a isk-optimal,thenanyassignmentwhere anynumberofdisjointsubsetsofsize≤ khavedeviated fromamustbeofthesameorlowerqualityasa,aslongasnoconstraintincludesanytwoagents indifferentsuchsubsets;thisideaisillustratedbelow: Example3 Consider a binary DCOP of five variables, numbered 1 to 5, with domains of two values, with unknown constraint graph. Any 3-optimum must be of equal or greater quality than 1+ 5 1 + 5 2 + 5 3 /2 5 = 81.25%ofallpossibleassignments,i.e. where0,1,2,or3agentshave deviated. Now, suppose the graph is known to be a chain with variables ordered by number. Since a deviation by either the variables{1,2} or{4,5} cannot increase global reward, and no constraint exists across these subsets, then neither can a deviation by{1,2,4,5}, even though four variables are deviating. The same applies to {1,3,4,5} and {1,2,3,5}, since both are made up of subsets of 35 threeorfewervariablesthatdonotshareconstraints. So,a3-optimumisnowofequalorgreater qualitythan 1+ 5 1 + 5 2 + 5 3 +3 /2 5 = 90.63%ofallassignments. An improved guarantee can be found by enumerating the set ˜ A of assignments ˜ a with equal or lower reward than a; this set is expanded due to the DCOP graph structure as in the above example. The following proposition makes this possible; we introduce new notation for it: If we definendifferentsubsetsofagentsas D i fori= 1...n,weuse D m =∪ m i=1 D i ,i.e. D m istheunion ofthefirstmsubsets. Theproofisbyinductionovereachsubset D i fori= 1...n. Proposition6 Let a be some k-optimal assignment. Let ˜ a n be another assignment for which D(a,˜ a n )canbeexpressedas D n =∪ n i=1 D i where: • ∀D i ,|D i |≤ k. (subsetscontaink orfeweragents) • ∀D i ,D j ,D i ∩D j =∅. (subsetsaredisjoint) • ∀D i ,D j ,@i ∈ D i , j ∈ D j such that i, j ∈ S, for any S ∈ θ. (no constraint exists between agentsindifferentsubsets) Then,R(a)≥ R(˜ a n ). Proof: Basecase: Ifn= 1then D n = D 1 andR(a)≥ R(˜ a n )bydefinitionofk-optimality. 
Inductive step: R(a) ≥ R(ã^{n−1}) ⇒ R(a) ≥ R(ã^n).

The set of all agents can be divided into the set of agents in D^{n−1}, the set of agents in D_n, and the set of agents not in D^n. Also, by the inductive hypothesis, R(a) ≥ R(ã^{n−1}). Therefore,

R(a) = Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(a) + Σ_{S∈θ: S∩D_n≠∅} R_S(a) + Σ_{S∈θ: S∩D^n=∅} R_S(a)
     ≥ Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(ã^{n−1}) + Σ_{S∈θ: S∩D_n≠∅} R_S(ã^{n−1}) + Σ_{S∈θ: S∩D^n=∅} R_S(ã^{n−1}),

and so

Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(a) ≥ Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(ã^{n−1})

because the rewards from agents outside D^{n−1} are the same for a and ã^{n−1}.

Let a′ be an assignment such that D(a, a′) = D_n = D(ã^{n−1}, ã^n). Because a is k-optimal and |D_n| ≤ k, R(a) ≥ R(a′); therefore,

R(a) = Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(a) + Σ_{S∈θ: S∩D_n≠∅} R_S(a) + Σ_{S∈θ: S∩D^n=∅} R_S(a)
     ≥ Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(a′) + Σ_{S∈θ: S∩D_n≠∅} R_S(a′) + Σ_{S∈θ: S∩D^n=∅} R_S(a′),

and so

Σ_{S∈θ: S∩D_n≠∅} R_S(a) ≥ Σ_{S∈θ: S∩D_n≠∅} R_S(a′)

because the rewards from agents outside D_n are the same for a and a′.

We also know that

Σ_{S∈θ: S∩D^n=∅} R_S(a) = Σ_{S∈θ: S∩D^n=∅} R_S(ã^n)

because the rewards from agents outside D^n are the same for a and ã^n; therefore,

R(a) ≥ Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(ã^{n−1}) + Σ_{S∈θ: S∩D_n≠∅} R_S(a′) + Σ_{S∈θ: S∩D^n=∅} R_S(ã^n)
     = Σ_{S∈θ: S∩D^{n−1}≠∅} R_S(ã^n) + Σ_{S∈θ: S∩D_n≠∅} R_S(ã^n) + Σ_{S∈θ: S∩D^n=∅} R_S(ã^n)

because the rewards from D^{n−1} are the same for ã^{n−1} and ã^n, and the rewards from D_n are the same for a′ and ã^n. Therefore, R(a) ≥ R(ã^n).

3.5 Experimental Results

While the main thrust of this chapter is on theoretical guarantees for k-optima, this section provides an illustration of these guarantees in action, and of how they are affected by constraint graph structure. Figures 3.1a, 3.1b, and 3.1c show quality guarantees for binary DCOPs with fully connected graphs, ring graphs, and star graphs, calculated directly from Equations 3.1, 3.4 and 3.5. Figure 3.1d shows quality guarantees for DCOPs whose graphs are binary trees, obtained using the LFP from Section 3.2.
Constructing these NLPs and solving them optimally with the LINGO 8.0 global solver took about two minutes on a 3 GHz Pentium IV with 1 GB RAM. The x-axis plots the value chosen for k, and the y-axis plots the lower bound for k-optima as a percentage of the optimal solution quality for systems of 5, 10, 15, and 20 agents. These results show how the worst-case benefit of increasing k varies depending on graph structure.

Figure 3.1: Quality guarantees for k-optima with respect to the global optimum for DCOPs of various graph structures.

For example, in a five-agent DCOP, a 3-optimum is guaranteed to be 50% of optimal whether the graph is a star or a ring. However, moving to k = 4 means that worst-case solution quality will improve to 75% for a star, but only to 60% for a ring. For fully connected graphs, the benefit of increasing k goes up as k increases; whereas for stars it stays constant, and for chains it decreases, except for when k = n. Results for binary trees are mixed.

Figure 3.2 shows quality guarantees in the presence of hard constraints for fully connected graphs, ring graphs, and star graphs. These experiments began with a DCOP containing all soft constraints and no hard constraints, and gradually more and more soft constraints were made into hard constraints. The left column shows the effect of one and two hard constraints in a DCOP of five agents, and the right column shows the effect of two and four hard constraints in a DCOP of 10 agents. The constraints that were set as hard constraints were, in order, {0,1}, {2,3}, {4,5}, and {6,7}. This methodology was chosen so that no agent would be subject to more than one hard constraint, and so that k-optimal solutions would always be feasible. For star graphs, the guarantee from Proposition 5 was used; for the others, the LFP method was used.

Figure 3.3 shows quality guarantees for a multiply-constrained DCOP with 30 agents, arranged in a ring structure. These experiments begin with a DCOP containing no agents with g-constraints.
The number of agents with g-constraints was gradually increased by assigning a g-constraint to every third agent, starting with agent 0, until there were 10 agents with g-constraints. Similar to the previous experiments, this methodology was chosen so that k-optimal solutions would always be feasible. The LFP method was used to calculate the guarantees. Figure 3.3 shows guarantees for up to 4 agents with g-constraints as k increases. For the cases of 5 to 10 agents with g-constraints, the guarantee was the same as for 4 agents with g-constraints.

Figure 3.2: Quality guarantees for k-optima in DCOPs containing hard constraints.

Figure 3.3: Quality guarantees for k-optima in a multiply-constrained ring DCOP.

Figure 3.4: Quality guarantees for k-optima with respect to the space of dominated assignments for various graph structures.

Figure 3.4 shows the guarantees for k-optima from Section 3.4, expressed as percentile rankings over all possible assignments, for DCOPs where variables have domains of two values. This figure, when considered with Figure 3.1, provides insight into the difficulty of finding a solution of higher quality than a k-optimum. For example, a 7-optimum in a fully connected graph of 10 agents (Figure 3.1a) is only guaranteed to be 50% of optimal; however, this 7-optimum is guaranteed to be of higher quality than 94.5% of all possible assignments to that DCOP (Figure 3.4a), which suggests that finding a better solution may be difficult. In contrast, a 3-optimum in a ring of 10 agents (Figure 3.1b) has the same guarantee of 50% of the optimal solution, but this 3-optimum is only guaranteed to be of higher quality than 69% of all possible assignments, which suggests that finding a better solution may be easier.

Chapter 4

Upper Bounds on the Number of k-Optima in a DCOP

Traditionally, researchers have focused on obtaining a single DCOP solution, expressed as a single assignment of actions to agents. However, in this section, we consider a multi-agent system that generates a set of k-optimal assignments, i.e. multiple assignments to the same DCOP.
Generating sets of assignments is useful in domains such as disaster rescue (to provide multiple rescue options to a human commander) [Schurr et al., 2005], patrolling (to execute multiple patrols in the same area) [Ruan et al., 2005], training simulations (to provide several options to a student), and others [Tate et al., 1998].

Domains requiring repeated patrols in an area by a team of UAVs (unmanned air vehicles), UGVs (unmanned ground vehicles), or robots, for peacekeeping or law enforcement after a disaster, provide one key illustration of the utility of k-optimality. For example, given a team of patrol robots in charge of executing multiple joint patrols in an area as in [Ruan et al., 2005], each robot may be assigned a region within the area. Each robot is controlled by a single agent, and hence, for one joint patrol, each agent must choose one of several possible routes to patrol within its region. A joint patrol is an assignment, where each agent's action is the route it has chosen to patrol, and rewards and costs arise from the combination of routes patrolled by agents in adjacent or overlapping regions. For example, if two nearby agents choose routes that largely overlap on a low-activity street, the constraint between those agents would incur a cost, while routes that overlap on a high-activity street would generate a reward. Agents in distant regions would not share a constraint. If reward alone is used as a metric to select joint patrols, then all selected joint patrols could be the same, except for the action of one agent. This set of patrols would be repetitive and predictable to adversaries. If we pick some diverse joint patrols at random, they may be very low-quality patrols.
Using k-optimality directly addresses such circumstances: if no ties exist between the rewards of patrols a distance k or fewer apart, k-optimality ensures that all joint patrols differ by at least k+1 agents' actions, as well as ensuring that this diversity does not come at the expense of obviously bad joint patrols, as each is optimal within a radius of at least k agents' actions.

Our key contribution in this section is addressing efficient resource allocation for the multiple assignments in a k-optimal set, by defining tight upper bounds on the number of k-optimal assignments that can exist for a given DCOP. These bounds are necessitated by two key features of the typical domains where a k-optimal set is applicable. First, each assignment in the set consumes some resources that must be allocated in advance. Such resource consumption arises because: (i) a team actually executes each assignment in the set, as in our patrolling example above, or (ii) the assignment set is presented to a human user (or another agent) as a list of options to choose from, requiring time. In each case, resources are consumed based on the assignment set size. Second, while the existence of the constraints between agents is known a priori, the actual rewards and costs on the constraints depend on conditions that are not known until runtime, and so resources must be allocated before the rewards and costs are known and before the agents generate the k-optimal assignment set. In the patrolling domain, constraints are known to exist between patrol robots assigned to adjacent or overlapping regions. However, their costs and rewards depend on recent field reports of adversarial activity that are not known until the robots are deployed. At this point the robots must already be fueled in order for them to immediately generate and execute a set of k-optimal patrols.
The resource to be allocated to the robots is the amount of fuel required to execute each patrol; thus it is critical to ensure that enough fuel is given to each robot so that each assignment found can be executed, without burdening the robots with wasted fuel that will go unused. Consider another domain involving a team of disaster rescue agents that must generate a set of k-optimal assignments in order to present a set of diverse options to a human commander, where each option represents the best assignment within a neighborhood of similar assignments. The commander will choose one assignment for the team to actually execute. Constraints exist between agents whose actions must be coordinated (i.e. members of subteams), but their costs and rewards depend on conditions on the ground that are unknown until the time when the agents must be deployed. Here, the resource is the time the commander has to make the decision. Presenting too many options will cause the commander to run out of time before considering them all, and presenting too few may cause high-quality options to be omitted.

Because each assignment consumes resources, knowing the maximal number of k-optimal assignments that could exist for a given DCOP would allow us to allocate sufficient resources for a given level of k. Unfortunately, we cannot predict this number because the costs and rewards for the DCOP are not known in advance. Despite this uncertainty, reward-independent bounds can be obtained on the size of a k-optimal assignment set, allowing us to safely allocate enough resources for a given value of k for any DCOP with a particular graph structure. We first identify a mapping to coding theory, yielding bounds independent of both reward and graph structure. We then provide a method to use the structure of the DCOP graph (or hypergraph of arbitrary arity) to obtain significantly tighter bounds.
The third key contribution in this section is to establish a connection to noncooperative settings by proving that our bounds for 1-optima also apply to the number of pure-strategy Nash equilibria in any graphical game on a given graph, which remains an open problem. In addition to their uses in resource allocation, these bounds provide insight into the problem landscapes that can exist in both cooperative and noncooperative settings.

4.1 Upper Bounds on k-Optima

Upper bounds on the number of possible k-optimal assignments, |A_q(n,k)|, are useful for two reasons: they yield resource savings in domains where a particular level of k-optimality is desired, and they help determine the appropriate level of k-optimality to prevent guaranteed waste of resources (fuel, time, etc.) in settings with fixed resources.

First, a particular level of k-optimality may be desired for an assignment set: a high k will include assignments that are more diverse, and optimal within a larger radius, but high-k algorithms have significantly higher coordination/communication overheads, either in the number of messages passed or in the space and time required to centrally compute partial solutions to the DCOP [Modi et al., 2005; Mailler and Lesser, 2004; Maheswaran et al., 2004a]; hence lower k is preferable under time pressure. Lower k may also be preferable if an agent team or a human user wants a more detailed set of assignments, for example, more joint patrols, more rescue options, etc. For a given level of k-optimality, bounds indicate the maximum resource requirement for any k-optimal assignment set. Thus, tighter bounds provide savings by allowing fewer resources to be allocated a priori while ensuring enough will be available for all k-optimal assignments, regardless of the rewards and costs on the constraints.

Figure 4.1: Hypothetical example illustrating the advantages of tighter bounds.
Figure 4.1 is a hypothetical example, with k on the x-axis and the number of resources to be allocated on the y-axis. β_1 and β_2 are two different upper bounds on the number of k-optimal assignments that can exist for a given DCOP. Part (a) shows how the tighter bound β_2 indicates that a resource level of r_2 is sufficient for all k̂-optimal assignments, if each assignment consumes one resource, yielding a savings of r_1 − r_2 over using β_1.

Second, if resource availability is fixed, tighter bounds help us choose an appropriate level of k-optimality. If k is too low, we may exhaust our resources on bad assignments (similar assignments with poor relative quality). In contrast, fewer k-optimal assignments can exist as k increases, and so if k is too high, available resources that could be spent on additional assignments are guaranteed to go unused. Tighter bounds provide a more accurate measure of this kind of guaranteed waste and thus allow a more appropriate k to be chosen. In Figure 4.1(b), under fixed resource level r̂, the looser bound β_1 hides the resources guaranteed to go unused when k_1 is used. This waste is revealed by β_2, with the thick line indicating the resources that, if allocated, will never be used, as there cannot exist enough k-optima to use them all; instead, we now see that using k_2 will reduce this guaranteed waste.

To find the first upper bounds on the number of k-optima for a given DCOP graph, we discovered a correspondence to coding theory [Ling and Xing, 2004]. In error-correcting codes, a set of codewords must be chosen from the space of all possible words, where each word is a string of characters from an alphabet. All codewords are sufficiently different from one another so that transmission errors will not cause one to be confused for another. Finding the maximum possible number of k-optima can be mapped to finding the maximum number of codewords in a space of q^n words where the minimum distance between any two codewords is d = k + 1.
We can map DCOP assignments to words and k-optima to codewords as follows: an assignment a taken by n agents, each with a domain of cardinality q, is analogous to a word of length n from an alphabet of cardinality q. The distance d(a, ã) can then be interpreted as a Hamming distance between two words. Then, if a is k-optimal and d(a, ã) ≤ k, then ã cannot also be k-optimal, by definition. Thus, any two k-optima must be separated by a distance ≥ k + 1.

Three well-known bounds [Ling and Xing, 2004] on codewords are the Hamming bound,

β_H = q^n / Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j,

the Singleton bound,

β_S = q^{n−k},

and the Plotkin bound,

β_P = ⌊(k+1) / (k+1 − (1−q^{−1})n)⌋,

which is only valid when (1 − 1/q)n < k + 1. Note that for the special case of q = 2 (all domains are binary), it is possible to use the relation β_H(n,k,q) = β_H(n−1, k−1, q) [Ling and Xing, 2004] to obtain a tighter bound for odd k using the Hamming bound. Now, to find a reward-independent bound on the number of 1-optima for three agents with q = 2 (e.g., the system in Example 1), we obtain min{β_H, β_S, β_P} = β_H = 4, without knowing R_12 and R_23 explicitly.

Unfortunately, problems of even d (odd k) are not of interest for error-correcting codes, and β_H, the Hamming bound, is very loose or useless for DCOP when q > 2; e.g., for 1-optima (solutions reached by DSA) the bound is equal to the number of possible assignments in this case. Hence, for DCOP, we pursue an improved bound for q > 2 and odd k. β_H is derived by using a sphere-packing argument stating that the total number of words q^n must be greater than the number of codewords A_q(n,k) multiplied by the size of a sphere of radius ⌊k/2⌋ centered around each codeword. A sphere S_A(a*, r) with center a* and radius r is the set of assignments ã such that d(a*, ã) ≤ r, and represents words that cannot be codewords (except for its center). It can be shown that S_A(a*, ⌊k/2⌋) contains exactly Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j words.
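Before refining the bound for odd k, the three classical bounds above can be checked numerically. The following sketch uses our own helper names and applies the β_H(n,k,2) = β_H(n−1,k−1,2) relation for odd k; it reproduces the bound of 4 for Example 1:

```python
from math import comb, floor

def hamming_bound(n, q, k):
    # beta_H = floor( q^n / sum_{j=0}^{floor(k/2)} C(n,j)(q-1)^j ),
    # using beta_H(n,k,2) = beta_H(n-1,k-1,2) for binary domains and odd k
    if q == 2 and k % 2 == 1:
        return hamming_bound(n - 1, 2, k - 1)
    sphere = sum(comb(n, j) * (q - 1) ** j for j in range(k // 2 + 1))
    return q ** n // sphere

def singleton_bound(n, q, k):
    return q ** (n - k)

def plotkin_bound(n, q, k):
    # only valid when (1 - 1/q) n < k + 1
    if (1 - 1 / q) * n >= k + 1:
        return float("inf")
    return floor((k + 1) / (k + 1 - (1 - 1 / q) * n))

# Example 1: n = 3 agents, q = 2, bound on 1-optima
n, q, k = 3, 2, 1
bounds = (hamming_bound(n, q, k), singleton_bound(n, q, k), plotkin_bound(n, q, k))
print(min(bounds))  # 4
```

Here all three bounds happen to coincide at 4; in general their minimum is taken.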
If k is even, the tightest packing occurs with spheres of radius k/2, and each word can be uniquely assigned to the sphere of its closest codeword. If k is odd, it is possible for a word to be equidistant from two codewords, and it is unclear how to assign it to a sphere. The Hamming bound addresses this issue by using the bound for k−1 when k is odd, which leads to smaller spheres and a bound larger than necessary. This ignores the contribution of a word that lies on the "boundary" between several spheres. These boundary assignments can be appropriately partitioned to achieve a tighter bound on the number of k-optima for odd k, called the Modified Hamming bound.

Proposition 7 For odd k, A_q(n,k) ≤ min{A_1, A_2} where

A_1 = (q^n − C(n, (k+1)/2)(q−1)^{(k+1)/2}) / Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j

A_2 = q^n / (Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j + C(n, (k+1)/2)(q−1)^{(k+1)/2} n^{−1})

Proof. Any word that has Hamming distance ⌊k/2⌋ or less from a codeword belongs in that codeword's sphere, because belonging to more than one sphere would violate the code's distance requirement. Given an odd value of k, each codeword will have C(n, (k+1)/2)(q−1)^{(k+1)/2} words that are a distance of (k+1)/2 away from it. It cannot claim all these words for its sphere exclusively, as they may be equidistant from other codewords. We do know, however, that each of these words can be on the boundary of at most n spheres (i.e. can be equidistant from at most n codewords) because they are of length n. Furthermore, each of these words can be equidistant from at most A_q(n,k) codewords, i.e. the total number of codewords in the space. Thus, each codeword can safely incorporate 1/min{n, A_q(n,k)} of each of these boundary words into its sphere without any portion being claimed by more than one sphere. Aggregating over all the words on the boundary, we can increase the volume of the sphere by C(n, (k+1)/2)(q−1)^{(k+1)/2} / min{n, A_q(n,k)}. Using the sphere-packing argument with the portions of the boundary words added to each sphere, if A_q(n,k) ≤ n, we have

q^n ≥ A_q(n,k) [ Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j + C(n, (k+1)/2)(q−1)^{(k+1)/2} / A_q(n,k) ]

⇒ A_q(n,k) ≤ (q^n − C(n, (k+1)/2)(q−1)^{(k+1)/2}) / Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j ≡ A_1,

and if A_q(n,k) ≥ n, we have

q^n ≥ A_q(n,k) [ Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j + C(n, (k+1)/2)(q−1)^{(k+1)/2} / n ]

⇒ A_q(n,k) ≤ q^n / (Σ_{j=0}^{⌊k/2⌋} C(n,j)(q−1)^j + C(n, (k+1)/2)(q−1)^{(k+1)/2} n^{−1}) ≡ A_2.

We have A_q(n,k) ≤ n ⇒ A_q(n,k) ≤ A_1 and A_q(n,k) ≥ n ⇒ A_q(n,k) ≤ A_2. We can show that A_1 ◦ n ⇔ A_2 ◦ n for every ◦ ∈ {<, >, =}; furthermore, A_1 ◦ n and A_2 ◦ n ⇔ A_1 ◦ A_2. Thus, when A_1 ≤ n, then A_2 ≤ n and A_1 ≤ A_2, so A_q(n,k) ≤ A_1 = min{A_1, A_2} when A_1 ≤ n. And when A_1 > n, then A_2 > n and A_1 > A_2, so A_q(n,k) ≤ A_2 = min{A_1, A_2} when A_1 > n. Therefore, A_q(n,k) ≤ min{A_1, A_2}.

We call the Modified Hamming bound β_MH and define β_HSP = min{β_H, β_S, β_P, β_MH}, including the relation for β_H for q = 2; i.e. β_HSP gives the best of all the (graph-independent) bounds so far.

4.2 Graph-Based Analysis: k-Optima

The β_HSP bound and its components depend only on n, k, and q, regardless of how the team reward is decomposed onto constraints; i.e., the bounds are the same for all θ. For instance, the bound on 1-optima for Example 1 (found to be 4 in the previous section) ignored the fact that agents 1 and 3 do not share a constraint, and yields the same result independent of the DCOP graph structure. However, exploiting this structure (as captured by θ) can significantly tighten the bounds on {|A_q(n,k)|}_{k=1}^{n}. In particular, in obtaining the bounds in Section 4.1, pairs of assignments were mutually exclusive as k-optima (only one of the two could be k-optimal) if they were separated by a distance ≤ k. We now show how some assignments separated by a distance ≥ k+1 must also be mutually exclusive as k-optima.

We define D_G(a, ã) := {i ∈ G : a_i ≠ ã_i} and V(G) := ∪_{S∈θ: G∩S≠∅} S.
Intuitively, D_G(a, ã) is the set of agents within the subgroup G who have chosen different actions between a and ã, and V(G) is the set of agents (including those in G) who are a member of some constraint S ∈ θ incident on a member of G (i.e., G and the agents who share a constraint with some member of G). Then, V(G)^C is the set of all agents whose contribution to the team reward is independent of the values taken by G.

Proposition 8 Let there be an assignment a* ∈ A_q(n,k) and let ã ∈ A be another assignment for which d(a*, ã) > k. If ∃G ⊂ N, G ≠ ∅, for which |G| ≤ k and D_V(G)(a*, ã) = G, then ã ∉ A_q(n,k).

In other words, if ã contains some group G that is facing the same context as it does in a*, but some agent in G is choosing a different value than it does in a*, then a* and ã cannot both be k-optimal.

Proof. Given a*, ã, and G with the properties stated above, we have that ∀a : d(a*, a) ≤ k, Δ(a*, a) > 0. In other words, any assignment a that is a distance of k or less from a* must have a lower reward than a*. Let us define a particular a such that a_i = ã_i for i ∈ V(G) and a_i = a*_i for i ∉ V(G); i.e., the agents in V(G) take the same values as in ã and the other agents take the same values as in a*. Then, by our initial conditions, D(a*, a) = D_V(G)(a*, ã) = G, and therefore d(a*, a) ≤ k, which implies that a* has a higher reward than a:

Δ(a*, a) = Σ_{S∈θ: S∩D(a*,a)≠∅} [R_S(a*_S) − R_S(a_S)] = Σ_{S∈θ: S∩G≠∅} [R_S(a*_S) − R_S(a_S)]
         = Σ_{S∈θ: S∩G≠∅} [R_S(a*_S) − R_S(ã_S)] > 0.

Now let us define a new assignment, â, such that â_i = a*_i for i ∈ V(G) and â_i = ã_i for i ∉ V(G). In other words, the agents in V(G) take the same values as in a* and the other agents take the same values as in ã. Then D(ã, â) = D_V(G)(ã, a*) = G and d(ã, â) ≤ k, which implies that ã has a lower reward than â:

Δ(ã, â) = Σ_{S∈θ: S∩D(ã,â)≠∅} [R_S(ã_S) − R_S(â_S)] = Σ_{S∈θ: S∩G≠∅} [R_S(ã_S) − R_S(â_S)]
        = Σ_{S∈θ: S∩G≠∅} [R_S(ã_S) − R_S(a*_S)] < 0.
Thus, ã ∉ A_q(n,k) because Δ(ã, â) < 0 and d(ã, â) ≤ k (ã has a lower reward than â, which is a distance of k or less away, and thus cannot be k-optimal if a* is k-optimal).

Proposition 8 provides conditions under which, if a* is k-optimal, then ã, which may be separated from a* by a distance of k+1 or more, cannot be k-optimal, thus tightening bounds on k-optimal assignment sets. With Proposition 8, since agents are typically not fully connected to all other agents, the relevant context a subgroup faces is not the entire set of other agents. Thus, the subgroup and its relevant context form a view (captured by V(G)) that is not the entire team. We consider the case where an assignment ã has d(a*, ã) > k. We also have a group G of size k within whose view V(G), G are the only deviators between a* and ã (although agents outside the view must also have deviated, because d(a*, ã) > k). We then know that ã contains a group G of size k or less that has taken a suboptimal subgroup assignment of values to variables with respect to its relevant context, and thus ã cannot be k-optimal; i.e. if the group G chose a*_G instead of ã_G under its relevant context V(G)\G for ã, then team reward would increase. Finally, we note that this proposition does not imply any relationship between the reward of a* and that of ã.

Figure 4.2: A visual representation of the effect of Proposition 8.

Figure 4.2(a) shows G, V(G), and V(G)^C for a sample DCOP of six agents with a domain of two actions, white and gray. Without Proposition 8, ã_1, ã_2, and ã_3 could all potentially be 2-optimal. However, Proposition 8 guarantees that they are not, leading to a tighter bound on the number of 2-optima that could exist.
To see the effect, note that if a* is 2-optimal, then G = {1,2}, a subgroup of size 2, must have taken an optimal subgroup joint action (all white) given its relevant context (all white). Even though ã_1, ã_2, and ã_3 are a distance greater than 2 from a*, they cannot be 2-optimal, since in each of them G faces the same relevant context (all white) but is now taking a suboptimal subgroup joint action (all gray).

To explain the significance of Proposition 8 to bounds, we introduce the notion of an exclusivity relation E ⊂ N, which captures the restriction that if the deviating group D(a, ã) = E, then at most one of a and ã can be k-optimal. An exclusivity relation set for k-optimality, E_k ⊂ P(N), is a collection of such relations that limits |A_q(n,k)|, the number of assignments that can be k-optimal in a reward-independent setting (otherwise every assignment could potentially be k-optimal). In particular, the set E_k defines an exclusivity graph H_k where each node corresponds uniquely to one of all q^n possible assignments. Edges are defined between pairs of assignments, a and ã, if D(a, ã) ∈ E_k. The size of the maximum independent set (MIS) of H_k, the largest subset of nodes such that no pair defines an edge, gives an upper bound on |A_q(n,k)|. Naturally, an expanded E_k implies a more connected exclusivity graph and thus a tighter bound on |A_q(n,k)|.

Without introducing graph-based analysis, β_HSP for each k provides a bound on the MIS of H_k when

E_k = ∪_{E⊂N: 1≤|E|≤k} E.

This set E_k captures only the restriction that no two assignments within a distance of k can both be k-optimal. Consider Example 1, but with unknown rewards on the links. Here, the exclusivity relation set for 1-optima without considering the DCOP graph is E_1 = {{1},{2},{3}}, meaning that no two assignments differing only by the action taken by either agent 1, 2, or 3 can both be 1-optimal. This leads to the exclusivity graph in Figure 4.3(a), whose MIS implies a bound of 4.
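Both this bound of 4 and its graph-based refinement to 2 (obtained from the additional relation {1,3} discussed next) are small enough to verify by exhaustive search over the 2^3 = 8 assignments of Example 1. A brute-force sketch, with our own helper names and agents 1–3 indexed as 0–2:

```python
from itertools import combinations, product

def mis_size(nodes, relations):
    """Brute-force maximum independent set of the exclusivity graph:
    assignments a, b are adjacent iff their deviating set is in
    `relations`.  Fine for tiny instances (here, 2^3 = 8 nodes)."""
    def adjacent(a, b):
        return frozenset(i for i in range(len(a)) if a[i] != b[i]) in relations
    for r in range(len(nodes), 0, -1):
        for subset in combinations(nodes, r):
            if not any(adjacent(u, v) for u, v in combinations(subset, 2)):
                return r  # first (largest) independent set found
    return 0

nodes = list(product((0, 1), repeat=3))                  # all 2^3 assignments
E1 = {frozenset({0}), frozenset({1}), frozenset({2})}    # graph-blind relations
print(mis_size(nodes, E1))                               # 4
print(mis_size(nodes, E1 | {frozenset({0, 2})}))         # 2, with relation {1,3}
```

The exponential enumeration here is only for verification; the heuristic bounds of Section 4.4 avoid it.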
Figure 4.3: Exclusivity graphs for 1-optima for Example 1 with MIS shown in gray, (a) not using Proposition 8 and (b) using it.

The significance of Proposition 8 is that it provides additional exclusivity relations for solutions separated by distance ≥ k+1, which arise only because we considered the structure of the DCOP graph, allowing a tighter bound to be computed. This graph-based exclusivity relation set is

Ẽ_k = ∪_{E⊂N: 1≤|E|≤k} ∪_{F∈P(V(E)^C)} [E ∪ F],

which is a superset of E_k. Additional relations exist because multiple different exclusivity relations (∪_{F∈P(V(E)^C)} [E ∪ F]) appear the same to the subgroup E because of its reduced view V(E). Now, for Example 1, the exclusivity relation set for 1-optima when considering the DCOP graph is Ẽ_1 = {{1},{2},{3},{1,3}}, which now has the additional relation {1,3}. This relation, included because of the realization that agents 1 and 3 are not connected, says that no two assignments can both be 1-optimal if they differ only in the actions of both agent 1 and agent 3. This leads to the exclusivity graph in Figure 4.3(b), whose MIS implies a bound of 2. Algorithms for obtaining bounds using Ẽ_k will be discussed in Section 4.4.

4.3 Application to Nash Equilibria

Our graph-based bounds can be extended beyond agent teams to noncooperative settings. It is possible to employ the same exclusivity relations for 1-optimal DCOP assignments to bound the number of pure-strategy Nash equilibria in a graphical game (of the same graph structure) using any of our bounds for |A_q(n,1)|. Bounds on Nash equilibria [McLennan and Park, 1999] are useful for the design and analysis of mechanisms, as they predict the maximum number of outcomes of a game.

We begin with a set of noncooperative agents N = {1,...,n}, where the i-th agent's utility is

U^i(a_i; a_{N\i}) = Σ_{S^i∈θ^i} U^i_{S^i}(a_i; a_{S^i\i}),

which is a decomposition into an aggregation of component utilities generated from minimal subgroups.
Note that the combination of actions taken by any subgroup of agents may generate utility for any agent i; therefore the subgroups are denoted as S^i rather than S, as in the cooperative case, where the utility went to the entire team. The notation a_i and a_{G\i} refers to the i-th agent's action and the actions of the group G with i removed, respectively. We refer to a as an assignment, with the understanding that it is composed of actions motivated by individual utilities. Let the view of the i-th agent in a noncooperative setting be V(i) = ∪_{S^i∈θ^i} S^i. The deviating group with respect to G is D_G(a, ã) := {i ∈ G : a_i ≠ ã_i}. Assuming every player has a unique optimal response to its context, then if a* is a pure-strategy Nash equilibrium, d(a*, a) = 1, and i = D(a*, a), we know that

U^i(a*_i; a*_{N\i}) > U^i(a_i; a*_{N\i})

and a is not a pure-strategy Nash equilibrium. However, applying the graph (or hypergraph) structure of the game, captured by the sets {θ^i}, we get exclusivity relations between assignments with distance > 1, as follows.

Proposition 9 If a* is a pure-strategy Nash equilibrium, ã ∈ A is such that d(a*, ã) > 1, and ∃i ∈ N such that D_V(i)(a*, ã) = i, then ã is not a pure-strategy Nash equilibrium.

Proof. We have:

U^i(ã_i; ã_{N\i}) = Σ_{S^i∈θ^i} U^i_{S^i}(ã_i; ã_{S^i\i}) = Σ_{S^i∈θ^i} U^i_{S^i}(ã_i; a*_{S^i\i})
< Σ_{S^i∈θ^i} U^i_{S^i}(a*_i; a*_{S^i\i}) = Σ_{S^i∈θ^i} U^i_{S^i}(a*_i; ã_{S^i\i}) = U^i(a*_i; ã_{N\i}).

The first and last equalities are by definition. The second and third equalities hold because D_V(i)(a*, ã) = i. The inequality holds because a* is a pure-strategy Nash equilibrium. The result is that ã_i is not an optimal response to ã_{N\i}, and thus ã cannot be a Nash equilibrium.

Proposition 9 states that a* and ã cannot both be Nash equilibria if ∃i, D_V(i)(a*, ã) = i, which is identical to the condition that prevents two assignments (in a team setting) from being 1-optimal. The commonality is that in both the cooperative and noncooperative settings, agents have optimal actions for any given context, and in both settings there is a notion of relevant context, V(i)\i, which can be a subset of the other agents {N\i}. The difference is that the views are generated in different manners: V(i) = ∪_{S∈θ: {i}∩S≠∅} S in a cooperative setting, while V(i) = ∪_{S^i∈θ^i} S^i in a noncooperative setting. Given the views, we can generate the exclusivity relation set in the
The commonality is that in both the cooperative and noncooperative settings, agents have optimal actions for any given context, and in both settings there is a notion of relevant context,V(i)\i,whichcanbeasubsetofotheragents{N\i}. Thedifferenceisthattheviewsare generatedindifferentmanners: V(i)=∪ S∈θ:i∩S,∅ S inacooperativesetting,whileV(i)=∪ S i ∈θ i S i in a noncooperative setting. Given the views, we can generate the exclusivity relation set in the 60 same manner,E 1 = S i∈N S F∈P(V(i) C ) [i∪ F]. Given the exclusivity relation set, we can create an exclusivitygraphforanoncooperativesettinginafashionsimilartotheoneinSection4.2. Thus, the bound on the number of Nash equilibria for a noncooperative graphical game is identical to the bound on 1-optimal assignments for a cooperative DCOP, if both share the same exclusivity relationsetE 1 . 4.4 Graph-BasedBounds As seen earlier, the graph structure expands the exclusivity relation set for k-optimality in coop- erative (DCOP) settings and Nash equilibria in noncooperative (graphical-game) settings. This setdefinesexclusivitygraphH k whosemaximumindependentset(MIS)providesaboundforthe numberofk-optimalassignments(oralternatively,forthenumberofNashequilibria). Findingthe size of the MIS is NP-complete in the general case [Alon and Kahale, 1998], so we investigated heuristic techniques to obtain an upper bound on|A q (n,k)|. We observe that any fully-connected subset (clique) of H k can contain at most one k-optimum. Thus, the number of cliques in any clique partitioning of H k also provides an upper bound on |A q (n,k)|, where a partitioning yield- ing fewer cliques will provide a tighter bound. Hence, our first approach is the polynomial-time F CLIQUE clique-partitioning algorithm, shown in [Kim and Shin, 2002] to outperform several competitors. 
Our second heuristic technique to find a graph-based bound is Algorithm 1, the Symmetric Region Packing bound, β_SRP, which uses a packing method analogous to Proposition 7, where each k-optimum claims a region of the space of all possible assignments (the nodes of H_k). Because these regions are constructed to be disjoint and have identical volumes, dividing the space of all assignments by this volume yields a bound.

Algorithm 1 Algorithm for Symmetric Region Packing (SRP) bound
1: Ẽ_k = ∪_{E⊂I: 1≤|E|≤k} ∪_{F∈P(V(E)^C)} [E ∪ F]
2: a = [0 0 0]
3: |A_k| = 1
4: B(a) = ∪_{E∈Ẽ_k} f(a, E)
5: for all b ∈ B(a) do
6:   B(b) = (∪_{E∈Ẽ_k} f(b, E)) \ ({a} ∪ B(a))
7:   H_k(b).addNodes(B(b))
8:   for all b_1, b_2 ∈ B(b) do
9:     if D(b_1, b_2) ∈ Ẽ_k then
10:      H_k(b).addEdge(b_1, b_2)
11:  M_b = |cliquePartition(H_k(b))|
12:  |A_k| = |A_k| + 1/(1 + M_b)
13: β_SRP = ⌊q^{|I|} / |A_k|⌋

Figure 4.4 shows β_SRP computed for 1-optima for Example 1. We choose an arbitrary assignment a ∈ A which we assume to be k-optimal (a = [0 0 0] in Figure 4.4), around which we will construct a region claimed by a. Applying the exclusivity relations from Ẽ_k, we generate a set B(a) = ∪_{E∈Ẽ_k} f(a, E), where f(a, E) yields the assignment that is excluded from being k-optimal by a and E. The first two rows of Figure 4.4 show Ẽ_1 and the set B([0 0 0]). Applying the exclusivity relations again for each b ∈ B(a), and discarding assignments already included in a or B(a), we generate a set B(b) = ∪_{E∈Ẽ_k} f(b, E) which contains all assignments that potentially exclude b from being k-optimal. In Figure 4.4, we apply Ẽ_1 to find B(b) for all b ∈ B(a) = {[1 0 0], [0 1 0], [0 0 1], [1 0 1]}, where the grayed-out assignments are those discarded for being in {a} ∪ B(a). To ensure that the region that a claims is disjoint from the regions claimed by other k-optima, a should only claim a fraction of each b ∈ B(a). This can be achieved if a shares each b equally with all other k-optima that might exclude b. These additional k-optima are contained within B(b).
However, not all b ∈ B(b) can actually be k-optimal, as they might exclude each other. If we construct a graph H_k(b) with nodes for all b ∈ B(b) and edges formed using Ẽ_k, and we find M_b, the size of the MIS, then a can safely claim 1/(1 + M_b) of b. We again use clique partitioning to safely estimate M_b.

Figure 4.4: Computation of β_SRP for Example 1 (the figure shows Ẽ_1, the set B([0 0 0]), the sets B(b) for each b ∈ B(a), the exclusivity subgraphs H_1(b), and the resulting values of M_b and 1/(1 + M_b))

In Figure 4.4, for b = [0 1 0], B([0 1 0]) leads to a three-node, three-edge exclusivity graph H_k([0 1 0]). By adding the values of 1/(1 + M_b) for all b ∈ B(a) (plus one for itself), we obtain that a can safely claim a region of size 3, which implies β_SRP = ⌊2^3/3⌋ = 2. Algorithm 1's runtime is polynomial in the number of possible assignments, which is a comparatively small cost for a bound that applies to every possible instantiation of rewards to actions. An exhaustive search for the MIS of H_k would be exponential in this number (doubly exponential in the number of agents).

4.5 Experimental Results

We performed five evaluations in addition to the experiment described at the beginning of the section. The first evaluates the impact of k-optimality for higher values of k. For each of the three DCOP graphs from Figure 2.2(a-c), Figure 4.5(a-c) shows key properties for 1-, 2-, and 3-optima.

Figure 4.5: 1-optima vs. assignment sets chosen using other metrics

            |Ã|   avg. reward
(a) 1-opt.   10      .850
    2-opt.   55      .964
    3-opt.  175      .993
(b) 1-opt.   10      .809
    2-opt.   55      .961
    3-opt.  175      .986
(c) 1-opt.    9      .832
    2-opt.   45      .977
    3-opt.  129      .982

The first column of each table shows |Ã|, the size of the neighborhood containing all assignments within a distance of k from a k-optimal assignment a, and hence of lower reward than a.
For example, in the joint patrol domain described at the beginning of the section, Figure 4.5(a) shows that, if agents are arranged as in the DCOP graph from Figure 2.2(a), any 1-optimal joint patrol must have a higher reward than at least 10 other joint patrols. We see that as k increases, the k-optimal set contains assignments that each individually dominate a larger and larger neighborhood. The second column shows, for each of the three graphs, the average reward of each k-optimal assignment set found over 20 problem instances, generated by assigning rewards to the links from a uniform random distribution. We define the reward of a k-optimal assignment set as the mean reward of all k-optimal assignments that exist for a particular problem instance; each figure in the second column is therefore a mean of means. As k was increased, leading to a larger neighborhood of dominated assignments, the average reward of the k-optimal assignment sets showed a significant increase (T-tests showed the increase in average reward as k increased was significant within 5%).

Figure 4.6: β_SRP vs. β_HSP for DCOP graphs from Figure 2.2

However, as k increases, the number of possible k-optimal assignments decreases, and hence the next four evaluations explore the effectiveness of the different bounds on the number of k-optima. For the three DCOP graphs shown in Figure 2.2, Figure 4.6 provides a concrete demonstration of the gains in resource allocation due to the tighter bounds made possible with graph-based analysis. The x-axis in Figure 4.6 shows k, and the y-axis shows the β_HSP and β_SRP bounds on the number of k-optima that can exist. To understand the implications of these results on resource allocation, consider a patrolling problem where the constraints between agents are shown in the 10-agent DCOP graph from Figure 2.2(a), and all agents consume one unit of fuel for each assignment taken.
Suppose that k = 2 has been chosen, and so at runtime, the agents will use MGM-2 [Maheswaran et al., 2004a], repeatedly, to find and execute a set of 2-optimal assignments. We must allocate enough fuel to the agents a priori so they can execute up to all possible 2-optimal assignments. Figure 4.6(a) shows that if β_HSP is used, the agents would be loaded with 93 units of fuel to ensure enough for all 2-optimal assignments. However, β_SRP reveals that only 18 units of fuel are sufficient, a five-fold savings. (For clarity we note that on all three graphs, both bounds are 1 when k = n and 2 when n − 3 ≤ k < n.)

To systematically investigate the impact of graph structure on bounds, we generated a large number of DCOP graphs of varying size and density. We started with complete binary graphs (all pairs of agents are connected) where each node (agent) had a unique ID. To gradually make each graph sparser, edges were repeatedly removed according to the following two-step process: (1) find the lowest-ID node that has more than one incident edge; (2) if such a node exists, find the lowest-ID node that shares an edge with it, and remove this edge. Figure 4.7 shows the β_HSP and β_SRP bounds for k-optima for k ∈ {1,2,3,4} and n ∈ {7,8,9,10}. For each of the 16 plots shown, the y-axis shows the bounds and the x-axis shows the number of links removed from the graph according to the above method. While β_HSP < β_SRP for very dense graphs, β_SRP provides significant gains for the vast majority of cases. For example, for the graph with 10 agents and 24 links removed, and a fixed k = 1, β_HSP implies that we must equip the agents with 512 resources to ensure that all resources are not exhausted before all 1-optimal actions are executed. However, β_SRP indicates that a 15-fold reduction to 34 resources will suffice, yielding a savings of 478 due to the use of graph structure when computing bounds.

A fourth experiment compared β_HSP and β_SRP to the bound obtained by applying FCLIQUE, β_FCLIQUE, to DCOP graphs from the previous experiment.
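The two-step edge-removal procedure used to sparsify these graphs can be sketched as below; the adjacency-set representation and function name are assumptions made for illustration:

```python
def remove_one_edge(adj):
    """One step of the sparsification used to generate test graphs:
    find the lowest-ID node with more than one incident edge, then
    remove its edge to the lowest-ID neighbor. Returns the removed
    edge, or None when no node has degree greater than one."""
    for v in sorted(adj):
        if len(adj[v]) > 1:
            u = min(adj[v])       # lowest-ID neighbor of v
            adj[v].discard(u)
            adj[u].discard(v)
            return (v, u)
    return None
```

Calling this repeatedly on a complete graph reproduces the gradual sparsification described above, one removed link per call.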
Selected results are shown in Figure 4.8 for graphs of 8 and 9 agents. While β_FCLIQUE is marginally better for k = 1, β_SRP has clear gains for k = 4. Identifying the relative effectiveness of various algorithms that exploit our exclusivity relation sets is clearly an area for future work.

Finally, Figure 4.9 compares the constant-time-computable graph-independent bounds from Section 4.1, in particular showing the improvement of β_MH over min{β_H, β_S, β_P} for selected odd values of k, given three possible actions for each agent (q = 3). The x-axis shows n, the number of agents, and the y-axis shows 100 · (min{β_H, β_S, β_P} − β_MH)/min{β_H, β_S, β_P}.

Figure 4.7: Comparisons of β_SRP vs. β_HSP
Figure 4.8: Comparisons of β_SRP, β_HSP, β_FCLIQUE (8 and 9 agents; k = 1 and k = 4)
Figure 4.9: Improvement of β_MH on min{β_H, β_S, β_P}

For odd values of k > 1, as n increased, β_MH provided a tighter bound on the number of k-optima. The most improvement was for k = 3; as n increased, β_MH gave a bound 50% tighter than the others.

Chapter 5
Algorithms

This chapter contains a description of existing 1-optimal algorithms and new 2- and 3-optimal algorithms, as well as a theoretical analysis of key properties of these algorithms and experimental comparisons. We discuss the existing 1-optimal algorithms here, rather than in the discussion of related work in Chapter 1, because our new 2- and 3-optimal algorithms build on the frameworks of these basic algorithms.

5.1 1-Optimal Algorithms

We begin with two algorithms that only consider unilateral actions by agents in a given context. The first is the MGM (Maximum Gain Message) algorithm, which is a modification of DBA (Distributed Breakout Algorithm) [Yokoo and Hirayama, 1996] focused solely on gain message passing. MGM is not a novel algorithm, but simply a name chosen to describe DBA without the changes on constraint costs that DBA uses to break out of local minima.
We note that DBA itself cannot be applied in an optimization context, as it would require global knowledge of solution quality (it can be applied in a satisfaction context because any agent encountering a violated constraint would know that the current solution is not a satisfying solution). The second is DSA (Distributed Stochastic Algorithm) [Fitzpatrick and Meertens, 2003], which is a homogeneous stationary randomized algorithm. Our analysis will focus on synchronous applications of these algorithms.

Algorithm 2 DSA(myNeighbors, myValue)
1: SendValueMessage(myNeighbors, myValue)
2: currentContext = GetValueMessages(myNeighbors)
3: [gain, newValue] = BestUnilateralGain(currentContext)
4: if Random(0,1) < Threshold then
5:   myValue = newValue

Let us define a round as the duration to execute one run of a particular algorithm. This run could involve multiple broadcasts of messages. Every time a messaging phase occurs in a round, we will count that as one cycle, and cycles will be our performance metric for speed, as is common in the DCOP literature. Let x^(n) ∈ X denote the assignments at the beginning of the n-th round. We assume that every algorithm will broadcast its current value to all its neighbors at the beginning of the round, taking up one cycle. Once agents are aware of their current contexts, they will go through a process, determined by the specific algorithm, to decide which of them will be able to modify their value. Let M^(n) ⊆ N denote the set of agents allowed to modify their values in the n-th round. For MGM, each agent broadcasts a gain message to all its neighbors that represents the maximum change in its local utility if it is allowed to act under the current context. An agent is then allowed to act if its gain message is larger than all the gain messages it receives from all its neighbors (ties can be broken through variable ordering or another method) [Yokoo and Hirayama, 1996].
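For concreteness, one synchronous DSA round can be sketched as below (a simplified, centralized simulation; the utility-function signature and data structures are invented for illustration, not taken from the thesis):

```python
import random

def best_unilateral_gain(i, values, domain, neighbors, util):
    """Best gain agent i can achieve by changing only its own value,
    with all neighbors held fixed at the current context."""
    def local(v):
        return sum(util(i, j, v, values[j]) for j in neighbors[i])
    base = local(values[i])
    gain, new_value = 0, values[i]
    for v in domain:
        if local(v) - base > gain:
            gain, new_value = local(v) - base, v
    return gain, new_value

def dsa_round(values, domain, neighbors, util, p):
    """One synchronous DSA round: all agents compute their best
    unilateral move from the same snapshot of values, then each
    improving agent executes its move with probability p."""
    moves = {i: best_unilateral_gain(i, values, domain, neighbors, util)
             for i in values}
    for i, (gain, new_value) in moves.items():
        if gain > 0 and random.random() < p:
            values[i] = new_value
    return values
```

Because all improving agents may act simultaneously, two neighbors can flip together and recreate a conflict; MGM's gain-message comparison is precisely what rules this out.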
For DSA, each agent generates a random number from a uniform distribution on [0,1] and acts if that number is less than some threshold p [Fitzpatrick and Meertens, 2003]. We note that MGM has a cost of two cycles per round while DSA has a cost of only one cycle per round. Pseudocode for DSA and MGM is given in Algorithms 2 and 3, respectively.

Algorithm 3 MGM(myNeighbors, myValue)
1: SendValueMessage(myNeighbors, myValue)
2: currentContext = GetValueMessages(myNeighbors)
3: [gain, newValue] = BestUnilateralGain(currentContext)
4: SendGainMessage(myNeighbors, gain)
5: neighborGains = ReceiveGainMessages(myNeighbors)
6: if gain > max(neighborGains) then
7:   myValue = newValue

We are able to prove the following monotonicity property of MGM.

Proposition 10 When applying MGM, the global utility U(x^(n)) is strictly increasing with respect to the round (n) until x^(n) ∈ X_E.

Proof. We assume M^(n) ≠ ∅; otherwise we would be at a 1-optimum. When utilizing MGM, if i ∈ M^(n) and E_ij = 1, then j ∉ M^(n). If the i-th variable is allowed to modify its value in a particular round, then its gain is higher than all its neighbors' gains. Consequently, all its neighbors would have received a gain message higher than their own and, thus, would not modify their values in that round. Because there exists at least one neighbor for every variable, the set of agents who cannot modify their values is not empty. We have x_i^(n+1) ≠ x_i^(n) ∀i ∈ M^(n) and x_i^(n+1) = x_i^(n) ∀i ∉ M^(n). Also, u_i(x_i^(n+1); x_{−i}^(n)) > u_i(x_i^(n); x_{−i}^(n)) ∀i ∈ M^(n); otherwise the i-th player's gain message would have been zero.
Looking at the global utility, we have:

\begin{align*}
U(x^{(n+1)}) &= \sum_{i,j:E_{ij}=1} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&= \sum_{\substack{i\in M^{(n)},\, j\in M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\in M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\in M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&= \sum_{\substack{i\in M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\in M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big) \\
&= \sum_{i\in M^{(n)}} u_i\big(x_i^{(n+1)}; x_{-i}^{(n)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n+1)}; x_{-j}^{(n)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big) \\
&> \sum_{i\in M^{(n)}} u_i\big(x_i^{(n)}; x_{-i}^{(n)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n)}; x_{-j}^{(n)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big) \\
&= \sum_{\substack{i\in M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\in M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big)
 + \sum_{\substack{i\notin M^{(n)},\, j\notin M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big) \\
&= U\big(x^{(n)}\big).
\end{align*}

The second equality is due to a partition of the summation indexes. The third equality utilizes the properties that no two agents in M^(n) are neighbors (so the first summation is empty) and that the values for variables corresponding to indexes not in M^(n) in the (n+1)-th round are identical to the values in the n-th round. The strict inequality occurs because agents in M^(n) must be making local utility gains. The remaining equalities are true by definition. Thus, MGM yields monotonically increasing global utility until equilibrium.

Why is monotonicity important? In anytime domains where communication may be halted arbitrarily and existing strategies must be executed, randomized algorithms risk being terminated at highly undesirable assignments. Given a starting condition with a minimum acceptable global utility, monotonic algorithms guarantee lower bounds on performance in anytime environments. Consider the following example.

Example 4 The Traffic Light Game. Consider two variables, both of which can take on the values red or green, with a constraint that takes on utilities as follows:
• U(red, red) = 0.
• U(red, green) = U(green, red) = 1.
• U(green, green) = −1000.

Turning this DCOP into a game would require the agent for each variable to take the utility of the single constraint as its local utility. If (red, red) is the initial condition, each agent would choose to alter its value to green if given the opportunity to move. If both agents are allowed to alter their values in the same round, we would end up in the adverse state (green, green). When using DSA, there is always a positive probability, for any time horizon, that (green, green) will be the resulting assignment.

In domains such as independent path planning of trajectories for UAVs or rovers, in environments where communication channels are unstable, bad assignments could lead to crashes whose costs preclude the use of methods without guarantees. This is illustrated in Figure 5.1, which displays sample trajectories for MGM and DSA with identical starting conditions for a high-stakes scenario described in Section 5.4. The performance of both MGM and DSA with respect to various graph coloring problems is investigated and discussed in Section 5.4.

Figure 5.1: Sample Trajectories of MGM and DSA for a High-Stakes Scenario

5.2 2-Optimal Algorithms

When applying 1-optimal algorithms, the evolution of the assignments will terminate at a 1-optimum within the set X_E described earlier. One method to improve the solution quality is for agents to coordinate actions with their neighbors. This allows the evolution to follow a richer space of trajectories and alters the set of terminal assignments. In this section we introduce two 2-optimal algorithms, where agents can coordinate actions with one other agent. Let us refer to the set of terminal states of the class of 2-optimal algorithms as X_2E; i.e., neither a unilateral nor a
Wenowintroducetwoalgorithmsthatallowforcoordinationwhilemaintainingtheunderly- ingdistributeddecisionmakingprocessandthesameconstraintgraph: MGM-2(MaximumGain Message-2)andSCA-2(StochasticCoordinationAlgorithm-2). Both MGM-2 and SCA-2 begin a round with agents broadcasting their current values. The first step in both algorithms is to decide which subset of agents are allowed to make offers. We resolve this by randomization, as each agent generates a random number uniformly from [0,1] and considers themselves to be an offerer if the random number is below a threshold q. If an agent is an offerer, it cannot accept offers from other agents. All agents who are not offerers are considered to be receivers. Each offerer will choose a neighbor at random (uniformly) and senditanoffermessagewhichconsistsofallcoordinatedmovesbetweentheoffererandreceiver that will yield a gain in local utility to the offerer under the current context. The offer message will contain both the suggested values for each player and the offerer’s local utility gain for each valuepair. Eachreceiverwillthencalculatetheglobalutilitygainforeachvaluepairintheoffer message by adding the offerer’s local utility gain to its own utility change under the new context and (very importantly) subtracting the difference in the link between the two so it is not counted twice. If the maximum global gain over all offered value pairs is positive, the receiver will send anacceptmessagetotheoffererwiththeappropriatevaluepairandboththeoffererandreceiver are considered to be committed. Otherwise, it sends a reject message to the offerer, and neither agentiscommitted. At this point, the algorithms diverge. For SCA-2, any agent who is not committed and can make a local utility gain with a unilateral move generates a random number uniformly from 75 [0,1] and considers themselves to be active if the number is under a threshold p. 
At the end of the round, all committed agents change their values to the committed offer, and all active agents change their values according to their unilateral best response. Thus, SCA-2 requires three cycles (value, offer, accept/reject) per round. In MGM-2 (after the offers and replies are settled), each agent sends a gain message to all its neighbors. Uncommitted agents send their best local utility gain for a unilateral move. Committed agents send the global gain for their coordinated move. Uncommitted agents follow the same procedure as in MGM, where they modify their value if their gain message was larger than all the gain messages they received. Committed agents send their partners a confirm message if all the gain messages they received were less than the calculated global gain for the coordinated move, and send a deconfirm message otherwise. A committed agent will only modify its value if it receives a confirm message from its partner. MGM-2 is outlined in Algorithm 4. We note that MGM-2 requires five cycles (value, offer, accept/reject, gain, confirm/deconfirm) per round, and has less concurrency than SCA-2 (since no two neighboring groups will ever move together). Given the excess cost of MGM-2, why would one choose to apply it? We can show that MGM-2 is monotonic in global utility.

Proposition 11 When applying MGM-2, the global utility U(x^(n)) is strictly increasing with respect to the round (n) until x^(n) ∈ X_2E.

Proof. We begin by introducing some notation. At the end of the n-th round, let C^(n) ⊂ N denote the set of agents who are committed, M^(n) ⊂ N denote the set of uncommitted agents who are active, and S^(n) ≡ (C^(n) ∪ M^(n))^C ⊂ N denote the uncommitted agents who are inactive. Let p(i) ∈ C^(n) denote the partner of a committed agent i ∈ C^(n).
The global utility can then be expressed as:

\begin{align*}
U(x^{(n+1)}) &= \sum_{i,j:E_{ij}=1} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&= \sum_{\substack{i\in C^{(n)},\, j\in C^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\in C^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\in S^{(n)},\, j\in C^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&\quad + \sum_{\substack{i\in S^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\in M^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{\substack{i\in S^{(n)},\, j\in M^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&= \sum_{i\in C^{(n)}} U_{i p(i)}\big(x_i^{(n+1)}, x_{p(i)}^{(n+1)}\big)
 + \sum_{i\in C^{(n)}} \sum_{j\in N_i\setminus\{p(i)\}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big)
 + \sum_{j\in C^{(n)}} \sum_{i\in N_j\setminus\{p(j)\}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&\quad + \left(\sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n+1)}, x_j^{(n+1)}\big)
 - \sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n+1)}, x_j^{(n+1)}\big)\right)
 + \sum_{\substack{i\in S^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n+1)}\big) \\
&\quad + \sum_{i\in M^{(n)}} u_i\big(x_i^{(n+1)}; x_{-i}^{(n+1)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n+1)}; x_{-j}^{(n+1)}\big) \\
&= \sum_{i\in C^{(n)}} U_{i p(i)}\big(x_i^{(n+1)}, x_{p(i)}^{(n+1)}\big)
 + \sum_{i\in C^{(n)}} \sum_{j\in N_i\setminus\{p(i)\}} U_{ij}\big(x_i^{(n+1)}, x_j^{(n)}\big)
 + \sum_{j\in C^{(n)}} \sum_{i\in N_j\setminus\{p(j)\}} U_{ij}\big(x_i^{(n)}, x_j^{(n+1)}\big) \\
&\quad + \left(\sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n+1)}, x_j^{(n+1)}\big)
 - \sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n+1)}, x_j^{(n+1)}\big)\right)
 + \sum_{\substack{i\in S^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big) \\
&\quad + \sum_{i\in M^{(n)}} u_i\big(x_i^{(n+1)}; x_{-i}^{(n)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n+1)}; x_{-j}^{(n)}\big) \\
&= \sum_{i\in C^{(n)}} u_i\big(x_i^{(n+1)}; \mu_{-i}(x_{p(i)}^{(n+1)}, x_{-ip(i)}^{(n)})\big)
 + \sum_{j\in C^{(n)}} u_j\big(x_j^{(n+1)}; \mu_{-j}(x_{p(j)}^{(n+1)}, x_{-jp(j)}^{(n)})\big)
 - \sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n+1)}, x_j^{(n+1)}\big) \\
&\quad + \sum_{\substack{i\in S^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big)
 + \sum_{i\in M^{(n)}} u_i\big(x_i^{(n+1)}; x_{-i}^{(n)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n+1)}; x_{-j}^{(n)}\big) \\
&> \sum_{i\in C^{(n)}} u_i\big(x_i^{(n)}; \mu_{-i}(x_{p(i)}^{(n)}, x_{-ip(i)}^{(n)})\big)
 + \sum_{j\in C^{(n)}} u_j\big(x_j^{(n)}; \mu_{-j}(x_{p(j)}^{(n)}, x_{-jp(j)}^{(n)})\big)
 - \sum_{j\in C^{(n)}} U_{j p(j)}\big(x_{p(j)}^{(n)}, x_j^{(n)}\big) \\
&\quad + \sum_{\substack{i\in S^{(n)},\, j\in S^{(n)}\\ E_{ij}=1}} U_{ij}\big(x_i^{(n)}, x_j^{(n)}\big)
 + \sum_{i\in M^{(n)}} u_i\big(x_i^{(n)}; x_{-i}^{(n)}\big)
 + \sum_{j\in M^{(n)}} u_j\big(x_j^{(n)}; x_{-j}^{(n)}\big) \\
&= U\big(x^{(n)}\big).
\end{align*}

The first equality is by definition. The second equality partitions the indexes into update classes, eliminating cross indexes of M^(n) with anything other than S^(n).
In the third equality, we simplify the summations involving committed agents using expressions for partners and neighbors, we insert a zero-valued term in parentheses, and we transform the summations involving active agents into local utilities. In the fourth equality, we modify the round index for those agents who are inactive. In the fifth equality, we transform the summations involving committed agents into local utilities. The inequality is due to the fact that the global utility on the links of the committed partners and the local utility of the active agents must increase due to the positive gain messages. The key is that by setting j = p(i) in the second and third summations, we recover the gain message of the committed teams. Note the subtraction of the utility gain on the link between partners to avoid double counting. The final equality can be achieved by reversing the transformation to yield the global utility at the previous round. Thus, MGM-2 yields monotonically increasing global utility until equilibrium is reached.

Example 5 Meeting Scheduling. Consider two agents trying to schedule a meeting at either 7:00 AM or 1:00 PM with the constraint utility as follows: U(7,7) = 1, U(7,1) = U(1,7) = −100, U(1,1) = 10. If the agents started at (7,7), any 1-coordinated algorithm would not be able to reach the global optimum, while 2-coordinated algorithms would.

It is not true that a 2-optimal algorithm will yield a solution with higher quality than a 1-optimal algorithm in all situations. In fact, there are DCOPs and initial conditions for which a given 1-optimal algorithm will yield a better solution than a given 2-optimal algorithm. The complexity lies in that we cannot predict exactly what trajectory the evolution will follow. Given certain initial conditions (beginning the algorithm at a 1-optimum), 2-optimal algorithms will

Algorithm 4 MGM-2(myNeighbors, myConstraints, myValue)
Initialization step: (Send out value messages)
1: for all neighbor ∈ myNeighbors do
2:   sendMsg(neighbor, <VALUE, myValue>)
Cycle 1: (Some agents become offerers.
Offerers send out offers)
1: offerer? = FALSE; confirmed? = FALSE
2: for all neighbor ∈ myNeighbors do
3:   myContext.add(receiveMsg(neighbor, <VALUE, neighborValue>))
4: if Random(0,1) < offererThreshold then
5:   offerer? = TRUE
6:   partner = pickRandom(myNeighbors)
7:   sendMsg(partner, <OFFER, allCoordinatedMoves>)
8: [myIndividualMove, myIndividualGain] = computeBestMove(myConstraints, myContext)

Cycle 2: (Agents respond to offerers and join their groups)
1: for all neighbor ∈ myNeighbors do
2:   myOffers.add(receiveMsg(neighbor, <OFFER, allCoordinatedMoves>))
3: if NOT empty(myOffers) then
4:   [partner, ourMove, ourGain] = computeBestMove(myConstraints, myContext, myOffers)
5:   if offerer? == FALSE then
6:     committed? = TRUE
7:     sendMsg(partner, <ACCEPT, TRUE, ourMove.partnersMove, ourGain>)
8:   for all neighbor ∈ myOffers \ partner do
9:     sendMsg(neighbor, <ACCEPT, FALSE, null, null>)

Cycle 3: (Group members send gain message (for group move) to neighbors outside the group)
1: receiveMsg(partner, <ACCEPT, partnerCommitted?, ourMove.myMove, ourGain>)
2: if offerer? then
3:   if partnerCommitted? == TRUE then
4:     committed? = TRUE
5: if committed? then
6:   myGain = ourGain
7: else
8:   myGain = myIndividualGain
9: for all neighbor in myNeighbors \ partner do
10:  sendMsg(neighbor, <GAIN, myGain>)

Cycle 4: (Based on neighbors' gains, agents confirm or deconfirm with partner)
1: for all neighbor ∈ myNeighbors do
2:   neighborsGains.add(receiveMsg(neighbor, <GAIN, neighborsGain>))
3: if myGain > max(neighborsGains) then
4:   confirmed? = TRUE  // ties broken using node ordering
5: if committed? then
6:   sendMsg(partner, <CONFIRM, confirmed?>)

Cycle 5: (Confirmed agents move and send out value messages)
1: receiveMsg(partner, <CONFIRM, partnerConfirmed?>)
2: if committed? then
3:   if confirmed? then
4:     if partnerConfirmed? then
5:       myValue = ourMove.myValue
6:   else
7:     myValue = myIndividualMove
8: else if confirmed? then
9:   myValue = myIndividualMove
10: for all neighbor ∈ myNeighbors do
11:  sendMsg(neighbor, <VALUE, myValue>)

always perform at least as well as 1-optimal algorithms (or better), as outlined in the following corollary.
For other initial conditions, the experiments shown in Section 5.4 show that, for the domains investigated, 2-optimal algorithms did outperform 1-optimal algorithms on average with respect to solution quality.

Corollary 1 For every initial DCOP assignment x_0 ∈ X_E \ X_2E, MGM-2 will yield a better solution than either MGM or DSA.

Proof. Since x_0 ∈ X_E, neither MGM nor DSA will move, and the solution quality will be that obtained at the assignment x_0. However, since x_0 ∉ X_2E, MGM-2 will continue to evolve from x_0 until it reaches an assignment in X_2E. Because MGM-2 is monotonic in global utility, whatever solution it reaches in X_2E will have a higher global utility than x_0.

Thus, MGM-2 dominates DSA and MGM for initial conditions in X_E \ X_2E and is identical to DSA and MGM on X_2E (as none of these algorithms will evolve from there). The unknown is the behavior on X \ X_E. It is difficult to analyze this space because one cannot pinpoint the trajectories due to the probabilistic nature of their evolution. If we assume that iterations beginning in X \ X_E are taken to points in X_E in a relatively uniform manner on average with all algorithms, then we might surmise that the dominance of MGM-2 should yield a better solution quality. The performance of both MGM-2 and SCA-2 with respect to various graph coloring problems is investigated and discussed in Section 5.4.

5.3 3-Optimal Algorithms

Analogous algorithms for 3-optimality are detailed in this section. MGM-3 is presented in detail in Algorithm 5; SCA-3 can be implemented through small adjustments to MGM-3. The

Algorithm 5 MGM-3(myNeighbors, myConstraints, myValue)
1: for all neighbor ∈ myNeighbors do
2:   sendMsg(neighbor, <VALUE, myValue>)
Cycle 1: (Some agents become offerers. Offerers send out offers)
1: offerer? = FALSE; committed? = FALSE; confirmed? = FALSE
2: for all neighbor ∈ myNeighbors do
3:   myContext.add(receiveMsg(neighbor, <VALUE, neighborValue>))
4: if Random(0,1) < offererThreshold then
5:   offerer? = TRUE; committed?
= TRUE
6:   partner1 = pickRandom(myNeighbors); partner2 = pickRandom(myNeighbors \ partner1)
7:   sendMsg(partner1, <OFFER>); sendMsg(partner2, <OFFER>)
Cycle 2: (Agents respond to offerers and join their groups)
1: for all neighbor ∈ myNeighbors do
2:   myOffers.add(receiveMsg(neighbor, <OFFER>))
3: if NOT offerer? then
4:   if empty(myOffers) then
5:     offerer? = TRUE
6:   else
7:     bestOffer = selectRandom(myOffers); committed? = TRUE
8:     sendMsg(bestOffer.neighbor, <ACCEPT, TRUE, myConstraints, myContext>)
9:     for all neighbor ∈ myOffers \ bestOffer.neighbor do
10:      sendMsg(neighbor, <ACCEPT, FALSE>)
Cycle 3: (Offerer computes best move for group, sends it to group members)
1: for all neighbor ∈ myNeighbors do
2:   myAccepts.add(receiveMsg(neighbor, <ACCEPT, TRUE, neighborsConstraints, neighborsContext>))
3: [ourMove, ourGain] = computeBestMove(myConstraints, myContext, myAccepts)
4: for all neighbor ∈ myAccepts do
5:   sendMsg(neighbor, <MOVE, ourMove, ourGain>); myGroup.add(neighbor)
Cycle 4: (Group members send gain message (for group move) to neighbors outside the group)
1: for all neighbor ∈ myNeighbors do
2:   myGroup.add(receiveMsg(neighbor, <MOVE, ourMove, ourGain>))
3: for all neighbor in myNeighbors \ myGroup do
4:   sendMsg(neighbor, <GAIN, ourGain>)
Cycle 5: (Based on neighbors' gains, agents commit or decommit to group move, notify offerers)
1: for all neighbor ∈ myNeighbors do
2:   neighborsGains.add(receiveMsg(neighbor, <GAIN, neighborsGain>))
3: if ourGain > max(neighborsGains) then
4:   committed? = TRUE  // ties broken using node ordering
5: if NOT offerer? then
6:   sendMsg(myGroup.offerer, <COMMIT, committed?>)
Cycle 6: (Offerers send final confirmation (or non-confirmation) to receivers)
1: for all neighbor ∈ myGroup do
2:   committedGroup = receiveMsg(neighbor, <COMMIT, TRUE>)
3: if committedGroup = myGroup then
4:   confirmed? = TRUE
5: for all neighbor ∈ myGroup do
6:   sendMsg(neighbor, <CONFIRM, confirmed?>)
Cycle 7: (If whole group is confirmed, all group members move; otherwise don't move)
1: confirmed? = receiveMsg(myGroup.offerer, <CONFIRM, confirmed?>)
2: if confirmed?
= TRUE then
3:   myValue = ourMove.myValue
4: for all neighbor ∈ myNeighbors do
5:   sendMsg(neighbor, <VALUE, myValue>)

main complication with moving to 3-optimality is the following: with 2-optimal algorithms, the offerer could simply send all the information the receiver needed to compute the optimal joint move in the offer message itself. With groups of three agents, this is no longer possible, and thus two more message cycles are needed.

Just as in the 1- and 2-optimal versions, we begin with all agents broadcasting value messages to their neighbors in an initialization step. Then, in the first cycle, as in MGM-2 and SCA-2, each agent generates a random number uniformly from [0,1] and considers itself to be an offerer if the random number is below a threshold q, and a receiver otherwise. Now, instead of a single neighbor, each offerer will choose two neighbors at random (uniformly) and send offer messages to both. These offer messages will not contain any suggested move; rather, they are simple invitations to join the offerer's group. In the second cycle, upon receiving a set of offer messages from its neighbors, each receiver will accept a randomly chosen offer from this set and decline the others. The receiver will return an accept message to the chosen offerer that includes all of the constraints on the receiver as well as the context it faces (the current values of its own neighbors). Offerers will decline all offers from the set. In the third cycle, the offerer will receive these accept (or decline) messages and will calculate the best possible move for the group formed by itself and the agents that accepted its offers, assuming all other agents keep the same values. It will communicate this joint move, as well as the potential gain in reward from taking this move, to the receivers in the group.
In the fourth cycle, the receivers will receive the move and the gain, and now, just as in the 1- and 2-optimal versions of MGM, all agents in the group (offerer and receivers) will send this gain in a gain message to their neighbors outside the group. Then, in the fifth cycle, as the agents receive similar gain messages from their neighbors, they consider themselves committed if their gains are greater than all those reported by their neighbors outside the group. Receivers send commit messages to the offerer to report this status. In the sixth cycle, the offerer checks the commit messages from the receivers. If all agents in the group are committed, the offerer sends a confirm message to the receivers; otherwise a deconfirm message is sent. Then, in the seventh cycle, offerers who have just sent a confirm message, as well as receivers who have just received one, change their values to the values specified in the joint move, and send out new value messages to all their neighbors as the algorithm repeats.

We omit the formal proof of monotonicity for MGM-3, but it is straightforward to see that, just as in MGM-2, the gain message cycle and the confirmation process ensure that no two neighboring agents not in the same group will ever move at the same time. The main difference is that now, since we have groups of three agents, the confirmation process requires two cycles (commit and confirm) where in MGM-2 it required only one (confirm). An additional cycle is also needed at the beginning since, while the offerer still initiates the formation of the group, it is now the offerer who centralizes information from the two receivers, rather than the receiver centralizing information from a single offerer, for a total of seven cycles. We implemented SCA-3 by simply replacing the gain, commit, and confirm cycles of MGM-3 (cycles 4, 5, and 6) with a stochastic step added to cycle 3, in which the offerer decides whether or not the group will move at all by choosing a random number. SCA-3 thus requires four message cycles.
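Collecting the per-round cycle counts stated in this chapter into one place (a bookkeeping sketch for reference, not thesis code; the dictionary and helper names are invented):

```python
# Communication cycles per synchronous round, as stated in the text.
CYCLES_PER_ROUND = {
    "DSA": 1,
    "MGM": 2,
    "SCA-2": 3,  # value, offer, accept/reject
    "MGM-2": 5,  # value, offer, accept/reject, gain, confirm/deconfirm
    "SCA-3": 4,
    "MGM-3": 7,
}

def cycles_used(algorithm, rounds):
    """Total communication cycles after a number of synchronous rounds."""
    return CYCLES_PER_ROUND[algorithm] * rounds
```

This makes the trade-off concrete: under a fixed cycle budget, MGM-3 completes 5x fewer rounds than DSA, which is the cost paid for coordinated group moves and monotonicity guarantees.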
5.4 Experiments

We performed two groups of experiments: one for "medium-sized" DCOPs of forty variables (the largest problem size considered in [Modi et al., 2005]) and one for DCOPs of 1000 variables, larger than any problems considered in papers on complete DCOP algorithms.

5.4.1 Medium-Sized DCOPs

We considered three different domains for our first group of experiments. The first was a standard graph-coloring scenario, in which a cost of one is incurred if two neighboring agents choose the same color, and no cost is incurred otherwise. Real-world problems involving sensor networks, in which it may be undesirable for neighboring sensors to be observing the same location, are commonly mapped to this type of graph-coloring scenario. The second was a fully randomized DCOP, in which every combination of values on a constraint between two neighboring agents was assigned a random reward chosen uniformly from the set {1,...,10}. In both of these domains, we considered ten randomly generated graphs with forty variables, three values per variable, and 120 constraints. For each graph, we ran 100 runs of each algorithm, with a randomized start state. The third domain was chosen to simulate a high-stakes scenario, in which miscoordination is very costly. In this environment, agents are negotiating over the use of resources. If two agents decide to use the same resource, the result could be catastrophic. An example of such a scenario might be a set of unmanned aerial vehicles (UAVs) negotiating over sections of airspace, or rovers negotiating over sections of terrain. In this domain, if two neighboring agents take the same value, a large penalty (-1000) is incurred. If two neighboring agents take different values, they obtain a reward chosen uniformly from {10,...,100}. Because miscoordination is costly, we introduced a safe (zero) value for all agents. An agent with this value is not using any resource. If two neighboring agents choose zero as their values, neither a reward nor a penalty is obtained.
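The pairwise reward structure of the high-stakes domain just described might be sketched as follows. This is a hypothetical reconstruction; the function names are assumptions, and the per-pair rewards are drawn once, at graph-generation time, as the text specifies.

```python
import random

def make_high_stakes_constraint(num_values=4, rng=random):
    """Build one pairwise constraint for the high-stakes domain sketched
    above.  Value 0 is the 'safe' value; rewards for pairs of differing
    nonzero values are fixed at graph-generation time, drawn uniformly
    from {10, ..., 100}."""
    rewards = {(a, b): rng.randint(10, 100)
               for a in range(1, num_values)
               for b in range(1, num_values) if a != b}

    def reward(v1, v2):
        if v1 == v2:
            # Same resource chosen: catastrophic, unless both are safe.
            return 0 if v1 == 0 else -1000
        if v1 == 0 or v2 == 0:
            # One agent safe, the other using a resource: nothing happens.
            return 0
        return rewards[(v1, v2)]

    return reward
```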
In such a high-stakes scenario, a randomized start state would be a poor choice, especially for an anytime algorithm, as it would likely contain many of the large penalties. So, rather than using randomized start states, all agents started with the zero value. However, if all agents start at zero, then DSA and MGM would be useless, since no agent would ever want to move alone. So, a reward of one was introduced for the case where one agent has the zero value and its neighbor has a nonzero value. In the high-stakes domain, we also performed 100 runs on each of 10 randomly generated graphs with forty variables and 120 constraints, but due to the addition of the safe value, the agents in these experiments had four possible values.

For each of the three domains, we ran: MGM; DSA with p ∈ {0.1, 0.3, 0.5, 0.7, 0.9}; MGM-2 with q ∈ {0.1, 0.3, 0.5, 0.7, 0.9}; and SCA-2 with all combinations of the above values of p and q (where q is the probability of being an offerer and p is the probability of an uncommitted agent acting). We also ran MGM-3 with q = 0.5. Each table shows an average of 100 runs on ten randomly generated examples with some selected values of p and q.

We used communication cycles as the metric for our experiments, as is common in the DCOP literature, since it is assumed that communication is the speed bottleneck. However, we note that, as we move from 1-optimal to 2-optimal algorithms, the computational cost each agent i must incur can increase by a factor of as much as Σ_j |X_j|, as the agent can now consider the combination of its own and all its neighbors' moves. However, in the 2-optimal algorithms we present, each agent randomly picks a single neighbor j to coordinate with, and so its computation is increased by a factor of only |X_j|. As we move to 3-optimal algorithms, the cost increases further, as each agent considers the combination of its own move and two of its neighbors' moves; analogously, each agent randomly picks two neighbors to coordinate with. Although each run was for 256 cycles, most of the graphs display a cropped view, to show the important phenomena.
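The role of the p parameter above, the probability that an agent with an improving move actually acts, can be sketched with one DSA round from a single agent's point of view. This is an illustrative sketch, not the thesis implementation; a single shared pairwise reward function is assumed for simplicity.

```python
import random

def dsa_round(value, neighbor_values, domain, reward, p, rng=random):
    """One round of DSA for a single agent: find the value with the best
    local reward against the neighbors' last reported values, and move to
    it with probability p if it is a strict improvement.

    reward(v, w) is the pairwise constraint reward between this agent's
    value v and a neighbor's value w.
    """
    def local(v):
        return sum(reward(v, w) for w in neighbor_values)

    best = max(domain, key=local)
    if local(best) > local(value) and rng.random() < p:
        return best    # act this round
    return value       # otherwise keep the current value
```

With p near 1, many agents act simultaneously each cycle, which is exactly what causes the miscoordination penalties in the high-stakes domain described above.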
Figure 5.2 shows a comparison between MGM and DSA for several values of p. For graph coloring, MGM is dominated, first by DSA with p = 0.5, and then by DSA with p = 0.9. For the randomized DCOP, MGM is completely dominated by DSA with p = 0.9. MGM does better in the high-stakes scenario, as all DSA algorithms have a negative solution quality (not shown in the graph) for the first few cycles. This happens because at the beginning of a run, almost every agent will want to move. As the value of p increases, more agents act simultaneously, and thus many pairs of neighbors are choosing the same value, causing large penalties. Thus, these results show that the nature of the constraint utility function makes a fundamental difference in which algorithm dominates. Results from the high-stakes scenario contrast with [Zhang et al., 2003] and show that DSA is not necessarily the algorithm of choice when compared with DBA across all domains.

Figure 5.2: Comparison of the performance of MGM and DSA

Figure 5.3 shows a comparison between MGM and MGM-2, for several values of q. In all domains, MGM-2 eventually reaches a higher solution quality after about thirty cycles, despite the algorithms' initial slowness. The stair-like shape of the MGM-2 curves is due to the fact that agents change values only once out of every five cycles, due to the cycles used in communication. Of the three values of q shown in the graphs, MGM-2 rises fastest when q = 0.5, but eventually reaches its highest average solution quality when q = 0.9, for each of the three domains. We note that, in the high-stakes domain, the solution quality is positive at every cycle, due to the monotonic property of both MGM and MGM-2. Thus, these experiments clearly verify the monotonicity of MGM and MGM-2, and also show that MGM-2 reaches a higher solution quality, as expected.

Figure 5.4 shows a comparison between DSA and SCA-2, for p = 0.9 and several values of q. DSA starts out faster, but SCA-2 eventually overtakes it.
The effect of q on SCA-2 appears inconclusive. Although SCA-2 with q = 0.9 does not achieve a solution quality above zero for the first 65 cycles, it eventually achieves a solution quality comparable to SCA-2 with lower values of q.

Figure 5.3: Comparison of the performance of MGM and MGM-2

Figure 5.4: Comparison of the performance of DSA and SCA-2

Figure 5.5 compares MGM, MGM-2, and MGM-3 for q = 0.5. In all three cases, MGM-3 increases at the slowest rate, but eventually overtakes MGM-2. In the graph-coloring and random DCOP domains, the time window where MGM-2 provides the highest solution is very small.

Figure 5.6 compares DSA, SCA-2, and SCA-3 for p = 0.9 and q = 0.5. The performance gains for increasing k are similar to the MGM case, except for the graph-coloring domain, where the gains are smaller moving from k = 2 to k = 3, likely because both SCA-2 and SCA-3 are approaching the optimal solution in almost every case.

Figure 5.7 contains a graph and a pie chart for each of the three domains, providing a deeper justification for the improved solution quality of MGM-2 and SCA-2. The graph shows a probability mass function (PMF) of solution quality for three sets of assignments: the set of all assignments in the DCOP (X), the set of 1-optima (X_E), and the set of 2-optima (X_2E). Here we considered scenarios with twelve variables, 36 constraints, and three values per variable (four for the high-stakes scenario, to include the zero value) in order to investigate tractably explorable domains. In all three domains, the solution quality of the set of 2-optima (the set of equilibria to which MGM-2 and SCA-2 must converge) is, on average, higher than that of the set of 1-optima. In the high-stakes DCOP, 99.5% of assignments have a value less than zero (not shown on the graph).

The pie chart shows the proportion of the number of 2-optima to the number of 1-optima that are not also 2-optima. Notice that in the case of the randomized DCOP, most 1-optima are also 2-optima. Therefore, there is very little difference between the PMFs of the two sets of k-optima on the corresponding graph.
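For tractably explorable problems like the twelve-variable scenarios above, the sets X_E and X_2E can be computed by brute force directly from the definition of a k-optimum. A sketch, assuming a single shared pairwise reward function:

```python
from itertools import combinations, product

def find_k_optima(n_agents, domain, edges, reward, k):
    """Enumerate every assignment of a small DCOP and keep those from
    which no group of k or fewer agents can strictly improve the global
    reward by jointly changing values (the definition of a k-optimum)."""
    def total(a):
        return sum(reward(u, v, a[u], a[v]) for u, v in edges)

    def improvable(a):
        base = total(a)
        for size in range(1, k + 1):
            for group in combinations(range(n_agents), size):
                for joint in product(domain, repeat=size):
                    b = list(a)
                    for agent, value in zip(group, joint):
                        b[agent] = value
                    if total(b) > base:
                        return True
        return False

    return [a for a in product(domain, repeat=n_agents) if not improvable(a)]
```

On a 3-agent chain with a reward of 1 for neighbors taking different values, only the two alternating colorings are 1-optimal, and both are 2-optimal as well.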
We also note that the phase transition mentioned in [Zhang et al., 2003] (where DSA's performance degrades for p > 0.8) is not replicated in our results. In fact, our solution quality improves for p > 0.8, though with slower convergence.

Figure 5.5: Comparison of the performance of MGM, MGM-2, and MGM-3

Figure 5.6: Comparison of the performance of DSA, SCA-2, and SCA-3

Figure 5.7: Distributions of solution quality for X, X_E, and X_2E, and cardinality of X_2E as a proportion of X_E

5.4.2 Large DCOPs

For our second group of experiments, we considered DCOPs of 1000 variables using the graph-coloring and random DCOP domains. The main purpose of these experiments was to demonstrate that the k-optimal algorithms quickly converge to a solution even for very large problems such as these.

A random DCOP graph was generated for each domain, for link densities ranging from 1 to 5, and results for MGM and MGM-3 are shown in Figures 5.8 and 5.9. The results shown represent an average of 100 runs (from a random initial set of values) for each DCOP. The solution quality shown in the tables is the total reward in the DCOP divided by the number of constraints in the DCOP (so that the solution quality ranges from 0 to 1 in all cases, for ease of comparison). Note that a solution quality of 1.000 does not represent the optimal solution to the DCOP; rather, it represents a "theoretical" optimum, where all constraint rewards are 1 (i.e., no constraints are violated in the graph-coloring domain).

Figure 5.8: Results for MGM and MGM-3 for large DCOPs: graph coloring

Figure 5.9: Results for MGM and MGM-3 for large DCOPs: random rewards

Chapter 6
k-Optimality and Team Formation

To illustrate the benefits of applying k-optimality to real-world problems, I investigated the problem of human team formation, in which a team of people must be assembled from a large pool of possible candidates in order to accomplish a specific task. These problems can be formulated as centralized COPs or as DCOPs, where each candidate is represented by a distributed agent.
Either way, once the problem is formulated, several options must be generated rapidly to present to a human supervisor, who makes the final team selection. Finding k-optimal teams was a suitable choice for this domain, because it ensured that the set of choices would be diverse, and each possible team would be optimal within a region of similar teams that could be formed. Real data from three separate domains was used; in all three cases, the problem formulation was essentially the same.

6.1 Problem Formulation

Each problem consisted of a set of candidates for the team and a set of roles on the team that must be filled. In each problem, there were from 40 to 70 candidates for a team of six to seven roles. With an agent for each candidate, this could represent a problem with as many as 8^70 ≈ 1.65 × 10^63 possible assignments of candidates to roles (including an additional "null" role, where a candidate is not assigned to the team), and finding a globally optimal team is NP-hard in general [Modi et al., 2005]. In addition, each problem consisted of various hard and soft constraints on the assignment of candidates to positions that are further detailed in the domain sections. To express these problems as DCOPs, each candidate was treated as an agent, and the agent's domain consisted of the set of roles that the candidate is allowed to take on, plus a "null" value representing the candidate's exclusion from the team. Not every candidate is allowed to take on every role. Hard constraints existed between agents whose domains shared a value, such that if both agents chose the same value, the constraint was considered violated. Other constraints are detailed in the domain sections.

6.2 Sample Domain 1: Taskforce for Homeland Security Exercise

The first set of data comes from the Tabletop Exercise (TTX) planning phase of the large-scale, multi-agency, AWI-07N anti-terrorist exercise being conducted in the Seattle-Tacoma area by the US Navy Center for Asymmetric Warfare.
The AWI-07N exercise is planned as a week-long live simulation exercise in July 2007, during which forty participants from about thirty defense, law enforcement, security, and operational agencies, organizations, and companies will respond to and recover from a series of terrorist attacks in the Seattle-Tacoma port and maritime areas by forming a central command organization and a variety of special-purpose teams operating under that command.

6.2.1 Team Roles

The focus of TTX was on recovery operations after a series of varied terrorist attacks in the Seattle and Tacoma port and maritime areas (the Full Scale Exercise will include both immediate response and recovery). Accordingly, a six-person Command Team was used, having the following roles:

• Officer in Charge (OIC)
• Assistant Officer in Charge (AIC)
• Intelligence Coordinator (IC)
• Operations Coordinator (OC)
• Inter-Agency Coordinator (IAC)
• Information Dissemination Coordinator (IDC)

The Command Team

6.2.2 Data Sources

A Qualification Check-In Form was filled out by 40 candidates from 30 different organizations (DoD, Security, Pierce County, Tacoma City, etc.). The Qualification Check-In Form had three parts, following a request for name, organization, and job title; these parts were:

• Experience. Participants were asked to rate their experience in each of 9 technical areas – judged important to port and marine terrorist attacks and recovery – on a scale of 1 (relatively low) to 5 (relatively high), checking the appropriate box and using a broad definition of each area.

• Team Processes. Participants were asked to answer 19 questions designed to assess their skills related to working in a team.

• Existing Connections. Participants were asked to list the people attending this Tabletop Exercise whom they had communicated with the most during the past several weeks by any means (in person, phone, e-mail, instant messaging, etc.), and to provide the approximate number of communications to that person and from that person.
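The Existing Connections reports are later used to derive social-network centrality scores. The thesis does not specify which centrality measure was used, so the following is only a hypothetical, degree-style sketch; the tuple format for the form data is an assumption.

```python
from collections import defaultdict

def communication_centrality(reports):
    """Hypothetical sketch: sum reported communications in both directions
    for each candidate, crediting both endpoints of each link.

    reports: iterable of (reporter_id, contact_id, n_to, n_from) tuples,
    one per contact listed on a candidate's Check-In Form."""
    score = defaultdict(int)
    for reporter, contact, n_to, n_from in reports:
        volume = n_to + n_from
        score[reporter] += volume
        score[contact] += volume   # the contact is an endpoint of the link too
    return dict(score)
```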
For convenience in modeling, we divided the 30 organizations into 8 types. Job titles were categorized into levels from 1 (lowest) to 5 (top command level). Candidates were assigned ID numbers for use in the subsequent analyses. The tabulated results of the Experience section of the Check-In Form are shown in Figure 6.1. From left to right, the columns show: the person's ID number, the organization the person works for, the person's job title, the organization type (8 types), the person's rank within the organization (1 to 5), and the person's experience in various different skill areas.

Figure 6.1: Tabulated results of the Experience section of the Check-In Form

6.2.3 DCOP Constraints

Three types of constraints were generated from this data: soft unary constraints, soft binary constraints, and hard constraints of any arity.

6.2.3.1 Soft Unary Constraints

Each agent had an associated unary constraint, generating a reward based on the value (role) it took, with a zero reward given for the null value. These constraints were generated in the following ways. The answers to the Experience questions on the Check-In Form were used to generate scores for every candidate on five mission-related criteria: Organization and Control, Emergency Response, Economic Effects, Defense Support, and Port and Marine Operations. Similarly, the answers to the Team Processes questions were used to generate scores for every candidate on three team-related criteria: Coordination Attitudes, Attitudes to Teamwork, and Leadership Attitudes. Each team role was paired with two sets of predetermined weights; one set contained weights for each of the mission-related areas, and the other set contained weights for each of the team-related areas. Thus, the suitability of a candidate for each role with respect to the mission-related criteria was determined by a weighted sum of the candidate's mission-related scores, using the weights particular to that role.
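The per-role weighted sum just described (and the analogous team-related sum discussed next) can be sketched as follows; the criterion names and weight values in the usage example are hypothetical, standing in for the role-specific predetermined weights.

```python
def role_suitability(scores, role_weights):
    """Suitability of a candidate for one role with respect to one set of
    criteria: the weighted sum of the candidate's scores, using the
    predetermined weights particular to that role.

    scores:       criterion name -> candidate's score on that criterion
    role_weights: criterion name -> weight of that criterion for this role
    """
    return sum(w * scores.get(criterion, 0.0)
               for criterion, w in role_weights.items())
```

For instance, an Operations Coordinator might weight Emergency Response and Defense Support heavily (hypothetical weights 0.7 and 0.3).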
Similarly, the suitability of a candidate for each role with respect to the team-related criteria was determined by a weighted sum of the candidate's team-related scores. In addition, social-network centrality measures, used to estimate leadership skills, were generated based on the Existing Connections section of the form. For the Officer-in-Charge role, the centrality measure, mission-related scores, and team-related scores were combined; for all other roles, only the mission-related and team-related scores were used.

6.2.3.2 Soft Binary Constraints

From the Existing Connections section, a second measure was generated to represent the connectedness between each pair of candidates in the social network. Pairs of agents, representing candidates who were connected in some way, were assigned binary constraints, such that a reward proportional to this connectedness measure was generated if both agents were assigned a non-null value.

6.2.3.3 Hard Constraints

Additional pre-existing constraints on the team composition were expressed as DCOP constraints of any arity with a large negative reward if violated. Single-agent (unary) constraints were expressed simply by removing values from agents' domains. Multi-agent constraints were expressed explicitly with constraints in the DCOP. For this domain, the initial hard constraints used to generate teams were:

• Officer in Charge must be from Groups 4 or 5
• Officer in Charge must have Rank/Position ≥ 3
• Officer in Charge: Leadership ≥ 2.5 + Interpersonal Skills ≥ 2
• Operations Coordinator must have Emergency Response ≥ 2 and Defense Support ≥ 2
• Inter-Agency Coordinator must have Interpersonal Skills ≥ 2 and Team Attitude ≥ 2
• Team must contain at least one member from Group 1
• Team must contain at least one member from Group 3
• Team must contain at least one member with Port and Marine Operations ≥ 3
• Team must contain at least one member with Economic Effects ≥ 3

In addition, a relaxed set of constraints was used to generate other teams:
• Officer in Charge must have Rank/Position ≥ 3
• Team must contain at least one member from Groups 4 and 5

6.2.3.4 Weighting the Constraints

Finally, the relative importance of the four types of soft constraints (mission-related unary constraints, team-related unary constraints, centrality unary constraints, and connectedness binary constraints) to the global objective function could be adjusted by further weighting these four measures with respect to each other. In the initial examples used, each of the four factors was weighted equally, with each accounting for 25% of the overall objective.

6.2.4 Formation of k-Optimal Teams

2-optimal teams were formed for several cases; each case used a different combination of weights and hard constraints. For each case, 100 random runs of a 2-optimal algorithm were used to generate teams; discarding duplicate teams, the results are shown in Figure 6.2. Each run took approximately 4 minutes on a 2.16 GHz Intel Core Duo machine with 2 GB of RAM running Mac OS X. In some cases (1, 4, and 7), all 100 runs produced the same team, strongly suggesting that this was the only 2-optimal team possible under the constraints given (and that it was the globally optimal team). In others, several teams were produced. Since the teams are 2-optimal, any two teams in a single case must have more than two differing assignments of people to roles, providing diversity among the choices.

Figure 6.2: 2-optimal teams formed according to various criteria

6.3 Sample Domain 2: Second Homeland Security Exercise

We also applied the same techniques to a larger dataset from a similar exercise in the Los Angeles area. Here, 2-optimal and 3-optimal seven-person teams were formed from a pool of 68 candidates using 100 random runs each. The 2-optimal teams formed are shown in Figure 6.3. Due to the larger dataset, more 2-optimal teams were found in the search space. Again, any two teams must have more than two differing assignments of people to roles.
When 3-optimality was used, only the first team in the figure was produced, strongly suggesting that this team is the globally optimal team, and the only 3-optimal team.

Figure 6.3: 2-optimal teams

Chapter 7
Other Work

In addition to my work in k-optimality, I have also contributed to advances in conceptualizing and measuring privacy in DCOPs [Maheswaran et al., 2006; Greenstadt et al., 2006], as well as in solving DCOPs more efficiently [Maheswaran et al., 2004b].

7.1 Privacy in DCOPs

It is critical that agents deployed in real-world settings, such as businesses, offices, universities, and research laboratories, protect their individual users' privacy when interacting with other entities. Indeed, privacy is recognized as a key motivating factor in the design of several multiagent algorithms, such as in distributed constraint reasoning (including both algorithms for distributed constraint optimization (DCOP) and distributed constraint satisfaction (DisCSPs)), and researchers have begun to propose metrics for analysis of privacy loss in such multiagent algorithms. Unfortunately, a general quantitative framework to compare these existing metrics for privacy loss, or to identify dimensions along which to construct new metrics, had been lacking.

In [Maheswaran et al., 2006] we presented three key contributions to address this shortcoming. First, we introduced VPS (Valuations of Possible States), a general quantitative framework to express, analyze, and compare existing metrics of privacy loss. Based on a state-space model, VPS is shown to capture various existing measures of privacy created for specific domains of DisCSPs. The utility of VPS is further illustrated through analysis of privacy loss in DCOP algorithms, when such algorithms are used by personal assistant agents to schedule meetings among users. In addition, VPS helps identify dimensions along which to classify and construct new privacy metrics, and it also supports their quantitative comparison.
Second, [Maheswaran et al., 2006] presents key inference rules that may be used in analysis of privacy loss in DCOP algorithms under different assumptions. Third, detailed experiments based on the VPS-driven analysis lead to the following key results: (i) decentralization by itself does not provide superior protection of privacy in DisCSP/DCOP algorithms when compared with centralization; instead, privacy protection also requires the presence of uncertainty about agents' knowledge of the constraint graph; (ii) one needs to carefully examine the metrics chosen to measure privacy loss; the qualitative properties of privacy loss, and hence the conclusions that can be drawn about an algorithm, can vary widely based on the metric chosen.

In [Greenstadt et al., 2006] we applied a similar analysis to more recent DCOP algorithms, including DPOP and Adopt, to see if they suffered from a similar shortcoming. We found that several of the most efficient DCOP algorithms, including both DPOP and Adopt, provided better privacy protection than the algorithms considered in [Maheswaran et al., 2006], such as SynchBB. Furthermore, we examined, for the first time, the privacy implications of various distributed constraint reasoning design decisions, e.g., constraint-graph topology, asynchrony, and message contents, to provide an improved understanding of privacy-efficiency tradeoffs. Finally, this paper augmented our previous work on system-wide privacy loss by investigating inequities in individual agents' privacy loss.

7.2 Solving DCOPs Efficiently

To capably capture a rich class of complex problem domains, we introduced the Distributed Multi-Event Scheduling (DiMES) framework and showed how DCOPs could be formulated, using only binary constraints, whose optimal solution is also the optimal solution to the DiMES problem [Maheswaran et al., 2004b].
To approach real-world efficiency requirements, we obtained large speedups for the Adopt algorithm using two preprocessing steps: improving the communication (tree) structure of the agents and precomputing best-case bounds on solution quality. Results were given for meeting scheduling and sensor network domains.

Chapter 8
Conclusions and Future Work

8.1 Conclusions

In addition to new algorithms for distributed constraint optimization, this thesis makes several contributions towards understanding locally optimal states that occur in systems of cooperative agents when the ability to cooperate is bounded.

In Chapter 2, k-optimality was introduced as an algorithm-independent classification for locally optimal solutions to a DCOP. At the same time, based on this definition, any DCOP algorithm can be classified as a k-optimal algorithm for some value of k. We have also shown the existence of several useful properties of k-optimal DCOP solutions. In Chapter 6, a case study of the application of k-optimality to the problem of human team formation was shown, in order to instantiate many of the theoretical ideas presented in the rest of the thesis. k-optimal teams were found using actual data, including social network data, from several real military and civilian task scenarios. In Chapter 5, new families of k-optimal algorithms were designed for k = 2 and k = 3. Important properties of these algorithms, such as monotonicity, were proven. The algorithms were also analyzed experimentally and shown to outperform existing 1-optimal algorithms. In addition, these algorithms were used to find k-optimal solutions to DCOPs larger than those solved optimally in any previous work.

Figure 8.1: A depiction of three key regions in the assignment space (dominance, density, and exclusivity), given a k-optimum, that allowed for the discovery of various theoretical properties about k-optima
Based on this foundation, this thesis contains three key insights that led to the further contributions of theoretical results about k-optimality. We can refer to these insights as exclusivity, dominance, and density. Figure 8.1 depicts a k-optimal assignment a existing within the space of all possible assignments to a given DCOP. This thesis has shown how any k-optimum implies the existence of three particularly defined regions of the assignment space. Certain properties of these regions of the space (the number of assignments they contain, as well as the total reward of those assignments) allow us to determine properties of the k-optima themselves. These three regions are shown in the figure, each corresponding to one of the three insights:

1. The region of exclusivity, as found in Chapter 4, contains all the assignments that, given the k-optimum a, cannot also be a k-optimum. These assignments occur when any group G of k or fewer agents faces the same context as in a, but some agent in G chooses a different value from its value in a. Note that it is possible for some of these assignments to actually have a reward higher than that of a (due to value changes by other agents outside the view of G). Nevertheless, these assignments cannot be k-optimal because the agents in G (or some subset) can always improve the global reward by changing their values.

2. A subset of the region of exclusivity is the region of dominance. As found in Section 3.4, this region, given a k-optimum, contains all the assignments that not only cannot be a k-optimum, but also must have a lower or equal reward to the k-optimum.

3. A subset of the region of dominance is the region of highest density. We can define the density of a region of assignments Â as the mean reward of all assignments in Â: Σ_{â ∈ Â} R(â) / |Â|. In Sections 3.1 and 3.2, we find, for any k-optimum, the densest region that this k-optimum must dominate (have a higher reward than all assignments in the region).
This region is of course a subset of the k-optimum's region of dominance. Without knowing the actual costs and rewards on the constraints in the DCOP, we can express the density of this particular region in terms of R(a) (the reward of the k-optimum) and R(a*) (the reward of the global optimum).

These three regions of assignments are the basis for four remaining contributions of this thesis. Each contribution is a set of theoretical results about k-optima.

1. Quality guarantees on k-optima. The region of highest density defined by a k-optimum was used to find the guarantees on the solution quality of a k-optimum, given in Sections 3.1 and 3.2. Since a k-optimum a must dominate this region, it must therefore have a reward R(a) higher than the region's density. Since this density can be expressed in terms of R(a) and R(a*), a lower bound on the reward of the k-optimum can be expressed in terms of the reward of the globally optimal solution, independent of the costs and rewards on the DCOP constraints, as well as of the domain size of each variable.

2. Quality guarantees on k-optima in DCOPs with hard constraints. Both the region of highest density and the region of dominance are used in Section 3.3 to extend these guarantees to DCOPs that contain hard constraints. In that section, we observe that these guarantees are only possible given certain restrictions on the quantity and placement of the hard constraints in the DCOP. Otherwise it would be possible for a k-optimum to violate a hard constraint, making no guarantee possible for a k-optimum in such a DCOP. In a DCOP with these restrictions, we showed it is not possible for a k-optimum to violate a hard constraint, because if such a k-optimum existed, it would be dominated by an assignment in its own region of dominance: a contradiction. Subsequently, this section showed how the region of highest density could be found in the presence of hard constraints by ensuring that no assignment that violated a hard constraint would be included in the region. If such
If such 112 anassignmentwereincluded,itslargenegativerewardwouldoffsettherewardoftheother assignments,drivingthedensityoftheregionbelowzeroandmakingguaranteesonquality impossiblefork-optima. 3. Guarantees on domination ratio of k-optima. The region of dominance was used in Section 3.4 to calculate the guaranteed domination ratio (the proportion of assignments in thetotalspacethatanyk-optimummustdominate). 4. Upperboundsonthenumberof k-optimainaDCOP. Finally, the region of exclusivity was used in Chapter 4 to calculate upper bounds on the number of k-optima that can exist, given a DCOP graph. Each k-optimum claimed a region of assignments that could not also be k-optima. Then, each assignment in the region was divided among the maximal number of k-optima whose exclusivity regions it could possibly be in. This provided each k-optimum with a distinct region of assignments (or shares of assignments) that it could claim. Dividingthenumberoftotalassignmentsinthewholeassignmentspacebythesize of this region produced the upper bound on the number of k-optima that could exist given aDCOPgraph. 8.2 FutureWork Whilethisthesishasbeguntoexploretheeffectsofagents’boundedabilitytocooperateontheir performance as a team, the following areas seem especially promising for future work in this area. 1. Analogs of k-optimality for noncooperative settings: restricted coalition-proof equi- libria. A primary justification for k-optimality is the cost and di fficulty of aggregating the 113 A B C 4,4 6,0 1 0,6 5,5 0 1 0 4,4 6,0 1 0,6 5,5 0 1 0 A B B C Figure8.2: Graphicalgameexample preferences of large groups of agents; if agents are bounded in this ability, up to a certain group size, a k-optimum will arise for cooperative settings such as DCOPs. A clear area for investigation is the extension of this idea to noncooperative settings. Figure 8.2 shows a sample graphical game - a game where players’ payo ffs depend only on the actions of a subsetofotherplayers. 
Edges in the graph show which players affect each others' payoffs; since this relationship is symmetric in this example, the graph is undirected. Here, the payoffs for agent A depend only on its own action and that of agent B (the row player's payoff is listed first). Agent B's payoffs depend on the actions of all three agents (B receives the sum of the payoffs from its two matrices with A and C).

What happens when, due to bounds on computational ability or time, noncooperative agents in graphical games are limited in their ability to form coalitions that contain more than k agents? Game theorists have focused on properties of coalition-proof equilibria, but this work generally assumes that all sizes of coalitions are possible. If coalitions are limited to k agents, the space of "k-coalition-proof" equilibria should become larger, and it may be easier to find such equilibria. These equilibria would be just as robust as standard coalition-proof equilibria in settings where agents do not have the ability to form coalitions of more than k agents. I will begin by developing algorithms to find k-coalition-proof equilibria in graphical games by using existing techniques to find all Nash equilibria, and then focus on efficiently checking them for k-coalition-proofness.

2. Efficient k-optimal algorithms for DCOP. Now that levels of solution quality can be guaranteed for k-optima, another clear area for future work is the development of efficient, distributed k-optimal algorithms for k > 3. One path along which these algorithms can be developed is the MGM/DSA framework, based on the algorithms in this thesis, where larger groups are formed in each round, at a cost of more message cycles per round. One challenge of this approach is that, for k > 3, there is not always an agent in a group that has a constraint with all of the other agents (consider a chain of four agents, for example). This possibility would require a detailed system for passing messages in order to ensure that the group would always make coordinated value changes.

Another framework that could possibly be utilized is the OptAPO algorithm of [Mailler and Lesser, 2004].
In this complete algorithm, agents designated as mediators find optimal solutions to subgraphs of the DCOP. It would seem that if this responsibility of mediating agents were relaxed to require only a k-optimal solution to their subproblems, a k-optimal solution would result.

3. Improving bounds on solution quality of k-optima. The current lower bounds on solution quality of k-optima depend only on the graph structure of the DCOP, and do not take any information about rewards into account. This makes them applicable over all possible reward structures, and is appropriate given the assumption that no central source has knowledge of all the rewards in the system. However, initial experiments have shown that if the system designer knows some partial information about the rewards on the constraints, this can be used to add constraints to the LFP which is used to find lower bounds on solution quality for arbitrary graphs. One area for future work is to explore the effect of this kind of partial knowledge on the solution quality bounds for k-optima. We focus on partial information because if all rewards are known and collected together, then an exact lower bound could be found by simply enumerating all k-optima and their respective rewards.

Bibliography

S. M. Ali, S. Koenig, and M. Tambe. Preprocessing techniques for accelerating the DCOP algorithm ADOPT. In AAMAS, 2005.

N. Alon and N. Kahale. Approximating the independence number via the theta-function. Mathematical Programming, 80:253–264, 1998.

B. Blum, C. Shelton, and D. Koller. A continuation method for Nash equilibria in structured games. In IJCAI, 2003.

E. Bowring, M. Tambe, and M. Yokoo. Multiply-constrained distributed constraint optimization. In AAMAS, 2006.

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge U. Press, 2004.

R. Caruana and M. Mullin. Estimating the number of local minima in complex search spaces. In IJCAI Workshop on Optimization, 1999.

V. Conitzer and T. Sandholm. Complexity results about Nash equilibria. In IJCAI, 2003.

J. Cox, E. Durfee, and T. Bartold.
A distributed framework for solving the multiagent plan coordination problem. In AAMAS, 2005.

S. Fitzpatrick and L. Meertens. Distributed coordination through anarchic optimization. In V. Lesser, C. L. Ortiz, and M. Tambe, editors, Distributed Sensor Networks: A Multiagent Perspective, pages 257–295. Kluwer, 2003.

E. C. Freuder. Synthesizing constraint expressions. Communications of the ACM, 21(11):958–966, 1978.

R. Greenstadt, J. P. Pearce, and M. Tambe. Analysis of privacy loss in DCOP algorithms. In AAAI, 2006.

G. Gutin and A. Yeo. Domination analysis of combinatorial optimization algorithms and problems. In M. Golumbic and I. Hartman, editors, Graph Theory, Combinatorics and Algorithms: Interdisciplinary Applications. Kluwer, 2005.

M. Kearns, M. L. Littman, and S. Singh. Graphical models for game theory. In Proc. UAI, 2001.

H. Keiding. On the maximal number of Nash equilibria in an n x n bimatrix game. Games and Economic Behavior, 21(1-2):148–160, 1995.

J. T. Kim and D. R. Shin. New efficient clique partitioning algorithms for register-transfer synthesis of datapaths. Journal of the Korean Phys. Soc., 40(4):754–758, 2002.

S. Lin. Computer solutions of the traveling salesman problem. Bell System Technical Journal, 44:2245–2269, 1965.

S. Ling and C. Xing. Coding Theory: A First Course. Cambridge U. Press, 2004.

M. L. Littman, M. Kearns, and S. Singh. An efficient, exact algorithm for solving tree-structured graphical games. In Proc. NIPS, 2002.

R. T. Maheswaran, J. P. Pearce, and M. Tambe. Distributed algorithms for DCOP: A graphical-game-based approach. In PDCS, 2004a.

R. T. Maheswaran, M. Tambe, E. Bowring, J. P. Pearce, and P. Varakantham. Taking DCOP to the real world: efficient complete solutions for distributed multi-event scheduling. In AAMAS, 2004b.

R. T. Maheswaran, J. P. Pearce, E. Bowring, P. Varakantham, and M. Tambe. Privacy loss in distributed constraint reasoning: A quantitative framework for analysis and its applications. JAAMAS, 13:27–60, 2006.

R. Mailler and V. Lesser.
Solving distributed constraint optimization problems using cooperative mediation. In AAMAS, 2004.

A. McLennan and I. Park. Generic 4 x 4 two person games have at most 15 Nash equilibria. Games and Economic Behavior, 26(1):111–130, 1999.

P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. Adopt: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1-2):149–180, 2005.

R. Nair, P. Varakantham, M. Tambe, and M. Yokoo. Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In AAAI, 2005.

A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In IJCAI, 2005.

A. Petcu and B. Faltings. ODPOP: An algorithm for open/distributed constraint optimization. In AAAI, 2006.

S. Ruan, C. Meirina, F. Yu, K. R. Pattipati, and R. L. Popp. Patrolling in a stochastic environment. In 10th Intl. Command and Control Research Symp., 2005.

T. Sandholm. Negotiation among Self-Interested Computationally Limited Agents. PhD thesis, University of Massachusetts at Amherst, Dept. of Computer Science, 1996.

N. Schurr, J. Marecki, P. Scerri, J. P. Lewis, and M. Tambe. The DEFACTO system: Training tool for incident commanders. In IAAI, 2005.

A. Tate, J. Dalton, and J. Levine. Generation of multiple qualitatively different plan options. In Proc. AIPS, 1998.

D. Vickrey and D. Koller. Multi-agent algorithms for solving graphical games. In Proc. AAAI, pages 345–351, 2002.

N. Vlassis, R. Elhorst, and J. R. Kok. Anytime algorithms for multiagent decision making using coordination graphs. In Proc. Intl. Conf. on Systems, Man and Cybernetics, 2004.

D. Whitley, S. Rana, and R. B. Heckendorn. Representation issues in neighborhood search and evolutionary algorithms. In D. Quagliarella et al., editors, Genetic Algs. and Evolution Strategies in Eng. and Comp. Sci., pages 39–57. Wiley, 1998.

M. Yokoo. How adding more constraints makes a problem easier for hill-climbing algorithms: Analyzing landscapes of CSPs. In Int'l Conf. on Constraint Programming, 1997.

M. Yokoo and K. Hirayama.
Distributed breakout algorithm for solving distributed constraint satisfaction and optimization problems. In ICMAS, 1996.

W. Zhang, Z. Xing, G. Wang, and L. Wittenburg. An analysis and application of distributed constraint satisfaction and optimization algorithms in sensor networks. In AAMAS, 2003.

W. Zhang, G. Wang, Z. Xing, and L. Wittenberg. Distributed stochastic search and distributed breakout: properties, comparison and applications to constraint optimization problems in sensor networks. Artificial Intelligence, 161(1-2):55–87, 2005a.

Y. Zhang, J. G. Bellingham, R. E. Davis, and Y. Chao. Optimizing autonomous underwater vehicles' survey for reconstruction of an ocean field that varies in space and time. In American Geophysical Union, Fall meeting, 2005b.