Max-3-Cut performance of graph neural networks on random graphs
Content

Max-3-Cut Performance of Graph Neural Networks on Random Graphs

by Xingyun Zhou

A Thesis Presented to the FACULTY OF THE USC DORNSIFE COLLEGE OF LETTERS, ARTS AND SCIENCES, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE (Applied Mathematics), December 2020. Copyright 2020 Xingyun Zhou.

Table of Contents

Abstract
Introduction
Chapter 1: Semidefinite programming
Chapter 2: Graph neural networks
Chapter 3: Performance evaluation
Conclusion
References

Abstract

The Max-Cut problem asks for the partition of a graph into two sets that maximizes the total weight of edges going between the two sets. The Goemans-Williamson [7] semidefinite program approximately solves the Max-Cut problem, and this algorithm is conjectured to be the best possible, assuming the Unique Games Conjecture. Given much recent interest in neural networks, it is then interesting to see how well a neural network can solve NP-hard optimization problems such as Max-Cut. This was accomplished by Bandeira et al. [12] in 2019. This thesis discusses an extension of the work of Bandeira to solve the more general Max-3-Cut problem, which asks for the partition of a graph into three sets that maximizes the total weight of edges between the three sets.

Keywords: Max-Cut, semidefinite programming, graph neural networks

Introduction

Let $n$ be a positive integer. An undirected graph is a set of $n$ vertices $V = \{1, \ldots, n\}$ together with a set of edges $E \subseteq \{\{i, j\} : i, j \in V,\ i \neq j\}$. Edges between vertices may carry weights. A random graph is a graph in which properties such as the number of vertices, the degrees, and the edges are generated in some random way. This thesis uses random regular graphs, in which the number of vertices and the degrees are given, while the connections and the weights on the edges are random. A $d$-regular graph is a graph all of whose vertices have the same degree $d$.

Given a finite weighted undirected graph with $n$ vertices and weights on its edges, the classical Max-$k$-Cut problem asks for a partition of the vertices into $k$ sets that maximizes the total weight of edges going between the partition elements. Let $n$ and $k$ be positive integers. The general Max-$k$-Cut problem can be defined as follows. Let $G = (V, E)$ be an undirected weighted graph, with $V = \{1, \ldots, n\}$ the set of vertices and $E$ the set of edges; $|\cdot|$ denotes the cardinality of a set. The goal is to find a partition of the vertices into $k$ disjoint subsets such that the total weight of the edges between these subsets achieves the maximum possible value. One can see that this is equivalent to minimizing the total weight of the edges connecting vertices within the same subset. Let these groups of vertices $S_1, S_2, S_3, \ldots, S_k$ be finite disjoint subsets of $V$, containing different vertices, with $\sum_{i=1}^{k} |S_i| = n$.

For the famous Max-Cut problem (Max-2-Cut), where $k = 2$, let the weights on the edges satisfy $w_{ij} = w_{ji}$ for $1 \le i, j \le n$, since the graph is undirected, and define the objective (or loss) function as

$$\max\ \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - x_i x_j) \qquad (1)$$

$$\text{s.t.}\ x_i \in \{1, -1\} \text{ for all } 1 \le i \le n,$$

with $x_i = 1$ if $i \in S_1$ and $x_i = -1$ if $i \in S_2$.
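As a concrete illustration, here is a minimal Python sketch of the objective (1), assuming networkx's `random_regular_graph` as the graph generator; the helper name `max2cut_objective` and the 0/1 weights are our own choices, not the thesis's code.

```python
import networkx as nx
import numpy as np

def max2cut_objective(W, x):
    """Evaluate (1): (1/2) * sum_{i<j} w_ij * (1 - x_i * x_j) for x in {-1, +1}^n."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += 0.5 * W[i, j] * (1 - x[i] * x[j])
    return total

rng = np.random.default_rng(0)
n, d = 20, 3
G = nx.random_regular_graph(d, n, seed=0)  # uniform random d-regular graph
W = nx.to_numpy_array(G)                   # 0/1 weights; random weights could be used instead
x = rng.choice([-1, 1], size=n)            # a random candidate cut
print("cut weight:", max2cut_objective(W, x))
```

Since $(1 - x_i x_j)/2$ is $1$ exactly when $x_i \neq x_j$, the function counts the weight of edges crossing the partition.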
Notice that in the above loss function (1), if two vertices are not partitioned into the same set, one has $w_{ij}(1 - 1 \cdot (-1)) = 2 w_{ij}$, so the factor $\frac{1}{2}$ in front of the sum makes sure the target edges are not counted twice. We define the inner product in Euclidean space for vectors $x$ and $y$ as

$$\langle x, y \rangle = \langle (x_1, \ldots, x_n), (y_1, \ldots, y_n) \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n,$$

where $x_i$ and $y_i$ are real numbers for all $1 \le i \le n$.

In general, for Max-$k$-Cut, Frieze and Jerrum (1997) [6] give the following formulation:

$$\max\ \frac{k-1}{k} \sum_{1 \le i < j \le n} w_{ij} (1 - y_i \cdot y_j) \qquad (2)$$

$$\text{s.t.}\ 1 - y_i \cdot y_j = \begin{cases} 0 & \text{if } y_i = y_j \\ \frac{k}{k-1} & \text{if } y_i \neq y_j. \end{cases}$$

In this case, setting $y_i = 1, 2, \ldots, k$ would not give a useful integer solution, so instead we define $y_i$ to be one of $k$ vectors $a_1, a_2, \ldots, a_k$ chosen as follows: let $a_1, \ldots, a_k \in \mathbb{R}^{k-1}$ be vectors of length 1 with $a_i \cdot a_j = -1/(k-1)$ for $i \neq j$, obtained by letting $a_1, \ldots, a_k$ be the vertices of a regular simplex centered at the origin. Here the length of an $n$-dimensional vector $a$ in Euclidean space is defined as $|a| = \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2}$. The Max-Cut problem (1) is the special case of (2) with $k = 2$. For Max-3-Cut, we substitute $k = 3$ and get

$$\max\ \frac{2}{3} \sum_{1 \le i < j \le n} w_{ij} (1 - y_i \cdot y_j) \qquad (3)$$

$$1 - y_i \cdot y_j = \begin{cases} 0 & \text{if } y_i = y_j \\ \frac{3}{2} & \text{if } y_i \neq y_j. \end{cases}$$

Goemans and Williamson (1995) [7] introduced an approximation algorithm for the Max-Cut problem. Instead of treating it as an integer program with $x_i \in \{1, -1\}$, they relax (1) directly to

$$\max\ \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - u_i \cdot u_j) \qquad (4)$$

with $u_i \in S^{n-1}$ for all $i \in V$, where $S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$ is the unit sphere in $n$ dimensions. The formulation (4) is a semidefinite program, a special kind of convex program that optimizes a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space. A symmetric matrix $X \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD) if it satisfies $v^T X v \ge 0$ for all $v \in \mathbb{R}^n$, where $v$ is a column vector and $v^T$, its transpose, is a row vector. The following are equivalent for a symmetric matrix $X \in \mathbb{R}^{n \times n}$: (i) $X$ is positive semidefinite; (ii) all eigenvalues of $X$ are non-negative; (iii) $X = V^T V$ for some $V \in \mathbb{R}^{m \times n}$ with $m \le n$. If $X \in \mathbb{R}^{n \times n}$ is a positive semidefinite matrix, we write $X \succeq 0$. The equivalence of (i) and (iii) underlies the Cholesky decomposition: a symmetric positive definite matrix $X \in \mathbb{R}^{n \times n}$ can be decomposed as $X = V^T V$ with $V \in \mathbb{R}^{n \times n}$ an upper triangular matrix whose diagonal entries are strictly positive.

Semidefinite programs have many applications, for example in traditional convex constrained optimization, control theory, and combinatorial optimization. Lieven Vandenberghe and Stephen Boyd (1996) [11] note that the ellipsoid method can solve semidefinite programs in polynomial time, although it was not always known how to solve them efficiently. A semidefinite program is a special linear program over the positive semidefinite cone. The PSD cone is the subset of the space of $n \times n$ matrices consisting of all matrices $X$ of the form

$$X = \sum_{i=1}^{n} \lambda_i v_i v_i^T$$

for some non-negative real numbers $\lambda_i$, $1 \le i \le n$, and vectors $v_i \in \mathbb{R}^n$.

If we let $Y = [Y_{ij}] \in \mathbb{R}^{n \times n}$ be a symmetric real matrix with $Y_{ij} = u_i \cdot u_j$, then we can rewrite (4) as

$$\max\ \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - Y_{ij}) \qquad (5)$$

$$Y_{ii} = 1 \ \forall i \in V, \qquad Y \succeq 0.$$

Here $Y \succeq 0$ means that $Y$ is a positive semidefinite matrix. By Cholesky decomposition we have $Y = U^T U$ for some upper triangular matrix $U \in \mathbb{R}^{n \times n}$.
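To make the PSD equivalences concrete, here is a small numpy sketch (our illustration, not from the thesis): it builds a PSD matrix as $V^T V$, checks its eigenvalues, and recovers an upper-triangular factor. Note that numpy's `cholesky` returns a lower-triangular factor $L$ with $X = L L^T$, so $U = L^T$ is the upper-triangular factor used above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
V = rng.standard_normal((n, n))
X = V.T @ V                       # (iii): a Gram matrix, hence PSD by construction

# (ii): all eigenvalues of a PSD matrix are non-negative (up to round-off)
eigvals = np.linalg.eigvalsh(X)
assert eigvals.min() > -1e-10

# Cholesky: numpy returns lower-triangular L with X = L @ L.T,
# so U = L.T is the upper-triangular factor with X = U.T @ U.
L = np.linalg.cholesky(X)
U = L.T
assert np.allclose(X, U.T @ U)
```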
Boettcher and Percus [3] introduced the extremal optimization (EO) algorithm to solve the graph bi-partitioning problem on regular graphs in 2001. Bandeira et al. (2019) [12] adapted their work and applied the EO method to Max-Cut. This method is fairly specialized, unlike semidefinite programming, which is more general, but it achieves good performance on Max-Cut compared to the semidefinite program and still has potential worth exploring.

It is easy to find a (1/2)-approximation by choosing each vertex independently at random to be in either of the two sets; that is, one can find a cut that is at least 50% as good as the best possible one in expectation. So only algorithms that can do better than this trivial algorithm are interesting. Bandeira et al. do accomplish this with their graph neural network, which has performance comparable to Goemans-Williamson.
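A minimal sketch of this trivial random baseline (our illustration, assuming an unweighted graph): averaged over many uniform random assignments, the cut contains about half of all edges, since each edge is cut with probability 1/2.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(2)
G = nx.random_regular_graph(3, 100, seed=2)
edges = np.array(list(G.edges()))

# Each edge {i, j} is cut with probability 1/2 under a uniform random 2-coloring.
trials = 1000
cut_fractions = []
for _ in range(trials):
    x = rng.choice([-1, 1], size=G.number_of_nodes())
    cut = np.sum(x[edges[:, 0]] != x[edges[:, 1]])
    cut_fractions.append(cut / G.number_of_edges())
print("mean cut fraction:", np.mean(cut_fractions))  # close to 0.5
```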
Figure 1: A simple neural network

A neural network works similarly to the human brain's neural network. A "neuron" in a neural network is a mathematical function that collects and classifies information according to a specific architecture. The network bears a strong resemblance to statistical methods such as curve fitting and regression analysis. A simple neural network is shown in Figure 1. A neural network consists of input layers, output layers, and hidden layers. Each node in the network is a perceptron, a binary linear classifier used to detect features. The input layers receive the input patterns. The output layers carry the output signals to which input patterns may be mapped. The hidden layers perform extra computation toward the output and include biases.

When we consider using neural networks to solve problems related to graphs, we can make use of graph neural networks (GNNs). Nodes in a GNN gather information from neighboring nodes, and the last output layer gathers all the information and outputs either a prediction or a classification. Graph neural networks have some particular uses in classifying graph properties. The first is node classification: the network analyzes the patterns of the input and determines the labels of new nodes. The second is link prediction: the network predicts whether there is a connection between particular nodes. The third is graph classification: classical neural networks can be applied to image classification, while here the target of classification is a graph instead of an image.

Deep learning algorithms such as graph neural networks and convolutional neural networks have important applications in areas like computer vision, voice and face recognition, and natural language processing. The promise of deep learning is that it can lead to predictive systems that generalize well, adapt well, continuously improve as new data arrives, and are more dynamic than predictive systems built on rigid rules. All of these applications have played important roles in engineering and business, but deep learning algorithms also have drawbacks. These algorithms require large amounts of training data, while some problems do not have enough quality training data to build a model. Neural networks develop their behavior in extremely complicated ways, and this lack of interpretability makes it extremely difficult to make corrections and fix mistakes in deep learning algorithms.

There are also other interesting topics to discuss in connection with graph neural networks and Max-Cut. It is worth trying to give the deep learning algorithm (the neural network) the eigenvalues and eigenvectors of the matrix, to see whether performance improves. One can also try to apply graph neural networks to the stochastic block model [1]. A series of works also shows that predictors learned by deep neural networks are not robust to adversarial examples, with much research studying different aspects of robustness to adversarial examples [9]. It is also worth noting that neural networks are not stable to noise; see Montasser et al. (2020) [9].

Emmanuel Abbe and Colin Sandon (2020) [2] prove that deep learning with stochastic gradient descent (single batch) is efficiently universal: anything that can be efficiently learned by some algorithm in polynomial time can also be efficiently learned by a polynomial-size neural network in polynomial time. However, this result does not hold for gradient descent (full or larger batch size). Stochastic gradient descent is universal even with some polynomial noise, while full gradient descent algorithms are not. An interesting question we will ask is whether deep learning algorithms can solve NP-hard problems better than the best known algorithms.

In Chapter 1 we discuss the formulation of a semidefinite program for Max-3-Cut, borrowing from the work of Frieze and Jerrum (1997) [6], and describe the way we generate the solution. In Chapter 2 we explore the use of graph neural networks applied to Max-Cut and make changes to fit the case of Max-3-Cut. In Chapter 3 we compare the results of semidefinite programming and graph neural networks on random regular graphs.

Chapter 1: Semidefinite programming

Let $A$ be the adjacency matrix of a graph $G$ and let $X = (x_{ij})$ be defined as

$$x_{ij} = \begin{cases} 1 & \text{if vertex } i \in S_j \\ 0 & \text{otherwise.} \end{cases}$$

$X$ is an $n \times k$ real matrix, where $S_1, \ldots, S_k$ is a partition of the vertices. The adjacency matrix of a graph is the matrix with rows and columns labeled by the graph's vertices, with $A_{ij} = 1$ when vertices $i$ and $j$ are adjacent and $A_{ij} = 0$ when they are not. If the graph is undirected, meaning that there is no difference between the direction from vertex $i$ to $j$ and from $j$ to $i$, the adjacency matrix is symmetric. If connections from a vertex to itself are not allowed, the diagonal entries of the adjacency matrix are all 0. The diagonal of a matrix is the set of all entries in positions $(i, i)$, $i = 1, \ldots, n$; we let $\mathrm{diag}(A)$ denote the diagonal of a matrix $A$. One can then obtain the trace formulation of the Max-$k$-Cut problem:

$$\max\ \frac{1}{2} \mathrm{tr}(X^T L X) \qquad \text{s.t.}\ X u_k = u_n,$$

with $u_k$ and $u_n$ all-ones column vectors of dimensions $k \times 1$ and $n \times 1$. The trace of a matrix is the sum of its diagonal entries. Notice that $L = \mathrm{diag}(A u_n) - A$ and that $\frac{1}{2} \mathrm{tr}(X^T L X)$ gives the total weight of the edges between nodes in different subsets. One can thus try to find a relaxation of this program. By van Dam and Sotirov (2015) [10], the following SDP relaxation can be established:

$$\max\ \frac{1}{2} \mathrm{tr}(L Y) \qquad \text{s.t.}\ \mathrm{diag}(Y) = u_n, \quad k Y - 1_n \succeq 0, \quad Y \ge 0,$$

with $Y = X X^T$ and $1_n$ the $n \times n$ matrix with all entries equal to one. Here $k Y - 1_n \succeq 0$ is a positive semidefinite constraint, and $Y \ge 0$ means $Y_{ij} \ge 0$ for all $1 \le i, j \le n$.

For $k = 2$, the Max-Cut problem, let $Z = 2Y - 1_n$; then

$$Z = 2Y - 1_n \ \Rightarrow\ Y = \tfrac{1}{2}(Z + 1_n) \ \Rightarrow\ \tfrac{1}{2}\mathrm{tr}(LY) = \tfrac{1}{2}\mathrm{tr}\big(L \cdot \tfrac{1}{2}(Z + 1_n)\big) = \tfrac{1}{2}\mathrm{tr}\big(\tfrac{1}{2} L Z + \tfrac{1}{2} L 1_n\big) = \tfrac{1}{4}\mathrm{tr}(L Z),$$

using $L 1_n = 0$. Thus for Max-Cut one has the SDP relaxation

$$\max\ \frac{1}{4} \mathrm{tr}(L Y) \qquad \text{s.t.}\ \mathrm{diag}(Y) = u_n, \quad Y \succeq 0.$$

For Max-$k$-Cut, the well-known semidefinite program of Frieze and Jerrum (1997) [6] is

$$\max\ \frac{k-1}{2k} \mathrm{tr}(L Y) \qquad \text{s.t.}\ \mathrm{diag}(Y) = u_n, \quad Y \ge -\frac{1}{k-1} 1_n, \quad Y \succeq 0,$$

where $Y \ge -\frac{1}{k-1} 1_n$ means $Y_{ij} \ge -\frac{1}{k-1}$ for all $1 \le i, j \le n$.
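As a quick numerical check (our sketch, not the thesis's code), the following verifies on a small random regular graph that $\frac{1}{2}\mathrm{tr}(X^T L X)$ equals the total weight of edges crossing a partition, with $L = \mathrm{diag}(A u_n) - A$.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 12, 3, 3
G = nx.random_regular_graph(d, n, seed=3)
A = nx.to_numpy_array(G)
L = np.diag(A @ np.ones(n)) - A           # L = diag(A u_n) - A

labels = rng.integers(k, size=n)          # a random partition into S_1, ..., S_k
X = np.zeros((n, k))
X[np.arange(n), labels] = 1.0             # indicator matrix, so X u_k = u_n

trace_value = 0.5 * np.trace(X.T @ L @ X)
cut_weight = sum(1 for i, j in G.edges() if labels[i] != labels[j])
assert np.isclose(trace_value, cut_weight)
print(trace_value, cut_weight)
```

Each crossing edge contributes 1 to the quadratic form of each of its two classes, which is why the factor $\frac{1}{2}$ recovers the cut weight exactly.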
In order to implement a Max-3-Cut semidefinite program, we take the last relaxation above and set $k = 3$ to get

$$\max\ \frac{1}{3} \mathrm{tr}(L Y) \qquad \text{s.t.}\ \mathrm{diag}(Y) = u_n, \quad Y \ge -\frac{1}{2} 1_n, \quad Y \succeq 0.$$

We use the CVX solver package [8] to solve the above SDP, obtaining a positive semidefinite matrix $Y \in \mathbb{R}^{n \times n}$. Since it is positive semidefinite, one can perform a Cholesky decomposition to get the corresponding upper triangular matrix $U \in \mathbb{R}^{n \times n}$. We then randomly generate two independent $n \times 1$ Gaussian vectors $g_1$ and $g_2$ with identity covariance matrix in order to project the matrix $U$ onto a 2-dimensional plane. We take the inner products of $U$ with $g_1$ and $g_2$, i.e.

$$P_1 = U g_1, \qquad P_2 = U g_2,$$

and let $P_1$ be the vector containing the first coordinates of the projections and $P_2$ the one containing the second coordinates of the corresponding projections. The figure below shows the partition and projections for 50 nodes. As shown in the figure, the vertices have been partitioned into three regions indicated by dotted lines, as expected, giving an intuitive and direct picture of the cut.

Figure 2: Projection map for Max-3-Cut, n = 50
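A minimal Python sketch of this pipeline, assuming the cvxpy package in place of CVX and a small jitter term for numerical stability. Since $Y = U^T U$, the vertex vectors are the columns of $U$, so the sketch computes the projections as $U^T g_1$ and $U^T g_2$; partitioning the projected plane into three 120-degree sectors is our reading of the dotted-line regions in Figure 2, not a detail stated in the thesis.

```python
import cvxpy as cp
import networkx as nx
import numpy as np

n, d = 50, 5
G = nx.random_regular_graph(d, n, seed=4)
A = nx.to_numpy_array(G)
L = np.diag(A @ np.ones(n)) - A

# Max-3-Cut SDP: max (1/3) tr(LY), diag(Y) = 1, Y_ij >= -1/2, Y PSD.
Y = cp.Variable((n, n), PSD=True)
constraints = [cp.diag(Y) == 1, Y >= -0.5]
prob = cp.Problem(cp.Maximize(cp.trace(L @ Y) / 3), constraints)
prob.solve()

# Factor Y = U^T U (the jitter keeps the solver's Y numerically positive definite).
Yval = Y.value + 1e-8 * np.eye(n)
U = np.linalg.cholesky(Yval).T             # upper triangular, Y ~= U.T @ U

# Project onto a random 2-dimensional plane with two Gaussian vectors.
rng = np.random.default_rng(4)
g1, g2 = rng.standard_normal(n), rng.standard_normal(n)
P1, P2 = U.T @ g1, U.T @ g2                # planar coordinates of each vertex's vector

# Assign each vertex to one of three 120-degree sectors of the plane.
angles = np.arctan2(P2, P1)
labels = np.floor((angles % (2 * np.pi)) / (2 * np.pi / 3)).astype(int)
cut = sum(1 for i, j in G.edges() if labels[i] != labels[j])
print("Max-3-Cut value found:", cut)
```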
Chapter 2: Graph neural networks

Yao, Bandeira and Villar (2019) [12] give an alternative way of solving Max-Cut. They altered the line graph neural network of Chen et al. (2017) [4], which was designed for community detection on graphs. To remain consistent with the experimental results of the Bandeira paper, all graphs generated in this thesis are obtained by random regular graph generation.

A simple blueprint of the graph neural network program of Bandeira et al. (2019) [12] consists of several parts. The data set, in this scenario a graph, is first produced by a graph generator, which is a separate file in their work. In this case, the graph is generated as a regular graph, uniformly at random among all $n$-vertex $d$-regular graphs. The adjacency matrix and the corresponding Laplacian matrix are then generated and provided to the main file. The GNN model is built from input arguments including the number of nodes, edge density, etc. An important input is $k$; by default it is two, indicating that the program deals with the Max-Cut problem. For Max-3-Cut, the number of classes is changed to three. Once it is changed, tensors related to the graph are generated with size $n \times k$, here $n \times 3$. A tensor is similar to a matrix, except that a matrix is 2-dimensional (rows and columns), while a tensor is $m$-dimensional for a positive integer $m$; a matrix is a 2-dimensional tensor, and a 3-dimensional tensor is an array with three discrete dimensions.

The main program then runs stochastic gradient descent on every single training set and takes the average; parameters are trained during the process. In the case of Max-Cut, when $k = 2$, the loss function of the neural network model is defined as

$$\max_\theta L(\theta) = \max_\theta \frac{1}{T} \sum_{t \le T} l(\Phi(G_t; \theta))$$

over the training sets $\{G_t, y_t\}_{t \le T}$, with

$$l(\Phi(G_t; \theta)) = \frac{1}{4} x^T L(G)\, x. \qquad (6)$$

The training sets are used to train the model, with a total number of $T$ sets; every training set has a regular graph $G_t$ with $1 \le t \le T$. $\theta$ is the set of all parameters of the neural network to be optimized, and $\Phi(G_t; \theta)$ denotes the neural network model trained on a graph $G_t$ with parameters $\theta$. The $T$ training sets are given one by one to the model, and the parameters are learned during every training run and passed to the next.

$L(G)$ is the Laplacian matrix of the graph, defined as $L(G) = \mathrm{diag}(G) - A(G)$, where $\mathrm{diag}(G)$ is the diagonal matrix whose $(i, i)$ entry equals the degree of the $i$th vertex of $G$ and $A(G)$ is the adjacency matrix. The above matrices are generated and computed for every single training set. Here the degree of a vertex refers to the total number of vertices that the vertex connects to, and $x = (x_1, x_2, \ldots, x_n) \in \{-1, 1\}^n$ is a vector. This form is similar to the objective in the SDP part corresponding to the case $k = 2$.

The objective (6) is not differentiable, so by making use of a probability matrix $\pi(x) \in [0, 1]^{n \times 2}$ with $\pi_{ij} = P(x_i = j)$, a relaxation can be performed in order to run stochastic gradient descent between training sets. The probability matrix has size $n \times k$ with $k = 2$; the last dimension indicates the two possible outcomes for every node, either $1$ or $-1$. For example, $\pi(3, 1) = 0.3$ means that the probability of the 3rd node being partitioned into $S_1$ is 0.3. The relaxation method takes one of the options, say $x_i = 1$, and lets $p_i = P(x_i = 1)$. Substituting $x$ with $2p - 1 \in [-1, 1]^n$, the loss function becomes

$$l(\Phi(G_t; \theta)) = \frac{1}{4} (2p - 1)^T L(G)\, (2p - 1).$$

The stochastic method can then be applied to the model, since the loss function is now differentiable with respect to the parameters $\theta$. The detailed implementation is in the Bandeira et al. (2019) [12] paper.

When $k = 3$, for Max-3-Cut, some changes must be made relative to the case $k = 2$. The probability matrix becomes $\pi(x) \in [0, 1]^{n \times 3}$, since now every node has three possible labels and the sum of the three non-negative entries in each row is 1. Instead of the previous relaxation method, one can make use of the loss function from the SDP part, namely

$$\max\ \frac{1}{2} \mathrm{tr}\big(\pi^T L(G)\, \pi\big).$$

Dembo et al. (2017) [5] proved that in the Max-Cut problem for random regular graphs, as $n \to \infty$, the cut size goes to

$$n \left( \frac{d}{4} + P_* \sqrt{\frac{d}{4}} + o_d(\sqrt{d}) \right) + o(n)$$

with $P_* \approx 0.7632$ for $k = 2$, a universal constant. For any candidate partition $x = (x_1, x_2, \ldots)$ with cut size $z$, the quality of the cut can be evaluated by computing its corresponding $P$ value

$$P = \frac{z/n - d/4}{\sqrt{d/4}}.$$

The larger the $P$, the better the model performs. When $P$ is close to $P_*$, the cut found by either semidefinite programming or the graph neural network is expected to be among the best possible cuts of the graph. For Max-3-Cut, the corresponding formula for $P$ becomes

$$P = \frac{z/n - d/3}{\sqrt{d/3}} \qquad (7)$$

and we use it to compute $P$ values for the models trained by semidefinite programming and by graph neural networks.
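A minimal PyTorch sketch (our own, not the thesis code) of the differentiable Max-3-Cut surrogate: a row-stochastic probability matrix $\pi$ produced by a softmax, the trace loss above, and the $P$ value (7). The toy two-layer network standing in for $\Phi$ is an assumption made for brevity; the actual architecture is the line graph neural network of [4, 12].

```python
import networkx as nx
import numpy as np
import torch

n, d, k = 100, 5, 3
G = nx.random_regular_graph(d, n, seed=5)
A = torch.tensor(nx.to_numpy_array(G), dtype=torch.float32)
L = torch.diag(A.sum(dim=1)) - A

# Toy stand-in for the GNN Phi: two linear layers over node features.
model = torch.nn.Sequential(torch.nn.Linear(n, 64), torch.nn.ReLU(), torch.nn.Linear(64, k))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
features = A  # each node's row of the adjacency matrix as its input feature

for step in range(200):
    pi = torch.softmax(model(features), dim=1)    # n x 3, rows sum to 1
    loss = -0.5 * torch.trace(pi.T @ L @ pi)      # maximize (1/2) tr(pi^T L pi)
    opt.zero_grad(); loss.backward(); opt.step()

labels = pi.argmax(dim=1).numpy()
z = sum(1 for i, j in G.edges() if labels[i] != labels[j])  # cut size
P = (z / n - d / 3) / np.sqrt(d / 3)                        # equation (7)
print("cut size:", z, "P value:", round(P, 4))
```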
Chapter 3: Performance evaluation

The following performance results were obtained with a semidefinite program and with graph neural networks for Max-3-Cut. We use the $P$ value formula (7) for Max-3-Cut to compute the following. For a fixed number of nodes $n = 100$, a decrease in the degree gives an increase in $P$ values. Table 1 shows the performance of the different methods with various degrees at fixed node count $n = 100$.

n      d     Graph neural network    Semidefinite programming
100    10    0.5834                  0.7117
100    5     0.6082                  0.7432
100    3     0.6414                  0.7841

Table 1: Computed P values for different methods with fixed number of nodes n = 100 and various degrees d.

When the degree is fixed at $d = 5$, we obtain the following table by varying the number of nodes.

d    n      Graph neural network    Semidefinite programming
5    50     0.5955                  0.7070
5    100    0.6084                  0.7639
5    150    0.6357                  0.8070

Table 2: Computed P values for different methods with fixed degree d = 5 and various numbers of nodes n.

Conclusion

The performance results show that the semidefinite program performs better than the graph neural network in optimizing the Max-3-Cut problem. The typical runtime of the semidefinite program is under one second, and the graph neural network usually takes a similar time to run a single training set. Since graph neural networks use many training sets in the model, the total time needed to reach a good approximation with graph neural networks is much greater than the time required by the semidefinite program. Enabling GPUs would be a worthwhile next step for reducing the runtime of the graph neural networks.

References

[1] Emmanuel Abbe. Community detection and stochastic block models: recent developments. arXiv:1703.10146, 2017.
[2] Emmanuel Abbe and Colin Sandon. Poly-time universality and limitations of deep learning. arXiv:2001.02992, 2020.
[3] S. Boettcher and A. G. Percus. Extremal optimization for graph partitioning. Physical Review E, 64:026114, 2001.
[4] Z. Chen, X. Li, and J. Bruna. Supervised community detection with line graph neural networks. arXiv:1705.08415, 2017.
[5] A. Dembo, A. Montanari, and S. Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190-1217, 2017.
[6] A. Frieze and M. Jerrum. Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica, 18(1):67-81, 1997.
[7] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.
[8] Michael C. Grant and Stephen P. Boyd. CVX: Software for disciplined convex programming.
[9] Omar Montasser, Surbhi Goel, Ilias Diakonikolas, and Nathan Srebro. Efficiently learning adversarially robust halfspaces with noise. arXiv:2005.07652, 2020.
[10] E. R. van Dam and R. Sotirov. New bounds for the max-k-cut and chromatic number of a graph. arXiv:1503.06595v2, 2015.
[11] Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38(1):49-95, 1996.
[12] Weichi Yao, Afonso S. Bandeira, and Soledad Villar. Experimental performance of graph neural networks on random instances of max-cut. arXiv:1908.05767, 2019.
Asset Metadata
Creator Zhou, Xingyun (author) 
Core Title Max-3-Cut performance of graph neural networks on random graphs 
Contributor Electronically uploaded by the author (provenance) 
School College of Letters, Arts and Sciences 
Degree Master of Science 
Degree Program Applied Mathematics 
Publication Date 11/22/2020 
Defense Date 11/06/2020 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag graph neural networks, Max-Cut, OAI-PMH Harvest, semidefinite program 
Language English
Advisor Goldstein, Larry (committee chair), Fulman, Jason (committee member), Heilman, Steven (committee member) 
Creator Email xingyun@usc.edu,zhouxingyun@outlook.com 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c89-396036 
Unique identifier UC11668695 
Identifier etd-ZhouXingyu-9143.pdf (filename),usctheses-c89-396036 (legacy record id) 
Legacy Identifier etd-ZhouXingyu-9143-0.pdf 
Dmrecord 396036 
Document Type Thesis 
Rights Zhou, Xingyun 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA