EFFICIENT ALGORITHMS FOR CONNECTION-ASSIGNMENT IN INTERCONNECTION NETWORKS

by Suresh Babu Chalasani

A Dissertation Presented to the Faculty of the Graduate School, University of Southern California, in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (Computer Engineering)

August 1991

Copyright 1991 Suresh Babu Chalasani

This dissertation, written by Suresh Babu Chalasani under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of Doctor of Philosophy.

Dedication

To my parents

Acknowledgements

I shall remain indebted to Professor C.S. Raghavendra for his invaluable guidance and support throughout my stay at USC. It is very rare that one comes across a person as nice as Professor Raghavendra, and I know I shall never be able to thank him enough for the encouragement he has given me.

I shall always consider it a privilege to have known a researcher as gifted as Professor Anujan Varma at the University of California at Santa Cruz. But for his teaching and guidance, this dissertation would have remained a distant dream.

I owe special thanks to Professor Prasanna Kumar and Professor Charles Lansky, who were kind enough to be on my dissertation committee. I learned a great deal from the discussions with Professor Prasanna Kumar. I am also indebted to Professors Alexander Sawchuk and John Silvester for providing me with beneficial inputs during the initial phases of my research. Mr. Gandhi Puvvada has taught me immensely by way of his example. His continued striving for perfection never ceases to amaze me.

I was lucky to be a member of Professor Raghavendra's research group at USC. I received more knowledge than I could give from Meera Balakrishnan, Rajendra Boppana, Ge-Ming Chiu, Hwa-Chun Lin, Sing-Ban Tien and Pie-Ji Yang. I received the benefits of continuous technical interaction with Rajendra Boppana, who was extremely helpful in improving the quality of my work. I am also thankful to Sundaresan Chandrashekhar, Amitava Majumdar, Bangalore Manjunath, Sharad Mehrotra, Dhabaleswar Panda, Krishnan Ramamurthy, Rajagopalan Srinivasan and Ramana Yemeni, with whom my technical discussions proved useful.

I enjoyed the company of a large number of selfless friends to whom I shall always remain thankful. Lucille Stivers has been a very special friend whose benevolence no words of mine can fully describe. Ken Zabel, who had been courageous enough to be my roommate for what appeared to be an infinite amount of time, always strove to reduce my sorrow.
Sridevi Koritala, though she entered my life very recently, always sent me her stream of affection in the depth of secrecy.

I would like to acknowledge the financial support received from NSF Grant No. MIP-8452003, DARPA/ARO Contract No. DAAG29-84-K-0066 and ONR Contract No. N00014-86-K-0602. I am also thankful to the IBM Corporation for awarding me an IBM Graduate Fellowship for the academic year 1990-91.

Contents

Dedication
Acknowledgements
List of Figures
Abstract
1 Introduction
1.1 Different Interconnection Strategies
1.2 Connection-Assignment Problem
1.3 Overview of the Dissertation
2 Connection-Assignment in Faulty One-Sided Crosspoint Networks
2.1 Introduction
2.2 Upper Bound on Crosspoint Faults
2.3 Nonblocking Operation
2.4 Rearrangeable Operation
2.4.1 Formulation of the general problem as a matching problem
2.4.2 Rectangular Configurations
2.4.3 k-Rearrangeable Trapezoidal Configuration
2.5 Extension to Multiple-Bus Systems
3 Algorithms for Time-Slot Assignment in Hierarchical Switching Systems
3.1 Introduction
3.2 Problem Formulation
3.3 A Recursive TSA Algorithm
3.4 Parallel TSA Algorithms
3.4.1 A PRAM Algorithm
3.4.2 A General Algorithm for P < L/2 Processors
3.4.3 A Hypercube Algorithm
3.5 TSA Algorithms for SS/TDMA Systems
4 Time-Slot Assignment Algorithms for SS/TDMA Systems with Variable-Bandwidth Beams
4.1 Introduction
4.2 Problem Formulation
4.3 A Recursive TSA Algorithm
4.4 An Efficient Sequential TSA Algorithm
5 An Incremental Algorithm for Time-Slot Assignment in TDM Switching Systems
5.1 Introduction
5.2 Equivalence of the TSA Problems
5.3 Incremental TSA Algorithm
6 Concluding Remarks
Appendix
Bibliography

List of Figures

1.1 A two-sided crosspoint switch.
1.2 A one-sided crosspoint matrix with 32 ports.
1.3 A simple TDM switching system.
2.1 A crosspoint matrix with fault set {(0,1), (0,2), (0,3), (1,3)}.
2.2 Illustration for the proof of Theorem 1.
2.3 A nonblocking configuration.
2.4 A maximal connection set and its allocation graph.
2.5 (a) A maximal connection set; (b) its allocation graph; (c) the allocation graph after the pairwise exchange of connections {0,2} and {1,3}. Only the active and the faulty crosspoints are shown.
2.6 (a) A faulty crossbar; (b) the graph G_{C,F} for the connection set {{0,5}, {1,7}, {2,3}, {4,11}, {6,8}, {9,10}}.
2.7 A rectangular configuration.
2.8 A general rectangular configuration with multiple faulty regions.
2.9 Illustration of the rearrangements in Case 4(b) of Theorem 5.
2.10 A rectangular configuration with 25 percent faulty crosspoints.
2.11 A trapezoidal configuration for N = 12.
2.12 Example for illustration of algorithm Rearrange. Only the active and the faulty crosspoints are shown.
2.13 The triangular configuration for N = 12.
3.1 A general TDM hierarchical switching system.
3.2 General traffic network G_{T,L1} corresponding to an HSS (refer to Table 3.1 for the lower bounds and capacities on the arcs).
3.3 A traffic matrix T and the corresponding traffic network G_{T,L1} for L1 = 5.
3.4 A circulation in the network of Figure 3.3.
3.5 Traffic network G'_{T,L1} corresponding to an SS/TDMA system.
4.1 An SS/TDMA system with variable-bandwidth beams.
4.2 General traffic network G_{T,L1} corresponding to an SS/TDMA system with variable-bandwidth beams.
4.3 A traffic matrix T and the corresponding traffic network G_{T,L1} for L1 = 5.
4.4 A circulation in the network of Figure 4.3.
5.1 Algorithm row-fill.
5.2 (a) A traffic matrix; (b) the corresponding HSS.
5.3 Results of the row-fill (a) and column-fill (b) algorithms on the traffic matrix of Figure 5.2.
5.4 Algorithm column-fill.
5.5 Clos network model for the TSA problem.
5.6 Clos network model for a traffic matrix with M' = 6 and L = 5.
5.7 Algorithm deallocate.
5.8 Clos network model after removing connections corresponding to the A matrix of equation (5.18).
5.9 Graph model G(X, Y, E) corresponding to switching matrices S4 and S5 of equation (5.19).
5.10 Algorithm allocate.

Abstract

This dissertation addresses the problem of assigning connections in interconnection networks. The connection-assignment problem in two different classes of systems is investigated.
In a system of the first class, processors are interconnected using a one-sided crosspoint switching network. In this system, we study the problem of assigning connections when faulty crosspoints exist in the network. We provide a complete characterization of the faulty crosspoint networks that can operate in the nonblocking mode. We show that one-sided crosspoint networks can tolerate as many as 50 percent faulty crosspoints in the rearrangeable mode of operation. Realizing a connection set on a crosspoint network in the presence of an arbitrary fault set is modeled as a graph-matching problem. Two special distributions of faults, namely rectangular and trapezoidal, which allow easy determination of the rearrangeability, are also investigated. All results derived in the context of one-sided crosspoint networks are extended to the case of multiple-bus systems.

The second class of systems that we consider consists of satellite-switched time-division multiple access (SS/TDMA) systems. Hierarchical switching systems (HSS's) and SS/TDMA systems with variable-bandwidth beams are two examples of these systems. We provide efficient algorithms for connection assignment using an optimal number of time-slots in these switching systems. We design sequential, parallel and incremental-sequential algorithms for time-slot assignment (TSA) in these switching systems. Our sequential algorithms are asymptotically faster than the existing sequential algorithms, while parallel algorithms for the TSA problem are proposed for the first time in this dissertation. Our sequential and parallel algorithms are based on modeling the TSA problem as the problem of finding a series of maximum flows in networks. The incremental-sequential algorithm that we design is shown to achieve considerable speed-up over the other algorithms when traffic demands in consecutive frames overlap to a significant extent.

Chapter 1
Introduction

Due to recent advances in VLSI technology and the parallelism inherent in many scientific and engineering computations, parallel processing has been proposed for building high-performance computers. For any parallel or distributed system, interprocessor communication is a major design issue, because a specific communication system might support one application well but might prove inefficient for others. As the communication system is the only means by which processors can cooperate in executing a task, achieving high utilization and reliability of the communication system is a crucial prerequisite for achieving high performance.

General-purpose parallel computer systems can be divided into multiprocessors and multicomputers. In multiprocessor systems, all the processors address a common main memory space and processors communicate by sharing the main memory. In multicomputer systems, each processor has its own memory space, and interaction between processors relies on message passing between the communicating processors. Intel's iPSC is a commercially available multicomputer system. Multiprocessors are further divided into tightly coupled and loosely coupled multiprocessors. In a tightly coupled multiprocessor, the main memory, which is partitioned into several modules, is placed at a central location so that the access time from any processor to any memory module is the same. Examples of tightly coupled multiprocessor systems include C.mmp of Carnegie Mellon University and Denelcor's HEP system [24].
Loosely coupled systems are characterized by the fact that a main memory module is attached to each processor, and accesses to a remote main memory module consume more time than accesses to the module that is associated with the processor. Examples of loosely coupled systems include the Cm* of CMU and the RP3 of IBM.

1.1 Different Interconnection Strategies

The communication system or the interconnection network (IN) of a multiprocessor system must (i) be cost-effective and reliable, (ii) be amenable to easy control, and (iii) provide reasonable performance. A very simple, low-cost interconnection scheme is employing a common bus to connect all processors and memory modules in the system. Such a shared bus allows only one transfer between processors and memory modules in a single time unit. Hence a shared bus, when used to interconnect a large number of processors and memory modules, offers poor performance. Moreover, failure of the shared bus can bring down the multiprocessor system.

Crosspoint switching networks (or simply crossbars), when used to connect N processors with N memory modules, require O(N^2) crosspoints but are capable of supporting all possible distinct connections between processors and memories simultaneously. Crosspoint switching networks are used extensively for communication switching and for computer interconnection.

In static interconnection schemes, each processor is connected to a subset of other processors using dedicated links. Shared-bus, hypercube [44], and pyramid [49] interconnection schemes fall in this category. Dynamic networks are characterized by the fact that their internal states can be modified to establish several different connections between processors and memory modules. Dynamic multistage interconnection networks (MINs), which consist of cascaded stages of switching elements, were proposed in the literature with the intention of achieving a reasonable balance between the crossbar and shared-bus schemes in terms of cost and performance. Most of the MINs proposed in the literature employ n stages of 2 x 2 switches with N/2 switches per stage, when N equals 2^n. Examples of such MINs include Omega [29] and indirect binary n-cube [40] networks. Though the cost of any typical MIN increases as O(N log2 N), as opposed to the O(N^2) growth rate for crossbars, MINs either do not provide the complete connection capability provided by the crossbars or are not amenable to easy control. Hence, in the rest of this thesis we concentrate on parallel and distributed systems that use crossbar networks for interconnection.

Classification of crosspoint switching networks. Crosspoint switching networks can be one-sided or two-sided. In its general form, a two-sided crosspoint switch with N input terminals and M output terminals consists of N·M crosspoints organized as a matrix with N rows and M columns. A connection between input i and output j is established by activating a switching gate, or crosspoint, placed at the intersection of row i and column j. Small crosspoint networks (for example, of size 16 x 16 or 32 x 32) can be constructed on a single chip. Larger matrices are assembled by partitioning the matrix into smaller rectangular blocks and assigning each block to a chip. For the purpose of illustration, Figure 1.1 shows a two-sided crosspoint matrix with 16 inputs and outputs constructed from sixteen 4 x 4 switching chips.
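The block partitioning just described is easy to make concrete. The short sketch below is not taken from the dissertation; the function name and the 4 x 4 chip size are illustrative assumptions chosen to match the Figure 1.1 example. It maps a crosspoint address of a two-sided crossbar to the chip that hosts it and to the local crosspoint within that chip.

```python
def locate_crosspoint(i, j, chip_size=4):
    """Map the global crosspoint (row i, column j) of a two-sided crossbar
    to (chip_row, chip_col, local_row, local_col) when the matrix is
    partitioned into chip_size x chip_size blocks, one block per chip."""
    return (i // chip_size, j // chip_size, i % chip_size, j % chip_size)

# A connection between input 9 and output 14 of a 16 x 16 matrix built from
# 4 x 4 chips uses crosspoint (1, 2) inside the chip in block row 2, column 3.
print(locate_crosspoint(9, 14))   # -> (2, 3, 1, 2)
```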
An important property of a two-sided crosspoint switch is the existence of a unique path between every pair of input and output terminals. Although this renders them easy to control, the unique-path property makes them vulnerable to faults. The failure of any crosspoint in a two-sided matrix disconnects one input-output pair. An alternate design of crosspoint networks, namely the one-sided crosspoint networks, affords improved fault tolerance by providing multiple paths for setting up a connection between two terminals [18, 52, 54].

In a one-sided crosspoint switch, each port has a pair of input/output lines leading to the switch. Connections between these lines are set up by means of internal lines (buses) running perpendicular to them. These two sets of lines, namely the input/output lines and the internal buses, form the horizontal and vertical lines of a switching matrix, with switches placed at the points of intersection. The architecture and implementation details of one-sided crosspoint networks are given in [18]. A nonblocking one-sided switch with N input/output ports can be obtained by means of a matrix with N horizontal lines and N/2 vertical lines (buses). For the purpose of implementation, the switching matrix is partitioned into chips, each chip consisting of n horizontal lines and n/2 vertical buses. Figure 1.2 shows a one-sided switch with N = 32 input/output ports and 16 internal buses.

Figure 1.1: A two-sided crosspoint switch.

The switch matrix is constructed out of sixteen crosspoint chips, each of size 8 x 4. Note that the crosspoint in a one-sided matrix is more complex than that in a two-sided matrix. Each horizontal line in Figure 1.2 represents a full-duplex channel and each vertical bus a pair of bidirectional lines. A connection between a source terminal and a destination terminal is established by finding an unused column bus in the matrix and then turning on the two crosspoints where the source and destination rows intersect with the selected bus. Since any unused column bus can be selected for making a connection between two ports in the matrix, this design provides better fault tolerance as compared to a two-sided matrix.

Classification of the systems considered in this dissertation. The first class of systems that we consider consists of multiprocessor and multicomputer systems.

Figure 1.2: A one-sided crosspoint matrix with 32 ports.

In a system of the first class, processors are interconnected using a one-sided crosspoint switching network. The second class of systems that we consider consists of the time-division multiple access (TDMA) systems that are widely used in satellite-switched (SS) and terrestrial systems. Time-division multiplex (TDM) switching systems are widely used in terrestrial and satellite communication because of the need to concentrate traffic from many low-bandwidth sources into high-speed channels. These systems typically consist of a number of input channels connected to a number of output channels by means of a two-sided crossbar switch. The traffic in each channel is in the form of frames, and each frame is divided into time-slots. A time-slot represents the basic unit of traffic that is carried by the channels in one unit of time.

1.2 Connection-Assignment Problem

A connection-request is a request to connect two ports in the system. Two examples of connection-requests are as follows.
• Request to establish a connection between an input port (usually a processor) and an output port (usually a memory module) via a two-sided crosspoint network.

• Request to establish a connection between two ports (usually processors) via a one-sided crosspoint network.

The duration of each connection-request is assumed to be finite and independent of the durations of other connection-requests. The connection-assignment problem consists in realizing connection-requests through the network. The connection-assignment problem can be solved either in a centralized fashion or in a distributed fashion. In the centralized scheme, a central controller decides how to realize the connection-requests through the network. Distributed schemes do not employ a centralized controller; in a distributed control scheme, the ports and/or the network elements cooperate in realizing the connection-requests. Centralized connection-assignment algorithms consume more time than the distributed algorithms, but distributed connection-assignment algorithms are difficult to develop. In this dissertation, we develop centralized algorithms for connection assignment in two different types of systems.

Connection-assignment in multiprocessor systems. A system in the first class interconnects processors using a crosspoint switching network. Only one path exists between an input processor and an output processor when a two-sided crossbar is used for interconnecting multiple processors. Hence, assigning any new connection can be achieved in a distributed manner by using suitable logic circuits at each individual crosspoint. A connection-request between any two processors (or ports), when a one-sided crossbar is used for interconnecting multiple processors, can be realized using any free bus in the network; thus a centralized controller can allocate a bus for every new connection-request by maintaining a list of free buses in the network. We realize that connection-assignment in non-faulty one-sided crosspoint networks is fairly straightforward. Our main interest in Chapter 2 is assigning connection requests between ports in a one-sided crosspoint network when faulty crosspoints exist in the network. Connection-assignment in a two-sided crosspoint network with faulty crosspoints is well understood (see, for example, [45, 13]) and will not be addressed in this dissertation.

Connection-assignment in SS/TDMA systems. A common type of TDM switching system employed in satellite networks is the SS/TDMA (satellite-switched time-division multiple access) system [22]. In an SS/TDMA system, a switch on board the satellite connects a set of uplink communication channels from earth stations to a set of downlink communication channels. The traffic in each channel is in the form of TDM frames. The switch can be reconfigured during each time-slot to allow traffic from any incoming communication channel to be routed to any outgoing channel. We refer to such a TDM switching system as a simple TDM switching system. Figure 1.3 shows a simple TDM switching system with four input channels and the same number of output channels. During each time-slot, the central switch in the system can receive one unit of traffic from each incoming channel and transmit one unit of traffic to each outgoing channel. For example, the frame shown in Figure 1.3 consists of six slots, and the number within each slot in the figure indicates the destination of the traffic in that slot.
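To make the frame-and-slot structure concrete before analyzing it, here is a small sketch. It is not from the dissertation; the representation (one tuple of destination channels per slot) and the function name are illustrative assumptions. A slot is conflict-free exactly when no two input channels address the same output channel, which is the condition examined next.

```python
def conflict_free(frame):
    """A frame is a list of time-slots; each slot lists, per input channel,
    the output channel addressed in that slot (None for an idle input).
    A slot is conflict-free when no output channel appears twice."""
    for slot in frame:
        targets = [dest for dest in slot if dest is not None]
        if len(targets) != len(set(targets)):
            return False
    return True

# Hypothetical 4-channel frame with three slots (not the frame of Figure 1.3):
frame = [(1, 2, 3, 4), (2, 1, 4, 3), (3, None, 1, 2)]
print(conflict_free(frame))   # -> True
```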
With the assignment of the traffic units to time-slots shown in Figure 1.3, the switch receives exactly four units of traffic from the input channels during each time-slot, and each unit is addressed to a distinct outgoing channel; the switch can therefore transmit all the traffic to the output channels without conflict. A conflict occurs if the traffic units are assigned to the slots such that two traffic units addressed to the same outgoing channel arrive at the switch at the same time. The time-slot assignment (TSA) problem in a TDM switching system is to find a conflict-free assignment of traffic units to slots such that the frame length is minimized. Such an assignment that minimizes the frame length over all possible conflict-free assignments is referred to as an optimal TSA. In this dissertation, we consider only optimal time-slot assignments. Thus, given the traffic requirements of the system, that is, the number of traffic units to be transmitted from each incoming channel to each outgoing channel, the function of a TSA algorithm is to compute an optimal TSA for the system.

Figure 1.3: A simple TDM switching system.

TSA problems have been widely studied for various configurations of the switching system [16, 31, 8, 11, 22, 20]. In all the systems studied, the problem can be modeled mathematically as that of decomposing a matrix representing the overall communication requirements into constituent matrices. Each of the constituent matrices produced should satisfy a certain set of constraints, and the constraints vary according to the structure of the system. Each constituent matrix represents a portion of the total traffic, which requires a certain number of time-slots to transmit. The sum of the slots needed by the constituent matrices gives the frame length. An optimal TSA algorithm always finds an assignment that minimizes the frame length over all feasible decompositions satisfying the specified constraints. Optimal TSA algorithms are known for many TDM switching configurations [16, 31, 8, 11, 12, 22, 9, 20].

TSA algorithms that are currently available in the literature are computationally intensive. For example, the best-known sequential algorithm, due to Inukai, for finding a TSA in a simple TDM system has a worst-case time complexity of O(N^4.5) for N users [22]. The contribution of this dissertation towards finding an optimal TSA in TDM switching systems is three-fold:

1. First, we provide parallel algorithms for finding an optimal TSA in a general class of SS/TDMA switching systems known as the hierarchical switching systems (HSS's).

2. We provide sequential and parallel algorithms for time-slot assignment in another class of switching systems known as the SS/TDMA systems with variable-bandwidth beams.

3. Finally, we provide algorithms for incremental computation of TSA in hierarchical switching systems.

1.3 Overview of the Dissertation

This dissertation addresses the issue of connection-assignment in one-sided crosspoint switching networks and SS/TDMA switching systems. We present several algorithms for connection-assignment in these systems.

In Chapter 2, we address the problem of assigning connections in faulty one-sided crossbars. We establish upper bounds on the size of a fault set to sustain a given level of connectivity. Two modes of operation, namely nonblocking and rearrangeable, are considered.
We show that, in an N x N/2 one-sided crossbar, at most N/2 - 1 faulty crosspoints can be tolerated in the case of nonblocking operation. We also show that any distribution of these faults with no more than one fault per column of the matrix is nonblocking, and we present an algorithm for allocating connections for any such distribution. We then show a formulation of the general problem of determining the realizability of a connection set on a crosspoint network in the presence of an arbitrary fault set as a graph-matching problem. We also define two special distributions of faults, namely rectangular and trapezoidal, which allow easy determination of the rearrangeability of the network. The rectangular configurations allow as many as 25 percent of the crosspoints to be faulty and limit the number of rearrangements to two in the worst case. We also introduce algorithms for rearrangement of connections in these networks in the presence of faults. Finally, we extend the results to multiple-bus systems where 1 < M < N/2 buses are used to interconnect N processors.

In Chapters 3, 4 and 5 we address the time-slot assignment problem in different SS/TDMA systems. In Chapter 3, we present parallel and sequential algorithms for computation of time-slot assignments (TSA) in a general class of switching systems called hierarchical switching systems (HSS's). These algorithms are based on modeling the time-slot assignment problem as a network-flow problem. Previous algorithms for finding an optimal time-slot assignment in these switching systems are inherently sequential, and no parallel algorithms are known for this problem. We also present a sequential algorithm for computing time-slot assignments in HSS's.

In Chapter 4, we present parallel and sequential time-slot assignment algorithms for SS/TDMA systems with variable-bandwidth beams. Our algorithms are again based on modeling the time-slot assignment problem as a network-flow problem. Our sequential algorithms improve on the existing sequential algorithms, while our parallel algorithms are the first parallel algorithms devised for this problem.

In Chapter 5, we present an incremental algorithm for scheduling traffic in hierarchical switching systems. Instead of recomputing the time-slot assignment for each frame of traffic, our algorithm computes a TSA for a new frame by modifying the known TSA of the previous frame. The incremental TSA algorithm uses a two-step process. The first step transforms the TSA problem in the HSS into an equivalent TSA problem in a simple TDM switching system. The second step uses an incremental algorithm to find a TSA for the latter. The second step exploits the correspondence between the TSA problem and the rearrangement problem in the Clos three-stage network. When the traffic demands in consecutive frames overlap to a significant extent, the incremental algorithm provides considerable speedup over the previous algorithms. Chapter 6 concludes this dissertation.

Chapter 2
Connection-Assignment in Faulty One-Sided Crosspoint Networks

2.1 Introduction

We observed that, because of the unique-path property, the loss of any single crosspoint in a two-sided matrix introduces blocking. However, in the case of a one-sided crosspoint matrix with N ports and N/2 buses, any of the N/2 buses can be used to connect a given pair of terminals.
This brings forth an interesting question: can this inherent redundancy in the architecture be utilized to prevent blocking when some crosspoints are faulty? In this chapter we try to answer this question by studying the effect of multiple crosspoint faults on the connection capability of the network. We derive necessary and sufficient conditions on the number and distribution of crosspoint faults to sustain a certain level of connectivity. Further, we develop algorithms for assigning connections when faulty crosspoints exist in the network.

We will refer to the matrix with N ports, N/2 buses, and a crosspoint at each of the N^2/2 intersection points as a complete or full configuration. The residual configuration after removal of one or more faulty crosspoints will be referred to as a partial configuration. The number of crosspoints that can be removed from a complete configuration depends on the degree of connectivity desired of the residual configuration. Following the classification by Benes [7], we can define the following three modes of operation.

1. Strictly nonblocking: The matrix is strictly nonblocking if, for every possible pairing of a set of idle terminals, the pairs can be connected in any sequence through any unused bus existing at that time. It is clear that any partial configuration violates this criterion; that is, no faults can be tolerated in this mode.

2. Widely nonblocking: The network is widely nonblocking if a bus-allocation algorithm exists to connect the idle pairs in any sequence. The algorithm is always able to allocate a bus for every new connection, independent of the sequence in which ports are connected and disconnected. The only penalty introduced in this case as compared to strictly nonblocking operation is the overhead of the bus-allocation algorithm. Some partial configurations satisfy this criterion, for example any partial configuration obtained by removing exactly one crosspoint from a complete configuration.

3. Rearrangeable: The network is rearrangeable if any pair of idle terminals can be connected, by reallocating some of the existing connections if necessary. This implies the existence of an algorithm to produce an assignment of buses for any choice of pairing of the N terminals among themselves.

The rearrangeability criterion is less stringent than the other two. Since the strictly nonblocking case allows no loss of crosspoints, it deserves no further attention. Therefore, we will use the term nonblocking synonymously with widely nonblocking.

The problem of finding the maximum number of faulty crosspoints to sustain a given level of connectivity is equivalent to the problem of finding the minimum number of crosspoints to achieve the same level of connectivity. The latter problem has been studied by Mitchell and Wild [34], and by Varma and Chalasani [53], in the context of one-sided crosspoint networks. A lower bound of N^2/4 + N - 1 crosspoints for rearrangeable operation has been derived in both [34] and [53]. This gives an upper bound of N^2/4 - N + 1 faulty crosspoints for rearrangeable operation. Rearrangeable configurations satisfying this bound have been shown to exist [34]. However, no characterization of such minimal configurations is known. Our interest in this chapter is in finding the effect of a given set of faults on the capacity of a one-sided crosspoint network, with respect to both nonblocking and rearrangeable modes of operation.
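To fix a concrete representation for the discussion that follows, here is a minimal sketch. It is not taken from the dissertation; the class name, data layout and the sample fault set are illustrative assumptions (the sample set happens to be the one that reappears in Figure 2.1 of Section 2.2). It records a partial configuration by its faulty crosspoints and checks whether a given bus can realize a connection between two ports.

```python
class OneSidedCrossbar:
    """N ports, N/2 buses; a crosspoint is addressed by the pair (port, bus).
    A partial configuration is described by its set of faulty crosspoints."""

    def __init__(self, n_ports, faults=()):
        self.n_ports = n_ports
        self.n_buses = n_ports // 2
        self.faults = set(faults)

    def bus_usable(self, bus, a, b):
        """Bus `bus` can realize the connection {a, b} iff both of its
        crosspoints on ports a and b are fault-free."""
        return (a, bus) not in self.faults and (b, bus) not in self.faults

# Example: port 0 has faults on buses 1, 2 and 3, so only bus 0 can connect it.
xbar = OneSidedCrossbar(8, faults={(0, 1), (0, 2), (0, 3), (1, 3)})
print([bus for bus in range(xbar.n_buses) if xbar.bus_usable(bus, 0, 4)])  # -> [0]
```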
We consider the general problem where the fault t set is arbitrary, as well as some special classes of fault distributions. A one-sided crosspoint network can be viewed as a special case of multiple-bus interconnection networks in which M buses are used to interconnect N processors. Lang, Valero, and Fiol studied the problem of minimizing the number of crosspoints in such multiple-bus systems when used for processor-memory interconnection [27]. Their analysis is based on a system-model where multiple buses are used to connect a set of processors to a set of memory units in a bipartite structure, requiring no connectivity between processors and between memory units. Our analysis, in contrast, assumes a system-model in which any network-port can be connected to any other port. This would be the case, for example, in a multiprocessor system if m ultiple buses were used to interconnect many processor cards, each with its local memory. The effect of the loss of crosspoints is more difficult to characterize in the case of such networks owing to the requirement of full connectivity among the ports. Construction of crosspoint networks with minimum number of crosspoints has also been investigated by Masson and Nakamura in the context of two-sided con centrators [33, 35]. A two-sided concentrator connects a number of input terminals to a smaller number of output terminals in a bipartite structure. Because of the topological difference between two-sided and one-sided crosspoint networks, these results do not extend readily to the latter. This chapter is organized as follows: A simple upper-bound on the number of crosspoint faults sustainable in either the nonblocking or the rearrangeable mode of operation is derived in Section 2.2. Nonblocking configurations are the topic of Section 2.3; we state and prove two conditions on a fault set that are both necessary and sufficient to achieve a partial nonblocking configuration. These allow a maximum of N /2 — 1 faulty crosspoints in the m atrix. A graph model is used to show the sufficiency of the conditions. Rearrangeable operation is covered in Section 2.4. We first consider the general problem of determining the realizability of a connection set in a one-sided crosspoint m atrix in the presence of an arbitrary fault set. We show a formulation of the problem as a graph-matching problem. 13 We then consider two special classes of fault-distributions, namely rectangular and trapezoidal, that allow easy determination of the rearrangeability. In Section 2.5, we generalize the results to multiple-bus networks employing 1 < M < N / 2 buses j to interconnect N processors. 2.2 U p p er B o u n d on C rossp oin t F au lts In this section we derive an upper bound of N 2 /4 — N + 1 for the num ber of faulty crosspoints in a one-sided crosspoint network with respect to the rearrangeable mode of operation. This will establish that any fault-set of cardinality exceeding 7V 2/4 — N + 1 results in a loss of rearrangeability. The same result was obtained in [34] while attem pting to characterize rearrangeable one-sided configurations with the minimum number of crosspoints. It is easy to observe that any upper bound on the number of faults for rearrangeable operation also serves as an upper bound for nonblocking operation. Similarly, any lower bound on the number of faults for nonblocking operation also serves as a lower bound for rearrangeable operation. 
Hence, the upper bound of N 2/4 — N + 1 faults is also an upper bound for the number of faults tolerable in the nonblocking mode of operation. A tighter upper- bound will be shown for the nonblocking case in Section 2.3. Before proceeding further, we pause to introduce the definitions and notations used in this chapter. The terms crosspoint m atrix and crossbar are used inter changeably to denote a one-sided crosspoint m atrix, unless stated otherwise. All crossbars considered consist of an even number of ports. In the illustrations of one-sided crossbar networks the horizontal lines represent ports and vertical lines represent buses. A faulty crosspoint at the intersection of a port and a bus is indicated by a circle and a fault-free crosspoint is implied otherwise. All the cross- point matrices considered have N /2 buses for N ports. The ports are numbered 0,..., N — 1 and buses 0,..., N /2 — 1. A crosspoint at the intersection of port i and bus j is often referred to by its address as Several types of faults can occur in the crosspoint m atrix. A stuck-at fault on one of the port lines makes the corresponding port unusable. Similarly, a stuck- at fault on a bus line renders the bus unusable for any connection. Such faults, 14 b u se s 1 2 3 O ' 1 ' 2 ' 3' 4' 5' 6 ' 7' ^ ( V ; v, / V t ) \ V ) Figure 2.1: A crosspoint m atrix with fault set {(0, 1), (0, 2), (0,3), (1,3)}. which result in the loss of an entire port or an entire bus, can be sustained only by providing extra hardware or by allowing some degradation in the form of blocking in the switch. Not all faults in the network, however, result in the loss of an entire port or bus. This is particularly true of multichip implementations where the effect of a fault in one of the chips can be confined to only the segment of the bus-line or port-line within the faulty chip. In addition, our interest in this chapter is to bring out the inherent redundancy present in the crosspoint m atrix that allows the m atrix to operate in the presence of certain types of faults without degrading its connection capability. Therefore, we model a physical fault in the network by a fault set which is a set of crosspoints affected by the fault. An open crosspoint, for example, corresponds to fault set with a single fault, and the failure of an entire bus is modeled by a fault set containing all the crosspoints on the failed bus. A faulty crosspoint (i , j ) is assumed to have failed in such a way th at it does not affect either the port line i or the bus line j . Note that this fault model is capable of describing a wide range of physical faults. W ith respect to this model, fault- tolerance is achieved if any given set of connections can be realized by the network such that none of the crosspoints in the fault set are used. 15 ] We define a fault set as a set of crosspoints, each represented by its address. A critical fault-set is a fault set that destroys either the nonblocking or the rear- rangeable nature of the crosspoint m atrix, depending on the criterion used. For example, Figure 2.1 shows a crosspoint m atrix with eight ports and four buses with the fault set {(0,1), (0,2), (0,3), (1,3)}. We will later show th at this fault set is critical with respect to the criterion of being nonblocking but is non-critical with respect to rearrangeability. Note that a critical fault-set with respect to the crite rion of rearrangeability is critical with respect to nonblocking as well. 
Similarly, a non-critical fault set with respect to the nonblocking criterion is also non-critical with respect to rearrangeability. A connection is represented as an unordered 2-subset {a, 6} of the set of ports {0,1,..., T V — 1}. The ordered connection < a, b > is an ordered pair of ports with a < b. A connection set is a set of connections such that no port appears more than once in all the connections taken together. Similarly, an ordered connection set is a set of ordered connections such that no port occurs more than once in the set. For example, {{0,1}, {10,3}, {6,4}, {5,11}} is a connection set for T V = 16 and {< 0,1 > , < 3,10 > , < 4,6 > , < 5,11 >} is the corresponding ordered connection set. A maximal connection set is a connection set of cardinality TV/2, that is a connection set in which all the ports are present. The number of distinct maximal connection sets is given by i= N /2 J I ( T V - 2 i + l ) = TV!/((TV/2)! 2 ^ 2) . *=1 The upper bound on the cardinality of a non-critical fault-set in a one-sided cross- point network is based on the two observations stated in the following theorem. T heorem 1 In any rearrangeable one-sided crosspoint matrix with T V ports and TV/2 buses, 1. Each o f the TV/2 buses has at most (TV/2 — 1) faulty crosspoints. 2. Every pair o f buses, taken together, has at most (T V — 3) faulty crosspoints. P roof. To prove the first part, consider a bus b that has exactly TV/2 faulty crosspoints. Then it is possible to partition the T V ports of the system into two 16 buses p o rts h Figure 2.2: Illustration for the proof of Theorem 1. equal subsets, one containing the N /2 ports that have fault-free crosspoints with the bus b and the second containing ports having no connectivity with b. Bus b cannot be used to connect any port in the first subset to any port in the second. This proves the first part of the theorem. To prove the second part, consider two buses b\ and b % having exactly (N /2 -f 1) fault-free crosspoints each. Now the set of N ports of the system can be partitioned into four subsets, as shown in Figure 2.2, depending on the existence of fault-free crosspoints on buses 61 and 62. In Figure 2.2, the subset S q contains the ports having no connectivity with either of the buses b\ and 62; Si includes all the ports th at have connectivity with bi but no connectivity with b % ] and so on. If |5;| denotes the cardinality of Si, we have |*$i| * F |-S's| = |£21 + |* 53 1 = N /2 + 1 (2-1) IS0I + I&I + IS2I + IS3I = N (2 -2 ) 17 Let |So| = k. From equations (2.1) and (2.2), l-Sil = l^ l = N /2 — k — 1] \S3\ = k + 2. We can construct a connection set C in which the N / 2 — k — 1 ports in Si are paired with the N /2 — k — 1 ports in S2, and the k ports in .So are paired with k ports from S3 . C has N /2 — 1 connections, none of which can be allocated to buses 61 or 62. Thus, the connection set C cannot be realized on the network. This concludes the proof of Theorem 1. ■ Theorem 1 is useful in obtaining an upper bound on the number of faulty crosspoints for rearrangeable operation in the following way: at most one bus can have (N /2 — 1) of the crosspoints faulty; each of the remaining buses can have no more than (N /2 — 2) faulty crosspoints. This yields an upper bound of N /2 (N /2 — 2) + 1 = N 2/ 4 — N + 1 faults for any rearrangeable N x N /2 crosspoint m atrix. Partial rearrangeable configurations satisfying the upper bound of N 2/4 — N + l faulty crosspoints have been shown to exist [34]. 
However, testing for rearrangeabil ity in the presence of arbitrary faults is time-consuming. We discuss rearrangeable operation of one-sided crosspoint networks in detail in Section 2.4. W hen faults are few in number, it may be possible to operate the network in nonblocking mode. We therefore consider the nonblocking mode of operation first. In the next section, we characterize nonblocking configurations of one-sided crosspoint networks. 2.3 N o n b lo ck in g O p eration In this section, we characterize critical and non-critical faults in an N x N /2 network with respect to the nonblocking mode of operation. We first show that a fault set is critical if (i) it contains more than N /2 — 1 crosspoints, or (ii) two of the crosspoints in the set belong to the same bus. We also show the sufficiency of these conditions by proving that any fault set of cardinality at most N /2 — 1 is non-critical if no two crosspoints in the set reside on the same bus. 18 To prove that a partial configuration is nonblocking, we need to show the exis tence of a bus-allocation algorithm that satisfies any arbitrary sequence of connect and disconnect requests. The algorithm responds to a connect request by allocat ing a bus (which is currently not in use) such that none of the faulty crosspoints lie at the intersection of the selected bus with the pair of ports to be connected. A disconnect request is processed by deallocating the bus used by the connection. If a sequence of such connect and disconnect requests can be constructed such that the bus allocation algorithm fails to satisfy a connection request, the network is blocking with respect to the bus allocation algorithm used. To show that the con figuration is blocking, we need to show that the sequence defeats any bus-allocation algorithm. The situation can be likened to a two-person game [48], where player A makes moves by removing existing connections and supplying new ones; player B is the bus-allocation algorithm. The network is nonblocking if player B can keep playing indefinitely by allocating a bus to every new connection supplied by A. However, if a winning strategy exists for A, that is a sequence of moves that causes B not being able to allocate a bus, the network is blocking. In general, it is difficult to construct such connection/disconnection sequences to defeat the bus-allocation algorithm. In many cases, simpler techniques can be used to prove that a configuration is blocking. For example, consider any max imal connection set C and let C = C\ U C2 be any non-trivial partition of C into two subsets, with \C\\ = k and \C2\ = N /2 — k. Let B\ = {61, 62? • • •» be the set of buses used by the allocation algorithm for connections in C\ and B 2 = {6fc+i, bk+2 , • • -, & ;v/2} be the buses allocated to C2 ■ The k buses in B\ and the 2 k ports in C % form a one-sided crosspoint m atrix. It is easy to see th at this m atrix must be nonblocking if the original configuration on N ports is nonblocking with respect to the bus-allocation algorithm used. Similarly, the buses in B 2 and the ports in C2 must form a nonblocking configuration with respect to the allo cation algorithm used. Further, if k = 2, the 4 x 2 configuration can sustain at most one faulty crosspoint to be nonblocking. 
Therefore, we can conclude that a configuration with N > 4 ports is blocking with respect to an allocation algorithm if a choice of two connections exist such that the four ports and the two buses assigned to them form a 4 x 2 m atrix with more than one faulty crosspoint in it. 19 p o r ts 0 - 1 - 2~ 3 - 4 “ 5 “ 6 “ 7 “ buses 1 2 3 Figure 2.3: A nonblocking configuration. We use this idea to establish an upper bound on the cardinality of a non-critical fault-set for nonblocking operation. T h eorem 2 Any nonblocking one-sided crosspoint matrix on N ports must satisfy the following: 1. Each bus contains at most one faulty crosspoint. 2. There is at least one bus with no faulty crosspoints. P roof. To prove the first part, assume that a bus b\ has two faulty crosspoints, say at (pi,&i) and (p2,&i)- Consider a bus-allocation for a maximal connection set C th at includes the connection {pi,p?}- Let b2 be the bus assigned by the allocation algorithm to the connection {piiP2} and let {p3,p4} be the connection assigned to the bus b\. Now ports Pi,P2 ,Pz,P4 and buses b\,b2 form a 4 x 2 m atrix with at least two faulty crosspoints. Since this is a blocking configuration, the larger configuration is also blocking. This proves the first part. The proof of the second part is by induction. The statem ent is trivially satisfied for N = 4, since no nonblocking configuration exists with two faulty crosspoints. Given that it is true for N — 2k we now show that it is true for N — 2k + 2 ports. Assume that the configuration on 2k -f 2 ports has exactly k + 1 faulty 20 crosspoints, one on each bus. We can find two ports, say p\ and P2 , with no faulty crosspoints and connect them. Let b\ be the bus assigned to this connection. We can now remove ports p\ and p2 and bus 61, leaving a smaller configuration with I 2k ports and k buses. Since each bus in this latter configuration has exactly one faulty crosspoint, it is blocking by induction hypothesis. Therefore, the initial configuration on 2k + 2 ports is blocking. This concludes the proof of Theorem 2.■ Since at least one bus needs all the N crosspoints to be fault-free and the remaining N /2 — 1 buses allow at most one fault each, Theorem 2 immediately establishes an upper bound of N /2 — 1 for the size of a non-critical fault-set. It is easy to see that non-critical fault-sets satisfying this bound exist. For example, if all the N /2 — 1 faulty crosspoints lie on a single port, the resulting configuration is nonblocking. Such a configuration for N = 8 is given in Figure 2.3, where all the faulty crosspoints are on port 0. This forces the allocation algorithm to use bus 0 for any connection involving port 0. However, when this connection and bus 0 are removed, the remaining configuration is a complete configuration on N — 2 ports. Thus, no blocking is introduced if bus 0 is always used to connect port 0. Although it is almost trivial to establish the nonblocking nature of the partial configuration in Figure 2.3, it is not so simple to establish the sufficiency of the two conditions in Theorem 2; that is to show that any fault set of cardinality at most N /2 — 1 is non-critical if no two crosspoints in the set reside on the same bus. In the rest of this section we prove this sufficiency for arbitrary distributions of faults by the use of a graph model to represent the allocation of connections in the m atrix. To show the nonblocking property of a partial configuration, we need to consider arbitrary sequences of connect and disconnect operations. 
Since we are interested in a worst-case situation, we assume that all the buses are in use at all times. This is achieved by introducing phantom connections for idle ports by choosing pairs of them at random. A bus is allocated for each phantom connection, though there is no physical connection between the ports involved. The connect and disconnect operations occurring in the switch can then be visualized as follows: W hen two ports are disconnected, the connection is still retained as a phantom connection on the same bus. Two possibilities can arise in the case of a connect operation (i) The 21 ports to be connected, say a and b, are currently paired as a phantom connection. The connect operation in this case results in no change of the connection set. (ii) Port a and b are not currently paired, which means there are two phantom connections of the form {a,c} and {b,d}. The connect operation in this case can be regarded as a pairwise exchange of two connections {a, e}, {6, d} resulting in new connections {a, 6}, {c, d}. In summary, a sequence of connect/disconnect operations can be modeled as a sequence of pairwise exchanges on the connection set. To prove that a configuration is nonblocking, it is sufficient to show the following two conditions. 1. There is some maximal connection set C which can be realized by the given configuration. 2. Any finite sequence of pairwise exchanges on C can be satisfied, that is the bus-allocation algorithm is able to allocate the connections after every pairwise-exchange operation. Since any connection set can be reached from any other connection set by a finite sequence of pairwise exchanges, the choice of the initial connection set is inconse quential. We now consider the partial configuration corresponding to a fault set F of cardinality at most N /2 — 1 such that no two faults are located on the same bus. Let C be a maximal connection set. We define an allocation A of a connection set C as an assignment of the buses to the connections in C such that no faulty crosspoints are used. Formally, we define A as a one-to-one function from C to the set of buses { 0 ,1 ,..., N f 2 — 1} such that if {pi,p'{} is the connection allocated to bus i, then both the crosspoints (pi,i) and (p[, i) are fault-free for every i in the range 0 < i < N /2 — 1. We use the following graph model to represent the allocation of the buses in the network to a connection set C. D efin itio n 1 The allocation graph G c,a corresponding to a maximal connection set C and an allocation A on the network is defined as follows: • G c ,a has N / 2 vertices, one corresponding to each bus. 22 b u s e s 0 1 2 3 4 5 p o r t s 0 1 2 3 4 5 6 7 8 9 10 11 N f \ f / k ....7 k ..... N f f / k \ ) J \ \ f f N \ f / k ... ^ ) / \ \ f \ f J k ( s / ( f k ) \ f ) / * K \ / \ k " " f 1 ) / N f / k 5 o 4 Figure 2.4: A maximal connection set and its allocation graph. • Let {pj,p'j} be the connection allocated to bus j by A. A directed edge exists from vertex i to j if and only if at least one of the crosspoints (pj,i), (p'j, i) is faulty. The following properties of G c,a become apparent: 1. The number of edges is equal to the size of the fault set F . T hat is, G c,a has at most N / 2 — 1 edges. 2. The out-degree of each node is at most one and the in-degree is at most the size of the fault set F. 3. If there are no cycles in the graph, then each component of G c,a is a weakly- connected directed acyclic graph (DAG). 
For example, Figure 2.4 shows the allocation of a connection set on a 12 x 6 m atrix and the corresponding allocation graph. The graph was constructed as follows: The ports connected by bus 0 are 9 and 10. Port 10 has no faulty crosspoints and port 9 has one faulty crosspoint at bus 3. An edge is therefore introduced from vertex 3 to 0; and so on. 23 The utility of the graph G c,a is in avoiding “unsafe” states of the m atrix, that is states that can be transformed into a blocking state by a finite sequence of pairwise exchanges. We now show that the existence of a cycle in G c ,a implies such an unsafe state. D efin ition 2 An allocation A of a connection set C is unsafe if the corresponding graph G c ,a has a directed cycle. L em m a 1 An unsafe allocation can be transformed to a blocking state by a finite sequence of connect and disconnect operations. P roof. Let b\ — > 62 — ► • • • — + — * W be a directed cycle in the allocation graph and let {pi,p'i} be the connection allocated to bus bi, 0 < i < k — 1. Then the k buses • • ■ , and the 2 k ports pi ,p\,P2 ,P2 i • • • iPk,Pk f°rm a 2k x k configuration with exactly one faulty crosspoint on each bus. Hence, by Theorem 2, this smaller configuration is blocking and can be transformed to a blocking state by a sequence of pairwise exchanges among the k connections realized by this smaller configuration. This proves the lemma. ■ For example, the allocation shown in Figure 2.4 is unsafe owing to the cycle containing buses 0, 1,2, 3. It is easy to observe that the buses 0, 1, 2, 3 and ports 1, 3, 5, 7, 8, 9, 10, 11 form an 8 x 4 configuration with exactly one faulty crosspoint on each bus. We can therefore transform this state into a blocking state by a finite sequence of pairwise exchanges, independent of the bus-allocation algorithm in use. The absence of cycles in G c,a for a particular connection set C is only a nec essary condition for nonblocking operation, but not a sufficient one. T hat is, even though the bus-allocation algorithm produces a safe allocation for a particular con nection set (7, a pairwise exchange of two connections in C may force the algorithm to resort to an unsafe allocation. We now show that such pairwise exchanges can always be accommodated without introducing cycles into the allocation graph in the case of the partial configurations under consideration. T h eorem 3 The partial configuration corresponding to a fault set F is nonblocking if (i) F contains at most one crosspoint from every bus and (ii) |F | < N /2 — 1. P roof. The theorem is proved in two parts. 24 1. There is an initial maximal connection set C and an allocation A such that G c ,a has no cycles. 2. Given an allocation graph G c,a without cycles, any pairwise exchange of two connections in C can be accommodated without introducing cycles. To prove the first part we show how to construct a connection set that can be allocated without cycles. We first re-number the ports and buses such that all crosspoints on bus 0 are fault-free and all crosspoints (i , j ) , j < i < N — 1, on bus j are fault-free, for 1 < j < N /2 — 1. This is always possible if conditions (i) and (ii) of Theorem 3 are satisfied. We can now construct the initial connection set C as C = {{i,i + N / 2 }| 0 < * < N / 2 - 1 } . No cycles are created in the allocation graph if the connection {i, i -f N /2} is allocated to bus i for 0 < i < N /2 — 1. This proves the first part. To prove the second part, consider a connection graph G c,a containing no cycles. 
Let i and j be any two vertices in G_{C,A}. Let {p_i, p_i'} be the connection allocated to bus i and {p_j, p_j'} be the connection allocated to bus j. A pairwise exchange of the two connections produces the new connections {p_i, p_j} and {p_i', p_j'}. We need to show that the new connections can be assigned to buses i and j without introducing cycles in the allocation graph. This allocation is achieved as follows:

Case 1: There is no path from i to j or from j to i in the original graph. In this case one of the buses can be chosen arbitrarily for the connection {p_i, p_j} and the other for {p_i', p_j'}. We need to show that no cycles are introduced in the allocation graph. Note that the pairwise exchange operation affects only the incident edges of the two vertices i and j; that is, an edge incident to vertex i in the original graph might be incident to vertex j in the new graph, and/or vice-versa. Any cycle introduced by the exchange operation therefore would have to contain both the vertices i and j, since no path existed between them in the original graph. This is possible only if either a cycle containing vertex i or one containing vertex j existed in the original graph, a contradiction.

Case 2: One of the vertices i, j is accessible from the other. Without loss of generality we can assume a path from i to j, and let m be the immediate predecessor of j on this path. Since the edge from m to j exists, one of the crosspoints (p_j, m), (p_j', m) is faulty. We assign bus j to the new connection {p_i, p_j} if (p_j, m) is the faulty crosspoint, and to {p_i', p_j'} if (p_j', m) is the faulty crosspoint. Bus i is then allocated to the remaining connection. Note that this assignment maintains the edge from m to j which is part of the above path. Any new cycle introduced by the above allocation would have to contain either vertex i or vertex j. A cycle containing vertex j is possible in the new graph only if a path existed from j to i in the original graph. The latter implies the existence of a cycle in the original graph, a contradiction. A cycle containing vertex i can occur only if the edge from vertex m to j in the original graph moved to vertex i as a result of the new allocation. This is prevented by the above assignment of buses. This concludes the proof of Theorem 3. ■

[Figure 2.5: (a) A maximal connection set. (b) Its allocation graph. (c) The allocation graph after the pairwise exchange of connections {0,2} and {1,3}. Only the active and the faulty crosspoints are shown.]

For example, Figure 2.5(a) shows the allocation of a connection set {{0,2}, {1,3}, {4,6}, {5,7}, {8,11}, {9,10}} on a 12 × 6 matrix with a fault set F = {(3,0), (5,1), (8,2), (9,3), (7,5)}. The corresponding allocation graph is shown in Figure 2.5(b). Consider a pairwise exchange of the connections {0,2} and {1,3}, allocated to buses 5 and 1, respectively. The new connections are {0,1} and {2,3}. Since there is no path between vertices 1 and 5 in Figure 2.5(b), this corresponds to Case 1. Hence, the new connections can be allocated arbitrarily. If we allocate the connection {0,1} to bus 1 and {2,3} to bus 5, the resulting allocation graph is as shown in Figure 2.5(c) and is acyclic. It is easy to verify that the alternate choice also introduces no cycles.
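The case analysis above translates directly into a reallocation rule; Algorithm Allocate-Nonblocking, given after the next example, formalizes the same steps. The sketch below is illustrative only: `succ` is the allocation graph in the representation assumed in the earlier sketch, it must be consistent with `allocation` and `faults`, and all names are hypothetical.

```python
def path_to(succ, src, dst):
    """Return the vertex sequence from src to dst following the (unique)
    outgoing edges, or None if dst is not reachable."""
    path, v, seen = [src], src, {src}
    while succ.get(v) is not None and succ[v] not in seen:
        v = succ[v]
        path.append(v)
        seen.add(v)
        if v == dst:
            return path
    return None

def exchange(allocation, faults, succ, i, j):
    """Reassign buses i and j after the pairwise exchange
    {p_i,p_i'},{p_j,p_j'} -> {p_i,p_j},{p_i',p_j'} without creating a cycle."""
    pi, pi2 = allocation[i]
    pj, pj2 = allocation[j]
    first, second = (pi, pj), (pi2, pj2)        # the two new connections
    path = path_to(succ, i, j) or path_to(succ, j, i)
    if path is None:
        # Case 1: no path in either direction, so either assignment is safe.
        allocation[i], allocation[j] = first, second
    else:
        # Case 2: let b be the endpoint of the path and m its predecessor.
        # Keep the port whose crosspoint on bus m is faulty on bus b, so the
        # edge m -> b is preserved and cannot migrate to the other bus.
        b, m = path[-1], path[-2]
        pb, pb2 = allocation[b]
        faulty_port = pb if (pb, m) in faults else pb2
        keep = first if faulty_port in first else second
        other_bus = i if b == j else j
        allocation[b] = keep
        allocation[other_bus] = second if keep is first else first
    # The caller then rebuilds (or incrementally updates) the allocation graph,
    # which corresponds to the final step of Algorithm Allocate-Nonblocking.
```

A second worked example for the same configuration continues below.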
Now consider, instead, a pairwise exchange of the connections {4,6} and {8,11}, allocated to buses 0 and 3, respectively. The new connections axe {4,8} and {6,11}. This corresponds to case (ii) because of the existence of a path from vertex 0 to 3 in Figure 2.5(b). If we allocate the connection {4,8} to bus 3 and {6,11} to bus 0, the allocation graph remains unchanged. However, the alternate choice introduces a cycle containing vertices 0, 1 and 2. The proof of Theorem 3 also provides us with a bus-allocation algorithm for a partial nonblocking configuration. Any new request for connection is processed according to the two cases described in the proof of Theorem 3. The bus-allocation algorithm maintains the allocation graph corresponding to the current state of the network and uses it to make the selection. As described earlier, we assume th at all the ports remain connected at all times, using phantom connections for idle ports. This simplifies the description of the algorithm. Any new connection request can i then be modeled as a pairwise exchange of ports in two connections. Let us assume th at the pairwise exchange involves the connections {pi,p'i} currently allocated to bus i, and { p 3 , p - } currently allocated to bus j . The new connections are {pi,Pj} and {p'i,Pj}- The bus allocation algorithm can be described as follows. Algorithm Allocate-Nonblocking 1. Check if there is a path from vertex i to vertex j in G c ,a - If so, find the vertex m immediately before j along the path. Go to step 4. 2. Check if there is a path from vertex j to vertex i in G c,a • If so, find the vertex m immediately before i along the path. Go to step 4. 3. Go to step 5 (no path exists between i and j.). 27 4. If the crosspoint (p4 , m) or (p j , m ) is faulty, assign bus j to the connection {PiiPj} and bus i to {p'i,p'j}. Go to step 6. 5. Assign bus i to the connection {pi,Pj} and bus j to {p'i,Pj}- 6. Update the allocation graph G c ,a and exit. Step 1 of the algorithm tests for the existence of a path from vertex i to vertex j in G c ,a • Note that a non-critical fault set limits the number of outgoing edges from a vertex to one. Therefore, the existence of the path can be checked by starting from i and following the outgoing edge, if any, of any vertex reached during the search. The search is term inated when the vertex j is reached, or when a vertex of out-degree zero is reached. This takes at most 0 ( N ) time. The second step checks for the existence of a path from j to i in an identical manner. Therefore, the worst-case running time of the algorithm is O(N). 2.4 R earran geab le O p eration In the previous section we showed that the number of crosspoint faults is limited to N /2 — 1 if nonblocking operation is desired. If more faults are present, possibly because of manufacturing defects in the crosspoint chips, it may be still possible to operate the m atrix in the rearrangeable mode of operation. In this section we study the effect of crosspoint faults with respect to the rearrangeable mode of operation. In rearrangeable crosspoint networks, one can relocate existing connections to accommodate new ones. A network is said to be rearrangeable if a connection request between two ports which are currently idle can be satisfied with a finite number of rearrangements. A rearrangement in our case refers to the movement of an active connection from its allocated bus to a bus that is currently idle. 
The crosspoints on the new bus can be activated before the connection is deallocated from the current bus, thus avoiding an interruption of service to the connected ports. Note that this is different from exchanging the buses used by two active connections, where a break in service cannot be avoided without the use of a spare bus. If every new connection between a pair of idle ports in the crosspoint network can be satisfied using rearrangement of at most k of the existing connections, we term the network a k-rearrangeable crosspoint switching network.

In this section, we first consider the general problem of determining if a given set of connections is realizable on a one-sided crosspoint network under an arbitrary fault set. We show a formulation of the problem, due to Mitchell and Wild [34], as a matching problem in a bipartite graph. We then show much simpler means to determine the rearrangeability of the network when the distribution of faults has special properties. We study two specific distributions of faults, namely rectangular and trapezoidal, and characterize the conditions for rearrangeability in these cases. The rectangular configuration allows faults to be confined within non-overlapping rectangular regions, while the trapezoidal configuration confines them to a single trapezoidal region.

2.4.1 Formulation of the general problem as a matching problem

In this section we consider the general problem of determining if a given connection set can be realized by a one-sided crosspoint network under an arbitrary fault set F. Mitchell and Wild [34] characterized this problem as a matching problem in a bipartite graph. The discussion in this section is based on their work.

Let C = {c_i = {p_2i, p_2i+1} | 0 ≤ i < N/2} be a connection set to be realized on a faulty crosspoint matrix in the presence of a fault set F. We are interested in determining whether the connection set C can be realized by the network with fault set F. The following graph model is used to characterize the problem.

Definition 3 Let G_{C,F} = (X, Y, E) be the bipartite graph corresponding to connection set C to be realized in the presence of fault set F. Here,

1. X = C,

2. Y = B = {0, 1, ..., N/2 − 1}, and

3. E = {(c_i, b) | (p_2i, b) ∉ F and (p_2i+1, b) ∉ F, 0 ≤ i < N/2, b ∈ B}.

That is, vertices in X correspond to the connections in C, vertices in Y correspond to the buses in the network, and an edge exists from a connection c_i = {p_2i, p_2i+1} to a bus b if and only if both the ports involved in the connection, p_2i and p_2i+1, have non-faulty crosspoints with bus b.

[Figure 2.6: (a) A faulty crossbar. (b) The graph G_{C,F} for the connection set {{0,5}, {1,7}, {2,3}, {4,11}, {6,8}, {9,10}}.]

The graph model is illustrated in Figure 2.6. Figure 2.6(a) shows a 12 × 6 crosspoint network with faulty crosspoints represented by circles. Figure 2.6(b) shows the bipartite graph corresponding to the connection set C = {{0,5}, {1,7}, {2,3}, {4,11}, {6,8}, {9,10}}. In this graph, there is an edge between the vertex representing {6,8} and the vertex representing bus 3 because both crosspoints (6,3) and (8,3) are fault-free. Therefore, the connection {6,8} can be assigned to bus 3 in the faulty network. This is the only bus that can be allocated to the connection {6,8} because every other bus has a faulty crosspoint either with port 6 or with port 8.
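As a concrete illustration of this formulation, the sketch below builds G_{C,F} and tests realizability by computing a maximum matching with a simple augmenting-path search; the faster matching algorithm cited in the following discussion improves the running time. The sketch is not from the dissertation, and the data layout and names are assumptions.

```python
# Assumed representation: connections is a list of 2-tuples of ports,
# faults is a set of (port, bus) pairs, and there are n_buses buses.

def build_gcf(connections, faults, n_buses):
    """Adjacency of G_{C,F}: connection index -> list of usable buses."""
    return [
        [b for b in range(n_buses)
         if (p, b) not in faults and (q, b) not in faults]
        for (p, q) in connections
    ]

def realizable(connections, faults, n_buses):
    """True iff a matching of cardinality |C| exists, i.e., every connection
    can be assigned its own fault-free bus."""
    adj = build_gcf(connections, faults, n_buses)
    bus_of = {}                      # bus -> connection index currently matched to it

    def augment(c, visited):
        for b in adj[c]:
            if b in visited:
                continue
            visited.add(b)
            if b not in bus_of or augment(bus_of[b], visited):
                bus_of[b] = c
                return True
        return False

    return all(augment(c, set()) for c in range(len(connections)))
```

For the instance of Figure 2.6, a matching of cardinality 6 exists, so this test would report the connection set as realizable, in agreement with the discussion of Figure 2.6(b) that follows.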
The graph model described above displays all the possible ways the buses in the network can be assigned to the individual connections in a given connection set. Therefore, if a maximum matching of cardinality |C| can be found, then the connection set C can be realized on the faulty crossbar. Conversely, if no matching of cardinality |C| exists in the bipartite graph G_{C,F}, then the connection set C cannot be realized on the faulty crossbar. Further, if M = {(c_i, b_i) | 0 ≤ i < N/2} is a maximum matching in G_{C,F}, then bus b_i can be used to realize the connection c_i, for all c_i ∈ C. For example, consider the crosspoint network of Figure 2.6. The bipartite graph of Figure 2.6(b) has a maximum matching of cardinality 6, as shown by the dotted lines. The corresponding connection set is therefore realizable in the presence of the fault set shown.

By the use of an algorithm due to Hopcroft and Karp [21], a maximum matching for the above graph model, if it exists, can be found in O(N^2.5) time. Therefore, realizability of a given connection set on the network can be determined in polynomial time. However, with an arbitrary fault set, determination of the rearrangeability can be time-consuming. Deciding the rearrangeability using the above graph model requires testing every maximal connection set. However, in practice, it is highly unlikely to have a large number of faulty crosspoints distributed arbitrarily in the network. In practice, the faults are more likely to be confined to one or more small regions in the crosspoint matrix. This is particularly true of manufacturing defects. With this motivation, we study some special distributions of faults in the subsequent sections. With these fault distributions, realizing arbitrary connection sets and establishing a new connection between a pair of idle ports require considerably less time.

2.4.2 Rectangular Configurations

In this section, we study a special class of fault distributions where the faulty crosspoints are confined to disjoint rectangular regions. This is a realistic fault distribution for representing manufacturing defects in VLSI and WSI. It is also useful in modeling operational faults when the crosspoint matrix is implemented using multiple chips. The failure of an entire row or column in a chip, or the failure of an entire chip, can be analyzed using this distribution.

[Figure 2.7: A rectangular configuration.]

We first consider the case of a single rectangular faulty region and later generalize to multiple rectangular regions which are bus- and port-disjoint. The simplest case of a rectangular configuration is where all the faulty crosspoints lie within a rectangular region spanned by a set of x ports and a set of y buses. Such an arrangement is obtained by logically renumbering the ports and buses so that all the faulty crosspoints lie within a rectangular region. For example, Figure 2.7 shows a 12 × 6 crossbar with a fault set F. Here the faults are confined to a rectangular region formed by x = 4 ports and y = 3 buses.

In the general form of the rectangular configuration, the faults are confined to m rectangular regions, each rectangular region being bus- and port-disjoint with every other region. Figure 2.8 illustrates the general case. Region i is formed by a set of buses B_i ⊆ B and a set of ports P_i ⊆ P.
We use x_i to denote the cardinality of P_i and y_i for that of B_i.

[Figure 2.8: A general rectangular configuration with multiple faulty regions.]

We can formally define a rectangular configuration as follows:

Definition 4 A rectangular configuration corresponding to fault set F is defined by a set of m ordered pairs {(P_0, B_0), (P_1, B_1), ..., (P_{m-1}, B_{m-1})} where P_i ⊆ P is a subset of the ports and B_i ⊆ B is a subset of the buses. The P_i's and B_i's should satisfy the following conditions.

1. P_i ∩ P_j = B_i ∩ B_j = ∅, for all 0 ≤ i, j < m with i ≠ j.

2. For every (p, b) ∈ F, there exists some i, 0 ≤ i < m, such that p ∈ P_i and b ∈ B_i.

The first condition specifies that the regions be port- and bus-disjoint. The second specifies that every faulty crosspoint must belong to one of the regions.

The partial configuration defined by Definition 4 is not unique for a given fault set F. That is, there may be many ways to enclose the faulty crosspoints with rectangular regions. For example, we may enclose a given fault set within a single large rectangular region or in many smaller disjoint rectangular regions. Because the purpose of using a rectangular configuration to describe the faults is to establish rearrangeability of the network by testing some sufficient conditions, we should use the configuration that is most likely to satisfy the test for rearrangeability. Therefore, we choose the configuration to satisfy the following two properties.

1. Every rectangle is minimal, that is, there is no smaller rectangle enclosing the faulty crosspoints within it.

2. There is no way to enclose the faulty crosspoints within a particular rectangle with two smaller non-overlapping rectangles.

We define such a configuration as a minimal rectangular configuration. Because of our assumption that all crosspoints within the rectangular regions are unusable, it is easy to see that a configuration that minimizes the number of fault-free crosspoints within its rectangular regions is more likely to satisfy a rearrangeability test. The above definition of minimality therefore minimizes the fault-free crosspoints within the rectangular regions by (i) minimizing the size of the individual rectangles, and (ii) decomposing a single rectangle into two smaller non-overlapping rectangles whenever possible. A more formal definition of minimal rectangular configurations, as well as an algorithm to construct such a configuration corresponding to a given fault set, are given in Appendix A.

The rectangular configuration is capable of modeling many multiple faults exactly. For example, a set of N/2 faults located on a diagonal of the crosspoint matrix can be modeled exactly as a rectangular configuration with N/2 disjoint regions, each of size 1 × 1. Rearrangeability of the network in the presence of such a fault set can be established by the use of the general results for rectangular configurations presented below.

Given a rectangular configuration, we now formulate sufficient conditions for rearrangeability of the crosspoint matrix. We first consider the case of a single rectangle P_0 × B_0 enclosing the faulty crosspoints. The following theorem states sufficient conditions for rearrangeability of such a configuration.

Theorem 4 A rectangular configuration {(P_0, B_0)} is rearrangeable if |P_0| + |B_0| ≤ N/2. Further, at most one rearrangement is sufficient for any new connection if this condition is satisfied.

Proof.
Let x denote the cardinality |P0| of Po and y that of B 0. Assume th at a new connection between ports p and q is to be established on the faulty crossbar and that there is at least one bus that is available. If both p and q belong to the set P — P0, there are no faulty crosspoints on either p or q , and the connection {p, q} can be realized without any rearrangements. Similarly, if there is an unallocated bus in the set P —Po, it can be used to make the connection. Now assume, without loss of generality, that p € P0 and all the available buses are in Bq. Since p € Po, the num ber of active ports in Po is at most x — 1. These ports can together use at most x — 1 buses. There are N /2 — y buses in the set B — B q, of which at most x — 1 are allocated to ports in Po; since N /2 — y > x — 1 by hypothesis, there is a bus r in the set B — Bo that is currently connecting two ports w, z € P — Pq. We can move the connection {w ,z} to any available bus in set Bo and use bus r to connect the ports p and q. This concludes the proof of the theorem. ■ The condition in Theorem 4 is a sufficient condition for rearrangeability, but is also necessary if all the crosspoints within the rectangle are faulty. The necessity in this case can be shown by considering a connection set of x connections obtained by pairing ports in Po with ports in P — Po. These connections require x buses in the set B — Bo. Therefore, all of them cannot be allocated if x > N /2 — y. We now consider the general rectangular configuration with m ultiple rectan gular faulty regions. As in Definition 4, let {(Po, Po), (Pi, Pi), • • •, (Pm -i, Pm-i)} denote a rectangular configuration with m faulty regions. The following theorem states and proves our main result for this fault model. 35 T h e o re m 5 A general rectangular configuration {(Po, Po), (Pi, Pi), • • •, (Pm_ i, P m_i)}, is rearrangeable if (i) |Pi| + |P i| < N f 2, for every 0 < i < m; and (ii) m in(|Pj|, |P j|) + |P*| + \Bj\ < N f 2, for every 0 < i ,j < m. Further, a maximum of two rearrangements are sufficient to realize any new con nection if these conditions are satisfied. P ro o f. Let p and q be the ports to be connected. We consider the following cases. 1. p, q £ P0 U Pi U ... Pm_i. 2. p,q € Pi, for some 0 < i < m. 3. p € Pi for some 0 < i < m and q (f Po U P\ U ... Pm-i • 4. p G Pi and q € Pj, for some 0 < i ,j < m with i ^ j. The first case is trivial because any available bus can be used to make the connec tion. In the second and third cases, any bus in the set P — P t - can be used to make the connection {p,q}', by Theorem 4, at most one rearrangement is sufficient. Only Case 4 remains. In this case we assume, without loss of generality, that ]P4 -| < |P-1. If there is a bus available in the set P — (P 8 - U Bj), it can be allocated to the connection {p, q} and no rearrangements are needed. If no such bus is available, th at is if all the available buses are in the set P 8 U Bj, we consider the following two subcases. Subcase (a): There is an available bus in P*. Observe that there can be at most |jPi| — 1 active ports in the set Pi, disregarding port p. Since N / 2 — (|P»| + |P y|) > |Pj| — 1, there is a bus in B — (Bi U Bj), say r, currently connecting two ports u ,v ^ Pj. We can move the connection {u, v) to any available bus in P 4 and use bus r for the connection {p, q}. Thus, only one rearrangement is required to realize the connection {p, q}. Subcase (b): The only available buses are in Bj. 
If |P_j| = |P_i|, the proof is similar to that given for Subcase (a). Now assume |P_j| > |P_i|. If there is a bus in B − (B_i ∪ B_j), say r, currently being used for connecting two ports u, v ∉ P_j, we can move the connection {u, v} to any available bus in B_j and utilize bus r for the connection {p, q}. This involves one rearrangement. Otherwise, every bus in B − (B_i ∪ B_j) is being used to connect a port in P_j with another port. Since |P_j| − 1 < N/2 − |B_j|, there is a bus s ∈ B_i connecting two ports u, v ∉ P_i ∪ P_j. This is illustrated in Figure 2.9. Move the connection {u, v} to any available bus in B_j, thus freeing bus s. Now consider any bus r ∈ B − (B_i ∪ B_j) such that r connects a port w ∈ P_j with another port z ∉ P_i. At least one such bus r exists because |P_j| > |P_i|. Now move the connection {w, z} from bus r to bus s and realize the connection {p, q} on bus r. Thus, a maximum of two rearrangements are needed in this case. This concludes the proof of Theorem 5. ■

[Figure 2.9: Illustration of the rearrangements in Case 4(b) of Theorem 5.]

[Figure 2.10: A rectangular configuration with 25 percent faulty crosspoints.]

As in the case of Theorem 4, the conditions in Theorem 5 are also necessary conditions for rearrangeability if all the crosspoints within the rectangular regions are faulty. Note that the only case where we needed two rearrangements was in Case 4(b) with |P_j| > |P_i|. Therefore, if the second condition of Theorem 5 has max(|P_i|, |P_j|) instead of min(|P_i|, |P_j|), then at most one rearrangement is sufficient. This gives the following corollary.

Corollary 1 A rectangular configuration {(P_0, B_0), (P_1, B_1), ..., (P_{m-1}, B_{m-1})} is rearrangeable with at most one rearrangement per connection request if

(i) |P_i| + |B_i| ≤ N/2, for every 0 ≤ i < m; and

(ii) max(|P_i|, |P_j|) + |B_i| + |B_j| ≤ N/2, for every 0 ≤ i, j < m with i ≠ j.

These conditions are stricter than those of Theorem 5, but may be satisfiable by many partial configurations. When the rectangular regions have the same number of ports, the two sets of conditions become identical.

The proofs of theorems 4 and 5 not only establish the rearrangeability of a faulty network, but also give a procedure for rearranging connections to satisfy a new connection request. This procedure can be easily formulated as an algorithm and implemented in the switch controller. The conditions in Theorem 5 allow a large number of faults to be tolerated without losing rearrangeability. In addition, the number of rearrangements required is also modest: at most two per new connection, and much less on the average. For example, Figure 2.10 shows a one-sided crosspoint matrix with four disjoint rectangular faulty regions, each of size N/4 × N/8. By Corollary 1, this network is rearrangeable with a maximum of one rearrangement. The total number of crosspoints in the faulty regions is N²/8, or 25 percent of the total crosspoints. Thus, the rearrangeability of the network can be established in a large number of cases using the rectangular model described.

2.4.3 k-Rearrangeable Trapezoidal Configuration

One particular partial configuration for one-sided crosspoint networks that has the property of rearrangeability is the trapezoidal configuration illustrated in Figure 2.11.
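Before turning to the trapezoidal configuration in detail, note that the rearrangeability tests of Theorem 5 and Corollary 1 reduce to simple arithmetic on the region sizes. The sketch below is illustrative only; the list-of-regions representation and the function names are assumptions, and the number of buses is taken to be N/2 as in the one-sided crossbar.

```python
# regions: a list of (ports, buses) pairs giving the sets P_i and B_i of a
# rectangular configuration; n_ports is N, so there are N/2 buses.

def _pairs(regions, half, combine):
    return all(
        combine(len(Pi), len(Pj)) + len(Bi) + len(Bj) <= half
        for a, (Pi, Bi) in enumerate(regions)
        for b, (Pj, Bj) in enumerate(regions) if a != b
    )

def rearrangeable(regions, n_ports):
    """Sufficient conditions of Theorem 5 (at most two rearrangements)."""
    half = n_ports // 2
    return (all(len(P) + len(B) <= half for P, B in regions)
            and _pairs(regions, half, min))

def rearrangeable_one_move(regions, n_ports):
    """Stricter conditions of Corollary 1 (at most one rearrangement)."""
    half = n_ports // 2
    return (all(len(P) + len(B) <= half for P, B in regions)
            and _pairs(regions, half, max))
```

For the configuration of Figure 2.10, with four regions of N/4 ports and N/8 buses each, both tests succeed, matching the discussion above.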
A fc-rearrangeable trapezoidal configuration, with 0 < k < N /2 —2, allows a maximum of (k + l)N /2 — (k + 1 )(k + 2)/2 faulty crosspoints, approxim ately 25 percent of the total number of crosspoints for large N with k = N /2 — 2. Although this configuration requires more worst-case rearrangements than the rectangular configuration, it has the advantage of a simple bus allocation algorithm for setting up connections. A one-sided crosspoint m atrix with some faulty crosspoints is defined as a k- rearrangeable trapezoidal configuration if the ports and buses can be re-numbered to obtain the following arrangement: For a port i, 0 < i < k, and bus j, 0 < j < 39 b u ses 0 1 2 3 4 5 p o r ts 0 1 2 3 4 5 6 7 8 9 10 11 Figure 2.11: A trapezoidal configuration for N = 12. AT/2 — 1, a crosspoint (i , j ) is faulty only if i < j. All crosspoints (i,j) with i > j or i > k are fault-free. In other words, 1. Each of the ports k + l ,k + 2,.. . , N — 1 is connected to every bus. 2. The faulty crosspoints of port i, 0 < i < k, lie on buses i + 1 through N /2 — 1. Thus the maximum number of faulty crosspoints tolerated by this configuration equals (N /2 - 1) + (N /2 - 2) + ... + (N /2 - (k + 1)) = (k + l)N /2 - (k + 1)(* + 2)/2. This configuration places all the faulty crosspoints logically in a trapezoidal region on the upper right corner of the switch. The 2-rearrangeable trapezoidal configu ration for T V = 12 is shown in Figure 2.11. When no connections exist already in the m atrix, an ordered connection set C can be routed on this m atrix by a simple bus-allocation algorithm as follows. Algorithm Allocate-Trapeze 1. w h ile \C\ > 0 do (for every pair to be connected) 40 buses 0 1 2 3 4 5 p o r ts 0 — 1 2 3 4 5 6 7 8 9 10 11 Figure 2.12: Example for illustration of algorithm Rearrange. Only the active and the faulty crosspoints are shown. 2. find < p,p > such that p = m ax{a| < a,a > € C}. 3. find max j, p > j > 0, such that bus j is unallocated. 4. allocate bus j to the connection < p,p >. 5. C * — C — {< p,p >}. (remove < p,p > from C .) 6. end Given any connection set, the bus-allocation algorithm processes each connec tion, choosing the highest-numbered ports first, and assigns them to buses sequen tially, starting from the last bus. This algorithm works because it allocates a bus i to a connection p,p such that i < m in(p,p ). This guarantees that none of the faulty crosspoints will be needed to make a connection. It also forces port 0 to use bus 0 always. For example, consider the maximal connection set {{0,1}, {2,10}, {3,7}, {4,5}, {6, 9}, {8,11}} to be realized on the m atrix in Fig ure 2.11. The algorithm allocates bus 5 for the connection {8,11}, bus 4 for {6,9}, 41 r ' ; bus 3 for {4,5}, bus 2 for {3,7}, bus 1 for {2,10}, and bus 0 for {0,1}. None of the faulty crosspoints are needed to complete these connections. W hen the crosspoint m atrix contains some active connections it is not always possible to find an unallocated bus to connect a pair of idle ports without rearrang- j ing some of the existing connections. Now we present an algorithm for allocating a bus to a new connection when some connections are already present. It should b e noted th at no rearrangements are needed if the new connection involves port J i 0, because the algorithm never allocates bus 0 to a connection not involving port ! ; i ,0. Therefore we will consider only connections not involving port 0. 
To allocate a 1 jbus for an ordered connection < p,p > , we scan the buses sequentially, starting j from bus 1, and find the first unallocated bus j. If j < p or if p > k, then both brosspoints (p,j) and (p , j ) are fault-free; in this case bus j is allocated to the con nection < p , p > and no rearrangements are needed. However, if j > p and p < k, a bus cannot be allocated without rearranging some of the connections. Since all 1 j the buses in the set { A r + l,fc + 2 ,. . . , N /2 — 1} are similar in term s of the location j of faulty crosspoints, in this case we can assume, without loss of generality, that 1 j = k + 1. The algorithm constrains the rearrangements to the set of connections ! assigned to buses p , p + 1 , . . . , j — 1. It should be noted that only bus p in this | range is useful for connecting ports p and p because of the faulty crosspoints on | the remaining buses. Therefore, the final aim of the algorithm is to free up bus p by \ moving the connection currently allocated to bus p to a different bus b in the range j p - f 1 < b < j. In the general case this requires a sequence of rearrangements, each j rearrangem ent involving the movement of a connection from a lower-numbered bus , 'to a higher-numbered bus. ! i The rearrangements proceed as follows: Let < q>, cb > represent the ordered connection realized by bus b, for every p < b < j. Starting from bus p, we scan the buses in sequence till we find a bus I with c/ > j. It is easy to show th at such a bus i / exists within the range p < b < j . Consider the set of ports {x\p < x < j } . There j are exactly (j — p) ports in this set. Since port p is inactive, the num ber of active j ports in this set is at most j — p —1. These active ports together can occupy at most | j — p — 1 of the j — p buses in the range p < b < j. Thus, at least one bus I must | I be allocated to a connection {q,c,} such that c'i > ci > j . This connection {c/, c,} i ! i I I ! __________ ______________________________________________________ 42—i buses 12 3 4 p o r ts 0 1 2 3 4 5 6 7 8 9 10 11 c ^ ( \ t ^ r J \ ( ) \ \ r ) \ \ f ) W) ' ) L ( J \ \ ( ) K \ t ) \ ) V ( ) V . c s ) \ (\ 7 7 Figure 2.13: The triangular configuration for N — 12. lean be moved to bus j since both the crosspoints (ci ,j ) and (c[,j) are present. Bus I is now free and can be used for a new connection. If p — I, the connection < p,p > can be allocated to bus I and the procedure is complete. Otherwise, we can repeat the procedure for rearrangement in the smaller range p < b < I, and find a connection to transfer to bus I. Thus, in a maximum of j — p steps, we are guaranteed to free bus p. The maximum number of connections moved is thus \j — p. Since the maximum possible value of j is k + 1 and the minimum possible lvalue of p is 1, k rearrangements axe needed in the worst case. The algorithm can be stated formally as follows. Algorithm Rearrange (Allocates a bus for an ordered connection < p,p >.) (< Cf>, c 'b > denotes the connection carried by bus 6.) 1 1. Find min j such that bus j is unallocated. 2. If j < p or p > k, allocate bus j for < p,p > and exit. I 3. Find minimum /, p < I < j , such that c/ > j. j 4. Move < ci,ct > to bus j. I 5. j * — L Go to 2. ! I An example is shown in Figure 2.12 to illustrate this algorithm with respect to i the 2-rearrangeable trapezoidal configuration for N = 12. For convenience, only The active and the faulty crosspoints are shown. 
In Figure 2.12, bus 0 is used for the connection {0,5}, bus 1 for {2,3}, bus 2 for {4,10}, bus 3 for {6,9}, bus 4 ,for {7,11} and bus 5 is free. A new connection {1,8} cannot be allocated to bus ,5 because of the faulty crosspoint at (1,5). This connection can be realized only > through either bus 0 or bus 1. When Algorithm Rearrange is applied, it first moves j ithe connection {4,10} from bus 2 to bus 5 and {2,3} from bus 1 to bus 2, thereby : I . i freeing bus 1 for the new connection. The number of rearrangements is two. . It is interesting to compare the trapezoidal configuration with the nonblocking ; I crossbar in Figure 2.3. The fc-rearrangeable trapezoidal configuration has faulty crosspoints on the first k ports and needs k rearrangements in the worst case. If we restrict the faults to port 0, the resulting configuration becomes nonblocking. From this observation, we note that there is a tradeoff between the size of the fault-set and the maximum number of rearrangements needed to accommodate a j new connection. The maximum number of rearrangements in a trapezoidal config- j * uration, which equals the param eter k, can be N /2 — 2. W hen k equals N /2 — 2 . 'all the faulty crosspoints are placed logically in a triangular region on the upper right corner of the switch. Hence, the resulting configuration can be called a trian- 1 gular configuration. The maximum number of faulty crosspoints in the triangular | configuration is N /4(N /2 — 1), which, for large N , approaches 25 percent of the 'to tal num ber of crosspoints. A triangular configuration for N = 12 is shown in I ^ f Figure 2.13. This configuration has faulty crosspoints on the first five ports, and ! the maximum number of rearrangements needed is four as compared to two for the ’trapezoidal configuration shown in Figure 2.11. 1 2.5 E x te n sio n to M u ltip le-B u s S y ste m s As observed in Section 2.1, a one-sided crossbar can be viewed as a multiple-bus j : interconnection network with N ports and M buses. The results in sections 2.2-2.4 ! i I ____________________ _ _ _ _ _ _ _ _ _ _ _ _ 44 j therefore apply equally well to multiple-bus systems with exactly TV/2 buses. In ~ J 1 this section, we generalize the results to multiple-bus systems with 1 < M < N /2 , abuses and analyze their fault-tolerance capability. j ! We call a multiple-bus system with N ports and M buses as an (TV, M ) con figuration. A crosspoint {i,j) in this case models the connectivity between port i and bus j. A faulty crosspoint corresponds to a faulty interface between the cor responding port and bus. A complete (TV, M) configuration is one in which every jbus is connected to every port. A partial configuration results if some port cannot use one or more buses. The maximum number of pairwise connections that can be supported by an (TV, M) multiple-bus configuration is M. Thus, any (TV, M) configuration is a blocking configuration if M < N/2. Therefore, for the following analysis of (TV, M ) configurations, we limit the size of every connection-set to M . W ith this constraint 1 jon the size of a connection set, we can define two modes of operation for an (TV, M ) m ultiple-bus configuration. In the nonblocking mode, a bus allocation algorithm , exists to connect idle pairs of ports in any sequence so long as the total num ber 'of active connections does not exceed M . We first show th at no faults can be tolerated in this mode if M < N/2. 
!T h eorem 6 Any partial (TV, M ) multiple-bus configuration with M < N /2 is a I blocking configuration even if the size of every connection set is at most M . P roof. Let (pi, b) be any faulty crosspoint in the multiple-bus configuration. If M < N /2, we can construct a connection set C of size M such th at port pi is 'not included in C. Assume that the bus-allocation algorithm is able to allocate jthe buses to all the connections in C. Let {p2,P3} be the connection assigned to ’ bus b by the allocation algorithm. Now assume th at the connection {p2, P 3 } is removed and processor px is to be paired with processor p2 in a new connection. The only available bus, which is bus b, can not be used for the new connection {pi,p2} because of the faulty crosspoint (pi, b). ■ The nonblocking mode, therefore, allows no crosspoint faults to be tolerated. , However, if movement of active connections between the buses can be allowed, many faults can be tolerated in the rearrangeable mode. We consider this mode of operation next. 45 i We define an (N , M ) multiple-bus configuration as rearrangeable if a bus can be j i found to connect any idle pair of ports, so long as the number of active connections j [at any tim e does not exceed M . This may require rearrangement of some of the 1 I ! existing connections. Alternately, a rearrangeable (N,M) multiple-bus configura tio n can be seen as one in which every connection set of size M can be realized. ■ This mode of operation is particularly suited to multiprocessor systems where re quests for connections from processors are presented simultaneously to a central jarbiter at the beginning of a cycle, which assigns the buses taking the faults into | consideration. The rearrangeable mode of operation perm its all the M buses to ibe assigned when the number of requests is equal to or more than M. The only overhead introduced by the faults is that of the bus-allocation algorithm. ! An upper bound on the number of faults for a rearrangeable (N , M ) configura- ! jtion can be found by the same method used in Section 2.2. Each of the M buses ] in the configuration can sustain at most (M — 1) faults and every pair of buses can have a total of (2M — 3) faults. This res( ults in an upper bound of M 2 — 2M + 1 faults for a rearrangeable (N, M ) multiple-bus configuration. I Rearrangeable multiple-bus configurations satisfying the above upper bound can be shown to exist, but no characterization of such configurations is known. As , presented in Section 2.4.1, the general problem of realizing an arbitrary connection set in an (N, M ) configuration can be formulated as th at of finding a maximum m atching in a bipartite graph. Therefore, with arbitrary faults, a m atching algo- • rithm can be used to find an assignment of the buses. Simpler methods can be used if the fault-distribution can be modeled as a rectangular or trapezoidal configura- : tion. The results on the rectangular configurations of one-sided crosspoint networks : can be extended in a simple manner to multiple-bus configurations. The following ( theorems are extensions of theorems 4 and 5 to multiple-bus configurations. I I I T h e o re m 7 Let the faults in an (N , M ) multiple-bus configuration be confined j ! within a single rectangular region spanned by the set of ports Pq and the set of j ] buses Bo- The multiple-bus configuration is rearrangeable if \Po\ + |-£?o| 5; M . At i ; most one rearrangement is sufficient for any new connection if this condition is satisfied. j I > ! 46 ! 
T h e o re m 8 Let the faults in an (N,M) multiple-bus configuration be confined i within m distinct non-overlapping rectangular regions Pq x Bo, Pi x B\ ,..., Pm- 1 x i?m_i. The multiple-bus configuration is rearrangeable if \ (i) |Pj| + < M, for every 0 < i < m; and I (it) m in(|P;|, |-P,|) + \Bi I + \Bj\ < M, for every 0 < i ,j < m. A maximum of two rearrangements are sufficient to realize any new connection if these conditions are satisfied. Further, the maximum number of rearrangements | reduces to one if the second condition above is replaced with i (ii) max(|P*|, |P j|) + \Bi\ + \Bj\ < M, for every 0 < i ,j < m. | The proofs for these theorems are so similar to the proofs of theorems 4 and 5 th at , they are om itted. 1 In this chapter, we studied connection assignment in faulty one-sided crosspoint switching networks. One-sided crossbars studied in this chapter are suitable for : interconnecting a multiprocessor or a m ulticomputer system. Many terrestrial and j j satellite-switched systems proposed in literature use two-sided crosspoint switching | networks. Scheduling the traffic in systems that use two-sided crosspoint switching networks in optimal number of time-slots is the subject of chapters 3-5. 47 : C h a p ter 3 J J A lg o r ith m s for T im e-S lo t A ssig n m en t in H iera rch ica l S w itch in g S y ste m s j I j # i 3.1 In tro d u ctio n I ; Time-slot assignment problems have been widely studied for various configurations of the switching system [16, 31, 8, 12, 22, 9, 20]. In all the systems studied, 'th e problem can be modeled m athematically as that of decomposing a m atrix i i representing the overall communication requirements into constituent matrices. Each of the constituent matrices produced should satisfy a certain set of constraints f ; and the constraints vary according to the structure of the system. Each constituent : m atrix represents a portion of the total traffic, which requires a certain num ber of ! time-slots to transm it. The sum of the slots needed by the constituent m atrices ■ gives the frame-length. An optimal TSA algorithm always finds an assignment ^ I th at minimizes the frame-length over all feasible decompositions satisfying the j I specified constraints. Optimal TSA algorithms are known for many TDM switching 1 I : configurations [31, 8,12, 22, 9, 20]. All these algorithms are iterative and inherently < sequential in nature. No parallel algorithm has been known for the TSA problem. ! In this chapter, we present a parallel algorithm for constructing optim al time- , | slot assignments for a general class of TDM switching systems. This class was ! , proposed by Eng and Acampora [16], who called it a hierarchical switching system ! | 1 ! (HSS). A HSS configuration has a three-tiered switching structure as shown in Figure 3.1, where a set of M incoming channels are interconnected to M outgoing channels by means of a nonblocking switch of size N < M. The HSS structure is j 'general enough to cover a wide range of TDM switching configurations. We first ! I present a parallel algorithm on the PRAM model of com putation for constructing an optim al TSA. If L is the length of an optimal TSA, this algorithm uses L/2 { i processors and runs in 0 ( M 3 log L) time. We then generalize it to P < L/2 pro cessors, with running time 0 (M 3 log P + M 2 ■ m m (N ,\/M ) • m in(L/P, M 2)). We also describe an efficient implementation of the algorithm on a hypercube multi- j processor with P processors with the same time-complexity. 
A massively-parallel version of the algorithm runs in 0 { M 2 log M log L) tim e on M L /2 processors. The rest of this chapter is organized as follows. We introduce the necessary definitions and notations in Section 3.2 and present the m athem atical formulation of the TSA problem. In Section 3.3, we describe a sequential TSA algorithm based j on recursively decomposing the given traffic m atrix into two traffic matrices. This j recursive algorithm runs in 0 (L M 3) sequential-time. Although the sequential ver- < sion of this algorithm is slower than the best-known sequential algorithm, it aids us in designing a fast parallel algorithm. In Section 3.4, we present the PRAM -based j parallel algorithms and describe how the algorithm can be effectively implemented | ion a hypercube multiprocessor system. In Section 3.5, we show how the parallel 'TSA algorithms can be applied to the class of SS/TDM A systems. | I 3.2 P r o b le m F orm u lation i A time-division multiplexed (TDM) hierarchical switching system (HSS) has a | | three-tiered switching structure as shown in Figure 3.1. The first stage of the ! switching system consists of / multiplexers, where multiplexer i concentrates Pi .input users into K{ output lines. The outputs of the multiplexers are connected to an N x N nonblocking switch. The third and final stage of the hierarchical switching system consists of g demultiplexers, where demultiplexer j connects K'- output lines 1 from the central switch with Pj output users. Each input (output) is connected j to exactly one multiplexer (demultiplexer). Thus, if the num ber of input (output) , users of the HSS equals M , it is easy to observe that ^ f= i P* = Pj = M. j ! Similarly, J2i=i Ki — Yfj=\ Kj = N. A detailed description of hierarchical switching j systems is found in [16]. Next, we present a m athem atical formulation of the | 49 ! I DEMUX MUX NxN CENTRAL SW ITCH DEMUX MUX DEMUX MUX Figure 3.1: A general TDM hierarchical switching system. ! problem of finding an optimal time-slot assignment in an HSS. This formulation is I based on m aterial presented in [16] and [8]. Each input user of the HSS demands different amounts of tim e to transm it ; information to the output users. Let tjj (1 < i,j < M ) denote the num ber of [ time-slots required to transm it the traffic from input user i to output user j. Note | | th at titj are non-negative integers. Thus the M x M m atrix T = [L,j], known as the ! i . 1 traffic matrix, denotes the connection demands from input users to output users I ! . . . . . i ,of the HSS. Rows of this traffic m atrix are divided into / groups / i , ..., / / such j th at group I t contains rows that correspond to inputs of m ultiplexer i. Columns of 1 the traffic m atrix are divided into g groups O i,..., 0 g such th at group Oj contains , I columns that correspond to outputs of demultiplexer,;. Let J?, (Oj) denote the sum ; (of the entries in row * (column j) of the traffic m atrix. Similarly, let £ /,- (Vj) denote ; the sum of all entries in group (Oj) and let S denote the sum of all entries in traffic m atrix T. The time-slot assignment (TSA) problem consists in scheduling i ' the connection demands, presented in traffic m atrix T, using the minimum num ber . 50 of time-slots. Furthermore, any such schedule should meet the following constraints in every time-slot. 1. Each input (output) user can communicate with at most one output (input) user. 2. M ultiplexer i (1 < i < f) can connect at most K{ of its P8 input users to the central switch. 3. 
Demultiplexer j (1 < j < g) can connect at most K'- of its P j output users to the central switch. In m athem atical terms, the time-slot assignment problem for a given traffic m atrix T is finding positive integers L \ ,..., Lm and zero-one matrices S i,..., Sm subject to the following constraints. 1. Each Si, known as a switching matrix, has at most one 1 in every row (col umn) and at most Ki (Kj) l ’s in group /,■ (Oj); 2. T = L\ • Si + L? • S2 + • • • + Lm • Sm; and 3. L = L\ + Z-2 + • • • + Lm, known as the length of the time-slot assignment, is minimum over all decompositions of T that meet the two previous constraints. It was shown by Eng and Acampora [16], and subsequently by Liew [31], that a TSA of length L exists for a traffic m atrix T if and only if the following five conditions are satisfied. M M S = < NL-, (3.1) *= 1 j=1 M Hi = ^ * > j < L, for 1 < i < M; (3.2) j=l M Cj — a — 1 < L, for 1 < j < M; (3.3) 1 — X M Up = ^2 < K PL, for 1 < p < f ; (3.4) ieip j=1 M v q = E E k i < K q L , for 1 < q < g. (3.5) i = l }€Og 51 We use the notations ‘ |_xj ’ to denote the largest integer less than or equal to x and ‘[x ]’ to denote the smallest integer greater than or equal to x. From equa tions (3.1)-(3.5), it is easy to observe that the length of an optim al TSA for any traffic m atrix T is given by L = max{ [ ’-S'/iV]; Jfc (t = 1,..., M); Cj (j = 1,..., M); lVr/K p] ( p = l , . . . , / ) i r V A J l ( , = 1, (3.6) All the earlier TSA algorithms assume that the inequalities (3.1), (3.4), and (3.5) are tight; that is, the traffic m atrix T satisfies equations (3.1), (3.4), and (3.5) when the inequalities in them are replaced with equalities. A general traffic m atrix satisfying equations (3.1)-(3.5) is readily converted to this form by adding dummy entries to it in a pre-processing step of the TSA algorithm [16, 8]. One advantage of the TSA algorithms developed in this chapter is that none of these requires addition of dummy traffic to the original traffic m atrix. Thus, for a general traffic m atrix T satisfying equations (3.1)-(3.5), we develop algorithms for finding an optim al TSA of length L as given by (3.6). We say that a traffic m atrix T is of length L if the m atrix T has an optim al TSA of length L. Three sequential time-slot assignment algorithms for an HSS were presented in [31, 8, 12]. These algorithms are based on constructing a network-flow model corresponding to the traffic m atrix and finding a maximum flow in the graph model. The algorithm presented in [8] for generating an optimal tim e slot assignment of length L for a given traffic m atrix T can be described informally as follows: Bonuccelli's Algorithm 1. Generate the traffic network associated with the traffic m atrix T. 2. Find a maximum flow in the traffic network and generate the corresponding switching m atrix Si. 3. Find the maximum integer Li > 0 such that the traffic m atrix T — L{ • Si is guaranteed to have a TSA of length L — Li. 4. T < — T — Li - Si. 52 5. If T has non-zero entries, go to step 1; otherwise stop. Bonuccelli’s algorithm requires 0(m in(T , M 2)) iterations and 0 ( M 3) tim e per iter ation [8]. Hence, the running time of the algorithm is 0(m in(T , M 2) • M 3). Liew’s algorithm [31] can be viewed as a special case of Bonucelli’s algorithm and runs in 0 ( L M 3) time. The third algorithm [12], due to Chalasani and Varma, has the best time-complexity of the three. This algorithm requires 0{vdm{L, M 2)) itera tions and 0(min(./V, y/M )-M 2) time per iteration. 
Hence, it runs in 0(m in(T , M 2) • min(iV, y/M) • M 2) time, which is at least an 0 (\/M ) improvement over the other two. Although the above algorithms run in polynomial time, they require a pro hibitively large amount of tim e when M , the number of input users of the HSS, exceeds a few hundred. Since the number of input users of an HSS system can be large, there is a need to obtain faster TSA algorithms for these systems. As discussed in [12], obtaining a sequential TSA algorithm that runs in less tim e than 0(m in(L , M 2) • min(7V, y/M) • M 2) can be quite difficult. Thus, the most promis ing approach for a faster solution to the TSA problem lies in developing efficient parallel algorithms. The TSA algorithms presented in [31, 8, 12] are inherently sequential. T hat is, iteration i in these algorithms uses a traffic m atrix generated in iteration i — 1. Thus, no two iterations in these algorithms can be run in parallel. In the next section, we present a divide-and-conquer approach to find an optim al TSA for any given traffic m atrix. In this approach, the problem of finding an optim al TSA for a traffic m atrix T is divided into two subproblems of finding optim al TSAs for traffic matrices T\ and T2 such that T — T\ + Tj. Such a decomposition enables us to devise a recursive time-slot assignment algorithm which can be effectively implemented on different models of parallel computation. Another advantage of our approach is that it eliminates the pre-processing step of adding dummy traffic to the traffic m atrix, which is required in all the previous algorithms. 53 3 .3 A R ec u r siv e T S A A lg o rith m In this section, we present a sequential TSA algorithm based on recursively decom posing the traffic m atrix. The idea is to divide any given traffic m atrix T of length L into two traffic matrices T\ and T2 such that: 1. T = Ti + T2, and 2. L = Li + Z /2, where L\ and L 2 denote the lengths of traffic matrices T\ and T2, respectively. The existence of such a decomposition for any traffic m atrix was proved in [16], but no algorithm was provided to obtain such a decomposition. Using the sequential TSA algorithms [8, 12], one can obtain such a decomposition in a straightforward manner. However, this approach requires 0(m in(Zq, M 2) • min(Ar, V M ) ■ M 2) time. In this section, we use a network-flow model to obtain a decomposition in 0 ( M 3) sequential time. The first step of the decomposition algorithm consists in generating a graph model associated with the traffic m atrix T. We term this network a traffic network. Given a traffic m atrix T of length L, and a number L\ in the range 0 < L\ < L, the traffic network Gt,^ corresponding to T and L\ is obtained as follows: The traffic network is a directed graph th at consists of a total of 2M + / -f g + 2 nodes. Figure 3.2 illustrates the general structure of a traffic network. The nodes are arranged on six levels, with arcs directed from level % to level i + 1. Level 1 consists of the source node s and level 6 corresponds to the sink node t. The nodes in levels 2 and 3 and the arcs between them model the multiplexers in the HSS. Similarly, the nodes in levels 4 and 5 and the arcs between them model the demultiplexers. The nodes in levels 3 and 4 with the associated arcs model the input and output users with their traffic requirements. The nodes and arcs are constructed level by level as follows: L evels 2 a n d 5: The / nodes at level 2 represent the multiplexers in the system. Node Ip corresponds to multiplexer p, for 1 < p < f. 
Each node Ip has an incoming arc from the source node s. Similarly, the node Oj in level 5 represents demultiplexer j, for 1 < j < g. Each node Oj has an outgoing arc to the sink node t. 54 Level 1 2 3 4 5 6 Figure 3.2: General traffic network G t ,l 1 corresponding to an HSS (Refer to Ta ble 3.1 for the lower-bounds and capacities on the arcs.). L evels 3 a n d 4: These nodes represent the rows and columns of the traffic m atrix. Nodes ri, r 2, ..., Tm in level 3 represent the M rows and nodes c1 ? c2, ..., cm in level 4 represent the M columns. An arc exists from r 8 - to Cj if and only if the entry t{j in T is non-zero. Each node r,- has an incoming arc from node Ip in level 2 if and only if row i belongs to the group 7P. Similarly, each node Cj has an outgoing arc to node Oq in level 5 if and only if column j belongs to the group Og. Finally, a backward arc (t , s) from the sink to the source is introduced to close the network. There are two quantities associated with each arc of the traffic network G t,lx ' ■ The first quantity, known as capacity, indicates the maximum flow th at can pass through the arc and the second, known as lower bound, indicates the minimum am ount of flow that must pass through the arc. Table 3.1 lists the lower bounds and capacities assigned to the arcs in G t ,Li - Each arc (s, Ip) has capacity K PL\\ similarly, each arc (Oq,t) has capacity K'qL\. The lower bound on arc (s, Iv) is Up — K P(L — L i) if Up exceeds KP(L — Li), and 0 otherwise; similarly the lower bound on arc (0 q,t ) is Vq — K'q{L — L\) if Vq exceeds K'q{L — L\), and 0 otherwise. 55 arc lower bound capacity (s, Ip) Up - K P(L - Lx), if Up > K P(L - Lx); 0, otherwise. K PL X (Oq,t) Vq - K'q(L - Lx), if V, > K '(L - Lx); 0, otherwise. K 'L X (Lp, Tj) R{ — L + Zfj) if Rj L — fij 0, otherwise. Lx (Cj, Oq) Gj — L + Lx, if Gj > L — Lx; 0, otherwise. Lx {ril C j) lAjLx/LJ TLj L x/ l i ( ■ t,S) 5 - N{L - Lx), if S > N (L - Lx); 0, otherwise. N L X Table 3.1: Lower bounds and capacities of the arcs in the network G t ,l x- Every arc of the form (r,-,Cj) has capacity [L jL x/L ] and lower bound [LjLx/LJ. Capacity of every arc of the form (Ip, r,;) or (cj, 0 q) is Lx. The lower bound on arc (Ip, r,) is Ri — L + Lx if Ri exceeds L — Li, and 0 otherwise; similarly, the lower bound on arc (Cj, Og) is Cj — L + L x if Cj exceeds L — Lx, and 0 otherwise. The arc (i, 5) has capacity N L X] its lower bound is S — N (L — Lx) if S' exceeds N{L — Lx), and 0 otherwise. Figure 3.3 shows an example traffic-matrix T and the corresponding traffic network for L x = 5. The HSS considered has M — 8 users connected to a central switch of size N — 4 via two identical 4 x 2 multiplexers on the input side and two identical 2 x 4 demultiplexers on the output side. By equation (3.6), the length of any optim al TSA for this traffic m atrix is 10. A circulation in the traffic network is an assignment of numbers(flows) to the arcs such that (i) the number assigned to each arc is not smaller than its lower- bound and not greater than its capacity; and (ii) the flow is conserved at each node, th at is, for each node, the sum of the flows assigned to incoming arcs is equal to the sum of the flows on outgoing arcs. The value of the circulation is the flow | assigned to the arc (t,s ) by the circulation. The lower bounds and capacities on the arcs of the traffic network G t ,l x have been chosen with the aim of finding a circulation that corresponds to a traffic m atrix Tx of length Lx. 
The traffic m atrix Tx has the same structure as the original traffic 56 5 6 1 2 2 2 00 2 2 6 Figure 3.3: A traffic m atrix T and the corresponding traffic network Gt,L\ for L i = 5 . 57 m atrix T. Hence, in Ti, the sum of all entries should be at most equal to N L X. This is achieved by assigning a capacity of N L X to the arc (t, s). At the same tim e, the total traffic in the residual m atrix T2 = T — Tx should not exceed N (L — L x) for T2 to be a valid traffic m atrix. The lower bound of S — N{L — L\) on the axe (t,s) achieves this objective. In addition, the sum of all entries in group Ip (0 q) of Ti should at most be equal to K pLx (K'qLx) for Tx to be valid. This is achieved by forcing each arc (s, Ip) to have a capacity of K PLX and each arc (0 q,t ) to have a capacity of K qL x. The sum of all entries in group Ip of traffic m atrix Ti should be at least Up — K P(L — L x) so that the sum of all entries in group Ip of the residual m atrix T2 equals at most K P(L — L x). A lower bound of Up — K P{L — Lx) on arc (s,/p) realizes this objective. The lower bound of Vq — K q(L — L x) for arc ( 0 q,t) serves the same purpose for the column-group Oq. Finally, the sum of all entries in row i (column j) of T2 should not exceed L — L\ for T2 to be valid. Placing a lower bound of Ri — L + Li on arc (Ip,ri) achieves this objective for the row-sum; and a lower-bound of Cj — L + L\ on arc (cj, 0 q) serves the same purpose for the column-sum. Any circulation in the traffic network Gt,Li corresponds to a traffic m atrix Tx of length L\. The i ,j th entry of Ti is simply the flow assigned to the arc (r,-,Cj) by the circulation. Therefore, the following algorithm can be used to decompose a traffic m atrix T of length L into two traffic matrices Tx and T2 of lengths L x and L 2, respectively, such that T = Tx + T2 and L = Lx + L2. 1 Decom position Algorithm(T, Lx) 1. Generate the traffic network Gt,Li corresponding to T and L x. 2. Find a circulation in the traffic network and generate the corresponding traffic m atrix Ti. 3. T2 <- T - Tx. A traffic network, similar in structure to that shown in Figure 3.2, was em ployed by Bonuccelli to extract a switching m atrix from any traffic m atrix [8]. Our approach, in contrast, uses the traffic network to decompose the traffic m atrix into two valid traffic matrices whose individual lengths can be chosen arbitrarily, 58 subject to the constraint that the sum of the lengths is equal to the length of the original m atrix. Next, we prove the existence of a circulation of value N L \ in the traffic network G t.Li - L e m m a 2 Let T be a traffic matrix of length L and let L\ be any integer such that 0 < Li < L. Further, let Gt,Li be the traffic network corresponding to the matrix T and L \ . A circulation that satisfies the lower-bound and capacity constraints exists in G t,Li- P ro o f. We use the theory of circulations [28], to establish the existence of a circulation in G t , l A cutset (cc, ft) in a network is a partition of the nodes into two subsets a and ft. Let hj and dij denote the lower bound and the capacity of the arc (z,j), respectively. It was proved by Hoffman [28] that in a network with lower bounds and capac ities, a circulation exists if and only if £ k i < £ (3.7) i€0 ,j €a i€ct,j€ 0 for all cutsets (a, ft). In other words, let us consider any partition of the set of nodes in the network into two subsets a and ft. 
A circulation exists in the network if and only if the sum of the lower bounds on all arcs that emerge from a node in β and terminate at a node in α does not exceed the sum of the capacities on all arcs that originate from a node in α and terminate at a node in β. We next show that, for every partition of the nodes into sets α and β in the network G_{T,L_1}, condition (3.7) is satisfied.

To prove this, we first consider the special case L_1 = L. In G_{T,L}, each arc (s, I_p) has capacity K_p L and lower bound U_p. Similarly, each arc (O_q, t) has capacity K'_q L and lower bound V_q. Every arc of the form (r_i, c_j) has capacity and lower bound t_{ij}. Every arc of the form (I_p, r_i) has capacity L and lower bound R_i. Similarly, every arc (c_j, O_q) has capacity L and lower bound C_j. Finally, the arc (t, s) has a capacity of NL and a lower bound of S.

Now assume that a flow of value equal to the lower bound is placed on each arc of G_{T,L}. It is easy to observe that such a flow satisfies the principle of flow conservation at each node in G_{T,L}, as well as the capacity and lower-bound constraints on each arc. Therefore, such an assignment constitutes a valid circulation for G_{T,L}, of value NL. Hence, by Hoffman's result, the network G_{T,L} satisfies condition (3.7) for all cutsets.

Given that G_{T,L} satisfies condition (3.7) for all cutsets, we now prove that G_{T,L_1} satisfies the same condition for any arbitrary L_1 in the range 0 < L_1 < L. To show this, consider any cutset (α, β) in G_{T,L_1}. We need to consider four separate cases depending on the choice of the cutset: (i) both s and t belong to α, (ii) both s and t belong to β, (iii) s ∈ α, t ∈ β, and (iv) s ∈ β, t ∈ α.

Case (i): s, t ∈ α. Because G_{T,L} satisfies condition (3.7), we have

\sum_{i \in \beta,\, j \in \alpha} l_{i,j} \le \sum_{i \in \alpha,\, j \in \beta} d_{i,j}.

That is,

\sum_{I_p \in \beta,\, r_i \in \alpha} R_i + \sum_{r_i \in \beta,\, c_j \in \alpha} t_{ij} + \sum_{c_j \in \beta,\, O_q \in \alpha} C_j + \sum_{O_q \in \beta} V_q \le \sum_{I_p \in \beta} K_p L + \sum_{I_p \in \alpha,\, r_i \in \beta} L + \sum_{r_i \in \alpha,\, c_j \in \beta} t_{ij} + \sum_{c_j \in \alpha,\, O_q \in \beta} L.   (3.8)

By multiplying both sides of equation (3.8) with L_1/L, we obtain the following equation.

\sum_{I_p \in \beta,\, r_i \in \alpha} R_i L_1/L + \sum_{r_i \in \beta,\, c_j \in \alpha} t_{ij} L_1/L + \sum_{c_j \in \beta,\, O_q \in \alpha} C_j L_1/L + \sum_{O_q \in \beta} V_q L_1/L \le \sum_{I_p \in \beta} K_p L_1 + \sum_{I_p \in \alpha,\, r_i \in \beta} L_1 + \sum_{r_i \in \alpha,\, c_j \in \beta} t_{ij} L_1/L + \sum_{c_j \in \alpha,\, O_q \in \beta} L_1.   (3.9)

We next require the fact that R_i − L + L_1 is at most equal to R_i L_1/L. This can be shown using the following sequence of inequalities:

R_i \le L
R_i (L − L_1) \le L (L − L_1)
R_i (1 − L_1/L) \le L − L_1
R_i − L + L_1 \le R_i L_1/L.

Therefore,

R_i L_1/L \ge \max(0,\, R_i − L + L_1).   (3.10)

In a similar way, we can show that

V_q L_1/L \ge \max(0,\, V_q − K'_q(L − L_1)).   (3.11)

Using (3.10) and (3.11), and the relations ⌊t_{ij} L_1/L⌋ ≤ t_{ij} L_1/L and ⌈t_{ij} L_1/L⌉ ≥ t_{ij} L_1/L, we can rewrite equation (3.9) as shown below.

\sum_{\substack{I_p \in \beta,\, r_i \in \alpha \\ R_i > L − L_1}} (R_i − L + L_1) + \sum_{r_i \in \beta,\, c_j \in \alpha} ⌊t_{ij} L_1/L⌋ + \sum_{\substack{c_j \in \beta,\, O_q \in \alpha \\ C_j > L − L_1}} (C_j − L + L_1) + \sum_{\substack{O_q \in \beta \\ V_q > K'_q(L − L_1)}} (V_q − K'_q(L − L_1)) \le \sum_{I_p \in \beta} K_p L_1 + \sum_{I_p \in \alpha,\, r_i \in \beta} L_1 + \sum_{r_i \in \alpha,\, c_j \in \beta} ⌈t_{ij} L_1/L⌉ + \sum_{c_j \in \alpha,\, O_q \in \beta} L_1.   (3.12)

Now observe that the quantity on the left-hand side of equation (3.12) corresponds to the sum of the lower bounds on arcs that emerge from a node in β and terminate at a node in α in the network G_{T,L_1}. Similarly, the right-hand side of equation (3.12) corresponds to the sum of the capacities on arcs from a node in α to a node in β in G_{T,L_1}. Hence, the network G_{T,L_1} satisfies

\sum_{i \in \beta,\, j \in \alpha} l_{i,j} \le \sum_{i \in \alpha,\, j \in \beta} d_{i,j}.

In a similar fashion, we can prove that the network G_{T,L_1} satisfies condition (3.7) for the remaining cases corresponding to (ii) both s and t belong to β, (iii) s ∈ α, t ∈ β, and (iv) s ∈ β, t ∈ α.
Hence, a circulation satisfying the capacity and lower-bound constraints exists in Gt,Li • This concludes the proof of Lemma 2. ■ Once a circulation is found in the traffic network G t,lt , the corresponding m atrix Tx is constructed by setting its ijth entry equal to the flow-value assigned 61 to the arc (rj,Cj) by the circulation. Next we prove th at the traffic m atrix T\ constructed by this m ethod has a length of L\ and the traffic m atrix T — Ti has a length of L — L\. L e m m a 3 Let T be a traffic matrix of length L and let G t , l x be the traffic network for some L\ in the range 0 < L\ < L. If T\ is the traffic matrix constructed from any circulation in Gt,lx, then Ti is of length L\ and T — Tx is of length L — L\. P ro o f. Any circulation in G t , l x must satisfy the upper bound on each of its arcs. Thus, in Ti, the sum of all entries is at most N L \ ; the sum of all entries in row- group 7p(column-group Og) is at most K VL\ (KqLx); and the sum of all entries in row i (column j) is at most L\. Hence, T\ is a valid traffic m atrix of length L\. To show that T2 = T — T\ is a valid traffic m atrix of length L — Lx, consider the lower bounds on each arc of G t , l x ■ The lower bound of Up — K P(L — Lx) on arc (s, Ip) forces the sum of all entries in row-group Ip in Ti to be at most K P(L — Lx). Similarly, the sum of all entries in column-group Oq can not exceed K'q{L — Lx). The sum of all entries in any row i (column j) of Ti can not exceed L — Lx because of the lower bound of Ri — L + L\ on arc (Ip, r8 ) (Cj — L + L\ on arc (cj, Oq)). Finally, the total traffic in T2 can not exceed N (L — L\) because of the lower bound of S — N (L — L\) on arc (t,s). Therefore, by equations (3.1)-(3.5), T2 is a valid traffic m atrix of length L — Lx. This concludes the proof of the lemma. ■ As an example, Figure 3.4 shows a circulation of value 17 in the network of Figure 3.3. The corresponding traffic m atrix Tx is given by 2 3 1 1 1 4 1 1 3 and has a length of 5. It is easy to show that T — Tx is also a valid traffic m atrix of length 5. 62 Figure 3.4: A circulation in the network of Figure 3.3. The following theorem proves the correctness of the decomposition algorithm and estim ates its time-complexity. T h e o re m 9 The decomposition algorithm correctly divides any traffic matrix T of length L into two traffic matrices T\ and T2 of lengths L\ and L2, respectively, such that T = Ti + T2 and L = Lt + L2. Further, it runs in 0 ( M 3) time. P ro o f. Correctness of the decomposition algorithm follows from lemmas 2 and 3. To estim ate the time-complexity of the algorithm, observe th at the num ber of nodes in the traffic m atrix equals O(M) and the number of edges equals 0 ( M 2). Thus, construction of the traffic m atrix, which constitutes Step 1 of the algorithm, can be achieved in 0 ( M 2) time. Finding a circulation through the traffic network can be achieved in 0 (M 3) time using any of the several max-flow algorithms pre sented in literature [32, 50] (for a brief survey, see [4]). Once a circulation is found, generating the traffic m atrix Ti takes 0 (M 2) time. Therefore, Step 2 of the decom position algorithm requires 0 ( M 3) time. Step 3 of the algorithm, which computes the difference between two M x M matrices, requires 0 ( M 2) time. Hence, the decomposition algorithm runs in 0 ( M 3) time. This concludes the proof of the theorem. 
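Step 2 of the decomposition algorithm asks for a circulation that respects both lower bounds and capacities. One standard way to obtain such a circulation is to route the mandatory lower-bound flow on every arc first and then repair the resulting node imbalances with a single max-flow computation between an auxiliary super-source and super-sink. The sketch below follows that reduction; purely as an implementation convenience it assumes the networkx library for the max-flow step, whereas the dissertation itself leaves the choice of max-flow routine open [32, 50].

```python
import networkx as nx

def find_circulation(arcs):
    """Find a feasible circulation for arcs given as {(u, v): (lower, capacity)}.

    Reduction: send `lower` units on every arc unconditionally, then balance
    each node's surplus/deficit with one max-flow from a super-source to a
    super-sink.  Returns {(u, v): flow}, or None if no circulation exists
    (Lemma 2 guarantees this cannot happen for the traffic network).
    """
    G = nx.DiGraph()
    excess = {}                       # lower-bound inflow minus outflow per node
    for (u, v), (low, cap) in arcs.items():
        G.add_edge(u, v, capacity=cap - low)
        excess[v] = excess.get(v, 0) + low
        excess[u] = excess.get(u, 0) - low

    src, snk = '_super_s', '_super_t'
    need = 0
    for node, e in excess.items():
        if e > 0:
            G.add_edge(src, node, capacity=e)
            need += e
        elif e < 0:
            G.add_edge(node, snk, capacity=-e)

    value, flow = nx.maximum_flow(G, src, snk)
    if value < need:                  # some lower bound cannot be satisfied
        return None
    return {(u, v): low + flow[u].get(v, 0)
            for (u, v), (low, cap) in arcs.items()}
```

In the decomposition algorithm, the entry of T_1 in row i and column j is then simply the flow returned for the arc (r_i, c_j).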
® 63 Observe th at the same decomposition technique can be employed iteratively to decompose any given traffic m atrix T into c different traffic matrices Ti, T2, ..., Tc of lengths L\, L2, ..., Lc, respectively, such that T = T\ + T% + • • • + Tc and L = L\ + L 2 + • • • + Lc. Such a decomposition can be achieved in 0 (cM 3) steps. Now we are in a position to formulate the recursive sequential TSA algorithm. Recursive TSA Algorithm (T , L ) (Decomposes a traffic m atrix T of length L into L switching m atrices.). 1. Let L be the length of T. If L = 1, stop (T is a switching m atrix). 2. Execute Decomposition Algorithm(T, [L /2]). Let 7\ and T2 be the traffic matrices generated as a result of the decomposition. 3. Recursive TSA Algorithm (Ti, [L/2]). 4. Recursive TSA Algorithm (T2, [L/2J). Observe that every invocation of the decomposition algorithm divides the origi nal traffic m atrix T into two traffic matrices of approximately equal lengths. Hence, every traffic m atrix of length greater than one gets subdivided into several traffic m atrices, each of length one, owing to the recursive nature of the above algorithm. Thus, for any traffic m atrix T, the recursive TSA algorithm generates an optim al TSA in a finite number of steps. To compute the running-time of the algorithm, observe that the first invocation of the algorithm with a traffic m atrix T of length L generates two subproblems: (i) finding an optim al TSA of length [L/2] for T\, and (ii) finding an optimal TSA of length \Lf 2] for T2. The amount of tim e taken to obtain such a decomposition equals 0 ( M 3). Hence, the total running tim e of the algorithm on any traffic m atrix of length L can be expressed using the recurrence equation T( 1) = 0 ; T(L) = r(ri/2U + T(|L/2J)+0(M 3). The solution to this recurrence is given by 0 (L M 3) [2]. Thus the recursive TSA algorithm runs in time 0 (L M 3) in a sequential implementation. This is the same 64 as the running tim e of Liew’s Algorithm. The recursive TSA algorithm , however, is inefficient when compared with the best-known sequential TSA algorithm [12], which runs in 0 ( M 2 ■ min(A^, y/M) ■ min(L, M 2)) time. The main advantage of the recursive TSA algorithm is that it is naturally suited to parallel implementation. Such parallel algorithms are the topic of the next section. 3.4 P a ra llel T S A A lg o rith m s In this section, we use the decomposition algorithm presented in Section 3.3 to develop parallel algorithms for finding an optimal TSA in a hierarchical switch ing system. These algorithms are based on recursive decomposition of the traffic m atrix into two matrices, each of length approximately half of the original m a trix. We first describe a parallel algorithm with 0 ( M 3 log L) tim e on a PRAM model of com putation using exactly [L/2J processors. The running tim e of this algorithm can be improved to 0 { M 2 log M log L) by increasing the num ber of pro cessors to M\_L/2J. We then combine this algorithm with the sequential algo rithm in [12], resulting in a general P-processor algorithm with time-complexity 0 ( M 3 log P + M 2 -min(Af, \/M)-min(L/P, M 2)) for P in the range 1 < P < [Lj2 j. Finally, we describe an implementation of the algorithm on a P-processor hyper cube multiprocessor. We use the PRAM model of computation to describe our parallel algorithm. This model consists of P autonomous processors, all having access to a common memory. Each processor may also have a local memory, which is not accessible by any other processor. 
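The recursive TSA algorithm is short enough to state directly in code. The sketch below treats the decomposition algorithm as a black box decompose(T, L1) that returns the two halves; the helper name is an assumption made here for illustration.

```python
def recursive_tsa(T, L, decompose):
    """Split a traffic matrix T of length L into L switching matrices.

    decompose(T, L1) must return (T1, T2) with T = T1 + T2, where T1 has
    length L1 and T2 has length L - L1 (the decomposition algorithm).
    """
    if L == 1:
        return [T]                       # T is already a switching matrix
    half = (L + 1) // 2                  # ceil(L / 2)
    T1, T2 = decompose(T, half)
    return recursive_tsa(T1, half, decompose) + \
           recursive_tsa(T2, L - half, decompose)
```

The recurrence T(1) = 0, T(L) = T(⌈L/2⌉) + T(⌊L/2⌋) + O(M^3) then yields the O(LM^3) running time stated above.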
At each step, each processor performs one instruction from its instruction stream. A memory-access is assumed to take unit time. Three variants J of the PRAM model have been used by researchers to describe parallel algorithms (see, for example, [25]): In the concurrent-read, concurrent-write (CRCW ) model, any num ber of processors may simultaneously read from or write to the same location in memory. In the concurrent-read, exclusive-write (CREW ) model, any num ber of processors may simultaneously read the same memory location, but simultaneous writes are not allowed. Finally, in the exclusive-read, exclusive-write (EREW ) model, which is the weakest of the three, exclusive access is required 65 for both read and write. The following description of our algorithms assume the EREW PRAM model. 3 .4 .1 A ,P R A M A lg o r ith m Our parallel algorithms adopt a divide-and-conquer approach to decompose the traffic m atrix into individual switching matrices. Assume th at the traffic m atrix T of length L is stored as m atrix T0 in the memory. In the first step of the parallel algorithm , the traffic m atrix T0 is divided into two traffic matrices by invoking the decomposition algorithm with param eters To and \L / 2"|. The two resulting traffic m atrices, of lengths \L / 2] and [Lf 2j , are stored as T0 and T ^ / 2] > respectively. The next step of the algorithm further divides T0 and T\l/ 2 \, resulting in new matrices To, T|‘|x/2'|/2 " |j T|x/2] and T[x/2i+[rL/2]/2V These two decomposition operations can be executed in parallel with no memory-conflicts. This process is continued until each resulting traffic m atrix is of length one. An informal description of the parallel TSA algorithm is given below. Parallel TSA Algorithm(T, L) 1. fo r all i € {0,1,..., L — 1} do T, < —0. 2. T o < — T; L q < — L . 3. fo r j = 0 to [log L] — 1 do 3.1. fo r all i satisfying T; > 1 do 3.1.1. Perform D ecom position^, [T ;/2 ]). Let T/ and T" be the re sulting matrices, of lengths [T;/2] and |_T;/2J respectively. 3.1.2. r i+rLi/2l < — T/'; Z r,-+[ • £ ,t -/2i LTj/2J. 3.1.3. Ti <- T[\ U <- \Li/2]. 3.2. e n d for all. 4. e n d for. The for all statem ents are executed for all i in parallel. Note th at steps 3.1.1 - 3.1.3 need not be performed in lock-step for each i if synchronization is performed 66 at the end of the for-all loop to guarantee that all active processors complete one iteration of the loop before proceeding to the next. From the correctness proof of the decomposition algorithm (Theorem 9), it fol lows th at the parallel algorithm produces matrices To, T\,..., T l-\, each of length one. Thus at the end of execution of the algorithm, matrices T0, T? , ..., T l- i rep resent an optim al TSA of length L for the traffic m atrix T. To estim ate the tim e complexity of the parallel algorithm, observe that steps 1 and 2 require a constant amount of time. Step 3.1.1 performs the decomposition algorithm and requires 0 ( M 3) tim e for each decomposition. Each of the steps 3.1.2 and 3.1.3 copies an M x M m atrix into another and hence requires M 2 steps. Since steps 3.1.1-3.1.3 are performed in parallel for all i, each invocation of the for loop beginning on line 1 takes only 0 ( M 3) time. Thus, the parallel TSA algorithm executes in 0 ( M 3 log L) tim e. To estim ate the number of processors required, we notice th at when L equals 2, only one processor is required. The number of processors for larger L is given by the recurrence P ( l) = P ( \L /i \ ) + P ( [ L / 2 S ) , with -P(l) = 0 and P(2) = 1. 
A solution for this recurrence relation is given by P{L) < \_Lj2J. Thus, a maximum of [L/2J processors are required for this parallel algorithm. The time-complexity of this parallel algorithm can be improved further if we have more processors available. This additional improvement is achieved by par allelizing step 3.1.1 of the decomposition algorithm. Step 3.1.1 finds a m aximum flow through the traffic network and requires 0 ( M 3) sequential time. Parallel al gorithms for finding a maximum flow in an M -node network in 0 ( M 2 log M ) tim e using M processors have been reported [46, 19]. Therefore, step 3.1.1 of the de composition, and hence the decomposition itself, can be completed in 0 { M 2 log M) parallel-time using M processors. This results in a massively-parallel version of the algorithm with M L /2 processors and 0 ( M 2 log M log L) time. For the special case of L — 0 ( M k) for some k > 2, and N = 0 ( M ), which are reasonable assumptions in practice, the parallel algorithm runs in 0 ( M 3 log M) tim e and the massively-parallel version run in 0 ( M 2 log2 M ) time; the best-known sequential algorithm, in comparison, takes 0 ( M 4 S) tim e [12]. 67 3 .4 .2 A G en era l A lg o r ith m for P < L / 2 P r o c e sso r s The above algorithm can now be generalized to P processors, for any 1 < P < \L f 2J. This general algorithm has two stages: In the first stage, the traffic m atrix T of length L is decomposed into P traffic matrices, each of length approximately L /P . This decomposition can be performed in parallel on the P processors. In the second stage of the algorithm, each of the P processors works independently on a traffic m atrix of length LJP to produce a TSA of length L /P by means of a sequential TSA algorithm. Because a sequential implementation of the recursive decomposition algorithm is inefficient as compared to the sequential algorithm in [12], the latter is the best choice for the second stage. The general algorithm is described below, where we assume the number of processors P is a power of 2. This assumption simplifies the description of the algorithm. The algorithm is easily modified to handle the case of arbitrary P. Parallel TSA Algorithm on PRAM model with P — 2 P processors, P < \_L/2J. 1. T0 < — T; Lq < — L. 2. for j = 0 to p — 1 do 2.1. for all i € {0,1,..., 2J — 1} do 2.1.1. Perform Decomposition(Tj, [T ;/2]). Let T/ and T" be the re sulting matrices, of lengths \L{/2] and [Li/2J respectively. 2.1.2. Tl+ 2 3 < T"; Li+2J i [T i/2j, 2.1.3. Ti 2?; Li <- \U f2]. 2.2. end for all. 3. end for. 4. for all j e {0 ,1,..., 2P — 1} do 4.1. if Li > 1 decompose T8 into Li switching matrices using the sequential TSA algorithm in [12]. 5. end for all. 68 The first stage of the above algorithm consists of the loop beginning on line 2 and executes in 0 ( M 3 log P) time to produce traffic matrices To, T \,... ,Tp-\. The length Li of traffic m atrix T ,- is at most \L/P~\. The second stage of the algorithm comprises the loop beginning on line 4 where processor i decomposes T * into Li switching matrices. The sequential TSA algorithm in [12] achieves this in 0 ( M 2 • min(7V, y/M) • m in(L/P, M 2)) time. Therefore, the overall running tim e of the algorithm is 0 ( M 3 log P + M 2 • min(iV, y/M) • m in(A /P, M 2)). 3 .4 .3 A H y p e r c u b e A lg o r ith m Next, we describe an implementation of the parallel TSA algorithm on a hypercube multiprocessor system. We consider a hypercube system with P — 2P processors, for some 0 < p < [log L ] . 
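The two-stage structure of the general P-processor algorithm can also be sketched in code. The routine below simulates the schedule sequentially: decompose stands in for the decomposition algorithm and sequential_tsa stands in for the algorithm of [12], neither of which is reproduced here, and the loop bounds assume P is a power of two with P ≤ ⌊L/2⌋ as in the description above.

```python
import math

def two_stage_tsa(T, L, P, decompose, sequential_tsa):
    """Simulate the two-stage P-processor TSA schedule sequentially.

    Stage 1: log2(P) rounds of halving produce P traffic matrices of length
    roughly L/P each (in the parallel version, one per processor).
    Stage 2: each piece is decomposed independently into switching matrices
    by a sequential TSA routine such as the algorithm of [12].
    """
    pieces = [(T, L)]
    for _ in range(int(math.log2(P))):           # stage 1
        nxt = []
        for Ti, Li in pieces:
            if Li <= 1:                          # nothing left to split
                nxt.append((Ti, Li))
                continue
            half = (Li + 1) // 2                 # ceil(Li / 2)
            A, B = decompose(Ti, half)
            nxt.extend([(A, half), (B, Li - half)])
        pieces = nxt
    slots = []
    for Ti, Li in pieces:                        # stage 2
        slots.extend([Ti] if Li == 1 else sequential_tsa(Ti, Li))
    return slots
```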
In the hypercube system, two processors i and j are adjacent (that is, connected by a link) if and only if the binary representations of i and j differ in exactly one bit-position. Thus each processor in a hypercube with 2P processors is connected to exactly p other processors. The parallel TSA algorithm th at runs on the hypercube multiprocessor system is detailed below. The algorithm exploits the fact that a 2p-processor hypercube contains two disjoint 2P_1-processor hypercubes. The algorithm starts with the initial traffic m atrix To = T , of length X, in processor 0. In the first iteration, processor 0 divides To into T q and T q , of lengths [T/2J and [A/2] respectively. Processor 0 then sends m atrix T q to processor 2P“1. In the next iteration, processors 0 and 2P~X work independently to decompose the m atrices T/ and T q , respectively. These two processors divide the hypercube into two smaller subcubes, the first consisting of processors 0,1,..., 2P_1 — 1, and the second consisting of 2P_1, 2P_1 + 1,..., 2 P — 1, and perform recursive decomposition of the m atrix within each subcube. Each iteration of the loop divides the subcubes into smaller subcubes until each of the 2P processors receives a traffic m atrix. At this point, each processor can use a sequential algorithm to further decompose the traffic m atrix it received, if its length is more than 1. Note th at the algorithm is totally asynchronous and all communication is to adjacent nodes in the hypercube. Hypercube Parallel TSA Algorithm(T, A) 1. Processor 0 performs T0 < — T and Aq < — A. 69 2. for j = p — 1 dow nto 0 do 2.1. for all processors i satisfying i mod 2J+1 = 0 do 2.1.1. Perform Decomposition(T{, \Li/2\). Let T' and T" be the re sulting matrices, of lengths \L{/2] and [L{ / 2J respectively. 2.1.2 . Processors i and i + 2J communicate to perform Ti+2 > * — T"\ L i+ v * — [Li/2J. 2.1.3. Processor i performs Ti < — T/; Li < — \Li/2]. 2.2. end for all. 3. end for. 4. for all processors i £ , 2 P - 1} do 4.1. Decompose Ti into L{ switching matrices using the sequential TSA al gorithm in [12]. 5. end for all. As in the general parallel TSA algorithm, the hypercube algorithm runs in , 0 ( M 3 log P + M 2 • min(7V, y/M) • m in(L/P, M 2)) time. In the case when L — P — 2P, the above algorithm term inates with each of the P processors containing exactly one switching m atrix in the decomposition. Often it is required th at similar switching matrices in any TSA are assigned contiguous slots. This reduces the number of times the switching system has to be reconfigured during the transmission of the frame. Our algorithms do not necessar ily produce a TSA in which all similar switching matrices are contiguously located. i This problem can be alleviated by using a bitonic-sort and a recursive-doubling operation after running the TSA algorithm. In the following part of this section, we briefly describe how these operations can be achieved. We consider only the case of the hypercube multiprocessor with L = P = 2P processors. Batcher’s bitonic sort algorithm correctly sorts any given set of L data values in O(log2 L) tim e using L /2 comparison-exchange elements [5]. In order to group all similar switching matrices together, bitonic sort is to be performed with switching m atrices as the data values. However, since each switching m atrix has exactly N 70 non-zero elements (with value 1) we use an array of N elements for comparison instead of an M x M m atrix. 
Let Vi denote the array corresponding to the switching m atrix Ti and Vij denote the j th element of Vi, with 1 < j < N. Vij is an ordered pair (X ij,yij ) that stores the address of the j th one in the switching m atrix T*. W ithin each Vi, the entries Vij are ordered by their row addresses. For the purpose of sorting, the following order is imposed on the K ’s: Vt < Vk if one of the following conditions is satisfied: 1. There is an r in the range 1 < r < N such th at XifT < Xk,r, and Xij = Xkj for all 1 < j < r. 2. Xij = Xk,j for all 1 < j < JV; further, there is an r in the range 1 < r < N such th at yitr < yk,n and yf)j = yk,j, for all 1 < j < r. 3. xitj = x kj and y{j = yhj for all 1 < j < N. A bitonic-sort is performed to sort the arrays Vo through according to the ordering above. The function of the comparison-exchange elements in the bitonic- sort is performed by the processing elements. In this way, the L switching matrices To, T\,. .., TL-1 can be sorted in 0(iVTog2 L ) time. Once the bitonic-sort is performed and the matrices ordered, only one recur sive doubling operation is required to compute the num ber of times each switch ing m atrix appears in the optimal TSA. The recursive doubling operation can be explained as follows: Assume that the switching matrices are ordered such that Vo < V i <■■• < Vl- i . The recursive doubling operation computes the values a0, c j,..., cll- i as per the following rule: 1. If Vi = Vi-1, cti equals -1; 2. else di is the maximum r such that VT = Vi. T hat is, for each non-negative di, the number of traffic m atrices identical to Tz is given by di — * -f 1. This gives a decomposition of the traffic m atrix T into distinct switching matrices Ti as T = + t « j > 0 71 A parallel algorithm to compute the < Z j’s on a hypercube multiprocessor with P = 2P processors is described below. For simplicity, we assume L = P. Processor i stores switching m atrix Ts - in sorted order. A D D R E S S i and D A T A i are local variables in processor i. Hypercube Recursive-Doubling Algorithm 1. Initially a{ = *, A D D R E S S i = i, and D A T Ai = K, for all 0 < i < P - 1. 2. for j = 0 to p — 1 do 2.1. for all processors i satisfying i mod 2J+1 = 2 3 do 2.1.1. if V i = DATAi^i sef ai * ----- 1- 2.2. end for all. 2.3. for all processors i satisfying i mod 23+l = 0 do 2.3.1. D A T A i <- D A T A i+2i ■ 2.3.2. if V i = K+2> set A D D R E S S i <- A D D R E S S i+2J. 2.4. end for all. 3. end for. 4. D A T A i «- 0 < * < P - 1. 5. for j = p — 1 dow nto 0 do 5.1. for all processors i satisfying i mod 23+l = 0 do 5.1.1. D A T A i ++ D A T A i+2}. 5.1.2. A D D R E S S i < -> A D D R E S S i+2j. (swap values in processors i and i + 2 3 .) 5.2. end for all. 5.3. for all processors i satisfying i mod 2J+1 = 23 do 5.3.1. if ai ^ — 1 and D A T A i = Vi set ai * — A D D R E S S i. 5.4. end for all. 7 2J 6. end for. The algorithm consists of two for loops, the first beginning on line 2 and the second on line 5. The first loop of the algorithm sets ai = — 1 for all processors i with Vi = Vi-j. This is achieved in p iterations by starting with sets of processors of size 2 and doubling the size of the set in every iteration. During iteration j, there are 2 p~3~l sets, and processors i with i mod 28 + 1 = 0 are the leaders of the respective sets. At the end of the j th iteration of the loop, every leader processor i has A D D R E SSi set to the maximum index r in the same set such th at Vr = V, and D ATAi set to the maximum Vr in the set. 
Also, processor i has a, set to — 1 if and only if a second processor r < i exists in the same set with VT = V{. Therefore, at the end of the loop, every processor % has a * - set to — 1 unless no processor r < i exists such th at VT = Vi. Note th at communication takes place only between group leaders during every iteration, which are always apart by a power of 2. The second loop of the algorithm, beginning on line 5, sets a,- for each processor whose index i is the minimum among all the processors holding the same switching m atrix. During the first iteration of the loop, the processors are divided into two sets, the first containing processors 0 ,1 ,..., P / 2 — 1, and the second containing P/2, P/2 -j- 1 ,...,P — 1. ADDRpj 2 in processor P/2 contains the largest index r such that Vr = Vp/ 2 ■ The sequence of swap operations in steps 5.1.1 and 5.1.2 moves this data and address to the processor s with the smallest index s, if any, j among the processors 0,1,..., P/2 — 1 with data = Vp/2i as is then set to r. j Each iteration of the loop successively divides the sets in half and propagates the ! smallest d ata item in the second half to the first. At the end of the loop, all a ,’s are correctly set. Each swap operation takes 0 ( N ) time. The recursive doubling algorithm there fore can be completed in 0 ( N log L ) time. 3.5 T S A a lg o rith m s for S S /T D M A S y ste m s In this section, we apply the results from the previous sections to obtain effi cient, parallel TSA algorithms for a simpler class of TDM switching systems. This class of switching systems, known as satellite-switched time-division multiple-access 73 (SS/TDM A) systems, can be viewed as a special case of hierarchical switching sys tems. A sequential algorithm for time-slot assignment in SS/TDM A systems is described in [22] and has a time-complexity of 0 ( M 45), where M is the num ber of users. Here we describe how the methods developed in the earlier sections can be applied to SS/TDM A systems to yield efficient parallel-algorithms for TSA. A simple SS/TDM A system consists of a number of input and output users con nected directly to a nonblocking central switch. Figure 1.3 illustrates an SS/TDM A system with 4 input-users and 3 output-users. For simplicity, we assume th at the num ber of input users, M , of an SS/TDM A system equals the num ber of out put users, which in turn equals the bandwidth of the central switch. A simple SS/TDM A system differs from an HSS in that there are no m ultiplexers (demul tiplexers) at the input (output) side of the switch. An SS/TDM A system can be viewed as a degenerate case of the hierarchical switching system with M m ulti plexers and M demultiplexers where 1. P{ = Ki = 1, for 1 < i < M; and | 2. P / = K[ — 1, for 1 < i < M . In other words, each multiplexer (demultiplexer) has exactly one input and one output and hence, is redundant. Thus, the problem of finding an optim al TSA for ! an SS/TDM A system becomes a special case of the TSA problem for an HSS. In an SS/TDM A system whose traffic requirements are given by the M x M m atrix T , the TSA problem consists in finding positive integers L\, L 2, ..., Lm, and zero-one m atrices Si, S 2 , ..., Sm subject to the following constraints: 1. Each Si, known as a switching matrix, has at most one 1 in every row (col umn); 2. T — Li ■ Si + L 2 • S 2 + -------( - Lm • S m’ , and 3. L = Li + L-i + • • • + Lm, known as the length of the time-slot assignment, is minimum over all decompositions of T that meet the two previous constraints. 
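As a sanity check on the first two constraints above, the following sketch verifies that a proposed assignment, given as (switching matrix, L_i) pairs, uses only 0-1 matrices with at most one 1 in every row and column and sums back to T; constraint 3, minimality of the length, is not checked here since it is settled by the length formula given next. The function name is illustrative.

```python
def is_valid_sstdma_tsa(T, assignment):
    """Check constraints 1 and 2 for a simple SS/TDMA time-slot assignment.

    assignment is a list of (S, Li) pairs; returns True iff every S is a 0-1
    matrix with at most one 1 per row and per column and sum_i Li * S_i == T.
    """
    M = len(T)
    total = [[0] * M for _ in range(M)]
    for S, Li in assignment:
        for i in range(M):
            if any(v not in (0, 1) for v in S[i]) or sum(S[i]) > 1:
                return False
        for j in range(M):
            if sum(S[i][j] for i in range(M)) > 1:
                return False
        for i in range(M):
            for j in range(M):
                total[i][j] += Li * S[i][j]
    return total == T
```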
74 Given any M x M traffic m atrix T for an SS/TDM A switching system, a TSA of length L exists for T if and only if the following conditions are satisfied. M M •S = £ £ * « < Vi; 2 = 1 j=l M Ri = ^2 tij < L , for 1 < i < M; i=i M Cj = < L , for 1 < j < M. 2 — 1 Hence, for any M x M traffic m atrix T, the length of an optim al TSA is given by L = max{ \S /M ~ \; Ri (i = 1,..., AT); Cj (j = 1,..., M )}. Because an SS/TDM A switching system can be treated as a special case of an HSS, TSA algorithms for the former follow directly from those developed for the latter. Nevertheless, a much simpler network model can be obtained by elim inating the nodes and edges corresponding to the multiplexers and demultiplexers in the model of an HSS. Figure 3.5 illustrates such a simplified traffic network for an SS/TDM A system. This traffic network, G j Li, consists of 2M + 2 nodes, arranged in four levels, with arcs directed from level i to level i + 1. Level 1 consists of the source node s and level 4 consists of the sink node t. Nodes in level 2 (level 3) model the input (output) users with their traffic requirements. These nodes represent the rows and columns of the traffic m atrix, respectively. Nodes ri, r 2, ..., tm in level 2 represent the M rows and nodes Ci, c2, ..., cm in level 3 represent the M columns. An arc exists from rt - to Cj if and only if the entry L j in T is non-zero. There is an arc from s to every node r; in level 1; correspondingly, there is an arc from every node Cj in level 3 to the sink-node t. A backward arc (t, s) completes the network model. Each arc in G'T Ll is assigned a capacity and lower bound as in the case of the HSS model. Arcs (s, rt) and (cj, £)) are assigned a capacity of L\. The lower bound on arc (s, r t) is Ri — L + L\ if R{ > L — L\ , and zero otherwise; similarly, the lower bound on arc (cj,t) is Cj — L + L\ if Cj > L — L\ , and zero otherwise. Every arc of the form (ri,Cj) has capacity \t{^L\/L\ and lower bound [tijL i/L \. Arc (t,s) 75 L e v e l 1 2 3 4 Figure 3.5: Traffic network G'TLl corresponding to an SS/TDM A system. has a capacity of M L \ ; its lower bound is S — M (L — L{) if S' > M (L — L\), and zero otherwise. By the methods used in Section 3.3, we can show that the traffic network G'TLl has a valid circulation which corresponds to a traffic m atrix of length L\. Hence, the parallel algorithms of Section 3.4 can be directly applied to the traf fic network G'T Ll of an SS/TDM A system. The asym ptotic time-complexities of these algorithms remain unchanged in spite of the simpler model. Thus, the T/2-processor algorithm has a time-complexity 0 ( M 3 log L) and the general P- processor algorithm has a time-complexity of 0 ( M 3 log P + M 2 ,5 min(L/P, M 2)). W hen L = 0 ( M k) for some k > 2, the X/2-processor parallel algorithm runs in 0 ( M 3 log M) time; the massively-parallel version run in 0 ( M 2 log2 M ) time. Therefore, these algorithms compare favorably with the sequential algorithm of [22], which has a time-complexity of 0 ( M 4 5). The techniques developed in this chapter are general enough to be applied to a variety of other switching systems as well. In the next chapter, we apply these techniques to a different class of SS/TDM A switching systems called the SS/TDM A systems with variable bandwidth beams. 
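The optimal-length expression above translates directly into code; a small sketch, assuming T is given as an M x M list of lists:

```python
from math import ceil

def sstdma_optimal_length(T):
    """Length of an optimal TSA for a simple M x M SS/TDMA system:
    L = max( ceil(S / M), max row sum R_i, max column sum C_j )."""
    M = len(T)
    row_sums = [sum(row) for row in T]
    col_sums = [sum(T[i][j] for i in range(M)) for j in range(M)]
    S = sum(row_sums)
    return max(ceil(S / M), max(row_sums), max(col_sums))
```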
76 C h a p ter 4 T im e -S lo t A ssig n m en t A lg o rith m s for S S /T D M A S y ste m s w ith V a ria b le-B a n d w id th B e a m s 4.1 In tr o d u c tio n In this chapter, we present efficient sequential and parallel algorithms for con structing optim al time-slot assignments for a general class of TDM switching systems employed in satellite communication. This class of systems, known as satellite-switched time-division multiple-access (SS/TDM A) systems with variable- bandwidth beams, was first analyzed by Gopal, et al. [20]. An SS/TDM A system with variable-bandwidth beams, illustrated in Figure 4.1, has M uplink beams and N downlink beams. Each of these beams is composed of several communication- channels of equal bandwidth. For simplicity, the bandw idth of each channel is assumed as unity. The number of channels in uplink i is denoted by a* and the num ber of channels in downlink j by 0j. Each uplink beam i is demultiplexed into its ai channels, each of unit bandwidth, by an on-board demultiplexer. The demultiplexed channels are then switched to the respective output channels by an on-board switch. Finally, a multiplexer combines the channels destined to downlink j and transm its as a single downlink beam j. Let Sa denote 52* a*, the sum of the bandwidths of the uplink beams to the satellite. Similarly, let Sp denote Yhj flji the sum of the bandwidths of the downlink beams. Observe th at Sa > M and Sp > N. The switch can connect each of its 77 U plink b e a m s 1------- 2 --------- M - Dem ulti- p le x e r a i O LM S w i t c h £ Pn M u ltip le x e r Downlink beams 1 2 N Figure 4.1: An SS/TDM A system with variable-bandwidth beams. Sa input channels with any of its Sp output channels. A limit exists, however, on the m axim um traffic that can be handled by the switch in one time-slot. This lim it K , known as the maximum traffic handling capacity of the satellite, satisfies the condition 1 < K < rain(Sa , Sp). As discussed in [20], this switching structure is general enough to cover a wide range of TDM switching configurations. For example, when all a ,’s and /7,’s are unity, we obtain a simple SS/TDM A switching structure described by Inukai [22]. Likewise, if a, = 1 for all i, and ftj — ft for all j , the SS/TDM A system with variable-bandwidth beams reduces to a system considered by Bongiovanni, et al. [10]. The reader is referred to [20] for a discussion on other possible system configurations that are modeled by an SS/TDM A system with variable-bandwidth beams. Owing to the complex set of constraints to be satisfied by any optim al TSA for SS/TDM A system with variable-bandwidth beams, the time-slot assignment algorithms presented in [22, 9, 10, 8, 11, 12] for other TDM switching systems can not be directly applied to solve the TSA problem for these systems. The only known algorithm in the literature for finding an optim al TSA for these switching systems is due to Gopal, et al. [20] and requires 0(S^Spim n(L,(Sa -f- *S /?)2))) sequential-time. The contribution of this chapter, towards finding an optim al TSA for SS/TDM A systems with variable-bandwidth beams, is two-fold: 78 1. We present a sequential algorithm for the TSA problem th at runs in 0 ((M + TV )3 min(T, MNct)) time, where a = m a x { a i,. . . , , /?i,. . . , {3jy, Sa — K, Sp — K }. This algorithm is asymptotically faster than the algorithm of [20] which requires 0 ( S 2Sp min(T, (Sa + •S '/?)2))) time. 
Moreover, we provide an upper-bound of min(L, M N ( 1 + a)) on the num ber of switching matrices generated by the algorithm; this upper-bound compares favorably with the upper-bound of min(T, (Sa + Sp) 2 — SaSp — K (S a -f S'/?)) for Gopal, et al.’s algorithm. 2. Additionally, we present efficient parallel-algorithms for the TSA problem. Our first algorithm, which runs on the PRAM model of parallel com puta tion, uses L f 2 processors and requires 0 ((M + TV)3 log L) tim e, where L is the length of any optimal TSA. We then generalize this algorithm for P < L/2 processors and show an efficient implementation on the hyper cube multiprocessor in 0 ((M + TV)3 log P + (M + N ) 3 m in(T /P, M N a )) time. A massively-parallel version of the algorithm runs in 0 ((M + TV)2 \og(M + TV) log L) tim e on (M + N )L/2 processors. Our TSA algorithms are based on formulating the TSA problem as a network- flow problem, namely that of finding a circulation in a graph model representing the switching system. This approach was used in C hapter 3 in the context of hierarchical switching systems. Unfortunately, the TSA problem for SS/TDM A systems with variable-bandwidth beams can not be form ulated as a special case of the TSA problem for hierarchical switching systems. Therefore, the algorithms developed in Chapter 3 can not be directly applied to the problem of finding an optim al TSA for SS/TDM A systems with variable-bandwidth beams. The rest of this chapter is organized as follows. We introduce the necessary definitions and notations in Section 4.2 and present the m athem atical formulation of the TSA problem in an SS/TDM A system with variable-bandwidth beams. In Section 4.3, we describe a sequential TSA algorithm based on recursively decom posing the given traffic m atrix into two traffic matrices. This recursive algorithm runs in 0 (L (M + TV)3) sequential-time where L is the length of any optim al TSA for the traffic m atrix under consideration. Although this algorithm is asym ptot ically slower than the sequential algorithm due to Gopal, et al. [20], its design 79 provides a theoretical foundation on which our later algorithms are based. Sec tion 4.4 presents a sequential algorithm for the TSA problem that runs in 0 ((M + N )3 min(L, M N a )) time, where a = m a x { a i,. . . , aj^, /?i,. . . ,j3jy, Sa ~ K, Sp — K }. Finally, we indicate how we can design parallel algorithms similar to those presented in Section 3.4 for finding an optim al TSA in SS/TDM A systems with variable band w idth beams. 4 .2 P r o b le m F orm u lation An SS/TDM A system with variable-bandwidth beams has M uplink beams and N downlink beams, where uplink i has bandwidth a, and downlink j has band w idth (3j. We now present a m athem atical formulation of the problem of finding an optim al time-slot assignment in such systems. This formulation is based on m aterial presented in [20]. Each uplink beam i has a certain number of traffic units to be transm itted to each downlink beam j. Let tij (1 < i < M, 1 < j < N) denote the num ber of time-slots required to transm it the traffic from uplink i to downlink j. Note th at tij are non-negative integers. Thus the M x N m atrix T = [tij], known as the traffic matrix, denotes the connection demands from the uplink beams to the downlink beams of the system. Let Ri (Cj) denote the sum of the entries in row i (column j) of the traffic m atrix and let S denote the sum of all entries in traffic m atrix T. 
The TSA problem consists in scheduling the connection demands, presented in traffic m atrix T, using the minimum number of time slots. Furtherm ore, any such schedule should meet the following constraints in every tim e slot. 1. Uplink i (downlink j) can transm it (receive) a maximum of ai (f3j) units of traffic, and 2. The total amount of traffic from uplinks to downlinks does not exceed K , the maximum traffic handling capacity of the satellite. In m athem atical terms, the time slot assignment problem for a given traffic m atrix T is finding positive integers L \ ,... ,L m and matrices subject to the following constraints. 80 1. Each Sp, known as a (a, fl) switching matrix (o/?SM, for brevity), is an M x N m atrix with non-negative integer entries such th at the sum of all entries in row i (column j ) does not exceed a* 2. The sum of all entries in each Sp does not exceed K , the maxim um traffic handling capacity of the satellite; 3. T — L\.S\ + • • • + Lm.Sm; and 4. L = T i+ - • ■+Lm, known as the length of the tim e slot assignment, is m inimum over all decompositions of T that meet the three previous constraints. It was shown by Gopal, et al. [20] that a TSA of length L exists for a traffic m atrix T if and only if the following three conditions are satisfied. s = eJL i t v < k l - (4.i) Ri = E j l i tij < ctiL, for 1 < * < M\ (4.2) Cj = Efci Uj < /3jL, for 1 < j < N. (4.3) We use the notation ‘ [xj ’ to denote the largest integer less than or equal to x and ‘M ’ to denote the smallest integer greater than or equal to x. From equa tions (4.1)-(4.3), it is easy to observe that the length of an optim al TSA for any traffic m atrix T is given by L = m ax{[\Syir|; {Ri/ai] (i = 1 ,.. .,M ); \Cj/pf\ (j = 1 ,. . . , N)}. (4.4) In the rest of this chapter, we study the problem of finding an optim al TSA of length L given by equation (4.4), for any given traffic m atrix T. We say th at a traffic m atrix T is of length L if T has an optimal TSA of length L. In the rest of this chapter, finding an optimal TSA refers to finding an optim al TSA in an SS/TDM A system with variable-bandwidth beams, unless otherwise mentioned. In the next section, we present a divide-and-conquer approach to find an op tim al TSA for any given traffic m atrix. In this approach, the problem of finding an optim al TSA for a traffic m atrix T is divided into two subproblems of finding 81 Level 1 2 3 4 Figure 4.2: General traffic network G t ,l x corresponding to an SS/TD M A system with variable-bandwidth beams. optim al TSAs for traffic matrices T\ and Ti such that T = T\ + Ti. Such a de composition technique forms the theoretical foundation based on which we present efficient sequential (Section 4.4) and parallel algorithms for the TSA problem. 4 .3 A R ecu rsiv e T S A A lg o rith m In this section, we present a TSA algorithm based on recursively decomposing the traffic m atrix. The idea is to divide any given traffic m atrix T of length L into two traffic m atrices 7\ and Ti such that: 1. T = Ti + Tj, and 2. L — Li + L2, where L\ and Li denote the lengths of traffic m atrices Ti and Ti, respectively. We use a network-flow model to obtain such a decomposition, for any 0 < L\ < L, in 0 ((M + N )3) sequential time. The first step of the decomposition algorithm consists in generating a graph model associated with the traffic m atrix T. We term this network a traffic network. 
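Equation (4.4) is likewise simple to evaluate; the following sketch (with illustrative names) computes the optimal TSA length for a given traffic matrix and the system parameters α, β and K:

```python
from math import ceil

def vb_optimal_length(T, alpha, beta, K):
    """Length of an optimal TSA for an SS/TDMA system with variable-bandwidth
    beams, per equation (4.4):
    L = max( ceil(S/K), max_i ceil(R_i/alpha_i), max_j ceil(C_j/beta_j) )."""
    M, N = len(T), len(T[0])
    R = [sum(row) for row in T]
    C = [sum(T[i][j] for i in range(M)) for j in range(N)]
    S = sum(R)
    return max(ceil(S / K),
               max(ceil(R[i] / alpha[i]) for i in range(M)),
               max(ceil(C[j] / beta[j]) for j in range(N)))
```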
Given a traffic m atrix T of length L, and a number L\ in the range 0 < Li < L, the 82 traffic network corresponding to T and L\ is obtained as follows: The traffic network is a directed graph that consists of a total of M + N + 2 nodes. Figure 4.2 illustrates the general structure of a traffic network. The nodes are arranged on four levels, with arcs directed from level i to level i + 1. Level 1 consists of the source node 5 and level 4 corresponds to the sink node t. The nodes in levels 2 and 3 with the associated arcs model the uplink and downlink beams, respectively, with their traffic requirements. More specifically, the nodes rj, r 2, ..., tm in level 2 represent the M rows of the traffic matrix; similarly, the nodes ci, C 2, -.., cjv in level 3 represent the N columns. An arc exists from node r t - to node Cj if and only if the entry £ * j in T is non-zero. Each node r,- has an incoming arc from the source node s. Similarly, each node cj has an outgoing arc to the sink node t. Finally, a backward arc (£, s') from the sink to the source is introduced to close the network. There are two quantities associated with each arc of the traffic network G t,l1‘ - the first quantity, known as capacity, indicates the maximum flow th at can pass through the arc and the second, known as lower bound, indicates the minimum am ount of flow that m ust pass through the arc. Table 4.1 lists the lower bounds and capacities assigned to the arcs in Gt,Li• Each arc (s,rj) has capacity a,L i; similarly, each arc (cj, t) has capacity /3jL4. The lower bound on arc (s, r 4 ) is Ri — ai(L — L \ ) if Ri exceeds cti(L — L\), and 0 otherwise; similarly, the lower bound on arc (cj, t) is Cj — 0j(L — L\) if Cj exceeds f3j(L — L j), and 0 otherwise. Every arc of the form (r*, Cj) has capacity \tijLi/L] and lower bound [tijL i/L \. The j arc (£, s) has capacity KL\. The lower bound on arc (£,.s) equals S — K (L — L\) if S, the sum of all entries in the traffic m atrix T, exceeds K{L — L \ ) and 0 otherwise. Figure 4.3 shows an example traffic-matrix T and the corresponding traffic I network. The TDM switching system of this example has six uplinks and five downlinks. Different param eters of this switching system are as follows: cti — 3, a 2 = 5, a 3 = 4, a 4 = 5, a 5 = 6, a 6 = 5, = 7, f3 2 = 4, /?3 = 8, fl4 = 6, j3 5 = 5, and K — 20. Using these param eters, it is easy to observe that the length of any optim al TSA for the given traffic m atrix T equals 10. The traffic network shown in the figure is for L\ = 5. 83 axe lower bound capacity ( s ,r s -) Ri — ai(L - L i), if Ri > ai(L — L t)\ 0, otherwise. d i L \ (ri, cj) [tijLil L\ IUj L i / L C j - M L - h ) , if Cj > f3j(L - Lr); 0, otherwise. PjLi (M ) S - K(L - Lx), if S > K (L - Tx); 0, otherwise. K L X Table 4.1: Lower bounds and capacities of the arcs in the network Gt,Li • A circulation in the traffic network is an assignment of numbers (flows) to the arcs such th at (i) the number assigned to each arc is not smaller than its lower- bound and not greater than its capacity; and (ii) the flow is conserved at each node, th at is, for each node, the sum of the flows assigned to incoming arcs is equal to the sum of the flows on outgoing arcs. The value of the circulation is the flow assigned to the arc (<, s) by the circulation. The lower bounds and capacities on the arcs of the traffic network Gt,Li have been chosen with the aim of finding a circulation th at corresponds to a traffic m atrix T\ of length L \ . 
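The arc bounds of Table 4.1 can be generated in the same way as those of Table 3.1; the sketch below (names illustrative) builds them for given T, L, L_1, α, β and K, after which the same circulation computation as in Chapter 3 applies unchanged.

```python
def vb_traffic_network(T, L, L1, alpha, beta, K):
    """Arcs of G_{T,L1} as {(u, v): (lower bound, capacity)}, per Table 4.1."""
    M, N = len(T), len(T[0])
    R = [sum(row) for row in T]                              # uplink sums R_i
    C = [sum(T[i][j] for i in range(M)) for j in range(N)]   # downlink sums C_j
    S = sum(R)

    arcs = {('t', 's'): (max(0, S - K * (L - L1)), K * L1)}
    for i in range(M):
        arcs[('s', ('r', i))] = (max(0, R[i] - alpha[i] * (L - L1)),
                                 alpha[i] * L1)
    for j in range(N):
        arcs[(('c', j), 't')] = (max(0, C[j] - beta[j] * (L - L1)),
                                 beta[j] * L1)
    for i in range(M):
        for j in range(N):
            if T[i][j] > 0:
                arcs[(('r', i), ('c', j))] = (T[i][j] * L1 // L,          # floor
                                              (T[i][j] * L1 + L - 1) // L)  # ceil
    return arcs
```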
The traffic m atrix Ti has the same structure as the original traffic m atrix T. Hence, in Ti, the sum of all entries should be at most equal to K L\. This is achieved by assigning a capacity of K L \ to the arc (t , 5). At the same time, the total traffic in the residual m atrix T2 = T — T\ should not exceed K (L — L \ ) so th at Ti can be scheduled in at most K (L — L\) time-slots. The lower bound of S — K (L — L\) on the arc (t,s ) achieves this objective. The sum of all entries in row i of T\ should not exceed to ensure th at the length of T\ does not exceed L\. This is achieved by assigning a capacity of ctiL\ to the arc (s, r,). For the same reason, a capacity of (3jL\ is placed on arc (cj, t). Finally, the sum of all entries in row i of Ti should not exceed a-i(L — L x) to ensure that the length of Ti does not exceed L — L\. Placing a lower bound of Ri — ai(L — L\) on arc (s, r,-) achieves this objective for the row-sum; a lower bound of Cj — ftj{L — L\) on arc (cj,t) serves a similar purpose for the column-sum. 84 16 13 4 15 19 11 10 23 25 31 6 20 7 Figure 4.3: A traffic m atrix T and the corresponding traffic network G t ,l x for L x = 5. Any circulation in the traffic network Gt,Li corresponds to a traffic m atrix T\ of length L\. The entry at the z'th row and j th column of Ti is simply the flow assigned to the arc (r{,Cj) by the circulation. Therefore, the following algorithm can be used to decompose a traffic m atrix T of length L into two traffic m atrices Ti and T-i of lengths L\ and L 2, respectively, such th at T = T\ + T2 and L = L\ + T2. Decom position Algorithm(T, L \ ) 1. Generate the traffic network G t ,l x corresponding to T and L\. 2. Find a circulation in the traffic network and generate the corresponding traffic m atrix T\. 3. T2 <- T - T \ . 85 Figure 4.4: A circulation in the network of Figure 4.3. The following lemma states the existence of a circulation in the traffic network Gt,L\ • The proof of this lemma is so similar to that of Lemma 2 th at it is om itted. L e m m a 4 Let T be a traffic matrix of length L and let L\ be any integer such that 0 < L\ < L. Further, let G t ,l x be the traffic network corresponding to the matrix T and L i . A circulation that satisfies the lower-bound and capacity constraints exists in G t ,l x ■ Once a circulation is found in the traffic network Gt,lX i the corresponding m atrix Ti is constructed by setting its ijth entry equal to the flow-value assigned j to the arc (rt -, c?) by the circulation. Next we prove th at the traffic m atrix Ti constructed by this m ethod has a length of L\ and the traffic m atrix T — Ti has a length of L — L \. L e m m a 5 Let T be a traffic matrix of length L and let G t ,l x be the traffic network for some L\ in the range 0 < L\ < L. If T\ is the traffic matrix constructed from any circulation in G t ,l x > then T\ is of length L\ and T — T\ is of length L — L\. P ro o f. Any circulation in G t ,l x must satisfy the upper bound on each of its arcs. i Thus, in T\, the sum of all entries is at most KL\\ the sum of all entries in row 86 of i (column j) of T\ is at most <*iLi (fljLi). Hence, T\ is a valid traffic m atrix of length L\. To show th at Ti = T — T\ is a valid traffic m atrix of length L — L\, consider the lower bounds on each arc of Gt,li- Since the flow through arc (t, s) in Gt,lx should be at least equal to S — K (L — T i), the sum of all entries in T — T\ can not exceed K (L — L \ ). 
The lower bound of R_i − α_i(L − L_1) on arc (s, r_i) forces the sum of all entries in row i of T_2 to be at most α_i(L − L_1). Similarly, the sum of all entries in column j of T_2 cannot exceed β_j(L − L_1). Therefore, by equations (4.1)-(4.3), T_2 is a valid traffic matrix of length L − L_1. This concludes the proof of the lemma. ■

As an example, Figure 4.4 shows a circulation of value 100 in the network of Figure 4.3. The corresponding traffic matrix T_1 is given by

T_1 =
  8   0   6   0   0
  2   0   8   0   0
 10   0   0   5   5
  0  11  13   0   0
 15   0   0   3   0
  0   0  10   0   4

and has a length of 5.

The following theorem proves the correctness of the decomposition algorithm and estimates its time complexity.

Theorem 10  The decomposition algorithm correctly divides any traffic matrix T of length L into two traffic matrices T_1 and T_2 of lengths L_1 and L_2, respectively, such that T = T_1 + T_2 and L = L_1 + L_2. Further, it runs in O((M + N)^3) time.

Proof. Correctness of the decomposition algorithm follows from Lemmas 4 and 5. To estimate the time complexity of the algorithm, observe that the number of nodes in the traffic network equals M + N + 2 and the number of edges equals O(MN). Thus, construction of the traffic network, which constitutes Step 1 of the algorithm, can be achieved in O(MN) time. Finding a circulation through the traffic network can be achieved in O((M + N)^3) time using one of the several max-flow algorithms in the literature [32, 50] (for a brief survey, see [4]). Once a circulation is found, generating the traffic matrix T_1 takes O(MN) time. Therefore, Step 2 of the decomposition algorithm requires O((M + N)^3) time. Step 3 of the algorithm, which computes the difference between two M x N matrices, requires O(MN) time. Hence, the decomposition algorithm runs in O((M + N)^3) time. This concludes the proof of the theorem. ■

Now we are in a position to formulate the recursive sequential algorithm that finds an optimal TSA for any traffic matrix T of length L.

Recursive TSA Algorithm (T, L)

1. If L = 1, stop (Comment: T is a switching matrix).
2. Execute Decomposition Algorithm(T, ⌈L/2⌉). Let T_1 and T_2 be the traffic matrices generated as a result of the decomposition, of lengths ⌈L/2⌉ and ⌊L/2⌋ respectively.
3. Recursive TSA Algorithm (T_1, ⌈L/2⌉).
4. Recursive TSA Algorithm (T_2, ⌊L/2⌋).

Observe that every invocation of the decomposition algorithm divides the original traffic matrix T into two traffic matrices of approximately equal lengths. Hence, every traffic matrix of length greater than one gets subdivided into several traffic matrices, each of length one, owing to the recursive nature of the above algorithm. Thus the recursive TSA algorithm generates an optimal TSA in a finite number of steps for any given traffic matrix. To compute the running time of the algorithm, observe that the first invocation of the algorithm with a traffic matrix T of length L generates two subproblems: (i) finding an optimal TSA of length ⌈L/2⌉ for T_1, and (ii) finding an optimal TSA of length ⌊L/2⌋ for T_2. The amount of time taken to obtain such a decomposition equals O((M + N)^3). Hence, the total running time of the algorithm on any traffic matrix of length L is given by O(L(M + N)^3). Note that this recursive TSA algorithm runs in time that is proportional to the length of the optimal TSA of the traffic matrix. In other words, the running time of the algorithm depends on the entries in the traffic matrix.
Furthermore, the num ber of (a, (3) switching matrices generated by the recursive TSA algorithm equals T, the length of the traffic m atrix. Hence, in general, this algorithm is inferior to the 88 algorithm presented by Gopal, et al. [20]. The latter algorithm has the following features: 1. The worst-case running-time of the algorithm is independent of the entries in the input traffic-matrix. 2. The num ber of (a, 0) switching matrices generated by the algorithm is upper bounded by (Sp + S a ) 2 - SpSa - K(Sp + Sa), where Sa = YliZj4 ai, Sp = J2J jZi 0j, and K is the traffic handling capacity of the satellite. This is again independent of the entries in the traffic m atrix. The recursive TSA algorithm dose not possess these attributes and is therefore unattractive for a sequential implementation. The decomposition approach, how ever, is ideally suited to a parallel implementation. For sequential im plem entation, we take a slightly different approach which results in both the running tim e and the num ber of switching matrices generated, in the worst case, to be independent of the entries in the traffic m atrix. This sequential algorithm is the topic of the next section. 4 .4 A n E fficien t S eq u en tia l T S A A lg o r ith m Before we present a sequential algorithm th at is faster than the algorithm in [20], let us consider a special case of the decomposition algorithm, presented in the previous section, with param eter L x — 1. In this special case, the decomposition algorithm divides the given traffic m atrix T of length L into two traffic m atrices T' and T", of lengths 1 and L — 1, respectively. Note that T' is an (a, 0) switching m atrix and hence requires no further decomposition. We can then decompose T" further into two traffic matrices, one an «/?SM and the other a traffic m atrix of length L — 2. By repeating this process, we obtain the iterative sequential algorithm presented below. First Iterative TSA Algorithm (T ) (Comment: Generates a0 SM’s Si, S 2 , • • •, S l , where L is the length of T.) 89 1. * < -1 . 2. If T is of unit length, then set T ,- * — T and stop. (Comment: T is an (a, (3) switching m atrix.) 3. Perform Decomposition Algorithm(T, 1). Let T ‘ be the traffic m atrix of unit length that is generated by the decomposition algorithm. 4. Si < — T i * — i + 1. (Comment: T' is an ct/?SM.) 5. T T — T 1 ; goto Step 2. It is easy to observe that each iteration of the above algorithm extracts exactly one c*/?SM and hence the algorithm in all requires L iterations, where L is the length of the traffic m atrix T. Thus the algorithm runs in tim e 0 (L (M + AQ3), which is same as the time-complexity of the recursive TSA algorithm presented in the previous section. The first iterative TSA algorithm is inefficient owing to the fact th a t the num ber of iterations required by this algorithm is equal to the length of the traffic m atrix. Hence, one way to improve this algorithm is by reducing the num ber of iterations. T hat is, efficiency can be achieved by extracting as many a ^ S M ’s as possible in each iteration. Observe th at Step 3 of the first iterative algorithm generates T \ an a/3SM. Now we present a technique to extract as many switching matrices th at are identical to T' as possible in every iteration of the algorithm. 
In other I words, our goal is to find the largest possible multiplicative constant, say r, for the switching m atrix T' and consider the m atrix T — rT' instead of T — T' for the next iteration; this has the effect of performing the work equivalent to r iterations in a single iteration. To find the multiplicative constant r, we simply consider the ratios L tifj for all non-zero and select the minimum among them . This modified algorithm is given below. Second Iterative TSA Algorithm (T) (Comment: Generates a (3 SM’s S i ,. . . , Sp, and p positive constants L \ , . . . , Lp such th a t T — J2i=t LiS{.) 1. * < - 1. 90 2. If all entries of T are zeroes, then stop. | i I 3. Perform Decomposition Algorithm(T, 1). Let T' be the traffic m atrix of unit 1 t length that is generated by the decomposition algorithm. 4. r *- mmilUj/t'ijl | t\j > 0}. 5. Si < — T'; Li 4 — r; i < — i + 1. t t 6. Set T < — T — rT'; goto Step 2. j Although the second iterative TSA algorithm appears to be correct, it has a prob lem . If we choose r as shown in Step 4 of the algorithm, the m atrix T — rT' may jhave a length greater than L — r. This is because our choice of r does not enforce the constraints of equations (4.1)-(4.3) on the residual m atrix T — r T Therefore, l jthe algorithm does not always yield an optim al TSA. This problem, however, does ,not arise if the original traffic m atrix T satisfies the equations (4.1)-(4.3) with the inequalities in them changed to equalities. Therefore, the problem can be remedied ; by adding dummy entries to T such that the following constraints are satisfied. j I f I i I s = tij = KL-, (4.5) R % = HjLi U,j = Oi-L, for 1 < i < M ; (4.6) Cj = T,iii Us = PjL, for 1 < j < N . (4.7) i i I | A system atic procedure to add dummy traffic to any given traffic m atrix T is j p described in [20]. Given any M x N traffic m atrix T, this procedure generates an (M + 1) x (N + 1) traffic m atrix T from T by appending an extra row and an j extra column to T. For the new row, the value of the param eter on is chosen as &M+1 = Sp — K; and for the new column, / 3n + i = Sa — K. Dummy traffic is then ! added to T so that it satisfies the following conditions: " i I 1. Ri = OiL, for 1 < i < M + 1; and i i I j 2. Cj = PjL, for 1 < j < N + 1. Hence, the sum of all entries in the (M + 1) x (N + 1) traffic m atrix T is given by i j (Sa + S /3 — K)L, and the sum of all entries in the M x N subm atrix spanned by ; the first M rows and the first N columns of T equals KL. Note th at the length of , A I T is the same as that of T. The procedure to add dummy traffic requires 0 ( M N ) i time. Hereafter, we assume th at the preprocessing step of adding dum m y traffic has been performed on the traffic m atrix and the input to the second iterative TSA i algorithm is the modified (M + 1) x (N + 1) m atrix. f The correctness of the second iterative TSA algorithm follows from the correct ness of the decomposition algorithm and the observation th at the m atrix T — rT 1 (Step 6) is of length L — r, where L is the length of m atrix T. The next im portant l question th at we would like to answer is: Does the num ber of iterations required j iby the second iterative TSA algorithm grow with the length of the traffic m atrix? j :In other words, can we show an upper bound on the num ber of iterations that is \ independent of the entries in the traffic m atrix? The following theorem answers j j this question in the affirmative. 
Theorem 11 The second iterative sequential algorithm requires at most min(L, MN(1 + a)) iterations, where a = max{α_1, ..., α_M, β_1, ..., β_N, S_α − K, S_β − K}, and runs in O((M + N)^3 min(L, MNa)) time.

Proof. As before, let S_α = Σ_{i=1}^{M} α_i and S_β = Σ_{j=1}^{N} β_j. As mentioned before, we assume that the second iterative TSA algorithm receives traffic matrices T, of dimensions (M + 1) × (N + 1), such that they satisfy the following conditions:

1. α_{M+1} = S_β − K and β_{N+1} = S_α − K.

2. R_i = α_i L, for 1 ≤ i ≤ M + 1, and C_j = β_j L, for 1 ≤ j ≤ N + 1.

Let a equal max{α_1, ..., α_M, β_1, ..., β_N, S_α − K, S_β − K}. We first prove the upper bound on the number of iterations. Let t^1_{i,j}, for 1 ≤ i ≤ M and 1 ≤ j ≤ N, denote the elements of T at the beginning of any iteration (Step 2 of the algorithm), and let t^2_{i,j} denote the elements of T at the end of the same iteration (at the end of Step 6 of the algorithm). Let t'_{i,j} denote the elements of matrix T' in the same iteration. At the end of the iteration, at least one of the following two statements holds:

Condition 1: There exist i, j such that t^1_{i,j} > a and t^2_{i,j} ≤ a.

Condition 2: There exist i, j such that t^2_{i,j} < t^1_{i,j} ≤ a.

To prove this claim, let us first assume that there exist i, j such that t^1_{i,j} ≤ a and t'_{i,j} > 0. In this case, t^2_{i,j} < t^1_{i,j} ≤ a and hence condition 2 above is satisfied. Next, let us assume the complement of the previous case; that is, for all i, j such that t'_{i,j} > 0, assume that t^1_{i,j} > a. In this case, at least one pair of values for i, j exists such that t^1_{i,j} > a, t'_{i,j} > 0, and r = ⌊t^1_{i,j}/t'_{i,j}⌋. Therefore,

t^2_{i,j} = t^1_{i,j} − r t'_{i,j} = t^1_{i,j} − ⌊t^1_{i,j}/t'_{i,j}⌋ t'_{i,j} < t'_{i,j} ≤ a,

and condition 1 is met.

The above conditions provide an upper bound on the number of iterations in the following manner. They imply that, in every iteration of the algorithm, at least one of the following events takes place on the elements of the traffic matrix T.

Event 1: A positive element in T, whose value is at most a at the beginning of the iteration, becomes smaller by at least 1.

Event 2: An element with value greater than a at the beginning of the iteration gets reduced to a value that is less than or equal to a.

Now, let us consider an element t_{p,q} in the initial traffic matrix T. Before this element becomes zero, it is subject to Event 1 at most a times and to Event 2 at most once. Further, in a particular iteration, there is at least one element in T that is affected by one of the events. Thus, the total number of iterations is upper-bounded by the product of (1 + a) with the number of elements in T. Further, the number of iterations is trivially upper-bounded by L, the length of the traffic matrix T. In each iteration O((M + N)^3) time is spent; thus the total running time of the second iterative TSA algorithm equals O((M + N)^3 min(L, MNa)). ■

It should be noted that the above upper bound on the number of iterations is very crude, for it assumes that in every iteration only one of the elements of the matrix is scheduled. In practice, the number of iterations, as well as the total running time of the algorithm, can be expected to be much lower. In comparison, the algorithm in [20] requires O(S_α^2 S_β^2 min(L, (S_α + S_β)^2)) time. If we assume that M = N, α_i = β_i = γ for all i, K = O(Mγ), and L ≥ MNγ, our algorithm requires O(M^5 γ) time whereas the algorithm in [20] requires O(M^5 γ^5) time.
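As a concrete rendering of the second iterative algorithm, the following minimal sketch shows how the iterations could be organized in code. It assumes a helper decompose_unit(T) playing the role of Decomposition Algorithm(T, 1) of Section 4.3, returning a unit-length (α, β) switching matrix extracted from T, and it assumes the dummy-traffic preprocessing of equations (4.5)-(4.7) has already been applied; the helper name and the NumPy representation are illustrative assumptions, not part of the algorithm as stated.

import numpy as np

def second_iterative_tsa(T, decompose_unit):
    """Sketch of the second iterative TSA algorithm.

    T is an integer traffic matrix to which dummy traffic has already
    been added, so that the equality constraints (4.5)-(4.7) hold.
    decompose_unit(T) is assumed to return a unit-length (alpha, beta)
    switching matrix T' extracted from T.  Returns lists (S, Lcoef)
    such that T equals the sum of Lcoef[k] * S[k].
    """
    T = T.copy()
    S, Lcoef = [], []
    while T.any():                       # Step 2: stop when T is all zeroes
        Tp = decompose_unit(T)           # Step 3: one unit-length alpha-beta SM
        mask = Tp > 0
        # Step 4: largest multiplier r = min over nonzero t'_ij of floor(t_ij / t'_ij)
        r = int(np.min(T[mask] // Tp[mask]))
        S.append(Tp)                     # Step 5: record the switching matrix
        Lcoef.append(r)                  # ... and its duration r
        T -= r * Tp                      # Step 6: schedule r slots at once
    return S, Lcoef

Each pass performs the work of r unit-length extractions at once, which is where the min(L, MN(1 + a)) iteration bound of Theorem 11 comes from.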
1 The decomposition algorithm presented in Section 4.3 can be used to develop parallel algorithms for finding an optim al TSA in SS/TDM A switching systems with variable-bandwidth beams. These algorithms are based on recursive decom position of the traffic m atrix into two matrices, each of length approxim ately half of the original m atrix, and are very similar to the parallel algorithms presented in Section 3.4 for the hierarchical switching systems. Hence, a parallel algorithm , similar to the one presented in Section 3.4.1, runs in 0 ((M + N ) 3 logL) tim e on .the EREW PRAM model of computation using exactly [L/2\ processors. The ! running tim e of this algorithm can be improved to 0 ((M + N ) 2 log(M -f N ) log L ) jby increasing the number of processors to (M + N)[L/2j processors. We can gen- jeralize this algorithm, using the techniques presented in Section 3.4.2, to P < L/2 processors, resulting in an algorithm with time-complexity 0 ((M + N ) 3 log P + (M + N )z m in(L/P, M Na)). The generalized algorithm can be implemented on a i P-processor hypercube system, using the techniques presented in Section 3.4.3. I The TSA algorithms proposed so far do not take into account the dependency ‘in the traffic demands between two consecutive frames. Instead, these algorithms j j re-com pute the entire TSA for each frame. In many cases, however, the traffic | demands change very little from frame to frame, and much effort is wasted in re- ! computing the entire TSA for each frame. In such a situation, it is far more efficient | to obtain an optim al TSA for the traffic demands in the current frame by suitably l changing the known, optimal TSA for the previous frame. Any algorithm that computes the time-slot assignment by making incremental changes to a previous I assignment to accommodate the changes in traffic is called an incremental TSA 'algorithm . In the next chapter, we develop incremental TSA algorithms for TDM I switching systems. 94 C h a p ter 5 jAn In crem en ta l A lg o rith m for T im e -S lo t A ssig n m e n t in T D M S w itch in g S y ste m s i i i j ! 1 5.1 In tr o d u c tio n ^ i ! I . . I (In this chapter, we present an incremental algorithm for constructing optim al time- | I ^ I islot assignments for the hierarchical switching systems. Our increm ental TSA al- j \ gorithm for the class of hierarchical switching systems is based on two fundam ental ideas: j 1. The TSA problem in a hierarchical switching system can be transform ed into I an equivalent TSA problem in a simple TDM switching system. Although , j this transform ation, by itself, does not result in an improved TSA algorithm . for the former, it results in a much simpler formulation of the problem and 1 enables application of our incremental approach. j 2. A correspondence exists between the problem of computing an increm ental ] ; TSA in a simple TDM switching system and that of performing rearrange- j m ents of connections in a three-stage Clos rearrangeable network. This cor- j respondence enables us to formulate a simple incremental TSA algorithm ’ by considering changes in traffic requirements between frames in the TDM t j system as equivalent to connections to be rearranged in the Clos network. t I . . . . . | iThus, our incremental TSA algorithm for hierarchical switching systems consists of j 1 l I two steps: (i) transforming the problem to a TSA problem in a simple TDM system, j [and (ii) applying the incremental TSA algorithm to the resulting simple TDM ! j system. 
Obviously, the second step by itself provides an incremental TSA algorithm jfor the class of SS/TDM A systems. The rest of this chapter is organized as follows. In Section 5.2, we describe how the problem of finding an optimal TSA in an HSS can be transform ed into that of finding an optim al TSA in a simple TDM switching system. This transform ation is achieved by adding dummy rows and columns to the traffic m atrix corresponding to the hierarchical switching system. In Section 5.3, we present the incremental algorithm for finding an optimal tim e slot assignment in a simple TDM switching system. The algorithm exploits the correspondence between the increm ental TSA | problem and the rearrangement problem in a Clos three-stage network. If c is the ! num ber of changes in the traffic demands between two consecutive frames, and N I is the size of the central switch, the algorithm runs in 0 (cN) time. | t 5.2 E q u iv a len ce o f th e T S A P r o b le m s i In this section, we describe a systematic procedure to transform a traffic m atrix corresponding to an HSS into one corresponding to an equivalent simple TDM i | switching system. This enables the TSA problem in an HSS to be solved by a TSA ; algorithm for simple TDM switching systems. ! , I Let T be a traffic m atrix of size M x M to be scheduled on an HSS shown 1 in Figure 3.1, and let denote the (?, j) th entry of T. The size of the central , switch in the HSS is N x N. The traffic m atrix is assumed to satisfy the following i ■ constraints. j M M I ' Total-traffic constraint: S = E E *<j = NL; (5.1) I < L , for 1 < i < M ; (5.2) < L, for 1 < j < M; (5.3) M ultiplexer constraint: Ur = E E ' « = K rL, for 1 < r < /;(5.4) ! i € l r j = 1 t'=l j=1 M Row constraint: Ri j=l M Column constraint: Cj = 2 = 1 M I 96 | M I ; Demultiplexer constraint: Va = ^ ^ tij = K'SL , for 1 < s < g{5.5) i j *’ = i j e o s i ; i The prim ary distinction between the TSA problem in a hierarchical switch ing system and th at in a simple TDM switching system is the presence of the scheduling constraints due to the multiplexer and demultiplexer stages, as given by equations (5.4) and (5.5). Because of these constraints, it is not possible, in general, to add dummy entries to the traffic m atrix to take up the slack in equa tio n s (5.2) and (5.3). This difficulty, however, can be overcome by adding dummy rows and columns to the traffic m atrix. To explain the addition of the new rows and columns in T, let us consider the first row-group ii, corresponding to multiplexer 1 in the HSS. This multiplexer has Pi inputs and K\ outputs. The sum of all entries in this row-group is K\L. Our objective is to make each row-sum equal to L. To achieve this, we append ' (Pi — K\) columns to T and add dummy traffic entries to the subm atrix formed by the the rows in /i and the new columns. The dummy traffic entries should be assigned such that the following conditions are satisfied: : 1. Sum of entries in each of the Pi rows of row-group Ii is L. I 2. Sum of the added entries in each new column is L. I I We will later present an algorithm to assign the dummy traffic to satisfy these \ i conditions. This construction is equivalent to adding new outputs to m ultiplexer 1 j i in the system to equal the number of inputs, and increasing the size of the switch 1 by (Pi — Ki) to accommodate these new outputs. The output side of the switch is ; correspondingly enlarged to have (Pi —K \ ) additional outputs, which are connected i to a dummy demultiplexer of size (Px — K x) x (Pi — K\). 
• We can repeat the above procedure with each multiplexer-group in T , adding ; (Pj — if,) new columns for group The total number of columns added is J2i=i(Pi~ I Ki) — M — N. This results in a (2M — N ) x M m atrix in which each row-sum is L. A similar procedure can now be performed on the column-groups of T , by adding . (M — N ) new rows and assigning dummy traffic such that every column sums to L. This summarizes our approach. We now present the details of the transform ation. ; Our final goal is to obtain a (2M — N ) x (2M — N ) traffic m atrix T from the given traffic m atrix T such that the sum of elements in every row and every , column of T equals L. This is achieved by adding (M — N) additional rows and the same num ber of columns to the traffic m atrix T. We add dummy traffic entries to these (M — N ) additional rows and columns such th at the slack in equations (5.2) and (5.3) is absorbed. To construct the new traffic m atrix T , We start by initializing its subm atrix corresponding to the first M rows and columns to the given traffic m atrix T ; and the remaining entries to zero. T hat is, J tij, for 1 < i j < M; Ti,j " \ 0 , for M + 1 < i,j < 2 M - N; { } where r,tj denotes the ( i,j)th entry of T. Let us num ber the rows and columns of T from 1 through (2M — N ). Let Ri denote the sum of all entries in row i of the m atrix T , and Cj the sum of all I entries in column j . Let R i and Cj denote the corresponding values for the original m atrix T. Further, let S denote the sum of all entries in T (the total traffic), and S the corresponding sum in T . Initially, S, Ri and Cj are given by for 1 < i < M; for 1 < j < M\ Ri = Cj = 0, for M + 1 < i,j < 2 M - N . I I 1 Of the (M — N) new columns in T , the first (Pi — K\) are used to absorb the slack in row-group I\ (the rows corresponding to multiplexer 1). These columns are said to be generated by row-group R. Thus, columns (M + 1 + YfjZi ' 1 (Pj ~ Kj)) through M + Y%=\(Pj ~ Kj) 3X 6 generated by the row-group for every ! < « < / . We now present an algorithm to add dummy traffic to the subm atrix spanned by the rows in row-group /, and the new columns generated by A total of (Pi~Ki)L I dummy traffic is added to this subm atrix in such a way th at each row-sum in row- !group I{ becomes L, and each column-sum in the subm atrix becomes L. The S' = S; Ri = Ri, c 3 = c h C3 = 0, 98 " 1 j working of this algorithm can be explained with respect to the row-group 1% . A ! i total of (Pi — K\)L dummy traffic needs to be assigned to the subm atrix spanned by rows 1 through Pi and columns M + 1 through M + Pi — K\. We start at the upper left-hand corner of this subm atrix, and assign the m axim um possible traffic such th at neither the row-sum nor the column-sum corresponding to this entry exceeds L. Clearly, the maximum value that can be assigned is L — R\. This makes the row-sum R\ for row 1 equal to L and the column-sum C m +i for column M -f 1 equal to L — P i. We now proceed to the second row and assign ! • / A • I a value of min(L — R<i,L — C m +i) to the entry at the intersection of row 2 and column (M + 1) of T. Thus, we proceed through the subm atrix by moving to the 1 next row when the current row-sum reaches L and moving to the next column when ■the current column-sum reaches L. On completion of this procedure, each row in the group /i attains a row-sum of L. Further, the total dummy traffic assigned j to each of the columns in the subm atrix is also L , and the total dum m y traffic (assigned to the subm atrix is (Pi — K\)L. 
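Before stating the general algorithm, the following minimal sketch illustrates the filling procedure for a single row-group. It assumes the enlarged matrix is a NumPy integer array and that the caller supplies the row indices of the group and the column indices generated for it; this calling convention is an illustrative assumption, since the general algorithm computes the submatrix boundaries itself.

import numpy as np

def fill_submatrix(That, rows, cols, L):
    """Greedy assignment of dummy traffic to the submatrix of That
    spanned by `rows` (a row-group) and `cols` (the columns generated
    by that group), so that every row-sum in the group and every
    column-sum over the new columns reaches exactly L.
    """
    x, y = 0, 0                                  # current row / column in the submatrix
    remaining = len(cols) * L                    # total dummy traffic (P_i - K_i) * L
    while remaining > 0:
        r, c = rows[x], cols[y]
        row_slack = L - That[r, :].sum()         # L minus the current row-sum
        col_slack = L - That[:, c].sum()         # L minus the current column-sum
        t = min(row_slack, col_slack)            # maximum assignable at this position
        That[r, c] += t
        remaining -= t
        if That[r, :].sum() == L and x + 1 < len(rows):
            x += 1                               # row full: move to the next row
        if That[:, c].sum() == L and y + 1 < len(cols):
            y += 1                               # column full: move to the next column
    return That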
The above procedure can be applied independently to each of the row-groups in ,T . Algorithm row-fill in Figure 5.1 presents the general algorithm to add dummy traffic to the columns generated by a row-group. In this algorithm , x and y are .the pointers to the current row and current column, respectively, of m atrix T. J These are initially set to point to the entry at the upper left-hand com er of the i I subm atrix under consideration. Variable a indicates the am ount of dum my traffic i rem aining to be assigned. Step 5 initializes the amount of dummy traffic available to (Pi~ Ki)L. Step 6 computes the maximum amount of traffic th at can be inserted at the current position without causing the corresponding row-sum or column-sum i to exceed L; and assigns it. Step 7 updates the row-sum, column-sum and available j •dummy traffic to reflect the above assignment. Steps 8 and 9 advance the row- j i and column-pointers, respectively. The row-pointer is moved when no more traffic 1 can be added to the current row, and the column-pointer is moved when no more Traffic can be assigned to the current column. The algorithm term inates when all ithe (Pi — Ki)L dummy traffic has been added to the subm atrix. I Before we discuss the correctness of the row-fill algorithm, we illustrate its I operation with an example. Figure 5.2(a) shows a traffic m atrix T to be scheduled ! 99 Algorithm Row-Fill(i) A (Adds dummy traffic to the columns in X generated by row-group /;. R x and Cy denote the sum of entries in row x and column y, respectively, of r.) (First find the boundaries of the subm atrix for adding traffic.) 1. if i = 1 th e n first-row * — 1 and first-column < — M + I 2. else first-row < — 1 + Pj an<3 first-column * — M + 1 + (Pj — Kj). 3. x * — first-row; y * — first-column (x and y mark the current row and column, respectively.) 4. total-traffic < — (Pi — Ki)L (total traffic to be added.) 5. a < — total-traffic (traffic remaining to be assigned to the subm atrix.) (Find the maximum traffic for the current position in the sub m atrix and assign it.) 6* Tx> y < — min(L — R x, L — Cy). (U pdate row- and column-sums and the remaining traffic to be assigned.) A A A A I 7 . R x * R'X H ” Tx,yy d- ^ ^ ^ (Advance current row- and column-pointers.) A 8. if R x = L th e n x < — x + 1 9. if Cy = L th e n y < — y + 1 (Check for term ination) 10. if a > 0 g o to Step 6; otherwise stop. Figure 5.1: Algorithm row-fill. 100 L MUX 1< MUX 2< DEMUX 1 DEMUX 2 (a) I MUX DEMUX CENTRAL SWITCH MUX DEMUX 8 (b) Figure 5.2: (a) A traffic matrix; (b) the corresponding HSS. on the hierarchical switching system shown in Figure 5.2(b). The system has two I i j identical 4 x 2 multiplexers and two identical 2 x 4 demultiplexers. The length of this traffic m atrix is L = 15. Each row-group in T consists of four contiguous Tows and each column-group consists of four contiguous columns. The result of 'applying the row-fill algorithm is shown in Figure 5.3(a), where the shaded areas jmark the submatrices corresponding to the two row-groups. Note th at each row in 'th e m atrix of Figure 5.3(a) sums to 15 and each of the new columns sums to 15. I To prove the correctness of the algorithm, we must prove th at it term inates with (i) each row-sum in row-group i equal to L, and (ii) the sum of added traffic in each of the (Pj — Ki) columns in the subm atrix also equal to L. 
First observe th at the algorithm visits each row and each column of the subm atrix at least once DEMUX 1 DEMUX 2 M U X I S 6 2 7 9 0 2 1 % 3 2 1 i i l 4 2 1 9 4 3 X 5 5 : S- 1 6 P.W A V .J (a) 6 2 7 9 6 2 1 9 3 2 1 12 4 2 9 4 3 6 2 5 5 5 1 6 8 ! W & 4 « - ¥ V 4 j (b) I i Figure 5.3: Results of the row-fill (a) and column-fill (b) algorithms on the traffic m atrix of Figure 5.3. i during step 6. Step 6 makes sure that the traffic added does not cause any row-sum ; i jor any column-sum to exceed L. Further, the row-pointer is moved to the next row I (step 8) only when the current row-sum reaches L. Therefore, the only way the algorithm can fail is th at it reaches the last row of the subm atrix, say row k, and i finds th at the remaining traffic in a cannot be assigned to this row. This, in turn, can arise only in the event of one of the following: 1. The slack (L — Rk) is less than a. 2. The column-pointer y is pointing to the last column in the subm atrix, and the sum of the traffic already assigned to this column exceeds (L — a). Case 1 cannot be true because we started with a = (Pi — Ki)L, the total amount of j slack in the (Pt — Ki) rows of the submatrix. Because the num ber of columns in the |subm atrix is (Pi — Ki), case 2 can occur only if we assigned a traffic of more than I i j L to one of the previous columns. This is impossible by the way j is computed ' t i in step 6. This proves the correctness of the row-fill algorithm. : i ; To estim ate the running tim e of the row-fill algorithm, we need to find the j num ber of times the sequence of steps 6-10 is executed. It is easy to observe th at, j i . 1 'during each pass through this sequence, at least one of the pointers x and y is (incremented. The size of the subm atrix is Pi x (Pi — Ki). Hence, the sequence is executed at most (Pt + Pi — Ki) — 1 = (2Pi — Ki) — 1 times. The worst-case time-complexity for processing all the row-groups is therefore given by / I ^ 2 ( 2 P i - K i - l ) = 0 ( 2 M - N ) = 0(M ). (5.7) j t— i j | A similar algorithm can now be used to absorb the slack in the columns of T , by : | adding dummy traffic to the rows (M + 1) through (2M — N). The first (P[ — K() j i rows among them are used to remove the slack in the column-group 0 \ (the columns ; j corresponding to demultiplexer 1). These rows are said to be generated by column- j |group 1. Thus, rows (M + 1 + E ^ t ^ P j ~ K'j) through M + E]=i(Pj - A j) are I i generated by column-group 0 4 , for every 1 < i < g. Similar to the row-fill algorithm, we can now formulate an algorithm to add 1 dummy traffic to the subm atrix spanned by the columns in column-group O, and ' Jthe new rows generated by 0 ,. A total of (P( — K[)L dummy traffic is added " j to this subm atrix such in such a way th at each column-sum in column-group 0* I I becomes L, and each row-sum in the subm atrix becomes L. This procedure is , applied independently to each of the column-groups in T. Algorithm column-fill in Figure 5.4 presents the general algorithm to add dummy traffic to the rows generated by a column-group. | The correctness of the column-fill algorithm can be proved in a m anner similar to th at of the row-fill algorithm. Furthermore, its time-complexity for processing all the column-groups is 0(2M — N ), same as that of the row-fill algorithm. Now ! |consider the example traffic-matrix of Figure 5.3(a). The result of applying the !column-fill algorithm on its two column-groups is shown in Figure 5.3(b). 
The i .shaded areas correspond to the submatrices affected by the column-fill operations. ^ Note th at each column in the m atrix of Figure 5.3(b) now sums to 15, as does each row. This m atrix, therefore, satisfies the constraints of a traffic m atrix in a simple TDM switching system where the size of the central switch is 12 x 12. Now we can state the complete algorithm to transform a traffic m atrix T corre sponding to an HSS to a traffic m atrix T corresponding to a simple TDM switching system. Algorithm Transform 1. Add dummy traffic to T using Bonuccelli’s Filling Algorithm [8] such that the multiplexer- and demultiplexer-constraints (equations (5.4) and (5.5)) are satisfied without slack. 2. Initialize T as per equation (5.6). 3. for z = 1 to / do row-fill(i) 4. for i = 1 to g do column-fill(i) The filling operation in Step 1 takes 0 ( M 2) time [8]. Step 2 of the transform j algorithm takes 0((2M — N)2) time, while steps 3 and 4 each takes 0 (2 M — N) ■ !time. Therefore, the overall complexity of the algorithm is 0 ( M 2). 104 i 1 I t Algorithm Column-Fill(z) (Adds dummy traffic to the rows in T generated by column-group 0,-. R x and Cy denote the sum of entries in row x and column y, respectively, of T ) (First find the boundaries of the subm atrix for adding traffic.) 1. if i = 1 th e n first-column < — 1 and first-row < — M + 1 2 . else first-column * — Fj and first-row < — M -|-l-|-^ j“ j- 1(P j— K'j). 3. x < — first-row; y < — first-column (x and y m ark the current row and column, respectively.) 4. total-traffic < — (i^ — A^-)^ (total traffic to be added.) 5. a < — total-traffic (traffic remaining to be assigned to the subm atrix.) (Find the maximum traffic for the current position in the subm atrix and assign it.) 6 . tx> y * — m in(£ — R x, L — Cy). (U pdate row- and column-sums and the remaining traffic to be assigned.) A A A A 7. R *x ^ R x “ b 3 Cf y 4 C'y -} - ^ ^ y (Advance current column- and row-pointers.) 8 . if Cy — L th e n y * — y + 1 9. if R x = L th e n x < — x + 1 (Check for term ination) 10. if a > 0 g o to Step 6; otherwise stop. 1 Figure 5.4: Algorithm column-fill. [ The motivation behind converting a traffic m atrix T corresponding to a given ! HSS to a traffic m atrix T corresponding to a simple TDM system is th at a TSA for ; the former can be obtained in a trivial m anner from a TSA for the latter. Thus, let T = Si • Li + § 2 • L 2 -\------- \- Sm - L m (5.8) be an optim al TSA for T i n a simple TDM switching system of size (2 M — N ) x j (2M — N ). We now show th at an optim al TSA for the original traffic-m atrix T ; ton the HSS is obtained simply by considering each switching m atrix Sk above and , using its first M rows and columns. To show that the M x M subm atrix formed by the first M rows and columns of Sk in equation (5.8) is a valid switching m atrix for T , we first observe th at the [transformed m atrix T has the following properties: ! 5b* I I = L, for 1 < i , j < 2 M — N ; (5.9) Ti,j = for 1 < i , j < M ; (5.10) Ti,j = o, for M + 1 < i , j < 2M — N] (5.11) (2M - N ) (2M - N ) i= l j = 1 = (2M - - N )L; (5.12) i= 2 M — N j = M r* .i i= M + l j = 1 = (M - Ar)L; (5.13) i= M j= 2 M — N E E r* .i ! = 1 j = M + 1 = (M - N )L . (5.14) i Now let us consider any switching m atrix Sk in the decomposition of equation (5.8). ; Sk has exactly one 1 in each row and in each column, and the total traffic in it is , j (2M — N ). 
Therefore, we need to only show that when the last (M — N ) rows and (columns are removed from Sk, the total traffic in row-group is exactly K{ and 'th e total traffic in column-group Oj is exactly Kj, \ Let us consider the Pt rows corresponding to row-group /,• in Sk- The sum j of the traffic in these rows is Pt -. Now consider the (P8 — K i) columns generated ' by this row-group. The total traffic in these columns is (P8 — Ki). Note th at all ; -this traffic resides within the subm atrix spanned by the P, rows in row-group p , i < 1 i 106 ! land the (Pj — Ki) columns generated by the group. Therefore, when the last j A ' (M — N ) columns are removed from Sk, the total traffic in row-group 7j reduces to ! P i- ( P i- K i) = Ki. Similarly, we can show that the total traffic in column-group Oj i is K'- when the last (M — N) rows are removed from Sk. In summary, we have shown in this section th at, to find an optim al TSA for an M x M traffic m atrix corresponding to an HSS with an N x N central switch, it is sufficient to find an optim al TSA for a (2M — N ) x (2M — N ) traffic m atrix corresponding to a simple TDM system with a (2M — N) x (2M — N ) central switch. This allows any algorithm for the TSA problem in a simple TDM switching system to be applied to solve the TSA problem in an HSS. In the next section, we present an incremental algorithm to find an optimal TSA for a (2M — N ) x (2M — N) traffic m atrix corresponding to a simple TDM switching system. The results in this section allow the algorithm to be applied to the general class of TDM hierarchical switching systems. 5.3 In crem en ta l T S A A lg o rith m jln this section, we consider a simple TDM switching system with a central switch | of size (2M — N) x (2M — N) and develop an incremental algorithm for scheduling a given traffic m atrix T. Because of the result in the previous section, this algorithm is applicable to the general class of TDM hierarchical switching systems. For ■ convenience, we use the symbol M' in this section to denote (2M — N ). ! The incremental algorithm is based on the correspondence between the problem ^of computing an incremental TSA and the rearrangement problem in a Clos 3-stage 'network. To illustrate the correspondence between the two problems, consider the Clos network of Figure 5.5, where the middle-stage consists of L switches, each of size M ' x M ' . The outer stages consist of switches of size (L x L) and there are exactly M ' of them in each. The total num ber of input term inals (or output term inals) is therefore M'L. This is a rearrangeable network, th at is, any perm utation on the M 'L inputs can be realized on it [6]. Now let us consider a traffic m atrix T of size M ' x M ' in which each row-sum and each column-sum is L. Assume that the entry Tjj in T represents the number 107 M’ 1 Figure 5.5: Clos network model for the TSA problem, of term inals on input-switch i to be connected to output-switch j in the Clos i | network. Now we can use a perm utation-routing algorithm (for example, see [30]) I to find a setting of the middle-stage switches to realize this set of connections. The j ' routing algorithm computes the setting of each middle-stage switch for the given ’ < T . It is easy to observe that the setting of each middle-stage switch computed by ! : the routing algorithm specifies a switching m atrix for the traffic m atrix T on the ; TDM system. 
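As a concrete illustration of this correspondence, the sketch below decomposes a simple TDM traffic matrix, in which every row and column sums to L, into weighted permutation matrices by repeatedly extracting a perfect matching on the bipartite graph of its nonzero entries; each extracted permutation is the setting of one middle-stage switch, held for the returned number of slots. This is a minimal sketch of one possible routing procedure, not the particular permutation-routing algorithm of [30].

def decompose_tsa(T, L):
    """Decompose an M' x M' nonnegative integer matrix T, whose every
    row and column sums to L, into weighted permutation matrices.
    Returns a list of (weight, perm) pairs, where perm[i] = j means
    input-switch i connects to output-switch j in one middle-stage
    switch held for `weight` time slots.
    """
    size = len(T)
    T = [row[:] for row in T]                    # work on a copy
    result = []
    while any(any(row) for row in T):
        # A perfect matching on the nonzero entries exists because all
        # row and column sums are equal (Hall's theorem).
        match = [-1] * size                      # match[j] = input matched to output j

        def augment(i, seen):
            for j in range(size):
                if T[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    if match[j] == -1 or augment(match[j], seen):
                        match[j] = i
                        return True
            return False

        for i in range(size):
            augment(i, [False] * size)
        assert all(m != -1 for m in match)       # matching is perfect for valid input
        perm = [0] * size
        for j in range(size):
            perm[match[j]] = j
        weight = min(T[i][perm[i]] for i in range(size))   # hold time of this setting
        for i in range(size):
            T[i][perm[i]] -= weight
        result.append((weight, perm))
    return result

By construction, the weights of the extracted permutations always sum to L, so the result is a time-slot assignment of optimal length.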
Therefore, any routing algorithm for the Clos network can be used j jto compute an optim al TSA for the corresponding simple TDM switching system. I 108 I We illustrate the above correspondence by an example. Consider a simple TDM system with six input and six output users, that is M ' — 6. For this system, let us consider the traffic m atrix T ( 1 0 0 0 4 ° \ 0 2 1 1 1 0 0 0 1 1 0 3 0 2 1 0 0 2 4 1 0 0 0 0 \ 0 0 2 3 0 0 / (5.15) which has a length of five. The corresponding 3-stage Clos network is illustrated in Figure 5.6. There are six 5 x 5 switches in each of the outer stages and five 6 x 6 switches in the middle stage. Now assume that the entry in T represents the num ber of term inals on input-switch i to be connected to the output-sw itch j , for each i and j . A perm utation-routing algorithm can be used to decompose T into five perm utation matrices. One such decomposition is T 1 0 0 0 0 0 N / 0 0 0 0 1 0 ^ 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 + 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 \ 0 0 1 0 0 0 0 0 0 1 o N f 0 0 0 0 1 0 ^ 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 + 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 ° J \ 0 0 0 1 0 + + i I L ___ 109 stage 2 stage 3 Figure 5.6: Clos network model for a traffic m atrix with M 1 — 6 and L = 5. 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 Each of the constituent matrices above specifies the setting of a middle-stage switch. The switch-settings corresponding to the above decomposition are marked in Figure 5.6. Observe that each of the matrices in equation (5.16) is a valid switching m atrix for the simple TDM system under consideration. Therefore, equa tion (5.16) provides an optim al TSA for the system. Now consider the incremental problem where a TSA needs to be computed for a traffic m atrix T ' by modifying a known TSA for traffic m atrix T . Assume 110 th at T and T ' are of the same length L. In view of the correspondence to the Clos network routing problem, this incremental TSA com putation can be seen to be identical to removing new connections from a known setting of the Clos network and inserting new ones. This is the well-known rearrangem ent problem in a Clos network, first studied by Pauli [39]. This correspondence between the incremental-TSA problem and the Clos-network rearrangem ent problem is utilized in our algorithm for incremental computation of the TSA. Let A be the difference m atrix given by A = T - T . Negative entries in A represent traffic to be deallocated; these correspond to con nections to be removed from the Clos network model. Similarly, positive entries correspond to new traffic to be allocated, th at is, new connections to be made in the Clos network model. Let c be the sum of positive entries in A (which is also equal, in magnitude, to the sum of negative entries). To find a TSA for the new traffic m atrix, we need to remove the c connections corresponding to the negative entries in A and introduce the same number of new connections. We now introduce an algorithm to perform this task. For convenience, this algorithm is given in two phases — the first phase performs the deallocation and the second performs the allocation. Before we describe the algorithm, we introduce some notations. We use 8 itj to denote the entry in the ith row and j th column of the difference m atrix A. The given TSA for T is assumed to be of the form " T = S\ + 1 S 2 4 - . • • + Si,., where Si are the individual switching matrices. 
Let denote the entry in the m th row and n th column of switching m atrix Si (1 < m ,n < M '). The first phase of the incremental algorithm is given as Algorithm deallocate in Figure 5.7. This algorithm first searches for a negative entry in the difference m atrix A. If such an entry 8 m % n exists, a switching m atrix must exist with a 1 in its m th row and nth column. Let Sj be one such m atrix. The algorithm then sets 111 this entry in Sj to zero to perform the deallocation, and increments < 5 m > n to update the A m atrix. This step is repeated until no negative entries rem ain in A. All the traffic to be removed have now been deallocated. Algorithm deallocate also maintains two data structures, arrays RZi and CZj, to keep track of the switching matrices affected by the deallocation. These switching m atrices correspond to middle-stage switches in the Clos network with vacant ter minals. These switches form the candidate switches for allocating new connections during the second phase of the incremental-TSA computation. On term ination of the deallocate algorithm, RZi (for each i in the range 1 < i < M') represents the set of switching matrices with all zero-entries in row i. Similarly, CZj represents the switching matrices with all zero-entries in column j. Obviously, u R Z i= u C Zj. l< i< M ' l <j< M ' The running tim e of the deallocate algorithm is determ ined by th at of the loop beginning on step 3, which is executed c times. By keeping a list of negative entries of A , step 3.1 can be executed in constant time. Similarly, by m aintaining linked- lists of switching matrices with a ‘1’ in position i , j for each i and j , step 3.2 can be performed in constant time. Step 3.3 requires deletion of element j from the list of switching matrices that have a one in position m, n. By implementing the lists as doubly-linked lists, this can be achieved in constant time. Step 3.4 requires only constant time. Therefore, Algorithm deallocate requires only 0 (c ) time. For example, let us assume that the traffic m atrix T presented in equation (5.15) changed and the current traffic-demands are represented in a new traffic m atrix given by T = / 3 0 0 0 2 0> 1 0 1 0 3 0 0 0 1 1 0 3 0 1 1 1 0 2 1 4 0 0 0 0 u 0 2 3 0 0^ (5.17) 112 Algorithm deallocate (Given a TSA X = S\ + S 2 + . • • + S l, and a difference m atrix A, updates the switching matrices Si by removing traffic corresponding to negative entries of A.) (c is the num ber of negative entries in A.) 1. fo r i = 1 to M ‘ do R Zi < — 0 {RZi m aintains the set of switching matrices with all zero-entries in row si.) 2. fo r j = 1 to M* do C Z j «- 0. {C Zj m aintains the set of switching matrices with all zero entries in column J-) 3. fo r i = 1 to c do (c is the number of negative entries in A.) 3.1. Find m and n such that 6 min < 0 (Find a negative entry in A.) 3.2. Find j such that [Sj]mtn = 1 (Find a switching m atrix with a 1 in the corresponding position.) 3.3. [< S j]m ,n « — 0; 6 m,n * — Sm,n + 1 (Perform deallocation.) 3.4. R Z m < — R Z m U {j}; C Z n * — C Z n U { j} (U pdate R Zi and C Z j to include switching-matrix S j.) 4. e n d fo r Figure 5.7: Algorithm deallocate. Thus, the difference m atrix, A — T ' — T , is given by / 2 0 0 0 - 2 ° \ 1 - 2 0 - 1 2 0 0 0 0 0 0 0 0 - 1 0 1 0 0 - 3 3 0 0 0 0 \ 0 0 0 0 0 0 / 113 stage I stage 2 stage 3 Figure 5.8: Clos network model after removing connections corresponding to the A m atrix of equation (5.18). Algorithm deallocate removes all traffic corresponding to the negative entries in A. 
1 I This is equivalent to disconnecting all connections in the Clos network th at corre spond to the negative entries in the difference m atrix. For example, corresponding to the negative entry — 2 at position (1,5) of the A m atrix in equation (5.18), two connections from input-switch 1 to output-switch 5 must be disconnected. The middle-stage switches in Figure 5.6 implementing such connections are 2, 3, 4 and 5. The entry at position (1,5) of A can be made zero by removing two of these connections, say from middle-stage switches 2 and 3. This corresponds to modifying the switching matrices S 2 and S3 by changing the entry in position (1,5) from 1 to 0. Repeating this process for each negative entry of A results in the re moval of a total of nine connections. Figure 5.8 shows the connections left in the Clos network model of Figure 5.6 at the completion of the deallocate phase. The corresponding values of the switching matrices S 1 - S 5 at this point are 114 5 X = 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 5 2 = 5 3 = fo 0 0 0 0 o \ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 / 54 = V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 Ss = 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 (5.19) The contents of the arrays R Zi and C Z 3 at the end of execution of the deallocate algorithm are R Z \ = {2,3}; R Z 2 = {2,3,5}; = 0 ; R Z 4 = {4}; R Z 5 = {2,3,4}; = 0 ; C Z X = {2,3,4}; <7Z2 = {2,3,4}; <7Z3 = 0; <7£4 = {5}; <7Z5 = {2,3}; C Z & = 0. The second phase of the incremental TSA algorithm, the allocate phase, consists in allocating the traffic corresponding to the positive entries in A. Given the position (i , j ) of a positive entry in the A m atrix, and the switching matrices left behind by the deallocate phase, the allocate phase m ust modify the switching matrices to satisfy the following: 115 1. The sum of entries in position ( i,j) of the switching matrices increases by 1. 2. The sum of entries in every other position remains unchanged. This corresponds to introducing a new connection in the Clos-network model be tween the input-switch i and the output-switch j. This operation is trivial if a switching-matrix Sp exists such that it has all zeroes in both row i and column j . The allocate operation then simply consists in setting the (*',j)th entry of Sp to 1. If such a switching m atrix does not exist, however, some of the existing connec tions m ust be modified. Let Sp be a switching-matrix with all zeroes in its row i. Similarly, let Sq be a switching-matrix with all zeroes in column j , with p ^ q. Such a p and q are guaranteed to exist after the deallocate phase. In the Clos network model, p and q are middle-stage switches such th at p has a free input-term inal and q has a free output terminal. Allocating the new connection to Sp is equivalent to the well-known rearrangement problem in the Clos network model, for which an algorithm was proposed by Pauli [39]. Our allocation algorithm uses the same idea used in Pauli’s algorithm, but we use a simpler formulation based on bipartite graphs. The rearrangem ent problem is elegantly modeled by the following graph- theoretic formulation. From switching matrices Sp and Sq, we construct a bipartite m ultigraph G = (X , Y , E ), where X and Y are the two sets of vertices and E is the set of edges. 
The vertex-set X represents the rows in the switching matrices and the vertex-set Y represents the columns. An edge exists from vertex x £ X to vertex y € Y if and only if the entry at position (x, y ) in either Sp or S q is a 1. The edges corresponding to Sp and Sq are distinguished in this model by coloring them distinctly, say red for Sp and green for S q. Observe th at the degree of each vertex in this graph model is at most two and no vertex has two incident edges with identical color. Conversely, given such a bicolored m ultigraph with the property th at every vertex has at most one edge of each color, the traffic represented by the m ultigraph can be allocated completely among two switching m atrices. Such an allocation is achieved simply by assigning entries corresponding to the red edges to one switching m atrix and those corresponding to the green edges to the other. It is easy to see that such an assignment does not allow more than a single 1 in each row and column of the switching matrices. 116 X Y 1 2 3 4 • § K ^ 5 6 Figure 5.9: Graph model G (X , Y ,E ) corresponding to switching m atrices S 4 and S 5 of equation (5.19). To illustrate the above graph model, Figure 5.9 shows the graph corresponding to switching matrices 64 and S 5 of equation 5.19. Edges corresponding to S 4 are shown by solid lines and those corresponding to S 5 are shown by broken lines. Allocating the new entry at position (i , j ) is equivalent to adding a new edge in the m ultigraph G between vertex i € X and j € Y . Since Sp has all zeroes in row i, and S q has all zeroes in column j , the vertices i € X and j € Y each has exactly one incident edge. In addition, the edge incident to vertex i € X is red in color and th at incident to j € Y is green. Therefore, when the new edge is introduced between vertices i € X and j € T , it cannot be assigned one of the colors without forcing one of the two vertices to have two incident edges of the same color. This problem is solved by modifying the coloring of some of the edges. To achieve this objective, we find a path (sequence of non-repeating vertices) P = VXV2 . . . V k in the m ultigraph with the following properties. 1. The start vertex iq is the vertex j € Y . 2. The ending vertex iq has degree one. 3. (uj,v,-+i) € E , for all 1 < i < k. 117 Such a path P is guaranteed to exist because there is at least one vertex other than j € Y with degree one and every vertex has a degree of at most two. Vertices Vi with i = 1 ,3 ,... along the path P belong to the set Y and vertices V { with i = 2 ,4 ,... belong to X . In addition, (i>i, v < i) is an edge with red color. Similarly, (ut -,Vi+i) is an edge with red color if i is odd and green otherwise. The new edge can be introduced by simply flipping the color of each edge along the path P . This recoloring ensures that the only edge th at was incident on vertex j 6 Y is now colored green. If we now introduce the new edge between vertex i E X and vertex j € Y and color it red, then every vertex in the m ultigraph would have at most one incident edge of each color. The modified switching matrices can be found by assigning entries corresponding to red edges in the modified graph to Sp and those corresponding to green edges to Sq. To illustrate the rearrangement process with an example, consider the m ulti graph in Figure 5.9. Assume that we need to allocate one unit of traffic at posi tion (4,4) of the A m atrix. 
This corresponds to introducing a new edge between vertex 4 on the left-hand side and vertex 4 on the right-hand side of the m ulti graph. Note th at the edges already incident to these vertices are of opposite color; so the new edge can not be added without recoloring the m ultigraph. It is easy to see that a path of alternating edge-colors exists in the m ultigraph starting from the vertex 4 E Y and ending at the vertex 2 € X . The edges along this path are labeled 0 , (2), (3) in Figure 5.9. Flipping the color of each of these three edges leaves vertices 4 € X and 4 G Y each with a single incident edge of green color. The new edge can therefore be introduced between them and colored red. The modified switching matrices after the introduction of the new edge are given by ( 0 0 0 0 1 0 ^ * 0 0 0 0 1 0 > 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 ; S 6 = 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 ° 0 1 0 0 ° J 1 ° 0 0 1 0 0 J 118 Algorithm allocate^', j) 1. Find Sp such that p € RZp, if column j of Sp has all zeroes, go to 13. 2. Find S q such that q € C Zj\ if row i of Sg has all zeroes goto 13. 3. n « — j; L I S T < — 0 (L I S T keeps track of the edges to be recolored). 4. Find row m such that [5p]mjn = 1; if no such m exists g oto 9. 5. L I S T + — L I S T U {(m ,n)} (add edge (m ,n) to the list). 6. Find column n such that [S'g]m,n = 1; if no such n exists g o to 10. 7. L I S T * — L I S T U {(m ,n)} (add edge (m ,n ) to the list).. 8. go to 4. (U pdate R Z and C Z arrays) 9. C Z n < — C Z n U {q} - {p}; goto 11. ! 10. R Z m * — R Z m U {p} — {<?}. (Now flip the color of each edge in L IS T ) 11. w h ile (L I S T ^ 0) do 11.1 Find an element {(m ,n)} in L I S T 11.2 [Sp}mtn < - » ■ [Sq\mtn (exchange entries between Sp and Sq) 11.3 L I S T < — L I S T — {(m ,n)} (remove edge from L IS T ). 12. end w h ile . 13. « — 1 (allocate new traffic entry). 14. R Z i * — RZi — {p}5 C Z j * — C Z j — {p} Figure 5.10: Algorithm allocate. 119 We can now present the complete algorithm for the allocate phase of the in crem ental TSA computation. A formal description of this algorithm is given as 'A lgorithm allocate in Figure 5.10. Given the position ( i,j) of a positive entry in A , the algorithm first finds a switching-matrix Sp such that all its entries in row i are zero; and a switching-matrix S q such that all entries in column j are zero. If p equals q, or if Sp has all zeroes in column j , then the new entry can be allocated to Sp without modifying its other entries. Similarly, if Sg has all zeroes in row i, | the new connection can be allocated to S q and no other changes need to be made. |If none of these conditions are m et, however, we need to perform reallocation of traffic entries between Sp and S q as explained in the preceding paragraphs. | The algorithm implicitly uses the graph model G (X , Y, E ) described earlier to i perform the reallocation. The loop beginning in step 4 of the algorithm follows i i the path P of alternating edge-colors until it term inates in a vertex of degree one. ; W hile traversing the path, an unordered list L I S T is used to m aintain a list of the edges traversed. Once the entire path is known, a separate loop, beginning in step 11, is used to flip the colors of the edges in L IS T . This flipping of colors is j achieved by exchanging the corresponding entries between Sp and S q. I j The reallocation of traffic also requires the data structures H Z and C Z to be updated. Steps 9 and 10 perform this function. 
Similarly, after the new entry is allocated to Sp, RZi and CZj are updated by removing Sp from each of them. To illustrate the allocate algorithm, consider the allocation of the positive en trie s of the A m atrix of equation (5.18) to the switching m atrices of equation (5.19). \ The traffic units at positions (1,1) can be allocated without any reallocation of ex isting traffic by inserting them in S 2 and S3. The entry at position (2,1), however, cannot now be allocated to any of the switching matrices without some realloca- ! tion. Because row 2 of S 2 and column 1 of 54 are all zero, reallocation can be [performed between S 2 and S 4 . Execution of algorithm allocate modifies 5 2 and S 4 as ' 0 0 0 0 1 0 ' 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 ; s 4 = 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ° 0 1 0 0 1 ° 0 0 1 0 Proceeding similarly, all the positive entries in A can be allocated. The switching ! matrices at the completion of the allocation of all positive entries in A are given ;by Si = S 3 = (1 0 0 0 0 0 N 0 0 0 0 1 0 N 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 ; S 2 = 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 v ° 0 0 1 0 ° J 0 0 1 0 0 ° / (l 0 0 0 0 0 N / 0 0 0 0 1 0 N 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 ; S 4 = 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 ° 0 0 1 0 0 J \ 0 0 0 1 0 ° J S 5 = 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 ilt is easy to verify that these switching matrices represent a valid TSA for the traffic m atrix T ' of equation (5.17). 121 , To estim ate the time-complexity of Algorithm allocate, we note th at the switch- | i ing m atrix Sp can be found in step 1 in 0(1) time. Further, the test for all zeroes j jin column j of Sp can also be performed in 0(1) time. This is achieved by repre- ! senting each switching m atrix as an array. Element n of the array corresponding to the switching m atrix Sp equals m if [Sp}mtn is a 1 and 0 otherwise. Using a similar ■ , data structure, step 2 can also be performed in constant time. The loop consisting I of steps 4-8 correspond to finding a path in the bipartite m ultigraph with at most .2M edges. Hence, these steps together require 0 ( M ) time. Steps 9 and 10 can be implemented in 0 (M ) time using the doubly-linked list data structure for R Zi and CZj. Steps 11.1-11.3 can be implemented in constant time, and hence step 11 ! ^requires O (M ) time. Finally, Step 14 requires 0 ( M ) time. Thus, the overall tim e j 'complexity of Algorithm allocate is O (M ). Note that steps 4 and 6 of this algo- • Irithm require maintaining a list of switching matrices with a one in position (m, n). ; i I ;Step 11.2 must modify this list, and the modification can be performed in constant ; i ^im e using doubly-linked lists. Now we present the complete incremental-TSA algorithm using the two algo rithm s deallocate and allocate. Algorithm Incremental-TSA (Given a TSA for traffic m atrix X, computes a TSA for traffic m atrix T 7.) (Assumption: T and T ’ are of the same length, and every row and column in each m atrix sums to L.) 1. Compute the difference m atrix A — T ' — T 2. c a l l deallocate, (Disconnect connections corresponding to negative entries in A.) 3. fo r i = 1 to c do 3.1 Find m and n such that Smin > 0 (Find a positive entry in A.) 3.2 c a l l allocate(m, n ) 3.3 & m ,n * & m ,n 1 4. e n d for 1 Step 1 of Algorithm Incremental-TSA requires 0 ( M 2) time. Step 2, a call to Algorithm deallocate, requires 0 (c ) time. 
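To tie the two phases together, here is a compact executable sketch of the full incremental computation. It follows the deallocate and allocate procedures described above but uses dense 0/1 matrices and plain Python sets in place of the array and doubly-linked-list representations assumed in the complexity analysis, so it illustrates the logic rather than achieving the stated time bound; it relies on the guarantee, established above, that a switching matrix with a free row i and another with a free column j are always available during allocation.

def incremental_tsa(S, T_old, T_new):
    """Sketch of the incremental TSA computation.

    S is a list of L switching matrices (0/1 lists of lists) whose sum
    is T_old; T_old and T_new are M' x M' traffic matrices of the same
    length L (every row and column sums to L).  S is modified in place
    so that its sum becomes T_new.
    """
    Mp = len(T_old)
    delta = [[T_new[i][j] - T_old[i][j] for j in range(Mp)] for i in range(Mp)]
    RZ = [set() for _ in range(Mp)]      # RZ[i]: matrices whose row i is all zero
    CZ = [set() for _ in range(Mp)]      # CZ[j]: matrices whose column j is all zero

    # Deallocate phase: remove one connection per unit of negative traffic.
    for m in range(Mp):
        for n in range(Mp):
            while delta[m][n] < 0:
                k = next(k for k in range(len(S)) if S[k][m][n] == 1)
                S[k][m][n] = 0           # row m and column n of S[k] are now free
                delta[m][n] += 1
                RZ[m].add(k)
                CZ[n].add(k)

    def allocate(i, j):
        p = next(iter(RZ[i]))                        # a matrix with row i free
        if any(S[p][x][j] for x in range(Mp)):       # column j of S[p] is occupied
            q = next(iter(CZ[j] - {p}))              # a different matrix with column j free
            # Trace the chain of alternating entries starting at column j:
            # the 1 of S[p] in the current column, then the 1 of S[q] in that row, ...
            chain, n = [], j
            while True:
                m = next((x for x in range(Mp) if S[p][x][n]), None)
                if m is None:                        # chain ends at a column vertex
                    CZ[n].discard(p); CZ[n].add(q)
                    break
                chain.append((m, n))
                n = next((y for y in range(Mp) if S[q][m][y]), None)
                if n is None:                        # chain ends at a row vertex
                    RZ[m].discard(q); RZ[m].add(p)
                    break
                chain.append((m, n))
            for (m, n) in chain:                     # flip each traced entry between S[p] and S[q]
                S[p][m][n], S[q][m][n] = S[q][m][n], S[p][m][n]
        S[p][i][j] = 1                               # place the new connection
        RZ[i].discard(p)
        CZ[j].discard(p)

    # Allocate phase: insert one connection per unit of positive traffic.
    for m in range(Mp):
        for n in range(Mp):
            while delta[m][n] > 0:
                allocate(m, n)
                delta[m][n] -= 1
    return S

Applied to the switching matrices of equation (5.16) with the traffic matrices of equations (5.15) and (5.17), such a sketch should return five switching matrices summing to the new traffic matrix, though not necessarily the particular matrices shown above.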
Steps 3.1 and 3.3 can be performed in constant time. Step 3.2, which calls Algorithm allocate once, requires 0 ( M ) j time. Hence, step 3 requires 0 (c M ) time. Therefore, the overall complexity of the j increm ental TSA algorithm is 0 ( M 2 -f cM ). t I I 123 i C h a p ter 6 j C o n clu d in g R em ark s i In this dissertation, we addressed the problem of assigning connections in various j interconnection networks. Algorithms for connection-assignment in faulty one- i sided crosspoint switching networks, time-division multiplexed hierarchical switch- t jing systems and SS/TDM A switching systems with variable bandwidth beams were 'developed. In Chapter 2, we studied the fault-tolerance capability of one-sided crosspoint networks. Two modes of operation were considered, nonblocking and ■ rearrangeable. We showed th at at most N /2 — 1 faulty crosspoints can be tolerated by an N x N /2 one-sided crosspoint network in the case of nonblocking operation. jWe also showed th at any distribution of these faults with no more than one fault ;per bus is nonblocking and presented an algorithm for allocating connections for f 'any such distribution. If rearrangeable operation is allowed, much larger fault-sets can be tolerated. We first showed a characterization of the general problem of realizing a connection set on a crosspoint network in the presence of arbitrary faults as a graph-m atching ; problem. We then defined two special distributions of faults which allow easy determ ination of rearrangeability. When the given fault set is modeled by one of these configurations, we formulated simple conditions to test the rearrangeability. In m any cases, the rearrangeability can be established by reducing the given fault- jset to one of these classes and checking for a simple set of sufficient conditions. [Further, these fault distributions also lend themselves to simple rearrangem ent [algorithms. The maximum size of a non-critical fault set is almost 25 percent of 'the total num ber of crosspoints for both classes of distributions. j As shown in Section 2.5, the analysis for one-sided crossbar networks can easily ! be extended to general multiple-bus interconnection networks. Nonblocking oper- : ation is not attainable in the presence of faults if the number of buses M is less !than N /2 even when the number of simultaneous connections requested is lim ited [ !to M . However, many faults can be tolerated in the rearrangeable mode by proper I assignment of the buses. The results in Chapter 2 are useful for tolerating manufacturing defects as well jas operational faults. They provide an exact measure of the connection capability i of a faulty network, that is whether the network is nonblocking or rearrangeable; i and in the latter case, the number of rearrangements needed. The actual criterion to be used for the criticality of a fault set depends on the amount of degradation th at can be allowed in the system. Our results are general enough to cover a wide range of such criteria. J In Chapter 3, we described parallel algorithms for finding an optim al TSA in |a class of TDM switching systems. Although the algorithms apply to the class of .hierarchical switching systems, they can also be adapted to other TDM switching configurations described in [22, 9, 20]. The algorithms are based on modelling the TSA problem as a network-flow problem. 
The general algorithm using P < L /2 processors has running time 0 ( M 3 log P -\-M 2 • min(iV, y /M )• m in(T/ P, M 2)), where j IM is the num ber of users, N is the size of the central switch, and L is the length j of the TSA. An efficient implementation of the algorithm on a hypercube multi- j processor was shown. The algorithm can also be implemented on multiprocessors ■ with mesh and tree topologies with the same asym ptotic time-complexity. Our parallel TSA-algorithms provide only logarithmic speedup with respect to the num ber of processors. This is because of the recursive-decomposition approach, on which all the algorithms are based. In addition, a sequential im plem entation j jof the decomposition is not as efficient as one of the known sequential TSA algo- | rithm s [12]. For these reasons, the parallel algorithms fail to achieve 100-percent 1 efficiency. Nevertheless, the approach used in [12] is not amenable to a parallel ; im plementation. Therefore, alternate approaches with improved efficiency are a i | promising research area. 125 | In C hapter 4, we described sequential and parallel algorithms for finding an 'optim al TSA for the class of SS/TDMA switching systems with variable-bandwidth 1 i j J beams. Our algorithms are based on the technique of finding circulations in a j network model associated with the switching system. Our iterative sequential i {algorithm has, in general, a better time-complexity than the algorithm presented in j ![20]. Our parallel algorithms, which are similar to those presented in C hapter 3 for j ■ I The case of HSS’s, use the divide-and-conquer approach to decompose the problem | ■ of finding an optimal TSA for a given traffic m atrix into two subproblems of finding ! -optimal TSA’s in two smaller traffic matrices. j | In Chapter 5, we presented incremental TSA algorithms for the class of hierar chical switching systems. These incremental algorithms were shown to be efficient I jwhen the number of changes in the traffic demands of two consecutive frames is j | small. Computing the incremental TSA for an HSS was achieved using the follow- J ing two steps. I 1. Converting the problem of finding an optimal TSA for an HSS to th at of j finding an optim al TSA for an SS/TDMA system. j i ; 2. Computing the incremental TSA for an SS/TDM A system. ! j Step 2, by itself, provides an efficient TSA algorithm for SS/TDM A systems. The , incremental TSA algorithm utilizes the analogy between an SS/TDM A system and I , a three-stage Clos network. If the number of changes between the traffic demands j | of two consecutive frames is c, our incremental TSA algorithm requires 0 ( M 2 + cM ) j 1 » . . . . . i 'tim e, where M is the number of inputs of the hierarchical switching system. Increm ental algorithms are more efficient for computing the TSA if the traffic demands in consecutive frames overlap to a significant extent. Although our algo rithm is primarily intended for incremental computation of the TSA, it can also be jused for com putation of a complete TSA. This may, in fact, be more efficient than j jsome of the earlier known algorithms. For an arbitrary traffic-matrix, complete re- | i com putation of the entire TSA using the incremental algorithm takes 0 { M 2 + L M ) 1 i ! time. The best known sequential algorithm due to Chalasani and Varma, has a ' tim e complexity of 0(m in(T , M 2) • min(Ar, y/M ) • M 2). 
Therefore, the incremental | algorithm performs better if the number of changes c is small compared to the J | quantity m in(T, M 2) • m in(N , y/M ) • M . \ I I 126 A p p en d ix A The rectangular configuration defined by Definition 4 is not unique for a given fault set P , because the faulty crosspoints may be enclosed by rectangular regions I in more than one way. Therefore, there is a need to define the best rectangular * configuration corresponding to the given fault set such that the configuration has ithe maximum probability of satisfying the conditions in Theorem 5. In this section jwe define such a minimal rectangular configuration and give an informal description of an algorithm to find such a configuration for a given fault-set. For convenience, we define the following with respect to a rectangle (P,-, Bi). i ! Pi x Bi = {(p, 6)|p € P and b € P }; | Fi = F D (Pi x Bi). 'T hat is, Fi denotes the set of faulty crosspoints within the ith rectangle. Now we can formulate the conditions for m inimality of a rectangular config uration corresponding to a fault set P. Firstly, every rectangle (Pi,P4 - ) of the configuration should be minimal, th at is there should be no smaller rectangle en closing the set of faulty crosspoints enclosed by (P i,B i). Secondly, there should be J no proper way to enclose the faulty crosspoints within a particular rectangle with | two smaller non-overlapping rectangles. These conditions are formally stated in the following definition. i i D e fin itio n 5 A partial configuration {(Po, Po), (Pi, P i), • • •, (Pm_i, P m-i)} corre- \ I sponding to a fault set F is minimal if and only if the following conditions are j satisfied: ! I I (i) Fi % Pi x B \, for every P.- C Bi, and \ Fi % Pi x B if fo r every P- CP». | (ii) For every proper partition Pi = P- U P" and Bi = B\ U B- with P- D P" = j b; n B" = 0, F i g (Pi x B'i) U (P" x B "), for all 0 < i < m . Condition (i) refers to the minimality of each rectangular region, while condition 1 (ii) refers to its non-partitionability. j We can now devise a simple algorithm to construct the minimal rectangular configuration corresponding to a given fault set F . We begin by choosing any 1 I faulty crosspoint (p, b) and inserting port p into P0 and bus b into B 0. Now we j expand the rectangle (Po,Po) as follows: if x E Po, add to Bo all y such that {x,y) € F. if y € Po, add to Po all x such that (x,y) £ F. I :The procedure is repeated till the rectangle is closed with respect to the above two I operations, th at is no new buses or ports can be added. This defines the rectangular region (P0,Po). We can now remove all the crosspoints in P0 x B 0 from F and (repeat the same procedure to construct the next rectangle (Pi, P i), and so on till every crosspoint in F is included in one of the rectangles. It is easy to see th at the | rectangular configuration constructed by this algorithm is minimal in the sense of | Definition 5. I I I I I I i ; 128 I I R efe re n c e L ist j[1] G. B. Adams III, H. J. Siegel, “The Extra-Stage Cube: A Fault-Tolerant ] Interconnection Network for Supersystems,” IE E E Transactions on Comput- i i ers, Vol. C-31, No. 5, May 1982, pp. 443-454. < | i [2] A. E. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis o f \ computer algorithms, Addison-Wesley, 1974. |[3] A. V. Aho, J. Hopcroft, J. D. Ullman, “D ata Structures and Algorithms,” i Addis on-Wesley, 1980. i [4] R. K. Ahuja and J. B. Orlin, “A fast and simple algorithm for the m aximum flow problem,” Operations Research, vol. 37, pp. 