FAULT TOLERANCE ANALYSIS OF HYPERCUBES

by

Sing-Ban Robert Tien

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Engineering)

August 1991

Copyright 1991 Sing-Ban Robert Tien

UMI Number: DP22837. All rights reserved. INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI Dissertation Publishing, UMI DP22837, published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-4015. This dissertation, written by Sing-Ban Robert Tien under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY. Dean of Graduate Studies. Date: July 30, 1991. Dissertation Committee: Chairperson.

Acknowledgements

I would like to thank my dissertation committee members, Prof. C.S. Raghavendra, Z. Zhang, and R. Guralnick, for their guidance. I am indebted to my advisor, Prof. Raghavendra, who kindly supported me and always helped me throughout my graduate study. I am grateful to Prof. Guralnick, who spent much of his valuable time helping me solve my problems.
From him, I learned true mathematics and the spirit of a professional mathematician. Based on this learning, I could proceed with my research. I would also like to thank Prof. Zhang for many fruitful discussions. I owe special thanks to my wife. Her true love, care, and encouragement have always been the most precious to me. Without her, my Ph.D. life could not have been so meaningful and fruitful. I remember one of our friends once said, "When you get your Ph.D., half of it belongs to her." I think that's very true. I am obliged to my parents-in-law for taking care of my son Jonathan so that I could concentrate on my research. I also wish to thank two brothers in the Lord, brother Peter Chang at Church in Rosemead and brother Chao at Church in Irvine, and their sisters. They constantly prayed for me, especially when I was in the dark in doing research and almost wanted to give up. I believe it is because of their sincere prayers and His mercy that I was turned back to light. During these years of study, many friends worked together with me. They were constant sources of encouragement, help, and fun. Their names are Meera Balakrishnan, Rajendra Boppana, Suresh B. Chalasani, Ge-Ming Chiu, Kuen-Jong Lee, Jung-Cheun Lien, Hwa-Chun Lin, Wei-Ming Lin, and Pei-Ji Yang. I thank you all.

Contents

Acknowledgements
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Fault Tolerance
  1.2 Overview of the Dissertation

2 Hypercube Fault Tolerance
  2.1 Fault Prevention, Removal, and Tolerance
  2.2 Hypercube Computers
  2.3 Faulty Hypercubes
  2.4 Hypercube with Extra Nodes and Links
  2.5 Hypercubes without Extra Nodes and Links
    2.5.1 Reconfiguration Using Unused Nodes as Spares
    2.5.2 Subcube Fault Tolerance
    2.5.3 Simulation of Algorithms on Faulty Hypercubes
  2.6 Related Issues in Faulty Hypercubes
    2.6.1 Basic Properties of Faulty Hypercubes
    2.6.2 Important Communication Functions
  2.7 Benefits of Fault Tolerance

3 Reconfiguration of Faulty Hypercubes by Automorphisms
  3.1 Introduction
  3.2 Reconfiguration of Hypercubes under Single Failure
    3.2.1 Reconfiguration of a Single Node Failure
    3.2.2 Reconfiguration of a Single Link Failure
  3.3 Reconfiguration of Hypercube under Multiple Failures
    3.3.1 Isomorphism-Completeness of the General Problem
    3.3.2 Less Than Five Faults
    3.3.3 Multiple Adjacent Faults
    3.3.4 Random Faults (≥ 5)
    3.3.5 Simulation Results
  3.4 Discussion

4 Analysis of Faulty Hypercubes
  4.1 Introduction
  4.2 Preliminaries
  4.3 The Shortest Path Problem
    4.3.1 Number of Faults Less Than n
    4.3.2 Number of Faults Between n and 2n − 3
  4.4 The Diameter Problem
    4.4.1 3 or More Non-Faulty Neighbors
  4.5 Discussion

5 Simulation of Normal Algorithms on Faulty Hypercubes
  5.1 Introduction
  5.2 Free Dimensions
  5.3 Reassigning Faulty Nodes and Their Links
  5.4 Simulation of Normal Algorithms
  5.5 Fast Global Operations
    5.5.1 Number of Faults Less Than n
    5.5.2 Number of Faults Between n and 2n − 3
  5.6 Implementation of the Simulation Scheme
    5.6.1 Finding Free Dimensions
    5.6.2 Finding Uncovered Free Dimensions
    5.6.3 Reassigning Faulty Nodes and Their Links
  5.7 Discussion
6 Conclusions and Future Research

List of Tables

2.1 Summary of major works in hypercube fault tolerance using extra nodes and/or links.
3.1 Case 1: A task graph is embedded with 32 spares left.
3.2 Case 2: A task graph is embedded with 47 spares left.
3.3 Case 3: A task graph is embedded with 50 spares located in a small subcube.
3.4 A task graph is embedded in a 7-dimensional cube with 50 nodes left as spares.

List of Figures

2.1 A 4-dimensional hypercube computer constructed from two 3-dimensional hypercube computers.
2.2 An FFT network can be embedded into hypercubes for parallel execution of FFT.
2.3 An overview of the techniques for providing fault tolerance on hypercube computers.
2.4 One simple way to attach a spare node is to connect it to every node in the hypercube.
2.5 Spare processors are placed at Hamming codeword positions.
2.6 (a) A circulant graph with 12 nodes on a circle. (b) A 3-cube with nodes arranged on a circle. (c) Redundant links shown by dotted lines are introduced to convert a 3-cube into a circulant graph. (d) A 3-cube and a spare node are placed on a circle and redundant links shown in dotted lines are introduced to make it circulant. (e) A different embedding of the 3-cube in the circulant graph which bypasses the faulty node 2.
2.7 Unused nodes as spares to perform reconfiguration.
3.1 A 3-dim cube where each node swaps with the other node on a dimension link.
3.2 A loop with six nodes is reconfigured to tolerate a node failure.
3.3 A more complex embedding in a 4-dim cube. Bit-complementing operations are used to perform reconfiguration.
3.4 (a) A tree is embedded in a 4-dim cube. (b) After using complementing operations to perform reconfiguration.
3.5 (a) Routing in a good 2-dim subcube. (b) Routing in a 2-dim subcube containing the faulty link.
3.6 A 2 by 4 mesh embedded in a 3-dim cube is reconfigured to tolerate single link failure.
3.7 Conceptual illustration of the reconfiguration problem.
3.8 The construction of G'_1 from G_1.
3.9 Canonical form of a 5-node subset, where the shaded areas are blocks of 1's.
3.10 The probability of successful reconfiguration for three different task graphs embedded in a 10-dimensional cube.
3.11 The probability of successful reconfiguration when a task graph is embedded such that 50 spares are available in a 7-dimensional hypercube.
4.1 A worst case fault pattern which makes the shortest path between S and D of length 6 is shown above. The length of the shortest path is increased by 4.
4.2 A worst case scenario to show that the diameter is n + 2 when |F| = 2n − 3.
4.3 There are n − 2 paths between X and Y in Q_{n−2}. Two more paths are constructed.
5.1 Three cases of fault patterns.
5.2 An example where at least 3n − 1 steps are needed.

Abstract

An important issue in large multiprocessor systems is fault tolerance. In this dissertation, fault-tolerance aspects of the hypercube computer, a popular multiprocessor system, are investigated. Generally, there are two techniques for providing fault tolerance in a hypercube computer. One technique uses extra spare nodes and/or links to replace the faulty components; the other uses unutilized nodes of a hypercube to replace faulty nodes, or operates the system with degraded performance. This dissertation concentrates on the second approach. The main idea behind using unused nodes is to transform an embedded task to a new embedding which contains only nonfaulty nodes. Effectively, faulty nodes are replaced by the unused nodes; i.e., the unused nodes act as spares. Conditions under which a set of unused nodes can replace a set of faulty nodes are derived. Simulations are conducted to evaluate the success rate of reconfiguration for multiple random faults. Two key properties of faulty hypercubes, namely, shortest paths and diameter, are also studied in detail. These two measures set fundamental limits on the performance of many applications when tasks run on faulty hypercubes. Efficient algorithms for finding near-shortest paths are designed. These algorithms can be used in solving the message routing problem in faulty hypercubes. When tasks do require all nodes in a hypercube computer, a simulation scheme is developed to execute tasks on faulty hypercubes. Specifically, a class of algorithms, known as normal algorithms, is simulated on faulty hypercubes. For some frequently used global operations, namely, global minimum, maximum, logical AND, and OR, a direct and more efficient algorithm which runs on faulty hypercubes is also designed.
Chapter 1: Introduction

Computing systems play an important role in our daily life through a wide variety of applications, such as banking systems, airline reservation systems, telephone switching systems, weather forecasting, and medical diagnosis, to name a few. Since the speed of a single processor has almost reached the limit of physical devices, uniprocessors can hardly meet the increasing demands for computing power of such applications. Multiprocessing is a promising approach to provide higher processing power. By incorporating more processors together, there is high potential to increase the processing power of computers many fold. Extensive research has been conducted on multiprocessor systems. Important areas include the following: interconnection networks to support efficient communication [75], efficient algorithms for solving various types of problems [60], and parallel programming languages and compilers [12, 71]. Construction of various experimental multiprocessor prototype systems has also been undertaken; to name a few, the IBM RP3 Computer [58], the NYU Ultracomputer [29], Connection Machines [34], hypercube computers [68], and the Warp computer [3]. All these efforts are directed towards developing multiprocessor systems with higher computing power to meet the demands of applications. Among these multiprocessor systems, the hypercube computer has become one of the most popular, due to its rich connectivity, regularity, and other nice properties. Many hypercube systems are also commercially available through companies such as Intel, Floating Point Systems, Ncube, Ametek, and Thinking Machines Corp. [34, 38, 55].

1.1 Fault Tolerance

An important issue in these advanced and complex multiprocessor systems is fault tolerance.
Even though the failure probability of a single processor is usually low, the failure probability of an entire multiprocessor system can be quite high, since a large number of processors are involved. Due to the nature of the tasks executed in a multiprocessor system, coordination among processors is needed. Thus the failure of a single processor can stop the entire execution of tasks. In noncritical applications, the discontinuation of the ability to execute tasks either causes inconvenience, for example in banking systems and telephone switching systems [2, 39], or wastes valuable computation time, for example in supercomputers like Connection Machines and Cray machines [2, 39]. In critical applications, such as nuclear reactor control computers and aircraft flight control computers, such an interruption in the ability to perform tasks may mean loss of lives and/or economic loss [2, 39]. There is a need to provide fault-tolerance capabilities in these large parallel computers for economic as well as safety reasons, so that tasks can continue to execute even in the presence of failures.

In single-processor systems, spares need to be introduced to provide hardware redundancy, since there is only one component of each type. When a component failure occurs, the spare can replace the failed component. However, in a multiprocessor system, the situation is different. These parallel computers are inherently redundant in the sense that they possess a large number of identical processors. This inherent redundancy can be exploited to improve reliability and achieve fault tolerance without incurring extra cost. Exploiting such inherent redundancy in hypercubes to achieve reliability and fault tolerance is the central theme of this dissertation.

1.2 Overview of the Dissertation

In Chapter 2, general concepts of fault tolerance are introduced. Then the structure of hypercube computers is discussed.
Basic properties of hypercubes such as shortest paths, diameter, and connectivity are also reviewed. A survey of the state of the art in hypercube fault tolerance then follows. Fault tolerance schemes on hypercubes can be classified into two general categories: 1) those that use extra nodes and/or links, and 2) those that use the inherent redundancy of hypercubes. The work of this dissertation falls into the second category. The purpose of this survey is to show where the contributions of this dissertation fit in comparison with other works, and to show the progress of work in other areas. Finally, the benefits of fault tolerance are summarized.

Chapter 3 presents the reconfiguration techniques which use unused nodes as spare nodes. Our technique, which is based on automorphisms, works for any general task graph embedded in a hypercube. The key concept of our approach is to use the automorphisms of hypercubes to transform the task graph to another graph which is fault-free and isomorphic to it. A single fault is guaranteed to be reconfigurable. Reconfiguration in the event of multiple faults depends on the locations of unused nodes. The general problem of reconfiguration in the presence of an arbitrary number of faults is shown to be as hard as the graph isomorphism problem. However, conditions for successful reconfiguration are derived. Simulations show that, with high probability, a small number of faults in practical machines (dimension ≤ 10) can be tolerated.

Since faults change the structure of hypercubes, the basic properties of hypercubes also change. In Chapter 4, two important properties are investigated for faulty hypercubes. Bounds for shortest paths and diameter are derived when the number of faulty components is less than 2n − 2, where n is the number of dimensions of the hypercube. Worst case examples are constructed to show that these bounds are tight. Efficient algorithms for finding near-shortest paths are also presented.
The length of the path found by this algorithm is at most the Hamming distance between the two end nodes of the path plus a small constant. When there are no unused nodes, the techniques proposed in Chapter 3 cannot be applied. In Chapter 5, a simulation scheme is presented for a useful class of algorithms, namely, the normal algorithms. In a normal algorithm running on a hypercube, every processor communicates along the same dimension in each step. The faulty hypercube is used to simulate the operations of these normal algorithms developed for fault-free hypercubes. This simulation scheme incurs only a constant slowdown factor in the entire execution time when the number of faults is less than n, the number of dimensions of the hypercube. In particular, in non-worst-case situations, which are more likely in practice, the communication time is at most tripled. The computation time is always doubled. For certain special primitive global operations, a more direct method is developed, resulting in lower time complexity. Global maximum, minimum, logical AND, logical OR, and broadcasting can be completed in the optimal number of steps. Finally, in Chapter 6, some conclusions and directions for future research are given.

Chapter 2: Hypercube Fault Tolerance

Since hardware deteriorates, it is inevitable that in a computing system some components will fail during its lifetime. Examples of component failures are failed transistors or shorted metal lines in integrated circuits. When such failed components, called faults, are used in performing system functions, erroneous data may be generated. Such erroneous data which manifest the faults are called errors. Errors can then result in system malfunction, which is called failure [2, 39]. With fault-tolerance capability built into a system, faults can be detected and excluded from the active parts of the system; thus the system can continue to operate without incurring failures.
However, a system with fault-tolerance capability does not imply high reliability. If components of low reliability are used, failures can occur so often that they exhaust the fault-tolerance capabilities. Therefore, careful design and testing of a system are also crucial for building reliable computing systems. Generally, there are three strategies for combating faults: fault prevention, removal, and tolerance, as described in the following section.

2.1 Fault Prevention, Removal, and Tolerance

Fault prevention is targeted at avoiding faults in the design and construction of a system through rigorous processes. This involves careful selection of reliable technology and components, cautious management of the project, and formal methods for verifying logic design and software programs. Through such a rigorous design and construction process, the number of faulty components in a system can be minimized. This is an important part of constructing a reliable system.

Fault removal. Despite the rigorous design process, design faults can still be present in the system, due to the enormous complexity of hardware and software. Besides, totally fault-free components are hard to guarantee and too expensive to procure. Fault removal is aimed at removing the faults introduced in the design and construction phase. The approach is usually through testing, by exercising the system with comprehensive test inputs or by operating the system in extreme conditions (burn-in test).

Fault tolerance. Once the system passes the first two phases, it can be put into operation. However, failures can still arise because of the stress of normal operation or environmental effects. Such failures result in unexpected interruptions of system functions which can cause severe problems. Fault tolerance is intended to handle such faults so that the system can continue to function even in their presence.
Issues of the first two strategies in designing reliable and high-performance computers are not addressed in this dissertation. This dissertation concentrates on the issues in the third strategy, i.e., fault tolerance as applied to a class of multiprocessors called hypercube computers.

2.2 Hypercube Computers

An n-dimensional hypercube computer is constructed recursively from lower-dimensional hypercubes. For example, a 4-dimensional hypercube computer as shown in Fig. 2.1, where a processor is at each node and a communication channel at each link, is constructed by connecting the corresponding nodes of two 3-dimensional hypercube computers. Similarly, a 5-dimensional hypercube computer is constructed by connecting the corresponding nodes of two 4-dimensional hypercube computers, and an n-dimensional hypercube computer from two (n − 1)-dimensional hypercube computers. An n-dimensional hypercube computer consists of 2^n nodes and n · 2^(n−1) links. The nodes are labelled with n-bit binary numbers such that the labels of any two adjacent nodes differ in exactly one bit.

Figure 2.1: A 4-dimensional hypercube computer constructed from two 3-dimensional hypercube computers.

Basic properties of hypercubes are well known [32, 67]. The hypercube is a regular graph where each node has the same degree n. The shortest path between any two nodes is of length equal to their Hamming distance, where Hamming distance is defined as the number of bits in which their binary labels differ. Such a shortest path can be found by simply correcting successive bit positions in which the two end nodes differ. Between any two nodes there are n node-disjoint paths of length at most their Hamming distance plus 2. The connectivity, defined as the number of nodes (or links) that need to be deleted so that the network becomes disconnected, is thus n.
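The Hamming-distance and bit-correction routing properties just described can be illustrated with a short sketch (illustrative Python, not code from the dissertation):

```python
# Hypercube nodes are n-bit labels; adjacent nodes differ in one bit.
# A shortest path is found by flipping, one at a time, the bit
# positions in which source and destination labels differ.

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")

def shortest_path(src: int, dst: int) -> list[int]:
    """Walk from src to dst, correcting one differing bit per step.

    The resulting path length equals the Hamming distance, matching
    the shortest-path property stated in the text.
    """
    path = [src]
    cur = src
    diff = src ^ dst
    dim = 0
    while diff:
        if diff & 1:              # labels differ in this dimension
            cur ^= (1 << dim)     # traverse the link along dimension dim
            path.append(cur)
        diff >>= 1
        dim += 1
    return path

# Example: 0b0000 and 0b1011 in a 4-cube differ in 3 bits,
# so the shortest path has length 3.
p = shortest_path(0b0000, 0b1011)
assert len(p) - 1 == hamming_distance(0b0000, 0b1011) == 3
```

Correcting the differing bits in a different order yields a different shortest path of the same length, which is the basis of the n node-disjoint paths mentioned above.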
In fact, the only way to disconnect a hypercube by deleting n nodes (or links) is by deleting all the neighboring nodes of a particular node (or all its incident links). Hamiltonian paths exist in hypercubes; for example, a Hamiltonian path can be constructed using the concept of Gray codes [65]. Another property of the hypercube is that it is a bipartite graph, and hence no cycle of odd length exists.

Many efficient algorithms on hypercubes have been designed for solving various kinds of problems such as sorting [52], routing [10, 53, 73], broadcasting [41, 54], image processing [48], and matrix operations [51]. General tasks are executed on hypercubes as described below. A task is decomposed into subtasks and then represented as a task graph, where nodes represent the subtasks and a link is introduced between two subtasks if they communicate. This task graph is then embedded into the hypercube for parallel execution. For example, a 4-input FFT (Fast Fourier Transform) network, which is the task graph of the FFT, shown in Fig. 2.2(a), can be embedded into a 4-dimensional hypercube as shown in Fig. 2.2(b). In general, it has been shown that the FFT network is a subgraph of hypercubes with a sufficient number of nodes [31]. Other commonly used task graphs such as meshes and trees have also been shown to be subgraphs of hypercubes [67, 74]. Techniques for embedding general task graphs have also been developed in [16, 30]. As mentioned in Chapter 1, the overall failure rate of a hypercube can be high. It is important as well as interesting to investigate how to run tasks and algorithms on faulty hypercubes so that the system can continue to perform useful work in the presence of faults.

2.3 Faulty Hypercubes

The first step in handling faults is to detect their presence. For hypercubes, one straightforward approach to detect faults is to have two copies of identical tasks running at the same time in two different subcubes.
Here, a subcube is a lower-dimensional cube in a hypercube. For example, a 3-dimensional cube is a subcube (3-subcube) of a 4-dimensional hypercube. The results from the two copies are then compared. If there is a discrepancy, a fault has occurred. However, in a multiprocessor system like the hypercube, performance is always an important concern. Running two identical copies of the same task degrades the performance by a factor of 2.

To improve performance, conventional detection techniques can be used on processor boards. Typically, a board is partitioned into many confinement areas. For example, a memory module can be a confinement area, and a detection mechanism can be built at its boundary so that any data fetched from memory is always checked before the data leaves the module. This has two advantages: firstly, it detects whether the memory module is faulty, and secondly, it prevents errors from propagating to other parts of the system.

Figure 2.2: An FFT network can be embedded into hypercubes for parallel execution of FFT. (a) Task graph of a 4-point FFT. (b) A 4-point FFT network embedded in a 4-cube.

As the confinement area becomes smaller, the errors are confined to a smaller area and the fault location can be known more accurately. However, more detection hardware is required and therefore the cost will increase. Commonly used detection methods are error-detecting codes or comparison of the outputs of two functional units. Sometimes, a mixture of error-detecting codes and duplication is used. Off-line methods such as self-diagnosis, which have been developed in the theory of testing, can also be used. Test inputs are fed into the system and, by analyzing the outputs, the system is determined to be fault-free or not. Such an approach has been implemented for hypercube computers [4, 21].
Each processor tests its neighbors, and collectively all good processors decide which processors are faulty. The confinement area can also be defined in software. For example, the subtask executed in a processor can perform a reasonableness check before sending its data to other subtasks. If the data does not pass the reasonableness test, a fault has occurred. In this way, errors can be restricted to the subtask module. The reasonableness test depends on the application running on the hypercube. For example, a square root procedure can square its result and compare it with the original value as a reasonableness test. However, the detection mechanism at this level cannot locate faults very accurately. A thorough fault diagnosis is needed to locate faults.

By providing detection mechanisms at multiple levels, both in hardware and software, more faults can be detected, since different detection mechanisms tend to detect different sets of faults. Thus higher coverage can be achieved, where coverage is defined as the percentage of total faults that can be detected. Since undetected faults can create unpredictable failures and hence cause severe problems, high coverage of the detection mechanism employed is very crucial. In addition to the self-checking circuits at the hardware level and the self-diagnosis at the system level mentioned before, it is also possible to perform detection at the application level. Algorithm-based detection and correction methods [5, 6, 35, 64] are examples of such detection mechanisms. In this approach, the detection mechanism uses some invariants of the application algorithm to detect faults. For example, in signal processing, Parseval's theorem [57] relates the energy in the time domain to the energy in the frequency domain. For digital sequences this relation is

    N · Σ_{j=0}^{N−1} x²(j) = Σ_{j=0}^{N−1} F²(j)

where the x(j) are inputs and the F(j) are outputs.
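As an illustration of such an algorithm-based check (a sketch under stated assumptions, not the dissertation's implementation), the Parseval relation can be tested against a transform's output; for complex outputs the squared magnitude |F(k)|² plays the role of F²(k). The naive DFT below stands in for the FFT under test:

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for the FFT under test."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

def parseval_check(x, F, tol=1e-9):
    """Algorithm-based fault check: for real input x of length N,
    N * sum(x[j]^2) must equal sum(|F[k]|^2) up to rounding error."""
    n = len(x)
    energy_in = n * sum(v * v for v in x)
    energy_out = sum(abs(c) ** 2 for c in F)
    return abs(energy_in - energy_out) <= tol * max(1.0, energy_in)

x = [1.0, 2.0, 0.0, -1.0]
F = dft(x)
assert parseval_check(x, F)       # a fault-free transform passes

F[2] += 0.5                       # inject an error into one output
assert not parseval_check(x, F)   # the invariant check detects it
```

The check costs O(N) extra work on top of the transform, which is what makes invariant-based detection attractive at the application level.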
By comparing the sum of squares of the outputs with the sum of squares of the inputs times N, an FFT algorithm can be checked to see whether an error has occurred.

Once faults are detected and located, the next step is to remove their effects by reconfiguration and recovery. If there is redundancy available at the board level, for example a spare ALU, the spare module can be used to replace the faulty module. If there is no redundancy on the board, or it has been exhausted, system-level redundancy needs to be exploited to tolerate faults.

An overview of system-level techniques for providing fault tolerance on hypercube computers is shown in Fig. 2.3. The techniques can be broadly classified into two categories: 1) those with extra nodes and/or links; and 2) those without extra nodes or links. In the first category, the spares allocated in the hypercube are based on the subcube, Hamming code, or circulant graph concept. The second category exploits inherent redundancy through reconfiguration using unused nodes, subcube fault tolerance, or simulation of algorithms on faulty hypercubes. Two related issues are routing/broadcasting and basic structural properties of faulty hypercubes. The topics shown in Fig. 2.3 in enclosed rectangles are studied in this dissertation.

2.4 Hypercube with Extra Nodes and Links

Many researchers have studied fault-tolerance aspects of some regular multiprocessor networks, including binary trees, systolic arrays, and mesh-connected computers [1, 25, 23, 44, 61, 56]. The general approach usually employed is the introduction of extra nodes and switches to achieve fault tolerance [22, 23, 56]. This technique, when applied to the hypercube, changes its topology in order to include extra nodes and/or switches, and often requires a large number of extra nodes and/or links. Some major efforts in this direction are discussed below.
Figure 2.3: An overview of the techniques for providing fault tolerance on hypercube computers.

Figure 2.4: One simple way to attach a spare node is to connect it to every node in the hypercube.

One simple way to attach a spare node is to connect the spare to every node of the hypercube, as shown in Fig. 2.4 for a 3-dimensional hypercube. This spare node can replace any failed node in the hypercube by activating the links connecting to the neighbors of the faulty node. However, the number of links incident on the spare is very high. To reduce the number of links on the spare nodes, Rennels [66] proposed a scheme that attaches spare nodes to each subcube. The attached spares in this scheme are connected to the nodes in the subcube and to the neighbors of all nodes in the subcube. Thus each spare is connected to fewer nodes and has lower degree. The smaller the size of the subcube, the lower the degree of each spare node. However, reducing the size of the subcube requires more spares. This configuration can tolerate fault scenarios in which there are fewer faulty nodes in a subcube than the number of spares attached to that subcube.

Figure 2.5: Spare processors are placed at Hamming codeword positions.

In [7], Banerjee proposed a strategy to attach spare nodes to certain processor nodes. The selection of the nodes to which spares are attached is based on Hamming codes. In a Hamming code, every node is either at a codeword location or at a location neighboring exactly one codeword. A spare processor is placed at every codeword location, i.e., at every codeword position of the hypercube there are 2 processors.
In this fashion, the task of any faulty processor can be reassigned to the nearest spare processor. For example, in a 3-dimensional cube, the spare processors are placed at the opposite locations 0 and 7, as shown in Fig. 2.5. Suppose node 2 is faulty; its job can be assigned to the spare processor at location 0. However, it now needs to travel 2 links to communicate with nodes 3 and 6. If the two nodes 1 and 2 are faulty, then one of them needs to be assigned to 7, and even longer communication delays are incurred. The number of spares needed is 2^n/(n+1) for n = 2^k - 1, where k is an integer. When n != 2^k - 1, more spares (> 2^n/(n+1)) are needed.

In [8], Banerjee et al. proposed another way to add spares to the hypercube. The scheme partitions the entire cube into 3-subcubes. In each 3-subcube, 2 spare nodes are added on 2 opposite faces of the 3-cube. The job of a faulty node is again assigned to the nearest spare node, and communication degradation is incurred in the presence of faults. In total, 2^(n-2) spares are needed. In the techniques discussed above, spares are distributed evenly in the system so that a faulty node can always be replaced by a nearby spare. Therefore, the reconfiguration is fast, since only local movements are needed. However, when there are 2 or more adjacent faults, these techniques have difficulty in reconfiguring the hypercube without incurring long communication paths. Dutt and Hayes [22] employed graph-theoretic techniques to overcome the difficulties of previous methods in reconfiguring around adjacent faults. Their technique can be used to design k-fault-tolerant systems (k-FT), i.e., systems that can tolerate any k faults. The underlying concept of their technique is the use of a class of graphs called circulant graphs.
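For n = 3 the perfect code in question is the simple {000, 111} repetition code, so the covering property and the nearest-spare assignment described above can be checked in a few lines (our own illustrative sketch, not code from [7]):

```python
def hamming_dist(a, b):
    """Number of bit positions in which two labels differ."""
    return bin(a ^ b).count("1")

n = 3
spares = [0b000, 0b111]   # codewords of the perfect (3,1) repetition code

# every node is at a codeword or neighbors exactly one codeword
for v in range(2 ** n):
    assert sum(hamming_dist(v, s) <= 1 for s in spares) == 1

def nearest_spare(faulty):
    """Reassign a faulty node's task to the closest codeword spare."""
    return min(spares, key=lambda s: hamming_dist(faulty, s))

assert nearest_spare(0b010) == 0b000   # node 2's job moves to spare at 0
```

The exactly-one-codeword property is what makes the reassignment unambiguous when only a single node in a neighborhood fails.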
A graph is a circulant graph if its nodes can be arranged on a circle in a symmetric form, so that clockwise or counterclockwise rotations, as well as reflections with respect to a line through the center of the circle, bring the graph back to itself. For example, a graph with 12 nodes is shown in Fig. 2.6(a). It is easy to see that this graph is invariant under rotations and reflections. Dutt and Hayes present a technique to convert noncirculant graphs into circulant graphs by adding redundant links. For example, a 3-dimensional cube with nodes arranged on a circle is shown in Fig. 2.6(b). In this embedding of the 3-cube on a circle, a node is connected to either its clockwise or counterclockwise neighbor on the circle and to the second nodes in both the clockwise and counterclockwise directions. Since a circulant graph is invariant under rotations in both directions, links need to be added to connect all clockwise and counterclockwise neighbors, as shown in Fig. 2.6(c) by dotted lines. It is easy to see that the resulting graph is also invariant under reflections. To include redundant spare nodes, a circulant supergraph containing a 3-cube can be constructed similarly. An embedding of the 8 nodes of a 3-cube and a spare node on a circle is shown in Fig. 2.6(d), where the 3-cube is connected by solid lines.

Figure 2.6: (a) A circulant graph with 12 nodes on a circle. (b) A 3-cube with nodes arranged on a circle. (c) Redundant links, shown by dotted lines, are introduced to convert a 3-cube into a circulant graph. (d) A 3-cube and a spare node are placed on a circle and redundant links shown in dotted lines are introduced to make it circulant. (e) A different embedding of the 3-cube in the circulant graph which bypasses the faulty node 2.

Since in this embedding the solid lines connect pairs of nodes that have 0, 1, or 2 nodes in between, dotted lines are added so that any two nodes with 0, 1, or 2 nodes in between are also connected.
The graph is then invariant under rotations and reflections. By rotating the 3-cube around the circle, 9 different 3-cubes can be obtained. Each of these 9 different 3-cubes leaves a distinct node as spare. By choosing an appropriate 3-cube, any faulty node can be left as the spare. Thus the graph in Fig. 2.6(d) is 1-FT. For example, suppose node 2 is faulty in Fig. 2.6(d); using the 3-cube shown in Fig. 2.6(e), the faulty node is excluded from the 3-cube. Denote the graph in Fig. 2.6(c) by G and that in (d) by G^(1). The same procedure of converting G to G^(1) can be applied to add one more spare node to G^(1). Denote the graph so obtained by G^(2). Then G^(2) can tolerate any two faults as follows. When the first fault occurs, rotate G^(1) in G^(2) to bypass that fault. When the second fault arises, rotate the 3-cube in G^(1) to bypass the second fault. By iteratively constructing G^(k) from G^(k-1), a k-FT circulant supergraph can be obtained. This technique elegantly uses the concept of circulant graphs and can be applied to any arbitrary multiprocessor graph. However, the minimum number of links that need to be added in converting graphs to circulant graphs is not given in [22] and remains to be investigated. The number of links introduced by this procedure can be quite high. For example, just to convert an n-cube into a circulant graph, (n - 2)2^(n-2) links are introduced. To include spare nodes, even more links are required. The authors proposed using switches to alleviate this overhead. However, a large number of switches are used, and they can fail and become the critical components. Lee and Hayes [46] proposed a different approach of mapping tasks into a smaller-dimensional hypercube: since tasks running on a hypercube usually take the number of dimensions as an input variable, a task can easily be adapted to run on a smaller hypercube.
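The rotation argument for 1-fault tolerance can be checked with a toy sketch. We model only the 9 positions on the circle (edges are omitted, since circulant invariance guarantees every rotation is again a valid 3-cube embedding); the position labels and the choice of slot 8 as the initial spare are our own assumptions:

```python
N = 9                        # 8 cube nodes + 1 spare on a circle
base_used = list(range(8))   # one embedding of the 3-cube; position 8 spare

def rotate(r):
    """Rotate the whole embedding r positions around the circle."""
    return [(p + r) % N for p in base_used]

# for any single faulty position f, some rotation leaves f unused
for f in range(N):
    r = (f - 8) % N          # rotation mapping the spare slot onto f
    assert f not in rotate(r)
```

The same argument iterates for G^(2), G^(3), ...: each added spare contributes one more independent rotation with which to dodge one more fault.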
Since only 2 faulty nodes can destroy all (n-1)-dimensional subcubes Q_(n-1), they proposed to add spare links, but no spare nodes, to the hypercube so that more faults are needed to destroy all Q_(n-1). The number of extra links needed is O(2^(n-1)), and the reconfiguration time is fast. However, once the system degrades into a Q_(n-1) subcube, almost half of the Q_n processors are left unused, resulting in low processor utilization.

                no. spares    no. links             faults   reconf. time   degrad.
Rennels [66]    2^(n-m)       2^n(n-m+1)            isol.    const.         No
Banerjee1 [7]   2^n/(n+1)     0                     isol.    const.         comm.
Banerjee2 [8]   2^(n-2)       2^(n+1) + 2^(n-3)     isol.    const.         comm.
Dutt [22]       k             (not given)           k-FT     not avail.     No
Lee [46]        0             3 * 2^(n-1)           4-FT     const.         1/2

Table 2.1: Summary of major works in hypercube fault tolerance using extra nodes and/or links.

All the aforementioned techniques are summarized in Table 2.1. The common feature of these schemes is that they tolerate a small number of arbitrary faults at the high cost of adding a large number of spare links/nodes. As mentioned previously, this makes a multiprocessor system even more unaffordable. An alternative approach is to exploit the inherent redundancy to provide fault tolerance without extra nodes and links.

2.5 Hypercubes without Extra Nodes and Links

The general strategy for providing fault tolerance on hypercube computers when there are no extra nodes is either to use unutilized nodes to replace faulty nodes or to operate with degraded performance.

2.5.1 Reconfiguration Using Unused Nodes as Spares

Many tasks may not require the entire hypercube for their execution. Fault tolerance for such tasks can be achieved by employing the unused nodes to replace faulty ones. For other tasks, we can map them to hypercubes in such a way that some nodes are deliberately left idle. Chen et al. [19, 20] were among the first to propose this strategy. They proposed an algorithm for reconfiguring loops in faulty hypercubes.
In a 3-cube, for example, the initial embedding of a 6-node loop is shown in Fig. 2.7(a). Suppose node 7 becomes faulty; its task is reassigned to node 2, the opposite node of the 2-cube containing its predecessor 6 and its successor 3, as shown in Fig. 2.7(b). However, in their initial embedding, not every node has a nearby spare. When a fault occurs, the reconfiguration process may involve a sequence of reassignments. For example, suppose that node 1 is faulty in Fig. 2.7(a). Its task will be assigned to node 7, which forces the task of node 7 to be assigned to node 2. Thus it involves two reconfiguration steps. The embedding after reconfiguration is shown in Fig. 2.7(c). In general, this sequence of reassignments can be long if the spare locations are not close by. To improve the reconfiguration time, in [59] an incomplete Gray code sequence is used to embed a loop, assigning 1/4 of the hypercube nodes as unused nodes. In this scheme, for every used node, the opposite node in the 2-subcube containing its predecessor and successor is unused. Therefore, the reconfiguration time is constant for any single fault. However, two adjacent faults cannot be tolerated by this scheme. By allowing even more unused nodes, Lee presents a scheme which can tolerate any arbitrary 2 faults with constant reconfiguration time [45]. As seen from the discussion above, using only local movements to perform reconfiguration in the presence of multiple faults becomes difficult. Yang et al. [76] propose a technique involving global movements to overcome these difficulties. Automorphisms of hypercubes are used to transform an embedded task graph. (The definition of automorphisms can be found in Chapter 3.) The selection of appropriate automorphisms is based on a key concept, called free dimensions, proposed in this dissertation. This technique can reconfigure around any (n - 2) faults for chains and loops and can also be applied to meshes and tori.
Most of these techniques are only applicable to specific task graphs. Since hypercube computers are intended to operate as general-purpose multiprocessor systems, there is a need to provide fault tolerance techniques for general task graphs. In Chapter 3 (also in [70]), the reconfiguration problem for general task graphs is investigated. The tasks of nodes and spares are transformed by automorphisms to new locations so that faulty nodes become unused.

Figure 2.7: Unused nodes as spares to perform reconfiguration.

The number of faults that can be tolerated in this scheme is closely related to the number of unused nodes and their locations. Conditions on spare locations are derived. With careful arrangement of spares, a small number of faults can be tolerated for arbitrary task graphs. The reconfiguration process requires global movement, which can be accomplished efficiently by the techniques described in Chapter 5.

2.5.2 Subcube Fault Tolerance

As mentioned before, many algorithms or tasks that run on an n-dimensional hypercube can easily be adapted to run on a lower-dimensional hypercube. Hence, one way to run tasks on faulty hypercubes is to find the maximum fault-free subcubes. Lower bounds on the minimum number of faults that render all (n-k)-dimensional subcubes faulty were derived by Becker and Simon in [9]. Graham et al. surveyed this topic and found that many researchers had worked on this subject in different contexts [14, 47, 42]. They used a computational approach to obtain the exact least number of faults for the worst-case scenarios for small dimensions up to 10. Sridhar and Raghavendra [49] gave an algorithm to find the maximum fault-free subcube with complexity O(nm^2), where m is the number of fault-free nodes in the hypercube. A parallel version of the algorithm with O(n) complexity is also presented.
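A maximum fault-free subcube can be found by brute force for small dimensions (the efficient O(nm^2) algorithm of [49] is more involved; the sketch below is our own exhaustive illustration of the problem itself). A subcube is specified by a set of bit positions fixed to constant values:

```python
from itertools import combinations

def max_fault_free_subcube(n, faulty):
    """Enumerate all subcubes of an n-cube, largest first. A subcube fixes
    k bit positions to constants; it is fault-free if no faulty label
    matches all its fixed bits. Returns (dimension, fixed-bit dict)."""
    faulty = set(faulty)
    for k in range(n + 1):                 # fix k dims -> (n-k)-subcube
        for dims in combinations(range(n), k):
            for vals in range(1 << k):
                fixed = {d: (vals >> i) & 1 for i, d in enumerate(dims)}
                if not any(all((f >> d) & 1 == b for d, b in fixed.items())
                           for f in faulty):
                    return n - k, fixed
    return -1, None

# two faults at distance n (the worst case mentioned in the text):
# only an (n-2)-subcube survives
dim, _ = max_fault_free_subcube(4, [0b0000, 0b1111])
assert dim == 2
```

The final assertion reproduces the worst case quoted above: two antipodal faults in a 4-cube already force a slowdown factor of 4.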
In general, they show that when the fault-free nodes are expressed in terms of subcubes, instead of a list of nodes, this problem is co-NP-hard. The disadvantage of the subcube fault tolerance approach is that the slowdown factor can be quite large even with a small number of faults. For example, in the worst case, even with 2 faults at distance n in an n-cube, the maximum fault-free subcube is only an (n-2)-subcube. This implies a slowdown factor of 4 in computation time. Besides, almost 3/4 of the fault-free processors are not utilized. In general, fewer than k log k log n faults are needed to reduce the dimension of the maximum fault-free subcube to (n-k), which can cause a slowdown factor of 2^k [9]. Instead of finding maximum subcubes, Hastad et al. [33] showed that an (n-1)-dimensional hypercube can be embedded in an n-dimensional hypercube in the presence of a large number of faults (a constant fraction of the hypercube nodes) with high probability. Executing jobs on this embedded (n-1)-cube incurs only a constant slowdown factor. This result shows that the hypercube itself is very robust. However, the approach is probabilistic, and the constant may be too large for practical considerations.

2.5.3 Simulation of Algorithms on Faulty Hypercubes

In [11], a deterministic scheme for simulating a class of algorithms, called weak algorithms, on faulty hypercubes is proposed. Weak algorithms are those in which, in each time unit, every processor receives and sends only one message. The authors showed that for any constant c, with n^c faults, an n-cube can be partitioned into m-subcubes, where m is a constant, such that every m-subcube has more than half of its nodes fault-free and connected. A good node in the connected component is assigned to simulate the tasks of all nodes in that subcube. Communication within the subcube now takes place within a single node. Communication between adjacent m-subcubes can be accomplished in a constant number of hops.
Since each m-subcube has more than half of its nodes fault-free and connected, there must be a path between the two nodes simulating the tasks of two adjacent m-subcubes, and the path is of constant length. This result strengthens the robustness of the hypercube previously demonstrated by the probabilistic approach of Hastad et al., now with a deterministic approach. However, the constant is still not practical for real machines. The authors also considered the case of a small number of faults, less than n, for normal algorithms, where in each step every processor communicates along the same dimension. The n-cube is partitioned into 2-subcubes, each containing at most one faulty node. The job of the faulty node is assigned to its opposite node across the diagonal, and the fault-free nodes perform their own jobs. Therefore computation time is doubled. The communication time is now 1 hop within the 2-subcube and 3 hops to the neighbors of the faulty nodes in the adjacent 2-subcubes. The entire communication time is increased by a factor of 4. In Chapter 5, a scheme is presented for simulating normal algorithms on a faulty hypercube. Instead of a distance-2 node, the job of a faulty node is assigned to its neighbor across a dimension, called a free dimension, which always exists if the number of faults is less than n. This scheme also doubles the computation time. When the fault pattern is not the worst-case scenario, it takes only two hops to communicate with the neighbors of a faulty node. It is also shown that when the number of faults is less than [n/2] + 1, the worst-case scenario cannot arise, in which case the communication time is only 3 times that of the fault-free case. In the worst case, communicating with a neighbor of a faulty node can take 4 hops, and hence the entire communication time is 5 times that of the fault-free case.
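A dimension can be searched for directly. In the sketch below (our own illustration; we take "free dimension" informally to mean a dimension along which no faulty node has a faulty neighbor, so every faulty node's job can move one hop to a good node):

```python
def free_dimension(n, faulty):
    """Return a dimension d such that no two faulty nodes are adjacent
    across d, i.e. f ^ (1 << d) is fault-free for every faulty f.
    Returns None if no such dimension exists."""
    faulty = set(faulty)
    for d in range(n):
        if all((f ^ (1 << d)) not in faulty for f in faulty):
            return d
    return None

# 3 faults in a 4-cube (fewer than n): a free dimension exists
assert free_dimension(4, [0b0000, 0b0001, 0b0110]) is not None
```

The search costs O(n * |F|) with a hash set of faults, so finding a free dimension is cheap compared with the simulation it enables.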
Since worst-case fault scenarios are unlikely to arise in practice, the approach proposed in this dissertation tends to give better average performance.

2.6 Related Issues in Faulty Hypercubes

With the previously described techniques, a faulty hypercube can continue to function using the residual fault-free network. However, basic properties of the residual network are no longer the same as those of fault-free hypercubes. Since these properties characterize the fundamental limits of faulty hypercubes, and thus set lower bounds on their performance, it is important to study them.

2.6.1 Basic Properties of Faulty Hypercubes

Two important properties of faulty hypercubes that are of interest are the diameter and shortest paths. When the number of faults is less than n, the faulty hypercube will still be a connected component. It is known that between any two nodes there are n node-disjoint paths of length at most their Hamming distance plus 2. Hence, fewer than n faults cannot block all these paths. Further, it turns out that between two diametrically opposite nodes all these n paths are of length n. Thus the diameter can be shown to be n + 1 under this fault scenario. When there are n or more faults, the hypercube may be disconnected. However, connectivity is a worst-case parameter. Esfahanian [24] proposed the concept of "forbidden faulty sets": certain sets of faults are assumed not to arise. For example, a hypercube is disconnected by n faulty nodes only when they are all neighbors of a particular node. In practice, this is unlikely to happen, and such sets of faults are considered forbidden faulty sets. If all the neighbors of a node do not become faulty at the same time, the generalized connectivity of the hypercube is (2n - 2), as shown in [24], and the diameter is bounded by n + 6.
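The claim that fewer than n faults leave the diameter at most n + 1 can be checked exhaustively for a small cube. This brute-force consistency check is our own illustration, not the proof technique of the text:

```python
from collections import deque
from itertools import combinations

def eccentricity(n, faulty, src):
    """BFS from src in the n-cube with faulty nodes removed; returns the
    largest distance from src to any reachable fault-free node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for d in range(n):
            v = u ^ (1 << d)
            if v not in faulty and v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

# exhaustive check for n = 4: with any n - 1 = 3 faults, every pair of
# fault-free nodes is within distance n + 1 = 5
n = 4
for faults in combinations(range(2 ** n), n - 1):
    fset = set(faults)
    for src in range(2 ** n):
        if src not in fset:
            assert eccentricity(n, fset, src) <= n + 1
```

The bound is attained, e.g., when the 3 faults are neighbors of node 0, leaving only one exit from it toward the antipodal node.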
In Chapter 4 (also in [69]), it is shown that when the number of faults is less than 2n - 2, the length of the shortest path between any two non-isolated nodes is at most their Hamming distance plus 4, and, if the same forbidden faulty sets are assumed, the diameter is at most (n + 2). These bounds are also shown to be tight. They provide lower bounds on the time complexity of various communication patterns. For example, the diameter sets the lower bound for broadcasting. Another result on faulty hypercubes is in [50], where the authors showed that the length of the shortest path between two nodes is at most their Hamming distance plus a constant when the number of faults is less than 4n - 11.

2.6.2 Important Communication Functions

Communication plays an important role in multiprocessing, since to cooperatively execute a task processors need to exchange information. Important communication functions include routing, which sends a message from one node to another; broadcasting, where one node sends a message to all the other nodes; and other global operations such as global minimum or maximum. When each node knows only the faults among its immediate neighbors, a randomized algorithm can be used for routing [28]. A message is forwarded to a randomly selected neighbor that moves it closer to the destination. When all such nodes are faulty, an arbitrary fault-free neighbor is selected at random. Simulation results show that this scheme can successfully route messages with high probability. Another approach, based on depth-first search, is proposed in [17], which also requires only local knowledge of faults. In this scheme, the message is sent along a shortest path as if there were no faults. If a faulty node is hit, the message takes a detour or even backtracks. A probabilistic analysis shows that with reasonably high probability messages are routed on optimal paths.
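A deterministic detour-routing sketch in the spirit of these local-knowledge schemes follows (our own simplification: a visited set replaces backtracking, and the router gives up where the cited algorithms would backtrack):

```python
def route(n, src, dst, faulty):
    """Greedy detour routing: prefer any dimension that reduces the
    Hamming distance to dst; if all such neighbors are faulty, take any
    fault-free neighbor as a detour. Visited set prevents cycles."""
    faulty = set(faulty)
    path, cur, visited = [src], src, {src}
    while cur != dst:
        diff = cur ^ dst
        candidates = [cur ^ (1 << d) for d in range(n) if diff >> d & 1]
        candidates += [cur ^ (1 << d) for d in range(n)]   # detours last
        nxt = next((v for v in candidates
                    if v not in faulty and v not in visited), None)
        if nxt is None:
            return None        # would need to backtrack; omitted here
        path.append(nxt)
        cur = nxt
        visited.add(nxt)
    return path

p = route(3, 0b000, 0b111, faulty=[0b001, 0b010])
assert p is not None and p[0] == 0 and p[-1] == 7
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(p, p[1:]))
```

In the example, both shortest-path exits from node 0 toward 7 except dimension 3 are faulty, so the router leaves via node 4 and still reaches 7 on an optimal-length path.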
Broadcasting is a very useful communication function for updating network status. For example, updating fault information requires broadcasting. In [62], for the case when the number of faults is less than n, a broadcast algorithm running in 2n steps is given. Since there are n node-disjoint paths between the broadcasting source and any other node, the algorithm sends messages along all n node-disjoint paths. Therefore, the message will successfully get through along at least one path. Global operations, such as finding the maximum or minimum value among all the nodes, are also frequently used and are provided in the kernel of real hypercube machines [38]. In Chapter 5, an algorithm running in the optimal number of steps, 2n, is given for such global operations. Again, it is assumed that the number of faults is less than n. The same algorithm is extended to the case when the number of faults is between n and 2n - 2, and it runs in 4n steps.

2.7 Benefits of Fault Tolerance

Since faults are detected and errors are confined, a hypercube computer with fault-tolerance capabilities can exclude the faults from the active part of the system and continue to provide service to the users (possibly with lower performance). Thus the system has less chance of total failure, and hence its reliability is improved. While the system operates in degraded mode, repair of the faulty parts can be undertaken. Once the faults are fixed, the system can return to full performance. In this way, the complete shutdown time of the system is reduced. This implies higher availability of the hypercube computer and less wastage of valuable computation time. A system with fault-tolerance capabilities also has less chance of delivering erroneous data to users. Thus it is safer to use these data, and higher safety in service is provided.
Chapter 3

Reconfiguration of Faulty Hypercubes by Automorphisms

3.1 Introduction

To execute a task on a hypercube machine, the task is represented by a graph and then mapped to the hypercube such that vertices of the task graph are mapped to processors and edges to links of the hypercube. This is essentially finding a copy of a given graph in a larger base graph and, in general, is a difficult problem, termed graph embedding. If an exact embedding is not possible, then dilation, i.e., mapping edges to paths, can be used, or expansion can be applied, where the ratio of nodes to task graph vertices is greater than 1. Embedding specific kinds of graphs in a hypercube has been investigated by many researchers; for example, see [31, 36, 37, 40, 74]. Techniques for embedding general graphs in hypercube structures can be found in [18, 30]. Embedding task graphs in hypercubes is not our focus. We assume that task graphs have already been embedded in hypercubes and propose reconfiguration schemes to achieve fault tolerance. In general, a given task graph can be embedded in a hypercube in several ways such that all the resulting embeddings are isomorphic to one another. One such embedding is selected for execution. Even though a task graph may be mapped to a hypercube such that all nodes are used, many task graphs do not require all nodes of the hypercube. These unused nodes (links) can be treated as spares. Upon failure, the reconfiguration involves remapping the task graph to another instance of the embedding consisting only of healthy nodes. We develop graph-theoretic techniques using automorphisms [13, 77] to solve this problem. For any general task graph embedded in an n-dimensional hypercube which does not use all nodes, any single node or link failure can be tolerated with a worst-case reconfiguration time of O(n) by flipping at most n bits.
Interestingly, these bit-flipping operations correspond to a bit-complement permutation of node labels to achieve this remapping. It can be shown that the Bit-Permute-Complement (BPC [53]) permutations form the automorphism group of hypercubes. We use this class of permutations to reconfigure embedded task graphs under multiple failures. We consider the following four fault scenarios for reconfiguration:

1. single node or link failure;
2. small number (< 5) of random node failures;
3. multiple node failures which are adjacent (form a connected component);
4. large number (> 5) of random node failures.

We show that the general problem of reconfiguring multiple failures is equivalent to the graph isomorphism problem. However, we show that necessary and sufficient conditions for successful reconfiguration by automorphisms can be derived for a) fewer than five faults; b) multiple adjacent faults which form a connected component; and c) general fault patterns. An efficient algorithm is developed to determine whether reconfiguration is possible when a certain condition is satisfied. Simulation results for large numbers of multiple random failures indicate that heuristic algorithms can be developed which perform reconfiguration in a relatively short time with high probability. Our graph-theoretic technique for reconfiguration is quite general and can be applied to other regular processor structures as well, for example loop networks and k-regular multiprocessor networks.

Figure 3.1: A 3-dim cube where each node swaps with the other node on a dimension link.

3.2 Reconfiguration of Hypercubes under Single Failure

In this section, we explain our fault tolerance technique with only a single node or link failure. In the next section our technique is generalized to handle multiple failures.
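The claim that BPC permutations are automorphisms can be verified exhaustively for a small cube. The sketch below (our own check, not from [53]) applies every bit permutation combined with every complement mask to a 3-cube and confirms that labels are permuted and every edge maps to an edge:

```python
from itertools import permutations

def bpc(label, perm, comp, n):
    """Bit-Permute-Complement: output bit i is input bit perm[i],
    then the bits set in 'comp' are complemented."""
    out = 0
    for i in range(n):
        out |= ((label >> perm[i]) & 1) << i
    return out ^ comp

n = 3
for perm in permutations(range(n)):
    for comp in range(1 << n):
        img = [bpc(v, perm, comp, n) for v in range(1 << n)]
        assert sorted(img) == list(range(1 << n))   # a bijection on labels
        for u in range(1 << n):
            for d in range(n):                      # every edge -> an edge
                v = u ^ (1 << d)
                assert bin(img[u] ^ img[v]).count("1") == 1
```

There are n! * 2^n such mappings, which for n = 3 gives the full automorphism group of 48 elements being checked above.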
3.2.1 Reconfiguration of a Single Node Failure

In 3-dimensional space, it is clear that a cube is identical to itself if every node swaps with the other node along some fixed dimensional link, as shown in Fig. 3.1. In graph-theoretic terms, this means that these operations generate graphs that are isomorphic to the cube. Such operations are called automorphisms of a cube. Since the cube remains unchanged, anything embedded in it will also remain unchanged. To tolerate a faulty node, the idea is to find a series of operations which transform the task graph to a different embedding such that the faulty node is avoided. This concept can be generalized to a hypercube of any size. In an n-dimensional hypercube, let nodes be labeled with distinct n-bit binary numbers in such a way that adjacent nodes differ in only one bit position. Let this set of labels be denoted by L. Define a set of transformations from L to L, T_i, i = 1, ..., n, by

    T_i(a_1 ... a_i ... a_n) = (a_1 ... a_i' ... a_n)

where a_1 ... a_i ... a_n is a label and a_i' is the complement of a_i. Transformation T_i maps each node to another node which is obtained by complementing the i-th bit of the original node label. T_i is one-to-one and onto, and T_i o T_i = T_i^2 = I, the identity transformation. A link of a G_hyp can be represented as a pair of node labels which differ in one bit, say at the j-th bit position, as (b_1 ... b_j ... b_n, b_1 ... b_j' ... b_n), or in short (b_1 ... *_j ... b_n). A link under transformation T_i is defined as T_i(b_1 ... b_i ... *_j ... b_n) = (b_1 ... b_i' ... *_j ... b_n) when i < j, and similarly for i > j. T_i transforms a link by transforming its two end nodes. When i = j, T_i simply exchanges the two labels, and thus the link maps to itself. A graph G_hyp under transformation T_i has every node and every link transformed by T_i.
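On binary labels, T_i is just an XOR with a single-bit mask. A minimal sketch (our own illustration of the definition above, using 1-indexed bit positions as in the text):

```python
def T(i, label):
    """T_i complements the i-th bit (1-indexed) of a node label."""
    return label ^ (1 << (i - 1))

n = 4
# T_i is one-to-one, onto, and an involution: T_i o T_i = identity
for v in range(2 ** n):
    assert T(2, T(2, v)) == v

# T_i moves both endpoints of a link together, preserving adjacency
u, v = 0b0101, 0b0100            # neighbors across bit position 1
assert bin(T(3, u) ^ T(3, v)).count("1") == 1
```

Because each T_i is an involution, any composition of distinct T_i's is itself its own inverse, which is what makes undoing a reconfiguration trivial.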
Definition 3.1 Two graphs G = (V, E) and G' = (V', E') are isomorphic if there exists a one-to-one mapping theta from V onto V' such that (u, v) is in E if and only if (theta(u), theta(v)) is in E'.

Lemma 3.1 Any embedded graph G_hyp in a hypercube is isomorphic to its image after transformation T_i, i.e., G_hyp ~ T_i(G_hyp), where "~" denotes isomorphism.

Proof: (sketch) It is obvious that the Hamming distance (or just distance) between two nodes is preserved by T_i. Therefore, adjacent nodes v_1 and v_2 in G_hyp will be adjacent in T_i(G_hyp), and the link between T_i(v_1) and T_i(v_2) is exactly the image of the link between v_1 and v_2, i.e., T_i(v_1, v_2) = (T_i(v_1), T_i(v_2)), by definition. If v_1 and v_2 are not adjacent, then T_i(v_1) and T_i(v_2) cannot be adjacent, since T_i preserves distance. []

Lemma 3.2 T_{i_1} o T_{i_2} o ... o T_{i_k}(G_hyp) ~ G_hyp.

Theorem 3.1 With an arbitrary graph G_hyp embedded, any single node failure of an n-dimensional hypercube can always be tolerated, as long as there is an unutilized node.

Proof: Let node x with label (a_1 a_2 ... a_n) of the hypercube be faulty, and let an unoccupied spare node s exist with label (b_1 b_2 ... b_n). Suppose (b_1 b_2 ... b_n) differs from (a_1 a_2 ... a_n) at bit positions i_1, i_2, ..., i_m, m >= 1. By complementing these bits, s can be transformed to x and x to s. That is, the roles of x and s are exchanged: x becomes a faulty spare node and s assumes x's job. The series of transformations is T_{i_1} o T_{i_2} o ... o T_{i_m}. By Lemma 3.2, T_{i_1} o T_{i_2} o ... o T_{i_m}(G_hyp) ~ G_hyp. Since a single faulty node cannot disconnect the hypercube, the transfer of tasks to new nodes can always be routed through. Hence, the theorem is proved. [] (Examples will be given later.) During reconfiguration, the whole system is not processing the user's jobs; therefore it is desirable to minimize the reconfiguration time.
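On labels, the composition of T_i over the differing bit positions is simply an XOR of every label with (faulty XOR spare). A small sketch of Theorem 3.1 (the chain task graph and the particular labels are our own example):

```python
def reconfigure(task_nodes, faulty, spare):
    """Theorem 3.1 sketch: compose T_i over the bit positions where the
    faulty node and the spare differ, i.e. XOR every label with
    (faulty ^ spare). The faulty node and the spare swap roles."""
    mask = faulty ^ spare
    return [v ^ mask for v in task_nodes]

task = [0, 1, 3, 7, 15, 14]        # an embedded chain in a 4-cube
faulty, spare = 7, 8               # labels 0111 and 1000 differ in all 4 bits
new = reconfigure(task, faulty, spare)
assert faulty not in new           # the faulty node is now unused
assert spare in new                # the spare has taken over a task
# XOR with a fixed mask is an automorphism, so task-graph adjacency holds
mask = faulty ^ spare
for a, b in zip(task, task[1:]):
    assert bin((a ^ mask) ^ (b ^ mask)).count("1") == bin(a ^ b).count("1")
```

Note that the XOR collapses the whole series T_{i_1} o ... o T_{i_m} into one constant-time relabeling; the 2d or 2d + 2 steps of Theorem 3.2 account for physically moving the tasks.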
In a SIMD hypercube with a central controller, we have the following result for the reconfiguration time, assuming the central controller knows locations of all spares, and the location of the faulty node. The central controller is assumed to be highly reliable. T h eorem 3.2 Reconfiguration of an embedded graph Ghyp with a single node fail- I ure for an n-dimensional SIMD hypercube can be completed in 2d or 2d -j- 2 steps, where d is the distance between the faulty node and the nearest unoccupied spare node. Further, this time is bounded by 2n. Proof: Upon detection of a faulty processor, the central controller will find the nearest spare and determine a sequence of transformations based on the labels of the nearest spare node and the faulty node. Let the label of the faulty node and 30 the label of the nearest spare node differ by d bits, at bit positions i\, * 2,..., id, (since they have distance d) then the sequence of transformations on Ghyp for reconfiguration is Tii oT i2o . ..o Tid They complement the bits where the labels of the nearest spare and the faulty node differ. The effect of this sequence of transformations on a job allocated to a node with label (ai... a8l ... al2... a,-d ... an) is to move a message along path (a i ... Oil ... ai2 ...a id... a n) r-2 (cq . . . G & ii . . . d{2 . . . &id . • . 0>n) ^ T { d \ . . . Q >i1 . . . d{2 . . . &id . . . Clri) * ( u j . . . d {j . . . d l2 . . . d{d . . . d fi) Ti< such that the job can be executed at the destination node after reconfigura tion. Transferring of tasks of all allocated nodes can be performed simultaneously. Should a node on the path of transferring a task be the faulty node, then the transform ation T^, which leads the path to the faulty node, is replaced with an identity transformation and 2 ). is appended at the end of the path, making the path length d + 1. So, suppose > C t% 2 • • • is the faulty node, then the path will become («1 • . . C S jj . . • di2 . . . did . . . dn) (di . 
    (a_1 ... a_{i1} ... a_{i2} ... a_{id} ... a_n)
      --T_{i2}--> (a_1 ... a_{i1} ... ā_{i2} ... a_{id} ... a_n)
      --T_{i3}--> ... --T_{id}--> (a_1 ... a_{i1} ... ā_{i2} ... ā_{id} ... a_n)
      --T_{i1}--> (a_1 ... ā_{i1} ... ā_{i2} ... ā_{id} ... a_n)

This will not make 2 or more messages queue at any node. At each step, every node simply exchanges a message with the one other node whose label differs at some fixed bit position. When a node tries to exchange a message with the faulty node, it ignores the response of the faulty node and retains its own message, i.e., an identity transformation. Since at each step every node exchanges messages with only one node, no two messages can be queued at one node. At the (d + 1)st step, every message that has not reached its destination is sent along the dimension induced by the transformation appended at the end. Since every node has a distinct destination, again no two messages can be queued at any node. Hence, after d + 1 steps, every node's message can reach its destination except the faulty node's. Suppose the status of the faulty node is periodically stored in one of its adjacent nodes; moving that status to the destination of the faulty node takes either d − 1 or d + 1 steps when d < n, and n − 1 steps when d = n. Hence, in d + 1 + d ± 1 = 2d or 2d + 2 steps, reconfiguration can be completed for d < n. If d = n, it takes n + 1 + n − 1 = 2n steps. Thus in the worst case, reconfiguration can be completed in 2n steps. □

[Figure 3.2: A loop with six nodes is reconfigured to tolerate a node failure. (a) spare node; (b) faulty node.]

In some situations the reconfiguration is obvious, such as the loop embedded in Fig. 3.2(a), which can easily be reconfigured to the loop shown in Fig. 3.2(b). However, for more complex situations it is not as obvious, for example the loop shown in Fig. 3.3(a).

[Figure 3.3: A more complex embedding in a 4-dim cube. Bit-complementing operations are used to perform reconfiguration.]
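The detour rule above (skip the transformation that would enter the faulty node and append it at the end) can be sketched as follows; this is our illustrative rendering in Python, with labels as integers and bit 1 the leftmost bit, not the dissertation's own code:

```python
def transfer_path(src, faulty, dims, n):
    """Route a message from node src through the dimensions in dims (the
    bit positions where spare and faulty labels differ, 1-indexed from
    the left).  If the next hop would be the faulty node, that dimension
    is skipped (identity step) and flipped at the end, giving a path of
    length at most d + 1."""
    path, cur, deferred = [src], src, []
    for i in dims:
        nxt = cur ^ (1 << (n - i))
        if nxt == faulty:
            deferred.append(i)        # identity now, flip at the end
        else:
            cur = nxt
            path.append(cur)
    for i in deferred:                # the appended transformation(s)
        cur ^= 1 << (n - i)
        path.append(cur)
    return path

# in Q_3 with faulty node 100, the message from 000 along dims 1 then 3
# detours: 000 -> 001 -> 101 instead of passing through 100
path = transfer_path(0b000, faulty=0b100, dims=[1, 3], n=3)
assert path == [0b000, 0b001, 0b101]
```

Each hop crosses exactly one dimension, matching the step-by-step exchange argument in the proof.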
With our technique, using T_4 ∘ T_2 ∘ T_1 (x and s differ at bit positions 4, 2, 1), the loop can be reconfigured as shown in Fig. 3.3(d). This technique also works for arbitrary task graphs. For example, an arbitrary tree is embedded in a 4-dimensional hypercube and there is only one spare node left, as shown in Fig. 3.4(a). Using T_4 ∘ T_3 ∘ T_2, we obtain a new embedding which does not include the faulty node, as shown in Fig. 3.4(b).

[Figure 3.4: (a) A tree embedded in a 4-dim cube. (b) After using complementing operations to perform reconfiguration.]

Note that if the task graph is fixed, it is sometimes possible to reconfigure using permutations that are not necessarily automorphisms of the hypercube. An example of this is shown in Fig. 3.2, where the loop of Fig. 3.2(a) cannot be mapped to that of Fig. 3.2(b) using a BPC permutation. Throughout this chapter, however, we restrict ourselves to BPC mappings only. This implies, of course, that our algorithms will fail to reconfigure some task graphs that might actually be reconfigurable. More general edge-preserving remappings would be required to reconfigure such task graphs; these are left for future work.

For a hypercube computer operating in MIMD mode, i.e., with no central controller, reconfiguration needs to be accomplished in a distributed manner. The previous algorithm can be modified by assigning one of the adjacent nodes of each node to be its supervisor. When a supervisor detects a failed node, it determines the sequence of transformations from the labels of the nearest spare and the faulty node. (Assume every node knows the location of its nearest spare; this can be determined at compile time.) It then broadcasts this sequence to all nodes in the hypercube. Broadcasting can be done in O(n) steps, where n is the number of dimensions of the hypercube.
When broadcasting is completed, every node starts exchanging messages with the nodes specified by the transformation sequence; upon receiving an exchanged message, it can start the next exchange (transformation). When the supervisor of the faulty node finishes the transformations, it starts sending the status of the faulty node to its destination. This can also be done in O(n) steps. Therefore the whole operation can be completed in O(n) steps.

3.2.2 Reconfiguration of a Single Link Failure

Link failures can always be treated as the failure of a node at one end of the link. Thus, the node reconfiguration presented in the previous section can be applied provided that an unoccupied spare node exists. However, we would like to achieve fault tolerance when there are no spare nodes but only spare links available.

Define the transformation T_ij, from L to L, as

    T_ij(a_1 ... a_i ... a_j ... a_n) = (a_1 ... a_j ... a_i ... a_n), where 1 ≤ i, j ≤ n.

T_ij simply swaps bit i and bit j of a label. T_ij is one-to-one and onto, and T_ij² = I. T_ij also preserves the distance between any two nodes v_1 and v_2, i.e.,

    d(v_1, v_2) = d(T_ij(v_1), T_ij(v_2)),

where d(v_1, v_2) denotes the Hamming distance between v_1 and v_2. The image of a link under transformation T_ij is defined as

    T_ij(a_1 ... a_i ... a_j ... *_k ... a_n) = (a_1 ... a_j ... a_i ... *_k ... a_n)  when k ≠ i, j;
    T_ij(a_1 ... *_i ... a_j ... a_n) = (a_1 ... a_{i−1} a_j a_{i+1} ... *_j ... a_n)  when k = i,

and it is defined similarly when k = j. T_ij transforms a link by transforming its two end nodes. Notice that the index of * indicates its bit position.

Lemma 3.3 An embedded graph G_hyp in a hypercube is isomorphic to its image under transformation T_ij, i.e., T_ij(G_hyp) ≅ G_hyp for all 1 ≤ i, j ≤ n.

Proof: Since T_ij preserves distance, the proof is similar to that of Lemma 3.1. □
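As a small sketch (ours, not the dissertation's), the bit-swap T_ij can be written and its two key properties, being an involution and preserving Hamming distance, checked exhaustively on a small cube:

```python
def T_swap(i, j, label, n):
    """Bit-swap transformation T_ij: exchange bits i and j (1-indexed
    from the left) of an n-bit label."""
    bi = (label >> (n - i)) & 1
    bj = (label >> (n - j)) & 1
    if bi != bj:                       # swapping equal bits is a no-op
        label ^= (1 << (n - i)) | (1 << (n - j))
    return label

hamming = lambda a, b: bin(a ^ b).count("1")

# T_ij^2 = I and T_ij preserves distance, checked over all of Q_3
for v1 in range(8):
    assert T_swap(1, 3, T_swap(1, 3, v1, 3), 3) == v1
    for v2 in range(8):
        assert hamming(v1, v2) == hamming(T_swap(1, 3, v1, 3),
                                          T_swap(1, 3, v2, 3))
```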
With bit-swapping transformations and the previous bit-complementing transformations, any single link failure can be tolerated as long as there is an unused link.

Theorem 3.3 The embedded graph G_hyp of a task graph can tolerate any single link failure if there is an unoccupied spare link.

Proof: Let the faulty link be (a_1 a_2 ... *_i ... a_n) and the spare link be (b_1 b_2 ... *_j ... b_n). First, T_ij is applied:

    T_ij(b_1 ... b_i ... *_j ... b_n) = (b_1 ... *_i ... b_{j−1} b_i b_{j+1} ... b_n),

so the spare link and the faulty link now lie on the same dimension. Compare (a_1 a_2 ... *_i ... a_n) with the transformed spare link. Except for the i-th bit, let the bit positions where they differ be i_1, ..., i_d. Now apply T_{i1} ∘ ... ∘ T_{id} to complement these bits. Then the faulty link becomes a spare link, and the spare link is used in the task graph. From Lemma 3.1 and Lemma 3.3,

    T_{i1} ∘ ... ∘ T_{id} ∘ T_ij(G_hyp) ≅ G_hyp.

Therefore the faulty link is tolerated. □

In an SIMD hypercube machine, the central controller sends out the instructions for reconfiguration and also determines the sequence of transformations. For an MIMD hypercube computer, reconfiguration is performed in a distributed manner. Before showing the performance of our reconfiguration scheme for tolerating a faulty link, we define the distance between two links.

Definition 3.2 The distance between two links is defined as the minimum of the 4 possible distances between the end nodes of one link and those of the other link.

For example, let link 1 join adjacent nodes u and u' and link 2 join adjacent nodes v and v'. Letting d(u, v) denote the distance between nodes u and v, the distance between links 1 and 2 is min{d(u, v), d(u, v'), d(u', v), d(u', v')}.

Theorem 3.4 Reconfiguration to tolerate a single faulty link for an embedded graph G_hyp in an n-dimensional SIMD hypercube can be accomplished in d + 4 steps, where d ≤ n − 1 is the distance between the faulty link and the nearest spare link.
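Definition 3.2 translates directly into a few lines of Python (a sketch of ours, with links represented as pairs of adjacent integer labels):

```python
def link_distance(link1, link2):
    """Distance between two links (Definition 3.2): the minimum Hamming
    distance over the four end-node pairings."""
    h = lambda a, b: bin(a ^ b).count("1")
    (u, up), (v, vp) = link1, link2
    return min(h(u, v), h(u, vp), h(up, v), h(up, vp))

# the mesh example of Fig. 3.6: faulty link (101,100), spare link (001,011)
assert link_distance((0b101, 0b100), (0b001, 0b011)) == 1
```

The assertion matches the worked example below, where the reconfiguration takes d + 4 = 1 + 4 = 5 steps.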
Furthermore, the reconfiguration time is bounded by n + 3 steps.

Proof: The worst case occurs when the nearest spare link does not lie on the same dimension as the faulty link. The transformation T_ij, with i and j being the dimensional orientations along which the nearest spare link and the faulty link lie, respectively, is needed to make the spare link in T_ij(G_hyp) and the faulty link lie on the same dimension. All 2-dimensional subcubes, specified by

    (a_1 ... a_{i−1} * a_{i+1} ... a_{j−1} * a_{j+1} ... a_n), where a_l = 0 or 1, 1 ≤ l ≤ n,

can perform T_ij in two steps, except the 2-dimensional subcube containing the faulty link; performing T_ij in the faulty 2-dimensional subcube takes 2 more steps.

[Figure 3.5: (a) Routing in a good 2-dim subcube. (b) Routing in a 2-dim subcube containing the faulty link.]

As shown in Fig. 3.5(a), in a good 2-dimensional subcube spanned along dimensions i and j, exchanging messages between nodes 10 and 01 takes 2 steps. In the 2-dimensional subcube containing the faulty link, shown in Fig. 3.5(b), only the message from node 10 can be sent to node 01 in the first two steps; two more steps are needed to route the message from node 01 to node 10. Therefore, performing T_ij takes 4 steps. After T_ij is performed, a sequence of transformations T_{i1} ∘ ... ∘ T_{id} is applied, where i_1, ..., i_d are the bit positions at which the end nodes of the spare link in T_ij(G_hyp) differ from those of the faulty link. Notice that i_l ≠ j (the dimension along which the faulty link lies) for 1 ≤ l ≤ d. Thus the faulty link is not used after T_ij in the reconfiguration process, and this sequence of transformations can be completed in d steps. The whole reconfiguration process can therefore be completed in d + 4 steps. When d = n − 1 (d cannot be larger than n − 1), this is n + 3 steps, which is the upper bound. □

Example: A 2 × 4 mesh is embedded in a 3-dimensional cube as shown in Fig. 3.6(a).
Suppose the link (101, 100) between vertex 2 and vertex 6 becomes faulty, and the link (001, 011) between vertex 1 and vertex 4 is unused and can replace the faulty link. T_23 is first applied. (The bit positions are numbered from left to right starting from 1.) Both planes (1,5,8,4) and (2,6,7,3) need to be transformed; the result is shown in Fig. 3.6(b), where link (000, 001) becomes a spare link. After T_23, a simple bit-complementing operation T_1 finishes the reconfiguration, as shown in Fig. 3.6(c). The whole reconfiguration takes d + 4 = 1 + 4 = 5 steps, since the distance between the spare link and the faulty link is one (node 001 and node 101 are adjacent).

[Figure 3.6: A 2 by 4 mesh embedded in a 3-dim cube is reconfigured to tolerate a single link failure.]

3.3 Reconfiguration of Hypercube under Multiple Failures

In this section, we generalize our techniques to reconfigure task graphs when there are multiple failures. Our reconfiguration technique can handle multiple link failures or multiple node/link failures; however, to keep the explanations simple, from now on we consider only node failures. When there are multiple failures in a hypercube, the non-faulty part looks more like a general graph. Finding an instance of a given task graph G_hyp in the non-faulty portion of the hypercube is, in general, a difficult problem. One approach is to solve it as the subgraph isomorphism problem [26], which is known to be NP-complete.

Let an n-dimensional hypercube be viewed as a graph and denoted by Q_n = (V_n, E_n), where V_n is the set of nodes and E_n the set of edges. Let the set of permutations of nodes which preserve the adjacency of Q_n be denoted by A(Q_n); A(Q_n) is called the automorphism group of the n-dimensional hypercube.

Definition 3.3 A permutation on 2^n objects, which are represented by binary numbers, belongs to the BPC class if its mapping is generated by permuting and/or complementing bits.
That is, θ ∈ BPC if and only if θ(a_1 ... a_n) = (b_1 ... b_n), where b_i = a_{σ(i)} or ā_{σ(i)}, and σ is a permutation of 1, ..., n. The distance between two vertices of a graph is defined as the length of the shortest path between them; in Q_n, it equals the Hamming distance of the node labels.

Lemma 3.4 Let θ be a permutation on the vertices of Q_n. Then θ ∈ A(Q_n) if and only if θ preserves the distance between every pair of nodes of Q_n.

Theorem 3.5 The automorphism group of the hypercube is the BPC class of permutations. (That is, a permutation θ ∈ A(Q_n) ⟺ θ ∈ BPC, i.e., A(Q_n) = BPC.)

Proof: Bit-complementing or bit-permuting the labels of two adjacent nodes does not change their Hamming distance; therefore, it is clear that all BPC operations are automorphisms of Q_n. An automorphism of a hypercube is equivalent to a relabelling of Q_n. In [67], it is shown that there are n!2^n ways to relabel Q_n. The number of different permutations in BPC is also n!2^n, since we can permute n bits and complement any number of them. It is not hard to see that these relabellings are exactly those generated by BPC. □

In general, when there are multiple faults, the reconfiguration problem is to find an automorphism that transforms an embedded task graph to a new embedding containing no faulty nodes. Conceptually, this is depicted in Fig. 3.7. In Fig. 3.7(a), an embedded task graph G intersects the fault set F; by an appropriate automorphism, G is transformed to the embedding shown in Fig. 3.7(b), where G has no intersection with F.

[Figure 3.7: Conceptual illustration of the reconfiguration problem.]

Since automorphisms preserve the adjacency of G, this problem is equivalent to transforming the spare nodes to cover the faulty nodes; that is, by this transformation the faulty nodes become unassigned. In this sense a spare node is simply an unused node, which need not be fault-free. More formally, the problem is defined as follows.

1.
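Theorem 3.5 can be checked by brute force on a small cube. The sketch below (ours; `apply_bpc` and its calling convention are illustrative assumptions) applies a BPC operation given by a bit permutation σ and a complement mask, and verifies that every BPC operation preserves adjacency in Q_3:

```python
from itertools import permutations, product

def apply_bpc(label, sigma, comp, n):
    """Apply a BPC permutation to an n-bit label: output bit i is input
    bit sigma[i-1], complemented when comp[i-1] is 1 (bits 1-indexed
    from the left)."""
    bits = [(label >> (n - k)) & 1 for k in range(1, n + 1)]
    out = 0
    for i in range(n):
        out = (out << 1) | (bits[sigma[i] - 1] ^ comp[i])
    return out

# every BPC operation maps edges of Q_3 to edges of Q_3
n = 3
for sigma in permutations(range(1, n + 1)):
    for comp in product([0, 1], repeat=n):
        for u in range(2 ** n):
            for d in range(n):
                v = u ^ (1 << d)          # a neighbour of u
                pu = apply_bpc(u, sigma, comp, n)
                pv = apply_bpc(v, sigma, comp, n)
                assert bin(pu ^ pv).count("1") == 1
```

Counting the distinct mappings produced this way gives n!2^n (48 for n = 3), matching the counting argument in the proof.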
Construct a matrix M_x where each row is the label of a faulty node.

2. Choose the same number of spares as faulty nodes, and construct M_s in the same manner.

Our reconfiguration problem is then equivalent to permuting the rows of M_s, and permuting and possibly complementing the columns of M_s, such that M_s becomes identical to M_x. Note that these two sets need not be disjoint, as some unused nodes can be faulty. We now show that this problem is as hard as the graph isomorphism problem.

3.3.1 Isomorphism-Completeness of the General Problem

Given a binary k × n matrix M, define a BPC-rearrangement to be a permutation on M that permutes the rows of M, and permutes and possibly complements the columns of M. We consider the following BPC-rearrangement problem.

Instance: Two k × n binary matrices M and M'.
Question: Is there a BPC-rearrangement that maps M to M'?

Theorem 3.6 The BPC-rearrangement problem is as hard as determining isomorphism of connected bipartite graphs.

Proof: Given two connected bipartite graphs G_1 and G_2, we construct two binary matrices M_1 and M_2 such that M_2 is a BPC-rearrangement of M_1 iff G_1 and G_2 are isomorphic. The idea is to add, to each of the two bipartite graphs, large enough 'enforcing' components so that the complementing property of the BPC-rearrangement is rendered unusable. Let G_1 = (X_1, E_1, Y_1), where X_1 and Y_1 constitute the bipartition of the vertices of G_1 and E_1 is its edge set. Similarly, let G_2 = (X_2, E_2, Y_2). Note that since G_1 is connected, its bipartition is unique [15]; similarly, G_2's bipartition is unique. Therefore, if there is an isomorphism f that maps G_1 to G_2, then either f(X_1) = X_2 or f(X_1) = Y_2. Now for each graph G_i, i ∈ {1, 2}, construct a graph G_i' as follows. Augment X_i with a set X_i' of |X_i| + 2 vertices, and the set Y_i with a set Y_i' of |Y_i| + 2 vertices.
Connect every vertex in X_i' to every vertex of Y_i, and every vertex of Y_i' to every vertex in X_i. Also connect every vertex in X_i' with every vertex in Y_i'. Now add two more new vertices x_i and y_i; connect x_i with all the vertices in Y_i', and y_i with all the vertices in X_i'.

[Figure 3.8: The construction of G_1' from G_1.]

Figure 3.8 shows a symbolic representation of G_1'. In this picture, the dashed line represents the original edges of G_1, and the solid lines represent the additional edges that form G_1'; thus a solid line between two sets of vertices indicates that all possible edges between these two sets are present in the graph. We claim that G_1 and G_2 are isomorphic iff G_1' and G_2' are. The only-if part of this claim is easy to see. Conversely, suppose that G_1' and G_2' are isomorphic. It is straightforward to verify that:

• If vertex v ∈ X_i then |Y_i| + 3 ≤ degree(v) ≤ 2|Y_i| + 2. This is because every vertex of X_i has at least one original edge of E_i incident on it, since G_i is connected. Similarly, if v ∈ Y_i then |X_i| + 3 ≤ degree(v) ≤ 2|X_i| + 2.

• If v ∈ X_i' then degree(v) = 2|Y_i| + 3; and if v ∈ Y_i' then degree(v) = 2|X_i| + 3.

• degree(x_i) = |Y_i| + 2, and degree(y_i) = |X_i| + 2.

Since the isomorphism must be degree-preserving, two possibilities arise. Assume without loss of generality that |X_1| ≤ |Y_1|. If |X_1| < |Y_1|, then the isomorphism must map x_1 to x_2, X_1' to X_2', and X_1 to X_2; so it must map Y_1 to Y_2 as well, and therefore G_1 and G_2 are isomorphic. If |X_1| = |Y_1|, then there is the possibility that the isomorphism maps X_1 to either X_2 or Y_2; but similar arguments yield the desired conclusion in these cases as well.
Next, for each of the resulting graphs G_1' and G_2', construct corresponding binary matrices M_1 and M_2: the rows of M_i (i ∈ {1, 2}) correspond to the vertices X_i ∪ X_i' ∪ {x_i} and the columns correspond to the vertices Y_i ∪ Y_i' ∪ {y_i}. An entry in the matrix is 1 when there is an edge between the vertices corresponding to its row and column; the entry is 0 otherwise. From the above constraints on degree, every vertex on a particular side of the bipartition of G_i' must have degree strictly exceeding half the number of vertices on the other side. Therefore, every column of M_i has strictly more 1's than 0's. If G_1 and G_2 are isomorphic, then so are G_1' and G_2'; the isomorphism that maps G_1' to G_2' can be used to permute the rows and columns of M_1 so that it looks like M_2. Conversely, suppose that there is a BPC-rearrangement that maps M_1 to M_2. Since every column of M_1 (and M_2) has strictly more 1's than 0's, this rearrangement could not have complemented any of the columns of M_1. Consequently, this rearrangement must merely be a permutation of the rows and columns of M_1 to obtain M_2; it therefore corresponds exactly to an isomorphism from G_1' to G_2'. □

It is quite easy to show that determining isomorphism of general graphs is as hard as determining isomorphism of bipartite graphs, as follows. Given a graph G = (V, E), we define B(G) to be the bipartite graph (V, E', E) whose bipartition contains the vertices of G on one side and the edges of G on the other, and E' has an edge [v, e] whenever e is an edge incident on v in G. It is straightforward to show that two graphs G and G' are isomorphic iff B(G) and B(G') are isomorphic.

In the following subsections, we study necessary and sufficient conditions for reconfiguration when the number of faults is small or the faulty nodes form a connected component. For general fault patterns, we present an algorithm to determine whether reconfiguration is possible when the necessary conditions are satisfied.
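The vertex-edge incidence construction B(G) is short enough to state in code (a minimal sketch of ours, with a graph given as an edge list):

```python
def incidence_bipartite(edges):
    """Build B(G) for a graph given by its edge list: one side of the
    bipartition holds G's vertices, the other holds G's edges, and
    vertex v is joined to edge e exactly when v is an endpoint of e."""
    return [(v, e) for e in edges for v in e]

# B of a triangle: each edge-vertex of B(G) has degree exactly 2,
# so B(G) has 2|E| edges
triangle = [(0, 1), (1, 2), (0, 2)]
B = incidence_bipartite(triangle)
assert len(B) == 2 * len(triangle)
```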
3.3.2 Less Than Five Faults

We first consider the case where there are exactly two faulty nodes x_1 and x_2. To reconfigure, 2 good spare nodes s_1 and s_2 (unoccupied nodes) need to be relocated to x_1 and x_2 by some permutation in A(Q_n). Since A(Q_n) preserves distances, d(s_1, s_2) must equal d(x_1, x_2). Suppose d(x_1, x_2) = d(s_1, s_2) = t, that s_1 and s_2 differ at bit positions i_1, i_2, ..., i_t, and that x_1 and x_2 differ at bit positions j_1, j_2, ..., j_t. Let σ be a permutation of the bits of the node labels with

    σ : {i_1, ..., i_t} → {j_1, ..., j_t} (one-to-one),

i.e., σ maps a number in {i_1, ..., i_t} to a number in {j_1, ..., j_t}; how σ maps other numbers can be arbitrarily determined. After σ acts on all the labels, the bit positions where s_1 differs from s_2 are exactly those where x_1 and x_2 differ. Complement the bits where σ(s_1) differs from x_1. Then s_1 becomes x_1 and s_2 becomes x_2. For 3 failures we have a similar result and reconfiguration procedure.

Definition 3.4 Two sets of nodes X = {x_1, ..., x_m} and Y = {y_1, ..., y_m} are similar if there exists a one-to-one mapping φ from X onto Y such that d(x_i, x_j) = d(φ(x_i), φ(x_j)) for all 1 ≤ i, j ≤ m.

Lemma 3.5 Any three faulty nodes of Q_n can be tolerated using automorphisms if and only if there exists a set of unoccupied spare nodes which is similar to the set of faulty nodes.

Proof: The necessity of a similar set of unoccupied spare nodes is clear. The sufficiency is proved as follows. Let the set of faulty nodes be {x_1, x_2, x_3} and a similar set of spare nodes be {s_1, s_2, s_3}. Complement the 1-bit positions of s_1, so that s_1 = 0...00; then move all the 1-bits of s_2 to the right (least significant bits) by a suitable bit permutation. Call this region of 1-bits the 1-region and the other part (most significant bits, filled with 0's) the 0-region. Apply another appropriate bit permutation to move all the 1-bits of s_3 in the 1-region, and all the 1-bits of s_3 in the 0-region, to the right within their regions.
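The two-fault procedure above can be sketched directly (our illustrative Python; labels are integers with bit 1 the leftmost bit, and the construction assumes the similarity condition d(s_1, s_2) = d(x_1, x_2) holds):

```python
def two_fault_map(s1, s2, x1, x2, n):
    """Build a BPC mapping sending spare s1 to faulty x1 and s2 to x2:
    permute the differing bit positions of (s1, s2) onto those of
    (x1, x2), then complement the bits where the permuted s1 still
    differs from x1."""
    bit = lambda v, k: (v >> (n - k)) & 1
    S = [k for k in range(1, n + 1) if bit(s1, k) != bit(s2, k)]
    X = [k for k in range(1, n + 1) if bit(x1, k) != bit(x2, k)]
    assert len(S) == len(X), "spares and faults must be similar"
    sigma = {}                       # sigma[target pos] = source pos
    for i_k, j_k in zip(S, X):
        sigma[j_k] = i_k
    rest_S = [k for k in range(1, n + 1) if k not in S]
    rest_X = [k for k in range(1, n + 1) if k not in X]
    for i_k, j_k in zip(rest_S, rest_X):
        sigma[j_k] = i_k             # arbitrary on the remaining bits

    def permute(v):
        out = 0
        for i in range(1, n + 1):
            out = (out << 1) | bit(v, sigma[i])
        return out

    comp = permute(s1) ^ x1          # bits to complement afterwards
    return lambda v: permute(v) ^ comp

f = two_fault_map(0b0110, 0b0011, 0b1010, 0b1111, 4)
assert f(0b0110) == 0b1010 and f(0b0011) == 0b1111
```

Because the differing positions of the spares are carried exactly onto those of the faults, the final complement that fixes s_1 automatically fixes s_2 as well, which is the heart of the argument in the text.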
As an example, the spares shown in 3.1(a), after such a BPC permutation σ_1, are shown in 3.1(b):

    0000000        0000000
    0110011   →    0001111        (3.1)
    1101001        0110011
      (a)            (b)

Perform the similar operations on x_1, x_2 and x_3; call the combined operation σ_2. Then σ_1(s_1) = σ_2(x_1) = 0...00, and σ_1(s_2) = σ_2(x_2) since they have the same distance to σ_1(s_1) = σ_2(x_1). By d(σ_1(s_2), σ_1(s_3)) = d(σ_2(x_2), σ_2(x_3)), σ_1(s_3) must be the same as σ_2(x_3). Therefore σ_2^{-1} ∘ σ_1(s_i) = x_i for 1 ≤ i ≤ 3. □

When there are 4 faulty nodes, similarity alone is not sufficient to guarantee the transformation of the spares to the faulty nodes by BPC. A counterexample follows. Consider 2 sets of nodes in Q_4, S = {0010, 1110, 1000, 1011} and X = {1111, 1100, 1010, 1001}. They are similar, since the distance between any pair of nodes in either set is always 2. However, there is no permutation in A(Q_4) which can transform S to X, for the following reason. The first bit of every node in X is 1, while there is no bit position on which every node in S agrees; therefore, no matter how we permute bits and complement them, there is no way to make the first bit of all nodes in S equal to 1.

An additional condition is therefore required: the dimension of the minimum cube containing the spares must be the same as that of the minimum cube containing the faulty nodes, as suggested by the counterexample. The necessity of the similarity and dimensionality conditions is clear from BPC and the counterexample. We give an informal proof, by an example, of the sufficiency. Let all possible sets of 4 nodes satisfying the same similarity and dimensionality conditions form a class A. Every set of A can be transformed to a standard form. Consider the 4 nodes in 3.2(a):
    000000000      000000000      000000000      000000000
    011010111      000111111      000111111      000111111
    101001101  →   011000111  →   011000111      011000111     (3.2)
    010101110      101101010      101011001      011111000
       (a)            (b)            (c)            (d)

By Lemma 3.5, the first 3 nodes of all sets in A can be transformed as shown in 3.2(b). Without affecting the first 3 nodes, we can obtain 3.2(c). Call the two 1-regions of the third node the right 1-region and the left 1-region, and similarly for the two 0-regions. Suppose that the fourth node of some set in A cannot be transformed to the fourth node in 3.2(c); say it has no 1-bits under the right 1-region of node 3. Then it must have 3 1-bits under the right 0-region of node 3, by Lemma 3.5 applied to nodes 1, 2 and 4. (Because of the similarity constraint, the number of 1-bits of node 4 under the 1-region of node 2 must be the same for all sets.) This makes the distance between nodes 3 and 4 of this set, as contributed by the bits under the 1-region of node 2, increase by two. The only way to offset this increase is to switch a 1-bit under the left 0-region of node 3 with a 0-bit under the left 1-region of node 3, as shown in 3.2(d). However, these 4 nodes are then located in a smaller subcube, a contradiction. The other possibility for node 4 to differ from the standard form in (c) is to have more 1-bits (say 2) under the right 1-region of node 3; by a similar argument this also leads to a contradiction, with the 4 nodes located in a bigger subcube.

From the above discussion, we conclude that the necessary conditions for the general multiple-fault case are at least the following:

1. The sets of faulty nodes and spare nodes must be similar.

2. The dimension of the minimum cube containing any 4 nodes of the faulty set must be the same as that of the minimum cube containing the corresponding 4 nodes of the spare set.

3. The dimension of the minimum cube containing the faulty set must also be the same as that of the minimum cube containing the spare set.
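The three conditions can be checked mechanically. The sketch below (ours, in Python, with labels as integers) implements them and confirms that the Q_4 counterexample from the text passes the similarity test yet fails the dimensionality tests:

```python
from itertools import combinations

def similar(S, X):
    """Condition 1: pairwise Hamming distances must match under the
    correspondence S[i] <-> X[i]."""
    h = lambda a, b: bin(a ^ b).count("1")
    return all(h(S[i], S[j]) == h(X[i], X[j])
               for i, j in combinations(range(len(S)), 2))

def min_cube_dim(nodes):
    """Dimension of the smallest subcube containing the nodes: the
    number of bit positions on which they do not all agree."""
    varying = 0
    for v in nodes[1:]:
        varying |= nodes[0] ^ v
    return bin(varying).count("1")

def necessary_conditions(S, X):
    """Check conditions 1-3 for spare set S and faulty set X, where
    S[i] is intended to cover X[i]."""
    if not similar(S, X) or min_cube_dim(S) != min_cube_dim(X):
        return False
    return all(min_cube_dim([S[i] for i in q]) ==
               min_cube_dim([X[i] for i in q])
               for q in combinations(range(len(S)), 4))

# the counterexample: similar, but not BPC-transformable
S = [0b0010, 0b1110, 0b1000, 0b1011]
X = [0b1111, 0b1100, 0b1010, 0b1001]
assert similar(S, X) and not necessary_conditions(S, X)
```

Here S spans all 4 dimensions while X spans only 3, so condition 3 already rules out any BPC mapping.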
Checking the second condition takes (k choose 4) tests, where k is the number of nodes in a set; this becomes expensive when k is large. In the next section, we present an efficient algorithm to determine whether a set can be transformed to a similar set when k is large.

3.3.3 Multiple Adjacent Faults

When the faulty nodes constitute a connected subgraph of the hypercube, it is always possible to transform a similar set of good nodes to them, as stated in the next theorem.

Theorem 3.7 Multiple faults which form a connected subgraph of a hypercube can always be tolerated using automorphisms if and only if there exists a set of good spare nodes similar to the set of faulty nodes.

Proof of Theorem 3.7: The only-if part of the proof is straightforward. We prove the if part below. Suppose x_1, x_2, ..., x_k are faulty and form a connected subgraph. Let s_1, s_2, ..., s_k be a set of good nodes which is similar to the set of faulty nodes. We prove the theorem by induction on the number of faulty nodes. We have seen that for k ≤ 3 the theorem is true. Assume the theorem is true for some k = m ≥ 3, and consider the case k = m + 1. It is always possible to find a connected component with m nodes in {s_1, ..., s_{m+1}}, by starting with one node and then adding adjacent nodes one by one; let the only node left over be s_{m+1}. By assumption, s_1, s_2, ..., s_m can be transformed to x_1, x_2, ..., x_m by some permutation η ∈ A(Q_n) = BPC. Without loss of generality, let η(s_1) = x_1, η(s_2) = x_2, ..., η(s_m) = x_m. Since η preserves adjacency, η(s_1), ..., η(s_{m+1}) must remain a connected component and η(s_{m+1}) must be adjacent to some node; suppose it is η(s_1). Due to the similarity between the faulty set and the good set, η(s_1) = x_1 is also adjacent to node x_{m+1}. Therefore, both x_{m+1} and η(s_{m+1}) are adjacent to η(s_1) = x_1. This implies that the distance between x_{m+1} and η(s_{m+1}) is either 2 or 0. If they are the same (distance 0), the theorem is proved.
Let us consider the case where η(s_{m+1}) and x_{m+1} differ at bits j_1 and j_2. Let the label of η(s_1) = x_1 be (a_1 ... a_{j1} ... a_{j2} ... a_n), and let the labels of η(s_{m+1}) and x_{m+1} be (a_1 ... ā_{j1} ... a_{j2} ... a_n) and (a_1 ... a_{j1} ... ā_{j2} ... a_n), respectively. Since, except at bit positions j_1 and j_2, all other bits of η(s_{m+1}) and x_{m+1} are the same, let us concentrate on these two bits only. There are four cases:

(i) x_1: a_{j1} a_{j2} = 00 (so η(s_{m+1}): 10; x_{m+1}: 01).
Claim: For 1 ≤ r ≤ m, B_{j1}(x_r) B_{j2}(x_r) = 11 or 00, where B_j(x_r) denotes bit j of x_r.
Proof of the claim: Suppose for some x_{i'}, B_{j1}(x_{i'}) B_{j2}(x_{i'}) = 10. Except at bit positions j_1 and j_2, the bit positions where η(s_{m+1}) differs from x_{i'} must be the same as the bit positions where x_{m+1} differs from x_{i'}. However, at bit positions j_1 and j_2, η(s_{m+1}) agrees with x_{i'} while x_{m+1} is the complement of x_{i'}. This implies d(η(s_{m+1}), x_{i'}) + 2 = d(x_{m+1}, x_{i'}), that is, d(η(s_{m+1}), η(s_{i'})) ≠ d(x_{m+1}, x_{i'}), contradicting the similarity property, since η preserves similarity. If B_{j1}(x_{i'}) B_{j2}(x_{i'}) = 01, the same reasoning also leads to a contradiction. Therefore the claim is true.
So B_{j1}(x_{m+1}) B_{j2}(x_{m+1}) = 01 and B_{j1}(η(s_{m+1})) B_{j2}(η(s_{m+1})) = 10. We can simply swap bit j_1 with bit j_2 in all labels to make η(s_{m+1}) go to x_{m+1}; all other nodes stay in place, since their bits j_1 and j_2 are either 00 or 11.

(ii) x_1: a_{j1} a_{j2} = 11. The proof is the same as (i).

(iii) x_1: a_{j1} a_{j2} = 01 (so η(s_{m+1}): 11; x_{m+1}: 00).
Claim: For 1 ≤ i ≤ m, B_{j1}(x_i) B_{j2}(x_i) = 01 or 10. The proof of the claim is the same as that of (i).
In order to make B_{j1}(η(s_{m+1})) B_{j2}(η(s_{m+1})) the same as B_{j1}(x_{m+1}) B_{j2}(x_{m+1}), complement bits j_1 and j_2 of all labels. Even though η(s_1), η(s_2), ..., η(s_m) move to new places, they can be fixed: by then swapping bits j_1 and j_2, they move back to the proper nodes, and this swap has no effect on the two complemented bits of η(s_{m+1}), which have become 00.

(iv) x_1: a_{j1} a_{j2} = 10. The proof is the same as (iii).
Thus the theorem is proved. □

Example: Consider a 5-dimensional hypercube with a set of adjacent faulty nodes given by

    x_1 = 01001, x_2 = 00111, x_3 = 00101, x_4 = 01101, x_5 = 01100

and a similar set of good spare nodes

    s_1 = 10010, s_2 = 11001, s_3 = 10001, s_4 = 10011, s_5 = 00011.

We start by moving s_1, s_2 and s_3 to x_1, x_2 and x_3 by the bit permutation

    ( 1 2 3 4 5 )
    ( 1 4 5 2 3 )

and then complementing bits 1 and 5. (We number the bits from left to right.) Call this combined operation η, which is a BPC permutation. By η we have

    η(s_1) = 01001, η(s_2) = 00111, η(s_3) = 00101, η(s_4) = 01101, η(s_5) = 11101.

The only discrepancy is η(s_5) ≠ x_5; they differ at bits 1 and 5. Notice that bits 1 and 5 of all the other labels are now 01. By another operation η', which complements bits 1 and 5 and then swaps them, the result is

    η'η(s_1) = 01001, η'η(s_2) = 00111, η'η(s_3) = 00101, η'η(s_4) = 01101, η'η(s_5) = 01100.

Hence, all the spare nodes map to the faulty nodes. □

Given a set of spare nodes which is similar to the set of adjacent faulty nodes, the complexity of finding a permutation in A(Q_n) is O(kn), which is clear from the induction steps: we can simply use the induction steps as an algorithm. In every step of the induction, n bits need to be checked, and there are k nodes. Since both n and k are relatively small numbers, this is an efficient algorithm.

3.3.4 Random Faults (≥ 5)

In this section, sufficient and necessary conditions are derived for a bijection g : S → X to be realizable by a BPC operation η, i.e., for all s ∈ S, η(s) = g(s). Let S = {s_1, ..., s_m} and X = {x_1, ..., x_m}. M_s is an m × n matrix, where row i is the label of s_i ∈ S. Two operations, complementing and permuting the columns, are defined on M_s. CF(S), the canonical form of M_s, is defined as follows. Complement the appropriate columns of M_s so that the first row contains only zeroes.
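The worked example can be replayed mechanically. Below is our sketch (Python, labels as integers, bits 1-indexed from the left; `bpc_op` is an illustrative helper, not the dissertation's notation) applying η and then η' to the spare set:

```python
def bpc_op(label, sigma, comp, n):
    """Apply a BPC operation: the bit at position p moves to position
    sigma[p], then the positions listed in comp are complemented."""
    bits = [(label >> (n - p)) & 1 for p in range(1, n + 1)]
    out = [0] * n
    for p in range(1, n + 1):
        out[sigma[p] - 1] = bits[p - 1]
    for p in comp:
        out[p - 1] ^= 1
    return int("".join(map(str, out)), 2)

# eta: permute (1 2 3 4 5 -> 1 4 5 2 3), then complement bits 1 and 5
eta = lambda v: bpc_op(v, {1: 1, 2: 4, 3: 5, 4: 2, 5: 3}, [1, 5], 5)
# eta': complement bits 1 and 5 and then swap them; since the complement
# set {1, 5} is invariant under the swap, the two steps commute
etap = lambda v: bpc_op(v, {1: 5, 2: 2, 3: 3, 4: 4, 5: 1}, [1, 5], 5)

S = [0b10010, 0b11001, 0b10001, 0b10011, 0b00011]
X = [0b01001, 0b00111, 0b00101, 0b01101, 0b01100]
assert [etap(eta(s)) for s in S] == X   # all spares land on the faults
```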
Then treat each column as a binary number with the most significant bit in row 1 and the least significant bit in row m, and sort the columns. The result, with 0 at the left end and at most 2^{m−1} − 1 at the right end, is CF(S). Similarly, CF(X) can be obtained. The columns with the same value j are called region j. The number of columns with value j is the width of region j, denoted w(j); if there is no column with value j, then w(j) = 0. The canonical form of a 5-node set is shown in Fig. 3.9.

[Figure 3.9: Canonical form of a 5-node subset, where the shaded areas are blocks of 1's; the sorted columns fall into regions 0 through 15.]

Definition 3.5 The co-dimension of S ⊆ Q_n is the number of columns in CF(S) with all entries 0, denoted co-dim(S). The co-dimension of a single node is defined as 0 for convenience.

Theorem 3.8 Let S, X ⊆ Q_n, S = {s_1, ..., s_m} and X = {x_1, ..., x_m}, and let g : S → X be a bijection with g(s_i) = x_i. If g preserves the co-dimension of all subsets of S, i.e., co-dim(T) = co-dim(g(T)) for all T ⊆ S, then CF(S) = CF(X).

Proof: By induction on |S|.
Basis: |S| = 2. Trivial.
Suppose the theorem holds for |S| = m − 1, and consider |S| = m. By the induction hypothesis, CF(S − {s_m}) = CF(X − {x_m}), i.e., the first m − 1 rows of CF(S) and CF(X) are identical. We examine the regions from the first to the last, and show that the width of each region is the same in both forms. Once region 2j in both forms is shown to have the same width, it follows that region 2j + 1 also has the same width, 0 ≤ j ≤ 2^{m−2} − 1.
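The canonical form and co-dimension are straightforward to compute; this is our illustrative Python rendering of the definitions (labels as integers, n bits, bit 1 leftmost):

```python
def canonical_form(S, n):
    """CF(S): complement columns so that row 1 is all zeroes, read each
    column as a binary number (row 1 = most significant bit), and sort
    the column values in increasing order."""
    cols = []
    for p in range(1, n + 1):
        col = [(s >> (n - p)) & 1 for s in S]
        if col[0] == 1:                  # complement this column
            col = [b ^ 1 for b in col]
        cols.append(int("".join(map(str, col)), 2))
    return sorted(cols)

def co_dim(S, n):
    """Co-dimension (Definition 3.5): number of all-zero columns in
    CF(S), i.e., columns on which all nodes of S agree."""
    return sum(1 for c in canonical_form(S, n) if c == 0)

# the 3-node set from (3.1)(a): exactly one column is constant
S = [0b0000000, 0b0110011, 0b1101001]
assert canonical_form(S, 7) == [0, 1, 1, 2, 2, 3, 3]
assert co_dim(S, 7) == 1
```

Note that `canonical_form` sorts column values but does not reorder them into the contiguous 1-blocks of Fig. 3.9; the sorted value sequence already determines the region widths w(j), which is all Theorem 3.8 uses.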
This is because the sum of the widths of regions 2j and 2j + 1 equals the width of region j in row m − 1, and rows m − 1 of both forms are identical by the induction hypothesis.

Assume all regions < 2j in both forms have the same width. To show that region 2j in both forms also has the same width, consider rows 1, i1, ..., ik, m, where the rows i1, ..., ik are chosen such that the blocks of 1's in rows i1, ..., ik cover all regions 2(j + 1) to 2^(m−1) − 1. For example, in Fig. 3.9, to prove that region 8 in both forms has the same width, rows 1, 3, 4, 5 are chosen, so that regions 10 to 15 are covered by blocks of 1's in rows 3 and 4. More precisely, these row indices must satisfy

2j + 1 + 2^(m−i1) + 2^(m−i2) + ... + 2^(m−ik) = 2^(m−1) − 1.

The co-dimensions of rows 1, i1, ..., ik, m in both forms are the same. In regions 2j + 1 to 2^(m−1) − 1, every column has a 1 in some selected row and thus cannot contribute to the co-dimension of rows 1, i1, ..., ik, m. Regions 0 to 2j − 1 are exactly the same in both forms. In region 2j, the only entries in rows 1, i1, ..., ik, m are 0. Hence, region 2j must have the same width in both forms. Starting from j = 0, all regions can be shown to have the same width in both forms. □

Corollary 3.1 With S, X, g defined as in the theorem, g can be realized by some σ ∈ Aut(Qn) ⟺ g preserves the co-dimension of all subsets of S.

Proof: (⇒): It is easy to see that BPC operations preserve co-dimensions. Hence, if σ(S) = X, every subset has the same co-dimension as its image.
(⇐): The operations used to obtain CF(S) and CF(X) are in Aut(Qn). Suppose σ1 is used to obtain CF(S) and σ2 is used to obtain CF(X). Then by composing σ1 with the inverse of σ2, S can be transformed to X. □

We are interested in determining whether reconfiguration can be achieved when a similar set of spares exists. That is, given a set of faulty nodes {x1, x2, ..., xk} and a set of good spares {s1, s2, ..., sk}
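The canonical-form construction and the resulting co-dimension can be sketched as follows; `canonical_form` and `co_dimension` are our own illustrative helpers operating on node labels given as bit strings. Run on the spare set and fault set of the earlier example, the two canonical forms coincide, as Theorem 3.8 and Corollary 3.1 predict.

```python
def canonical_form(rows):
    """Canonical form of a node set given as equal-length bit strings.

    Complement every column whose first-row entry is 1 (so row 1 becomes
    all zeros), then sort the columns by value with the most significant
    bit in row 1; region 0 ends up leftmost.
    """
    m, n = len(rows), len(rows[0])
    cols = []
    for j in range(n):
        col = [rows[i][j] for i in range(m)]
        if col[0] == '1':                                  # complement column
            col = ['1' if b == '0' else '0' for b in col]
        cols.append(col)
    cols.sort(key=lambda c: int(''.join(c), 2))            # sort by value
    return [''.join(cols[j][i] for j in range(n)) for i in range(m)]

def co_dimension(rows):
    """Number of all-zero columns in the canonical form (Definition 3.5)."""
    cf = canonical_form(rows)
    return sum(1 for j in range(len(cf[0])) if all(r[j] == '0' for r in cf))
```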
similar to it, where xi corresponds to si, we would like to find a BPC permutation σ which maps si to xi for all 1 ≤ i ≤ k. For this situation, we have an efficient algorithm. The idea is to represent the sets of spare nodes and faulty nodes as binary matrices Ms and Mx, respectively, of size k × n, and to determine how the columns of Ms can be mapped to the columns of Mx. Since a BPC permutation permutes bits and possibly complements one or more bits, this corresponds to permuting and/or complementing columns. Therefore, a column of Ms can be mapped to a column of Mx in two ways (with or without complementing). The condition for Ms to be mappable to Mx is that every column of Ms maps to some column of Mx in one of these two ways.

It is possible for a column, say the ith, of Ms to map to more than one column of Mx. For example, if the ith column of Ms maps to columns k1 and k2 of Mx, then there must exist exactly one other column j ≠ i of Ms which also maps to these two columns of Mx; otherwise there exists no BPC permutation that maps Ms to Mx completely, since if only one column, or more than two columns, map to k1 and k2, a one-to-one and onto mapping between the columns of Ms and those of Mx cannot be found. If all other columns of Ms map to Mx, then we can map columns i and j of Ms to either k1 or k2 of Mx to find a BPC which moves si to xi, for 1 ≤ i ≤ k. This concept generalizes: if the ith column of Ms maps to several columns of Mx, it suffices to choose one of these mappings to determine whether reconfiguration can be accomplished. Since there are at most two choices for each column, we can develop an efficient algorithm to determine if a BPC can map Ms to Mx, which takes O(n²·k) time. If the column bits can be compared in one step (by storing them in registers), the algorithm takes only O(n²) time.
The algorithm is presented below:

Algorithm FindP
  Let Ms and Mx be k × n binary matrices representing the similar sets of
  spare nodes and faulty nodes.
  Fail = false;
  for i = 1 to n do
  begin
    for j = 1 to n do
    begin
      if column[i] of Ms is equal to column[j] of Mx, or equal to its
      complement, then assign mapping of i to j;
    end
    if no mapping for i is found, set Fail = true;
  end

At the end, if Fail is false, then FindP has succeeded in finding a map from every column of matrix Ms to some column of matrix Mx. Therefore, a BPC permutation has been found that can be used to perform the reconfiguration. If FindP fails, then there exists no BPC permutation which can map the set of nodes in Ms to that of Mx. However, due to the possible symmetry of the nodes in Mx, there may exist other correspondences (obtained by row permutations which preserve similarity) between the nodes in Ms and those in Mx. In the most general case, we need to run the algorithm for all possible correspondences before concluding that Ms cannot be mapped to Mx. But if faulty nodes are random, it is unlikely that there will be many correspondences due to symmetry.

3.3.5 Simulation Results

Since we have shown that our reconfiguration problem, in its general setting, is equivalent to the graph isomorphism problem, any algorithm for it will take a long time in the worst case. However, worst-case scenarios occur when the two graphs look alike and have a lot of symmetry. In general, faults occur randomly, and the chance that they form a highly symmetric graph is negligibly small. Furthermore, we have shown efficient ways of performing reconfiguration when there are only a few faults (< 5). For a large number of faults, we can use heuristic algorithms which decide quickly whether reconfiguration is possible.
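The column-matching idea behind FindP can be sketched minimally as follows; `find_bpc_columns` is our own name, and instead of committing to one assignment it records, for each column of Ms, every admissible target column of Mx together with a flag indicating whether complementing is needed.

```python
def find_bpc_columns(Ms, Mx):
    """For each column i of Ms, list the columns j of Mx it can map to,
    either directly or after complementing (flag True = complement).
    Returns None (Fail) if some column of Ms has no candidate at all."""
    k, n = len(Ms), len(Ms[0])

    def col(M, j):
        return tuple(M[i][j] for i in range(k))

    def comp(c):
        return tuple(1 - b for b in c)

    candidates = []
    for i in range(n):
        ci = col(Ms, i)
        matches = [(j, ci != col(Mx, j))         # flag: complement needed?
                   for j in range(n)
                   if col(Mx, j) in (ci, comp(ci))]
        if not matches:
            return None        # Fail: no BPC under this correspondence
        candidates.append(matches)
    return candidates
```

On the spare/fault matrices of the earlier 5-node example, every column of Ms has exactly one candidate, giving the unique column permutation directly.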
In order to show that heuristics can be effective, we ran simulations for different fault scenarios in a 10-dimensional hypercube. The conclusion of these experiments is that the probability of reconfiguration with a small number of faults (< 10) is very high and rapidly goes to zero with a large number of faults. Of course, the probability of successful reconfiguration depends on the task graph embedded in the hypercube, the number of spares, and the fault pattern. With our limited simulation experiments, the results are consistent in that the chance of successful reconfiguration drops rapidly to zero after 10-15 faults in practical (10-15 dimensional) hypercubes. Therefore, it is a good idea to use quick heuristic procedures when the number of faults is large and exact methods for a small number of faults. In the rest of this section we discuss the details of the fault scenarios simulated and present our results.

To evaluate the performance of our reconfiguration technique, we used Monte Carlo simulations for several different task embeddings in a 10-dimensional hypercube. The following 3 cases are simulated:

Case 1: A task graph is embedded with 32 nodes left as spares. Their locations are distributed evenly in the entire cube.
Case 2: A task graph is embedded with 47 spare nodes which are arranged in such a way that for any set of 3 faults a similar set of spare nodes can be found.
Case 3: A linear chain is embedded with nodes 0 to 49 left unoccupied as spares. Therefore, all spares are located in a 6-dimensional subcube.

We select random nodes to be faulty and introduce node faults incrementally from 2 to 9. In a real machine, too, faults occur incrementally, and this fact can be used to our advantage in the reconfiguration process. What is important to us is the time to find a mappable set of spares upon a new fault arrival.
Since there is a large amount of time between successive faults, we can have the system well prepared before the next fault occurs. Since a mappable set of spares for the (k + 1) faults must contain a mappable set for the first k faults, we can gather all mappable sets for the first k faults in the background (during the long interval between the kth and the (k + 1)st faults). Upon arrival of the (k + 1)st fault, we only need to look for a new mappable set for the (k + 1) faults based on the information gathered with k faults. This is done by checking each old mappable set to see if it can be extended to a new mappable set by adding a new spare, using the following process. First, a new spare is added to the spares to make the set similar to the faulty set; then the dimension of the whole set is checked; and finally, algorithm FindP is used to determine if it is mappable. This process is repeated until a mappable set is found. When a mappable set for the (k + 1) faults is found, the system starts the reconfiguration process. After reconfiguration, the system continues to find the rest of the new mappable sets in the background. Since the locations of faults are random, most of the time a new mappable set can be found before going through all mappable sets for the first k faults, as shown by the simulation results. When no such set can be found, reconfiguration is impossible, and one must resort to performance degradation and other approaches which are beyond the scope of this chapter.

Simulation results for the three cases are shown in Tables 3.1, 3.2 and 3.3, respectively. The ith row of each table presents the results for the ith fault arrival. The columns represent the following:

column 1: The ith arriving fault.
column 2: The mean number of mappable sets for the first (i − 1) faults examined by the program before finding a mappable set for the first i faults.
column 3: The mean total number of mappable sets for the first i faults.
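The background extension step described above can be sketched as follows. This is our own illustrative helper, not the dissertation's code: `is_mappable` stands in for the full test (similarity, dimension check, then FindP) and is passed in as a callback, since its details appear earlier in the section.

```python
def update_mappable_sets(old_sets, spares, new_fault_count, is_mappable):
    """Extend each mappable set for the first k faults by one unused spare
    and re-test it for the (k+1) faults.  Returns (first mappable set
    found, all new mappable sets); in the real system the first hit would
    trigger reconfiguration and the rest would be collected in the
    background between fault arrivals."""
    first, new_sets = None, []
    for s in old_sets:
        for spare in spares:
            if spare in s:
                continue
            cand = s + [spare]
            if len(cand) == new_fault_count and is_mappable(cand):
                new_sets.append(cand)
                if first is None:
                    first = cand          # would start reconfiguration here
    return first, new_sets
```

With a toy predicate in place of the real mappability test, the helper returns the first extended set that passes, mirroring columns 2 and 3 of the tables.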
column 4: The mean time elapsed from the arrival of the ith fault to the moment when a mappable set for the first i faults is found.
column 5: The mean time elapsed from the arrival of the ith fault to the moment when all mappable sets for the first i faults are found.
column 6: The probability of successful reconfiguration.

ith fault   first set   all sets   reconf. time   total time   prob. succ
    2          1.22      165.53       1.44 ms      0.07 sec      0.999
    3          3.32      277.02       5.34 ms      0.32 sec      0.996
    4         10.90      112.88      20.81 ms      0.59 sec      0.980
    5         19.73       13.75      45.57 ms      0.30 sec      0.863
    6          9.01        0.97      22.57 ms      0.03 sec      0.301
    7          0.74        0.03       3.16 ms    < 0.01 sec      0.023
    8          0.03        0.00       1.54 ms    < 0.01 sec      0.000

Table 3.1: Case 1: A task graph is embedded with 32 spares left.

ith fault   first set   all sets   reconf. time   total time   prob. succ
    2          1.00      310.30       0.23 ms      0.04 sec      1.000
    3          1.01      564.37       0.56 ms      0.23 sec      1.000
    4          8.77      238.64       7.05 ms      0.48 sec      0.999
    5         31.33       39.94      29.81 ms      0.26 sec      0.929
    6         16.97        3.78      17.55 ms      0.04 sec      0.417
    7          2.60        0.27       3.34 ms    < 0.01 sec      0.074
    8          0.23        0.01       0.75 ms    < 0.01 sec      0.007
    9          0.01        0.00       0.54 ms    < 0.01 sec      0.000

Table 3.2: Case 2: A task graph is embedded with 47 spares left.

ith fault   first set   all sets   reconf. time   total time   prob. succ
    2          3.71      292.49      12.57 ms      0.09 sec      0.828
    3        181.98      434.03     220.64 ms      0.41 sec      0.223
    4        364.23      139.18     513.65 ms      0.66 sec      0.027
    5        126.33       16.35     196.75 ms      0.25 sec      0.002
    6         15.16        1.02      26.35 ms      0.03 sec      0.000

Table 3.3: Case 3: A task graph is embedded with 50 spares located in a small subcube.

For example, let us look at row 2 (after the 3rd fault) of Table 3.1. With 100,000 sets of random faults simulated, we examined on average 3.32 sets out of the 165.53 sets mappable to the first 2 faults to find a new mappable set for the 3 faults. In all, there were on average 277.02 mappable sets for the first 3 faults. The reconfiguration time, i.e., the time to find the first mappable set after the 3rd fault occurred, is about 5.34 msec.
The time to find all mappable sets for the first 3 faults is 0.32 sec. The probability of successful reconfiguration with 3 faults is 0.996.

Fig. 3.10 graphically shows the simulation results for the three cases. Case 1 was run on a Sun 3/50 workstation, and Cases 2 and 3 were run on a Sun 3/280 machine. Random faults are generated by the Unix system random number generator drand48(). All cases were run for at least one hundred thousand sets of random faults. Except for case 3, the existence of a mappable set can be determined in less than 50 milliseconds on average. Due to the randomness of fault locations, packing the spares in a subcube turns out to be a bad choice, as shown by the simulation results for case 3. When there are more than 3 faults, the faults are unlikely to belong to a small subcube. Besides, a subcube of spares generates many unnecessary similar sets (a similar set can easily be transformed into another by a BPC permutation of the subcube). When spares are evenly distributed in the entire cube, the results are much better, as in cases 1 and 2.

We also simulated the other extreme scenario, with about half the nodes left as spares. We considered a task graph embedded in a 7-dimensional hypercube with 50 nodes left as spares, and found that up to 12 failures can be tolerated with high probability, as shown in Table 3.4 and Fig. 3.11. However, if a very large number of spares is available, it may be necessary to use heuristics to reduce the search time for mappable sets; searching for all possible mappable sets may take too long to complete between successive fault arrivals.

[Figure 3.10: The probability of successful reconfiguration for three different task graphs embedded in a 10-dimensional cube.]

[Figure 3.11: The probability of successful reconfiguration when a task graph is embedded such that 50 spares are available in a 7-dimensional hypercube.]

ith fault   first set   all sets   reconf. time   total time   prob. succ
    2          1.00      516.88       0.51 ms      0.15 sec      1.000
    3          1.03     2274.05       1.09 ms      1.20 sec      1.000
    4          1.45     3719.68       1.99 ms      5.25 sec      1.000
    5          2.90     2791.79       6.54 ms     10.91 sec      1.000
    6          3.62     1402.81       8.32 ms      7.55 sec      1.000
    7          4.22      606.03       9.18 ms      3.79 sec      1.000
    8          4.21      228.68      10.37 ms      1.65 sec      1.000
    9          3.74       82.36       9.63 ms      0.64 sec      1.000
   10          3.67       28.88       9.39 ms      0.24 sec      1.000
   11          3.60       10.11      10.12 ms      0.09 sec      0.994
   12          3.30        3.28       9.68 ms      0.03 sec      0.885
   13          1.94        1.05       6.12 ms      0.01 sec      0.550
   14          0.84        0.35       3.53 ms    < 0.01 sec      0.255
   15          0.31        0.11       2.19 ms    < 0.01 sec      0.092

Table 3.4: A task graph is embedded in a 7-dimensional cube with 50 nodes left as spares.

Since our problem strongly resembles the graph isomorphism problem, existing methods in the graph isomorphism literature [27, 63] can readily be adapted to solve it. One approach is to decompose Dx (and Da) into n matrices, where n is the number of dimensions. Each matrix Dx(d) is obtained by setting all entries of Dx to 0 except the entries equal to d, 1 ≤ d ≤ n, and similarly for Da(d). Dx(d) can be treated as the adjacency matrix of a random graph: wherever an entry is nonzero, it represents an edge. The presence of an edge is random, as the distance between any two nodes is random. Since Dx(d) is a random graph adjacency matrix, it should be fairly easy, at least most of the time, to determine whether Da(d) contains Dx(d). (The difficulties of the graph isomorphism problem arise in highly regular graphs, and the probability that a random graph is regular is rather small.)
If all the copies of Dx(d) found in Da(d), 1 ≤ d ≤ n, consist of the same set of rows (columns) of Da, and the mappings between that set of rows (columns) of Da and the rows of Dx(d) are all the same, then we have found Dx in Da.

3.4 Discussion

In this chapter, the problem of reconfiguring embedded task graphs in faulty hypercubes is studied. The idea is to use the unused nodes as spares to replace the faulty nodes. Automorphisms of hypercubes are utilized to transform the embedded task graph into another copy which contains no faulty nodes. For single failures, this technique can always reconfigure efficiently with performance O(n). For general node failures, necessary and sufficient conditions under which a mapping from the spare set to the fault set can be realized by an automorphism are derived for the following 3 cases: 1) a small number of faults (< 5), 2) adjacent faults which form a connected component, and 3) an arbitrary number of faults. An efficient algorithm to determine if a mapping is realizable by an automorphism is also given. The existence of such a mapping guarantees that the faults can be reconfigured. However, the problem of finding such a mapping is shown to be as hard as the graph isomorphism problem. A heuristic approach based on dynamic programming is then implemented to find such a mapping. Simulation results show that with careful arrangement of the spare nodes, a small number of faults can be reconfigured with high probability.

Our reconfiguration technique can be applied to other processor networks, and it is easy to see that the same set of operations works for Cube-Connected Cycles, which are essentially hypercubes with cycles at the corners. Our technique can also be extended to other regular multiprocessor networks by developing new operations that re-embed task graphs in those networks. The scheme proposed in this chapter requires the existence of unused nodes.
However, there are many tasks which utilize the entire hypercube, for example, algorithms for sorting and matrix operations, as mentioned in Chapter 2. To execute these tasks on faulty hypercubes requires running them on the residual fault-free network. Therefore the basic properties of the residual network set fundamental limits on the performance. In the next chapter, these basic properties of faulty hypercubes are analyzed.

Chapter 4
Analysis of Faulty Hypercubes

4.1 Introduction

In an n-dimensional hypercube, when the number of faults is less than n, the hypercube remains connected. It has been shown that there are n node-disjoint paths between any two nodes, and when the number of faults is less than n, the length of the shortest path between the two nodes increases by at most 2 [67]. The diameter of a hypercube is also known to increase by at most 1 when the number of faults is less than n [43]. When the number of faults exceeds the connectivity, n, the hypercube can become disconnected. However, connectivity is a worst-case measure in the sense that n node failures can disconnect a hypercube only if they are all neighbors of a particular node; in practice, this is extremely rare. Therefore, it is useful to study the diameter and shortest-path problems under more general fault scenarios, as they affect the performance of algorithms running on faulty hypercubes.

We develop efficient algorithms to find near-shortest paths between two nodes in a connected component. When the number of faulty components (faulty links and/or nodes) is less than n, we give an efficient algorithm to find a near-shortest path between any two nodes. The complexity of the algorithm is O(|F| log n), where |F| is the size of the set of faulty components. The path length found by this algorithm is at most their Hamming distance plus 2.
When the number of faulty components is less than (2n − 2), it is shown that between any two non-isolated nodes there exists a path of length at most their Hamming distance plus 4. An efficient algorithm of complexity O(|F| log n) is also given to find such a path. Worst-case scenarios are constructed to show that this bound is tight.

This result shows that the diameter of Qn − F, when connected, is at most n + 4. However, when the distance between a given pair of nodes is n or (n − 1), we show that a path of length n or at most (n + 1), respectively, can be found. Consequently, the longest path, of length n + 2, occurs when two nodes are at a distance of n − 2. Hence, the diameter of Qn − F is at most n + 2 when |F| < 2n − 2 and Qn − F is connected. When Qn − F becomes disconnected, our results still apply to show that the diameter of the largest component is bounded by n + 2. Worst-case scenarios are constructed to show that this bound for the diameter is tight. We also show that the diameter among the connected nodes is at most n + 1 when every node has at least 2 non-faulty neighbors, and reduces to n when every node has at least 3 non-faulty neighbors.

4.2 Preliminaries

The label of a node in an n-dimensional hypercube Qn is represented as (b1 ... bn), bi = 0, 1, for 1 ≤ i ≤ n. In this chapter, an i-dimensional link is represented by its two end nodes

(b1 ... b(i−1) 0 b(i+1) ... bn, b1 ... b(i−1) 1 b(i+1) ... bn)

where the first term is the label of the end node with 0 in bit i. We assume that most logic operations, such as AND, OR, and EXCLUSIVE-OR, denoted ∧, ∨, ⊕ respectively, on n-bit labels can be performed in unit time by storing each label in a word. This is not unreasonable, since most practical machines have a word length of 32 bits and the number of dimensions of practical hypercube multiprocessors is no more than 16.
In this chapter, we consider finding a path between a source node S and a destination node D in a Qn containing faulty components; S and D are assumed to be fault-free. Throughout this chapter, we let F denote the set of faulty components. The Hamming distance between two nodes X = (x1 ... xn) and Y = (y1 ... yn) is denoted H(X, Y). Let I(X, Y) = {i | xi = yi}, J(X, Y) = {i | xi ≠ yi}, and let Qm(S, D) denote the minimal m-dimensional subcube (m-subcube) containing S and D, where m = H(S, D). A path between S and D, denoted P(S, D), is a minimal path if its length, denoted |P(S, D)|, equals H(S, D). P(S, D) is represented as the sequence of dimensions that the path traverses starting from S. An m-subcube is represented as a ternary string over the alphabet {0, 1, *}; for example, the 2-subcube {000, 001, 010, 011} is represented as 0**. A node X is accessible to its neighbor X′ if both X and X′ are fault-free and the link connecting them is also fault-free. A node is isolated if all its neighbors are inaccessible.

4.3 The Shortest Path Problem

4.3.1 Number of Faults Less Than n

It is known [67] that there are n node-disjoint paths between two nodes S and D in a fault-free n-dimensional hypercube. Let J(S, D) = {j1 < j2 < ... < jm}. Note that an ordering is imposed on J(S, D). These paths, represented in terms of dimension sequences, are

(j1 ... jm)^m                       m minimal paths of length m
j (j1 ... jm) j,  j ∉ J(S, D)       (n − m) paths of length (m + 2)     (4.1)

where (j1 ... jm)^m is the set of paths obtained by rotating (j1 ... jm) cyclically m times. Since there are O(n²) nodes and links in this set of paths, a naive algorithm to find a fault-free path by checking each node against the fault set takes O(n² log |F|) time. In this section, we present an efficient algorithm with time complexity O(|F| log n) to find a fault-free path.
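The path system (4.1) can be generated directly from J(S, D). The sketch below uses our own helper name and integer labels with bit 1 leftmost, as in the preliminaries; it returns each path as a dimension sequence starting from S.

```python
def disjoint_paths(S, D, n):
    """The n node-disjoint paths of (4.1) between S and D in Qn:
    the m cyclic rotations of (j1 .. jm), plus j (j1 .. jm) j for each
    dimension j outside J(S, D).  Labels are n-bit integers with bit 1
    leftmost, so dimension i corresponds to bit (n - i) of the integer."""
    J = [i for i in range(1, n + 1) if (S ^ D) >> (n - i) & 1]
    m = len(J)
    paths = [J[r:] + J[:r] for r in range(m)]                # minimal paths
    paths += [[j] + J + [j] for j in range(1, n + 1) if j not in J]
    return paths
```

For S = 0000 and D = 0011 in Q4, this yields the two minimal paths (3 4) and (4 3) and the two detour paths (1 3 4 1) and (2 3 4 2) of length m + 2.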
The algorithm has two parts: part 1 checks whether there is a fault-free minimal path, while part 2 checks whether there is a fault-free path of length (m + 2).

Every node X on some minimal path (jr ... jm j1 ... j(r−1)) differs from S in successive bits of {j1 < ... < jm} in the wrap-around fashion, i.e., j1 is considered the next bit after jm. Therefore, to check if a faulty node f ∈ F blocks a minimal path in (4.1), we examine whether f differs from S in some successive bits of {j1 ... jm} in the wrap-around fashion. Procedure check1 performs this, as explained below.

For a given faulty node f, let α = S ⊕ D and β = f ⊕ S. Thus α and β contain the bits where S and D differ and those where f and S differ, respectively. If β is not contained in α, i.e., if α ∨ β ≠ α, then f is not in Qm(S, D) and cannot block a minimal path. If β has a single block of successive 1's of J(S, D), then it blocks a path. This can be checked as follows. First, compute an array Index[1 ... n], where Index[i] contains 0 if bit i is not in J(S, D), and k if bit i is jk, the kth bit of J(S, D). Find the rightmost bit r and the leftmost bit l of β. Compute the weight of β, denoted weight(β). If weight(β) = Index[r] − Index[l] + 1, then β is a straight block of successive 1's, i.e., with no wrap-around from jm to j1. Otherwise, complement the bits of β in all positions of J(S, D) and repeat the above procedure to check whether it is a wrap-around block. If it is a block, the leading bit of the block indicates that the path beginning with that dimension is blocked by the fault. To examine whether a link blocks a minimal path, we check whether each of its end nodes blocks a path; if they block the same path, the faulty link lies on that path.

Computing the weight with function weight can be done in O(log n) [65] under the assumption that shifting k bits takes one step. Binary search is applied to find the leftmost (rightmost) bit. To find the leftmost bit, shift right ⌈n/2⌉ bits.
If the result is 0, then the leftmost 1-bit is at a position greater than ⌊n/2⌋; if the result is greater than 1, it is at a position less than ⌊n/2⌋. In the former case, we take β and shift right ⌈n/4⌉ bits, while in the latter case we shift right ⌈3n/4⌉ bits. This procedure is repeated until a shift produces a result equal to 1, which pinpoints the leftmost 1-bit. Similarly, we can find the rightmost bit by shifting to the left. Details of these two procedures, flmb and frmb, are omitted.

Next, we check the paths of length (m + 2). Since at least m faults are required to block all minimal paths, only n − m − 1 faults can be located outside Qm(S, D). Each of these paths lies in an m-subcube adjacent to Qm, and there are (n − m) such m-subcubes; one of them must be fault-free, and this can be checked as follows. Mask the J(S, D) bits of all faulty nodes and of the end nodes of all faulty links by setting them to 0, and mask the J(S, D) bits of S as well. Then exclusive-or S with all the faults and compute the weights of the results. If the weight is 1 for a faulty node, find its bit position using binary search; the corresponding adjacent m-subcube of Qm can be disregarded. For a faulty link, if both end nodes have results with weight 1, the link is contained in an adjacent m-subcube, and that m-subcube can be disregarded. If one end node has a result with weight 1 and the other with weight 0, the faulty link lies between Qm(S, D) and an adjacent m-subcube; hence, that m-subcube can also be disregarded. At the end, at least one m-subcube is found to be fault-free, and a path of length (m + 2) can be found through it. The procedures check1 and check2 are shown below.

global integer array: Index[1 ... n];

procedure check1(S, D, F);
/* S: source node, D: destination node, F: the set of faults. */
begin
  α = S ⊕ D;
  c = 0;
  for i = 1 to n do   /* establish Index[1 ... n] */
  begin
    shift left α;
    if the bit shifted out is 1
    then begin c = c + 1; Index[i] = c; end;
  end;
  for all faulty nodes f ∈ F do
  begin
    k = block(f);
    if k ≠ null   /* the path starting with dimension k is blocked */
    then Index[k] = 1;
  end;
  for all faulty links (f1, f2) ∈ F do
  begin
    k1 = block(f1);  k2 = block(f2);
    if k1 = k2 ≠ null   /* the path starting with dimension k1 is blocked */
    then Index[k1] = 1;
  end;
end;

function block(f);
/* returns null, or the starting dimension of the path blocked by f */
begin
  β = f ⊕ S;
  if α ∨ β = α then   /* f is in the minimal subcube containing S and D */
  begin
    k1 = frmb(β);   /* find the rightmost 1-bit of β */
    k2 = flmb(β);   /* find the leftmost 1-bit of β */
    if weight(β) = Index[k1] − Index[k2] + 1   /* a straight block */
    then return k2;   /* the path starting from dimension k2 is blocked */
    else begin
      β′ = α ⊕ β;   /* complement the J(S, D) bits of β */
      k1′ = frmb(β′);  k2′ = flmb(β′);
      if weight(β′) = Index[k1′] − Index[k2′] + 1   /* a wrap-around block */
      then return k1′;   /* the path beginning with dimension k1′ is blocked */
    end;
  end;
  return null;
end;

procedure check2(S, D, F);
begin
  Mask the J(S, D) bits of S;   /* setting those bits to 0 */
  for all faulty nodes f ∈ F do
  begin
    Mask the J(S, D) bits of f;
    if weight(S ⊕ f) = 1 then
      Index[flmb(S ⊕ f)] = 1;
      /* the path starting with dimension flmb(S ⊕ f) is blocked */
  end;
  for all faulty links (f1, f2) ∈ F do
  begin
    Mask the J(S, D) bits of f1 and f2;
    w1 = weight(S ⊕ f1);
    w2 = weight(S ⊕ f2);
    if w1 = w2 = 1   /* the link is contained in an adjacent m-subcube */
    then Index[flmb(S ⊕ f1)] = 1;
    if w1 + w2 = 1 then
    /* the link is between Qm(S, D) and an adjacent m-subcube */
    begin
      if w1 = 1 then Index[flmb(S ⊕ f1)] = 1
      else Index[flmb(S ⊕ f2)] = 1;
    end;
  end;
end;

The entire procedure, route, is shown next. An extra parameter M is included, which is an n-bit word; the positions of 1's in M indicate that the paths starting from those dimensions are not available. By furnishing the set of faults in a subcube, route can be applied to
By furnishing the set of faults in a subcube, route can be applied to ■ ' proced u re route(S,D,F,M); begin checkl(S,D,F); /* check if there is a minimal path */ i = 0; rep eat i = i + 1; shift left M] if the outshifted b it= 0 A N D i G J(S,D ) A N D Index[i]=d th en return the minimal path starting from i and exit; until i = n; check 2 (S,D ,F ); i = 0; j rep eat : i = i + l] | shift left M\ j if the outshifted b it= 0 A N D i £ J (S ,D ) A N D Index[i]=0 th en j return the path with length H (S ,D ) + 2 starting from i and exit; j until i = n; end; find a path confined in that subcube with 0’s in M at those spanning dimensions and l ’s elsewhere. The need of this construction will become clear in the next I subsection. The output of route is the dimension sequence of a fault-free path. ! It is not hard to see that the complexity of checkl and checks are 0(1^1 log n). ■Therefore the complexity of route is 0(\F\ log n). T h eorem 4.1 In an n-dimensional hypercube Qn, let F be the set of its faulty \components. If |.F| < n, then for any two nodes S and D, a path P (S ,D ) with ||.P(<S, D )| < H (S,D ) -j-2 can be found in 0 (|F |lo g n ). |When H (S,D ) = n — 1 and S is isolated in Qn-i(S , D), the shortest path will be 1 of length n + 1. Therefore the diameter of Qn — F is at most n + 1 when |F | < n. i I 4 .3 .2 N u m b er o f F au lts B e tw e e n n an d 2n — 3 |When the number of faults is n or more, it is not possible to guarantee a path of length H (S, D ) + 2. This is shown in the next example. 'Figure 4.1: A worst case fault pattern which makes the shortest path between S and D of length 6 is shown above. The length of the shortest path is increased by j 4. E x am p le : In a 4-dimensional hypercube, let faults occur at the corners marked |With a ‘x’, as shown in Fig. 4.1. For S and D designated in Fig. 4.1, it is easy to see th at the shortest path between S and D has length 6, even though the distance between S and D is only 2. . 
In general, this scenario can happen in an ra-dimensional hypercube with n [faults. An example can be constructed by adding more adjacent 2-subcubes to 'Q2 (S, D) in Fig. 4.1. In each of the adjacent 2-subcube let either S not be accessible to its neighbor or D not be accessible to its neighbor. However, let both S and D have at least one accessible neighbor, so that S and D are not isolated. W ith ,n + m faults, the worst case scenario for S and D with H(S, D) = m < n — 2 can also be constructed such that |P (5, D)| > H {S ,D ) + 4. Let S and D be isolated in Qm(S ,D ) which requires 2m faults for m > 2. In each adjacent Qm, let either the neighbor of S or D be faulty. However, let S and D have at least one good neighbor. This requires at least n — m faults. It is not hard to see th at there is no path of length H (S,D ) + 2. For m = n — 2, a worst case example is shown in Fig.4.2.n 7 1 ! ____ i T h e o re m 4.2 In Qn, if the number of faults |jF| < 2n — 2, then for any two non-isolated nodes S and D with distance H (S ,D ), there exists a path of length at most H (S, D ) + 4, P ro o f: Let S and D differ in m bits given by the set J { S ,D ) = {ji < ... < j m}- A di mension i is good if both S and D are accessible to their ?'th dimensional neighbors. C ase A There is at least one good dimension, say i. Let the ?th dimensional neighbor of S be X and that of D be Y. C ase A .l i € J (S ,D ) Cut Qn into Ql n_l5 / = 0,1, along dimension i. Then one subcube, say con~ tains S and T , and the other, contains X and D. By [67], there are n — 1 node disjoint paths of length at most m + 2 between S and Y in and between X and D in Q*_x. These two sets of (n — 1) paths are clearly node disjoint paths since they are in disjoint Q n -i’s. Therefore, there are (2n — 2) node disjoint paths. Hence, a fault-free path between S and D of length at most m -j- 2 exists. I C ase A .2 i £ J (5 , D ), and therefore m < n Again, we cut Qn into Ql n^\, I = 0,1, along dimension i. 
Then one subcube, say Q^0_{n-1}, contains S and D, and the other, Q^1_{n-1}, contains X and Y. There are (n - 1) node-disjoint paths between S and D of length at most (m + 2); and there are (n - 1) node-disjoint paths of length at most (m + 2) between X and Y. This latter set yields paths of length at most (m + 4) between S and D. Hence, a fault-free path between S and D of length at most m + 4 exists.

Case B. There is no good dimension. Since S and D are not isolated, assume S can reach its neighbors along dimensions x_1, x_2, ..., x_p and D can reach its neighbors along dimensions y_1, y_2, ..., y_q. It is clear that {x_1, x_2, ..., x_p} ∩ {y_1, y_2, ..., y_q} = ∅. All the dimension sequences below representing paths are written from S to D.

Case B.1. There exist some x_i, y_j ∈ J(S,D), 1 ≤ i ≤ p, 1 ≤ j ≤ q. Therefore, |J(S,D)| ≥ 2. If m = 2, the case is trivial: (x_i y_j) is a fault-free path. Assume m > 2. The nonexistence of a good dimension implies that there are at least n faults among the neighbors of S and D when m > 2, i.e., along every dimension either S or D has a faulty neighbor or connecting link. Let the neighbor of S along dimension x_i be X and the neighbor of D along dimension y_j be Y. Cut Q_n along dimensions x_i and y_j into Q^l_{n-2}, where 0 ≤ l ≤ 3. Then X and Y must be in the same subcube, say Q^0_{n-2}. Clearly, there are (n - 2) node-disjoint paths between X and Y in Q^0_{n-2}. Since the n faults are among the neighbors of S and D, they are not in Q^0_{n-2}. Therefore, there must be a fault-free path of length at most (m + 2).

Case B.2. {x_1, ..., x_p} ∩ J(S,D) = ∅ or {y_1, ..., y_q} ∩ J(S,D) = ∅. At most one of the nodes S and D can have accessible neighbors across some k dimensions in J(S,D), where 0 ≤ k ≤ m; without loss of generality, let S be the node with accessible neighbors along these k dimensions, and let x_i = j_i for 1 ≤ i ≤ k.
D will have accessible neighbors along dimensions y_j ∉ J(S,D), 1 ≤ j ≤ q. Note that if k = 0, then neither node has accessible neighbors along dimensions in J(S,D). Now construct the following p · q node-disjoint paths from S to D. For each dimension y_j along which D has an accessible neighbor, construct

j_1 y_j j_2 j_3 ... j_k j_{k+1} ... j_m y_j
j_2 y_j j_3 j_4 ... j_k j_{k+1} ... j_m j_1 y_j
...
j_k y_j j_{k+1} j_{k+2} ... j_m j_1 ... j_{k-1} y_j

giving k paths of length m + 2. (4.2)

This set of k paths is obtained by rotating the bits j_1 ... j_m without changing the position of y_j. This results in node-disjoint paths [67], as is easy to verify. In all, we obtain k · q paths of length m + 2, and they are node-disjoint after the good neighbors of S.

For the remaining reachable neighbors of S, we construct (p - k) · q additional disjoint paths as follows. For each distinct pair x_i, y_j, we construct a path

x_i y_j (j_1 j_2 ... j_m) x_i y_j of length m + 4. (4.3)

Thus we have a total of p · q paths, which is ≥ p + q - 1. Since there are already 2n - p - q faults among the neighbors of S and D, the remaining allowed number of faults, p + q - 3, cannot eliminate all of these paths.

We need to consider m = 2 as a special case, since the total number of distinct neighbor nodes of S and D is then only 2n - 2. If S has p accessible neighbors and D has q accessible neighbors, there must be at least 2n - 2 - p - q faults among these (2n - 2) neighbors. In addition, at most p + q - 1 further faults are allowed. If S has a good neighbor along x_1 ∈ J(S,D), then the link from that node to D must be faulty, or vice versa. Therefore, we need only find p + q - 1 node-disjoint paths, and we have just that number from the above construction. If S does not have an accessible neighbor along x_1 ∈ J(S,D), we have accounted for only 2n - p - q faults and we need to construct p + q node-disjoint paths. The p · q paths constructed above are insufficient only when p or q is 1.
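The rotation construction of (4.2) is easy to state programmatically. A small sketch (function and argument names are ours) that returns the dimension sequences of the k paths:

```python
def rotated_paths(j_dims, y, k):
    """Dimension sequences of the k paths in (4.2): rotate j_1..j_m and
    insert dimension y after the first step and again at the end, so the
    position of y is unchanged.  j_dims lists J(S,D) in order; y is a
    dimension along which D has an accessible neighbor; k <= len(j_dims)."""
    paths = []
    for r in range(k):
        rot = j_dims[r:] + j_dims[:r]          # rotate by r positions
        paths.append([rot[0], y] + rot[1:] + [y])
    return paths
```

Each returned sequence has m + 2 entries, matching the path length H(S,D) + 2 claimed in (4.2).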
For this special case we construct one additional node-disjoint path, as shown below with an example. Without loss of generality, let p = 1, x_1 = 3, y_1 = 4, S = 0...00 and D = 0...011. Construct the paths

3 1 4 2 3 4
3 4 2 3 1 4
3 y_j 1 2 3 y_j,  2 ≤ j ≤ q

giving p · q + 1 paths of length 6. It is not hard to see that these paths are disjoint. For m = 1, the nontrivial case is when the link between S and D is faulty. The number of faulty neighbors is then 2n - 1 - p - q, and so the above construction holds. □

Based on the proof of Theorem 4.2, we construct an algorithm for finding a path within the bound. The first step is to determine whether there exists a good dimension. In the following procedure, words a and b are used to record the non-accessible neighbors of S and D, respectively.

a = b = 0;
for each f ∈ F do
begin
  Case 1. f is a node:
    δ = f ⊕ S; if weight(δ) = 1 then a = a ∨ δ;
    δ = f ⊕ D; if weight(δ) = 1 then b = b ∨ δ;
  Case 2. f = (f_1, f_2) is a link:
    δ = f_1 ⊕ S; γ = f_2 ⊕ S; if weight(δ) + weight(γ) = 1 then a = a ∨ δ ∨ γ;
    δ = f_1 ⊕ D; γ = f_2 ⊕ D; if weight(δ) + weight(γ) = 1 then b = b ∨ δ ∨ γ;
end;
c = ā ∧ b̄;

where ā and b̄ are the complements of a and b. The bit positions where c contains a 1 are good dimensions. This procedure for finding good dimensions takes O(|F| log n) time.

When a good dimension exists, the problem reduces to finding node-disjoint paths in Q^0_{n-1} and Q^1_{n-1}, as described in the proof. Procedure route of Section 4.3.1 can be applied in Q^0_{n-1} and Q^1_{n-1}. However, we need to set the bits of M so that route does not find a path outside.

if c ∧ (S ⊕ D) ≠ 0 then
begin
  pick a good dimension i ∈ J(S,D); M = 0; set bit i of M to 1;
  P(S,D) = route(S, Y, F ∩ Q^0_{n-1}, M) ∘ i;
  if P(S,D) = null then P(S,D) = i ∘ route(X, D, F ∩ Q^1_{n-1}, M);
end
else
begin
  pick any good dimension i; M = 0; set bit i of M to 1;
  P(S,D) = route(S, D, F ∩ Q^0_{n-1}, M);
  if P(S,D) = null then P(S,D) = i ∘ route(X, Y, F ∩ Q^1_{n-1}, M) ∘ i;
end

The complexity is dominated by route, which is O(|F| log n). When c = 0, there is no good dimension.
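The good-dimension procedure above translates almost directly into Python. In this sketch (ours), nodes are integers, a faulty link is a pair of endpoint labels, and the returned word c has a 1 exactly at the good dimensions; weight() is taken as the Hamming weight computed with bin().count():

```python
def good_dimensions(s, d, n, node_faults, link_faults=()):
    """Word c with a 1 at each good dimension: dimension i is good iff
    both s and d can reach their i-th neighbors.  a and b accumulate the
    dimensions of non-accessible neighbors of s and d, as in the text."""
    a = b = 0
    for f in node_faults:
        delta = f ^ s
        if bin(delta).count('1') == 1:        # f is a neighbor of s
            a |= delta
        delta = f ^ d
        if bin(delta).count('1') == 1:        # f is a neighbor of d
            b |= delta
    for f1, f2 in link_faults:
        d1, d2 = f1 ^ s, f2 ^ s
        if bin(d1).count('1') + bin(d2).count('1') == 1:  # link at s
            a |= d1 | d2
        d1, d2 = f1 ^ d, f2 ^ d
        if bin(d1).count('1') + bin(d2).count('1') == 1:  # link at d
            b |= d1 | d2
    return ~a & ~b & ((1 << n) - 1)           # c = a-bar AND b-bar
```

In Q_3 with S = 000, D = 111 and faulty nodes 001 and 011, dimension 1 is the only good dimension, so c = 010.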
We need to determine whether there exist x_i and y_j in J(S,D) such that S has an accessible neighbor along x_i and D has an accessible neighbor along dimension y_j. If this is true, which is Case B.1, route can be applied as shown below.

if ā ∧ (S ⊕ D) ≠ 0 and b̄ ∧ (S ⊕ D) ≠ 0 then
begin
  pick x_i and y_j ∈ J(S,D); M = 0; set bits x_i and y_j of M to 1;
  P(S,D) = x_i ∘ route(X, Y, F ∩ Q^0_{n-2}, M) ∘ y_j;
end

In Case B.2, we need to check the k paths in (4.2). Besides the accessible neighbors of S, all other nodes on these paths are contained in an m-subcube Q̄_m. This Q̄_m is adjacent to Q_m(S,D) across dimension y_j. If the number of faults in Q̄_m plus the number of faulty links between Q_m(S,D) and Q̄_m is less than k, then one of these paths must be fault-free. Similarly, all nodes but 2 on the paths in (4.3) are contained in an m-subcube Q'_m, obtained by complementing bits x_i and y_j of Q_m(S,D). We can simply check whether Q'_m and the links on the path leading to Q'_m are fault-free. Since only p + q - 1 additional faults are allowed, it is clear from the proof that some Q̄_m must have fewer than k faults, or some Q'_m is fault-free, unless m = 2. If m = 2 and the above check fails, construct one more path as indicated in that proof.

To check the number of faulty nodes contained in an m-subcube spanned along the J(S,D) dimensions, only the non-J(S,D) bits of faulty nodes need to be examined. Therefore, we mask the J(S,D) bits of every faulty node by setting these bits to 0, put the results in a list F_1, and sort F_1. For an m-subcube Q̄_m, consider its label L, obtained by replacing the *'s with 0's in the ternary representation of Q̄_m. This label L is then used to perform binary search on F_1. Once an entry with label L is found in F_1, traverse F_1 linearly to count the entries with label L, which gives the number of faulty nodes in Q̄_m. Similarly, use the label (L, L) to search the list F_2, which is obtained by masking the J(S,D) bits of the end nodes of every faulty link.
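The masking-and-search step might be sketched as follows (names ours). Python's bisect gives both ends of the run of equal labels directly, which replaces the binary search followed by a linear traversal described above:

```python
from bisect import bisect_left, bisect_right

def build_masked_list(fault_nodes, j_mask):
    """F1: faulty-node labels with the J(S,D) bits masked to 0, sorted."""
    return sorted(f & ~j_mask for f in fault_nodes)

def faults_in_subcube(f1_sorted, label):
    """Number of faulty nodes in the m-subcube whose label, with the *'s
    replaced by 0's, equals `label`: count the equal keys in F1."""
    return bisect_right(f1_sorted, label) - bisect_left(f1_sorted, label)
```

The same two functions work for the link list F_2 if each faulty link is stored as a pair of masked endpoint labels.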
This gives the number of faulty links inside Q̄_m. To find the number of faulty links between Q_m(S,D) and Q̄_m, perform the same procedure, using the label of Q_m(S,D) with *'s replaced by 0's, together with L, as a link label for binary search on F_2.

If some Q̄_m is found with fewer than k faults, then there is a fault-free path in that Q̄_m. Since the paths are obtained by rotation with y_j at a fixed position, every node X on such a path in the m-subcube still differs from S in successive positions of J(S,D) and y_j. We only need to mask the y_j bit, and then check1 can be applied with input F consisting of the faults found in the above searches. Similarly, we can check whether the m-subcube through which a path of length m + 4 is constructed is fault-free and the links connecting to it are fault-free. When this is true, a path of length m + 4 can be established in that subcube. The time complexity is again that of check1, which is O(|F| log n). The entire algorithm has time complexity O(|F| log n).

4.4 The Diameter Problem

From the results of the previous section, the diameter of a faulty hypercube with |F| < 2n - 2 is bounded by n + 4. However, when S and D are at distance n or n - 1, shorter paths can be found, as shown in the following theorem.

Theorem 4.3 In Q_n, if the number of faults is less than 2n - 2, then
I. when H(S,D) = n, a minimal path exists;
II. when H(S,D) = n - 1, a path of length n + 1 exists.

Proof: I. When m = n, Case A.2 and Case B.2 in the proof of Theorem 4.2 cannot arise. Further, paths of length m + 2 also disappear, since all dimensions are in J(S,D). Hence, all existing paths are of length n, and there must be a fault-free path of length n.

II. In the proof of Theorem 4.2, paths of length m + 4 cannot arise in Case A.2, since all (n - 1) paths in each subcube are minimal. Paths of length m + 4 cannot arise in Case B.2, since q = 1 and {x_1 ... x_p} ⊆ J(S,D). □
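As an independent sanity check (ours, not part of the dissertation), Theorem 4.3(I) can be verified exhaustively for n = 4 with node faults only: for every fault set of size 2n - 3 = 5 that leaves the antipodal pair S, D non-faulty and non-isolated, a breadth-first search finds a path of the minimal length n.

```python
from itertools import combinations
from collections import deque

def shortest(n, faults, s, d):
    """BFS shortest-path length from s to d in Q_n avoiding faulty
    nodes; None if s and d are disconnected."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == d:
            return dist[u]
        for i in range(n):
            v = u ^ (1 << i)
            if v not in dist and v not in faults:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

n, s, d = 4, 0b0000, 0b1111
neighbors = lambda v: {v ^ (1 << i) for i in range(n)}
# Every 5-fault placement keeping S and D non-faulty and non-isolated.
ok = all(
    shortest(n, set(fs), s, d) == n
    for fs in combinations([v for v in range(1 << n) if v not in (s, d)], 5)
    if not neighbors(s) <= set(fs) and not neighbors(d) <= set(fs)
)
```

The check covers all C(14,5) = 2002 fault placements and agrees with the theorem.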
Theorem 4.4 The diameter of a faulty hypercube Q_n with fault set |F| < 2n - 2 is at most n + 2.

Proof: For any two nodes S and D, from Theorem 4.3, when H(S,D) > n - 2 a path of length less than n + 2 exists. By Theorem 4.2, the longest shortest path occurs when H(S,D) = n - 2, and the path length is then at most n + 2. □

In the following, we construct a worst case scenario in which the shortest path between a pair of nodes at distance n - 2 has length n + 2.

Figure 4.2: A worst case scenario showing that the diameter is n + 2 when |F| = 2n - 3.

Example: In an n-dimensional hypercube Q_n, let S = 0^n and D = 1^{n-2}0^2, where i^n denotes n consecutive i's, i = 0, 1. Let the fault set be F = {all neighbors of S except 0^{n-1}1} ∪ {all neighbors of 0^{n-1}1 except S and 0^{n-2}1^2}, as shown in Fig. 4.2, so |F| = 2n - 3.

Clearly, a path from S to D has length at least n + 2, as depicted in Fig. 4.2. By Theorem 4.2, there is a path of length at most n + 2. Therefore the shortest path from S to D has length exactly n + 2. □

From the above example, we see that the diameter bound of Theorem 4.4 is tight. If the number of faults among the neighbors of every node is not high, a better diameter bound can be obtained. In the following discussion of the diameter, we consider node failures only; it is easy to extend the results to node and/or link failures. If every node has at least 2 good neighbors, then the diameter is n + 1 for n > 3, as shown in the following theorem. For n = 3, some combinations of node and link failures can increase the diameter to n + 2. For n = 2, since every node must have 2 good neighbors, no faults are allowed.

Theorem 4.5 In Q_n, n > 3, with number of faults |F| < 2n - 2, if every node has at least 2 good neighbors, then the diameter is bounded by (n + 1) among the connected nodes.

Proof: For two nodes S and D:

1.
if H(S,D) = n, Theorem 4.3(I) shows that a path of length H(S,D) = n exists.

2. if H(S,D) = n - 1, Theorem 4.3(II) shows that a path of length n + 1 exists.

3. if H(S,D) < n - 2, Theorem 4.2 shows that a path of length H(S,D) + 4 ≤ n + 1 exists.

The only case left to consider is H(S,D) = n - 2. We now show that a path of length at most n exists in this case. In the proof of Theorem 4.2, the path length H(S,D) + 4 appears in Case A.2 and Case B.2.

In Case A.2, there is only one path of length n + 2, which lies in Q^1_{n-1}. We need to replace this path with another path of length n. In Q^0_{n-1}, let k be the dimension which is not in J(S,D), and let the other good neighbors of S and D be along dimensions i and j, respectively. If i = j = k, this implies that S and D have another good dimension k in Q^0_{n-1}. Cut Q^0_{n-1} into Q^0_{n-2} and Q^1_{n-2} along dimension k. Without loss of generality, let S and D be in Q^0_{n-2}. Through their good neighbors in Q^1_{n-2}, S and D have n - 2 node-disjoint paths in Q^1_{n-2}, each of length n. Pick two of these paths, since one is accounted for in the proof of Theorem 4.2 and we need the other to replace the path of length n + 2.

Assume instead that exactly one of i, j equals k; without loss of generality let i = k and j = j_1, and construct two paths of length n:

k(j_1 ... j_{n-2})k and k(j_2 ... j_{n-2})k j_1

If i ≠ k and j ≠ k, then without loss of generality let i = j_1, j = j_2 and construct the paths

j_1 k(j_3 ... j_{n-2})k j_2 and k(j_3 ... j_{n-2} j_2 j_1)k

Again, one of the two paths has already been accounted for in the proof of Theorem 4.2, and the other can be used to replace the path of length n + 2.

In Case B.2, a path has length n + 2 if it passes through the neighbors of S and D along wrong dimensions (those dimensions not in J(S,D)). Since no good dimensions exist and there are only 2 wrong dimensions, S and D can each have at most one good neighbor along the wrong dimensions.
Hence there is only one path of length (n + 2). The total number of paths of length n is p · q - 1. Since p ≥ 2 and q ≥ 2, p · q - 1 is always greater than p + q - 2. Thus the theorem is true for all n - 2 ≠ 2. When n - 2 = 2, the case of a 4-dimensional hypercube, it is trivial to find a path of length ≤ n. □

4.4.1 3 or More Non-Faulty Neighbors

If each node has more than 2 good neighbors, then the bound on the diameter will be n. The difficulty arises when the distance between S and D is (n - 1) and all faults are in the (n - 1)-subcube containing S and D. We show in the following lemma that in this case a minimal path exists.

Lemma 4.1 With the number of faulty nodes |F| < 2n in Q_n, n > 4, if both S and D have at least two good neighbors and H(S,D) = n, then there exists a path of length n.

Proof: (a) There is a good dimension i. Cut Q_n along dimension i into Q^0_{n-1} and Q^1_{n-1}. Without loss of generality, let S be in Q^0_{n-1} and D in Q^1_{n-1}. One of them contains at most (n - 1) faults; suppose Q^1_{n-1} does. Let the neighbor of S in Q^1_{n-1} be X. By assumption, D has a good neighbor in Q^1_{n-1}. If X also has a good neighbor in Q^1_{n-1}, then by Theorem 4.3(I) the lemma is proved.

Assume X has no good neighbors in Q^1_{n-1}. In this situation, S only needs to reach some distance-2 node in Q^0_{n-1} and then traverse dimension i to reach Q^1_{n-1}. After reaching Q^1_{n-1}, since all the faults there are at the neighbors of X, the rest of Q^1_{n-1} is fault-free, and the path can reach D from there in the minimal number of hops. In Q^0_{n-1}, S has a good neighbor X' by assumption. S and X' have 2n - 4 neighbors in Q^0_{n-1}, each of which can lead to a distance-2 node. There are at most n faults in Q^0_{n-1}. Hence, for n > 4, it is always possible to reach a good distance-2 node. When n = 4, with 7 faults, S and its two good neighbors can be disconnected.

(b) There is no good dimension. Since both S and D have at least 2 good neighbors, and these must be along distinct dimensions, n must be at least 4.
Let the good neighbors of S be along dimensions i, j and those of D along k, l. Cut Q_n into Q^t_{n-2}, t = 0, 1, 2, 3, along dimensions i and k, as shown in Fig. 4.3. Let the neighbors of S and D in Q^0_{n-2} be X and Y, respectively. There are n - 2 node-disjoint paths between X and Y. Construct two more paths,

ik(j_1 ... l̂ ... j_{n-2})l and j(j_1 ... ĵ ... j_{n-2})k

as shown in Fig. 4.3, where J(X,Y) = {j_1 ... j_{n-2}} and l̂ and ĵ denote that l and j are absent from the sequence. It is easy to see that these paths are minimal and disjoint. Since there are n faulty nodes among the neighbors of S and D, at most n - 1 faults are left to block these n paths. Hence, one of these paths must be fault-free. □

For n = 3, there can be no faults if every node is to have 3 good neighbors. For n = 4, 5, some combinations of node and link failures can increase the diameter to n + 1. However, for n > 5, the diameter can be shown to be n under this fault scenario.

Theorem 4.6 In Q_n, n > 5, with number of faults |F| < 2n - 2, if every node has at least 3 good neighbors, then the diameter is bounded by n among the connected nodes.

Proof: For two nodes S and D, if H(S,D) = n or H(S,D) < n - 3, then by Theorems 4.3 and 4.2, respectively, a path of length at most n exists. If H(S,D) = n - 2, we showed in the proof of Theorem 4.5 that a path of length H(S,D) + 2 = n exists. If H(S,D) = n - 3, consider the following two cases.

Case 1. Good dimensions exist.

Figure 4.3: There are n - 2 paths between X and Y in Q^0_{n-2}; two more paths are constructed.

If there is a good dimension in J(S,D), then the paths constructed in Theorem 4.2 are of length (n - 1). If good dimensions exist only outside J(S,D), which is Case A.2, there are 2 paths of length n + 1 through Q^1_{n-1}. We need to find two other paths of length at most n - 1 to replace them, in the following 2 scenarios.
(a) If S and D have another good dimension in Q^0_{n-1}, then there are (n - 3) paths across this dimension in the neighboring (n - 3)-subcube. Since n > 5, we can always find two more paths of length (n - 1) in that subcube.

(b) No good dimensions exist in Q^0_{n-1}. There are two wrong dimensions k_1, k_2 ∉ J(S,D) in Q^0_{n-1}. If S has a good neighbor along some dimension i ∈ J(S,D) and D also has a good neighbor along some other dimension j ∈ J(S,D), then construct 2 paths in each adjacent (n - 3)-subcube as follows. Let J(S,D) = {j_1 ... j_{n-3}} and, without loss of generality, let i = j_1 and j = j_2; the paths are

j_1 k_1(j_3 ... j_{n-3})k_1 j_2 and k_1(j_3 ... j_{n-3} j_2 j_1)k_1
j_1 k_2(j_3 ... j_{n-3})k_2 j_2 and k_2(j_3 ... j_{n-3} j_2 j_1)k_2

One of the paths in each line has been accounted for, and the others can replace the paths of length n + 1.

If either S or D does not have good neighbors along dimensions in J(S,D), then without loss of generality let D be the node that has no good neighbors along dimensions in J(S,D). Then its two good neighbors are along the wrong dimensions, and S must have 2 good neighbors along some dimensions i, j ∈ J(S,D). Again we can let i = j_1 and j = j_2, and construct 2 paths in each adjacent (n - 3)-subcube as follows:

j_1 k_1(j_2 ... j_{n-3})k_1 and j_2 k_1(j_3 ... j_{n-3} j_1)k_1
j_1 k_2(j_2 ... j_{n-3})k_2 and j_2 k_2(j_3 ... j_{n-3} j_1)k_2

Even though the two paths in each line share the last node, that node is a good neighbor of D. In this subcase, it is impossible for both S and D to lack good neighbors along dimensions in J(S,D).

Case 2. No good dimensions exist.
Since there are only 3 wrong dimensions and there are no good dimensions, S can only have at most one good neighbor and D can have at most 2 good neighbors along wrong dimensions, or vice versa. Therefore, there are at most 2 paths of length n + 1 in (4.3). The total number of paths of length (n — 1) is p ■ q — 2. Since p,q > 3, this number is still more than the required number p + q — 2 . The other case to be considered is when H (S,D ) = n — 1. We need to show that a fault-free path of length n — 1 exists. Since H (S , D) = n — 1, S and D have the same value at some bit i. Cut Qn along i into and Q\_ j and assume S and D are in j. The worst case situation happens when Qn-i contains all 2n — 3 faults. By Lemma 4.1, there is a minimal path. Hence the theorem is proved. □ i 4.5 D iscu ssio n Basic properties of faulty hypercubes are studied in this chapter. Specifically, bounds on shortest paths and diameter of faulty hypercubes are obtained for num ber of faults less than 2n — 2. The shortest paths between two nonisolated nodes is shown to be bounded by their Hamming distance plus 4. An efficient algorithm is given with complexity 0(|.F | log n) to find such a path. The diam eter of faulty hypercubes is bounded by n + 2 for |F | < 2n — 2. Worst case examples are con structed to show that these bounds are tight. The Best known previous bound for the diam eter of connected faulty hypercubes with | j F | < 2n — 2 is n + 6 [24]. We also show that for sufficiently large n if every node has 2 or 3 good neighbors, the diam eter can be reduced to n + 1 or n, respectively. W ith these results, efficient schemes for simulating algorithms on faulty hypercubes are constructed in the next chapter. 
Chapter 5

Simulation of Normal Algorithms on Faulty Hypercubes

5.1 Introduction

Many algorithms have been designed for hypercubes to solve various kinds of problems, such as sorting [54], BPC routing [53, 10], linear algebra computations [40], PDE algorithms [51], and many other applications [36, 37]. Most of these algorithms possess a common feature: data communication takes place in a regular fashion. In each step, every node communicates with its neighbor along the same dimension. SIMD hypercubes operate in this fashion. In MIMD hypercubes, synchronization techniques are employed to facilitate this type of operation: processors first synchronize and then communicate. This class of algorithms is called normal algorithms [72].

In this chapter, we present a deterministic scheme, without any assumption on fault distribution, for simulating normal algorithms on faulty hypercubes. We introduce a key concept, called the free dimension. Based on this concept, the following results are obtained. In an n-dimensional hypercube with up to n - 1 arbitrarily located faulty nodes, our scheme can simulate any normal algorithm with a small constant slowdown factor. Specifically, the computation time is doubled, and the communication time is increased by at most a factor of 3 when the number of faulty nodes f ≤ ⌈n/2⌉, and by at most a factor of 9 when ⌈n/2⌉ < f < n. (Communication time is measured in terms of number of hops.) In the latter case, the factor can also be 3 if a specific condition is satisfied, namely the existence of a free and uncovered dimension, as explained later. Since worst case scenarios are rare, our simulation scheme is expected to perform with a slowdown factor of 2 to 3.

For some primitive global operations, namely broadcasting, global maximum, global minimum, global logical AND, and global logical OR, the simulation scheme is not the most efficient method.
Since these operations are frequently used, it is desirable to perform them more efficiently on faulty hypercubes. Based on the results in the previous chapter, it is shown that these operations can be completed in 2n steps if the number of faults is less than n. Under the condition that the locations of faults are unknown, this is shown to be the optimal number of steps. This result can be generalized to the case where at most 2n - 3 faults are present; however, 4n steps are then needed. Using the global logical OR operation, the process for establishing the simulation of normal algorithms, which includes reassigning the tasks of faulty nodes and finding new paths for the links incident on faulty nodes, requires O(n) time when the number of faults f ≤ ⌈n/2⌉, and O(n^2) time for ⌈n/2⌉ < f < n. The process runs in parallel, with each node requiring only local knowledge of faulty nodes.

Notation and assumptions specific to this chapter are as follows. A subcube spanned by a set of dimensions S is labeled by a ternary string composed of {0, 1, *}, with *'s at the bits in S. Two subcubes are adjacent if they are spanned by the same set of dimensions and their labels differ in exactly one position. An l-subcube can also be viewed as a supernode in Q_{n-l}. We denote the Q_{n-2} obtained by collapsing dimensions i and j, i.e., whose nodes are 2-subcubes spanned along i and j, by Q^{ij}_{n-2}.

In this chapter, only node failures are considered; link failures can easily be absorbed as node failures. A faulty node is assumed to be completely non-functional. It is assumed that when a message is sent to a faulty node, the message disappears, and that no response is ever received from a faulty node, leading to a timeout. In effect, the faulty node is assumed to be turned off. This assumption is not crucial; it can be implemented by letting a node ignore the messages sent by its faulty neighbors. Logic operations such as AND, OR, EXCLUSIVE-OR, etc.
on two words are assumed to take unit time, as in the previous chapter.

5.2 Free Dimensions

In this section we introduce the concept of a free dimension. Based on results about free dimensions, we describe our simulation scheme in the subsequent sections.

Definition 5.1 A dimension i is free if and only if, for all pairs of adjacent nodes across dimension i, (x_1 x_2 ... x_i ... x_n, x_1 x_2 ... x̄_i ... x_n), at most one of the two nodes is faulty. Otherwise dimension i is said to be occupied (by faults).

Definition 5.2 Given a set of nodes F which is a subset of the hypercube nodes V_n, F induces a subgraph G_F = (V, E) of Q_n, where the vertex set V = F and there is an edge between two nodes f_1, f_2 ∈ F iff f_1 and f_2 are adjacent in Q_n.

Proposition 5.1 If G_F induced by F, |F| ≤ n + 1, is connected, then F can occupy at most |F| - 1 dimensions.

Proof: Let T be a spanning tree of G_F. It is clear that every edge of T occupies a dimension. For an edge e in G_F but not in T, there exists a path P entirely in T between the end nodes of edge e. Since in a cycle of a hypercube every dimension must appear an even number of times, the dimension on which edge e lies must also appear in the remaining part of the cycle P + e, i.e., in P. Therefore, the dimensions occupied by G_F are the same as those occupied by T. There are |F| - 1 edges in T. Hence G_F can make at most |F| - 1 dimensions occupied. It is easy to see that when G_F is a star graph, G_F occupies exactly |F| - 1 dimensions. □

Theorem 5.1 In Q_n with a set of faulty nodes F, the number of dimensions occupied by F is at most |F| - 1 provided |F| ≤ n, and hence at least n - |F| + 1 free dimensions exist.

Proof: Let the number of connected components of G_F be m. Since each component can occupy at most a number of dimensions equal to one less than its size (number of vertices), by Proposition 5.1 the total number of dimensions which can be occupied by G_F is at most |F| - m.
Therefore the number of occupied dimensions is at most |F| - 1, which occurs when G_F forms a single connected component. □

5.3 Reassigning Faulty Nodes and Their Links

In order to run a normal algorithm on a faulty hypercube, the tasks assigned by the algorithm to faulty processors need to be reassigned to fault-free processors. Since the task of a faulty processor communicates with other processors through the links incident on the faulty processor, new communication paths between the new location of the faulty node's task and the neighbors of the faulty node need to be established as well. If a neighbor of the faulty node is also faulty, the path goes to the new location of that neighbor's task.

From Theorem 5.1, we know that as long as |F| < n there is at least one free dimension. The tasks of the faulty nodes can then be assigned to their neighbors across a free dimension, since for all pairs of nodes along a free dimension at most one node is faulty. However, n faults can disconnect a node. Hence, in the following, |F| < n is assumed.

Once the tasks of faulty nodes are relocated, new communication paths are established as described in the following. Consider the 3 possible fault cases for simulating the communication along a dimension j, as depicted in Fig. 5.1, where dimension i is free and faulty nodes are marked by a cross 'x'. In Fig. 5.1(a), node 0 is faulty and its task is relocated to node 1. The communication between the task of faulty node 0 and the task of node 1 along dimension i can now be achieved within node 1 itself. Communication with node 2 along dimension j can be achieved by going through node 3, with dilation 2. In Fig. 5.1(b), the tasks of nodes 0 and 2 are reassigned to nodes 1 and 3, respectively.
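Definition 5.1 and the reassignment rule can be sketched in Python (names ours; this is a centralized brute-force version, not the parallel procedure using only local fault knowledge that this chapter develops):

```python
def free_dimensions(n, faults):
    """Dimensions i such that no two faulty nodes are adjacent across
    dimension i (Definition 5.1)."""
    fset = set(faults)
    occupied = {i for f in fset for i in range(n) if f ^ (1 << i) in fset}
    return set(range(n)) - occupied

def reassign_tasks(n, faults):
    """Map each faulty node's task to its neighbor across one free
    dimension.  Theorem 5.1 guarantees a free dimension when |F| < n,
    and freeness guarantees every chosen host is non-faulty."""
    i = min(free_dimensions(n, faults))       # pick any free dimension
    return {f: f ^ (1 << i) for f in faults}
```

With faults {000, 001} in Q_3, dimension 0 is occupied, dimensions 1 and 2 are free, and the two tasks move to 010 and 011.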
For the task of faulty node 0 (respectively 2), communication with the task of node 1 (respectively 3) along dimension i is internal, whereas communication between the tasks of nodes 0 and 2 along dimension j can now be done in one step, since their hosts, nodes 1 and 3, are adjacent. In Fig. 5.1(c), communication along dimension i can be achieved within nodes 1 and 3. For communication along j, a path between node 1 and node 3 needs to be established. It cannot be routed within the 2-subcube; however, a path of length 4 can be found through an adjacent fault-free 2-subcube. At least one fault-free adjacent 2-subcube exists, for the following reason: there are 2 faults in the 2-subcube, and the 2-subcube has (n - 2) adjacent 2-subcubes; if every adjacent subcube contained a faulty node, then the total number of faulty nodes would be at least 2 + (n - 2) = n > |F|, a contradiction.

Figure 5.1: Three cases of fault patterns.

For all dimensions, new paths can be established as above.

5.4 Simulation of Normal Algorithms

Normal algorithms operate in a synchronous fashion, where all the processors perform their operations in locked steps. In one step, they either perform processing or communication. In communication steps, they send (or receive) messages along the same dimension. On faulty hypercubes, we separate the processing steps (and the communication steps) into two phases. In the first phase, all fault-free processors perform processing (communication) for their own tasks. In the second phase, those hosting the tasks of their faulty neighbors perform processing (communication) for the tasks of the faulty nodes. Therefore the processing time is doubled, since each node assumes at most 2 tasks. For communication in the situations of Fig. 5.1(a) and (b), it takes at most 3 times as long: one hop for the first phase and at most 2 hops for the second phase. However, in the situation of Fig. 5.1(c), it can take 8 hops in the second phase.
This is due to the fact that a path is shared by two pairs of tasks, those of nodes 1 and 3 and those of nodes 0 and 2, and each pair takes 4 hops for communication. The total time taken in the second phase is 8 times that of the first phase only if all such paths in the second phase of a communication step along dimension j, 1 ≤ j ≤ n, are node-disjoint, i.e., there is no further congestion on the 4-hop paths. In the following we show that all those paths can be made node-disjoint. Note that the messages exchanged between nodes 0 and 2 and the messages exchanged between nodes 1 and 3 can be combined and sent together to reduce communication time; thus, the actual communication time may be less than 8 times as long.

Let F_q be the set of 2-subcubes spanned along a free dimension i and another dimension j such that each element of F_q contains 2 faulty nodes at distance 2, as shown in Fig. 5.1(c).

Theorem 5.2 When the number of faulty nodes is at most n - 1, for every 2-subcube in F_q, a path of length 4 can be constructed between the two non-faulty nodes through an adjacent fault-free 2-subcube in such a way that the |F_q| paths are node-disjoint.

Proof: If |F_q| = 0, the theorem is obviously true. Assume |F_q| > 0. Let each 2-subcube spanned along dimensions i and j be a supernode. Thus Q_n can be viewed as an (n - 2)-dimensional hypercube with each node being a supernode. A supernode is faulty if it contains one or more faults. Since |F_q| ≥ 1, the number of faulty supernodes is at most |F_q| + (n - 1) - 2|F_q| ≤ n - 2, with equality when |F_q| = 1. By Theorem 5.1, in the (n - 2)-dimensional hypercube there exists a free dimension l. Therefore, every faulty supernode has a fault-free l-dimensional neighbor; that is, the neighboring 2-subcube is fault-free. For each 2-subcube in F_q, the path can go through its l-dimensional neighbor. Therefore each path lies completely in a distinct 3-dimensional subcube spanned along dimensions i, j, l. The |F_q| paths are thus node-disjoint.
□

Corollary 5.1 All the paths established in the second phase of the communication steps for simulating the fault-free communication along dimension j are node-disjoint.

Proof: For the 2-subcubes having faults as depicted in Fig. 5.1(a) and (b), the path lies entirely in the 2-subcube. For the 2-subcubes having the fault pattern of Fig. 5.1(c), as shown in Theorem 5.2 the path goes through a fault-free adjacent 2-subcube. Therefore, by this fact and Theorem 5.2, all paths are node-disjoint, since all paths lie in mutually disjoint subcubes. □

Many normal algorithms exchange information along all the dimensions in a regular fashion. For example, in [10], the LC routing algorithm exchanges information along every dimension exactly once. The worst case of simulating the communication along dimension j happens when dimension i is free and there are two faulty nodes with distance 2 in a 2-subcube spanned along dimensions i and j, as depicted in Fig. 5.1(c). In this case, simulating the communication along the jth dimension takes 8 hops. However, this may not be the worst case for every dimension j, 1 ≤ j ≤ n. Therefore, the overall performance may not degrade by a factor of 9 if not every dimension j is in the worst-case situation. A free dimension is chosen carefully so as to minimize the number of dimensions in which the worst-case communication happens.

Definition 5.3 Dimension i is covered if there are two faulty nodes in F with distance 2, differing in exactly two bits: bit i and another bit. Otherwise it is said to be uncovered.

A free dimension with no worst-case situation in any dimension 1 ≤ j ≤ n is thus an uncovered free dimension. We now show that when |F| is less than n/2, there exists an uncovered free dimension. Using such an uncovered free dimension, simulating the fault-free communication increases communication time by a factor of 3. An example that a dimension is covered is that shown in Fig.
5.1(c), where dimensions i and j are both covered.

From the set of faulty nodes F, a graph G(V, E) can be constructed where the vertex set V is the set of faulty nodes F and the edge set E is defined as follows: there is an edge between 2 vertices if the two vertices have distance 2. We have the following proposition.

Proposition 5.2 If all dimensions of Q_n are covered, then |F| ≥ ⌈n/2⌉ + 1.

Proof: Let T be a spanning forest of G. An edge of G covers 2 distinct dimensions. For an edge in G but not in T, there is a path between its two end nodes entirely in T; therefore the dimensions covered by that edge must be among the dimensions covered by the path. The dimensions covered by F are thus the dimensions covered by the edges of T, and T has at most |F| − 1 edges. In order to cover all dimensions, we must have

2(|F| − 1) ≥ n  ⇒  |F| − 1 ≥ n/2  ⇒  |F| ≥ n/2 + 1  ⇒  |F| ≥ ⌈n/2⌉ + 1,

where (|F| − 1) is the maximum possible number of edges in T. □

Theorem 5.3 If |F| < ⌈n/2⌉ + 1, then there must be a free and uncovered dimension.

Proof: Again let T be the spanning forest of G. If two adjacent (adjacent in Q_n, not in G) faulty nodes which occupy a dimension are in a spanning tree t ∈ T, then this occupied dimension is also covered by some edge of t. This is because there is a path in t between these two nodes and, hence, by the basic property of the hypercube, this path must cover the occupied dimension.

Consider the case where there is more than one pair of adjacent (again, adjacent in Q_n) faulty nodes between two spanning trees t1 and t2 of T. Let (v1, v2) and (v1', v2') be two such pairs of nodes, where v1 and v1' are in t1 and v2 and v2' are in t2. (v1, v2) occupies a dimension i and (v1', v2') occupies another dimension i'. If i is an uncovered dimension, then i = i', as explained below. In t1 there is a path P1 between v1 and v1', and in t2 there is a path P2 between v2 and v2'.
P1 and P2, when connected through (v1, v2) and (v1', v2'), form a loop. On this loop, every dimension has to occur an even number of times. Since i is not covered, i does not appear on P1 or P2. Thus i' has to be equal to i, so that i appears twice. From the above, we can conclude that between any two spanning trees there is at most one uncovered and occupied dimension. The number of uncovered and occupied dimensions is therefore at most |T| − 1, where |T| denotes the number of trees in T. The number of covered dimensions is at most 2(|F| − |T|), since every edge in T covers two dimensions and there are (|F| − |T|) edges. The number of remaining dimensions, which are free and uncovered, is therefore at least

n − (|T| − 1) − 2(|F| − |T|).

After simplification, the least number of free and uncovered dimensions is

n − 2|F| + |T| + 1 = n − 2⌈n/2⌉ + |T| + 1 when |F| = ⌈n/2⌉.

This quantity is at least 2 when n is even and 1 when n is odd, since |T| ≥ 1. □

For |F| < ⌈n/2⌉ + 1, Theorem 5.3 implies that there exists a free and uncovered dimension along which the tasks of faulty nodes can be assumed by their neighbors, so that simulating the communication takes at most 3 times as long as in the fault-free situation.

5.5 Fast Global Operations

When running concurrent programs on hypercube machines to solve problems, we often need to perform global operations. For example, each processor of a hypercube machine obtains a solution for a problem; to get the lowest cost solution, we need to perform a global-Min operation to find the solution of minimum cost. Such an operation is supported on iPSC/2 hypercube computers by an efficient system routine - GxLOW - which is available to users' application programs through system calls. Other supported global operations are GxHIGH to find the global maximum, global sum (GxSUM), global AND (GxAND), etc. The previously described simulation scheme can be applied to perform these operations on faulty hypercubes, since they belong to the class of normal algorithms.
However, this may not be the most efficient approach. In this section, we present a fast algorithm to perform some important global operations, namely global minimum, maximum, AND, OR, and broadcasting. When the number of faults is less than n, an optimal algorithm which runs in 2n steps is presented. When the number of faults is less than 2n − 2, this algorithm runs in 4n steps; 3n − 1 is shown to be the lower bound. The algorithm is simple: every node performs the same procedure - exchange its value with its neighbors and perform the operation along dimensions 1 to n. No knowledge of the fault locations is required.

Assume every node v contains a value b(v) in the beginning, and denote the operation by '+'. We would like every fault-free node to hold Σ_v b(v), the combination of the values over all fault-free nodes v, at the end of the computation. Every node v performs the following simple operations.

Algorithm Glob
    for i = 1 to c do
        for j = 1 to n do
            b(v) = b(v) + b(v^(j))

Here v^(j) denotes the j-dimensional neighbor of v. One completion of the second (inner) for statement is called one round; c is the number of rounds needed for the global operations.

5.5.1 Number of Faults Less Than n

Since there are n node-disjoint paths between any two nodes in a hypercube, with less than n faults there must be a fault-free path between these two nodes. If by performing 2 rounds of exchanges the value of a node S can reach an arbitrary node D through all the n node-disjoint paths between them, then one of these paths must have successfully reached D.

Let S and D differ in J(S, D) = {j1 < ... < jm}. The dimension sequences of the n node-disjoint paths are the following:

(j1 ... jm)_m : m minimal paths of length m
j(j1 ... jm)j, j ∉ J(S, D) : (n − m) paths of length (m + 2)        (5.1)

Recall that (j1 ... jm)_m denotes the set of m minimal paths obtained by rotating (j1 ... jm) cyclically m times. Consider a minimal path (jk ... jm j1 ... j_{k-1}) in (5.1).
In step jk of the first round, S sends its value along dimension jk to its jk-dimensional neighbor S'. In the following steps, up to step j_{k+1}, S' collects values from its other neighbors. In step j_{k+1}, S' sends its value, which has been combined with the value of S, along dimension j_{k+1}. Therefore the value of S has traveled dimensions jk and j_{k+1}. It is easy to verify that in this fashion the value of S can travel through all the dimensions jk ... jm on the path in the first round. In the second round, it then travels through the rest of the dimensions in J(S, D) to reach D.

For a path of length m + 2 in (5.1), rearrange the dimension sequence to j jk ... jm j1 ... j_{k-1} j if there is a k such that j_{k-1} < j < jk. Since all the paths of length m + 2 in (5.1) lie in a distinct m-subcube, as mentioned in the previous chapter, rearranging the dimension sequence does not change the node-disjoint-path property. In this situation, in the first round the value of S can travel the dimension sequence j jk ... jm, and in the second round j1 ... j_{k-1} j. In the case where j > jm, use the dimension sequence in (5.1): in the first round it can only travel dimension j, but in the second round it can travel all the other dimensions j1 ... jm j. The above argument can be applied to any two nodes. Therefore, in two rounds every node can pass its value to all the fault-free nodes. Hence every fault-free node has the result of the global operation at the end.

In the worst case, a node S can have all its neighbors faulty except the n-dimensional neighbor. Then in the first round the first n − 1 steps have no effect. Since the diameter of the network in this fault scenario is n + 1 and the longest shortest path is between S and another node, it takes at least n + 1 steps to reach the farthest node. Therefore the minimum number of steps needed is 2n. When no knowledge of the fault locations is assumed, 2n is the minimum number of steps needed to complete the operation.
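The exchange pattern of Algorithm Glob is easy to simulate directly. The sketch below is our own Python illustration (the names glob, faults, and so on are not from the text): in each step, every fault-free node combines its value with that of one neighbor, and a faulty neighbor simply contributes nothing.

```python
def glob(n, faults, values, op, rounds):
    """Run Algorithm Glob on Q_n: `rounds` sweeps over dimensions 0..n-1.

    `values` maps each fault-free node to its initial b(v); a faulty
    neighbour does not respond, so its contribution is skipped.
    """
    b = dict(values)
    for _ in range(rounds):
        for j in range(n):                 # one step per dimension
            old = dict(b)                  # exchanges in a step are simultaneous
            for v in b:
                u = v ^ (1 << j)           # j-dimensional neighbour of v
                if u not in faults:
                    b[v] = op(old[v], old[u])
    return b

# global maximum on Q_3 with |F| = 2 < n faults: two rounds suffice
faults = {0b011, 0b101}
good = [v for v in range(8) if v not in faults]
result = glob(3, faults, {v: v for v in good}, max, rounds=2)
```

Running the same simulation over every fault set with |F| < n on small cubes confirms the two-round claim above; with op replaced by logical OR, the routine also serves as the reduction used later in Section 5.6.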
Suppose this is not minimal; i.e., there exists an algorithm in which every node sends at most n + k, 0 ≤ k < n, messages to its neighbors. It is clear that k > 0, since n steps is the minimum number of steps in the fault-free case. Choose a node v which sends a message along dimension i in the nth step. Let all the neighbors of v be faulty except the ith dimensional neighbor. Then the value in v comes out of v only in the nth step, and it takes at least another n steps to reach the farthest node. Thus k must be at least n.

5.5.2 Number of Faults Between n and 2n − 3

When the number of faults is more than n, an n-dimensional hypercube can be disconnected. However, those global operations can still be performed in the largest connected component. Since it is connected, every node in it has at least one good neighbor. The results in Chapter 4 show that between any two nodes S and D there is at least one fault-free path of length at most their Hamming distance plus 4. In this section, we show that in 4 rounds those global operations can be achieved by Algorithm Glob.

Theorem 5.4 In Q_n, with number of faults |F| < 2n − 2, the global Min, Max, AND, and OR operations can be achieved by Algorithm Glob in at most 4 rounds.

Proof: This proof is based on the proof of Theorem 4.2 in Chapter 4, where it is shown that there are always enough paths so that faults cannot block all of them. In the following, it is shown that in 4 rounds all the paths constructed for any two nodes S and D in the proof of Theorem 4.2 can be traversed from S to D. Hence at the end of the 4th round, D will have operated on the value from S. This is true for all pairs of nodes; hence these global operations are achieved.

Case A: There is a good dimension i. In this case, 2n − 2 paths are found between S and D.

Case A.1: i ∈ J(S, D). One set of (n − 1) paths found in a Q_{n−1} is

(j1 ... î ... jm)_{m−1} i : m − 1 minimal paths of length m
j(j1 ... î ... jm)j i, j ∉ J(S, D) : (n − m) paths of length (m + 2)

where î indicates that dimension i is omitted from the rotated part. By the analysis of the previous section, all dimensions except the last one, i, can be traversed in 2 rounds; it takes at most one more round to traverse the last one. The other set of (n − 1) paths is similar, except that i is at the beginning instead of at the end. In 3 rounds, this set of paths can also be traversed.

Case A.2: i ∉ J(S, D). The first set of (n − 1) paths is

(j1 ... jm)_m : m minimal paths of length m
j(j1 ... jm)j, j ∉ J(S, D), j ≠ i : (n − m − 1) paths of length (m + 2)

This set can be traversed in 2 rounds by the result of the previous section. Another set of (n − 1) paths is

i(j1 ... jm)_m i : m paths of length (m + 2)
ij(j1 ... jm)ji, j ∉ J(S, D), j ≠ i : (n − m − 1) paths of length (m + 4)        (5.2)

The middle part of these paths, which is the same as the first set of (n − 1) paths, can be traversed in 2 rounds. The first round can traverse the i at the beginning, and the last round the i at the end.

Case B: There is no good dimension. In this case, there are n faults among the neighbors of S and D. Assume S can reach its neighbors along dimensions {x1 ... xp} and D can reach its neighbors along dimensions {y1 ... yq}.

Case B.1: Some xi, yj ∈ J(S, D), where 1 ≤ i ≤ p and 1 ≤ j ≤ q. Then (n − 2) paths are constructed. They are

xi(j1 ... x̂i ... ŷj ... jm)_{m−2} yj : m − 2 minimal paths of length m
xi j(j1 ... x̂i ... ŷj ... jm)j yj, j ∉ J(S, D) : (n − m) paths of length (m + 2)

where x̂i and ŷj indicate that these dimensions are omitted from the rotated part. The middle part of each path can be traversed in 2 rounds, and the xi at the beginning and the yj at the end can each be traversed in one round. Hence in 4 rounds S can reach D.

Case B.2: {x1 ... xp} ∩ J(S, D) = ∅ or {y1 ... yq} ∩ J(S, D) = ∅. If some x ∈ {x1 ... xp} ∩ J(S, D) exists, then {y1 ... yq} ∩ J(S, D) = ∅.
According to the proof of Theorem 4.2, q paths are constructed with initial dimension x, as follows:

x yj (jl ... jm j1 ... j_{l-1}) yj, where j_{l-1} < x < jl.

If x < yj, then in the 1st round x yj can be traversed, in the 2nd round jl ... jm, in the 3rd round j1 ... j_{l-1}, and yj is traversed in the last round. If x > yj and yj < jl, then in the first round only x is traversed, in the second round yj jl ... jm, in the 3rd round j1 ... j_{l-1}, and in the 4th round yj can be traversed. If x > yj and yj > jl, then in the first round only x and in the 2nd round only yj are traversed, in the 3rd round jl ... jm, and in the last round j1 ... j_{l-1} yj can be traversed.

For x ∈ {x1 ... xp} but x ∉ J(S, D), the q paths are

x yj (j1 ... jm) x yj.

In the following discussion, we put a vertical bar in the sequence to indicate an end point which can be reached within one round:

if x > yj, 1 ≤ yj < jl : x | yj(jl ... jm | j1 ... j_{l-1} x | yj |
if x < yj, j_{l-1} < yj < jl : x yj(jl ... jm | j1 ... j_{l-1} |) x yj |

where 1 ≤ l ≤ m. If yj < j1 or yj > jm, it is easy to see that the path can be traversed in 4 rounds.

In the case that {x1 ... xp} ∩ J(S, D) = ∅, some y ∈ {y1 ... yq} ∩ J(S, D) exists. The p paths for y ∈ J(S, D) are

xi(jl ... jm j1 ... j_{l-1} xi y), where j_{l-1} < y < jl:

if xi < jl : xi(jl ... jm | j1 ... j_{l-1} | xi | y)|
if xi > jl : xi | (jl ... jm | j1 ... j_{l-1} xi | y)|

For y ∉ J(S, D), the p paths are

xi y(j1 ... jm) xi y.

These sets of paths are the same as in the previous case and can be traversed in 4 rounds.

However, m = 2 is a special case: the number of faults can be the same as the number of paths constructed. This happens when the two common neighbors of S and D are inaccessible and S or D has only one accessible neighbor, as shown in Theorem 4.2. In this case, there is at least one path of length m + 4 of the form x y (j1 j2) x y. Replace this path with the two node-disjoint paths x j1 y x j2 y and x j2 y x j1 y.
Then we have more paths than the number of faults. Note that these two paths are different from those constructed in Theorem 4.2; however, they serve the same purpose for the proof. It can be seen that the following are all the possible cases:

1. x < j1 : x j1|y|x j2|y| and x j2|y|x j1|y|
2. j2 < y : x|j1 y|x|j2 y| and x|j2 y|x|j1 y|
3. j1 < x < y < j2 : x|j1 y|x j2|y| and x j2|y|x|j1 y|
4. j1 < y < x < j2 : x|j1 y x|j2|y| and x j2|y x|j1 y|
5. y < j1 < x < j2 : x|j1|y x j2|y| and x j2|y x|j1|y|
6. j1 < y < j2 < x : x|j1 y x|j2|y| and x|j2|y x|j1 y|
7. y < j1 < j2 < x : x|j1|y x|j2|y| and x|j2|y x|j1|y|

Except in case 7, the two paths can be traversed in 4 rounds. For case 7, there exist two other paths, x|y j2 x|j1|y| and x|j1|y j2 x|y|, which can be traversed in 4 rounds. Even though the message cannot pass through the two paths of case 7 in 4 rounds, it can pass through these two paths instead. □

In Fig. 5.2, S has all its neighbors faulty except the n-dimensional neighbor S', and S' has all its neighbors faulty except its (n − 1)-dimensional neighbor S'' and S. Thus in the first round the value of S can only propagate to S', and only in the (n − 1)th step of the second round does it reach S''. From there, reaching the farthest node takes at least another n steps. Therefore, at least 3n − 1 steps are needed.

Figure 5.2: An example where at least 3n − 1 steps are needed (S' = 0...010 and S'' = 0...011, each adjacent to n − 2 faults in a Q_{n−2}).

5.6 Implementation of the Simulation Scheme

In this section, the process for establishing the simulation of normal algorithms is addressed. This process involves finding free (or uncovered free) dimensions and reassigning the tasks of faulty nodes and the links incident on the faulty nodes. The global operations described in the previous section are employed to complete the process efficiently. The number of faulty nodes, |F| = f, is assumed to be less than n.
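The length-4 detour of Theorem 5.2, used when reassigning the links around a Fig. 5.1(c) fault pattern, can be written down explicitly. The following Python fragment is our own centralized illustration (not the distributed procedure itself); it routes between the two fault-free nodes of a two-fault 2-subcube through the l-dimensional neighboring 2-subcube.

```python
def detour_path(a, b, i, j, l):
    """Length-4 path from a to b, where a and b are the two fault-free
    nodes of a 2-subcube spanned by dimensions i and j (so a and b
    differ in exactly bits i and j).  All three intermediate nodes lie
    in the l-dimensional neighbouring 2-subcube, so choosing l as in
    Theorem 5.2 keeps the path clear of the faulty pair."""
    assert a ^ b == (1 << i) | (1 << j)
    p1 = a ^ (1 << l)                       # step into the neighbouring 2-subcube
    p2 = p1 ^ (1 << i)                      # cross dimension i there
    p3 = p2 ^ (1 << j)                      # cross dimension j there
    return [a, p1, p2, p3, p3 ^ (1 << l)]   # step back; p3 ^ 2^l equals b

path = detour_path(0b0001, 0b0010, 0, 1, 2)
```

Because each detour lies entirely in the 3-subcube spanned by i, j, and l, detours built for different 2-subcubes of F_q are automatically node-disjoint, which is the content of Theorem 5.2.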
5.6.1 Finding Free Dimensions

Since |F| < n, a pair of neighboring faulty nodes, which forms a 1-subcube, has (n − 1) neighboring 1-subcubes. If each of these neighboring 1-subcubes contained a faulty node, then the total number of faulty nodes would be 2 + (n − 1) = n + 1 > |F|. Hence there must be a fault-free 1-subcube adjacent to the totally faulty 1-subcube containing the 2 faulty nodes. We conclude that if |F| < n, there must be a fault-free 1-subcube adjacent to every totally faulty 1-subcube.

Assume each node knows the fault status of its neighbors, and that this information is stored in an n-bit word b whose ith bit is 1 if the ith dimensional neighbor is faulty and 0 if it is fault-free. By exchanging their status words and AND-ing the two words, the two nodes of a fault-free 1-subcube can detect whether there is a totally faulty neighboring 1-subcube. A 1 in the ith bit of the result of the AND indicates that the ith dimensional neighboring 1-subcube is totally faulty; hence the dimension that faulty 1-subcube spans - the dimension the two fault-free nodes cross - is occupied. If the result of the AND is not all-zero, then both nodes record this in another n-bit word b' by setting its jth bit to 1, where j is the dimension they cross. Letting all fault-free nodes perform such operations with all their fault-free neighbors, a pair of faulty nodes occupying a dimension can always be detected by some pair of neighboring fault-free nodes. A formal parallel algorithm for detecting occupied dimensions is given below.

Algorithm Detect
    for all fault-free nodes v do
        for j = 1 to n do
        begin
            send b(v) to v^(j);            /* b(v): word b of v */
            receive b(v^(j)) from v^(j);
            if b(v) AND b(v^(j)) ≠ 0 then set the jth bit of b'(v) to 1;
        end;

Here we assume that receiving b from a faulty node, say v^(j), gets no response, and b(v^(j)) is then taken to be 0. Algorithm Detect takes O(n) parallel steps. To know which dimensions are free, we need to OR the words b'(v) of all fault-free nodes v in the hypercube. The result is denoted ∨_{v∈(Q_n−F)} b'(v).
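A sequential rendering may make the exchange-and-AND step concrete. The Python sketch below is our own illustration (one dictionary stands in for the distributed words b, and the final OR over all b'(v) is folded in); for |F| < n it reports exactly the occupied dimensions, since every totally faulty 1-subcube then has an adjacent fault-free witness pair.

```python
def detect_occupied(n, faults):
    """Return the OR of all words b'(v): bit j is set iff some fault-free
    pair of j-neighbours finds a common faulty direction i, i.e. the
    1-subcube next to them along dimension i is totally faulty."""
    nodes = [v for v in range(2 ** n) if v not in faults]
    # word b(v): bit i set iff the i-dimensional neighbour of v is faulty
    b = {v: sum(1 << i for i in range(n) if v ^ (1 << i) in faults)
         for v in nodes}
    result = 0
    for v in nodes:
        for j in range(n):
            bu = b.get(v ^ (1 << j), 0)    # a faulty neighbour: no response
            if b[v] & bu:                  # b(v) AND b(v^(j)) != 0
                result |= 1 << j           # the jth bit of b'(v)
    return result

# faults 0000 and 0001 occupy dimension 0; the isolated fault 0110 occupies nothing
occupied = detect_occupied(4, {0b0000, 0b0001, 0b0110})
```

Here the witness pair for dimension 0 is, for example, nodes 0100 and 0101, whose words b share the common faulty direction of bit 2.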
Algorithm Glob can be applied with '+' replaced by the logic OR operation. If the ith bit of the word b' at the end of Algorithm Glob is 0, then no pair of fault-free nodes has detected that dimension i is occupied. Hence it must be free, by the argument given before: with |F| < n, a pair of neighboring faulty nodes can always be detected.

5.6.2 Finding Uncovered Free Dimensions

In a 2-subcube spanned by a free dimension and another dimension, there are at most two faulty nodes; if there were 3 or more faulty nodes, then both dimensions would be occupied. If a free dimension i is covered, then some 2-subcube spanned along dimension i and some other dimension j contains 2 faulty nodes with distance 2. In such a 2-subcube, the b word of either fault-free node has bits i and j set to 1, indicating the fact that the free dimension i is covered. Hence for every covered free dimension there is a fault-free node whose word b indicates this fact.

To check if there is an uncovered free dimension, we need to OR the b words of all fault-free nodes that have two or more faulty neighbors (a b word with a single bit set witnesses no covered dimension and is treated as 0). This can be done by running Algorithm Glob with these words. At the end of Algorithm Glob, the bit positions with value 1 contain all the covered free dimensions. To determine if there is a free and uncovered dimension, simply OR this result with b'. A bit position with value 0 indicates that the dimension is free and uncovered.

5.6.3 Reassigning Faulty Nodes and Their Links

If there are free and uncovered dimensions, all fault-free nodes choose the smallest such dimension i and assume the tasks of their ith dimensional faulty neighbors in parallel. New paths are established as shown in Fig. 5.1(a) and (b), since the pattern of Fig. 5.1(c) cannot occur.

When there is no uncovered free dimension, let every node choose the smallest free dimension i and assume the tasks of its faulty i-dimensional neighbor.
However, to establish the new paths we need to find free dimensions in the Q_{n−2} of Theorem 5.2, whose nodes are 2-subcubes (supernodes) spanned along dimension i and some other dimension l. In some of these supernodes there are 2 faulty nodes with distance 2. The same algorithm for finding free dimensions can be used for this Q_{n−2}. In running Algorithm Detect, all fault-free nodes in a faulty supernode are turned off: they neither send nor receive messages. This can easily be done by internally checking the other 3 nodes. In a fault-free supernode, the 4 nodes OR their words b together so that all have the same value of b. In this way, if one node is adjacent to a faulty node of an adjacent supernode, all 4 nodes know that that supernode is faulty. Then all fault-free supernodes exchange b along all dimensions except i and l (the spanning dimensions of the supernodes) and perform the AND operation. A 1 bit in b' at the end of executing Algorithm Detect indicates an occupied dimension.

In running Algorithm Glob, one problem may arise. In the worst case, there can be n − 2 faulty supernodes, as shown in Theorem 5.2. If they block messages, then between two supernodes all (n − 2) node-disjoint paths can be blocked, and Algorithm Glob may not produce the correct result in all fault-free supernodes. Therefore, the supernodes containing only one faulty node will pass messages; their words b' are set to 0. The remaining faulty supernodes, which contain two faulty nodes, stay turned off, so that they do not pass messages. When a fault-free supernode receives messages from a faulty supernode, the node waiting for the faulty node receives nothing; this is easily recovered by requesting the value from a neighbor within the supernode. In this fashion, only the supernodes containing 2 faulty nodes block messages. Since a supernode can contain at most 2 faulty nodes (otherwise both dimensions would be occupied), there are at most ⌊n/2⌋ supernodes blocking messages.
But there are (n − 2) ≥ ⌊n/2⌋ node-disjoint paths, and hence Algorithm Glob can be successfully executed.

Once the smallest free dimension i is found and selected, every fault-free supernode checks whether its i-dimensional neighbor contains 2 faulty nodes; if so, it establishes a path for the 2 fault-free nodes in that neighbor. In the worst case, we may need to run the modified Algorithm Glob for every dimension l when they are all covered. Each run takes O(n) time, and thus the entire process takes O(n²) time in parallel. In practice, since worst-case situations are rare, it is likely that a free and uncovered dimension exists and only O(n) steps are needed.

5.7 Discussion

The problem of how to run algorithms on faulty hypercubes is investigated in this chapter. A simulation scheme is developed for an important class of algorithms, namely normal algorithms. The idea is to simulate the fault-free operations of these algorithms on faulty hypercubes. When the number of faulty nodes, |F|, is less than n, the number of dimensions of the hypercube, our simulation scheme only doubles the computation time and increases the communication time by at most 9 times. More specifically, when |F| < n/2, the communication time is increased by at most a factor of 3. This is also true for n/2 ≤ |F| < n, as long as a free and uncovered dimension exists. By our scheme, many important algorithms can be simulated on a faulty hypercube - such as sorting, BPC routing, LC routing, and many other application algorithms - with a small constant slowdown factor. Our scheme can also be applied when there are more faults, as long as a free dimension exists. For certain global operations, such as broadcasting and global maximum, a more efficient approach can be taken: when the number of faults is less than n, these operations can be completed in the optimal number of steps, 2n, and when there are more faults (< 2n − 2), they can be completed in 4n steps.
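The claims of Section 5.5 are easy to spot-check by simulation. The fragment below reruns a minimal Python version of Algorithm Glob on a fault pattern in the spirit of Fig. 5.2 for n = 4 and |F| = 2n − 3 = 5 (the concrete fault set is our own choice): two rounds no longer suffice, while four rounds do, as Theorem 5.4 guarantees.

```python
def glob(n, faults, values, op, rounds):
    # synchronous simulation of Algorithm Glob (Section 5.5)
    b = dict(values)
    for _ in range(rounds):
        for j in range(n):
            old = dict(b)
            for v in b:
                u = v ^ (1 << j)
                if u not in faults:
                    b[v] = op(old[v], old[u])
    return b

# S = 0000 can talk only to S' = 1000, and S' only to S and S'' = 1100
n, faults = 4, {0b0001, 0b0010, 0b0100, 0b1001, 0b1010}
good = [v for v in range(2 ** n) if v not in faults]
values = {v: (99 if v == 0 else 0) for v in good}  # S holds the maximum
after2 = glob(n, faults, values, max, rounds=2)
after4 = glob(n, faults, values, max, rounds=4)
```

After two rounds the maximum has only reached S' and S''; after four rounds every fault-free node holds it.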
Chapter 6

Conclusions and Future Research

Multiprocessing is a promising approach to meet the high demand for computing power. However, as more processors are put together to provide high performance, the systems become vulnerable to faults. Fault tolerance is thus an important issue in multiprocessor systems. Due to their structural differences, different multiprocessor systems require different fault-tolerance techniques. In this dissertation, several fault-tolerance techniques for a popular multiprocessor system, the hypercube computer, are presented. The underlying idea of the techniques proposed in this dissertation is to use the inherent redundancy of hypercube computers to provide fault tolerance. Since no extra nodes and/or links are needed in these techniques, the cost of providing fault tolerance is significantly lower than for techniques which use redundant hardware components. However, in the presence of faults, the performance of the system may degrade.

In Chapter 2, a survey of the state of the art of hypercube fault tolerance is presented to show the contributions of this dissertation. In Chapter 3, a reconfiguration scheme which uses the unused nodes in a hypercube as spares to replace the faulty nodes is developed. The key concept is to transform the embedded task, represented by its task graph, to another copy of the embedding which contains no faulty nodes. The automorphisms of the hypercube are used in transforming the embedded task graphs. With a reasonable number of spare nodes, this technique can tolerate a small number of faults. A bigger class of permutations than just the automorphisms can be used to transform task graphs to strengthen the fault-tolerance capability. One future research direction is to characterize this class of permutations and use it efficiently to perform reconfiguration. This problem is in general very hard.
But for certain special cases, for example when the task graph is some simple and regular graph, results should be obtainable. From a practical point of view, the results for special cases are useful, since many common task graphs are simple and regular.

The performance of the tasks running on a hypercube multiprocessor system is characterized by its basic properties, such as shortest paths and diameter. In Chapter 4, these basic properties are studied for faulty hypercubes. Tight bounds on shortest paths and diameter are obtained when the number of faults is less than 2n − 2, where n is the number of dimensions. Efficient algorithms for finding near-shortest paths are also obtained. One can further investigate these properties when there are more faults. However, this will be more of a theoretical interest than of practical value, since a large number of faults is a rare event.

The techniques proposed in Chapter 3 require the presence of unused nodes. However, many algorithms running on hypercubes utilize all the nodes. Techniques for executing such algorithms on faulty hypercubes also need to be developed. In Chapter 5, based on the results in Chapter 4, schemes are developed for running some classes of algorithms on faulty hypercubes. In particular, a simulation scheme is introduced for executing normal algorithms. The idea is to simulate the fault-free operations of these algorithms on faulty hypercubes. This simulation scheme only incurs a constant slowdown factor. For certain special global operations, a direct and more efficient algorithm is also developed; its running time is optimal when the number of faults is less than n. The simulation scheme, when applied to general algorithms, may incur a nonconstant slowdown factor. One important future research topic is to develop schemes for running arbitrary tasks on faulty hypercubes with a constant or low-complexity slowdown factor.

Reference List

[1] J.A. Abraham, P. Banerjee, C.Y. Chen, W.K.
Fuchs, S.Y. Kuo, and A.L.N. Reddy. Fault tolerance techniques for systolic arrays. IEEE Computer, 20(7):65-75, Jul. 1987.

[2] T. Anderson and P.A. Lee. Fault Tolerance: Principles and Practice. Prentice Hall International, 1981.

[3] M. Annaratone, E. Arnould, T. Gross, H.T. Kung, M. Lam, O. Menzilcioglu, and J.A. Webb. The Warp computer: Architecture, implementation and performance. IEEE Trans. on Computers, C-36:1523-1538, Dec. 1987.

[4] J.A. Armstrong and F.G. Gray. Fault diagnosis in a Boolean n-cube array of microprocessors. IEEE Trans. on Computers, C-30:587-590, Aug. 1981.

[5] V. Balasubramanian and P. Banerjee. Compiler-assisted synthesis of algorithm-based checking in multiprocessors. IEEE Trans. on Computers, C-39(4):436-446, Apr. 1990.

[6] V. Balasubramanian and P. Banerjee. Tradeoffs in the design of efficient algorithm-based error detection schemes for hypercube multiprocessors. IEEE Trans. on Software Eng., (2), Feb. 1990.

[7] P. Banerjee. Strategies for reconfiguring hypercubes under faults. In Proc. 20th Inter. Symp. on Fault-Tolerant Computing, 1990.

[8] P. Banerjee, J.T. Rahmeh, C. Stunkel, V.S. Nair, K. Roy, V. Balasubramanian, and J. Abraham. Algorithm-based fault tolerance on a hypercube multiprocessor. IEEE Trans. on Computers, 39(9):1132-1145, Sept. 1990.

[9] B. Becker and H.-U. Simon. How robust is the n-cube? Information and Computation, 77:162-178, 1988.

[10] R. Boppana and C.S. Raghavendra. Optimal self-routing of linear-complement permutations in hypercubes. In Proc. 5th Distributed Memory Computing Conf., 1990.

[11] J. Bruck, R. Cypher, and D. Soroker. Running algorithms efficiently on faulty hypercubes. ACM Computer Architecture News, 19(1), Mar. 1990.

[12] D. Callahan and K. Kennedy. Compiling programs for distributed-memory multiprocessors. The Journal of Supercomputing, pages 151-169, Oct. 1988.

[13] P.J. Cameron. Automorphism groups of graphs. In L.W. Beineke and R.J. Wilson, editors, Selected Topics in Graph Theory.
Academic Press, 1983, vol. 2.

[14] A. Chandra, L. Kou, G. Markowsky, and S. Zaks. On sets of Boolean n-vectors with all k-projections surjective. Acta Inf., (20):103-111, 1983.

[15] G. Chartrand and L. Lesniak. Graphs and Digraphs. Wadsworth, Belmont, Calif., 1986.

[16] M.-S. Chen and K.G. Shin. On relaxed squashed embedding of graphs into a hypercube. SIAM J. Computing, 18:1226-1244, Dec. 1989.

[17] M.-S. Chen and K.G. Shin. Depth-first approach for fault-tolerant routing in hypercube multicomputers. IEEE Trans. on Parallel and Distributed Systems, 1:152-159, Apr. 1990.

[18] M.-S. Chen and K.G. Shin. Embedding of interacting task modules into a hypercube. In M. Heath, editor, Hypercube Multiprocessors 1987, pages 122-129, Philadelphia, 1987. SIAM.

[19] S.K. Chen, C.T. Liang, and W.T. Tsai. Loops and multi-dimensional grids on hypercubes: Mapping and reconfiguration algorithms. In Proc. of Inter. Conf. on Parallel Processing, Aug. 1988.

[20] S.K. Chen, C.T. Liang, and W.T. Tsai. Multi-dimensional grids reconfiguration algorithm on hypercubes. In Proc. of 18th Inter. Symp. on Fault-Tolerant Computing, Aug. 1988.

[21] E. Dilger and E. Armstrong. System level self-diagnosis in n-cube connected multiprocessor networks. In Proc. 14th Int. Symp. Fault-Tolerant Computing, pages 184-189, Jun. 1984. Kissimmee, FL.

[22] S. Dutt and J.P. Hayes. An automorphic approach to the design of fault-tolerant multiprocessors. In Proc. of the 19th Inter. Symp. on Fault-Tolerant Computing, Jun. 1989.

[23] S. Dutt and J.P. Hayes. On designing and reconfiguring k-fault-tolerant tree architectures. IEEE Trans. on Computers, C-39(4), Apr. 1990.

[24] A.-H. Esfahanian. Generalized measures of fault tolerance to n-cube networks. IEEE Trans. on Computers, 38:1586-1591, Nov. 1989.

[25] J.B. Fortes and C.S. Raghavendra. Gracefully degradable processor arrays. IEEE Trans. on Computers, C-34, Nov. 1985.

[26] M.R. Garey and D.S. Johnson.
Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman And Company, New York, 1979. [27] G. Gati. Further annotated bibliography on the isomorphism disease. Journal of Graph Theory, 3:95-109, 1979. I | [28] J.M . Gordon and Q.F. Stout. Hypercube message routing in the presence of i faults. In Proc. 3rd Conf. Hypercube Concurrent Computing and Applications, pages 318-327, Jan. 1988. [29] A. Gottlieb. The NYU Ultracomputer - designing an MIMD shared memory ! parallel computer. IEEE Trans, on Computers, pages 175-189, Feb. 1983. I | [30] R.L. Graham and H.O. Pollack. On the addressing problem for loop switching. I Bell System Tech. Journal, 50(8):2459-2519, Oct. 1971. 1 1 0 [31] D. S. Greenberg, L.S. Heath, and A.L. Roenberg. Optimal embeddings of butterfly-like graphs in the hyper cube. Technical report, Computer and In formation Science Department, University of Massachusetts, 1988. COINS Technical Report 88-103. [32] F. Harary, J.P. Hayes, and H.-J. Wu. Survey of the theory of hypercube graphs. Computers and Mathematics with Applications, 15:277-289, 1988. [33] J. Hastad, T. Leighton, and M. Newman. Fast computation using faulty hypercubes. In Proc. 21th ACM Symp. Theory of Comp., pages 251-263, 1989. [34] W.D. Hillis. The Connection Machine. MIT Press, New Haven, CT, 1985. [35] K.-H. Huang and J. A. Abraham. Algorithm based fault tolerance for m atrix operations. IEEE Trans, on Computers, C-33:518-528, Jun. 1984. [36] Proc. of 5th Distributed Memory Computing Conf., 1989. [37] Proc. of fth Distributed Memory Computing Conf., 1990. [38] iPSC: The first family of concurrent supercomputers, 1985. Intel. [39] B.W. Johnson. Design and Analysis of Fault-Tolerant Digital Systems. Addison-Wesley Publishing Company, Reading, Massachusetts, 1989. [40] S.L. Johnsson. Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing, 4:133-172, 1987. [41] S.L. Johnsson and C.-T. 
Ho. Optimal broadcasting and personalized com munication in hypercubes. IEEE Trans, on Computers, 38:1249-1268, Sept. 1989. i [42] D. Kleitman and J. Spencer. Families of ^-independent sets. Discrete Math., ! (6):255-262, 1973. I l l [43] M.S. Krishnamoorthy and B. Krishnamurthy. Fault diameter of interconnec tion networks. Computers and Mathematics with Applications, 13(5/6):577- 582, 1987. [44] C.L. Kwan and S. Toida. An optimal 2-fault tolerant realization of symmetric hierarchical tree systems. Networks, pages 231-239, 1982. [45] T.-C. Lee. Quick recovery of embedded structures in hypercube computers. In Proc. 5th Distributed Memory Computing Conf., 1990. [46] T.-C. Lee and J.P. Hayes. One-step-degradable fault tolerant hypercube. In Proc. 4th Distributed Memory Compting Conf., 1989. [47] L. Levitin and M. Karpovsky. Efficient exhaustive tests based on mds codes. Technical report, Boston University, 1986. [48] W.-M. Lin and V.K.P. Kumar. Efficient histogramming on hypercube SIMD machines. Computer Vision, Graphics, and Image Processing, 49:104-120, 1990. [49] C.S. Raghavendra M.A. Sridhar. On finding maximum subcubes in residual hypercubes. In Proc. IEEE Symp. Parallel and Distributed Processing, Dec. 1990. [50] S. Madhavapeddy and I.H. Sudborough. Deterministic message routing in faulty hypercubes. In Special edition for Proc. of the 16th Inter. Workshop on Graph-Theoretic Concepts in Computer Science, Springer Verlag Lecture notes on Computer Science, 1990. [51] O.A. Mcbryan and E.F. De Velde. Hypercube algorithms and implementa tions. SIAM Journal Sci. Stat. Comput., 8(2) :227— 287, Mar. 1987. [52] D. Nassimi and S. Sahni. Bitonic sort on a mesh-connected parallel computer. ; IEEE Trans, on Computers, C-28(l):2-7, Jan. 1979. ! i [53] D. Nassimi and S. Sahni. An optimal routing algorithm for mesh-connected parallel computers. Journal of the Assoc, for Computing Machinery, 27(1), Jan. 1980. [54] D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. 
IEEE 1 Trans, on Computers, 30, Feb. 1981. [55] NCUBE/ten: An overview, Nov. 1985. NCUBE Corp. [56] R. Negrini, M.G. Sami, and R. Stefanelli. Fault Tolerance Through Reconfig uration in VLSI and WSI Arrays. The MIT Press, Cambridge, MA, 1989. [57] A.V. Oppenheim and R.W. Schafer. Digital Signal Processing. Prentice-Hall, Englewood Cliffs, N.J., 1975. [58] G.F. Pfister, W.C. Brantley, D.A. George, S.L. Harvey, W .J. Kleinfelder, K.P. McAuliffe, E.A. Melton, V.A. Norton, and J. Weiss. The IBM research parallel processor prototype(RP3): Introduction and architecture. In Proc. of the Inter. Conf. on Parallel Processing, pages 20-23, Aug. 1985. [59] F. J. Provost and R. Melhem. Distributed fault tolerant embedding of binary trees and rings in hypercube. In Proc. Inter. Workshop on Defect and Fault j Tolerance in VLSI Systems, pages 8.3.1-8.3.8, Oct. 1988. [60] M.J. Quinn. Designing Efficient Algorithms for Parallel Computers. McGraw- Hill Series in Supercomputing and Artificial Intelligence. McGraw Hill, New York, N.Y., 1987. i [61] C.S. Raghavendra, A. Avizienis, and M.D. Ercegovac. Fault tolerance in binaxy tree architectures. IEEE Trans, on Computers, C-33(6), Jun. 1984. [62] P. Ram anathan and K.G. Shin. Reliable broadcast in hypercube multicom puters. IEEE Trans, on Computers, C-37:1654-1656, Dec. 1988. j [63] R.C. Read and D.G. Corneil. The graph isomorphism disease. Journal of Graph Theory, 1:339-363, 1977. i [64] A.L.N. Reddy and P. Banerjee. Algorithm-based fault detection techniques in signal processing applications. IEEE Trans, on Computers, C-39(10):1304- 1308, Oct. 1990. < 113 J I [65] E.M. Reingold, J. Nievergekt, and N. Deo. Combinatorial Algorithm. Prentice Hall, 1977. [66] D.A. Rennels. On implementing fault tolerance in binary hypercube. In Proc. of 16th Inter. Symp. on Fault-Tolerant Computing, pages 344-349, Jul. 1986. [67] Y. Saad and M. H. Schultz. Topological properties of hypercubes. IEEE Trans, on Computers, 37(7):867-872, Jul. 1988. [68] C.L. 
Seitz. The cosmic cube. Comm, of Assoc, for Comput. Machinery, 28:22-33, Jan. 1985. [69] S.-B. Tien and C.S. Raghavendra. Algorithms and bounds for shortest path and diameter problem in faulty hypercubes. In Proc. the 28th Allerton Conf. on Comm. Control, and Comput. UIUC, Oct. 1990. [70] S.-B. Tien, C.S. Raghavendra, and M.A. Sridhar. Reconfiguring embedded task graphs in faulty hypercubes by automorphisms. In Proc. 23rd Annual Hawaii Inter. Conf. on System Sciences, Jan. 1990. [71] P.-S. Tseng. A Systolic Array Parallelizing Compiler. The Kluwer Interna tional Series in Engineering and Computer Science. Kluwer Academic Pub lishers, Boston, 1990. [72] J.D. Ullman. Computational Aspect of VLSI. Computer Science Press, Rockville, Maryland 20850, 1984. I [73] L.G. Valiant and J. Brebner. Universal schemes for parallel communication. In Proc. ACM Symp. Theory of Computing, 1981. [74] A. Y. Wu. Embedding of tree networks into hypercubes. Journal of Parallel and Distributed Computing, 2:238-249, 1985. [75] C.-L Wu and T.-Y Feng. Tutorial: Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington D.C., 1984. 114 | [76] P.-J. Yang, S.-B. Tien, and C.S. Raghavendra. Reconfiguration of rings and meshes in faulty hypercubes. Technical report, Department of Electrical Engi neering - Systems, University of Southern California, Los Angeles, California, 1990. [77] H.P. Yap. Some Topics in Graph Theory. London Math. Soc. Lee. Notes Series. Cambridge Univ. Press, 1986. I 115