Pseudo-Exhaustive Built-In Self-Test System For Logic Circuits
(USC Thesis Other)
PSEUDO-EXHAUSTIVE BUILT-IN SELF-TEST SYSTEM FOR LOGIC CIRCUITS

by

Rajagopalan Srinivasan

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Engineering)

December 1994

Copyright 1994 Rajagopalan Srinivasan

UMI Number: 9601066
UMI Microform 9601066
Copyright 1995, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized copying under Title 17, United States Code.
UMI, 300 North Zeeb Road, Ann Arbor, MI 48103

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by RAJAGOPALAN SRINIVASAN under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies
Date: October 13, 1994

DISSERTATION COMMITTEE

Dedication

To my parents
Smt. Vasantha Srinivasan & Sri. V. Srinivasan

Acknowledgements

I am grateful to my advisor Prof. Melvin Breuer for his inspiration, guidance and encouragement during my dissertation work. His invaluable constructive criticism has enhanced the quality and presentation of my research over the years. I consider it my privilege to have worked with him. I express my deep sense of appreciation to Prof. Sandeep Gupta for his enthusiasm, guidance and support towards making my doctoral work a rewarding experience. I wish to thank Prof. Peter Baxendale for serving on the guidance and dissertation committees. I would also like to thank Prof. Alice Parker and Prof. Cauligi Raghavendra for being part of the guidance committee.

During my years at USC I benefited greatly from interacting with many colleagues and friends. In particular I would like to mention Suresh Chalasani, Pravil Gupta, Rajesh Gupta, Rajiv Gupta, Rajeshwari Krishnan, Kuen-Jong Lee, Mody Lempel, Jung-Cheun Lien, Sen-Pin Lin, Amitava Majumdar, Debaditya Mukherjee, Sridhar Narayanan, Charles Njinda, Dhabaleshwar Panda and Ishwar Parulkar. All of them have made my stay at USC memorable.
I am also indebted to Donnalyn Combest, Lucille Stivers and Mary Zittercob of the Electrical Engineering Department at USC for their help and encouragement. I would like to acknowledge the financial support provided by the Advanced Research Projects Agency (ARPA) through Contract No. J-FBI-90092 (monitored by the Federal Bureau of Investigation). My sincere gratitude goes to my parents, my wife Sujatha and my sisters Radha and Geetha for their love, support and encouragement over the years, which have significantly contributed to the successful completion of my doctoral work.

Contents

Dedication
Acknowledgements
List of Figures
List of Tables
Abstract
1 Introduction
  1.1 Design for Testability
  1.2 Built-in Self-Test
  1.3 Pseudo-Exhaustive Test Strategy
    1.3.1 Partitioning
    1.3.2 Test Pattern Generation
    1.3.3 Bounds on test lengths
  1.4 Thesis Organization
2 Background
  2.1 Partitioning Strategies
    2.1.1 Constrained Strategies
    2.1.2 Unconstrained Strategies
  2.2 Test Pattern Generators
    2.2.1 Universal TPGs
    2.2.2 Circuit-specific TPGs
  2.3 Bounds on Test Lengths
3 Partitioning for Cone Size Reduction
  3.1 Introduction
    3.1.1 Circuit Model
  3.2 Problem Statement
    3.2.1 Edge Partitioning Problem
    3.2.2 Node Partitioning Problem
  3.3 Problem Formulation
    3.3.1 Notation
    3.3.2 Basic Principle
    3.3.3 Edge Partitioning Problem
      3.3.3.1 Linearization
    3.3.4 Node Partitioning Problem
    3.3.5 Problem Complexity
  3.4 Test Mode Configuration
  3.5 Partitioning Special Circuits
    3.5.1 Definitions
    3.5.2 Pruning Search Space
    3.5.3 Fanout-free Circuits
    3.5.4 Non-Reconvergent Fanout Circuits
    3.5.5 Reconvergent Fanout Circuits
    3.5.6 Iterative Logic Arrays
      3.5.6.1 One-dimensional Arrays
      3.5.6.2 Two-dimensional Arrays
  3.6 Partitioning Large Circuits
    3.6.1 Heuristic Measure
    3.6.2 Heuristic Procedure
      3.6.2.1 Cone Size Reduction
      3.6.2.2 Candidates
      3.6.2.3 Segmentation Cells
    3.6.3 Experimental Results
  3.7 Partitioning Sequential Circuits
  3.8 Summary
4 Partitioning for Maximal Test Concurrency
  4.1 Introduction
  4.2 Definitions and Notation
  4.3 Merging
  4.4 Heuristic Procedure
  4.5 Summary
5 Test Pattern Generation
  5.1 Introduction
  5.2 LFSR/SRs and LFSR/XORs
    5.2.1 Properties of LFSR/SRs
    5.2.2 Operations on LFSR/SRs
      5.2.2.1 Reconfigurable LFSR/SRs
      5.2.2.2 Permuted LFSR/SRs
      5.2.2.3 Sharing LFSR/SRs
  5.3 Convolved LFSR/SRs
  5.4 Multiple LFSR/SRs
  5.5 Design Procedure
    5.5.1 Determination of Residues
    5.5.2 Determination of Seeds
    5.5.3 Determination of XOR network
  5.6 Experimental Results
  5.7 Summary
6 Bounds on Test Lengths
  6.1 Introduction
  6.2 Algebraic Results
  6.3 Cone Independent Bounds
  6.4 Cone Dependent Bounds
    6.4.1 LFSR/XORs
      6.4.1.1 Improvement on Bounds by Input Permutation
    6.4.2 LFSR/SRs
      6.4.2.1 Improvement on Bounds by Input Permutation
  6.5 Summary
7 Pseudo-Exhaustive Test System
  7.1 Introduction
  7.2 CRETE System
  7.3 PET Flowchart
  7.4 Research Contributions
    7.4.1 Partitioning
    7.4.2 Test Pattern Generation
    7.4.3 Bounds on Test Lengths
  7.5 Future Extensions
    7.5.1 Partitioning
    7.5.2 Test Pattern Generation
    7.5.3 Bounds on Test Lengths

List of Figures

1.1 Segmentation cell behavior during (a) normal mode (b) test mode
2.1 Partitioning strategies (a) Unpartitioned circuit (b) Constrained partitioning strategy (c) Unconstrained partitioning strategy
2.2 Hardware partitioning (a) normal mode (b) Testing C1 in test mode
2.3 Vertex cut approach
2.4 TPG structures (a) LFSR/SR (b) LFSR/XOR
3.1 A partitioned circuit in (a) normal mode (b) test mode
3.2 Circuit graph
3.3 EPP and NPP
3.4 (a) An example gate and (b) its graph model
3.5 A segmentation cell
3.6 Test mode configuration of a partitioned circuit
3.7 A portion of the graph model of a multi-level fanout-free circuit
3.8 Graph model of a two-level non-reconvergent fanout circuit
3.9 A portion of the graph model of a multi-level reconvergent fanout circuit
3.10 One-dimensional ILA
3.11 Partitioned one-dimensional ILA
3.12 Two-dimensional ILA
3.13 Partitioned two-dimensional ILA
3.14 A (16,16,16) two-dimensional ILA circuit
3.15 Logic array partitioned by (a) procedure ILA (b) greedy procedure for r = 8
3.16 Logic array partitioned by (a) procedure ILA (b) greedy procedure for r = 10
3.17 Comparison of best results (r = 20)
3.18 Segmentation cells required for various cone sizes
3.19 A balanced sequential (24,8,24) circuit (a) before partitioning and (b) after partitioning
4.1 Pseudo-exhaustive testing of a non-MTC circuit (a) in a single test session (b) in multiple test sessions (c) modified to an MTC circuit
4.2 Graph model of an (8,4,6) circuit
4.3 Graph model of the modified circuit
5.1 TPG Structures (a) LFSR/SR and (b) LFSR/XOR
5.2 A (6,5,3) circuit
5.3 TPG Structures (a) LFSR/SR and (b) LFSR/XOR
5.4 Reconfigurable LFSR/SR
5.5 Permuted LFSR/SRs
5.6 Sharing LFSR/SRs
5.7 Convolved LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages
5.8 Convolved LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages
5.9 Multiple LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages
5.10 Multiple LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages
5.11 Characteristics of various TPGs
6.1 A vector space and its subspaces and cosets
7.1 PET environment
7.2 PET Flowchart
7.3 Pseudo-exhaustive testing spectrum of an (n,m,k) circuit

List of Tables

1.1 Comparison of design parameters between scan and BIST designs
3.1 Results on Benchmark Circuits (r = 20)
3.2 Results on Benchmark Circuits (r = 16)
3.3 Segmentation Cells Required for Various Cone Sizes
4.1 Dependency matrix
4.2 Relational matrix
4.3 Modified dependency matrix
5.1 Test Length for partitioned benchmark circuits
5.2 Convolved LFSR/SR designs for partitioned benchmark circuits
5.3 Single LFSR/SR designs for partitioned benchmark circuits
5.4 Comparison among circuit specific TPG designs
6.1 Bounds on test lengths for (n,m,2) circuit
6.2 Bounds on test lengths for (n,m,k) circuit
6.3 Upper bounds on pseudo-exhaustive test lengths for LFSR/XORs
6.4 Upper bounds on pseudo-exhaustive test lengths for LFSR/SRs

Abstract

Built-in self-test is a cost-effective approach for designing testable versions of logic circuits. It obviates external testing by providing on-chip hardware as test resources. The pseudo-exhaustive test strategy involves applying all possible input patterns to individual output cones of a circuit. The strategy ensures detection of all irredundant multiple stuck-at faults in the circuit and all irredundant combinational faults within individual cones. This thesis presents an integrated pseudo-exhaustive built-in self-test system applicable to circuits designed with random or structured logic styles. The system consists of a suite of partitioning schemes and test pattern generator designs for testing the circuits. Circuits with output cones driven by a large number of inputs need to be partitioned to reduce the overall test application time. Efficient partitioning schemes are presented to restrict the number of inputs driving the individual output cones. These schemes are based on graph-theoretical concepts and have polynomial computational complexity. Novel test pattern generators that employ knowledge of the circuit cone structures are designed for generating pseudo-exhaustive test sets. These designs are based on the theory of linear feedback shift registers and generate minimal test sets.
Tight upper bounds on pseudo-exhaustive test lengths are derived based on new algebraic results. The bounds are used to characterize various classes of circuits and provide good estimates of test length. Experimental results on various benchmark circuits validate the efficiency and quality of the system. A coordinated approach to partitioning and test pattern generation provides a spectrum of testable design solutions that trade off design costs such as hardware overhead and test length.

Chapter 1 Introduction

The ever-increasing complexity of VLSI circuits, in terms of gate counts on chips, chip counts on printed circuit boards (PCBs) and die counts on multi-chip modules (MCMs), makes the external testing of chips, PCBs and MCMs increasingly difficult. Our main objective is to develop an integrated built-in self-test system for designing self-testable logic circuits. We consider a testing strategy, namely pseudo-exhaustive testing, that ensures comprehensive fault coverage, leading to highly reliable circuits. In this chapter, we introduce the concepts of design for testability (DFT) and built-in self-test (BIST) and then present the issues addressed in this research.

1.1 Design for Testability

Logic circuits typically contain tens of thousands of gates and exhibit sequential behavior due to the presence of storage elements, which can be either latches or flip-flops. Deterministic test generation is computationally intensive, especially for sequential circuits. The ability to control an internal signal value from the primary inputs is referred to as controllability. Similarly, the ability to observe an internal signal value at the primary outputs by controlling the necessary inputs is referred to as observability. DFT techniques [2] improve the controllability and observability of circuits and thus reduce the test generation effort for these circuits. Examples of DFT techniques include full scan, partial scan, boundary scan and BIST.
Full scan techniques modify all storage elements in a circuit to have shift capabilities, while partial scan techniques modify only some of these elements. The modified storage elements are configured to form one or more scan paths during the test mode and are serially accessed from the circuit I/Os. During test generation, the modified storage elements can be regarded as circuit I/Os. Test patterns for full scan and partial scan circuits can be generated by an automatic test pattern generator (ATPG) [14]. Automatic test equipment (ATE) [30] is required to store the test patterns and apply them to the circuit. The circuit is tested by applying stimuli from the inputs and scan paths and inspecting responses at the outputs and scan paths.

Due to the high packaging density of chips at the board level, probing of I/O pins by ATE is becoming increasingly difficult. To alleviate this problem, chips are built with boundary scan [26] capabilities. All I/O pins of individual chips are connected serially in a boundary scan path, through which the I/O pins and the internal scan paths of the chips can be serially accessed. Boundary scan is used for testing the internal circuitry of the chips and the interconnections between them.

Though scan techniques reduce the test generation effort for sequential circuits, the following costs are associated with them.

• Area Overhead: Modifying the storage elements increases the gate count, and configuring them into scan paths increases the routing overhead. Each scan path requires two I/O pins for scan operations. The overhead is proportional to the number of storage elements enhanced to have scan capability.
• Test Generation: Deterministic tests are tailored to detect target faults and are generated by an ATPG system. The system includes test generation and fault simulation routines that are computationally intensive.
• ATE: Test stimuli and correct responses for the scan paths and the I/Os are stored by ATE. The amount of storage depends on the characteristics of the circuit. For chips without boundary scan, the I/O pins are probed by ATE to apply the test patterns.
• Test Time: Test patterns are serially scanned in and out of the scan paths, so a large amount of testing time is spent in scan operations. The speed of testing is determined by the scan clock generated by ATE, which is usually much slower than the system clock.

1.2 Built-in Self-Test

BIST techniques [2] overcome some of the costs associated with scan designs at the expense of more area overhead. BIST designs have dedicated on-chip hardware as test resources. Test patterns are generated internally by test pattern generators (TPGs), and the results are compacted by signature analyzers (SAs). TPGs can be either counters or autonomous linear feedback shift registers (LFSRs), and SAs are usually multiple input signature registers (MISRs) based on LFSRs. During the test mode, some of the storage elements are modified to act as TPGs and/or SAs.

Parameter | Scan Designs | BIST Designs
--- | --- | ---
Area overhead | Storage elements are enhanced to have a shift mode. | Storage elements are enhanced to have shift, TPG and/or SA modes.
Test generation | ATPG is used to generate deterministic patterns for target faults. | On-chip TPGs (SAs) are used to generate (pseudo-)random patterns.
Test length | Usually low. | Usually high.
Fault coverage | Successive patterns improve fault coverage. | Successive patterns may not necessarily improve fault coverage.
Test storage | All test patterns are stored. | Only the seeds and signatures are stored.
Test application time | Several clock cycles are required to apply each pattern. | A test pattern is generated and applied on every clock cycle.
Test speed | Test clock is dictated by ATE and is usually slower than the system clock. | Tested with the system clock.
Test control | ATE is responsible. | An on-chip controller is used.

Table 1.1: Comparison of design parameters between scan and BIST designs.
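The autonomous LFSR mentioned above as a TPG can be illustrated with a short simulation. This is a minimal sketch, not a TPG design from this thesis: it assumes a Fibonacci-style (external XOR) register held as an n-bit integer that shifts left on each clock and feeds the XOR of its tapped stages back into stage 0. The function name `lfsr_sequence` and the tap positions are illustrative assumptions; taps (1, 2) for n = 3 and (2, 3) for n = 4 realize the primitive polynomials x^3 + x + 1 and x^4 + x + 1, so each register cycles through all 2^n - 1 nonzero states before repeating, which is the fixed cycle length of pseudo-random patterns.

```python
def lfsr_sequence(n, taps, seed):
    """Return the states of an n-stage autonomous LFSR, starting at `seed`,
    up to (but not including) the first return to `seed`.

    Assumed (illustrative) structure: the state is an n-bit integer that
    shifts left each clock, and the XOR of the bits at positions `taps`
    enters bit 0. Tap n - 1 must be present so that the state map is
    invertible and the sequence is guaranteed to return to the seed.
    """
    if n - 1 not in taps:
        raise ValueError("tap n - 1 is required for an invertible LFSR")
    mask = (1 << n) - 1
    start = seed & mask
    if start == 0:
        raise ValueError("the all-zero state never changes in an LFSR")
    states, state = [], start
    while True:
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == start:
            return states


# Taps (1, 2) realize x^3 + x + 1; taps (2, 3) realize x^4 + x + 1.
print(lfsr_sequence(3, taps=(1, 2), seed=0b001))   # [1, 2, 5, 3, 7, 6, 4]
print(len(lfsr_sequence(4, taps=(2, 3), seed=1)))  # 15
```

Because the all-zero state maps to itself, a maximal-length LFSR never produces the all-zero pattern; a TPG intended to apply all 2^n patterns needs some modification to include it.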
A comparison of the characteristics of scan and BIST designs is given in Table 1.1. Both scan and BIST techniques concentrate on testing the combinational blocks of a circuit. The modified storage elements are tested by scanning special test patterns. A good strategy for testing the combinational blocks ensures the testing of all unmodified storage elements in the circuit. The various types of testing addressed by BIST techniques can be summarized as follows.

• Exhaustive testing deals with applying all possible input patterns to a circuit. The test assumes no fault model and hence does not require simulation of faults in the circuit. Exhaustive testing ensures detection of all irredundant combinational faults in the circuit. A combinational fault does not introduce sequential behavior into the circuit and is testable with a single input pattern. The test time increases exponentially with the number of inputs to the circuit.
• Pseudo-exhaustive testing considers the circuit as a collection of output cones and applies all possible input patterns to individual output cones. For an output, the subcircuit comprising the circuit inputs, logic gates and interconnections that feed the output is referred to as the output cone. Subcircuits common to several output cones are tested several times during the exhaustive testing of individual output cones. Pseudo-exhaustive testing ensures detection of all irredundant combinational faults within individual output cones and all irredundant multiple stuck-at faults in the circuit. The test time increases exponentially with the number of inputs feeding the output cone with the largest number of inputs.
• Pseudo-random testing deals with applying an incomplete set of unique patterns generated by an autonomous LFSR. The patterns have pseudo-random characteristics but are deterministic and repeat after a fixed cycle length. Fault coverage is determined either by simulation of modeled faults or by probabilistic estimators. Patterns can be biased using weighting hardware to improve the fault coverage.
• Random testing deals with random patterns generated by sources such as SAs. The patterns have random characteristics; they may repeat, but not with a fixed cycle length. The effectiveness of the patterns is measured by fault simulation. The strategy requires the least additional hardware for test pattern generation.

Most BIST systems assume that the sequential circuit to be made self-testable consists of combinational logic blocks separated by registers. However, designers tend to design circuits with a functional hierarchy that may not be ideally suited to such systems. For the BIST system being built at USC [25], the input circuits are hierarchically reorganized and canonically partitioned into combinational blocks and registers [13, 31]. Our BIST system concentrates on testing the combinational blocks of the circuit and deals with the aforementioned testing strategies. The system explores the design space in terms of various design constraints, viz. area overhead, test time and fault coverage. Although several systems have been developed in recent years, our system addresses open issues such as achieving optimality and providing a spectrum of solutions in the design space. This research focuses on the pseudo-exhaustive testing strategy for logic circuits. Some issues related to our BIST system that are not addressed in this research are listed below.

• various testable design methodologies (TDMs) [1] based on the built-in logic block observation (BILBO) [23] architecture.
• test resource allocation for various combinational blocks and test scheduling for various test sessions.
• test controller synthesis implementing the test schedules.
The research issues related to TDMs, test resource allocation and test scheduling are addressed in [25], and the problems related to test controller synthesis are addressed in [29].

1.3 Pseudo-Exhaustive Test Strategy

Our goal is to design self-testable logic circuits that ensure comprehensive fault coverage under the combinational fault model. Exhaustive testing ensures comprehensive fault coverage but may not be practical, since the number of test patterns increases exponentially with the number of inputs to the circuit. The pseudo-exhaustive test strategy is attractive since it can detect almost all combinational faults in the circuit with significantly fewer test patterns than exhaustive testing. Since each output cone is exhaustively tested, the only combinational faults that a pseudo-exhaustive test may miss are those that make an output dependent on additional inputs [2]. For example, a bridging fault between two inputs may not be detected if no output depends on both of those inputs. The pseudo-exhaustive test strategy obviates deterministic test pattern generation and fault simulation of stuck-at faults.

Consider a combinational circuit with n inputs and m outputs. The number of inputs feeding an output cone is referred to as the size of the output cone. Let the m output cones of the circuit have sizes less than or equal to k, and assume that at least one output cone has size exactly k. The value k is referred to as the maximum cone size of the circuit, and the circuit can be characterized as an (n, m, k) circuit. Exhaustive testing of the circuit requires 2^n patterns, which may be prohibitive for large values of n. The pseudo-exhaustive test strategy considers the circuit as a collection of m output cones and exhaustively tests each cone. The test time is bounded below by 2^k, since at least one output is driven by exactly k inputs.
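To make the (n, m, k) characterization concrete, the sketch below computes each output cone as the set of primary inputs reachable backward from that output. It is an illustration under stated assumptions, not code from this thesis: the netlist format (a dictionary mapping each gate or output to its fan-in signals, with any name absent from the dictionary treated as a primary input), the helper names `cone_inputs` and `characterize`, and the example circuit, in which output y_i depends on inputs x_i, x_{i+1} and x_{i+2}, are all hypothetical.

```python
def cone_inputs(netlist, signal, memo):
    """Return the primary inputs in the cone of `signal` (memoized).

    Hypothetical netlist format: dict mapping a gate/output name to the
    list of signals driving it; names not in the dict are primary inputs.
    """
    if signal not in netlist:  # not driven by any gate: a primary input
        return frozenset([signal])
    if signal not in memo:
        memo[signal] = frozenset().union(
            *(cone_inputs(netlist, f, memo) for f in netlist[signal]))
    return memo[signal]


def characterize(netlist, inputs, outputs):
    """Return ((n, m, k), cones), where k is the maximum cone size."""
    memo = {}
    cones = {o: cone_inputs(netlist, o, memo) for o in outputs}
    k = max(len(c) for c in cones.values())
    return (len(inputs), len(outputs), k), cones


# Hypothetical 8-input circuit: g_i = f(x_i, x_{i+1}), y_i = f(g_i, x_{i+2}).
netlist = {}
for i in range(6):
    netlist[f"g{i}"] = [f"x{i}", f"x{i + 1}"]
    netlist[f"y{i}"] = [f"g{i}", f"x{i + 2}"]
inputs = [f"x{i}" for i in range(8)]
outputs = [f"y{i}" for i in range(6)]

(n, m, k), cones = characterize(netlist, inputs, outputs)
print((n, m, k))       # (8, 6, 3)
print(2 ** n, 2 ** k)  # 256 8
```

For this (8, 6, 3) circuit, exhaustive testing needs 2^8 = 256 patterns, while 2^3 = 8 is only a lower bound on the pseudo-exhaustive test length; whether that bound is reached depends on how the cones overlap.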
If k is sufficiently smaller than n, pseudo-exhaustive testing can take significantly less time than exhaustive testing.

Pseudo-exhaustive testing gives rise to three important research issues, namely partitioning, test pattern generation and bounds on test lengths. We briefly describe these issues below.

1.3.1 Partitioning

Pseudo-exhaustive testing may not be practical for circuits with output cones driven by an unacceptably large number of inputs. These circuits need to be partitioned so that the output cones are driven by an acceptable number of inputs. The partitioning is achieved by placing segmentation cells [16, 19] in the original circuit. The behavior of a segmentation cell is shown in Figure 1.1. The cell is transparent during the normal mode of operation and acts as a pseudo-input and a pseudo-output during the test mode. Thus the number of output cones in the circuit increases during the test mode due to the segmentation cells placed in the circuit, while the maximum cone size of the circuit can decrease substantially. The segmentation cells amount to hardware overhead and introduce delays during normal operation. Hence a minimum number of cells should be placed in the circuit, without affecting critical paths.

Figure 1.1: Segmentation cell behavior during (a) normal mode (b) test mode

Logic circuits can be classified into various categories, such as circuits with or without fanouts, two-level or multi-level circuits and iterative logic arrays (ILAs). Partitioning strategies can take advantage of the unique structural properties of these circuits. We have developed optimal partitioning strategies of polynomial complexity for both two-level and multi-level fanout-free circuits. Circuits with fanouts make the partitioning problem hard and require strategies of exponential complexity to obtain optimal solutions.
The regularity in ILAs makes it easy to determine upper bounds on the number of segmentation cells required to partition these circuits. We have formulated the partitioning problem for circuits with fanouts as a classical integer linear programming (ILP) problem. For small circuits, optimal solutions can be obtained using this formulation. However, the formulation may not be computationally viable for large circuits, since the number of constraints grows nonlinearly with the number of levels in the circuit. To handle large circuits, we have developed an efficient heuristic partitioning procedure of polynomial complexity. The heuristic is based on the graph-theoretical concept of articulation nodes [10].

Reducing the maximum cone size reduces the pseudo-exhaustive test time but adds hardware overhead. Thus there exist trade-offs between test time and hardware overhead in designing pseudo-exhaustive self-testable versions of circuits.

1.3.2 Test Pattern Generation

Any pseudo-exhaustive test set for an (n, m, k) circuit must contain an exhaustive set of patterns for each of the m output cones of the circuit. The size of the test set is between $2^k$ and $2^n$. Though any individual output cone can be exhaustively tested with at most $2^k$ patterns, it may not be possible to test all cones simultaneously with $2^k$ patterns. This is primarily due to conflicting input requirements of the output cones, and hence the circuit may require more than $2^k$ patterns. Generation of an optimal pseudo-exhaustive test set for an (n, m, k) circuit is a hard problem [5]. An optimal pseudo-exhaustive test set for an (n, m, k) circuit can be ensured by further partitioning the circuit with segmentation cells to resolve all the conflicting requirements of the output cones. The resolution ensures that all output cones can be exhaustively tested simultaneously with $2^k$ patterns.
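The conflicting input requirements can be seen on a tiny hypothetical (3, 3, 2) circuit in which every pair of its three inputs feeds some output. The Python sketch below is only an illustration, not the procedure of this thesis: it checks whether a pattern set is pseudo-exhaustive. With only $2^2 = 4$ counter patterns, any assignment that merely shares the two counter bits among three inputs gives two inputs of some cone identical values, so that cone is not exhaustively tested. Driving the third input with the XOR of the two bits removes the conflict; this corresponds to the linear sums of test signals used by the LFSR/XOR structures reviewed in Chapter 2, rather than to the partitioning remedy described above.

```python
from itertools import product


def covers_cone(patterns, cone):
    """True if the patterns, restricted to the cone's inputs, contain all
    2^|cone| binary combinations (i.e. the cone is exhaustively tested)."""
    seen = {tuple(p[i] for i in cone) for p in patterns}
    return len(seen) == 2 ** len(cone)


def is_pseudo_exhaustive(patterns, cones):
    """True if every output cone is exhaustively tested by the patterns."""
    return all(covers_cone(patterns, cone) for cone in cones)


# Hypothetical (3, 3, 2) circuit: each output depends on a pair of inputs.
cones = [(0, 1), (1, 2), (0, 2)]
counter = list(product((0, 1), repeat=2))  # the 2^k = 4 states of a 2-bit counter

# Inputs 0 and 2 share counter bit s1: cone (0, 2) only ever sees 00 and 11.
shared = [(s1, s2, s1) for s1, s2 in counter]
# Input 2 receives s1 XOR s2, which is linearly independent of s1 and of s2.
xored = [(s1, s2, s1 ^ s2) for s1, s2 in counter]

print(is_pseudo_exhaustive(shared, cones))  # False
print(is_pseudo_exhaustive(xored, cones))   # True
```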
An (n, m, k) circuit that requires only $2^k$ patterns for pseudo-exhaustive testing is referred to as a maximal test concurrent (MTC) circuit [27]. Partitioning non-MTC circuits to achieve maximal test concurrency leads to simple TPG designs. We have developed a partitioning strategy based on the graph-theoretical concept of bridges [10]. TPGs based on either maximal length LFSRs or counters can generate optimal pseudo-exhaustive test sets for MTC circuits.

On the other hand, more complex TPGs can be designed to generate minimal pseudo-exhaustive test sets for non-MTC circuits, which then need not be partitioned to achieve maximal test concurrency. We have designed novel TPGs that utilize minimal hardware and generate minimal pseudo-exhaustive test sets for these circuits. Our TPGs generate minimal test sets by employing knowledge of the sets of circuit inputs driving each output cone of the circuit. Maximal length LFSRs form the basic underlying structure of our TPG designs.

1.3.3 Bounds on Test Lengths

We have derived tight upper bounds on the sizes of pseudo-exhaustive test sets for circuits. New algebraic results on vector spaces are derived and used in the computation of these bounds. Generic bounds that are independent of circuit output cone structures are used to characterize various classes of circuits. Cone-specific bounds are also derived by utilizing structural information about the circuit output cones. We have also developed an efficient method to permute circuit inputs in order to obtain the best improvement on the cone-specific bounds. The quality of these bounds is demonstrated by comparison with the existing bounds [3, 4]. Our bounds provide good estimates of test lengths and can be used as guiding factors in designing TPGs for circuits. The computed theoretical bounds agree well with the sizes of the test sets generated by our TPGs.

1.4 Thesis Organization

The thesis is organized as follows.
Chapter 2 presents the background by summarizing previous related work on partitioning strategies, TPG designs and test length estimation. The shortcomings and open issues in the related work provide the motivation for this research. Chapter 3 deals with partitioning strategies for reducing the sizes of the output cones in circuits. Various classes of circuits are considered, and their unique structural properties are utilized in developing efficient partitioning procedures. The chapter also provides details about the generalized formulation of the partitioning problem and our heuristic approach for large circuits. Chapter 4 presents our partitioning approach to achieve maximal test concurrency in circuits. The theory behind our TPG designs and their characteristics are discussed in Chapter 5, which also provides details about the TPG design procedures and the hardware overhead they involve. Chapter 6 deals with the derivation of both generic and cone-specific bounds on pseudo-exhaustive test lengths, and provides details about the new algebraic results utilized in these derivations. The integration of the pseudo-exhaustive test system and the contributions of this research are presented in Chapter 7, which also contains concluding remarks and future extensions to this work.

Chapter 2
Background

In this chapter we present a detailed summary of previous work on pseudo-exhaustive testing related to partitioning strategies, test pattern generator designs and bounds on test lengths. This related work forms the background and provides the motivation for our research.

2.1 Partitioning Strategies

These strategies partition the circuit into smaller segments so that each segment is driven by an acceptable number of inputs. The strategies can be classified as either constrained or unconstrained.
Constrained strategies require the segments to be totally disjoint so that each segment can be tested independently. Unconstrained strategies do not impose any constraints on the segments, and hence the segments may need to be tested together.

Consider the combinational circuit C shown in Figure 2.1(a). The circuit is fed by register R1 and feeds register R2. Assume that there exists an output cone driven by an unacceptably large number of inputs. The constrained partitioning strategy constrains the segments to be totally disjoint, as shown in Figure 2.1(b). The original circuit C is partitioned into two disjoint segments C1 and C2 such that each output cone is driven by an acceptable number of inputs. Four segmentation cells s1 through s4 are used to create the two disjoint segments. The problem of finding an optimal solution for the constrained partitioning strategy has been shown to be NP-complete [19]. The constrained strategy usually results in a large number of segmentation cells and introduces delays on all critical paths in the circuit.

Figure 2.1: Partitioning strategies (a) Unpartitioned circuit (b) Constrained partitioning strategy (c) Unconstrained partitioning strategy

The unconstrained partitioning strategy does not constrain the segments to be disjoint, as shown in Figure 2.1(c). In this case, only two cells s1 and s2 may be required to achieve the goal, and the original circuit C is not partitioned into disjoint segments. An output (say O) may be driven by both a primary input (say θ) and a pseudo-input (say the one due to cell s2), as shown in the figure. The unconstrained strategy is a generalization of the constrained strategy and hence can yield better solutions. Critical paths can sometimes be left undisturbed in the unconstrained strategy, though in some cases extra cells may be required. It is shown in [6] that determining optimal solutions for the unconstrained strategy is also NP-complete.
Our research is focused on achieving good suboptimal solutions for the unconstrained partitioning strategy. We next present a brief description of previous work related to both constrained and unconstrained partitioning strategies.

2.1.1 Constrained Strategies

McCluskey and Bozorgui-Nesbat [28] partitioned a large circuit into segments with sufficiently few inputs so that exhaustive testing of each segment is feasible. To exhaustively test a segment, its inputs are made controllable from the primary inputs and its outputs are made observable at the primary outputs. This is achieved by two partitioning methods, namely sensitized partitioning and hardware partitioning.

In sensitized partitioning, segments are isolated by applying fixed patterns to appropriate primary inputs. Paths from the other primary inputs to the segment inputs, and paths from the segment outputs to the primary outputs, are sensitized. A circuit may have conflicting requirements on its primary inputs for sensitizing two or more paths concurrently. TPG designs are complicated, since a subset of their stages has to hold constant values to isolate a segment while the remaining stages have to generate an exhaustive set of patterns for testing the segment. Details about the formation of segments and the design of TPGs for sensitized partitioning can be found in [41] and [39], respectively.

Hardware partitioning is achieved by inserting multiplexers to gain access to the embedded inputs and outputs of the segments. Consider a circuit partitioned into two segments C1 and C2, as shown in Figure 2.2. Multiplexers M1 through M4 are used to gain access to the inputs and outputs of these segments. The settings of the multiplexers during normal operation are shown by the solid lines in Figure 2.2(a). Figure 2.2(b) shows the sensitized paths during testing of segment C1.

Figure 2.2: Hardware partitioning (a) normal mode (b) Testing C1 in test mode
This method deals with block-level partitioning rather than gate-level partitioning. Gate-level partitioning usually leads to lower hardware overhead than block-level partitioning. We shall concentrate on hardware partitioning at the gate level in this research.

Roberts and Lala [32] partitioned circuits based on the concept of contribution difference. The contributors to a gate are the primary inputs that feed the gate either directly or indirectly. The contribution limit is the maximum number of contributors allowed for any signal in the circuit. The contribution difference for a signal is the difference between its number of contributors and the contribution limit. Signals with the least contribution difference are selected as candidates for partitioning. The circuit is partitioned such that no signal exceeds the contribution limit. This method determines optimal solutions for fanout-free circuits. However, for circuits with reconvergent fanouts, it gives poor suboptimal solutions.

Jone and Papachristou [20] developed a coordinated approach to partitioning and exhaustive test pattern generation for circuits. A circuit is partitioned into disjoint segments, and TPGs are designed for the individual segments that can exhaustively exercise all segment outputs concurrently. The heuristic algorithm is based on the minimum vertex cut method. The circuit is modeled as a directed acyclic graph with the vertices and the edges representing the gates and the signals, respectively. A vertex cut is a set of vertices whose removal leaves the graph disconnected. Each vertex cut is associated with a weight equal to the number of vertices in the cut. The circuit is partitioned into disjoint segments by selecting a set of vertex cuts and placing segmentation cells along these cuts. Since the weight of a vertex cut equals the number of segmentation cells required for the cut, the sum of the weights of the selected cuts is minimized.
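The contributor sets and contribution differences used by Roberts and Lala [32] can be computed with a short recursive traversal of the circuit. The Python sketch below is a hypothetical illustration: the netlist encoding, function names and example circuit are assumptions, and it only computes the quantities defined above and flags signals that exceed the contribution limit; it does not reproduce the candidate-selection and partitioning steps of [32].

```python
def contributors(fanin, inputs):
    """Map every signal to the primary inputs feeding it directly or indirectly."""
    memo = {}

    def visit(sig):
        if sig not in memo:
            if sig in inputs:
                memo[sig] = frozenset([sig])
            else:
                memo[sig] = frozenset().union(*(visit(p) for p in fanin[sig]))
        return memo[sig]

    for sig in list(fanin) + list(inputs):
        visit(sig)
    return memo


def contribution_differences(fanin, inputs, limit):
    """Number of contributors minus the contribution limit, for every signal."""
    return {s: len(c) - limit for s, c in contributors(fanin, inputs).items()}


# Hypothetical netlist (gate -> fanin signals) with a contribution limit of 3.
fanin = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"], "g4": ["g2", "e"]}
inputs = {"a", "b", "c", "d", "e"}
diffs = contribution_differences(fanin, inputs, limit=3)
violators = sorted(s for s, d in diffs.items() if d > 0)
print(diffs["g3"], diffs["g4"], violators)  # 1 0 ['g3']
```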
Figure 2.3: Vertex cut approach

The problem of optimally partitioning a circuit into disjoint segments is shown to be NP-complete [19]. The heuristic method for determining a suboptimal solution for a circuit C can be briefly described as follows.
1. Determine the vertex cut (say $v_1$) farthest away from the inputs such that each gate in the segment (say $S_1$) formed between the inputs and $v_1$ is driven by an acceptable number of inputs.
2. Determine the minimum vertex cut (say $v_2$) for $S_1$ that is farthest away from the inputs. Let $S_2$ be the segment formed between the inputs and $v_2$. The segments $S_1$ and $S_2$ are shown in Figure 2.3(a) and (b), respectively.
3. Estimate the number of cells required for partitioning the segments $C - S_1$ and $C - S_2$ by repeatedly applying Step 1. Choose whichever of $S_1$ or $S_2$ (say $S_1$) results in the minimum estimated number of cells.
4. Reduce $S_1$ to $S_1'$ so that a TPG that generates an optimal test set can be designed for $S_1'$.
5. Repeat all steps for $C - S_1'$.

This method has the following drawbacks.
1. The restriction that segments be disjoint usually requires more cells.
2. Since the segments are separated by cells, all critical paths encounter delays due to these cells during normal operation.
3. Though the minimum vertex cut results in a minimal number of cells, segment reduction for designing TPGs results in a non-minimal number of cells.

2.1.2 Unconstrained Strategies

Bhatt et al. [6] partitioned circuits using segmentation cells satisfying the following two objectives.
1. All outputs should depend on a restricted number of inputs.
2. All paths from inputs to outputs should encounter the same number of segmentation cells.

Segmentation cells are connected in a scan path during the test mode. The second objective ensures that all paths from inputs to outputs encounter equal register delays during normal operation.
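The second objective of Bhatt et al. [6] can be checked by propagating, in topological order, the set of cell counts seen along input-to-output paths. The Python sketch below performs only this check and is not their partitioning algorithm; the netlist encoding, the function names and the convention that a cell sits on a node's output (including on an input node) are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def cell_counts(fanin, cells):
    """For each node, the set of segmentation-cell counts found on paths from
    the primary inputs to the node's output. Cells are assumed to sit on node
    outputs (a modeling assumption of this sketch)."""
    counts = {}
    for node in TopologicalSorter(fanin).static_order():
        preds = fanin.get(node, [])
        base = set().union(*(counts[p] for p in preds)) if preds else {0}
        # A cell on this node's output adds one to every path through the node.
        counts[node] = {c + 1 for c in base} if node in cells else base
    return counts


def equal_cells_on_all_paths(fanin, cells, outputs):
    """True if every input-to-output path crosses the same number of cells."""
    counts = cell_counts(fanin, cells)
    return len(set().union(*(counts[o] for o in outputs))) == 1


# Hypothetical circuit: output o1 is fed by gate g1 (inputs a, b) and input c.
fanin = {"g1": ["a", "b"], "o1": ["g1", "c"]}
print(equal_cells_on_all_paths(fanin, {"g1"}, ["o1"]))       # False: paths see 1 or 0 cells
print(equal_cells_on_all_paths(fanin, {"g1", "c"}, ["o1"]))  # True: a cell on c balances them
```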
Only the following two restricted classes of circuits are considered for partitioning.
1. Fanin/fanout restricted circuits, in which the fanout of a gate is no greater than its fanin.
2. Leveled circuits, in which a gate at level i can only drive gates at level (i + 1).

Efficient algorithms are proposed for determining optimal partitions for these restricted classes of circuits. It is shown that the optimal partitioning problem satisfying the first objective is NP-complete.

Hellebrand and Wunderlich [16] partitioned circuits into exhaustively testable segments using segmentation cells. The problem is formulated as a combinatorial optimization problem. The state of the circuit is described by an N-bit binary vector $\langle v_1, v_2, \ldots, v_N \rangle$, where N is the number of gates in the circuit. The bit $v_i$ indicates whether a cell is placed on the output of gate i, and the weight of a vector denotes the number of cells placed in the circuit. A state is said to be admissible if the corresponding circuit configuration has all outputs driven by an acceptable number of inputs. Only a few of the $2^N$ possible states of the circuit are admissible. Among the admissible states, the one with minimum weight gives the optimal solution. A hill-climbing procedure with optional backtracking was proposed for obtaining suboptimal solutions. Exhaustive test patterns for individual segments are generated off-chip and applied through the scan path. Each cell has a bypass mode, so that signals passing through it encounter only a pass-transistor delay during the normal mode.

Udell [40] partitioned a circuit such that the maximal fanout-free regions are identified as initial segments. These segments are either merged together or partitioned into smaller segments, depending on the number of inputs driving them. Cost functions are used for the merging and partitioning operations. Finally, a postprocessing step is carried out to remove unnecessary segmentation cells from the circuit.

Kagaris et al. [21] proposed a two-pass heuristic partitioning method to reduce the maximum cone size of a circuit. In the first pass, segmentation cells are placed so that each output cone is driven by an acceptable number of inputs. In the second pass, a TPG is designed for the partitioned circuit; this pass may result in placing more segmentation cells. A probabilistic measure is used as guidance for placing the segmentation cells. This method has an objective similar to that of [20], but the segments are not required to be disjoint.

We have adopted the unconstrained strategy for partitioning a circuit driven by an unacceptable number of inputs. Critical paths are left undisturbed by our partitioning strategy. The partitioned segments are tested simultaneously in a single test session. The practicality and efficiency of our partitioning method are demonstrated on the ISCAS combinational benchmark circuits [7].

2.2 Test Pattern Generators

Autonomous LFSRs are widely used to generate pseudo-exhaustive test sets [2]. LFSRs are characterized by their feedback connections, which are represented as polynomials. For a non-zero initial state, the period of an LFSR is the number of states generated before the initial state repeats. An LFSR with n stages is said to be of maximal length if it has a period of $2^n - 1$ states. Maximal length LFSRs are based on primitive polynomials, and most pseudo-exhaustive TPGs employ maximal length LFSRs in their designs.

Pseudo-exhaustive test pattern generators for an (n, m, k) circuit can be classified as either universal or circuit-specific TPGs. Universal TPGs generate test sets containing binary n-tuples that cover all k-dimensional subspaces. They are applicable to all circuits with given values of n and k, irrespective of output cone dependencies. Circuit-specific TPGs generate test sets containing binary n-tuples that cover the specific k-dimensional subspaces corresponding to the m outputs of the target circuit.
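The notion of a maximal length LFSR can be checked by direct simulation. The Python sketch below is an illustration only: it assumes a Fibonacci-style recurrence chosen for simplicity (the text does not prescribe an LFSR form), and the function name and polynomial encoding are hypothetical. It counts the states an autonomous LFSR visits before its initial state repeats; a primitive characteristic polynomial gives the maximal period $2^n - 1$, while a polynomial that is irreducible but not primitive gives a shorter one.

```python
def lfsr_period(exponents, seed):
    """Period of an autonomous Fibonacci-style LFSR.

    exponents: powers of x with coefficient 1 in the characteristic polynomial,
               e.g. [3, 1, 0] for x^3 + x + 1 (the constant term must be 1).
    seed:      non-zero initial state (s_t, ..., s_{t+n-1}), n = degree.
    Recurrence assumed here: s_{t+n} = XOR of s_{t+i} over exponents i < n.
    """
    n = max(exponents)
    taps = [i for i in exponents if i < n]
    start = tuple(seed)
    if len(start) != n or not any(start) or 0 not in taps:
        raise ValueError("need a non-zero n-bit seed and a non-zero constant term")
    state, period = start, 0
    while True:
        feedback = 0
        for i in taps:
            feedback ^= state[i]
        state = state[1:] + (feedback,)  # shift and append the feedback bit
        period += 1
        if state == start:
            return period


print(lfsr_period([3, 1, 0], (1, 0, 0)))           # 7  = 2^3 - 1 (primitive)
print(lfsr_period([4, 1, 0], (1, 0, 0, 0)))        # 15 = 2^4 - 1 (primitive)
print(lfsr_period([4, 3, 2, 1, 0], (1, 0, 0, 0)))  # 5  (irreducible, not primitive)
```

Requiring a constant term of 1 makes the state update invertible, so the loop is guaranteed to return to the seed.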
2.2.1 Universal TPGs

Test sets containing binary n-tuples that cover all k-subspaces can be generated using any of the following coding theory techniques.

Constant weight codes: All codewords have identical weights. It is shown in [38] that a test set T covers all k-subspaces if it contains all binary n-tuples of weight w such that $w \equiv c \pmod{n-k+1}$ for some constant c, where $0 \le c \le n-k$. The set of all binary n-tuples of weight w forms the codewords of the w-out-of-n constant weight code. Constant weight counters are used to generate the codewords.

Linear codes: An (n, w) linear code contains a set of $2^w$ distinct codewords that are binary n-tuples satisfying the property that if $c_i$ and $c_j$ are codewords, then $c_i \oplus c_j$ is also a codeword. If a linear code generated with a generator polynomial g(x) has length n and minimum distance k + 1, then a standard LFSR with g(x) as its characteristic polynomial generates a test set of binary n-tuples that covers all k-subspaces [37]. Whenever g(x) is not primitive, several seeds are needed for the LFSR. Condensed LFSRs [42] are also based on linear codes. The smallest integer w satisfying $k \le \lceil w/(n-w+1) \rceil + \lfloor w/(n-w+1) \rfloor$ is determined. Let p(x) be a primitive polynomial of degree w and let $g(x) = 1 + x + x^2 + \cdots + x^{n-w}$. A modular LFSR with characteristic polynomial p(x)g(x) will generate a test set of binary n-tuples that covers all k-subspaces.

Cyclic codes: An (n, w) cyclic code contains a set of $2^w$ distinct codewords that are binary n-tuples satisfying the property that if $c_i$ is a codeword, then a cyclic right shift of $c_i$ is also a codeword. Cyclic codes form a subclass of linear codes. An (n, w) cyclic code can be generated with a generator polynomial g(x) of degree (n − w) that divides $(1 + x^n)$. Let $g(x)h(x) = 1 + x^n$, where h(x) is the parity-check polynomial of the (n, w) cyclic code.
Since h(x) also divides $(1 + x^n)$, it can be used as the generator polynomial of another cyclic code. The (n, n − w) cyclic code generated by h(x) is the dual code of the (n, w) cyclic code generated by g(x). If the cyclic code generated by g(x) has a minimum distance of at least (k + 1), then the dual code generated by h(x) will cover all k-subspaces. Let p(x) be a primitive polynomial of degree (n − w). A modular LFSR with h(x)p(x) as its characteristic polynomial will generate a test set of binary n-tuples that covers all k-subspaces [43].

Lempel and Cohn [24] constructed a test set of binary n-tuples that covers all k-subspaces by concatenating several maximum-length sequences generated by different primitive polynomials.

Universal TPGs designed for given values of n and k can exhaustively test an output cone that depends on any k out of the n inputs. These TPGs usually require a large amount of hardware and/or generate large test sets. For a specific (n, m, k) circuit with known cone dependencies, a test set much smaller than the universal test set usually exists.

2.2.2 Circuit-specific TPGs

TPGs can be tailored efficiently, both in terms of hardware and test length, by examining the input sets driving the output cones of the circuit. A test signal is the sequence of binary values applied at a circuit input. An output cone of size k can be exhaustively tested with k linearly independent test signals. However, it may not be possible to test all m output cones of the circuit concurrently with k signals, because of conflicting input requirements among the cones. At most n test signals (one per input) are required for the circuit. Two inputs are said to be related if there exists an output cone that depends on both of them; otherwise they are said to be unrelated. The number of test signals for a circuit can be reduced by sharing some of them among unrelated circuit inputs.
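Sharing test signals among unrelated inputs amounts to coloring a graph whose vertices are the inputs and whose edges join related inputs. The Python sketch below uses a greedy coloring heuristic as a hypothetical illustration; it does not guarantee the minimal number of signals sought by the procedures of [5] and [27], and the circuit, input names and helper names are assumptions. Each signal is taken to be one bit of a w-bit counter; because counter bits are linearly independent and related inputs always receive distinct bits, every output cone sees all of its input combinations.

```python
from itertools import combinations


def share_test_signals(inputs, cones):
    """Greedily give each input a test-signal index so that related inputs
    (inputs appearing together in some output cone) never share a signal."""
    related = {i: set() for i in inputs}
    for cone in cones:
        for a, b in combinations(cone, 2):
            related[a].add(b)
            related[b].add(a)
    signal = {}
    # Heuristic order: inputs with the most related partners first.
    for i in sorted(inputs, key=lambda x: -len(related[x])):
        used = {signal[j] for j in related[i] if j in signal}
        s = 0
        while s in used:
            s += 1
        signal[i] = s
    return signal


def counter_patterns(inputs, signal):
    """All 2^w states of a w-bit counter; counter bit signal[i] drives input i."""
    w = max(signal.values()) + 1
    return [tuple((t >> signal[i]) & 1 for i in inputs) for t in range(2 ** w)]


# Hypothetical (6, 3, 3) circuit: each output cone listed by its input names.
inputs = ["a", "b", "c", "d", "e", "f"]
cones = [("a", "b", "c"), ("c", "d"), ("d", "e", "f")]
signal = share_test_signals(inputs, cones)
patterns = counter_patterns(inputs, signal)
print(len(set(signal.values())), len(patterns))  # 3 8  (2^3 patterns instead of 2^6 = 64)
```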
Procedures for determining a minimal number of signals (say w, where $k \le w \le n$) are described in [5] and [27].

Circuit-specific TPG structures such as LFSR/SRs [4] and LFSR/XORs [3] can be designed for generating pseudo-exhaustive test sets. An LFSR/SR structure is composed of a maximal length LFSR and a shift register (SR). An LFSR/XOR structure is composed of a maximal length LFSR and an XOR network. The TPG structures are shown in Figure 2.4.

Figure 2.4: TPG structures (a) LFSR/SR (b) LFSR/XOR

LFSR/SR generates independent test signals from the maximal length LFSR portion and specific linear combinations of these signals from the SR portion. The linear combinations depend on the feedback polynomial used for the LFSR. An output cone is exhaustively tested if and only if the cone is driven by a set of linearly independent test signals [4]. Hence the inputs driving an output cone must be assigned linearly independent test signals for the cone to be exhaustively tested. LFSR/SR incurs minimal hardware overhead, as its stages form a shift register. However, due to the inflexibility of the linear combinations of test signals, LFSR/SR generates a large test set.

LFSR/XOR generates independent test signals from the maximal length LFSR and the desired linear combinations of these signals from an XOR network. The XOR network provides great flexibility in generating any linear combinations needed to ensure exhaustive testing of the output cones. LFSR/XOR incurs high hardware overhead due to the XOR network, but generates a minimal length test set.

Hellebrand et al. [17] grouped all m output cones of an (n, m, k) circuit into k-testable groups such that all cones in a k-testable group can be exhaustively tested concurrently with $2^k$ patterns. The problem of determining the minimum number of k-testable groups is shown to be NP-complete [17]. An upper bound on the number of groups is given by $\lceil m/2 \rceil$.
The test patterns are generated off-chip by a k-bit counter and scanned into the appropriate inputs. The method results in a long test time, since each pattern is applied through a scan path.

Chen [8] designed TPGs based on LFSR/XORs using a three-phase procedure. During the first phase, test signals are shared among unrelated inputs. The second phase assigns linear sums of two test signals to some inputs. The final phase assigns linear sums of multiple test signals to the remaining inputs. An XOR network is used to realize the linear sums of test signals. The three-phase procedure attempts to reduce the hardware overhead by minimizing the number of linear sums of test signals.

Kagaris and Tragoudas [22] designed TPGs based on LFSR/SRs by allowing permutation of inputs. The inputs are indexed, and the set of indices is determined for the set of inputs (say $I_j$) driving an output cone (say $O_j$). The difference (say $d_j$) between the maximum and minimum indices of the inputs in $I_j$ is then determined. The inputs are assigned a permutation of the indices such that the $d_j$ values for all output cones are minimized. If d is the maximum of the $d_j$ values over all output cones, then an LFSR/SR based on a primitive polynomial of degree d is sufficient to exhaustively test all the output cones of the circuit.

We have designed circuit-specific TPGs based on the principles of LFSR/SRs and LFSR/XORs. Our TPG designs utilize minimal hardware like LFSR/SRs and generate minimal test sets like LFSR/XORs.

2.3 Bounds on Test Lengths

Barzilai et al. [4] determined an upper bound on the degree of an LFSR/SR for pseudo-exhaustive testing of an (n, m, k) circuit as follows. Let the n inputs be denoted by $\{\theta_1, \theta_2, \ldots, \theta_n\}$. An LFSR/SR is said to be applicable if it can generate an exhaustive set of patterns for all output cones in the circuit. Let $w_1$ be the degree of the feedback polynomial P(x) used in the LFSR/SR structure.
Consider an output $O_j$ driven by the inputs $\{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}\}$. It is shown in [4] that P(x) generates an exhaustive set of patterns for $O_j$ if and only if no polynomial Q(x) of the form $\sum_{q=1}^{k} a_q x^{i_q}$ (where each $a_q \in \{0, 1\}$ and not all $a_q$ are zero) is divisible by P(x). There are $2^k - 1$ such polynomials Q(x), each of degree at most n. Each of these polynomials is divisible by no more than $n/w_1$ distinct primitive polynomials of degree $w_1$. Therefore an upper bound on the number of inapplicable primitive polynomials of degree $w_1$ for $O_j$ is given by $(n/w_1)(2^k - 1)$. Considering all m outputs, the total number of inapplicable polynomials of degree $w_1$ is bounded above by $n \times m \times (2^k - 1)/w_1$. The total number of primitive polynomials of degree $w_1$ is given by $\phi(2^{w_1} - 1)/w_1$, where φ is Euler's phi function [11]. To ensure an applicable polynomial of degree $w_1$, the total number of inapplicable polynomials of degree $w_1$ must be less than the total number of primitive polynomials of degree $w_1$. In other words, $n \times m \times (2^k - 1)/w_1$ must be smaller than $\phi(2^{w_1} - 1)/w_1$. This implies $n \times m \times (2^k - 1) < \phi(2^{w_1} - 1) \approx 2^{w_1}$. Therefore there exists an LFSR/SR of degree $w_1$ for an (n, m, k) circuit provided $w_1$ satisfies the relation $w_1 > \log_2(n \times m \times (2^k - 1))$.

Akers [3] determined an upper bound on the degree of an LFSR/XOR for pseudo-exhaustive testing of an (n, m, k) circuit as follows. Consider an output $O_j$ driven by the inputs $I = \{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}\}$. It has been shown in [3] that an LFSR/XOR structure can generate an exhaustive set of patterns for $O_j$ if and only if the inputs in I are assigned k linearly independent test signals. A set (say L) of $2^k$ unique test signals can be derived by taking all possible linear sums of these k linearly independent test signals. An input $\theta \notin I$ that drives output $O'_j$ is assigned a test signal not contained in L.
This arrangement ensures that the test signal assigned to θ is linearly independent of the test signals assigned to the subset of inputs in I that drive $O'_j$. The linearly independent test signals assigned to the inputs driving $O'_j$ lead to another set of at most $2^k$ unique test signals. For all m outputs, the total number of unique test signals that can possibly be generated is bounded above by $m \times 2^k$. An LFSR/XOR of degree $w_2$ can generate $2^{w_2}$ unique test signals. The LFSR/XOR can assign linearly independent test signals to all n inputs provided the relation $2^{w_2} > m \times 2^k$ is satisfied. Therefore there exists an LFSR/XOR of degree $w_2$ for an (n, m, k) circuit provided $w_2$ satisfies the relation $w_2 > k + \log_2 m$.

Chapter 3
Partitioning for Cone Size Reduction

3.1 Introduction

Pseudo-exhaustive testing ensures exhaustive testing of all the output cones in a circuit. A circuit with an output cone driven by an unacceptable number of inputs needs to be partitioned to reduce the test time associated with pseudo-exhaustive testing. Our goal is to place segmentation cells [16, 19] in a given circuit so that each cone is driven by an acceptable number of inputs during the test mode. This is referred to as partitioning for cone size reduction. It should be noted that partitioning a given circuit may not result in disjoint subcircuits. Segmentation cells add area overhead to the original circuit and introduce delays during the normal mode of operation. Hence it is desirable to place the minimum number of cells in the circuit without affecting the critical paths. We shall adopt a partitioning strategy that does not require the partitioned subcircuits to be disjoint.

For an (n, m, k) circuit, each output cone requires at most $2^k$ patterns for exhaustive testing, and the circuit requires at least $2^k$ patterns for pseudo-exhaustive testing. The value of k may be so large as to make pseudo-exhaustive testing infeasible.
The test time can be reduced by restricting the size of all the output cones to be below some desired value, say r, referred to as the cone size limit. The size of each output cone can be reduced by partitioning the circuit, i.e. inserting segmentation cells. In general it is desired to minimize the number, say s, of such cells. The cells are transparent during the normal mode of operation, and act as pseudo-inputs and pseudo-outputs during the test mode. W ith s segmentation cells, the original (n, m, k) circuit becomes an (n + s, m + s, r) circuit during the test mode. 23 □ Segm entation Ceil (a) 01 02 03 04 05 •" 06 > ° 3 J * 0 4 (b) Figure 3.1: A partitioned circuit in (a) normal mode (b) test mode 24 E x am p le 1 Consider the (6,2,6) circuit shown in Figure 3.1(a). The six inputs are denoted as 0 \ through 0 6 and the two outputs are denoted as 0 \ and 0 2, respectively. The circuit requires 26 patterns for both exhaustive and pseudo-exhaustive testing. Let us consider partitioning the circuit for a cone size limit of three inputs. The circuit is partitioned with two segmentation cells so that each output cone depends on at most three inputs. During the test mode the circuit becomes an (8,4,3) circuit as shown in Figure 3.1(b). The two pseudo-inputs are denoted as 0? and 08 and the two pseudo-outputs are denoted as O3 and O4, respectively. All four output cones are exhaustively tested concurrently with 23 patterns. □ 3.1.1 C ircuit M od el A combinational circuit is modeled as a directed acyclic graph. The nodes of the graph represent the inputs, gates and outputs of the circuit. The edges of the graph represent the fanout branches of the interconnection signals. The fanout stems of the signals are not represented explicitly in the model. The input nodes have only fanout edges and the output nodes have only fanin edges. The gate nodes have both fanin and fanout edges. An output cone of a circuit forms a subgraph and two output cones can overlap, i.e. 
share nodes and edges. Figure 3.2 shows the graph model of the (6,2,6) circuit in Figure 3.1(a). The input and output nodes are shown as squares and the gate nodes as circles. The nodes are labelled n1 through n14 and the edges e1 through e14. Input nodes n1 through n6 correspond to the circuit inputs θ1 through θ6, respectively. Output nodes n13 and n14 correspond to the circuit outputs O1 and O2, respectively.

3.2 Problem Statement

Two interesting problems arise depending on where the segmentation cells are allowed to be placed in a circuit.

- In the first case, cells may be placed on the signals (edges) of the circuit. Cells placed on fanout branches of the same signal can be merged into a single cell. This is referred to as the edge partitioning problem (EPP).
- In the second case, cells may be placed only on the gate outputs (nodes). This is referred to as the node partitioning problem (NPP).

The formal statements of the problems are given below.

Figure 3.2: Circuit graph

3.2.1 Edge Partitioning Problem

Given an (n, m, k) circuit with f being the maximum fanin of any gate and an integer r such that $f \le r < k$, find an assignment of the minimum number of segmentation cells to the signals (edges) such that each output cone depends on at most r inputs in the test mode.

3.2.2 Node Partitioning Problem

Given an (n, m, k) circuit with f being the maximum fanin of any gate and an integer r such that $f \le r < k$, find an assignment of the minimum number of segmentation cells to the gate outputs (nodes) such that each output cone depends on at most r inputs in the test mode.

Figure 3.3: EPP and NPP

Example 2 Consider the graph model of the (4,3,4) circuit shown in Figure 3.3. The output nodes n12, n13 and n14 depend on the input node sets {n1, n2, n3}, {n1, n2, n3, n4} and {n2, n3, n4}, respectively. Let r = 3.
For the EPP, each of the two best solutions requires a single cell, placed on either edge e1 or edge e2. For the NPP, each of the two best solutions requires two cells, placed on either (1) nodes n5 and n6, or (2) nodes n7 and n8.

The two cases can be explained as follows. Consider first the EPP. Only n13 depends on four input nodes, so a cell needs to be placed on either e1 or e2. Placing a cell on either of these edges does not affect the input dependencies of the other output nodes.

Consider next the NPP. To restrict the input dependency of n13, a cell needs to be placed on either n6 or n7. However, placing a cell on n6 increases the input dependency of n12. Similarly, placing a cell on n7 increases the input dependency of n14. Therefore placing a cell on n6 requires an additional cell on n5, and placing a cell on n7 requires an additional cell on n8. □

The complexity of the EPP depends on the number of signals, while that of the NPP depends on the number of gates in the circuit. Since there are more signals than gates, the EPP has a larger search space than the NPP. However, the EPP is a generalization of the NPP and can therefore yield better solutions, as Example 2 shows. The following lemmas characterize both EPP and NPP.

Lemma 1 For circuits without reconvergent fanouts, both EPP and NPP lead to equally good results.

Proof: For a circuit without reconvergent fanouts, the fanout branches of any signal always feed different output cones. Consider any signal in the circuit that has fanout branches. Placing a cell on one of the fanout branches benefits only the output cone fed by that branch. Placing the same cell on the fanout stem instead benefits all the output cones fed by that signal. Hence it is preferable to place the cell on the fanout stem rather than on the fanout branches.
Thus the solutions to both EPP and NPP lie in the same search space, and both lead to equally good results. □

Lemma 2 For circuits with reconvergent fanouts, EPP leads to better results than NPP.

Proof: Consider a circuit with reconvergent fanouts and an optimal solution for the NPP. This is also a solution for the EPP, since its cells are placed on the fanout stems of signals. Hence an optimal solution for the EPP has at most as many cells as an optimal solution for the NPP. Example 2 shows that it can have strictly fewer. Thus EPP leads to better results than NPP. □

Hence it is sufficient to consider the NPP for non-reconvergent fanout circuits. Any optimal solution for the NPP can be transformed into a corresponding optimal solution for the EPP, and vice versa. However, if the circuit has reconvergent fanouts, this two-way transformation is not always possible, as Example 2 shows.

3.3 Problem Formulation

We have formulated both EPP and NPP as classical integer linear programming (ILP) problems. The objective function to be minimized is the number of segmentation cells placed in the circuit. Let r be the desired cone size limit. The constraints require every gate in the circuit to depend on at most r inputs during the test mode. The formulation is illustrated with the circuit graph shown in Figure 3.2.

3.3.1 Notation

$\vee$ : logical OR operation
$\wedge$ : logical AND operation
$\cup$ : set union operation
$\theta'_i$ : pseudo-input created when a cell is placed on the signal (edge) $e_i$
$I_i$ : input dependency set for the signal (edge) $e_i$
$d_i$ : input dependency value for the signal (edge) $e_i$, where $d_i = |I_i|$
$x_i$ : 1 if a cell is placed on $e_i$, 0 otherwise
$I_i x_i$ : $I_i$ if $x_i = 1$, $\emptyset$ otherwise
$|I_i x_i|$ : $d_i$ if $x_i = 1$, 0 otherwise

3.3.2 Basic Principle

Consider the example gate and its graph model shown in Figure 3.4. The gate g is represented by the node n, and the signals s1 through s4 are represented by the edges e1 through e4, respectively.
The set of inputs feeding n through e1 is either $I_1$ or $\{\theta'_1\}$, depending on whether a segmentation cell is placed on e1. Thus the set of inputs feeding n through e1 can be expressed as $I_1\bar{x}_1 \cup \{\theta'_1\}x_1$. The expression is valid because the boolean variables $\bar{x}_1$ and $x_1$ are complementary. The number of inputs feeding n through e1 is $|I_1\bar{x}_1 \cup \{\theta'_1\}x_1| = d_1\bar{x}_1 + x_1$.

Similarly, the set of inputs feeding n through e2 can be expressed as $I_2\bar{x}_2 \cup \{\theta'_2\}x_2$. The number of inputs feeding n through e2 is $|I_2\bar{x}_2 \cup \{\theta'_2\}x_2| = d_2\bar{x}_2 + x_2$.

Figure 3.4: (a) An example gate and (b) its graph model

The sets of inputs $I_3$ and $I_4$ driving e3 and e4 are given by

$$
\begin{aligned}
I_3 = I_4 &= \{I_1\bar{x}_1 \cup \{\theta'_1\}x_1\} \cup \{I_2\bar{x}_2 \cup \{\theta'_2\}x_2\}\\
&= (I_1 \cup I_2)\bar{x}_1\bar{x}_2 \cup (\{\theta'_1\} \cup I_2)x_1\bar{x}_2 \cup (I_1 \cup \{\theta'_2\})\bar{x}_1x_2 \cup \{\theta'_1, \theta'_2\}x_1x_2
\end{aligned}
$$

and the input dependencies $d_3$ and $d_4$ of e3 and e4 are given by

$$
d_3 = d_4 = |I_1 \cup I_2|\,\bar{x}_1\bar{x}_2 + |\{\theta'_1\} \cup I_2|\,x_1\bar{x}_2 + |I_1 \cup \{\theta'_2\}|\,\bar{x}_1x_2 + 2x_1x_2
$$

In the above equation, if one of the product terms equals 1, the remaining terms are zero. In other words, the terms are mutually exclusive, so adding them is valid. Cells placed on different fanout branches of the same stem can be merged and replaced by a single cell. For example, the cells placed on the signals s3 and s4 can be merged into a single cell placed on the fanout stem of the gate.

3.3.3 Edge Partitioning Problem

Consider the circuit graph in Figure 3.2 and let r = 3. The objective is to minimize the function

$$
F = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + (x_9 \vee x_{10}) + (x_{11} \vee x_{12}) + x_{13} + x_{14} \qquad (3.1)
$$

subject to the constraints $d_i \le r$, $\forall i = 1, 2, \ldots, 14$. The objective function is non-linear for the following reason. Consider the fanouts of n9. If cells are placed on both e9 and e10, they can be merged into a single cell. Hence the non-linear term $(x_9 \vee x_{10})$ appears in the objective function. The term $(x_{11} \vee x_{12})$ arises in the same way for the fanouts of n10.
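Since the cells are merged, the pseudo-inputs created for the fanouts, $\theta'_9$ and $\theta'_{10}$, are the same. Cells on the fanouts of the primary inputs and on the fanins of the primary outputs do not occur in any optimal solution. Hence $x_1 = x_2 = \cdots = x_6 = 0$ and $x_{13} = x_{14} = 0$. Therefore the objective function in Equation 3.1 reduces to

$$
F = x_7 + x_8 + (x_9 \vee x_{10}) + (x_{11} \vee x_{12}) \qquad (3.2)
$$

3.3.3.1 Linearization

Each non-linear term in the objective function can be replaced by a linear term using the following transformation steps.

1. Replace each non-linear term by a new boolean variable. An OR term of the form $x_1 \vee x_2 \vee \cdots \vee x_n$ is replaced by a variable y, and an AND term $x_1 \wedge x_2 \wedge \cdots \wedge x_n$ is replaced by a variable z.
2. Add two constraints for each non-linear term. For the OR term, add the constraints $y \le x_1 + x_2 + \cdots + x_n \le ny$. For the AND term, add the constraints $nz \le x_1 + x_2 + \cdots + x_n \le z + n - 1$.

It can easily be verified that the new variables and constraints preserve the original objective function. Hence the non-linear objective function of Equation 3.2 can be replaced by the linear objective function

$$
F = x_7 + x_8 + y_1 + y_2 \qquad (3.3)
$$

along with four new constraints

$$
y_1 \le x_9 + x_{10} \le 2y_1, \qquad y_2 \le x_{11} + x_{12} \le 2y_2 \qquad (3.4)
$$

The input dependency sets $I_i$ and the input dependency values $d_i$, $\forall i = 1, 2, \ldots, 14$, are computed and listed below.

$$
\begin{aligned}
& I_1 = \{\theta_1\};\ I_2 = \{\theta_2\};\ I_3 = \{\theta_3\}; && d_1 = d_2 = d_3 = 1\\
& I_4 = \{\theta_4\};\ I_5 = \{\theta_5\};\ I_6 = \{\theta_6\}; && d_4 = d_5 = d_6 = 1\\
& I_7 = I_1 \cup I_2 = \{\theta_1, \theta_2\}; && d_7 = 2\\
& I_8 = I_4 \cup I_5 = \{\theta_4, \theta_5\}; && d_8 = 2
\end{aligned}
$$

$$
\begin{aligned}
I_9 = I_{10} &= \{I_7\bar{x}_7 \cup \{\theta'_7\}x_7\} \cup I_3 = \{\theta_1, \theta_2, \theta_3\}\bar{x}_7 \cup \{\theta_3, \theta'_7\}x_7; \qquad d_9 = d_{10} = 3\bar{x}_7 + 2x_7\\
I_{11} = I_{12} &= \{I_8\bar{x}_8 \cup \{\theta'_8\}x_8\} \cup I_6 = \{\theta_4, \theta_5, \theta_6\}\bar{x}_8 \cup \{\theta_6, \theta'_8\}x_8; \qquad d_{11} = d_{12} = 3\bar{x}_8 + 2x_8
\end{aligned}
$$

$$
\begin{aligned}
I_{13} &= \{I_9\bar{x}_9 \cup \{\theta'_9\}x_9\} \cup \{I_{11}\bar{x}_{11} \cup \{\theta'_{11}\}x_{11}\}\\
&= (I_9 \cup I_{11})\bar{x}_9\bar{x}_{11} \cup (\{\theta'_9\} \cup I_{11})x_9\bar{x}_{11} \cup (I_9 \cup \{\theta'_{11}\})\bar{x}_9x_{11} \cup \{\theta'_9, \theta'_{11}\}x_9x_{11}\\
&= \big(\{\theta_1, \ldots, \theta_6\}\bar{x}_7\bar{x}_8 \cup \{\theta_1, \theta_2, \theta_3, \theta_6, \theta'_8\}\bar{x}_7x_8 \cup \{\theta_3, \theta_4, \theta_5, \theta_6, \theta'_7\}x_7\bar{x}_8 \cup \{\theta_3, \theta_6, \theta'_7, \theta'_8\}x_7x_8\big)\bar{x}_9\bar{x}_{11}\\
&\quad \cup \big(\{\theta_4, \theta_5, \theta_6, \theta'_9\}\bar{x}_8 \cup \{\theta_6, \theta'_8, \theta'_9\}x_8\big)x_9\bar{x}_{11}\\
&\quad \cup \big(\{\theta_1, \theta_2, \theta_3, \theta'_{11}\}\bar{x}_7 \cup \{\theta_3, \theta'_7, \theta'_{11}\}x_7\big)\bar{x}_9x_{11}\\
&\quad \cup \{\theta'_9, \theta'_{11}\}x_9x_{11}\\
d_{13} &= (6\bar{x}_7\bar{x}_8 + 5\bar{x}_7x_8 + 5x_7\bar{x}_8 + 4x_7x_8)\bar{x}_9\bar{x}_{11} + (4\bar{x}_8 + 3x_8)x_9\bar{x}_{11} + (4\bar{x}_7 + 3x_7)\bar{x}_9x_{11} + 2x_9x_{11}
\end{aligned}
$$

$$
\begin{aligned}
I_{14} &= \{I_{10}\bar{x}_{10} \cup \{\theta'_{10}\}x_{10}\} \cup \{I_{12}\bar{x}_{12} \cup \{\theta'_{12}\}x_{12}\}\\
&= (I_{10} \cup I_{12})\bar{x}_{10}\bar{x}_{12} \cup (\{\theta'_{10}\} \cup I_{12})x_{10}\bar{x}_{12} \cup (I_{10} \cup \{\theta'_{12}\})\bar{x}_{10}x_{12} \cup \{\theta'_{10}, \theta'_{12}\}x_{10}x_{12}\\
&= \big(\{\theta_1, \ldots, \theta_6\}\bar{x}_7\bar{x}_8 \cup \{\theta_1, \theta_2, \theta_3, \theta_6, \theta'_8\}\bar{x}_7x_8 \cup \{\theta_3, \theta_4, \theta_5, \theta_6, \theta'_7\}x_7\bar{x}_8 \cup \{\theta_3, \theta_6, \theta'_7, \theta'_8\}x_7x_8\big)\bar{x}_{10}\bar{x}_{12}\\
&\quad \cup \big(\{\theta_4, \theta_5, \theta_6, \theta'_{10}\}\bar{x}_8 \cup \{\theta_6, \theta'_8, \theta'_{10}\}x_8\big)x_{10}\bar{x}_{12}\\
&\quad \cup \big(\{\theta_1, \theta_2, \theta_3, \theta'_{12}\}\bar{x}_7 \cup \{\theta_3, \theta'_7, \theta'_{12}\}x_7\big)\bar{x}_{10}x_{12}\\
&\quad \cup \{\theta'_{10}, \theta'_{12}\}x_{10}x_{12}\\
d_{14} &= (6\bar{x}_7\bar{x}_8 + 5\bar{x}_7x_8 + 5x_7\bar{x}_8 + 4x_7x_8)\bar{x}_{10}\bar{x}_{12} + (4\bar{x}_8 + 3x_8)x_{10}\bar{x}_{12} + (4\bar{x}_7 + 3x_7)\bar{x}_{10}x_{12} + 2x_{10}x_{12}
\end{aligned}
$$

Since r = 3, the expression for $d_{13}$ yields the constraints

$$
\bar{x}_9\bar{x}_{11} = 0; \qquad \bar{x}_8x_9\bar{x}_{11} = 0; \qquad \bar{x}_7\bar{x}_9x_{11} = 0 \qquad (3.5)
$$

Similarly, the expression for $d_{14}$ yields the constraints

$$
\bar{x}_{10}\bar{x}_{12} = 0; \qquad \bar{x}_7\bar{x}_{10}x_{12} = 0; \qquad \bar{x}_8x_{10}\bar{x}_{12} = 0 \qquad (3.6)
$$

The constraints in Equations 3.5 and 3.6 are linear, as their dual interpretation shows. Consider the constraints in Equation 3.5. The dual of the first constraint, $\bar{x}_9\bar{x}_{11} = 0$, is the linear constraint $x_9 + x_{11} \ge 1$. This constraint requires at least one cell to be placed on either e9 or e11; otherwise $d_{13} > r$.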
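The linearized EPP of Equations 3.3 through 3.6 has only six cell variables and two auxiliary variables for Figure 3.2. It is therefore small enough to cross-check by enumerating all $2^8$ binary assignments. The Python sketch below is ours and is not part of the original formulation. It first checks exhaustively that the OR and AND linearization rules of Section 3.3.3.1 admit exactly the intended values of y and z. It then enumerates the ILP. The product constraints of Equations 3.5 and 3.6 are entered as linear inequalities using the dual reading given above (for example, $\bar{x}_8x_9\bar{x}_{11} = 0$ becomes $x_8 - x_9 + x_{11} \ge 0$); that transcription is our assumption. Brute force is only a sanity check for tiny instances, not a substitute for an ILP solver.

```python
from itertools import product


def check_linearization(n):
    """Exhaustively confirm the linearization rules for n binary variables.

    OR:  y <= x1 + ... + xn <= n*y          holds exactly when y = x1 OR ... OR xn
    AND: n*z <= x1 + ... + xn <= z + n - 1  holds exactly when z = x1 AND ... AND xn
    """
    for bits in product((0, 1), repeat=n):
        s = sum(bits)
        for v in (0, 1):
            if (v <= s <= n * v) != (v == any(bits)):
                return False
            if (n * v <= s <= v + n - 1) != (v == all(bits)):
                return False
    return True


def solve_linearized_epp():
    """Minimize F = x7 + x8 + y1 + y2 (Eq. 3.3) over all 0/1 assignments.

    Returns the minimum cost and every optimal tuple
    (x7, x8, x9, x10, x11, x12, y1, y2).
    d1..d12 never exceed r = 3 in this circuit, so only d13 and d14
    contribute constraints (Eqs. 3.5 and 3.6).
    """
    best_cost, optima = None, []
    for x7, x8, x9, x10, x11, x12, y1, y2 in product((0, 1), repeat=8):
        feasible = (
            # Eq. 3.4: linearized OR terms y1 = x9 OR x10, y2 = x11 OR x12
            y1 <= x9 + x10 <= 2 * y1
            and y2 <= x11 + x12 <= 2 * y2
            # Eq. 3.5 (d13 <= 3), each product term forced to zero
            and x9 + x11 >= 1           # forbids x9 = x11 = 0
            and x8 - x9 + x11 >= 0      # forbids x8 = 0, x9 = 1, x11 = 0
            and x7 + x9 - x11 >= 0      # forbids x7 = 0, x9 = 0, x11 = 1
            # Eq. 3.6 (d14 <= 3)
            and x10 + x12 >= 1
            and x7 + x10 - x12 >= 0
            and x8 - x10 + x12 >= 0
        )
        if not feasible:
            continue
        cost = x7 + x8 + y1 + y2
        point = (x7, x8, x9, x10, x11, x12, y1, y2)
        if best_cost is None or cost < best_cost:
            best_cost, optima = cost, [point]
        elif cost == best_cost:
            optima.append(point)
    return best_cost, optima


cost, optima = solve_linearized_epp()
print(cost)  # 2
```

The enumeration reports a minimum of F = 2. Several assignments reach it. One places cells on e9, e10, e11 and e12. Another places a cell on e8 together with cells on e9 and e10. The ILP fixes the minimum number of cells, not a unique placement.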
For the ILP problem with the objective function given by Equation 3.3 and the linear constraints given by Equations 3.4, 3.5 and 3.6, an optimal solution is given by

$$
x_7 = x_8 = 0; \qquad x_9 = x_{10} = x_{11} = x_{12} = 1; \qquad F = 2
$$

3.3.4 Node Partitioning Problem

For the NPP, cells may be placed only on the fanout stems of signals. The variables associated with the fanout branches of a signal are therefore identical, so $x_9 = x_{10}$ and $x_{11} = x_{12}$. The objective function for the NPP is given by

$$
F = x_7 + x_8 + x_9 + x_{11} \qquad (3.7)
$$

and the constraints are given by Equation 3.5. Thus for the NPP the objective function is linear and the set of linear constraints is smaller. For the ILP problem with the objective function given by Equation 3.7 and the linear constraints given by Equation 3.5, an optimal solution is given by

$$
x_7 = x_8 = 0; \qquad x_9 = x_{11} = 1; \qquad F = 2
$$

The solution implies that two segmentation cells need to be placed at the fanout stems of n9 and n10. Lemma 1 applies since the circuit graph has no reconvergent fanouts. Thus the EPP and NPP have identical optimal solutions. The details of the formulation can also be found in [33].

3.3.5 Problem Complexity

The number of nodes and edges that need to be considered in the ILP formulation can be reduced to some extent. Later we show that certain nodes and edges need not be considered, because they are guaranteed to be absent from any optimal solution. In spite of this reduction, the ILP formulation is not practical for large circuits, because the number of constraints grows non-linearly with the number of levels in the circuit. Thus the formulation can be used to obtain optimal solutions only for reasonably small circuits. Since the complexity of the NPP is lower than that of the EPP, henceforth we concentrate only on the NPP. Our approach can easily be extended to the EPP.

3.4 Test Mode Configuration

The details of a segmentation cell are shown in Figure 3.5.
Consider a cell placed at the output of gate G, as shown in the figure. The cell consists of two parts:

- a MISR (multiple-input signature register) cell, which compacts the responses at the pseudo-output;
- an SR (shift register) cell, which generates patterns for the pseudo-input.

A multiplexer selects the output of G during the normal mode and the pseudo-input from the SR cell during the test mode.

The test mode configuration of a circuit partitioned with two segmentation cells is shown in Figure 3.6. Consider a combinational circuit C driven by an input register R1 and driving an output register R2. Assume that C is partitioned with two segmentation cells s1 and s2, as shown in Figure 3.6. During the test mode, R1 is reconfigured as a TPG and R2 as an SA for the circuit. The SR cells of s1 and s2 are configured as part of the TPG, as shown in the figure. Similarly, the MISR cells of s1 and s2 form part of the SA. The output cones of the partitioned circuit can be exhaustively tested either in a single test session or in multiple test sessions.

Figure 3.5: A segmentation cell

Figure 3.6: Test mode configuration of a partitioned circuit

3.5 Partitioning Special Circuits

We shall consider the following classes of combinational circuits and develop efficient partitioning strategies for restricting their maximum cone sizes to desired values.

• Circuits without any fanouts.
• Circuits without any reconvergent fanouts.
• Circuits with reconvergent fanouts.
• Circuits with two levels of gates.
• Circuits with multiple levels of gates.
• Circuits with iterative logic array structures.

The strategies take advantage of the unique structural properties of these circuits.

3.5.1 Definitions

We present a few definitions, illustrated using the circuit graph shown in Figure 3.2. The sets of nodes {n1, n2, ..., n6}, {n7, n8, ..., n12} and {n13, n14} represent the sets of inputs, gates and outputs of the circuit, respectively.

Node ni is said to feed node nj (equivalently, nj is fed by ni) if there exists a directed path from ni to nj. The set of all nodes that feed a node is called its fanin cone. The set of all nodes fed by the node is called its fanout cone. The fanin and fanout cones of node n10 are {n4, n5, n6, n8} and {n11, n12, n13, n14}, respectively.

The set of all inputs that feed a node is called its (input) dependency set. The (input) dependency of node ni, denoted $d_i$, is the cardinality of its dependency set. The dependency set of node n10 is {n4, n5, n6}, so $d_{10} = 3$.

An articulation node [10] is a node that is contained in all paths originating from its fanin cone and ending in its fanout cone. Node n10 is an articulation node because it lies on all paths from its fanin cone {n4, n5, n6, n8} to its fanout cone {n11, n12, n13, n14}.

Consider two nodes ni and nj such that nj is in the fanout cone of ni. Some inputs may feed nj only via ni. The number of such inputs is called the articulation value of ni with respect to nj, denoted $a_{i,j}$. For example, $a_{10,11} = 3$, since the input nodes {n4, n5, n6} feed n11 only via n10.

For a specified cone size limit, nodes whose dependencies are no greater than the limit are called good nodes. The remaining nodes, whose dependencies exceed the limit, are called bad nodes. For a cone size limit of three, the good nodes of the circuit graph are {n1, n2, ..., n10} and the bad nodes are {n11, n12, n13, n14}. The set of bad nodes that are directly fed by at least one good node is called the envelope. For our example, the envelope is {n11, n12}. Consider a good node ni that is not an input node and feeds at least one bad node.
The node ni is called a candidate node if there exists no other good node nj such that nj is contained in all paths from ni to all bad nodes in the fanout cone of ni. For example, node n8 is not a candidate node, because the good node n10 lies on all paths from n8 to the bad nodes {n11, n12, n13, n14} in its fanout cone. In contrast, nodes n9 and n10 are candidate nodes.

3.5.2 Pruning Search Space

Our goal is to partition a given circuit using the minimum number of segmentation cells so that the maximum cone size of the circuit is restricted to some desired value. We present a few observations that help prune the search of the solution space.

Observation 1 A circuit graph with no reconvergence among paths contains only articulation nodes.

Observation 2 The articulation value of an articulation node ni with respect to any node in its fanout cone equals the input dependency of ni.

The circuit graph shown in Figure 3.2 contains only articulation nodes, as there is no reconvergence among paths. For example, consider the articulation node n10. Its articulation value with respect to any node in its fanout cone equals $d_{10}$.

Observation 3 Placing a segmentation cell on node ni decreases the input dependency of a node nj in the fanout cone of ni by $(a_{i,j} - 1)$.

To see this, consider two nodes ni and nj such that nj is in the fanout cone of ni. Placing a cell on ni removes from the dependency set of nj all inputs that feed nj only via ni. In addition, the cell creates a pseudo-input at ni, on which nj now depends. Thus the input dependency of nj decreases by $(a_{i,j} - 1)$.

The following results identify a subset of nodes in the circuit that need not be considered for placing segmentation cells. Placing cells on these nodes is never necessary to obtain an optimal solution. Thus these results help prune the search space.
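Before turning to these results, the definitions of Section 3.5.1 can be made concrete with a small Python sketch. The wiring below is our reconstruction of Figure 3.2 from the cones and dependency sets quoted in the text:

- n7 is fed by n1 and n2, and n8 by n4 and n5;
- n9 is fed by n7 and n3, and n10 by n8 and n6;
- n11 and n12 are each fed by n9 and n10;
- n13 is fed by n11, and n14 by n12.

The figure itself is not reproduced here, so treat the encoding and all function names as hypothetical. Inputs are identified by node number rather than θ label. A cell on a gate is modeled as turning that gate into a pseudo-input, as in Observation 3. Reachability uses a plain depth-first search, which is adequate for a graph this small.

```python
# Hypothetical encoding of the circuit graph of Figure 3.2: node -> fanout nodes.
FANOUT = {
    1: [7], 2: [7], 3: [9], 4: [8], 5: [8], 6: [10],
    7: [9], 8: [10], 9: [11, 12], 10: [11, 12],
    11: [13], 12: [14], 13: [], 14: [],
}
FANIN = {n: [m for m, outs in FANOUT.items() if n in outs] for n in FANOUT}
INPUTS = {1, 2, 3, 4, 5, 6}


def dependency(node, cells=frozenset()):
    """Dependency set of `node`; a gate carrying a cell acts as a pseudo-input."""
    if node in INPUTS:
        return {node}
    deps, stack, seen = set(), list(FANIN[node]), set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n in INPUTS or n in cells:
            deps.add(n)  # primary input, or pseudo-input created by a cell on n
        else:
            stack.extend(FANIN[n])
    return deps


def reaches(src, dst, removed=None):
    """True if a directed path src -> dst exists that avoids node `removed`."""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == removed or n in seen:
            continue
        if n == dst:
            return True
        seen.add(n)
        stack.extend(FANOUT[n])
    return False


def articulation_value(i, j):
    """a_{i,j}: number of inputs that feed n_j only via n_i."""
    return sum(1 for u in dependency(j) if not reaches(u, j, removed=i))


def classify(r):
    """Good nodes, bad nodes, envelope and candidate nodes for cone size limit r."""
    good = {n for n in FANOUT if len(dependency(n)) <= r}
    bad = set(FANOUT) - good
    envelope = {b for b in bad if any(g in good for g in FANIN[b])}
    candidates = set()
    for i in good - INPUTS:
        fed_bad = [b for b in bad if reaches(i, b)]
        if not fed_bad:
            continue
        # n_i is not a candidate if another good node lies on every path to the bad nodes
        blocked = any(
            j != i and reaches(i, j)
            and not any(reaches(i, b, removed=j) for b in fed_bad)
            for j in good
        )
        if not blocked:
            candidates.add(i)
    return good, bad, envelope, candidates
```

With r = 3, `classify(3)` reproduces the sets given in the text: bad nodes {n11, n12, n13, n14}, envelope {n11, n12} and candidates {n9, n10}. The call `articulation_value(10, 11)` returns 3. Placing a cell on n10 (`dependency(11, {10})`) lowers $d_{11}$ from 6 to 4, that is, by $a_{10,11} - 1$, as Observation 3 predicts.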
Lemma 3 The fanouts of any input node and the fanin of any output node need not be considered for placing segmentation cells.

Lemma 4 Let node ni be an articulation node whose input dependency is no greater than the cone size limit. Let all nodes in the fanin cone of ni feed only nodes in the fanout cone of ni. Then the nodes in the fanin cone of ni need not be considered for placing segmentation cells.

Proof: Consider a node nj in the fanin cone of the articulation node ni. Both ni and nj feed the same set of bad nodes. Placing a segmentation cell on ni reduces the dependencies of all bad nodes in the fanout cone of ni by $(d_i - 1)$. In contrast, placing a cell on nj reduces them by at most $(d_j - 1)$. Since $d_i \ge d_j$, placing a cell on ni is always at least as good as placing it on nj. Hence the nodes in the fanin cone of ni need not be considered for placing segmentation cells. □

Lemma 5 Non-candidate good nodes need not be considered for placing segmentation cells.

Proof: Consider a good node nj that is not a candidate node and feeds at least one bad node. There must exist a candidate node ni that is contained in all paths from nj to any bad node nk. In other words, ni and nj feed the same set of bad nodes. Placing a segmentation cell on ni decreases $d_k$ by $(a_{i,k} - 1)$. Placing a cell on nj decreases $d_k$ by $(a_{j,k} - 1)$. Since $a_{i,k} \ge a_{j,k}$ for any bad node nk, placing a cell on ni is always at least as good as placing it on nj. Hence non-candidate good nodes need not be considered for placing segmentation cells. □

It should be noted that the sets of good nodes, bad nodes, envelope and candidate nodes are dynamic, i.e., they change as segmentation cells are added to the circuit. To satisfy a desired cone size limit, cells are placed sequentially, one at a time, until all nodes become good nodes.
Excluding nodes from cell placement by Lemma 5 does not affect the optimality of any solution. In other words, an optimal solution can be obtained without considering these nodes.

3.5.3 Fanout-free Circuits

A circuit without any fanouts can be trivially partitioned into disjoint subcircuits, each with only one output. The graph model of each subcircuit is a directed tree consisting only of articulation nodes. We present a partitioning strategy that determines the optimal number of segmentation cells required to satisfy the desired cone size limit for such circuits.

Figure 3.7: A portion of the graph model of a multi-level fanout-free circuit

Consider the portion of the graph model of a multi-level fanout-free circuit shown in Figure 3.7. Let the nodes be assigned unique indices such that each node has a lower index than all the nodes in its fanout cone. Let nq be the bad node with the minimum index. Let the good nodes $S = \{n_{i_1}, n_{i_2}, \ldots, n_{i_p}\}$ directly feed nq, as shown in the figure. Among all the nodes in the fanin cone of nq, it is sufficient to consider only the nodes in S as candidates for placing segmentation cells in order to reduce $d_q$ (by Lemma 4).

This principle is used in the partitioning procedure FanFreeCkt to determine the optimal number of segmentation cells required to satisfy the cone size limit. For a two-level fanout-free circuit, the nodes in S are the first-level nodes and nq is the second-level node. Hence all the nodes in S are candidates for placing cells, and procedure FanFreeCkt also yields optimal solutions for two-level fanout-free circuits.

Definition 1 (Cormen 90) (Elements of Greedy Strategy)

• A problem exhibits the greedy choice property if a globally optimal solution can be arrived at by making a locally optimal (greedy) choice.
• A problem exhibits the optimal substructure property if an optimal solution to the problem contains within it optimal solutions to subproblems.

The greedy choice and optimal substructure properties are the two ingredients exhibited by most problems that lend themselves to a greedy strategy [9]. The optimal substructure property is exploited by both greedy and dynamic programming strategies. A greedy strategy can be applied to the partitioning problem for fanout-free circuits because of the following lemma.

Lemma 6 The partitioning problem for fanout-free circuits exhibits both the greedy choice and the optimal substructure properties.

Proof: Consider the portion of a fanout-free circuit graph shown in Figure 3.7. The nodes in $S = \{n_{i_1}, n_{i_2}, \ldots, n_{i_p}\}$ directly feed the bad node nq. The benefit $b_{i_j}$ of placing a cell on node $n_{i_j} \in S$ is $(d_{i_j} - 1)$, since the cell reduces the dependencies of all nodes in the fanout cone of $n_{i_j}$ by $(d_{i_j} - 1)$. The benefits of the nodes in S are independent of each other.

Assume that the nodes in S are sorted in non-increasing order of their benefits. A cell is placed on $n_{i_1}$ to reduce $d_q$ toward the cone size limit, say r. If required, a second cell is placed on $n_{i_2}$, a third on $n_{i_3}$, and so on until $d_q \le r$. Let $L \subseteq S$ be the set of nodes selected for placing cells to reduce $d_q$. We show that L is a locally optimal solution for reducing $d_q$.

Consider a locally optimal solution $L^* \subseteq S$. Let the nodes in $L^*$ be ordered according to their benefits, and assume that node $n_{i_1^*} \in L^*$ has the maximum benefit among all nodes in $L^*$. We know that $n_{i_1}$ has the maximum benefit among all nodes in S and that $n_{i_1} \in L$. If $n_{i_1^*} \ne n_{i_1}$, then $L^{**} = (L^* - \{n_{i_1^*}\}) \cup \{n_{i_1}\}$ is another optimal solution, since $|L^{**}| = |L^*|$ and $b_{i_1} \ge b_{i_1^*}$. Repeating this argument for the remaining nodes in L, we can transform $L^*$ to include all the nodes of L. Hence L is a locally optimal solution for reducing $d_q$.
Let G be a set of nodes selected for placing cells that represents a globally optimal solution, and let L be a locally optimal solution for reducing $d_q$. Assume that G contains a locally non-optimal solution L' for restricting the dependency of node nq. Since L' is non-optimal, |L'| > |L|. The set of nodes $G' = (G - L') \cup L$ is then another global solution with |G'| < |G|, contradicting the optimality of G. Thus any globally optimal solution contains locally optimal solutions. □

Procedure FanFreeCkt
Input: Circuit graph and cone size limit r.
Output: Partitioned circuit satisfying the cone size limit.
1. Assign a unique index to each node such that the node has a lower index than all the nodes in its fanout cone.
2. While there exists a bad node do
   (a) Select the bad node (say nq) with the minimum index.
   (b) Determine the good nodes $S = \{n_{i_1}, n_{i_2}, \ldots, n_{i_p}\}$ that directly feed nq.
   (c) Determine the benefit of placing a segmentation cell on each node of S: $b_{i_j} \leftarrow d_{i_j} - 1$ for $n_{i_j} \in S$.
   (d) Sort the nodes in S in non-increasing order of their benefits, so that $b_{i_1} \ge b_{i_2} \ge \cdots \ge b_{i_p}$.
   (e) $j \leftarrow 1$.
   (f) While $d_q > r$ do
       i. Place a segmentation cell on $n_{i_j}$.
       ii. Update the dependencies of all nodes in the fanout cone of $n_{i_j}$.
       iii. $j \leftarrow j + 1$.

Theorem 1 The partitioning procedure for fanout-free circuits determines the optimal number of segmentation cells needed to satisfy the cone size limit and is of polynomial complexity.

Proof: The outer while loop selects the bad node nq with the minimum index. As a result, while $d_q$ is being reduced, the dependencies of some of the other bad nodes may also be reduced. The benefits of the nodes in S are independent of each other, so the sorted order of benefits remains valid in every iteration of the inner while loop. The node with the best possible benefit is therefore selected in every iteration of the inner while loop.
The partitioning problem exhibits the greedy choice property (Lemma 6), so the procedure selects the optimal number of cells for reducing $d_q$ to satisfy the cone size limit. Every iteration of the outer while loop selects an independent subproblem, namely restricting the dependency of the bad node nq with the minimum index, and determines a locally optimal solution for it. The partitioning problem also exhibits the optimal substructure property (Lemma 6), so the procedure determines a globally optimal solution.

Assigning unique indices to the nodes takes time linear in the number of nodes of the circuit graph. The assignment is done by traversing the circuit graph in breadth-first search order. The main computation step is sorting the nodes in S during every iteration of the outer while loop. Thus the complexity of the procedure is $O(N^2 \log N)$, where N is the number of nodes in the circuit graph. □

3.5.4 Non-Reconvergent Fanout Circuits

A circuit with non-reconvergent fanouts has multiple outputs, and its circuit graph consists only of articulation nodes. Lemma 7 states the characteristic of non-reconvergent fanout circuits that makes it difficult to obtain optimal solutions for the partitioning problem. Theorem 2 states that the partitioning problem for non-reconvergent fanout circuits is NP-complete, so no polynomial-time procedure for finding an optimal solution is known. We present an efficient partitioning strategy for determining a minimal number of segmentation cells required to satisfy a desired cone size limit.

Lemma 7 The partitioning problem for non-reconvergent fanout circuits does not exhibit the optimal substructure property.

Figure 3.8: Graph model of a two-level non-reconvergent fanout circuit

Proof (by example): Consider the two-level non-reconvergent fanout circuit shown in Figure 3.8. The first-level nodes n1 through n5 have input dependencies of 3, 4, 4, 4 and 3, respectively.
Each of the output nodes O1, O2 and O3 has a dependency of ten inputs. The dependencies of the first-level and output nodes are shown in the figure. Let us consider partitioning the circuit for a cone size limit of seven inputs. The globally optimal solution places two segmentation cells, on nodes n1 and n5. The locally optimal solutions for the output cones O1, O2 and O3 are obtained by placing segmentation cells on nodes n2, n3 and n4, respectively. Thus the globally optimal solution does not contain the locally optimal solutions. □

Theorem 2 (Bhatt86) The partitioning problem for non-reconvergent fanout circuits is NP-complete.

It has been proved in [6] that the partitioning problem for fanout circuits is NP-complete. The example circuit considered in that proof is a two-level non-reconvergent fanout circuit.

Procedure NonReconvFanCkt describes our partitioning strategy for placing a minimal number of segmentation cells in the circuit. The heuristic procedure is of polynomial complexity and yields good suboptimal solutions. It is applicable to both two-level and multi-level non-reconvergent fanout circuits.

Procedure NonReconvFanCkt
Input: Circuit graph and cone size limit r.
Output: Partitioned circuit satisfying the cone size limit.
1. While there exists a bad node do
   (a) Determine the set (say S) of candidate nodes.
   (b) For each node $n_i \in S$, determine the set of output nodes $S_i$ fed by $n_i$.
   (c) For each node $n_i \in S$, determine the benefit $b_i$ of placing a cell on $n_i$:
       $$ b_i \leftarrow \sum_{O_x \in S_i} \max\{\min\{d_i - 1,\ d'_x - r\},\ 0\} $$
       where $d'_x$ is the dependency of output node $O_x \in S_i$.
   (d) Determine the node $n^* \in S$ that has the maximum benefit and place a cell on $n^*$.
   (e) Update the dependencies of all the nodes in the fanout cone of $n^*$.

A good node $n_i$ is a candidate node if it satisfies at least one of the following conditions: (1) $n_i$ directly feeds a bad node, or (2) $n_i$ has multiple fanouts and feeds a bad node.
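The greedy loop of procedure NonReconvFanCkt, specialized to two-level circuits, can be sketched in Python as follows. The sketch is ours and rests on these assumptions:

- every first-level node has dependency $d_i \le r$, consistent with $f \le r$;
- the first-level cones are input-disjoint because the circuit has no reconvergent fanout, so an output's dependency is the sum of its fanin contributions, and a node carrying a cell contributes one pseudo-input;
- ties in benefit go to the node listed first.

The example wiring `WIRING` is hypothetical. It is merely one wiring consistent with the dependencies quoted for Figure 3.8 (first-level dependencies 3, 4, 4, 4, 3 and three outputs of dependency ten), since the figure's actual connections are not available here.

```python
def output_dependencies(first_level, wiring, cells):
    """Dependency of every output of a two-level non-reconvergent circuit.

    First-level cones are assumed input-disjoint, so an output's dependency is
    the sum of its fanin contributions; a node carrying a cell contributes 1.
    """
    return {
        out: sum(1 if n in cells else first_level[n] for n in fanin)
        for out, fanin in wiring.items()
    }


def non_reconv_fan_ckt(first_level, wiring, r):
    """Greedy sketch of NonReconvFanCkt restricted to two-level circuits.

    first_level: {node: d_i}, assuming every d_i <= r.
    wiring:      {output: [first-level nodes feeding it]}.
    Returns the first-level nodes chosen for segmentation cells, in order.
    """
    cells = []
    while True:
        d_out = output_dependencies(first_level, wiring, cells)
        if all(d <= r for d in d_out.values()):
            return cells
        best, best_benefit = None, 0
        for n, d_n in first_level.items():
            if n in cells:
                continue
            # b_i = sum over outputs O_x fed by n_i of max(min(d_i - 1, d'_x - r), 0)
            benefit = sum(
                max(min(d_n - 1, d_out[out] - r), 0)
                for out, fanin in wiring.items() if n in fanin
            )
            if benefit > best_benefit:
                best, best_benefit = n, benefit
        if best is None:
            raise ValueError("no remaining cell placement reduces a bad output")
        cells.append(best)


# Hypothetical wiring consistent with the numbers quoted for Figure 3.8.
DEP = {"n1": 3, "n2": 4, "n3": 4, "n4": 4, "n5": 3}
WIRING = {"O1": ["n1", "n2", "n5"], "O2": ["n1", "n3", "n5"], "O3": ["n1", "n4", "n5"]}

print(non_reconv_fan_ckt(DEP, WIRING, 7))                 # ['n1', 'n5']
print(non_reconv_fan_ckt(DEP, {"O1": WIRING["O1"]}, 7))   # ['n2']
```

For this wiring and r = 7, the loop picks n1 and then n5, leaving each output with dependency 6. Run on O1 alone, it picks n2, which matches the local choice described in the proof of Lemma 7. On other circuits the heuristic need not find an optimal placement.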
The benefit of placing a segmentation cell on $n_i$ is determined as follows. Let $O_x$ be an output node with dependency $d'_x$ that is driven by $n_i$. Placing a cell on $n_i$ reduces the dependency of $O_x$ by $(d_i - 1)$. However, the dependency of $O_x$ needs to be reduced only by $(d'_x - r)$ to satisfy the cone size limit. With respect to $O_x$, the benefit of placing a cell on $n_i$ is therefore the minimum of $(d_i - 1)$ and $(d'_x - r)$. If $d'_x$ is already no greater than r, placing the cell on $n_i$ brings no benefit to $O_x$. Thus the benefit of placing a cell on $n_i$ with respect to $O_x$ is $\max\{\min\{d_i - 1, d'_x - r\}, 0\}$. If $n_i$ is a fanout node, the benefit of placing a cell on it is the sum of the individual benefits for all output nodes driven by $n_i$.

Among the nodes in S, the node with the maximum benefit is selected for placing a segmentation cell. The while loop transforms bad nodes into good nodes, so the set of candidates changes and is redetermined in every iteration. The main computation step is determining the benefits of the nodes in S in every iteration of the while loop. The complexity of the procedure is $O(N^3)$, where N is the number of nodes in the circuit graph.

3.5.5 Reconvergent Fanout Circuits

A circuit with reconvergent fanouts can have either a single output or multiple outputs, and not all nodes in its circuit graph are necessarily articulation nodes. Lemma 8 highlights the contrast between circuits with and without reconvergent fanouts.

Lemma 8 Let nk be a common node in the fanout cones of nodes ni and nj. Let the articulation values of ni and nj with respect to nk be $a_{i,k}$ and $a_{j,k}$, respectively. If the circuit has (no) reconvergent fanouts, then the decrease in $d_k$ due to placing two segmentation cells on ni and nj can (not) be greater than $(a_{i,k} + a_{j,k} - 2)$.
Figure 3.9: A portion of the graph model of a multi-level reconvergent fanout circuit

Proof: Case 1 (circuit with reconvergent fanouts): Consider a portion of the graph model of a multi-level circuit with reconvergent fanout as shown in Figure 3.9. Assume that nodes $n_i$ and $n_j$ are directly driven by the input node $n_1$, and let $n_i$ and $n_j$ directly feed node $n_k$ as shown in the figure. Node $n_k$ remains dependent on input node $n_1$ even after placing a segmentation cell on either $n_i$ or $n_j$. In other words, the articulation values $a_{i,k}$ and $a_{j,k}$ do not account for input node $n_1$. Placing two cells on $n_i$ and $n_j$ makes $n_k$ no longer dependent on $n_1$, and $d_k$ decreases by $(a_{i,k} + a_{j,k} - 2 + 1)$.

Case 2 (circuit without reconvergent fanouts): Since the circuit graph contains only articulation nodes, each node has at most one directed path from each input node. Therefore placing two cells on $n_i$ and $n_j$ decreases $d_k$ by $(a_{i,k} + a_{j,k} - 2)$. □

Due to this characteristic of reconvergent fanout circuits, multiple nodes need to be considered simultaneously for placement of segmentation cells in order to measure the true benefits. Hence determining optimal solutions requires a procedure of exponential complexity.

For partitioning two-level reconvergent fanout circuits, procedure NonReconvFanCkt can be modified to obtain good suboptimal solutions. In the procedure, consider the expression for the benefit $b_i$ of placing a cell on a candidate node $n_i$ with respect to an output node $O_x$. Since all nodes of a non-reconvergent fanout circuit are articulation nodes, the articulation value of $n_i$ with respect to $O_x$ is $d_i$; hence $d_i$ is used in the expression. For a two-level reconvergent fanout circuit, a candidate node $n_i$ need not be an articulation node. Hence $d_i$ in the benefit expression must be replaced by the articulation value of $n_i$ with respect to $O_x$.
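Lemma 8 can be checked on small graphs. The Python sketch below assumes that the articulation value $a_{i,k}$ counts the inputs whose every path to $n_k$ passes through $n_i$, so that a single cell on $n_i$ lowers $d_k$ by $a_{i,k} - 1$ as used in the proof; this reading, the helper names and both example graphs are illustrative assumptions. Each graph maps a node to the list of nodes that directly feed it, and primary inputs map to an empty list.

```python
def support(graph, node, blocked=frozenset()):
    """Primary inputs that reach `node` along a path avoiding every node in `blocked`."""
    seen, stack, found = set(), [node], set()
    while stack:
        v = stack.pop()
        if v in seen or v in blocked:
            continue
        seen.add(v)
        if not graph[v]:
            found.add(v)
        stack.extend(graph[v])
    return found


def dependency(graph, node, cells=frozenset()):
    """Input dependency of `node` when segmentation cells sit on the nodes in `cells`.

    A node carrying a cell acts as a single pseudo-input, so the backward
    traversal counts it once and does not look past it.
    """
    seen, stack, count = set(), [node], 0
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v != node and v in cells:
            count += 1            # pseudo-input created by a segmentation cell
        elif not graph[v]:
            count += 1            # primary input
        else:
            stack.extend(graph[v])
    return count


def articulation(graph, i, k):
    """a_{i,k}: inputs of n_k that reach it only through n_i (assumed definition)."""
    return len(support(graph, k)) - len(support(graph, k, frozenset({i})))


# Reconvergent case in the spirit of Figure 3.9: input x1 feeds both ni and nj.
recon = {"x1": [], "x2": [], "x3": [],
         "ni": ["x1", "x2"], "nj": ["x1", "x3"], "nk": ["ni", "nj"]}
# Fanout-free case: ni and nj have disjoint input sets.
free = {"x1": [], "x2": [], "x3": [], "x4": [],
        "ni": ["x1", "x2"], "nj": ["x3", "x4"], "nk": ["ni", "nj"]}

for name, g in (("reconvergent", recon), ("fanout-free", free)):
    drop = dependency(g, "nk") - dependency(g, "nk", frozenset({"ni", "nj"}))
    bound = articulation(g, "ni", "nk") + articulation(g, "nj", "nk") - 2
    print(name, drop, bound)
# reconvergent 1 0
# fanout-free 2 2
```

In the reconvergent graph the drop exceeds $a_{i,k} + a_{j,k} - 2$ by one, because $x_1$ leaves the cone of $n_k$ only when both cells are present (Case 1); in the fanout-free graph the drop equals that value (Case 2).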
The same procedure can then be used for partitioning two-level reconvergent fanout circuits to determine a minimal number of cells required to satisfy the cone size limit. For multi-level reconvergent fanout circuits, procedure NonReconvFanCkt does not yield good suboptimal solutions, for the following reason: because of reconvergence among paths, the benefit of placing a cell on a candidate node with respect to any output node in its fanout cone tends to be zero. Hence another heuristic measure is needed for the candidate nodes. We shall describe an efficient heuristic procedure of polynomial complexity in Section 3.6.

3.5.6 Iterative Logic Arrays

3.5.6.1 One-dimensional Arrays

Consider the one-dimensional iterative logic array (ILA) shown in Figure 3.10. The array is composed of m identical cells referred to as acells. Each acell has p top inputs and q left inputs. Similarly, each acell has p bottom outputs and q right outputs as shown in the figure. Let the acells be indexed from 1 through m. The one-dimensional ILA is formed by feeding the right outputs of the ith acell to the left inputs of the (i+1)st acell. The one-dimensional ILA can thus be characterized by three parameters m, p and q. Theorem 3 states the optimal number of segmentation cells required for partitioning the array structure for a specified cone size limit. We shall refer to the segmentation cells as scells and assume that placement of scells is prohibited inside any acell.

Theorem 3 Let r be the cone size limit for the one-dimensional ILA shown in Figure 3.10. Let i satisfy the constraint
$$ p\,i + q \le r < p(i + 1) + q. \qquad (3.8) $$
Let s satisfy the relation
$$ s = q\left(\lceil m/i \rceil - 1\right). \qquad (3.9) $$
Then s scells are necessary and sufficient for partitioning the ILA so that each output is driven by at most r inputs.

Figure 3.10: One-dimensional ILA

Proof: Consider the unpartitioned ILA shown in Figure 3.10.
The ith and (i+1)st acells depend on $(p\,i + q)$ and $(p(i+1) + q)$ inputs, respectively. By the constraint of Equation 3.8, the dependency of the (i+1)st acell exceeds the cone size limit r. The one-dimensional ILA forms a multi-level fanout-free circuit, and hence each acell forms an articulation node. Hence, by Lemma 4, only the inputs to the (i+1)st acell need to be considered for placing scells. It is therefore necessary to place q scells after the ith acell, as shown in Figure 3.11, in order to restrict the dependency of the (i+1)st acell.

Figure 3.11: Partitioned one-dimensional ILA

Analyzing the dependencies of the acells with indices greater than i, it is evident that a second set of q scells has to be placed at the outputs of the (2i)th acell in order to restrict the dependency of the (2i+1)st acell. The partitioning process is repeated till the end of the array structure. Thus the total number of scells placed in the ILA is given by Equation 3.9. These scells are necessary since they are placed at indispensable locations, and they are sufficient since each output is driven by at most r inputs in the partitioned ILA. □

It should be noted that the circuit graph of a one-dimensional ILA is a binary tree structure and hence forms a special case of a fanout-free circuit. Procedure FanFreeCkt, described for fanout-free circuits, can be used to obtain optimal solutions for partitioning one-dimensional ILA structures. The procedure places the optimal number of scells at the indispensable locations described in the proof of Theorem 3.

3.5.6.2 Two-dimensional Arrays

Consider the two-dimensional ILA shown in Figure 3.12. Let the array be composed of n identical rows, with each row consisting of m identical acells. Let the acells be indexed from (1,1) through (n,m) as shown in the figure. Each acell has p top inputs and q left inputs.
Similarly, each acell has p bottom outputs and q right outputs as shown in the figure. The two-dimensional ILA is formed by feeding the bottom outputs of acell (i,j) to the top inputs of acell (i+1,j) and the right outputs of acell (i,j) to the left inputs of acell (i,j+1). The two-dimensional ILA can be characterized by four parameters m, n, p and q. Theorem 4 states an upper bound on the number of scells required for partitioning the array structure for a specified cone size limit.

Theorem 4 Let r be the cone size limit for the two-dimensional ILA shown in Figure 3.12. Let $i_1 > i_2 > \cdots > i_x \ge 1$ satisfy the row constraints
$$ p\,i_\alpha + q\,\alpha \le r < p(i_\alpha + 1) + q\,\alpha, \qquad \alpha = 1, 2, \ldots, x. \qquad (3.10) $$
Let $j_1 > j_2 > \cdots > j_y \ge 1$ satisfy the column constraints
$$ q\,j_\beta + p\,\beta \le r < q(j_\beta + 1) + p\,\beta, \qquad \beta = 1, 2, \ldots, y. \qquad (3.11) $$

Figure 3.12: Two-dimensional ILA

Let s satisfy the relation
$$ s = \min\left\{ \begin{array}{ll} nq\left(\lceil m/i_\alpha \rceil - 1\right) + mp\left(\lceil n/\alpha \rceil - 1\right), & \alpha = 1, 2, \ldots, x \\ mp\left(\lceil n/j_\beta \rceil - 1\right) + nq\left(\lceil m/\beta \rceil - 1\right), & \beta = 1, 2, \ldots, y \end{array} \right\} \qquad (3.12) $$
Then s scells are sufficient for partitioning the ILA so that each output is driven by at most r inputs.

Proof: Consider the unpartitioned ILA shown in Figure 3.12. The acells $(\alpha, i_\alpha)$ and $(\alpha, i_\alpha + 1)$ depend on $(p\,i_\alpha + q\,\alpha)$ and $(p(i_\alpha + 1) + q\,\alpha)$ inputs, respectively. By the row constraint of Equation 3.10, the dependency of acell $(\alpha, i_\alpha + 1)$ exceeds the cone size limit r. It is sufficient to place $q\,\alpha$ scells in the first α rows between the $i_\alpha$th and $(i_\alpha + 1)$st columns, as shown in Figure 3.13, in order to restrict the dependency of acell $(\alpha, i_\alpha + 1)$. The dependency of acell $(\alpha + 1, i_\alpha)$ may also exceed the cone size limit.
To restrict its dependency, it is sufficient to place $p\,i_\alpha$ scells in the first $i_\alpha$ columns between the αth and (α+1)st rows, as shown in Figure 3.13. The partitioning process is repeated by placing nq scells after every set of $i_\alpha$ columns and mp scells after every set of α rows, as shown in the figure. Thus the total number of scells placed in the ILA is $nq(\lceil m/i_\alpha \rceil - 1) + mp(\lceil n/\alpha \rceil - 1)$. These scells are sufficient since each output is driven by at most $(p\,i_\alpha + q\,\alpha) \le r$ inputs in the partitioned ILA. Each of the x row constraints and y column constraints gives rise to a unique expression for the total number of scells placed in the partitioned ILA. The minimum value among these (x + y) expressions is given by Equation 3.12. It is therefore sufficient to place s scells in the ILA in order to restrict the dependencies of all outputs to at most the cone size limit r. □

Procedure ILA provides a method for placing the minimal number of scells determined by Equation 3.12 on a two-dimensional ILA structure. The two-dimensional ILA has reconvergent fanouts, and hence partitioning such an array optimally requires a procedure of exponential complexity. Theorem 4 provides only an upper bound on the number of scells required for the two-dimensional ILA structure for a specified cone size limit.

Figure 3.13: Partitioned two-dimensional ILA

The applicability of this theorem is illustrated by Example 3. For multi-dimensional ILAs, upper bounds on the number of scells required for partitioning can be derived in a similar fashion.

Procedure ILA
Input: Two-dimensional ILA and desired cone size limit r.
Output: Partitioned ILA satisfying the cone size limit.
1. Determine the parameters m, n, p and q for the ILA.
2. Determine α and $i_\alpha$ yielding the minimum value in Equation 3.12 of Theorem 4.
3. c ← 1.
4. While ($c\,\alpha < n$) do
   (a) Place mp segmentation cells after the $(c\,\alpha)$th row as shown in Figure 3.13.
   (b) c ← c + 1.
5. c ← 1.
6. While ($c\,i_\alpha < m$) do
   (a) Place nq segmentation cells after the $(c\,i_\alpha)$th column as shown in Figure 3.13.
   (b) c ← c + 1.
7. Check the necessity of all segmentation cells placed in the ILA and remove those that are not necessary.

Example 3 Consider the two-dimensional ILA shown in Figure 3.14. The indices of the acells are shown in the figure. The ILA parameters (with respect to Figure 3.12) are p = q = 2 and n = m = 4. Let us partition the ILA for r = 8.

Figure 3.14: A (16,16,16) two-dimensional ILA circuit

The ILA has identical row and column constraints due to its symmetrical structure. The row and column constraints (refer to Equations 3.10 and 3.11) are
$$ 2\times3 + 2\times1 \le 8 < 2\times4 + 2\times1 $$
$$ 2\times2 + 2\times2 \le 8 < 2\times3 + 2\times2 $$
$$ 2\times1 + 2\times3 \le 8 < 2\times2 + 2\times3 $$
The row and column constraint parameters (refer again to Equations 3.10 and 3.11) are x = y = 3, $i_1 = j_1 = 3$, $i_2 = j_2 = 2$ and $i_3 = j_3 = 1$. Substituting all the parameters into Equation 3.12 yields s = 16; the minimum is attained at α = 2 (and symmetrically at β = 2), where $nq(\lceil 4/2 \rceil - 1) + mp(\lceil 4/2 \rceil - 1) = 8 + 8$. Theorem 4 states that 16 scells are sufficient for partitioning the ILA so that each output is driven by at most eight inputs. The 16 scells are placed in the ILA by procedure ILA as shown in Figure 3.15(a). The input dependencies of the acells are shown in the figure. Alternatively, consider a greedy procedure that places scells only before those acells whose dependencies exceed the cone size limit r. The greedy procedure places 20 scells, as shown in Figure 3.15(b), where the input dependencies of the acells are also shown. In this case, procedure ILA places fewer scells than the greedy procedure.

Figure 3.15: Logic array partitioned by (a) procedure ILA and (b) the greedy procedure for r = 8.

Consider again the unpartitioned ILA and let us partition it for r = 10.
The row and column constraints (again refer to Equations 3.10 and 3.11) are
$$ 2\times4 + 2\times1 \le 10 < 2\times5 + 2\times1 $$
$$ 2\times3 + 2\times2 \le 10 < 2\times4 + 2\times2 $$
$$ 2\times2 + 2\times3 \le 10 < 2\times3 + 2\times3 $$
$$ 2\times1 + 2\times4 \le 10 < 2\times2 + 2\times4 $$
The constraint parameters (refer again to Equations 3.10 and 3.11) are x = y = 4, $i_1 = j_1 = 4$, $i_2 = j_2 = 3$, $i_3 = j_3 = 2$ and $i_4 = j_4 = 1$. Substituting all the parameters into Equation 3.12 yields s = 16. Two of the 16 scells are found to be unnecessary, and hence only 14 scells are placed by procedure ILA, as shown in Figure 3.16(a). The greedy procedure also places 14 scells, as shown in Figure 3.16(b). □

Figure 3.16: Logic array partitioned by (a) procedure ILA and (b) the greedy procedure for r = 10.

3.6 Partitioning Large Circuits

Large circuits invariably have reconvergent fanouts and hence require a procedure of exponential complexity to determine the minimum number of segmentation cells. For partitioning large circuits, we have developed an efficient heuristic procedure of polynomial complexity that obtains good suboptimal solutions. The heuristic is based on the articulation values of the nodes in the circuit. The benefit of placing a segmentation cell on a node is measured in terms of its articulation values with respect to other nodes in its fanout cone. Placing a cell on a bad node $n_i$ requires further placement of cells in the fanin cone of $n_i$ in order to reduce $d_i$. Because of this, the measured benefit of the cell placed on $n_i$ may be drastically reduced. In contrast, placing a cell on a good node does not require placement of another cell in its fanin cone. Hence we shall consider only the good nodes for placing segmentation cells. Lemma 5 states that it is sufficient to consider only the candidate nodes among the good nodes.

3.6.1 Heuristic Measure

There are three minor variations on the main theme of our heuristic approach.
These variations form the alternate heuristics H1, H2 and H3. Let G, B, E and C represent the sets of good nodes, bad nodes, envelope nodes and candidate nodes, respectively. The benefit of placing a segmentation cell on a candidate node $n_i$ is quantified by the heuristic measure $h_i$. The measure is primarily based on the articulation values of node $n_i$ with respect to the nodes in its fanout cone and varies with the heuristic.

• Heuristic H1: The heuristic measure $h_i$ for candidate node $n_i$ is given by $h_i = \sum_{n_j \in B} a_{i,j} + \sum_{n_k \in C} a_{i,k}$. The heuristic gives equal importance to the bad nodes and the other candidates. Bad nodes not in the fanout cone of $n_i$ and candidates not fed by node $n_i$ do not contribute to the heuristic measure.
• Heuristic H2: The heuristic measure $h_i$ for candidate node $n_i$ is given by $h_i = \sum_{n_j \in B} 2a_{i,j} + \sum_{n_k \in C} a_{i,k}$. The heuristic gives twice as much importance to the bad nodes as to the other candidates. Since the candidates already satisfy the cone size limit r, the benefits for the bad nodes are stressed over the benefits for the candidates.
• Heuristic H3: The heuristic measure $h_i$ for candidate node $n_i$ is given by $h_i = \sum_{n_j \in E} a_{i,j}$. The heuristic gives importance only to the envelope nodes. Since the envelope nodes feed the rest of the bad nodes, the benefits for all bad nodes are implicitly considered.

3.6.2 Heuristic Procedure

The details of the heuristic procedure are provided below. One of the three heuristics H1, H2 and H3 is selected prior to invoking the procedure.

Procedure CSR
Input: Graph of an (n, m, k) circuit and desired cone size limit r.
Output: Partitioned circuit satisfying the cone size limit.
1. S ← ∅. /* Initialize the set of segmented nodes */
2. Determine the sets of good nodes G, bad nodes B and envelope nodes E.
3. candidate(C). /* Determine the set of candidate nodes C */
4. For each node $n_i \in C$, determine its heuristic measure $h_i$:
   • (Heuristic H1:) $h_i = \sum_{n_j \in B} a_{i,j} + \sum_{n_k \in C} a_{i,k}$
   • (Heuristic H2:) $h_i = \sum_{n_j \in B} 2a_{i,j} + \sum_{n_k \in C} a_{i,k}$
   • (Heuristic H3:) $h_i = \sum_{n_j \in E} a_{i,j}$
5. Select the candidate node $n^*$ with the highest heuristic measure. Place a segmentation cell on $n^*$; $S \leftarrow S \cup \{n^*\}$.
6. Update the dependencies of all the nodes in the fanout cone of $n^*$.
7. If there exists a bad node, then go to step 2.
8. check(S). /* Check the necessity of all segmented nodes */

Procedure candidate(C)
1. C ← ∅; Q ← ∅. /* Initialize the candidate set C; create a FIFO queue Q */
2. $l_i \leftarrow 0$ for all $n_i \in G$. /* Initialize the labels of all good nodes */
3. count ← 1. /* Initialize the label counter */
4. Determine the good nodes (say G') that directly feed the envelope E.
5. For each node $n_i \in G'$ do
   (a) $l_i \leftarrow$ count; count ← count + 1.
   (b) $C \leftarrow C \cup \{n_i\}$; add $n_i$ to Q.
6. While Q ≠ ∅ do
   (a) Remove the first node $n_i$ from Q.
   (b) For each fanin node $n_j$ of $n_i$ do
       i. If ($l_j = 0$) then $l_j \leftarrow l_i$. /* New label assigned to $n_j$ */
       ii. If ($l_j \ne l_i$) and ($n_j \notin C$) then /* $n_j$ is a candidate */ { $l_j \leftarrow$ count; count ← count + 1; $C \leftarrow C \cup \{n_j\}$ }.
       iii. If all fanout nodes of $n_j$ have been traversed, then add $n_j$ to Q.

Procedure check(S)
1. For each node $n_i \in S$ do
   (a) Remove the segmentation cell placed on $n_i$.
   (b) Determine the maximum cone size k for the circuit. If (k ≤ r) then $S \leftarrow S - \{n_i\}$; else place the cell back on $n_i$.
2. For each node $n_i \in S$ do
   (a) Remove the segmentation cell placed on $n_i$.
   (b) Determine the maximum cone size k for the circuit. If (k ≤ r) then $S \leftarrow S - \{n_i\}$; else do
       i. Determine the candidate $n^*$ with the highest heuristic measure.
       ii. If ($n^* = n_i$), then place the cell back on $n_i$; else do
           A. Place the cell on $n^*$.
           B. If (k > r) then remove the cell from $n^*$ and place it back on $n_i$.

3.6.2.1 Cone Size Reduction

Procedure CSR progressively modifies bad nodes into good nodes. The procedure terminates when the bad set becomes empty.
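Steps 4 and 5 of procedure CSR can be sketched in Python as follows. The sketch assumes the articulation values have already been computed and are supplied as a mapping from (candidate, node) pairs to $a_{i,j}$, with absent pairs (nodes outside the fanout cone of the candidate) contributing zero; it also assumes that a candidate's own entry is excluded from the sum over C. The function names and all numbers in the example are hypothetical.

```python
def heuristic_measure(i, a, bad, cand, envelope, variant):
    """h_i for candidate n_i under heuristic H1, H2 or H3.

    a maps (i, j) to the articulation value a_{i,j}; a missing pair means
    n_j is not in the fanout cone of n_i and contributes nothing.
    """
    def av(j):
        return a.get((i, j), 0)

    other_cands = [k for k in cand if k != i]   # assumption: skip n_i itself
    if variant == "H1":
        return sum(av(j) for j in bad) + sum(av(k) for k in other_cands)
    if variant == "H2":
        return sum(2 * av(j) for j in bad) + sum(av(k) for k in other_cands)
    if variant == "H3":
        return sum(av(j) for j in envelope)
    raise ValueError(f"unknown heuristic {variant!r}")


def select_candidate(a, bad, cand, envelope, variant):
    """Step 5: the candidate n* with the highest heuristic measure."""
    return max(sorted(cand),
               key=lambda i: heuristic_measure(i, a, bad, cand, envelope, variant))


# Hypothetical sets and articulation values.
bad, cand, envelope = {"b1", "b2"}, {"c1", "c2", "c3"}, {"b1"}
a = {("c1", "b1"): 3, ("c1", "b2"): 3, ("c2", "b1"): 5,
     ("c2", "c1"): 4, ("c3", "b2"): 2, ("c3", "c1"): 8}
for h in ("H1", "H2", "H3"):
    print(h, select_candidate(a, bad, cand, envelope, h))
# H1 c3
# H2 c2
# H3 c2
```

The three heuristics can pick different nodes from the same data: doubling the weight of bad nodes in H2 lets c2, which has the largest articulation value with respect to a bad node, overtake c3, whose score under H1 comes mostly from another candidate.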
During each iteration, the candidate $n^*$ with the highest heuristic measure is selected for placing a segmentation cell. Due to the greedy nature of the procedure, cells placed during earlier iterations may become unnecessary towards the end of the iteration process. As the final step, the cells are examined and unnecessary ones are removed from the circuit. The main step is determining the heuristic measures of the candidates. The time complexity of determining the articulation values of a candidate with respect to the nodes in its fanout cone is $O(N)$, where N is the number of nodes in the circuit. Repeating this for all the candidates leads to a complexity of $O(N^2)$. These steps are repeated until the maximum cone size of the circuit is reduced to a value less than or equal to the cone size limit; since each iteration places a cell on a different node, there are at most N iterations. Thus the complexity of the procedure is $O(N^3)$, where N is the number of nodes in the circuit.

3.6.2.2 Candidates

Procedure candidate(C) determines the candidates among the good nodes G. The good nodes G' that directly feed the nodes in the envelope form the initial set of candidate nodes. The labeling scheme described in the procedure is based on the following principle: a node $n_j$ in the fanin cone of a candidate node $n_i$ becomes a candidate if $n_j$ (but not $n_i$) also lies in the fanin cone of another candidate node. The nodes in G' are assigned unique labels. Nodes are traversed in breadth-first order starting from G' towards the input nodes. A node $n_j$ that directly feeds node $n_i$ and has not yet been assigned a label is assigned the same label as $n_i$. If node $n_j$ is revisited from another node with a different label, then $n_j$ becomes a candidate node. The complexity of the procedure is linear in the number of signals in the circuit.

3.6.2.3 Segmentation Cells

Procedure check(S) examines the necessity of the segmentation cells placed in the circuit.
Every cell placed in the circuit is checked for its necessity by temporarily removing it from the circuit. Sometimes removing a cell from $n_i$ may result in a better placement of the cell on another node $n^*$. The complexity of the procedure is $O(N^2)$, where N is the number of nodes in the circuit.

3.6.3 Experimental Results

The procedures described above have been implemented, and experiments were conducted on several circuits, including the ISCAS combinational benchmark circuits [7]. The results on the benchmark circuits are compared with the results reported in [16].

Ckt     (n,m,k)          HW1   HW2   H1    H2    H3
c432    (36,7,36)        20*   21    20*   20*   20*
c499    (41,32,41)       9     9     8*    8*    8*
c880    (60,26,45)       14    14    10*   10*   11
c1355   (41,32,41)       9     9     8*    8*    8*
c1908   (33,25,33)       18    17    14*   14*   15
c2670   (233,140,122)    37    29*   29*   29*   29*
c3540   (50,22,50)       63    68    58*   66    61
c5315   (178,123,67)     42    46    37*   37*   38
c6288   (32,31,32)       65*   70    67    79    70
c7552   (207,108,194)    79    85    79    76*   83

Table 3.1: Results on Benchmark Circuits (r = 20)

Table 3.1 presents the experimental results on the benchmark circuits for a desired cone size limit (r) of 20 inputs. Column two describes the characteristics of the benchmark circuits. For example, circuit c432 has 36 inputs and seven outputs with a maximum dependency of 36 inputs. The subsequent columns give the number of segmentation cells required for partitioning the circuits to satisfy the cone size limit. HW1 and HW2 are the heuristics proposed in [16]; H1, H2 and H3 are the heuristics described earlier. For example, circuit c432 requires 20, 21, 20, 20 and 20 segmentation cells using heuristics HW1, HW2, H1, H2 and H3, respectively. Entries with asterisks indicate the best results obtained by the five procedures.

Heuristics HW1 and HW2 are basically hill-climbing procedures with backtracking involved in the search process.
Heuristic HW1 contains a procedure of exponential complexity to determine the optimal number of cells for restricting the dependencies of envelope nodes. The procedure is repeated until all the output cones have acceptable dependencies. The technique tends to neglect the global effects of placing the individual cells. Heuristic HW2 considers the global effects, but cells are allowed to be placed only on certain nodes in the fanin cones of envelope nodes.

Figure 3.17: Comparison of best results (r = 20) (bar chart; horizontal axis: ISCAS benchmark circuits)

Our heuristics consider all the bad nodes simultaneously. We consider all the candidate nodes, as opposed to the subset of candidate nodes considered in HW1 and HW2. Our heuristic measures evaluate the placement of cells on candidate nodes with respect to all the bad nodes. This evaluation provides a global approach to the problem and results in good suboptimal solutions. Figure 3.17 shows a bar chart comparing the best results produced by our heuristics with the best results produced by heuristics HW1 and HW2. Table 3.2 presents the experimental results on the benchmark circuits for a desired cone size limit (r) of 16 inputs.

Table 3.3 presents the required number of segmentation cells for various circuits for different values of the cone size limit. The experiments were performed on six circuits, viz. benchmark circuits c432, c499 and c880, a bit-sliced 16-bit adder, an 8-bit multiplier and an ALU. The value of the cone size limit (r) is varied between 40 and 12.
The required number of segmentation cells (s) needed to achieve the desired cone sizes is listed in Table 3.3.

Ckt     (n,m,k)          HW1   HW2   H1    H2    H3
c432    (36,7,36)        27*   27*   27*   27*   27*
c499    (41,32,41)       8*    8*    8*    8*    8*
c880    (60,26,45)       19    16    11*   11*   11*
c1355   (41,32,41)       8*    8*    8*    8*    8*
c1908   (33,25,33)       21    22    18*   18*   18*
c2670   (233,140,122)    45    33*   33*   33*   33*
c3540   (50,22,50)       87*   90    91    93    87*
c5315   (178,123,67)     52*   62    54    52*   54
c6288   (32,31,32)       93*   98    126   126   102
c7552   (207,108,194)    100*  117   108   104   102

Table 3.2: Results on Benchmark Circuits (r = 16)

Ckt     (n,m,k)           r:   40   36   32   28   24   20   16   12
c432    (36,7,36)     s         -    -    9   13   16   20   27   35
                      k'        -    -   32   28   24   20   16   12
c499    (41,32,41)    s         2    6    6    7    7    8    8   16
                      k'       39   32   32   22   22   14   14   10
c880    (60,26,45)    s         1    2    4    9    9   11   11   19
                      k'       37   36   32   24   24   20   16   12
add16   (33,17,33)    s         -    -    1    1    1    1    2    3
                      k'        -    -   31   27   23   19   15   11
mult8   (18,16,18)    s         -    -    -    -    -    -    6   22
                      k'        -    -    -    -    -    -   16   12
ALU     (22,16,22)    s         -    -    -    -    -    5   17   34
                      k'        -    -    -    -    -   20   16   12

Table 3.3: Segmentation Cells Required for Various Cone Sizes

After partitioning with the segmentation cells, the resulting maximum dependency (k') of each partitioned circuit is also listed. The difference between the desired cone size (r) and the maximum dependency (k') obtained after partitioning with s cells is due to the nature of the circuit. For example, circuit c499 requires six segmentation cells to reduce its maximum dependency to 32. However, after placing one more cell, the maximum dependency of the circuit drops to 22. This drastic reduction is due to the presence of a suitable articulation node in the circuit.

The data in Table 3.3 are graphically depicted in Figure 3.18. The graph illustrates how the required number of segmentation cells varies with the desired cone size. Circuits c432, mult8 and ALU have steep curves, while circuits c499, c880 and add16 have curves with both steep and flat portions.
The steep behavior is due to the drastic increase in the required number of segmentation cells as the desired cone size is reduced. We believe this behavior can be attributed to the following reasons. First, a circuit may have few articulation nodes, or its nodes may have poor articulation values. In this case, segmentation cells are placed on many reconvergent fanout paths in the circuit in order to reduce the dependencies of output cones having more than r inputs. Second, output cones having more than r inputs may be disjoint and hence may require separate cells to reduce their dependencies. The flat behavior shows that no additional segmentation cells are required for the reduction in the maximum cone size. Circuits with flat behavior have well-placed articulation nodes with good articulation values. For example, add16 is a bit-sliced circuit and forms a one-dimensional ILA structure. The circuit has no fanouts and contains only articulation nodes connecting the adjacent bit-slices, which naturally form candidates. Hence the circuit exhibits an almost flat curve. Note that the curves are interpolated as straight lines between the actual data points; this interpolation need not reflect the true values between the data points. The circuits should ideally give rise to monotonically decreasing curves (step functions) if experimented with all possible values of r. However, anomalies of the greedy heuristic procedure may result in deviations from the expected behavior.

Figure 3.18: Segmentation cells required for various cone sizes (vertical axis: number of segmentation cells; horizontal axis: cone size; curves for c432, ALU, mult8, c880, c499 and add16)

3.7 Partitioning Sequential Circuits

A sequential circuit can be hierarchically reorganized and partitioned into combinational blocks and registers [13].
A sequential circuit is said to be balanced [14] if all the paths between any two combinational blocks encounter an equal number of register delays. For example, consider the (24,8,24) sequential circuit shown in Figure 3.19(a). The combinational blocks and the registers are represented by squares and horizontal lines, respectively. Every pair of combinational blocks has at most one path between them, except the pair C1 and C2, which has two reconvergent paths between them. Both paths encounter two register delays, and hence the circuit is a balanced sequential circuit. We have extended our partitioning strategy to balanced sequential circuits.

Figure 3.19: A balanced sequential (24,8,24) circuit (a) before partitioning and (b) after partitioning

We shall assume that the individual combinational blocks have an acceptable number of inputs and that the segmentation cells are placed external to the blocks. In other words, only some of the existing registers are modified into segmentation registers to satisfy the desired cone size limit. Whenever a register is selected for modification, the entire register is modified into a segmentation register. Let us partition the circuit for a desired cone size limit of 20 inputs. Register R5 is selected and modified into a segmentation register, as shown in Figure 3.19(b). After partitioning, the original (24,8,24) circuit becomes a (28,12,20) circuit. Registers R51 and R52 form the pseudo-output and pseudo-input registers, respectively. During the test mode, registers R1, R2, R3 and R52 are configured as TPGs, and registers R4 and R51 are configured as MISRs. The partitioned circuit is pseudo-exhaustively tested in a single test session.

Procedure CSR can be extended for partitioning balanced sequential circuits.
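The balance condition defined above can be checked mechanically. The Python sketch below is one possible approach, not a procedure from this work: it models the circuit as an acyclic graph whose edges carry register delays, propagates the set of accumulated delays from every block in topological order, and reports an imbalance as soon as some block is reached with two different delays. The block names and edge lists are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def is_balanced(edges):
    """True if every pair of blocks is joined only by paths with equal register delay.

    edges -- (source_block, destination_block, register_delays) triples of an
             acyclic circuit; TopologicalSorter raises CycleError otherwise.
    """
    succ, preds = {}, {}
    for u, v, w in edges:
        succ.setdefault(u, []).append((v, w))
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    order = list(TopologicalSorter(preds).static_order())
    for src in order:
        delays = {src: {0}}                 # delays accumulated on paths from src
        for u in order:                     # predecessors are finalized before u
            if u not in delays:
                continue
            for v, w in succ.get(u, []):
                delays.setdefault(v, set()).update(d + w for d in delays[u])
        if any(len(ds) > 1 for ds in delays.values()):
            return False
    return True


# Two reconvergent paths from C1 to C2, each through two registers: balanced.
print(is_balanced([("C1", "C3", 1), ("C3", "C2", 1), ("C1", "C4", 1), ("C4", "C2", 1)]))
# Paths from C1 to C2 with two and one register delays: not balanced.
print(is_balanced([("C1", "C3", 1), ("C3", "C2", 1), ("C1", "C2", 1)]))
```

The first call prints True and the second prints False. Because each block is processed only after all of its predecessors, the delay set of a block is complete before it is propagated further.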
A balanced sequential circuit can be modeled as a directed acyclic graph with the combinational blocks and the registers represented as nodes and edges, respectively. Each node $n_i$ is associated with a weight $w_i$ equal to the width of the output register of the combinational block represented by $n_i$. With reference to procedure CSR, the heuristic measure $h_i$ of a candidate node $n_i$ is normalized by dividing it by $w_i$. The candidate node with the highest normalized heuristic measure is selected, and the corresponding output register is modified into a segmentation register.

3.8 Summary

We have presented several strategies to partition logic circuits for pseudo-exhaustive testing. Various classes of circuits are considered in detail, and efficient partitioning procedures are developed for them. Two-level and multi-level fanout-free circuits can be partitioned with an optimal number of segmentation cells using our procedures of polynomial complexity. The characteristics of circuits with fanouts justify the need for a partitioning procedure of exponential complexity to obtain optimal solutions. For partitioning ILA structures, the numbers of cells required are determined based on the regularity of the structures (exactly for one-dimensional arrays and as an upper bound for two-dimensional arrays). Optimal solutions for small circuits with fanouts can be obtained by solving the ILP formulation. Good suboptimal solutions can be achieved for large circuits using our heuristic approach. The efficiency and quality of our heuristic approach are demonstrated on the ISCAS combinational benchmark circuits. The partitioning strategy is also extended to balanced sequential circuits.

Chapter 4
Partitioning for Maximal Test Concurrency

4.1 Introduction

Pseudo-exhaustive testing of an (n, m, k) circuit considers the circuit as a collection of m output cones and exhaustively tests each output cone of the circuit. The output cones can be exhaustively tested in two ways.
One way is to apply k independent test signals ($2^k$ patterns) per test session and have multiple sessions to test all the output cones. The output cones of the circuit are grouped into k-testable groups [17]. All the cones in a k-testable group are exhaustively tested simultaneously with k test signals. The upper bound on the number of test sessions is $\lceil m/2 \rceil$, since any two arbitrary cones can be tested simultaneously [27]. Another way is to test all the output cones of the circuit in a single test session. The number of test signals required is bounded below and above by k and n, respectively. Two inputs that do not fan out to any common output can share a test signal during the test mode. Circuits requiring only k test signals are referred to as maximal test concurrency (MTC) circuits [27]. We shall restrict our attention to pseudo-exhaustive testing of circuits in a single test session.

Generation of an optimal pseudo-exhaustive test set for an (n, m, k) non-MTC circuit is a hard problem. An optimal test set must contain an exhaustive set of patterns for each output cone of the circuit. In order to test all the output cones in a single test session, the circuit requires w test signals, where $k < w \le n$. However, exhaustive testing of the output cones does not require all $2^w$ patterns. TPGs designed for non-MTC circuits usually do not generate optimal test sets [5, 27].

In contrast, an optimal pseudo-exhaustive test set can easily be generated for an (n, m, k) MTC circuit: a k-bit counter can exhaustively test all output cones. Therefore we attempt to modify non-MTC circuits during the test mode to achieve maximal test concurrency. Since the circuit has a maximum cone size of k inputs, it may be possible to test it with k independent test signals by modifying the circuit in the test mode. Partitioning for cone size reduction, described in Chapter 3, partitions an (n, m, k) circuit with s segmentation cells for a desired cone size limit r.
The partitioning results in an $(n+s, m+s, k')$ circuit, where $k' \le r < k$. If the partitioning results in an MTC circuit, then the circuit can be tested with $k'$ test signals in a single test session, and test patterns can be generated very easily without any complicated TPG. However, the partitioning procedure does not guarantee that the modified circuit is an MTC circuit. Our goal is to partition the circuit further to achieve maximal test concurrency; this is referred to as partitioning for maximal test concurrency [36]. It demands more modification of the original circuit in the form of additional segmentation cells. In this chapter we investigate the feasibility of transforming non-MTC circuits into MTC circuits. There is a trade-off between the area overhead due to segmentation cells and the test time due to multiple test sessions, as illustrated by Example 4.

Example 4 Consider the (3,3,2) non-MTC circuit shown in Figure 4.1(a). The inputs and outputs are denoted as $\{\theta_1, \theta_2, \theta_3\}$ and $\{O_1, O_2, O_3\}$, respectively. Exhaustive testing of all three output cones in a single test session requires three independent test signals, which consist of all $2^3$ patterns as shown in Figure 4.1(a). For this circuit, an optimal test set consists of just $2^2$ patterns and is given by {000, 011, 101, 110}. Allowing multiple test sessions with $2^2$ patterns per session, the circuit requires two test sessions, as shown in Figure 4.1(b). In the first session, $O_1$ and $O_3$ are exhaustively tested while $\theta_2$ and $\theta_3$ share a test signal. In the next session, $\theta_1$ and $\theta_2$ share a test signal and $O_2$ and $O_3$ are tested exhaustively. Alternatively, the circuit can be modified to achieve maximal test concurrency by placing a segmentation cell. The modified (4,4,2) MTC circuit in the test mode is shown in Figure 4.1(c).
Input $\theta_4$ and output $O_4$ are the newly formed pseudo-input and pseudo-output, respectively. Inputs $\theta_1$ and $\theta_3$ share a test signal, as do inputs $\theta_2$ and $\theta_4$. The modified circuit can be tested with an optimal test set, as shown in the figure. □

Figure 4.1: Pseudo-exhaustive testing of a non-MTC circuit (a) in a single test session, (b) in multiple test sessions, and (c) modified to an MTC circuit.

4.2 Definitions and Notation

We shall present a few definitions and our notation before describing our partitioning procedure for achieving maximal test concurrency. The circuit is modeled as a directed acyclic graph, as described in Chapter 3. Each output cone of the circuit forms a subgraph, and overlapping cones make the corresponding subgraphs share nodes and edges.

Consider an $(n, m, k)$ circuit with the following notation. The $n$ inputs are denoted as $\theta_j$, $j = 1, 2, \ldots, n$, and the $m$ outputs as $O_i$, $i = 1, 2, \ldots, m$. The input dependencies of the outputs are given by a dependency matrix (D-matrix) denoted as $D_{m \times n}$. The matrix element $d_{i,j} = 1$ if $O_i$ depends on $\theta_j$, and $d_{i,j} = 0$ otherwise. The weight of input $\theta_j$, denoted as $w_j$, is given by $w_j = \sum_{i=1}^{m} d_{i,j}$ and equals the number of outputs that depend on $\theta_j$. The input dependency of $O_i$ is given by row $i$ of the D-matrix. Since any output depends on at most $k$ inputs, $\sum_{j=1}^{n} d_{i,j} \le k$ for all $i = 1, 2, \ldots, m$.

The relationship among the inputs is given by a relational matrix (R-matrix) denoted as $R_{n \times n}$. The R-matrix is a symmetric matrix whose elements are binary vectors $r_{i,j} = b_1 b_2 \cdots b_m$, where $b_x = 1$ if $O_x$ depends on both inputs $\theta_i$ and $\theta_j$, and $b_x = 0$ otherwise ($x = 1, 2, \ldots, m$).
The weight of the relational vector $r_{i,j}$ is given by $\sum_{x=1}^{m} b_x$ and equals the number of outputs that depend on both $\theta_i$ and $\theta_j$. Inputs $\theta_i$ and $\theta_j$ are said to be related if the weight of $r_{i,j}$ is greater than zero.

We shall present a few definitions similar to those of Chapter 3. The set of all edges that feed an edge is called its fanin cone, and the set of all edges that are fed by the edge is called its fanout cone. The articulation value of an edge $e_i$ with respect to an output $O_j$, denoted as $a_{i,j}$, is the number of inputs that feed $O_j$ only via $e_i$. A bridge [10] is an edge that is contained in all paths originating from its fanin cone and ending in its fanout cone. An edge is a bridge for an I/O pair $(\theta_i, O_j)$ if the edge is contained in all paths originating from $\theta_i$ and ending in $O_j$. The definitions are illustrated by Example 5.

Figure 4.2: Graph model of an (8,4,6) circuit.

Table 4.1: Dependency matrix
     θ1  θ2  θ3  θ4  θ5  θ6  θ7  θ8
O1   1   1   1   1   1   1   0   0
O2   0   1   1   1   1   1   0   0
O3   1   1   1   0   0   1   1   1
O4   0   0   1   1   1   1   1   1

Table 4.2: Relational matrix (upper triangle)
      θ1    θ2    θ3    θ4    θ5    θ6    θ7    θ8
θ1    -     1010  1010  1000  1000  1010  0010  0010
θ2          -     1110  1100  1100  1110  0010  0010
θ3                -     1101  1101  1111  0011  0011
θ4                      -     1101  1101  0001  0001
θ5                            -     1101  0001  0001
θ6                                  -     0011  0011
θ7                                        -     0011

Example 5 Consider the graph model of the (8,4,6) circuit shown in Figure 4.2. The D-matrix is given in Table 4.1. The weights of the inputs $\theta_1$ through $\theta_8$ are 2, 3, 4, 3, 3, 4, 2 and 2, respectively. The R-matrix is given in Table 4.2; since it is symmetric, only the elements above the diagonal are shown. The edge $e_i$ shown in the figure forms a bridge for the subgraph formed by the output cone $O_4$. The articulation value $a_{i,4} = 4$, since the four inputs $\{\theta_3, \theta_4, \theta_5, \theta_6\}$ feed $O_4$ only via $e_i$. □

Consider an output cone $O_j$ driven by the inputs $\{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}\}$. The weight of $O_j$ is given by $\sum_{x=1}^{k} w_{i_x}$, where $w_{i_x}$ is the weight of input $\theta_{i_x}$.
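The D-matrix and R-matrix definitions translate directly into column sums and bitwise ANDs of D-matrix columns. The following sketch, an illustration with hypothetical helper names rather than the dissertation's implementation, rebuilds a few entries of Tables 4.1 and 4.2 and confirms that the (8,4,6) circuit has no unrelated input pair.

```python
# D-matrix of Table 4.1: rows are outputs O1..O4, columns are inputs θ1..θ8.
D = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1],
]


def input_weights(D):
    """w_j: number of outputs that depend on input θ_j (column sums of D)."""
    return [sum(row[j] for row in D) for j in range(len(D[0]))]


def relational_vector(D, i, j):
    """r_{i,j} as a bit string b_1..b_m; b_x = 1 iff O_x depends on θ_i and θ_j."""
    return "".join(str(row[i] & row[j]) for row in D)


def related(D, i, j):
    """Two inputs are related iff their relational vector has non-zero weight."""
    return "1" in relational_vector(D, i, j)


n = len(D[0])
print(input_weights(D))            # [2, 3, 4, 3, 3, 4, 2, 2]
print(relational_vector(D, 0, 1))  # 1010  (r_{1,2} in Table 4.2)
print(relational_vector(D, 3, 6))  # 0001  (r_{4,7} in Table 4.2)
# Table 4.2 has no zero relational vector: every pair of inputs is related.
print(all(related(D, i, j) for i in range(n) for j in range(i + 1, n)))  # True
```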
Among the output cones of maximum cone size $k$, a cone with the maximum weight is selected as the reference cone. The edges that do not belong to the reference cone are candidates for placing segmentation cells.

4.3 Merging

Two inputs are said to be merged if they share a test signal during the test mode. Unrelated inputs can be merged without affecting the exhaustive testing of any output cone in the circuit. In contrast, merging related inputs does not ensure exhaustive testing of the outputs fed by those inputs. Merging unrelated input pairs reduces the number of independent test signals required by the circuit. To achieve maximal test concurrency, the number of test signals for an $(n, m, k)$ circuit must be reduced from $n$ to $k$; hence $n - k$ input pairs have to be merged during the test mode. The results on merging are summarized below.

Lemma 9 Inputs $\theta_i$ and $\theta_j$ related only by output $O_x$ can be merged if and only if there exists a bridge for the I/O pair $(\theta_i, O_x)$ (respectively $(\theta_j, O_x)$) that is not driven by $\theta_j$ (respectively $\theta_i$).

Proof: (If) Assume there exists a bridge for the I/O pair $(\theta_i, O_x)$ that is not driven by $\theta_j$. Placing a segmentation cell on this bridge makes $O_x$ depend on $\theta_j$ but not on $\theta_i$, while the new pseudo-output depends on $\theta_i$ and not on $\theta_j$. Thus $\theta_i$ and $\theta_j$ become unrelated and can be merged. The case of a bridge for $(\theta_j, O_x)$ is symmetric.
(Only if) Assume there exists no bridge for the I/O pair $(\theta_i, O_x)$ that is not driven by $\theta_j$, and none for $(\theta_j, O_x)$ that is not driven by $\theta_i$. Then inputs $\theta_i$ and $\theta_j$ cannot be made unrelated and hence cannot be merged. □

Theorem 5 Inputs $\theta_i$ and $\theta_j$ related by $r_{i,j} = b_1 b_2 \cdots b_m$ can be merged if and only if, for every $b_x \neq 0$, there exists a bridge for the I/O pair $(\theta_i, O_x)$ (respectively $(\theta_j, O_x)$) that is not driven by $\theta_j$ (respectively $\theta_i$).

Proof: Apply Lemma 9 repeatedly for every output $O_x$ that depends on both $\theta_i$ and $\theta_j$ (i.e., for every $b_x \neq 0$). □

Theorem 6 (Barzilai81) The problem of merging the optimal number of unrelated input pairs is NP-complete.
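The condition of Lemma 9 can be checked mechanically: an edge lies on every path from $\theta_i$ to $O_x$ exactly when $O_x$ is reachable from $\theta_i$ but becomes unreachable once that edge is removed. The sketch below is illustrative only. The graph, its node names (`t1`, `t2`, `g1`, ...) and the helper names are hypothetical, and "driven by $\theta_j$" is interpreted here as "the edge's source node is reachable from $\theta_j$".

```python
from collections import defaultdict


def reachable(adj, src, banned=None):
    """Nodes reachable from src, optionally ignoring one edge banned = (u, v)."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if (u, v) != banned and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_bridge_for(adj, edge, src, dst):
    """True if edge lies on every path from src to dst (at least one path must exist)."""
    return dst in reachable(adj, src) and dst not in reachable(adj, src, banned=edge)


def lemma9_bridges(adj, edges, ti, tj, ox):
    """Bridges for the I/O pair (ti, ox) that are not driven by tj."""
    driven_by_tj = reachable(adj, tj)  # nodes in the fanout of tj
    return [e for e in edges
            if is_bridge_for(adj, e, ti, ox) and e[0] not in driven_by_tj]


# Hypothetical graph: t1 fans out and reconverges at g3; t2 joins at g4.
edges = [("t1", "g1"), ("t1", "g2"), ("g1", "g3"), ("g2", "g3"),
         ("g3", "g4"), ("t2", "g4"), ("g4", "O")]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)

print(is_bridge_for(adj, ("t1", "g1"), "t1", "O"))  # False: the path via g2 survives
print(lemma9_bridges(adj, edges, "t1", "t2", "O"))  # [('g3', 'g4')]
```

Here `(g4, O)` is also a bridge for `(t1, O)`, but it is driven by `t2`, so only `(g3, g4)` satisfies the lemma. Cutting it separates `t1` from `O` while leaving `t2` connected.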
The following observations about special circuits help prune the solution space.

Observation 4 A fanout-free circuit is an MTC circuit.
A circuit without any fanouts can be trivially partitioned into disjoint subcircuits, each with only one output. Inputs that feed different outputs are unrelated and hence can be merged. Thus a fanout-free circuit is always an MTC circuit.

Observation 5 A non-reconvergent fanout circuit contains only bridges.
A non-reconvergent fanout circuit has at most one path from any input to any output, and thus every signal forms a bridge.

Observation 6 The fanout-free signal of any input node and the fanin signal of any output node need not be considered for placing segmentation cells.
Placing a segmentation cell on the fanout-free signal of an input (say $\theta_i$) makes $\theta_i$ unrelated to another input (say $\theta_j$), but the new pseudo-input becomes related to $\theta_j$. Placing a cell on the fanin of an output node does not make any of the original inputs unrelated. Hence neither the fanout-free signal of an input node nor the fanin signal of an output node needs to be considered for placing segmentation cells.

4.4 Heuristic Procedure

A related input pair can be merged only after the two inputs are made unrelated, so each related input pair needs at least one segmentation cell. Placing a segmentation cell on a bridge makes some inputs unrelated, and these inputs can then be merged. The outline of the partitioning procedure for modifying an $(n, m, k)$ non-MTC circuit into an MTC circuit is given below.
1. Merge all unrelated input pairs.
2. Identify the reference cone and all candidates for placing segmentation cells.
3. Place a minimal number of cells to create the required number (at most $n - k$) of unrelated input pairs.
4. Merge the resulting unrelated input pairs.
Being an MTC circuit, the partitioned circuit can be exhaustively tested in a single test session with $2^k$ patterns.
Our partitioning method provides a systematic way of modifying a non-MTC circuit into an MTC circuit. We now present the detailed procedure for modifying an $(n, m, k)$ circuit to achieve maximal test concurrency.

Procedure MTC
Input: Graph of an $(n, m, k)$ circuit.
Output: Partitioned MTC circuit.
1. Form the D-matrix $D_{m \times n} = [d_{i,j}]$ and determine the weights of the inputs. Form the R-matrix $R_{n \times n} = [r_{i,j}]$ and determine the zero relational vectors.
2. While there exists a zero relational vector $r_{i,j}$ do
   (a) Merge inputs $\theta_i$ and $\theta_j$ and update the D-matrix and R-matrix.
3. $q \leftarrow n - k - p$, where $p$ is the number of unrelated input pairs merged and $q$ is the number of related input pairs yet to be merged.
4. Determine the reference cone and the set $C$ of candidates for placing segmentation cells.
5. For each candidate $e_i \in C$, determine its heuristic measure $h_i = \sum_{O_j} a_{i,j}$, where the summation is over all the outputs in the circuit.
6. While $q > 0$ do
   (a) If $C = \emptyset$, exit with failure.
   (b) Select the candidate $e^*$ with the highest heuristic measure and place a segmentation cell on $e^*$; $C \leftarrow C - \{e^*\}$.
   (c) Update the D-matrix and R-matrix by including the pseudo-input and the pseudo-output.
   (d) Merge all unrelated input pairs, updating the D-matrix and R-matrix after every merge operation.
   (e) $q \leftarrow q - p'$, where $p'$ is the number of unrelated original input pairs merged.
   (f) If $q = 0$, exit with success.

The goal is to reduce the number of independent test signals from $n$ to $k$. Since merging an unrelated input pair reduces the number of test signals by one, $n - k$ unrelated input pairs have to be merged to achieve maximal test concurrency. Since merging the optimal number of unrelated input pairs is a hard problem [5], we adopt the following heuristic for the merge operation: whenever there is a choice of inputs to merge with a given input, the input with the maximum weight is selected. This selection helps merge a maximal number of unrelated input pairs.
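The merge steps of Procedure MTC (steps 2 and 6(d)) can be sketched as a greedy loop over D-matrix columns. Unrelated columns, meaning columns with no common 1, are ORed together, and the partner with the largest current weight is preferred. This is a minimal sketch under our own assumptions: the visiting order (lowest index first) and the tie-breaking rule are not fixed by the text, so the particular pairs found may differ from those chosen in Example 6. The data is the modified D-matrix of Table 4.3.

```python
def merge_unrelated(D):
    """Greedy merging of unrelated inputs in a D-matrix (rows = outputs, cols = inputs).

    Returns groups of 1-based input labels; each group shares one test signal.
    """
    m, n = len(D), len(D[0])
    cols = [{x for x in range(m) if D[x][j]} for j in range(n)]  # outputs per signal
    groups = [[j + 1] for j in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(len(cols)):
            # Unrelated partners: no output depends on both signals.
            partners = [j for j in range(len(cols)) if j != i and not (cols[i] & cols[j])]
            if partners:
                # Maximum-weight heuristic; max() keeps the lowest index on ties.
                j = max(partners, key=lambda c: len(cols[c]))
                cols[i] |= cols[j]        # OR-collapse the two D-matrix columns
                groups[i] += groups[j]
                del cols[j], groups[j]
                changed = True
                break
    return groups


# Modified D-matrix of Table 4.3: rows O1..O5, columns θ1..θ9.
D = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 0, 0, 0],
]
groups = merge_unrelated(D)
print(groups)       # [[1, 9], [2], [3], [4, 7], [5, 8], [6]]
print(len(groups))  # 6 independent test signals, matching k = 6
```

Every output cone ends up with its inputs in distinct groups, so a 6-bit counter driving the six shared signals tests each cone exhaustively.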
Merging an unrelated input pair $\theta_i$ and $\theta_j$ collapses their columns into one in the D-matrix: the binary entries of the original columns are ORed to create the entries of the collapsed column. The R-matrix is modified accordingly. Example 6 illustrates the collapsing concept.

Among the output cones with the maximum cone size $k$, a cone with the maximum weight is selected as the reference cone. This cone is driven by a set of $k$ inputs that fan out to most of the outputs in the circuit. Hence it is relatively easier to make the remaining inputs unrelated than to do so for this set of $k$ inputs. All the signals in the circuit that are not contained in the reference cone are candidates for placing segmentation cells.

The heuristic measure $h_i$ of the candidate $e_i$ is the cumulative sum of the articulation values of $e_i$ with respect to all the outputs in the circuit. The heuristic measure yields high values for the bridges in the circuit. Bridges are preferred candidates because placing cells on bridges usually results in many unrelated input pairs.

Placing a segmentation cell on a candidate creates a new pseudo-input and a new pseudo-output. The numbers of rows and columns of the D-matrix and R-matrix therefore each increase by one, and the length of every relational vector increases by one.

Determining the candidates takes time linear in the number of signals in the circuit. Computing the heuristic measure for all the candidates takes $O(Em)$ time, where $E$ is the number of signals and $m$ is the number of outputs in the circuit. Every iteration of the while loop selects a candidate with the highest heuristic measure and updates the D-matrix and R-matrix, which requires traversing the entire circuit. The complexity of the partitioning procedure is therefore $O(E^2)$, where $E$ is the number of signals in the circuit.
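Both selection rules described above can be sketched in a few lines. The reference cone is the largest cone, with ties broken by cone weight. The heuristic measure $h$ of an edge sums, over all outputs, the number of inputs that reach the output only through that edge. The sketch below is illustrative, not the dissertation's code: it applies `reference_cone` to Table 4.1 and `heuristic_measure` to a small hypothetical graph, and it does not filter out the reference-cone signals or the signals excluded by Observation 6. All function and node names are our own.

```python
from collections import defaultdict


def reference_cone(D):
    """0-based index of the reference cone: maximum cone size, then maximum weight."""
    n = len(D[0])
    w = [sum(row[j] for row in D) for j in range(n)]               # input weights
    size = [sum(row) for row in D]                                 # cone sizes
    weight = [sum(w[j] for j in range(n) if row[j]) for row in D]  # cone weights
    k = max(size)
    return max((x for x in range(len(D)) if size[x] == k), key=lambda x: weight[x])


def reachable(adj, src, banned=None):
    """Nodes reachable from src, ignoring the edge banned = (u, v) if given."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if (u, v) != banned and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def heuristic_measure(adj, edge, inputs, outputs):
    """h = sum over outputs O_j of a_{e,j}, the number of inputs feeding O_j only via edge."""
    h = 0
    for o in outputs:
        for t in inputs:
            if o in reachable(adj, t) and o not in reachable(adj, t, banned=edge):
                h += 1
    return h


# Table 4.1: cone sizes are 6, 5, 6, 6 and cone weights 19, 17, 17, 18.
D = [[1, 1, 1, 1, 1, 1, 0, 0], [0, 1, 1, 1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1, 1, 1]]
print(reference_cone(D))  # 0, i.e. O1

# Hypothetical graph: g1 = f(t1, t2) fans out to g2 and O2; O1 is fed by g2 = f(g1, t3).
edges = [("t1", "g1"), ("t2", "g1"), ("g1", "g2"), ("t3", "g2"), ("g2", "O1"), ("g1", "O2")]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
h = {e: heuristic_measure(adj, e, ["t1", "t2", "t3"], ["O1", "O2"]) for e in edges}
print(h[("g1", "g2")], h[("g2", "O1")], h[("t3", "g2")])  # 2 3 1
```

The brute-force reachability test is simple but slower than the $O(Em)$ bound stated in the text. It is meant only to show what the articulation values count.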
Example 6 We shall illustrate the partitioning procedure on the (8,4,6) non-MTC circuit graph shown in Figure 4.2.
1. The D-matrix and R-matrix are given in Tables 4.1 and 4.2, respectively.
2. The R-matrix does not contain any zero relational vector, and hence the number of unrelated input pairs merged is zero ($p = 0$).
3. The number of related input pairs yet to be merged is $q = n - k - p = 8 - 6 - 0 = 2$.
4. Among the output cones of maximum cone size, $O_1$ has the maximum weight and hence forms the reference cone. Since the circuit has no reconvergent fanouts, all the signals form bridges.
5. The candidate $e_i$ with the highest heuristic measure is selected for placing a segmentation cell. The candidate is shown in Figure 4.2.
6. The new pseudo-input is $\theta_9$ and the new pseudo-output is $O_5$. The modified D-matrix is given in Table 4.3.
7. Inputs $\theta_7$, $\theta_8$ and $\theta_9$ can now be merged with $\theta_4$, $\theta_5$ and $\theta_6$, respectively.
8. The number of original input pairs merged is $p' = 2$, so $q = q - p' = 2 - 2 = 0$, and the procedure exits with success.
The test-mode version of the circuit graph with the merged inputs is shown in Figure 4.3. □

Figure 4.3: Graph model of the modified circuit.

Table 4.3: Modified dependency matrix
     θ1  θ2  θ3  θ4  θ5  θ6  θ7  θ8  θ9
O1   1   1   1   1   1   1   0   0   0
O2   0   1   1   1   1   1   0   0   0
O3   1   1   1   0   0   1   1   1   0
O4   0   0   0   0   0   0   1   1   1
O5   0   0   1   1   1   1   0   0   0

4.5 Summary

In this chapter we have presented a novel method for partitioning non-MTC circuits to achieve maximal test concurrency during the test mode. The partitioning procedure is based on the graph-theoretical concept of bridges in the circuit. The partitioned circuits can be pseudo-exhaustively tested with the minimum number of independent test signals in a single test session, and the pseudo-exhaustive test sets can easily be generated using maximal-length LFSRs or counters. Circuits can first be partitioned for cone size reduction and then further partitioned to achieve maximal test concurrency.
A combined strategy could be investigated for partitioning circuits so that two requirements, (a) the sizes of the output cones are restricted to some user-defined value and (b) the circuit is maximal test concurrent in the test mode, are met simultaneously rather than by a two-step procedure.

Chapter 5
Test Pattern Generation

5.1 Introduction

Our goal is to design hardware-efficient TPGs that generate minimal pseudo-exhaustive test sets for a given $(n, m, k)$ circuit. Universal TPGs based on coding-theory principles are not tailored to a given circuit, because they do not use any information about the circuit's output cone structures. The test sets generated by universal TPGs are often several orders of magnitude larger than the minimum-length test set for a given circuit. In this chapter we describe novel circuit-specific TPGs that use knowledge of the circuit's output cone structures to generate minimal test sets.

We assume that the inputs of the $(n, m, k)$ circuit are ordered and driven by an $n$-stage input register. During the test mode, the stages of the input register are configured as TPG stages. Since successive stages of the input register are adjacent to each other, it is desirable for successive stages of the TPG to be adjacent as well. This configuration can significantly reduce the routing overhead due to the interconnections among the TPG stages.

LFSR/SRs [4] have successive stages adjacent to each other, forming a shift register, and therefore incur minimal hardware overhead. However, LFSR/SRs usually generate non-optimal pseudo-exhaustive test sets. In contrast, LFSR/XORs [3] usually generate optimal test sets but incur high overhead due to the XOR network. We shall describe novel TPGs that incur low hardware overhead like LFSR/SRs and generate small test sets like LFSR/XORs. The theory behind LFSR/SRs and LFSR/XORs is described below, as it forms the basis for our TPG designs.
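A short simulation makes the notion of an LFSR/SR test signal concrete. Stage $s_1$ receives the feedback bit, every later stage holds the value of its predecessor from the previous clock, and each output cone is checked to see whether it receives all $2^k$ combinations over one period plus the all-zero pattern. This is a sketch under our own conventions, not the dissertation's hardware. Polynomials are encoded as bitmasks with bit $j$ holding the coefficient of $x^j$, the feedback rule is a Fibonacci-style update chosen so that stage $s_i$ behaves like the residue $x^{i-1} \bmod P(x)$, and the cones are those of the (6,5,3) circuit in Figure 5.2.

```python
def lfsr_sr_patterns(poly, n):
    """Test patterns of a single LFSR/SR of degree w = deg P(x) driving n inputs.

    poly is a bitmask with bit j holding the coefficient of x^j, so x^3 + x + 1
    is 0b1011. The first-stage sequence obeys a[t] = XOR of a[t - j] over every
    j >= 1 with a non-zero coefficient of x^j; stage s_i holds the value stage
    s_1 held i - 1 clocks earlier, so stages s_{w+1}..s_n form the SR part.
    """
    w = poly.bit_length() - 1
    period = 2 ** w - 1
    a = [1] + [0] * (w - 1)                  # any non-zero seed
    while len(a) < period + n:
        t = len(a)
        bit = 0
        for j in range(1, w + 1):
            if (poly >> j) & 1:
                bit ^= a[t - j]
        a.append(bit)
    # At clock t, stage s_i (1-based) holds a[t + n - i]; this assumes the SR
    # stages already hold earlier sequence values (no start-up phase modeled).
    patterns = [tuple(a[t + n - i] for i in range(1, n + 1)) for t in range(period)]
    return patterns + [(0,) * n]             # include the all-zero pattern


def exhaustively_tested(patterns, cones):
    """0-based indices of cones (tuples of 0-based inputs) that see all 2^k values."""
    return [c for c, cone in enumerate(cones)
            if len({tuple(p[i] for i in cone) for p in patterns}) == 2 ** len(cone)]


# Cones of the (6,5,3) circuit in Figure 5.2 (0-based inputs θ1..θ6).
cones = [(0, 1, 2), (0, 2, 3), (1, 2, 4), (1, 3, 5), (2, 4, 5)]
for poly in (0b1011, 0b1101, 0b10011):       # x^3+x+1, x^3+x^2+1, x^4+x+1
    pats = lfsr_sr_patterns(poly, 6)
    print(bin(poly), len(pats), exhaustively_tested(pats, cones))
# 0b1011 8 [0, 1, 3, 4]
# 0b1101 8 [0, 2, 3]
# 0b10011 16 [0, 1, 2, 3, 4]
```

The simulated coverage agrees with the residue analysis of Example 7. Neither degree-three polynomial covers every cone, while $x^4 + x + 1$ covers all five with sixteen patterns.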
5.2 LFSR/SRs and LFSR/XORs

Consider an $(n, m, k)$ circuit whose inputs are driven by an $n$-stage input register. Let the inputs be denoted as $\theta_1, \theta_2, \ldots, \theta_n$ and the input register stages as $s_1, s_2, \ldots, s_n$, respectively. During the test mode, the input register is configured as a circuit-specific TPG.

The $(n, m, k)$ circuit can be exhaustively tested by configuring the $n$-stage input register as a maximal-length LFSR with $P(x)$ as its feedback polynomial. Running the LFSR through its period (and adding the all-zero pattern) generates all possible $n$-bit patterns. The unique sequence of $2^n$ binary values generated by an individual stage of the LFSR is referred to as a test signal. Stage $s_i$ of the input register corresponds to stage $s_i$ ($i = 1, 2, \ldots, n$) of the TPG. The test signal generated by stage $s_i$ is characterized by the residue $r_i$ of that stage, computed as $r_i = (r_p \times x) \bmod P(x)$, where $r_p$ is the residue of the test signal feeding the input of stage $s_i$. The residue $r_1$, representing the test signal generated by stage $s_1$, is taken as the reference with value one (i.e., $r_1 = 1$). For the LFSR, since stage $s_{i-1}$ directly feeds stage $s_i$, the residue is $r_i = (r_{i-1} \times x) \bmod P(x)$. The residues $r_1, r_2, \ldots, r_n$, equal to $1, x, \ldots, x^{n-1}$, represent the $n$ independent test signals generated by the LFSR stages $s_1, s_2, \ldots, s_n$, respectively.

For the $(n, m, k)$ circuit, an LFSR/SR can be constructed by trying all primitive polynomials of degrees $k, k+1, \ldots, w_1$ (where $w_1 \le n$) until a TPG of degree $w_1$ is found that generates an exhaustive set of patterns for each of the $m$ output cones of the circuit. The first $w_1$ stages of the input register are configured as a maximal-length LFSR with a primitive polynomial $P_1(x)$ of degree $w_1$, and the remaining $n - w_1$ consecutive stages are connected as a shift register (SR). The LFSR/SR structure is shown in Figure 5.1(a).
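The residue recurrence and the design loop just described can be sketched directly. The sketch computes $r_i = x^{i-1} \bmod P(x)$ by repeated multiplication by $x$, tests each cone's residues for linear independence over GF(2) (the condition of Theorem 7), and returns the first candidate polynomial that passes. It is an illustration under stated assumptions: polynomials are bitmasks (bit $j$ is the coefficient of $x^j$), the candidate list is supplied by hand and is not generated by enumerating primitive polynomials, and the helper names are ours. The cones are those of the (6,5,3) circuit of Example 7.

```python
def residues(poly, n):
    """r_i = x^(i-1) mod P(x) for stages s_1..s_n of a single LFSR/SR.

    Polynomials are bitmasks: bit j holds the coefficient of x^j.
    """
    w = poly.bit_length() - 1
    r, out = 1, []
    for _ in range(n):
        out.append(r)
        r <<= 1                 # multiply by x (feed the next stage)
        if (r >> w) & 1:        # degree w reached: reduce modulo P(x)
            r ^= poly
    return out


def independent(vectors):
    """Linear independence over GF(2) by incremental XOR-basis elimination."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clears b's leading bit from v if it is set
        if v == 0:
            return False
        basis.append(v)
    return True


def design_single_lfsr_sr(candidates, cones, n):
    """First candidate polynomial whose residues keep every cone independent."""
    for poly in candidates:
        r = residues(poly, n)
        if all(independent([r[i] for i in cone]) for cone in cones):
            return poly
    return None


def show(p):
    """Render a bitmask polynomial such as 0b110 as 'x^2+x'."""
    terms = ["1" if j == 0 else "x" if j == 1 else f"x^{j}"
             for j in range(p.bit_length() - 1, -1, -1) if (p >> j) & 1]
    return "+".join(terms) or "0"


cones = [(0, 1, 2), (0, 2, 3), (1, 2, 4), (1, 3, 5), (2, 4, 5)]
candidates = [0b1011, 0b1101, 0b10011, 0b11001]  # x^3+x+1, x^3+x^2+1, x^4+x+1, x^4+x^3+1
print([show(r) for r in residues(0b1011, 6)])
# ['1', 'x', 'x^2', 'x+1', 'x^2+x', 'x^2+x+1']
print(show(design_single_lfsr_sr(candidates, cones, 6)))  # x^4+x+1
```

Trying polynomials in increasing degree mirrors the construction in the text, so the first polynomial that passes gives the smallest test set among the candidates supplied.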
The LFSR stages generate $w_1$ independent test signals represented by the residues $1, x, \ldots, x^{w_1 - 1}$, respectively. The SR stages generate $n - w_1$ unique linear combinations of these $w_1$ independent test signals. Stage $s_i$ of the input register corresponds to stage $s_i$ ($i = 1, 2, \ldots, n$) of the TPG. For stage $s_i$, the residue $r_i = x^{i-1} \bmod P_1(x)$ is a unique linear combination of the residues $r_1$ through $r_{w_1}$. For example, if $r_i$ equals $x + 1$, then $r_i$ is a linear combination of the residues $r_2$ and $r_1$, and hence the test signal applied to input $\theta_i$ is a linear combination of the test signals applied to inputs $\theta_2$ and $\theta_1$. Residues $r_1$ through $r_n$ are fixed by the polynomial $P_1(x)$. We shall refer to the LFSR/SR structure described above as a single LFSR/SR, and we assume that "LFSR/SR" always refers to this structure. The TPG is represented as a $(w_1, n)$ single LFSR/SR, composed of $n$ stages and containing an LFSR of degree $w_1$. The following theorem gives the necessary and sufficient condition for exhaustive testing of the output cones of the circuit.

Theorem 7 (Barzilai83) An output cone that depends on the inputs $\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}$ will be exhaustively tested if and only if the residues $r_{i_1}, r_{i_2}, \ldots, r_{i_k}$ are linearly independent.

An output cone that depends on the inputs $\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}$ is exhaustively tested provided it is driven by a set of $k$ linearly independent test signals. Since the residues $r_{i_1}, r_{i_2}, \ldots, r_{i_k}$ represent the test signals applied to the inputs of the output cone, these residues must be linearly independent for the corresponding test signals to be linearly independent.

Figure 5.1: TPG structures: (a) LFSR/SR, in which an LFSR of $w_1$ stages is followed by an SR of $n - w_1$ stages driving the $(n, m, k)$ circuit; (b) LFSR/XOR, in which an LFSR of $w_2$ stages and an XOR network drive the $(n, m, k)$ circuit.

For the $(n, m, k)$ circuit, an LFSR/XOR [3] can be constructed as follows.
Let $w_2$ (where $k \le w_2 \le n$) be the required number of independent test signals for the LFSR/XOR structure, as determined by the design procedure in [3]. A predetermined set of $w_2$ stages of the input register (not necessarily the first $w_2$ stages) is configured as an LFSR with a primitive polynomial $P_2(x)$ of degree $w_2$. The LFSR stages generate $w_2$ independent test signals represented by the residues $1, x, \ldots, x^{w_2 - 1}$, respectively. A predetermined set of $n - w_2$ specific linear combinations of these $w_2$ independent test signals is generated by an XOR network, and these $n - w_2$ test signals are applied to the remaining $n - w_2$ inputs. Their residues are computed as the corresponding linear combinations of the residues $1, x, \ldots, x^{w_2 - 1}$. The LFSR/XOR structure is shown in Figure 5.1(b). The TPG is represented as a $(w_2, n)$ LFSR/XOR, consisting of an LFSR of degree $w_2$ and generating $n$ specific test signals for the circuit inputs. LFSR/XORs incur high area overhead due to the XOR network, but they generate minimal pseudo-exhaustive test sets by using the information about cone dependencies.

Both the LFSR/SR and the LFSR/XOR generate independent test signals from their LFSR stages. For the LFSR/SR, the linear combinations for the remaining stages are fixed by the feedback polynomial. For the LFSR/XOR, any desired linear combination of the test signals can be generated by the XOR network and assigned to the remaining inputs. This flexibility does not exist in the LFSR/SR; hence, for some circuits, LFSR/SRs generate larger test sets than LFSR/XORs. However, LFSR/SRs incur low area overhead because they avoid the XOR network.

5.2.1 Properties of LFSR/SRs

The following lemmas characterize a few important properties of single LFSR/SRs. These results help reduce the complexity of the design procedure for determining single LFSR/SRs.
Lemma 10 (Barzilai83) The residues of any $w$ consecutive stages of a $(w, n)$ single LFSR/SR are linearly independent.

Corollary 1 For a $(w, n)$ single LFSR/SR, the residues $r_{i_1+p}, r_{i_2+p}, \ldots, r_{i_w+p}$ are linearly independent for any value of $p$ if and only if the residues $r_{i_1}, r_{i_2}, \ldots, r_{i_w}$ are linearly independent.

Proof: The residue $r_{i_j+p}$ equals $(x^p \times r_{i_j}) \bmod P(x)$, for $1 \le j \le w$. The test signal represented by $r_{i_j+p}$ can be regarded as the test signal represented by $r_{i_j}$ delayed by $p$ clock cycles. This delay is common to all the test signals represented by $r_{i_1}, r_{i_2}, \ldots, r_{i_w}$, so the residues $r_{i_1+p}, r_{i_2+p}, \ldots, r_{i_w+p}$ can be regarded as delayed versions of $r_{i_1}, r_{i_2}, \ldots, r_{i_w}$. Therefore one set of residues is linearly independent if and only if the other set is. □

Lemma 11 For a $(w, 2^w - 1)$ single LFSR/SR, different primitive polynomials of degree $w$ generate different permutations of the $2^w - 1$ distinct residues for the $2^w - 1$ individual stages.

Proof: Let the single LFSR/SR be based on a primitive polynomial $P_1(x)$ of degree $w$, and let its $2^w - 1$ stages be denoted as $s_1, s_2, \ldots, s_{2^w - 1}$. The residue of stage $s_i$, given by $r_i = x^{i-1} \bmod P_1(x)$, is a polynomial of degree less than $w$. Since $P_1(x)$ is primitive, $2^w$ is the smallest exponent greater than one for which $x^{2^w} \bmod P_1(x) = x$; in other words, the residues repeat after $2^w - 1$ stages. Hence the stages of the LFSR/SR generate $2^w - 1$ distinct residues, each representing a unique linear combination of the $w$ independent test signals. An LFSR/SR based on another primitive polynomial $P_2(x)$ of degree $w$ also generates $2^w - 1$ distinct residues for its stages. For stage $s_{w+1}$, the residue $r_{w+1} = x^w \bmod P_2(x)$ differs from $x^w \bmod P_1(x)$, since $P_2(x) \neq P_1(x)$.
Hence the residues generated for the stages using $P_2(x)$ form a different permutation of the residues generated using $P_1(x)$. Thus different primitive polynomials generate different permutations of the residues. □

A $(w, n)$ single LFSR/SR designed for an $(n, m, k)$ circuit must ensure that the residue sets of all $m$ output cones are linearly independent. The lemmas above reduce the number of residue sets that need to be checked. Lemma 10 guarantees the linear independence of the residue set of any output cone driven by $k$ or fewer consecutive inputs. A residue set $\{r_{i_1}, r_{i_2}, \ldots, r_{i_w}\}$ (where $i_1 < i_2 < \cdots < i_w$) can be normalized to the residue set $\{r_1, r_{i_2 - i_1 + 1}, \ldots, r_{i_w - i_1 + 1}\}$. Normalization reduces the number of unique residue sets that need to be checked, and by Corollary 1, ensuring the linear independence of all normalized residue sets ensures the linear independence of all the residue sets of the $m$ output cones.

For a degree $w > k$, every primitive polynomial needs to be considered, since by Lemma 11 each polynomial generates a unique permutation of the residues. The residue assignment for the circuit inputs is fixed by the feedback polynomial $P(x)$, so some of the circuit output cones may not be exhaustively tested because of linear dependencies arising from this fixed assignment.

Example 7 illustrates the application of the lemmas characterizing the properties of LFSR/SRs, as well as the construction of LFSR/SR and LFSR/XOR structures.

Figure 5.2: A (6,5,3) circuit with outputs depending on $(\theta_1, \theta_2, \theta_3)$, $(\theta_1, \theta_3, \theta_4)$, $(\theta_2, \theta_3, \theta_5)$, $(\theta_2, \theta_4, \theta_6)$ and $(\theta_3, \theta_5, \theta_6)$.

Example 7 Consider the (6,5,3) circuit shown in Figure 5.2. The six inputs are denoted by $\theta_1$ through $\theta_6$ and the five outputs by $O_1$ through $O_5$, respectively.
The input dependencies of the five outputs are $\{\theta_1, \theta_2, \theta_3\}$, $\{\theta_1, \theta_3, \theta_4\}$, $\{\theta_2, \theta_3, \theta_5\}$, $\{\theta_2, \theta_4, \theta_6\}$ and $\{\theta_3, \theta_5, \theta_6\}$, respectively. Let the inputs be driven by an input register whose stages are denoted by $s_1$ through $s_6$. A single LFSR/SR for the circuit can be determined as follows.

Output $O_1$ is driven by three consecutive inputs and hence is guaranteed to be exhaustively tested (Lemma 10). The residue sets for $O_2$ and $O_5$ both normalize to the set $\{r_1, r_3, r_4\}$. Hence only the residue sets for $O_2$, $O_3$ and $O_4$ need to be checked for linear independence (Corollary 1); the residue sets for $O_1$ and $O_5$ need not be checked. The two primitive polynomials of degree three, $P_1(x) = x^3 + x + 1$ and $P_2(x) = x^3 + x^2 + 1$, must both be considered for the single LFSR/SR design, as they generate different permutations of residues (Lemma 11).

Figure 5.3: TPG structures (a) LFSR/SR and (b) LFSR/XOR. In (a), stages $s_1$ through $s_6$ carry the residues $1, x, x^2, x^3, x+1, x^2+x$ and drive inputs $\theta_1$ through $\theta_6$; in (b), stages $s_1$ through $s_3$ carry $1, x, x^2$ and an XOR network supplies $x+1$, $1$ and $x^2+x$ to inputs $\theta_4$ through $\theta_6$.

Consider a (3,6) LFSR/SR based on the primitive polynomial $P_1(x) = x^3 + x + 1$. Inputs $\theta_1$ through $\theta_6$ are driven by six unique test signals represented by the residues $1$, $x$, $x^2$, $x+1$, $x^2+x$ and $x^2+x+1$, respectively. The residue set for $O_3$, $\{x, x^2, x^2+x\}$, is linearly dependent. Hence the single LFSR/SR based on $P_1(x)$ cannot test $O_3$ exhaustively.

Consider a (3,6) LFSR/SR based on the primitive polynomial $P_2(x) = x^3 + x^2 + 1$. Inputs $\theta_1$ through $\theta_6$ are driven by six unique test signals represented by the residues $1$, $x$, $x^2$, $x^2+1$, $x^2+x+1$ and $x+1$, respectively. The residue set for $O_2$, $\{1, x^2, x^2+1\}$, is linearly dependent. Hence the single LFSR/SR based on $P_2(x)$ cannot test $O_2$ or $O_5$ exhaustively.
Since $P_1(x)$ and $P_2(x)$ are the only two primitive polynomials of degree three, no (3,6) single LFSR/SR can serve as a TPG for this circuit.

Consider a (4,6) LFSR/SR based on the primitive polynomial $P_3(x) = x^4 + x + 1$. Inputs $\theta_1$ through $\theta_6$ are driven by six unique test signals represented by the residues $1$, $x$, $x^2$, $x^3$, $x+1$ and $x^2+x$, respectively. The residue sets for $O_2$, $O_3$ and $O_4$, namely $\{1, x^2, x^3\}$, $\{x, x^2, x+1\}$ and $\{x, x^3, x^2+x\}$, are linearly independent. Hence the single LFSR/SR based on $P_3(x)$ exhaustively tests all five outputs of the circuit. The single LFSR/SR structure is shown in Figure 5.3(a).

Consider a (3,6) LFSR/XOR based on the primitive polynomial $P_1(x) = x^3 + x + 1$. Inputs $\theta_1$, $\theta_2$ and $\theta_3$ are driven by three unique test signals represented by the residues $1$, $x$ and $x^2$, respectively. Inputs $\theta_4$, $\theta_5$ and $\theta_6$ are driven by three specific linear combinations given by the residues $x+1$, $1$ and $x^2+x$, respectively. The residue sets for the outputs $O_1$ through $O_5$, namely $\{1, x, x^2\}$, $\{1, x^2, x+1\}$, $\{x, x^2, 1\}$, $\{x, x+1, x^2+x\}$ and $\{x^2, 1, x^2+x\}$, are linearly independent. Hence the LFSR/XOR based on $P_1(x)$ exhaustively tests all five outputs of the circuit. The LFSR/XOR structure is shown in Figure 5.3(b).

Note that the LFSR/SR and LFSR/XOR in Figure 5.3 are based on primitive polynomials of degree four and three, respectively. Hence the LFSR/SR generates a test set of sixteen patterns, while the LFSR/XOR generates an optimal test set of only eight patterns. However, the LFSR/XOR does not use all the stages of the input register and incurs hardware overhead due to the XOR network. □

5.2.2 Operations on LFSR/SRs

Design operations such as reconfiguration of feedbacks, permutation of stages and sharing of test signals can be applied to single LFSR/SRs to obtain reconfigurable LFSR/SRs, permuted LFSR/SRs and sharing LFSR/SRs, respectively.
These operations can avoid linear dependencies in the residue sets of the output cones, but they increase hardware overhead. We next describe the TPG structures that result from applying these operations to single LFSR/SRs.

5.2.2.1 Reconfigurable LFSR/SRs

A reconfigurable LFSR/SR [39] can reconfigure its feedback taps to realize different primitive polynomials in different test sessions. For a single LFSR/SR, one feedback polynomial is selected such that the residue sets of all cones are linearly independent. For a reconfigurable LFSR/SR, a minimal set of feedback polynomials is selected such that the residue set of each cone is linearly independent for at least one of the polynomials. The feedback taps are reconfigured using multiplexers to realize a different primitive polynomial during each test session. A subset of the output cones is exhaustively tested during each session, and every cone is exhaustively tested in at least one session. The reconfiguration hardware overhead can be minimized by judiciously selecting polynomials that share feedback taps.

Example 8 For the (6,5,3) circuit shown in Figure 5.2, a single LFSR/SR based on $P_1(x) = x^3 + x + 1$ exhaustively tests the outputs $O_1$, $O_2$, $O_4$ and $O_5$. Similarly, a single LFSR/SR based on $P_2(x) = x^3 + x^2 + 1$ exhaustively tests the outputs $O_1$, $O_3$ and $O_4$. A reconfigurable LFSR/SR based on the polynomials $P_1(x)$ and $P_2(x)$ is shown in Figure 5.4. Polynomials $P_1(x)$ and $P_2(x)$ are realized during the first and second test sessions, respectively. This arrangement ensures that every output is exhaustively tested at least once. The TPG generates a test set of $2 \times 2^3 = 16$ patterns. □

Figure 5.4: Reconfigurable LFSR/SR driving inputs $\theta_1$ through $\theta_6$.

5.2.2.2 Permuted LFSR/SRs

A permuted LFSR/SR [22] is essentially a single LFSR/SR whose stages form a permutation of the stages of the input register.
In other words, stage s_i of the TPG need not drive input θ_i of the circuit. The inputs are permuted such that the residue sets for all cones are linearly independent. A single LFSR/SR that could not be used because of the fixed assignment of residues may lead to an acceptable solution after the residues are reassigned. The permutation of the inputs can result in hardware overhead due to routing among the TPG stages.

Example 9. A permuted LFSR/SR based on the primitive polynomial P1(x): x^3 + x + 1 can be determined for the (6,5,3) circuit in Figure 5.2 as follows. Residues are assigned to the inputs such that the residue sets for all outputs are linearly independent. The residue assignment for the inputs and the ordering of the stages for the (3,6) permuted LFSR/SR are shown in Figure 5.5. Stages s1 through s4 drive inputs θ1 through θ4 respectively. Stages s5 and s6 drive inputs θ6 and θ5 respectively. The outputs are exhaustively tested with 2^3 = 8 patterns. □

Figure 5.5: Permuted LFSR/SRs

5.2.2.3 Sharing LFSR/SRs

A sharing LFSR/SR [43] allows test signals to be shared among different inputs. In other words, a residue can be assigned to more than one input. Unrelated inputs (inputs that do not feed a common output cone) are assigned the same residues, and identical test signals are applied to them.

Example 10. For the (6,5,3) circuit shown in Figure 5.2, a possible residue assignment for the inputs (allowing sharing of residues) is shown in Figure 5.6(a). Inputs θ1 and θ5 are unrelated and hence share the residue r1. The single LFSR/SR shown in Figure 5.6(a) has only five stages and is modified to the (3,6) sharing LFSR/SR shown in Figure 5.6(b). Stages s1 and s5 generate identical test signals. The (3,6) sharing LFSR/SR can exhaustively test all outputs with 2^3 = 8 patterns. □

The operations discussed above enhance the capabilities of single LFSR/SRs only to a limited extent.
In the next section we shall describe a new TPG design, called the convolved LFSR/SR, which is a powerful extension of the single LFSR/SR. For simplicity of presentation, we shall discuss convolved LFSR/SRs without reconfiguration of feedbacks, permutation of stages and sharing of test signals. Nevertheless, convolved LFSR/SRs can also be constructed using these design operations.

Figure 5.6: Sharing LFSR/SRs: (a) five-stage single LFSR/SR in which θ1 and θ5 share residue r1; (b) (3,6) sharing LFSR/SR with residues 1, x, x^2, x + 1, 1 and x^2 + x on inputs θ1 through θ6.

5.3 Convolved LFSR/SRs

A convolved LFSR/SR is derived from a single LFSR/SR design. Circuit inputs are sequentially assigned residues generated by the successive stages of a single LFSR/SR. During the assignment process, it may not be possible to assign a residue to an input because of linear dependencies for some output. Stages whose residues give rise to linear dependencies are skipped, as shown in Figure 5.7(a). This single LFSR/SR ensures linear independence for all outputs but has more stages than the input register. The extra stages required by the single LFSR/SR can be avoided by using XOR gates, as shown in Figure 5.7(b). The resulting structure is referred to as a convolved LFSR/SR.

Figure 5.7: Convolved LFSR/SR: (a) Residue assignment for inputs, with skipped residues on extra stages; (b) TPG stages, with shift register segments separated by a feed forward stage.

A (w,n) convolved LFSR/SR for an (n,m,k) circuit can be designed as follows. Residues generated by a single LFSR/SR of degree w are considered for assignment to the circuit inputs. The inputs are assigned residues one at a time, avoiding linear dependencies with already assigned residues.
Let the inputs θ1 through θ_i be assigned the residues r1 through r_i respectively. Stages s1 through s_i of the convolved LFSR/SR are identical to those of the single LFSR/SR. Assume that input θ_{i+1} cannot be assigned any of the residues r_{i+1}, r_{i+2}, ..., r_{i+j−1} because all of them are linearly dependent on already assigned residues for some output. Residue r_{i+j} is then selected for assignment to input θ_{i+1}, as shown in Figure 5.7(a). The single LFSR/SR requires (j − 1) extra stages (shown as shaded stages in the figure) whose residues are not assigned to inputs. These extra stages are avoided in the convolved LFSR/SR design. Stage s_{i+1} of the convolved LFSR/SR is made to generate the residue r_{i+j} by feeding the residue r_{i+j−1} at its input. Let the residue r_{i+j−1} be a linear combination of the residues r_{i1}, r_{i2}, ..., r_{ik}. Stages s_{i1}, s_{i2}, ..., s_{ik} generate the residues r_{i1}, r_{i2}, ..., r_{ik} respectively. These residues are combined using XOR gates and fed at the input of stage s_{i+1}, as shown in Figure 5.7(b). The stage s_{i+1} is referred to as a feed forward stage, and the maximal set of contiguous stages (stages s1 through s_i) is referred to as a shift register segment. The assignment process is continued until all the inputs are assigned residues such that the residue sets for all output cones are linearly independent. The stages between the feed forward stages form shift register segments. The area overhead of the convolved LFSR/SR design is given by the number and size of the XOR gates used to realize the individual feed forward stages.

Example 11. Let us design a (3,6) convolved LFSR/SR for the example circuit in Figure 5.2. Consider the residues from a single LFSR/SR based on P1(x): x^3 + x + 1. Residues r1 through r4 are assigned to inputs θ1 through θ4 without any linear dependence problem. Residue r5 cannot be assigned to input θ5, since the residue set {r2, r3, r5} = {x, x^2, x^2 + x} for output O3 is linearly dependent.
Skipping residue r5, inputs θ5 and θ6 are assigned residues r6 and r7 respectively, as shown in Figure 5.8(a). This assignment ensures linear independence of the residue sets for all five outputs. The extra stage required by the single LFSR/SR can be avoided by using a two-input XOR gate in the convolved LFSR/SR. Stage s5 of the convolved LFSR/SR generates the residue r6 if r5 is fed at its input. Residue r5 = x^2 + x is obtained by combining the residues r2 = x and r3 = x^2. Hence the linear combination of the outputs of stages s2 and s3 is fed as input to stage s5, as shown in Figure 5.8(b). The convolved LFSR/SR can exhaustively test all outputs with 2^3 = 8 patterns. □

Figure 5.8: Convolved LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages, with residues 1, x, x^2, x + 1, x^2 + x + 1 and x^2 + 1 driving inputs θ1 through θ6.

Theorem 8. A (w, n) convolved LFSR/SR exists for generating pseudo-exhaustive tests for an (n, m, k) circuit if and only if there exists a (w, n) LFSR/XOR for the circuit.

Proof: (If) Consider any (w, n) LFSR/XOR designed for the (n, m, k) circuit. The LFSR/XOR can be transformed to an equivalent (w, n) convolved LFSR/SR as follows. The residues of the test signals generated by the LFSR/XOR stages are determined from the XOR network. These residues can be generated by the corresponding stages of the convolved LFSR/SR by making some stages feed forward stages if necessary. In constructing the convolved LFSR/SR, design operations such as sharing of residues and permutation of inputs may be required. (Only if) It is sufficient to show that any convolved LFSR/SR can be transformed to an equivalent LFSR/XOR. The residues of the convolved LFSR/SR stages are determined from the TPG structure.
The XOR network for the LFSR/XOR can be designed such that the corresponding stages of the LFSR/XOR generate the assigned residues. □

Convolved LFSR/SRs have great potential to generate minimal test sets. They bridge the gap between LFSR/SRs and LFSR/XORs. A trivial convolved LFSR/SR is one with no feed forward XOR gates, which is simply an LFSR/SR. At the other extreme, any stage of a convolved LFSR/SR can be made a feed forward stage that generates any desired residue using XOR gates, much like an LFSR/XOR. Typically, convolved LFSR/SRs achieve low test lengths like LFSR/XORs and incur low area overhead like LFSR/SRs. Linear independence for the outputs is assured by adding XOR gates to stages whenever necessary, while most of the stages still form shift register segments, thereby avoiding high area overhead.

5.4 Multiple LFSR/SRs

A multiple LFSR/SR is a special case of a convolved LFSR/SR. It is composed of two or more independent single LFSR/SRs that are run in parallel. The single LFSR/SRs have identical feedback polynomials but may have different shift register lengths and initial seeds.

Figure 5.9: Multiple LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages.

A multiple LFSR/SR is derived from a single LFSR/SR. A multiple LFSR/SR for an (n,m,k) circuit can be determined as follows. Consider the residue assignment for the inputs shown in Figure 5.9(a), which ensures linear independence for all output cones. Let both n1 and n2 be greater than or equal to w, with n1 + n2 = n. Residues r1 through r_{n1} are assigned to inputs θ1 through θ_{n1} respectively.
Residues r_{n1+1} through r_{j−1} are skipped and not assigned to any input. Residues r_j through r_{j+n2−1} are assigned to inputs θ_{n1+1} through θ_n respectively. The single LFSR/SR requires (j − n1 − 1) extra stages (shown as shaded stages in the figure) whose residues are not assigned to inputs. These extra stages are avoided in the multiple LFSR/SR design. Stages s1 through s_{n1} of the input register are modified to a (w, n1) single LFSR/SR. Stages s_{n1+1} through s_n of the input register are modified to a (w, n2) single LFSR/SR. A (w, n) multiple LFSR/SR composed of the (w, n1) and (w, n2) single LFSR/SRs is shown in Figure 5.9(b). Both single LFSR/SRs have identical LFSRs of degree w. A residue assigned to an input can be expressed as a linear combination of the residues r1, r2, ..., r_w. The residues r1, r2, ..., r_w are generated by the LFSR stages of the first single LFSR/SR. Let an initial seed S be applied to the LFSR portion of the first single LFSR/SR. From this seed, the initial contents of the rest of the stages of the multiple LFSR/SR can be determined. For example, if an input θ_i is assigned a residue which is a linear combination of the residues r_{i1}, r_{i2}, ..., r_{ik} (where i1, i2, ..., ik ≤ w), then the initial content of stage s_i is the same linear combination of the initial contents of the stages s_{i1}, s_{i2}, ..., s_{ik}. This initialization ensures that stage s_i of the multiple LFSR/SR generates the residue assigned to input θ_i.

Example 12. A multiple LFSR/SR for the (6,5,3) circuit shown in Figure 5.2 can be designed as follows. The residue assignment for the inputs shown in Figure 5.10(a) satisfies the linear independence of the residue sets for all outputs. We can construct a multiple LFSR/SR composed of two (3,3) single LFSR/SRs, as shown in Figure 5.10(b). Both single LFSR/SRs are based on the same primitive polynomial x^3 + x + 1. The stages of the first LFSR/SR generate the residues r1, r2 and r3.
Let an initial seed S1 = 100 be applied to the first LFSR/SR. The initial seed S2 for the second LFSR/SR is determined such that its stages generate the residues r5, r6 and r7. Input θ4 is assigned the residue r5, which is the linear combination of the residues r2 and r3. Hence the initial content of stage s4 is zero, which is the corresponding linear combination of the initial contents of the stages s2 and s3. Likewise, r6 = x^2 + x + 1 and r7 = x^2 + 1 both contain the term r1 = 1, so stages s5 and s6 start at 1. Thus the initial seed of the second LFSR/SR is S2 = 011. The stages of the multiple LFSR/SR generate the assigned residues shown in Figure 5.10(a). This multiple LFSR/SR generates 2^3 = 8 patterns to exhaustively test all five outputs. □

Figure 5.10: Multiple LFSR/SR: (a) Residue assignment for inputs; (b) TPG stages, with residues 1, x, x^2, x^2 + x, x^2 + x + 1 and x^2 + 1 driving θ1 through θ6, and seeds 1 0 0 0 1 1.

The following theorem characterizes the relation between multiple LFSR/SRs and convolved LFSR/SRs.

Theorem 9. A (w, n) multiple LFSR/SR exists for generating pseudo-exhaustive tests for an (n, m, k) circuit if and only if there exists a (w, n) convolved LFSR/SR for the circuit in which the length of each shift register segment is at least w.

Proof: (If) For the (w, n) convolved LFSR/SR, since the shift register segments are of length at least w, each of them can be modified into an independent single LFSR/SR with the same feedback polynomial as that of the convolved LFSR/SR. The initial seeds for the single LFSR/SRs are the same as the initial seeds of the shift register segments of the convolved LFSR/SR. (Only if) It is sufficient to show that any multiple LFSR/SR can be transformed to an equivalent convolved LFSR/SR. The independent single LFSR/SRs of the (w, n) multiple LFSR/SR can be modified to form shift register segments of a (w, n) convolved LFSR/SR.
The initial seed of the multiple LFSR/SR determines the residues generated by the individual stages. For the convolved LFSR/SR, the feed forward stages are made to generate their assigned residues by using XOR networks. The remaining stages of the convolved LFSR/SR automatically generate their respective assigned residues. Each shift register segment of the resulting convolved LFSR/SR has at least w stages. □

Multiple LFSR/SRs use XOR networks to realize the feedback polynomials of the individual single LFSR/SRs. In contrast, convolved LFSR/SRs use XOR networks to realize the feed forward stages. The hardware overhead due to the XOR network is the deciding factor when selecting either a multiple LFSR/SR or a convolved LFSR/SR. Because of the restriction on the length of the shift register segments, it is usually hard to construct a multiple LFSR/SR that generates a test set of the same size as that of a convolved LFSR/SR.

5.5 Design Procedure

For an (n,m,k) circuit, a (w,n) convolved LFSR/SR can be designed as follows. A primitive polynomial P(x) of degree w is selected as the feedback polynomial. The polynomial can generate (2^w − 1) unique residues r1 through r_{2^w−1}, and all of them can be considered for assignment to the inputs. The number of possible residues grows exponentially with the degree of the polynomial, so in practice only a few residues, say r1 through r_N, are considered for input assignment. This practical consideration limits the convolved LFSR/SR designs. The shift register segments are constrained to have a desired minimum length, say l. This constraint attempts to reduce the number of feed forward stages and hence the area overhead due to the XOR network. A (w,n) multiple LFSR/SR can be obtained by requiring l ≥ w, and a (w,n) single LFSR/SR can be obtained by requiring l = n.
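The search outlined above (formalized in Procedure residues) can be sketched as a recursive depth-first search. This is our reading of the procedure, not the thesis's implementation: residues are taken in increasing index order, every cone is checked on its already-assigned inputs after each assignment, and the minimum segment length (l in the text, `min_seg` here) is enforced whenever a new segment is started and once more at the end. The cone list is the one assumed for Figure 5.2, and all function names are hypothetical.

```python
# Assumed output cones of the (6,5,3) example circuit (inputs numbered 1..6).
CONES = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (2, 4, 6), (3, 5, 6)]


def residues(poly: int, count: int) -> list[int]:
    """[r_1, ..., r_count] with r_i = x^(i-1) mod P(x); bit i = coefficient of x^i."""
    degree = poly.bit_length() - 1
    out, r = [], 1
    for _ in range(count):
        out.append(r)
        r <<= 1
        if (r >> degree) & 1:
            r ^= poly
    return out


def independent(vectors: list[int]) -> bool:
    """Gaussian elimination over GF(2) on bitmask vectors."""
    pivots = {}
    for v in vectors:
        while v and (v.bit_length() - 1) in pivots:
            v ^= pivots[v.bit_length() - 1]
        if v == 0:
            return False
        pivots[v.bit_length() - 1] = v
    return True


def last_run(idx: list[int]) -> int:
    """Length of the last shift register segment (run of consecutive indices)."""
    run = 1
    while run < len(idx) and idx[-run] == idx[-run - 1] + 1:
        run += 1
    return run


def assign_residues(cones, n: int, poly: int, N: int, min_seg: int):
    """Return [j_1, ..., j_n] (input θ_i gets r_{j_i}), or None on failure."""
    w = poly.bit_length() - 1
    r = residues(poly, N)                       # r[j - 1] is r_j

    def ok(idx):
        # each cone's residues on the inputs assigned so far must be independent
        k = len(idx)
        return all(independent([r[idx[i - 1] - 1] for i in cone if i <= k])
                   for cone in cones)

    def search(idx):
        if len(idx) == n:
            return idx if last_run(idx) >= min_seg else None
        # stop early enough to leave one residue for every remaining input
        for j in range(idx[-1] + 1, N - n + len(idx) + 2):
            if j != idx[-1] + 1 and last_run(idx) < min_seg:
                break                           # would close a too-short segment
            if ok(idx + [j]):
                found = search(idx + [j])
                if found:
                    return found
        return None                             # backtrack

    start = list(range(1, w + 1))               # θ_1..θ_w get r_1..r_w
    return search(start) if n >= w and ok(start) else None


P1 = 0b1011                                     # x^3 + x + 1, so N = 2^3 - 1 = 7
print(assign_residues(CONES, 6, P1, N=7, min_seg=1))  # [1, 2, 3, 4, 6, 7]
print(assign_residues(CONES, 6, P1, N=7, min_seg=3))  # [1, 2, 3, 5, 6, 7]
print(assign_residues(CONES, 6, P1, N=7, min_seg=6))  # None
```

With `min_seg = 1` the sketch reproduces the assignment of Example 11, with `min_seg = w = 3` it yields the multiple LFSR/SR assignment of Example 12, and with `min_seg = n = 6` it fails, consistent with the observation that no (3,6) single LFSR/SR based on P1(x) exists for this circuit.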
Procedure TPG
Input:  Input dependencies for all the outputs of the (n, m, k) circuit;
        a primitive polynomial P(x) of degree w ≥ k;
        (l) :: minimum length requirement for the shift register segments;
        (N) :: maximum number of residues considered for assignment.
Output: Residue assignment for the inputs;
        initial seeds for the shift register segments;
        XOR network for the feed forward stages.
1. residues (); /* determines residues for the inputs such that the sets of residues for all outputs are linearly independent */
2. seeds (); /* determines the seeds for the TPG stages such that the inputs are driven by the test signals that represent the corresponding assigned residues */
3. network (); /* determines the XOR network that implements the linear combinations of the test signals for the feed forward stages */

Procedure residues ()
1. Assign residues r1 through r_w to inputs θ1 through θ_w.
2. i ← (w + 1); j ← (w + 1). /* residue r_j is assigned to input θ_i */
3. While not all inputs are assigned residues do
   (a) If (i = w) and (j = w + 1), exit with failure.
   (b) If (N − j) < (n − i) /* not enough residues left for the remaining inputs */
       then {i ← i'; j ← (j' + 1)} (where i' = i − 1 and r_{j'} is the residue assigned to input θ_{i'}).
   (c) Assign residue r_j to input θ_i.
   (d) If the assigned residues are linearly independent for all outputs, then i ← (i + 1).
   (e) j ← (j + 1).
   (f) If the last shift register segment length < l /* requirements not met */
       then {i ← i''; j ← (j'' + 1)} (where i'' is the first input in the last shift register segment and r_{j''} is the residue assigned to input θ_{i''}).
4. Print the residue assignment for the inputs.

Procedure seeds ()
1. For every input θ_i do
   (a) Determine the residue r assigned to input θ_i.
   (b) If r contains the term r1 (= 1), initialize stage s_i to 1; else initialize stage s_i to 0.
2. Print the initial values of all stages.

Procedure network ()
1. For every feed forward stage s_i do
   (a) Determine the residue r_{i'} that should appear at the input of stage s_i.
   (b) R ← ∅. /* R contains the collection of residues used to realize r_{i'} */
   (c) While (r_{i'} ≠ 0) do
       i. Determine the residue r_{j'} of a stage j (< i) that differs from the residue r_{i'} in the minimum number of terms.
       ii. R ← R ∪ {r_{j'}}; r_{i'} ← r_{i'} ⊕ r_{j'}.
   (d) Combine the residues in R using two-input XOR gates.

5.5.1 Determination of Residues

The iterative search procedure progressively assigns residues to the inputs. The linear independence of the residues for all outputs is checked after every assignment. Backtracking occurs whenever there are not enough residues left for the unassigned inputs or a shift register segment becomes shorter than l. The procedure terminates once all inputs are assigned residues. Alternate primitive polynomials must be tried if a convolved LFSR/SR cannot be determined with P(x). A convolved LFSR/SR generates the minimum test set only if the degree of the selected polynomial equals k. Note that if the TPG designed by this procedure requires a test length greater than 2^k, then operations such as reconfiguration, permutation and sharing may be used to obtain a TPG with a lower test length.

5.5.2 Determination of Seeds

The first w stages generate independent signals, and these stages are initialized with the seed 100...0. All other stages generate linear combinations of these signals as given by their assigned residues. For stage s_i, if the assigned residue is a linear combination of the residues r_{i1}, r_{i2}, ..., r_{ik} (where i1, i2, ..., ik ≤ w), then the initial content of stage s_i is the same linear combination of the initial contents of the stages s_{i1}, s_{i2}, ..., s_{ik}. Among the stages s1, s2, ..., s_w, only stage s1 has a non-zero initial content. Hence stage s_i is initialized to a non-zero value only if its assigned residue contains the term r1 = 1.
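The seed rule above, together with Procedure network, can be illustrated on the convolved LFSR/SR of Example 11. In this sketch (hypothetical names; residues are GF(2) bitmasks with bit i holding the coefficient of x^i), each stage's seed bit is the constant term of its residue, and the feed forward network is chosen greedily: at each step the earlier stage whose residue differs from the remaining target in the fewest terms is XORed in. The loop always terminates because the first w stages carry 1, x, ..., x^{w−1}, so some choice removes at least one term. Following Section 5.3, the input residue of a feed forward stage assigned r_j is taken to be r_{j−1}.

```python
def residues(poly: int, count: int) -> list[int]:
    """[r_1, ..., r_count] with r_i = x^(i-1) mod P(x); bit i = coefficient of x^i."""
    degree = poly.bit_length() - 1
    out, r = [], 1
    for _ in range(count):
        out.append(r)
        r <<= 1
        if (r >> degree) & 1:
            r ^= poly
    return out


def seeds(stage_residues: list[int]) -> list[int]:
    """Procedure seeds: with s_1..s_w seeded 100...0, each stage starts at the
    coefficient of r_1 = 1 in its residue."""
    return [res & 1 for res in stage_residues]


def network(target: int, earlier: list[int]) -> list[int]:
    """Greedy sketch of Procedure network for one feed forward stage.

    target  -- residue required at the stage input (bitmask)
    earlier -- residues of the preceding stages s_1, s_2, ... in order
    Returns the 1-based stage numbers XORed together (len - 1 two-input gates).
    """
    chosen = []
    while target:
        # earlier residue that differs from the target in the fewest terms
        q = min(range(len(earlier)),
                key=lambda i: bin(target ^ earlier[i]).count("1"))
        chosen.append(q + 1)
        target ^= earlier[q]
    return chosen


P1 = 0b1011                                   # x^3 + x + 1
r = residues(P1, 7)                           # r[j - 1] is r_j
assigned = [1, 2, 3, 4, 6, 7]                 # residue indices from Example 11
stage_res = [r[j - 1] for j in assigned]
print(seeds(stage_res))                       # [1, 0, 0, 1, 1, 1]
for i in range(1, len(assigned)):
    if assigned[i] != assigned[i - 1] + 1:    # stage s_{i+1} is a feed forward stage
        taps = network(r[assigned[i] - 2], stage_res[:i])
        print(f"s{i + 1} <- XOR of stages {taps}")  # s5 <- XOR of stages [2, 3]
```

The greedy result matches Figure 5.8(b): stage s5 is fed by s2 ⊕ s3 using one two-input XOR gate. Because the choice is greedy, the gate count it reports is only an upper bound in general.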
The initial seed determined by this method ensures that the TPG stages generate their respective assigned residues.

5.5.3 Determination of XOR Network

The feedback and the feed forward stages need XOR gates to realize their assigned residues. The XOR network required by the feedback stage is specified by its feedback taps. The XOR network for a feed forward stage s_i is determined as follows. From the residue assigned to input θ_i, the residue (say r_{i'}) of the test signal that needs to be fed at the input of stage s_i is determined. This residue r_{i'} is obtained as the linear sum of a small number of residues from the earlier stages, chosen greedily. Procedure network therefore gives an upper bound on the number of two-input XOR gates needed to realize the feed forward stages. Multiple LFSR/SRs can be realized if the shift register segments are of length at least equal to the degree of the feedback polynomial P(x). In that case, each shift register segment is transformed into an independent single LFSR/SR with P(x) as the feedback polynomial.

5.6 Experimental Results

We have designed various pseudo-exhaustive TPGs for the ISCAS combinational benchmark circuits [7]. The circuits were partitioned using our partitioning procedure [34] such that the output cones are restricted to twenty or fewer inputs. The test lengths for the TPGs are tabulated in Table 5.1. The first three columns of the table describe the characteristics of the benchmark circuits before and after the application of the partitioning procedure. Three circuit-specific TPGs (convolved LFSR/SRs, multiple LFSR/SRs and single LFSR/SRs) were designed for the partitioned circuits. The last three columns of the table give the pseudo-exhaustive test lengths for these TPG designs.
Ckt     (n,m,k)          (n,m,k)          Test length
        before           after            Convolved    Multiple    Single
        partitioning     partitioning     LFSR/SR      LFSR/SR     LFSR/SR
c432    (36,7,36)        (56,27,20)       2^20         2^20        2^20
c499    (41,32,41)       (49,40,14)       2^14         2^14        2^15
c880    (60,26,45)       (70,36,17)       2^17         2^17        2^17
c1355   (41,32,41)       (49,40,14)       2^14         2^14        2^15
c1908   (33,25,33)       (47,39,20)       2^20         2^20        2^21
c2670   (233,140,120)    (262,169,20)     2^20         2^20        2^20
c3540   (50,22,50)       (108,80,20)      2^20         2^20        2^22
c5315   (178,123,67)     (215,160,20)     2^20         2^20        2^22
c6288   (32,31,32)       (99,98,20)       2^20         2^20        -
c7552   (207,108,194)    (286,187,20)     2^20         2^20        2^22
Table 5.1: Test length for partitioned benchmark circuits

The TPGs effectively utilize the information about the circuit output cone dependencies. For each circuit, only two primitive polynomials were considered while designing convolved LFSR/SRs and multiple LFSR/SRs, while up to 100 primitive polynomials were considered while designing single LFSR/SRs. Both convolved LFSR/SRs and multiple LFSR/SRs generate minimum test sets for all the partitioned circuits. For example, the partitioned c3540 circuit has a maximum dependency of 20 inputs, and hence any pseudo-exhaustive test set must contain at least 2^20 patterns. Both the convolved LFSR/SR and the multiple LFSR/SR achieve this minimum pseudo-exhaustive test length. On the other hand, the single LFSR/SR generates a test set with 2^22 patterns. LFSR/XORs can be designed by transforming the convolved LFSR/SR designs (refer to Theorem 8), and they will also generate minimum test sets for all the partitioned circuits. However, realizing the LFSR/XOR structures may require many XOR gates. Single LFSR/SRs were determined by considering only 100 primitive polynomials of each degree because of practical limitations. Due to this restriction, there may exist single LFSR/SRs for the partitioned circuits with smaller degrees than the tabulated designs.
For the partitioned c6288 circuit, no single LFSR/SR design was found in spite of trying 100 primitive polynomials of each degree from 20 to 40. The circuit contains a few outputs driven by 19 consecutive inputs and one non-consecutive input, and single LFSR/SRs failed to generate exhaustive test sets for these cones. On the other hand, both convolved LFSR/SRs and multiple LFSR/SRs generated test sets owing to their flexibility in assigning residues. Another set of convolved LFSR/SR designs for the partitioned benchmark circuits can be found in [35].

Table 5.2 presents the details of the residue assignment for the convolved LFSR/SR designs for the partitioned benchmark circuits. The first two columns provide the characteristics of the partitioned circuits. The exponents of the feedback polynomial, the residue assignment for the stages and the lengths of the shift register segments are given by the third, fourth and fifth columns respectively. For example, for the partitioned c432 circuit, the convolved LFSR/SR design is based on the feedback polynomial P(x): x^20 + x^6 + x^4 + x + 1. The residue assignment for all 56 input register stages is given by the fourth column.
Stages s1 through s36 are assigned residues r1 through r36 respectively. Stage s37 forms a feed forward stage, and stages s37 through s56 are assigned residues r49 through r68. The residue r_i is given by x^{i−1} mod P(x). The lengths of the two shift register segments are 36 and 20 respectively.

Ckt     (n,m,k)        Polynomial   Residue assignment                      SR segments
                       exponents
c432    (56,27,20)     20 6 4 1 0   1-36, 49-68                             36, 20
c499    (49,40,14)     14 5 3 1 0   1-16, 53-68, 151-167                    16, 16, 17
c880    (70,36,17)     17 3 0       1-24, 118-146, 194-210                  24, 29, 17
c1355   (49,40,14)     14 5 3 1 0   1-16, 53-68, 151-167                    16, 16, 17
c1908   (47,39,20)     20 6 4 1 0   1-20, 56-82                             20, 27
c2670   (262,169,20)   20 3 0       1-100, 103-244, 260-279                 100, 142, 20
c3540   (108,80,20)    20 3 0       1-20, 129-152, 157-176, 210-232,        20, 24, 20, 23, 21
                                    379-399
c5315   (215,160,20)   20 3 0       1-110, 121-142, 156-177, 208-227,       110, 22, 22, 20, 21, 20
                                    577-597, 683-702
c6288   (99,98,20)     20 6 4 1 0   1-20, 209-229, 272-292, 350-386         20, 21, 21, 37
c7552   (286,187,20)   20 3 0       1-49, 51-114, 133-190, 196-229,         49, 64, 58, 34, 22, 21, 38
                                    246-267, 290-310, 456-493
Table 5.2: Convolved LFSR/SR designs for partitioned benchmark circuits

All convolved LFSR/SR designs have very few shift register segments, and each segment is at least as long as the degree of the LFSR. This residue assignment therefore makes it possible to realize multiple LFSR/SRs for the circuits.

Table 5.3 presents the details of the single LFSR/SR designs for the partitioned benchmark circuits. The feedback polynomials and the residue assignments are given by the third and fourth columns respectively. The hardware overhead of these designs is given by the number of 2-input XOR gates required to realize the LFSR structure. For example, for the partitioned c432 circuit, seven XOR gates are needed to realize the LFSR of degree 20. The comparison of test lengths and hardware overhead among the TPG designs is given in Table 5.4. Columns 3, 5 and 7 present the pseudo-exhaustive test lengths for the TPG designs.
The hardware overhead, in terms of the 2-input XOR gates required to realize the TPG designs, is given in columns 4, 6 and 8 respectively.

Ckt     (n,m,k)        Polynomial exponents       Residue assignment
c432    (56,27,20)     20 11 7 5 4 3 2 1 0        1-56
c499    (49,40,14)     15 9 8 6 5 3 0             1-49
c880    (70,36,17)     17 10 8 6 5 3 2 1 0        1-70
c1355   (49,40,14)     15 9 8 6 5 3 0             1-49
c1908   (47,39,20)     21 8 6 3 2 1 0             1-47
c2670   (262,169,20)   20 9 5 4 0                 1-262
c3540   (108,80,20)    22 11 10 7 4 2 0           1-108
c5315   (215,160,20)   22 9 7 5 3 2 0             1-215
c6288   (99,98,20)     -                          -
c7552   (286,187,20)   22 9 8 6 5 3 0             1-286
Table 5.3: Single LFSR/SR designs for partitioned benchmark circuits

Ckt     (n,m,k)        Convolved       Multiple        Single
                       TL      XOR     TL      XOR     TL      XOR
c432    (56,27,20)     2^20    6       2^20    6       2^20    7
c499    (49,40,14)     2^14    10      2^14    9       2^15    5
c880    (70,36,17)     2^17    7       2^17    3       2^17    7
c1355   (49,40,14)     2^14    10      2^14    9       2^15    5
c1908   (47,39,20)     2^20    10      2^20    6       2^21    5
c2670   (262,169,20)   2^20    3       2^20    3       2^20    3
c3540   (108,80,20)    2^20    11      2^20    5       2^22    5
c5315   (215,160,20)   2^20    7       2^20    6       2^22    5
c6288   (99,98,20)     2^20    18      2^20    12      -       -
c7552   (286,187,20)   2^20    11      2^20    7       2^22    5
Table 5.4: Comparison among circuit-specific TPG designs (TL: test length; XOR: number of 2-input XOR gates)

For example, the number of XOR gates used in the TPG designs for the partitioned c432 circuit is computed as follows. The convolved LFSR/SR residue assignment for c432 comprises two shift register segments, with stage s37 forming a feed forward stage. This stage can be realized as the linear sum of the test signals from the outputs of stages 7, 9, 15 and 19 using three XOR gates. The LFSR requires three XOR gates, and hence the convolved LFSR/SR requires six XOR gates. For this circuit, a multiple LFSR/SR can be realized by transforming the two shift register segments into two independent single LFSR/SRs. Since each LFSR requires three XOR gates, the multiple LFSR/SR design also requires six XOR gates. On the other hand, a single LFSR/SR design requires seven XOR gates for this circuit.
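The XOR counts for the multiple and single LFSR/SRs in Table 5.4 can be re-derived from Tables 5.2 and 5.3. This sketch assumes that an LFSR whose feedback polynomial has t nonzero terms needs t − 2 two-input XOR gates (which reproduces the three gates quoted above for x^20 + x^6 + x^4 + x + 1 and the seven for the degree-20 single LFSR/SR), and that a multiple LFSR/SR uses one such LFSR per shift register segment. The dictionaries are transcriptions of the tables and the function names are hypothetical; the convolved XOR counts are not recomputed because they depend on feed forward networks that the tables do not list.

```python
# Feedback-polynomial exponents and shift register segment lengths transcribed
# from Table 5.2 (convolved designs) and Table 5.3 (single designs).
CONVOLVED = {
    "c432": ([20, 6, 4, 1, 0], [36, 20]),
    "c499": ([14, 5, 3, 1, 0], [16, 16, 17]),
    "c880": ([17, 3, 0], [24, 29, 17]),
    "c1355": ([14, 5, 3, 1, 0], [16, 16, 17]),
    "c1908": ([20, 6, 4, 1, 0], [20, 27]),
    "c2670": ([20, 3, 0], [100, 142, 20]),
    "c3540": ([20, 3, 0], [20, 24, 20, 23, 21]),
    "c5315": ([20, 3, 0], [110, 22, 22, 20, 21, 20]),
    "c6288": ([20, 6, 4, 1, 0], [20, 21, 21, 37]),
    "c7552": ([20, 3, 0], [49, 64, 58, 34, 22, 21, 38]),
}
SINGLE = {  # no single LFSR/SR design was found for c6288
    "c432": [20, 11, 7, 5, 4, 3, 2, 1, 0], "c499": [15, 9, 8, 6, 5, 3, 0],
    "c880": [17, 10, 8, 6, 5, 3, 2, 1, 0], "c1355": [15, 9, 8, 6, 5, 3, 0],
    "c1908": [21, 8, 6, 3, 2, 1, 0], "c2670": [20, 9, 5, 4, 0],
    "c3540": [22, 11, 10, 7, 4, 2, 0], "c5315": [22, 9, 7, 5, 3, 2, 0],
    "c7552": [22, 9, 8, 6, 5, 3, 0],
}


def lfsr_xor_gates(exponents: list[int]) -> int:
    """Assumed count: t nonzero terms -> t - 1 feedback signals -> t - 2 gates."""
    return len(exponents) - 2


def table_5_4_row(ckt: str):
    """(n, multiple LFSR/SR XOR gates, single LFSR/SR XOR gates or None)."""
    exponents, segments = CONVOLVED[ckt]
    multiple = lfsr_xor_gates(exponents) * len(segments)  # one LFSR per segment
    single = lfsr_xor_gates(SINGLE[ckt]) if ckt in SINGLE else None
    return sum(segments), multiple, single


for ckt in CONVOLVED:
    print(ckt, *table_5_4_row(ckt))   # first line: c432 56 6 7
```

Summing the segment lengths also confirms that each convolved design covers exactly n input register stages.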
Both convolved LFSR/SRs and multiple LFSR/SRs generate minimum test sets for all the partitioned benchmark circuits. However, multiple LFSR/SR designs usually incur less hardware overhead than convolved LFSR/SRs. Note that the table gives only an upper bound on the number of XOR gates required for convolved LFSR/SRs, because procedure network uses a suboptimal heuristic to determine the XOR network for the feed forward stages. Single LFSR/SRs generate minimum test sets for only three circuits: c432, c880 and c2670. For these circuits, multiple LFSR/SRs use no more XOR gates than single LFSR/SRs (fewer for c432 and c880, and the same number for c2670). For the remaining circuits, single LFSR/SRs generate test sets up to four times the size of the minimum test sets, while multiple LFSR/SRs require less than twice the number of XOR gates used by single LFSR/SRs.

Figure 5.11 highlights the characteristics of the various TPG designs based on the empirical observations. Among the four TPG designs, single LFSR/SRs generate the largest test sets but incur the least hardware overhead. At the other extreme, LFSR/XORs generate the smallest test sets but use the most hardware. Convolved LFSR/SRs generate small test sets like LFSR/XORs but use less hardware. The arrows indicate the possible transformations from one TPG design to another. Any convolved LFSR/SR design can be transformed to an LFSR/XOR design and vice versa (Theorem 8). Similarly, any multiple LFSR/SR design can be transformed to a convolved LFSR/SR design. However, a convolved LFSR/SR design can be transformed to a multiple LFSR/SR design only if its shift register segments are of length greater than or equal to the degree of the LFSR (refer to Theorem 9). Similarly, a multiple LFSR/SR design can be transformed to a single LFSR/SR design provided the length of the shift register segment equals the number of inputs to the circuit.
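The correspondence between the convolved LFSR/SR of Example 11 and the multiple LFSR/SR of Example 12 can be observed in a clock-level simulation. This sketch assumes a Fibonacci-style arrangement in which stage s1 receives the feedback s1 ⊕ s3 (realizing x^3 + x + 1 under the convention r_i = x^{i−1} mod P(x)) and every other stage copies its predecessor, except that the feed forward stage s5 of the convolved design receives s2 ⊕ s3 and the multiple design's second LFSR feeds s4 with s4 ⊕ s6. All names are hypothetical, and the cone list is the one assumed for Figure 5.2. The simulation checks that every stage carries the signal predicted by its residue and that every output cone sees all seven non-zero 3-bit patterns. A maximal-length LFSR started from a non-zero state never produces the all-zero pattern, so the count of 2^3 = 8 patterns in the examples presumably includes one all-zero vector supplied separately; that detail is assumed here.

```python
CONES = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (2, 4, 6), (3, 5, 6)]  # assumed, Figure 5.2


def convolved_step(s):
    """Example 11: s1 <- s1^s3, plain shifts, feed forward stage s5 <- s2^s3."""
    s1, s2, s3, _, s5, _ = s
    return (s1 ^ s3, s1, s2, s3, s2 ^ s3, s5)


def multiple_step(s):
    """Example 12: two x^3+x+1 LFSR/SRs on stages s1-s3 and s4-s6."""
    s1, s2, s3, s4, s5, s6 = s
    return (s1 ^ s3, s1, s2, s4 ^ s6, s4, s5)


def run(step, stage_residues, cycles=7):
    """Seed each stage with the constant term of its residue, then clock the TPG."""
    states = [tuple(r & 1 for r in stage_residues)]
    for _ in range(cycles - 1):
        states.append(step(states[-1]))
    return states


def follows_residues(states, stage_residues):
    """Stage i must equal the XOR of stages b+1 over the terms x^b of its residue."""
    return all(st[i] == sum(st[b] for b in range(3) if r >> b & 1) % 2
               for st in states for i, r in enumerate(stage_residues))


def cone_patterns(states, cone):
    """Distinct input patterns seen by one output cone."""
    return {tuple(st[i - 1] for i in cone) for st in states}


designs = {
    "convolved": (convolved_step, [1, 2, 4, 3, 7, 5]),   # r1 r2 r3 r4 r6 r7
    "multiple": (multiple_step, [1, 2, 4, 6, 7, 5]),     # r1 r2 r3 r5 r6 r7
}
for name, (step, res) in designs.items():
    states = run(step, res)
    print(name, states[0], follows_residues(states, res),
          [len(cone_patterns(states, c)) for c in CONES])
# convolved (1, 0, 0, 1, 1, 1) True [7, 7, 7, 7, 7]
# multiple (1, 0, 0, 0, 1, 1) True [7, 7, 7, 7, 7]
```

The multiple design's seed 100 011 matches S1 and S2 in Example 12, and both TPGs drive each cone through the same set of non-zero patterns even though their XOR gates sit in different places.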
Figure 5.11: Characteristics of various TPGs (hardware overhead versus test set size for LFSR/XORs, convolved LFSR/SRs, multiple LFSR/SRs and single LFSR/SRs; * if shift register segment lengths ≥ degree of LFSR; ** if shift register segment length = number of inputs)

5.7 Summary

In this chapter we have described hardware-efficient TPG designs that generate minimal pseudo-exhaustive test sets. These TPGs are tailored to a given circuit and utilize the information about the circuit output cone dependencies. Convolved LFSR/SRs have great potential to generate minimum test sets, as demonstrated by the experiments on the combinational benchmark circuits. These structures can be used to derive other test pattern generators such as LFSR/XORs and multiple LFSR/SRs. The flexibility of manipulating residues with an XOR network, which is absent in single LFSR/SRs but present in convolved LFSR/SRs, makes convolved LFSR/SRs powerful test pattern generators.

Chapter 6

Bounds on Test Lengths

6.1 Introduction

Circuit-specific TPGs such as LFSR/XORs [3] and LFSR/SRs [4] can be designed to generate pseudo-exhaustive test sets for a given circuit. In this chapter we shall derive tight upper bounds on the sizes of the test sets generated by these TPG structures. We shall first derive a few important algebraic results that are used in the derivation of the bounds on pseudo-exhaustive test lengths. We shall then determine a few generic bounds on test lengths that are independent of the structural information about the circuit output cones. Circuit-specific bounds are later derived by utilizing the structural information about the circuit output cones.

6.2 Algebraic Results

We shall present a few algebraic definitions that are used in the bound computations.
Our definitions of Abelian group and vector space are simplified because the group is defined over modulo-2 addition (denoted as +) and the space is defined over the Galois field GF(2).

Definition 2 (Abelian Group and Vector Space)
• A non-empty set S is an Abelian group if S is closed under modulo-2 addition.
• A non-empty set S is a (vector) space over GF(2) if S is an Abelian group under modulo-2 addition.
• The (linear) span of a non-empty set B, denoted as L(B), is the set of all linear combinations (modulo-2 sums) of elements in B.
• Any subset B of a vector space S is a basis of S if B consists of linearly independent elements and L(B) = S.
• The dimension of a vector space S spanned by a basis B equals |B|.

The modulo-2 addition operation satisfies group properties such as commutativity, associativity and the existence of additive inverses (every element is its own inverse). Hence it suffices to check only the closure property to validate an Abelian group. GF(2) forms a field with respect to the modulo-2 addition and modulo-2 multiplication operations and satisfies all the axioms defined for a vector space. Hence it suffices to check only the closure property to validate a vector space. A k-dimensional space S spanned by a basis B consists of 2^k elements. The space S forms an Abelian group under the modulo-2 addition operation. We shall next define operations between subspaces.

Definition 3 (Operations between Vector Subspaces)
• A non-empty subset S1 of a vector space S over GF(2) is a subspace if S1 is an Abelian group under modulo-2 addition.
• The direct sum of two subspaces S1 and S2, denoted as S1 ⊕ S2, equals the set {s1 + s2 | s1 ∈ S1, s2 ∈ S2}.
• The set union of two subspaces S1 and S2, denoted as S1 ∪ S2, equals the set {s | s ∈ S1 or s ∈ S2}.
• The set intersection of two subspaces S1 and S2, denoted as S1 ∩ S2, equals the set {s | s ∈ S1 and s ∈ S2}.

Conventional algebraic theory deals with the direct sum operation between vector subspaces.
In contrast to this conventional emphasis on the direct sum, our bound computations are based on the set union and intersection operations between subspaces.

Example 13 Consider the set $S = \{0, 1, x, 1+x, x^2, 1+x^2, x+x^2, 1+x+x^2\}$. The set $S$ is closed under modulo-2 addition and hence forms an Abelian group and a vector space. The set $B = \{1, x, x^2\}$ consists of linearly independent elements and $L(B) = S$. Thus $B$ forms a basis of $S$, and the dimension of $S$ equals $|B| = 3$. Consider two distinct subspaces $S_1 = \{0, 1, x, 1+x\}$ and $S_2 = \{0, 1, x^2, 1+x^2\}$ contained in $S$. Their direct sum is $S_1 \oplus S_2 = \{0, 1, x, 1+x, x^2, 1+x^2, x+x^2, 1+x+x^2\} = S$. Their set union is $S_1 \cup S_2 = \{0, 1, x, 1+x, x^2, 1+x^2\} \neq S$, and their set intersection is $S_1 \cap S_2 = \{0, 1\}$. □

We have derived a few algebraic results about the set union and intersection operations between subspaces, and these results differ from the classical results. The following results characterize a few properties of subspaces contained in a vector space.

Theorem 10 Consider a $k$-dimensional space $S$ and any two distinct subspaces $S_1$ and $S_2$ of dimensions $k_1$ and $k_2$ contained in $S$. The set $S_1 \cap S_2$ is a subspace contained in $S$ and consists of at least $\lceil 2^{k_1+k_2-k} \rceil$ elements.

Proof: Let $S_3 = S_1 \cap S_2$. Consider any two elements $a, b \in S_3$. Since $S_3 \subseteq S_1$ and $S_3 \subseteq S_2$, we have $a, b \in S_1$ and $a, b \in S_2$. Since $S_1$ and $S_2$ are subspaces, $(a+b) \in S_1$ and $(a+b) \in S_2$, which implies $(a+b) \in S_3$. Thus $S_3$ forms an Abelian group and is a subspace contained in $S$. Let $x$ be the dimension of subspace $S_3$. Let $S_1$, $S_2$ and $S_3$ be spanned by the bases $B_1$, $B_2$ and $B_3$, respectively. Since $S_3 \subseteq S_1$ and $S_3 \subseteq S_2$, we can choose $B_1$ and $B_2$ such that $B_3 \subseteq B_1$ and $B_3 \subseteq B_2$.
Since $|B_1| = k_1$, $|B_2| = k_2$ and $|B_3| = x$, we have
$$|B_1 \cup B_2| = |B_1| + |B_2| - |B_1 \cap B_2| = |B_1| + |B_2| - |B_3| = k_1 + k_2 - x.$$
Let $S_4$ be the $(k_1 + k_2 - x)$-dimensional subspace spanned by the basis $B_1 \cup B_2$. Since $S_4 \subseteq S$, we have
$$k_1 + k_2 - x \le k \implies x \ge k_1 + k_2 - k.$$
Hence $S_1 \cap S_2$ is a subspace of dimension at least $(k_1 + k_2 - k)$. Although the term $(k_1 + k_2 - k)$ could be negative, $S_1 \cap S_2$ always contains the additive identity element (zero). Hence $S_1 \cap S_2$ is a subspace contained in $S$ and has at least $\lceil 2^{k_1+k_2-k} \rceil$ elements. □

Corollary 2 Consider a $k$-dimensional space $S$ and any two distinct $(k-1)$-dimensional subspaces $S_1$ and $S_2$ contained in $S$. The set $S_1 \cap S_2$ is a $(k-2)$-dimensional subspace contained in $S$. (By Theorem 10 its dimension is at least $k-2$, and it cannot be $k-1$ because $S_1 \neq S_2$.)

Theorem 11 Consider a $k$-dimensional space $S$ and any three distinct $(k-1)$-dimensional subspaces $S_1$, $S_2$ and $S_3$ contained in $S$. Let $S_4 = S_1 \cap S_2$. The subspace $S_3$ satisfies the relation $S_1 \cup S_2 \cup S_3 = S$ if and only if $S_1 \cap S_2 \cap S_3 = S_4$.

Proof: Since $S_1$ and $S_2$ are distinct $(k-1)$-dimensional subspaces contained in $S$, $S_4$ is a $(k-2)$-dimensional subspace by Corollary 2. Consider an element $a$ such that $a \in S_1$ and $a \notin S_4$. Let $T_1 = \{a + s \mid s \in S_4\}$. Then $T_1 \subseteq S_1$ and $|T_1| = |S_4| = 2^{k-2}$. The set $S_4 \cap T_1 = \emptyset$ since $a \notin S_4$. The set $S_4 \cup T_1$ contains $2^{k-1}$ elements, and hence $S_4 \cup T_1 = S_1$. Consider another element $b$ such that $b \in S_2$ and $b \notin S_4$. Let $T_2 = \{b + s \mid s \in S_4\}$. Then $T_2 \subseteq S_2$ and $|T_2| = |S_4| = 2^{k-2}$. The set $S_4 \cap T_2 = \emptyset$ since $b \notin S_4$. The set $S_4 \cup T_2$ contains $2^{k-1}$ elements, and hence $S_4 \cup T_2 = S_2$. The Venn diagram of these sets is shown in Figure 6.1. Thus we have
$$S_1 = S_4 \cup \{a + s \mid s \in S_4\} = S_4 \cup T_1,$$
$$S_2 = S_4 \cup \{b + s \mid s \in S_4\} = S_4 \cup T_2,$$
$$S_1 \cup S_2 = S_4 \cup \{a + s \mid s \in S_4\} \cup \{b + s \mid s \in S_4\} = S_4 \cup T_1 \cup T_2.$$
Let $T_3 = \{a + b + s \mid s \in S_4\}$. Since $a, b \notin S_4$, we know that $a \notin S_2$, $b \notin S_1$ and $a + b \notin S_1 \cup S_2$. Therefore $T_3 \cap S_1 = T_3 \cap S_2 = \emptyset$. We know that $T_3 \subseteq S$ and $|T_3| = |S_4| = 2^{k-2}$.
The sets $S_4$, $T_1$, $T_2$ and $T_3$ are pairwise disjoint, and the set $S_4 \cup T_1 \cup T_2 \cup T_3$ contains $2^k$ elements. Hence $S_4 \cup T_1 \cup T_2 \cup T_3 = S$. The elements of $S$ are thus partitioned into four equal-sized subsets $S_4$, $T_1$, $T_2$ and $T_3$ (called cosets in algebraic terminology [18]), as shown in Figure 6.1.

[Figure 6.1: A vector space and its subspaces and cosets]

The set $T_i$ ($i = 1, 2, 3$) does not form a subspace, and $S_4 \subseteq L(T_i)$. If a subspace (say $S_x$) contains $S_4$ and an element of $T_i$, then $T_i \subseteq S_x$. The subsets $S_4$, $T_1$, $T_2$ and $T_3$ are uniquely determined by the two subspaces $S_1$ and $S_2$.

(If): Assume that $S_1 \cap S_2 \cap S_3 = S_4$, so that $S_4 \subseteq S_3$. The set $S_3 \cap T_1 = \emptyset$, since if $S_3 \cap T_1 \neq \emptyset$, then $T_1 \subseteq S_3$ and $S_3 = S_1$. Similarly, the set $S_3 \cap T_2 = \emptyset$, since if $S_3 \cap T_2 \neq \emptyset$, then $T_2 \subseteq S_3$ and $S_3 = S_2$. Since $|S_3| = 2^{k-1}$ while $S_4$ accounts for only $2^{k-2}$ of its elements, $S_3 \cap T_3 \neq \emptyset$. Since $S_4 \subseteq S_3$ and $T_3 \cap S_3 \neq \emptyset$, we have $T_3 \subseteq S_3$ and $S_3 = S_4 \cup T_3$. Therefore $S_1 \cup S_2 \cup S_3 = S_4 \cup T_1 \cup T_2 \cup T_3 = S$.

(Only if): Assume that $S_1 \cup S_2 \cup S_3 = S$. Since $T_3$ is disjoint from $S_1 \cup S_2$, this implies $T_3 \subseteq S_3$. Since $S_3$ is a subspace, $S_4 \subseteq L(T_3) \subseteq S_3$. Therefore $S_1 \cap S_2 \cap S_3 = S_4$. □

Theorem 12 A $k$-dimensional space is composed of (that is, covered by the union of) at least $(2^i + 1)$ distinct subspaces of dimensions less than or equal to $(k - i)$, where $1 \le i \le (k-1)$.

Proof: We prove the theorem in two parts. First we show that a $k$-dimensional space is composed of at least $(2^i + 1)$ distinct $(k-i)$-dimensional subspaces. Then we generalize the dimensions of the $(2^i + 1)$ distinct subspaces to less than or equal to $(k-i)$.

Part I: Consider any two distinct $(k-i)$-dimensional subspaces $S_1$ and $S_2$ contained in a $k$-dimensional space $S$. From Theorem 10, the two subspaces $S_1$ and $S_2$ must have a common subspace of dimension at least $(k - 2i)$. Hence we have
$$|S_1| = |S_2| = 2^{k-i}, \qquad |S_1 \cap S_2| \ge 2^{k-2i},$$
$$|S_1 \cup S_2| = |S_1| + |S_2| - |S_1 \cap S_2| \le 2^{k-i} + 2^{k-i} - 2^{k-2i}.$$
Let $S_1, S_2, \ldots, S_x$ be $x$ distinct $(k-i)$-dimensional subspaces such that $\bigcup_{j=1}^{x} S_j = S$. Each subspace other than $S_1$ contributes at most $(2^{k-i} - 2^{k-2i})$ elements that are not already in $S_1$. Hence we have
$$|S| = \left|\bigcup_{j=1}^{x} S_j\right| = 2^k \le 2^{k-i} + (x-1)\left(2^{k-i} - 2^{k-2i}\right).$$
Multiplying throughout by $2^{2i-k}$, we get
$$2^{2i} \le 2^i + (x-1)(2^i - 1) \implies x \ge \frac{2^{2i} - 2^i}{2^i - 1} + 1 = 2^i + 1.$$
Thus a $k$-dimensional space is composed of at least $(2^i + 1)$ distinct $(k-i)$-dimensional subspaces.

Part II: We now generalize the dimensions of the subspaces. Let $S_1, S_2, \ldots, S_{2^i}$ be any $2^i$ subspaces contained in the $k$-dimensional space $S$, with dimensions $k_1, k_2, \ldots, k_{2^i}$, where $k_j \le (k-i)$ for all $j$. Each $S_j$ is contained in some $(k-i)$-dimensional subspace $S'_j$ of $S$. From Part I, at most $2^i$ distinct $(k-i)$-dimensional subspaces cannot cover $S$, and therefore
$$\bigcup_{j=1}^{2^i} S_j \subseteq \bigcup_{j=1}^{2^i} S'_j \subsetneq S.$$
Thus a $k$-dimensional space is composed of at least $(2^i + 1)$ distinct subspaces of dimensions less than or equal to $(k-i)$. □

Theorem 10 gives a condition on the minimum overlap between any two subspaces contained in a $k$-dimensional space. Theorem 11 states that the elements of a $k$-dimensional space $S$ are not entirely covered by the elements of any two distinct $(k-1)$-dimensional subspaces contained in $S$. A unique third $(k-1)$-dimensional subspace is required to cover all the elements of $S$. Theorem 12 specifies the minimum number of distinct subspaces of smaller dimensions that compose a $k$-dimensional space. In fact, Theorem 11 provides an outline for constructing this minimum number of distinct subspaces.

6.3 Cone Independent Bounds

For an $(n, m, k)$ circuit, computing an upper bound on the pseudo-exhaustive test length involves determining the smallest number of independent test signals (say $k^* \ge k$) that are sufficient for pseudo-exhaustive testing of the circuit.
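For very small circuits, the smallest sufficient $k^*$ can be found exactly by exhaustive search. Each output cone is checked with the proper-residue condition of Theorem 13, stated below. The following sketch only illustrates that definition and is not a method from this dissertation. Its search is exponential in $n$, and the function names and the bitmask encoding of residues (bit $i$ for $x^i$) are assumptions. The bounds derived in this section avoid such a search. Applied to the six-pair circuit of Example 14, the search returns $k^* = 3$.

```python
from itertools import product


def independent(vectors):
    """True if the GF(2) vectors (int bitmasks) are linearly independent."""
    pivots = {}  # leading bit -> reduced vector having that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clears bit `top`; only lower bits change
        else:
            return False  # v reduced to 0, so it lies in the span of earlier ones
    return True


def min_test_signals(n, cones, k):
    """Smallest k* >= k admitting a residue assignment that makes the residues
    of every cone linearly independent, plus the first such assignment found.

    n:     number of inputs, numbered 0..n-1
    cones: list of tuples of input numbers, one tuple per output
    k:     maximum cone size (the search starts here)
    """
    kstar = k
    while True:
        nonzero = range(1, 2 ** kstar)  # residue 0 is never proper
        for assignment in product(nonzero, repeat=n):
            if all(independent([assignment[i] for i in cone]) for cone in cones):
                return kstar, assignment
        kstar += 1


# Example 14: a (4, 6, 2) circuit whose six outputs are all pairs of inputs.
pairs = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
kstar, residues = min_test_signals(4, pairs, 2)
print(kstar)     # 3
print(residues)  # (1, 2, 3, 4), i.e. 1, x, 1 + x, x^2
```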
In this section we derive a few important cone independent results on the bounds on test lengths. These results do not use the information about the specific output cone dependencies of the circuit. A set of $2^{k^*}$ distinct test signals can be obtained as linear combinations of $k^*$ independent test signals. The distinct test signals are considered as distinct residues. The $k^*$ independent test signals can be considered as a basis of a $k^*$-dimensional space, and the residues as elements of this space. The $k^*$ independent test signals can be generated using an LFSR of degree $k^*$, and linear combinations of these test signals can be obtained by an XOR network.

Definition 4 A residue $r$ is said to be a proper residue with respect to a set of residues $R$ if $r$ is linearly independent of the residues in $R$. Residue $r$ is said to be a prohibited residue with respect to $R$ if $r$ is a linear combination of a subset of the residues in $R$.

Theorem 13 (Barzilai83) An output cone is exhaustively tested if and only if the inputs driving the output cone are assigned proper residues.

For an $(n, m, k)$ circuit, we need to assign proper residues to the circuit inputs from a $k^*$-dimensional space (where $k^* \ge k$) such that the residues assigned to the inputs driving any output cone are linearly independent. The bound computation involves ensuring that proper residues (elements) of the $k^*$-dimensional space are available for all circuit inputs.

Consider an $(n, m, k)$ circuit with the following notation. The $n$ inputs are denoted $\theta_i$, $i = 1, 2, \ldots, n$, and the $m$ outputs are denoted $O_j$, $j = 1, 2, \ldots, m$. The inputs are partitioned into $m$ sets $I_1, I_2, \ldots, I_m$ such that $I_i$ denotes the set of inputs that drive exactly $i$ outputs in the circuit. During the residue assignment process, inputs in $I_i$ are considered before inputs in $I_{i-1}$.

Definition 5 Output $O_i$ is said to dominate output $O_j$ if each input that drives $O_j$ also drives $O_i$.
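Definitions 4 and 5 translate directly into small checks. The sketch below is illustrative and uses hypothetical function names, with residues again encoded as integer bitmasks (bit $i$ for $x^i$):

- `is_proper` tests whether $r \notin L(R)$ by Gaussian elimination over GF(2).
- `reduce_circuit` drops every output dominated by another output. Lemma 12 and Definition 6 justify this reduction.
- Identical cones dominate each other, so one copy is kept (an assumption about how ties are handled).

```python
def _reduce(v, pivots):
    """Reduce v against a GF(2) basis stored as {leading bit: vector}."""
    while v:
        top = v.bit_length() - 1
        if top not in pivots:
            break
        v ^= pivots[top]
    return v


def is_proper(r, R):
    """Definition 4: r is proper w.r.t. R iff r is not in the span L(R)."""
    pivots = {}
    for v in R:
        v = _reduce(v, pivots)
        if v:
            pivots[v.bit_length() - 1] = v
    return _reduce(r, pivots) != 0


def dominates(cone_i, cone_j):
    """Definition 5: O_i dominates O_j if every input of O_j also drives O_i."""
    return set(cone_j) <= set(cone_i)


def reduce_circuit(cones):
    """Keep only outputs not dominated by another output (one copy of duplicates)."""
    kept = []
    for j, cj in enumerate(cones):
        dominated = any(
            dominates(ci, cj) and (set(ci) != set(cj) or i < j)
            for i, ci in enumerate(cones)
            if i != j
        )
        if not dominated:
            kept.append(cj)
    return kept


# Residues from Example 14: 1, x, 1 + x, x^2.
print(is_proper(0b011, [0b001]))         # True: 1 + x is not in L({1})
print(is_proper(0b011, [0b001, 0b010]))  # False: 1 + x is the sum of 1 and x
print(reduce_circuit([(1, 2, 3), (1, 2), (2, 3), (3, 4)]))  # [(1, 2, 3), (3, 4)]
```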
Lemma 12 It is sufficient to consider the dominating circuit outputs when determining pseudo-exhaustive test lengths.

Proof: Let an output $O_i$ dominate another output $O_j$ in a circuit. A proper residue assignment to the set of inputs driving $O_i$ ensures exhaustive testing of both output cones $O_i$ and $O_j$, since any subset of a linearly independent set of residues is itself linearly independent. Hence there is no need to consider residue assignments separately for $O_j$. □

Definition 6 A circuit is said to be reduced if none of its outputs is dominated by any other output.

Lemma 12 states that dominated outputs are guaranteed to be exhaustively tested provided the dominating outputs are exhaustively tested. Any given circuit can be reduced by ignoring all its dominated outputs. Henceforth we consider only reduced circuits.

Lemma 13 For an $(n, m, k)$ circuit, let $k^*$ ($\ge k$) independent test signals be sufficient to assign proper residues to all inputs in $I_i$ for all $i > 2^{k^*-k+1}$. Then these test signals are also sufficient to assign proper residues to all inputs in $I_j$ for all $j \le 2^{k^*-k+1}$.

Proof: Let $S$ be the $k^*$-dimensional space generated by the $k^*$ independent test signals. Assume that all inputs in $I_i$ for all $i > 2^{k^*-k+1}$ have been assigned proper residues from $S$. Let input $\theta \in I_{2^{k^*-k+1}}$ drive output $O_j$. Assume that $k_j$ inputs driving $O_j$ have already been assigned proper residues and the residue assignment for $\theta$ is under consideration (note that $k_j \le (k-1)$). The residues assigned to these $k_j$ inputs span a $k_j$-dimensional subspace, and none of the elements of this subspace can be assigned as a proper residue for $\theta$. In other words, all the elements of this subspace are prohibited residues for $\theta$. Since $\theta$ drives exactly $2^{k^*-k+1}$ outputs, there are at most $2^{k^*-k+1}$ distinct subspaces of dimensions less than or equal to $(k-1)$ whose elements are prohibited residues for $\theta$. Theorem 12 (with $i = k^* - k + 1$) states that $S$ is composed of at least $(2^{k^*-k+1} + 1)$ distinct subspaces of dimensions less than or equal to $(k-1)$.
Hence the at most $2^{k^*-k+1}$ prohibited subspaces cannot cover $S$, and the total number of prohibited residues for $\theta$ is less than $2^{k^*}$. Thus $\theta$ can be assigned a proper residue from $S$. Since $\theta$ is arbitrary, all inputs in $I_{2^{k^*-k+1}}$ can be assigned proper residues from $S$. Similarly, it can be shown that the total number of prohibited residues is less than $2^{k^*}$ for any input in $I_j$ for all $j < 2^{k^*-k+1}$. Hence all inputs in $I_j$ for all $j \le 2^{k^*-k+1}$ can be assigned proper residues by $k^*$ test signals. □

Corollary 3 For an $(n, m, k)$ circuit, let $k$ independent test signals be sufficient to assign proper residues to all inputs in $I_i$ for all $i > 2$. Then these test signals are also sufficient to assign proper residues to all inputs in $I_2$ and $I_1$.

Definition 7 (McCluskey84) An $(n, m, k)$ circuit is said to be a maximal test concurrent (MTC) circuit if it can be pseudo-exhaustively tested with $k$ independent test signals.

Theorem 14 (McCluskey84) Any $(n, m, k)$ circuit with $m < 3$ is an MTC circuit.

Any $(n, m, k)$ circuit needs at least $k$ test signals due to its maximum cone size. If $m < 3$, then the circuit inputs can only be partitioned into $I_2$ and $I_1$. Thus Corollary 3 directly leads to Theorem 14, inferred from [27].

Lemma 14 For an $(n, m, k)$ circuit with $m < 6$, let $k$ independent test signals be sufficient to assign proper residues to all inputs in $I_5$ and $I_4$. Then these test signals are also sufficient to assign proper residues to all inputs in $I_3$.

Proof: Let $S$ be the $k$-dimensional space spanned by the $k$ independent test signals. Assume that all inputs in $I_5$ and $I_4$ have been assigned proper residues from $S$. We show that all inputs in $I_3$ can also be assigned proper residues from $S$ for any $(n, 5, k)$ circuit. The lemma then follows for any $(n, m, k)$ circuit with $m < 6$. Let the five outputs of the circuit be denoted $O_1$, $O_2$, $O_3$, $O_4$ and $O_5$.
Let us sequentially assign proper residues to the inputs in $I_3$, and assume that input $\theta \in I_3$ is under consideration for residue assignment. Let $\theta$ drive outputs $O_1$, $O_2$ and $O_3$. Each of these three outputs can have at most $(k-1)$ inputs that have already been assigned proper residues. Let $k_1$, $k_2$ and $k_3$ be the numbers of inputs driving $O_1$, $O_2$ and $O_3$, respectively, that have already been assigned proper residues. Without loss of generality, assume $k_1 \le k_2 \le k_3 \le (k-1)$. Let $S_i$ ($i = 1, 2, 3$) be the subspace spanned by the residues assigned to the $k_i$ inputs driving $O_i$. The subspaces $S_1$, $S_2$ and $S_3$ are of dimensions $k_1$, $k_2$ and $k_3$, respectively. The elements of $S_1 \cup S_2 \cup S_3$ are prohibited residues for $\theta$. By Theorem 10, we have
$$|S_1 \cap S_3| \ge 2^{k_1+k_3-k}, \qquad |S_2 \cap S_3| \ge 2^{k_2+k_3-k}.$$
Hence the total number of prohibited residues for $\theta$ is bounded by
$$|S_1 \cup S_2 \cup S_3| \le 2^{k_1} + 2^{k_2} + 2^{k_3} - 2^{k_1+k_3-k} - 2^{k_2+k_3-k} \le 2^k. \qquad (6.1)$$
Equality in Equation 6.1 holds only for $k_1 = k_2 = k_3 = (k-1)$. Therefore any input $\theta$ in $I_3$ can be assigned a proper residue from $S$, provided the values of $k_1$, $k_2$ and $k_3$ are not simultaneously equal to $(k-1)$. Since the circuit has only five outputs, at most one input in $I_3$ can have $k_1 = k_2 = k_3 = (k-1)$. Let $\theta^*$ be the unique input in $I_3$ satisfying $k_1 = k_2 = k_3 = (k-1)$. Therefore all the inputs in $I_3$ except $\theta^*$ can be assigned proper residues from $S$. Input $\theta^*$ appears in $O_1$, $O_2$ and $O_3$ as shown below.
$$\begin{array}{ll} O_1: & \cdots\ \theta_1\ \cdots\ \theta^* \\ O_2: & \cdots\ \theta_2\ \cdots\ \theta^* \\ O_3: & \cdots\ \theta_3\ \cdots\ \theta^* \\ O_4: & \cdots\ \theta_1\ \theta_2\ \theta_3\ \cdots \\ O_5: & \cdots\ \theta_1\ \theta_2\ \theta_3\ \cdots \end{array}$$
Let $T = S_1 \cup S_2 \cup S_3$. Input $\theta^*$ can be assigned a proper residue as long as $T \subsetneq S$. Suppose that $T = S$ under some residue assignment for the inputs in $I_3 - \{\theta^*\}$, so that $\theta^*$ cannot be assigned a proper residue from $S$. We show that there exists another residue assignment for the inputs in $I_3 - \{\theta^*\}$ such that $T \subsetneq S$ and $\theta^*$ can also be assigned a proper residue from $S$.
Let $R_1$, $R_2$ and $R_3$ be the sets of $(k-1)$ residues assigned to the remaining $(k-1)$ inputs driving $O_1$, $O_2$ and $O_3$, respectively. The $(k-1)$-dimensional subspaces $S_1$, $S_2$ and $S_3$ are spanned by the sets $R_1$, $R_2$ and $R_3$, respectively. Let $S_4 = S_1 \cap S_2 \cap S_3$. Since $T = S$ by our assumption, $S_4$ is a $(k-2)$-dimensional subspace by Theorem 11. Since $T = S$, there exist residues $r_1$, $r_2$ and $r_3$ unique to $R_1$, $R_2$ and $R_3$, respectively, such that $r_1 \notin S_2 \cup S_3$, $r_2 \notin S_1 \cup S_3$ and $r_3 \notin S_1 \cup S_2$. Following arguments similar to those in the proof of Theorem 11, we can show that
$$S_1 = S_4 \cup \{r_1 + s \mid s \in S_4\}, \qquad S_2 = S_4 \cup \{r_2 + s \mid s \in S_4\}, \qquad S_3 = S_4 \cup \{r_3 + s \mid s \in S_4\}.$$
Let $T_1 = \{r_1 + s \mid s \in S_4\}$, $T_2 = \{r_2 + s \mid s \in S_4\}$ and $T_3 = \{r_3 + s \mid s \in S_4\}$. Since $T = S$, Theorem 11 implies that the set $\{r_1 + r_2 + s \mid s \in S_4\}$ must be equal to $T_3$. In other words, $r_3$ must be equal to $(r_1 + r_2 + s^*)$ for some $s^* \in S_4$. Let inputs $\theta_1$, $\theta_2$ and $\theta_3$ drive outputs $O_1$, $O_2$ and $O_3$ (as shown above) and be assigned the residues $r_1$, $r_2$ and $r_3$, respectively. Since the residues $r_1$, $r_2$ and $r_3$ are unique to $R_1$, $R_2$ and $R_3$, the inputs $\theta_1$, $\theta_2$ and $\theta_3$ are also unique to $O_1$, $O_2$ and $O_3$, respectively. Inputs $\theta_1$, $\theta_2$ and $\theta_3$ cannot belong to $I_4$ or $I_5$ and hence must belong to $I_3$. This implies that the last two outputs $O_4$ and $O_5$ must be driven by all three inputs $\theta_1$, $\theta_2$ and $\theta_3$, as shown above. We show that the residue $r'_3 = (r_1 + s^*)$, instead of $r_3 = (r_1 + r_2 + s^*)$, is still a proper residue for input $\theta_3$. Input $\theta_3$ drives $O_3$, $O_4$ and $O_5$. Consider $O_3$ first, and let us show that $r'_3$ can be assigned as a proper residue for $\theta_3$ instead of $r_3$. Since $r_1 \notin S_3$ and $s^* \in S_4 \subseteq S_3$, we infer that $r'_3 \notin S_3$. Since $L(R_3 - \{r_3\}) \subseteq S_3$, we know that $r'_3 \notin L(R_3 - \{r_3\})$. Hence $r'_3$ is linearly independent of the residues in $(R_3 - \{r_3\})$, and $r'_3$ can be assigned as a proper residue for $\theta_3$ instead of $r_3$ as far as $O_3$ is concerned. Next, consider $O_4$.
Let $R_4$ be the set of linearly independent residues assigned to the inputs driving $O_4$. Since the inputs $\theta_1$, $\theta_2$ and $\theta_3$ appear together in $O_4$, $\{r_1, r_2, r_3\} \subseteq R_4$. Since $r_3 \notin L(R_4 - \{r_3\})$, $r_2 \in L(R_4 - \{r_3\})$ and $r'_3 = (r_3 + r_2)$, we infer that $r'_3 \notin L(R_4 - \{r_3\})$. Therefore $r'_3$ can be assigned as a proper residue for $\theta_3$ instead of $r_3$ as far as $O_4$ is concerned. Similarly, it can be shown that $r'_3$ can be assigned instead of $r_3$ as far as $O_5$ is concerned. Thus we reassign $r'_3$ instead of $r_3$ as the residue of $\theta_3$. Let $R'_3 = (R_3 - \{r_3\}) \cup \{r'_3\}$ and $S'_3 = L(R'_3)$. After the reassignment, $R'_3$ instead of $R_3$ is the set of $(k-1)$ residues assigned to the remaining $(k-1)$ inputs driving $O_3$. Since $r_2 \notin L(R_3) = S_3$, we know that $r_2 \notin L(R_3 - \{r_3\})$. Note that $R'_3 - \{r'_3\} = R_3 - \{r_3\}$. Since $r_2 \notin L(R'_3 - \{r'_3\})$, $r_3 \notin L(R'_3 - \{r'_3\})$ and $r_3 = r_2 + r'_3$, we infer that $r_2 \notin L(R'_3) = S'_3$ and $r_3 \notin L(R'_3) = S'_3$. Since $r_3 \notin S_1 \cup S_2$, we infer that $r_3 \notin S_1 \cup S_2 \cup S'_3$, and therefore $\theta^*$ can be assigned $r_3$ as a proper residue. Thus all inputs in $I_3$ can be assigned proper residues from $S$. □

Theorem 15 Any $(n, m, k)$ circuit with $m < 6$ is an MTC circuit.

Proof: Consider any $(n, m, k)$ circuit with $m < 6$. Since the maximum cone size of the circuit is $k$, it requires at least $k$ independent test signals for pseudo-exhaustive testing. Let $S$ be the $k$-dimensional space spanned by the basis $B = \{1, x, x^2, \ldots, x^{k-1}\}$ (representing $k$ independent test signals). We only need to show that all inputs in $I_5$ and $I_4$ can be assigned proper residues from $S$. Inputs in $I_3$ are then guaranteed proper residues from $S$ by Lemma 14, and inputs in $I_2$ and $I_1$ by Corollary 3. For $m \le 3$, the sets $I_5$ and $I_4$ are empty.

Case $m = 4$: Let $|I_4| = k_4$. Since each output is driven by all inputs in $I_4$ and the maximum cone size of the circuit is $k$, $k_4 \le k$. Hence all inputs in $I_4$ can be assigned proper residues by selecting the $k_4$ elements $\{1, x, x^2, \ldots, x^{k_4-1}\}$ of $B$, and the circuit is an MTC circuit.
Case $m = 5$: Let $|I_5| = k_5$ and $|I_4| = k_4$. Since each output is driven by all inputs in $I_5$ and the maximum cone size of the circuit is $k$, $k_5 \le k$. The inputs in $I_5$ can be assigned proper residues by selecting the $k_5$ elements $\{1, x, x^2, \ldots, x^{k_5-1}\}$ of $B$. We now consider the inputs in $I_4$ and assign them proper residues from the subspace spanned by the remaining $(k - k_5)$ elements $\{x^{k_5}, x^{k_5+1}, \ldots, x^{k-1}\}$ of $B$. Let the five outputs of the circuit be denoted $O_1$, $O_2$, $O_3$, $O_4$ and $O_5$. Partition the inputs in $I_4$ into five subsets $I_{4,1}, I_{4,2}, \ldots, I_{4,5}$ such that $I_{4,i} = \{\text{inputs that do not drive } O_i\}$ ($i = 1, 2, \ldots, 5$). Let $|I_{4,i}| = k_{4,i}$ ($i = 1, 2, \ldots, 5$). Without loss of generality, assume that $I_{4,5}$ is the smallest of the five subsets. Select one input (say $\theta_i$) from each $I_{4,i}$ and form the input set $I = \{\theta_1, \theta_2, \theta_3, \theta_4, \theta_5\}$. Note that only four inputs from $I$ appear together in any output, as shown below.
$$\begin{array}{ll} O_1: & \theta_5\ \theta_4\ \theta_3\ \theta_2 \\ O_2: & \theta_5\ \theta_4\ \theta_3\ \theta_1 \\ O_3: & \theta_5\ \theta_4\ \theta_2\ \theta_1 \\ O_4: & \theta_5\ \theta_3\ \theta_2\ \theta_1 \\ O_5: & \theta_4\ \theta_3\ \theta_2\ \theta_1 \end{array}$$
The inputs in $I$ completely occupy four columns in the cone dependencies. Consider the four-dimensional subspace spanned by the four elements $\{x^{k_5}, x^{k_5+1}, x^{k_5+2}, x^{k_5+3}\}$ of $B$. We assign the five residues $\{x^{k_5}, x^{k_5+1}, x^{k_5+2}, x^{k_5+3}, x^{k_5} + x^{k_5+1} + x^{k_5+2} + x^{k_5+3}\}$ to the five inputs in $I$. Any four of these five residues are linearly independent, and only four inputs from $I$ appear together in any output. This assignment therefore ensures proper residues for all inputs in $I$. The process is repeated, using four new elements of $B$ each time, until all inputs have been selected from $I_{4,5}$. Thus $5k_{4,5}$ inputs in $I_4$ are assigned proper residues from the subspace spanned by $4k_{4,5}$ elements of $B$. The remaining $(k_4 - 5k_{4,5})$ inputs in $I_4$ need to be assigned proper residues from the subspace spanned by the remaining $(k - k_5 - 4k_{4,5})$ elements of $B$. Since none of the remaining inputs in $I_4$ belong to $I_{4,5}$, all of them drive $O_5$. In addition, all $k_5$ inputs in $I_5$ and $4k_{4,5}$ of the already assigned inputs in $I_4$ drive $O_5$.
Hence the total number of inputs driving $O_5$ must be greater than or equal to $(k_5 + 4k_{4,5} + k_4 - 5k_{4,5}) = (k_5 + k_4 - k_{4,5})$. Since the maximum cone size of the circuit is $k$, we have $k \ge k_5 + k_4 - k_{4,5}$, which implies $k - k_5 - 4k_{4,5} \ge k_4 - 5k_{4,5}$. Hence the number of remaining elements in $B$ is greater than or equal to the number of remaining inputs in $I_4$, and we can assign a distinct remaining element of $B$ to each remaining input in $I_4$. Thus all inputs in $I_5$ and $I_4$ can be assigned proper residues from $S$, and the circuit is an MTC circuit. □

Theorem 15 states that any circuit with at most five outputs is an MTC circuit. The result is independent of the number of inputs and the maximum cone sizes of the circuits. Our result is a significant improvement over the well-known result that any circuit with at most two outputs is an MTC circuit (Theorem 14). Example 14 illustrates a six-output non-MTC circuit. In the example, every input drives exactly three outputs, yet the circuit is not an MTC circuit because it contains six outputs. The example illustrates the tightness of the results stated in Lemma 14 and Theorem 15.

Example 14 Consider a $(4, 6, 2)$ circuit driven by inputs $\{\theta_1, \theta_2, \theta_3, \theta_4\}$. The circuit has the following input dependencies for the six output cones: $\{(\theta_1, \theta_2), (\theta_1, \theta_3), (\theta_2, \theta_3), (\theta_1, \theta_4), (\theta_2, \theta_4), (\theta_3, \theta_4)\}$. Two test signals provide only three distinct non-zero residues ($1$, $x$ and $1+x$), but every pair of the four inputs shares an output and must receive distinct residues. The circuit is therefore not an MTC circuit and needs three independent test signals (say $1$, $x$, $x^2$). Inputs $\theta_1$ through $\theta_4$ can be assigned the residues $1$, $x$, $1+x$ and $x^2$, respectively. □

Theorem 16 For any $(n, m, 2)$ circuit, let $k^* \ge 2$ be the smallest number satisfying at least one of the following inequalities:
$$n < 2^{k^*}, \qquad m < 2^{k^*-1}\left(2^{k^*} - 1\right).$$
Then $k^*$ independent test signals are sufficient for pseudo-exhaustive testing of the circuit.

Proof: Since the maximum cone size of the circuit is two, every output is driven by at most two inputs.
A pair of inputs that drive an output must be assigned two distinct non-zero residues for exhaustive testing of that output.

Case $n < 2^{k^*}$: A set of $k^*$ independent test signals can generate $(2^{k^*} - 1)$ distinct non-zero residues. Since the number of inputs is at most $(2^{k^*} - 1)$, each input can be assigned a unique non-zero residue. This assignment ensures that any pair of inputs that drive an output are assigned two distinct residues. Thus $k^*$ test signals are sufficient for exhaustive testing of all outputs.

Case $m < 2^{k^*-1}(2^{k^*} - 1)$: Let us partition the set of circuit inputs into $p$ subsets $I_1, I_2, \ldots, I_p$ such that the following two conditions are satisfied.
1. No two inputs from the same subset fan out to any common output.
2. For every pair of subsets (say $I_i$ and $I_j$), there exists an input in each subset (say $\theta_i \in I_i$ and $\theta_j \in I_j$) such that these inputs ($\theta_i$ and $\theta_j$) fan out to some common output.

Note that a pair of subsets that does not satisfy the second condition can be combined into a single subset. The first condition allows us to assign the same residue to all inputs in a subset. The second condition mandates that each pair of subsets be assigned two distinct non-zero residues. Therefore assigning a unique non-zero residue to each subset ensures exhaustive testing of all outputs. To satisfy the second condition, there must exist at least one output for every pair of subsets. Hence, to have a valid partition into $p$ subsets, the circuit must have at least $p(p-1)/2$ outputs. For an $(n, m, 2)$ circuit to have a valid partition into $2^{k^*}$ subsets, the circuit must have at least
$$\frac{2^{k^*}\left(2^{k^*} - 1\right)}{2} = 2^{k^*-1}\left(2^{k^*} - 1\right)$$
outputs. Since $m < 2^{k^*-1}(2^{k^*} - 1)$, a valid partition of the circuit inputs has at most $(2^{k^*} - 1)$ subsets. A set of $k^*$ independent test signals can generate $(2^{k^*} - 1)$ unique non-zero residues that can be assigned to the subsets. Thus $k^*$ test signals are sufficient for pseudo-exhaustive testing of the circuit.
□

Corollary 4 Any $(n, m, 2)$ circuit with either $n < 4$ or $m < 6$ is an MTC circuit.

Example 15 Consider again the $(4, 6, 2)$ circuit given in Example 14. Theorem 16 states that three independent test signals are sufficient for pseudo-exhaustive testing of the circuit. In fact, three independent test signals are necessary for this circuit, as noted in Example 14. □

$$\begin{array}{c|c|c} \text{Test signals} & \text{Inputs} & \text{Outputs} \\ \hline 2 & 3 & 5 \\ 3 & 7 & 27 \\ 4 & 15 & 119 \\ 5 & 31 & 495 \\ k^* & 2^{k^*} - 1 & 2^{k^*-1}\left(2^{k^*} - 1\right) - 1 \end{array}$$

Table 6.1: Bounds on test lengths for $(n, m, 2)$ circuits

Table 6.1 shows the number of test signals sufficient for pseudo-exhaustive testing of any $(n, m, 2)$ circuit. Column 1 gives the number of test signals. Columns 2 and 3 give upper bounds on the numbers of inputs and outputs, respectively, of an $(n, m, 2)$ circuit that can be pseudo-exhaustively tested with the number of test signals in the first column. Satisfying either bound suffices. For example, an $(n, m, 2)$ circuit with at most seven inputs or with at most 27 outputs can be pseudo-exhaustively tested with three test signals. An $(8, 16, 2)$ circuit requires four test signals as far as its inputs are concerned and three as far as its outputs are concerned. Hence three test signals are sufficient for pseudo-exhaustive testing of any $(8, 16, 2)$ circuit.

Theorem 17 For any $(n, m, k)$ circuit, let $k^*$ ($\ge k$) be the smallest number satisfying the inequality
$$m \le 2^{k^*-k+1}. \qquad (6.2)$$
Then $k^*$ independent test signals are sufficient for pseudo-exhaustive testing of the circuit.

Proof: Since the circuit has at most $2^{k^*-k+1}$ outputs, any input can drive at most $2^{k^*-k+1}$ outputs. From Lemma 13, we know that all inputs that drive at most $2^{k^*-k+1}$ outputs can be assigned proper residues by $k^*$ independent test signals. □

Theorem 18 For any $(n, m, k)$ circuit, our bound on the number of independent test signals for pseudo-exhaustive testing given by Theorem 17 is tighter than the bound derived in [3].
Proof: It has been shown in [3] that $k^*$ independent test signals are sufficient if $k^*$ satisfies the inequality
$$m \le 2^{k^*-k}. \qquad (6.3)$$
Our bound is tighter than the bound derived in [3] because it accommodates twice the number of outputs for the same number of test signals. □

Conjecture 1 For any $(n, m, k)$ circuit, let $k^*$ ($\ge k$) be the smallest number satisfying the inequality
$$m \le 2^{k^*-k+2} + 1. \qquad (6.4)$$
Then $k^*$ independent test signals are sufficient for pseudo-exhaustive testing of the circuit.

Conjecture 1 holds for $k^* = k$. In that case the inequality reduces to $m \le 5$, and any $(n, m, k)$ circuit with $m \le 5$ is an MTC circuit by Theorem 15.

Table 6.2 shows three upper bounds on the number of outputs of an $(n, m, k)$ circuit that can be pseudo-exhaustively tested with the number of test signals given in the first column. For example, any $(n, m, k)$ circuit with at most four outputs can be pseudo-exhaustively tested with $(k+2)$ test signals according to the bound derived in [3]. Theorem 17 states that $(k+1)$ test signals are sufficient for pseudo-exhaustive testing of any $(n, m, k)$ circuit with $m \le 4$. Conjecture 1 states that $k$ test signals are sufficient for the same circuit.

$$\begin{array}{c|c|c|c} \text{Number of test signals} & \text{Akers [3]} & \text{Theorem 17} & \text{Conjecture 1} \\ \hline k & 1 & 2 & 5 \\ k+1 & 2 & 4 & 9 \\ k+2 & 4 & 8 & 17 \\ k+3 & 8 & 16 & 33 \\ k^* & 2^{k^*-k} & 2^{k^*-k+1} & 2^{k^*-k+2} + 1 \end{array}$$

Table 6.2: Bounds on test lengths for $(n, m, k)$ circuits (entries are upper bounds on the number of outputs)

For a given number of test signals (say $k^*$), we guarantee exhaustive testing of twice the number of output cones (Theorem 17), and possibly four times the number of output cones (Conjecture 1), compared with the number of output cones guaranteed by the bound in [3].

6.4 Cone Dependent Bounds

Given an $(n, m, k)$ circuit, we can use the information about cone dependencies to obtain a tighter bound on the number of independent test signals required for pseudo-exhaustive testing of the circuit.
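As a baseline for the cone dependent bounds developed in this section, each cone independent bound reduces to a search for the smallest $k^*$ that satisfies a closed-form inequality. The sketch below uses hypothetical function names. It evaluates the following inequalities:

- Equation 6.3, the bound of [3].
- Equation 6.2, from Theorem 17.
- Equation 6.4, from Conjecture 1, which remains unproven.
- The two conditions of Theorem 16 for $k = 2$.

Its printed values reproduce Table 6.1 and the "at most four outputs" comparison discussed with Table 6.2.

```python
def smallest_kstar(k, ok):
    """Smallest k* >= k for which the predicate ok(k*) holds."""
    kstar = k
    while not ok(kstar):
        kstar += 1
    return kstar


def kstar_akers(m, k):
    # Equation 6.3: m <= 2^(k* - k)
    return smallest_kstar(k, lambda t: m <= 2 ** (t - k))


def kstar_thm17(m, k):
    # Equation 6.2: m <= 2^(k* - k + 1)
    return smallest_kstar(k, lambda t: m <= 2 ** (t - k + 1))


def kstar_conj1(m, k):
    # Equation 6.4 (Conjecture 1, unproven): m <= 2^(k* - k + 2) + 1
    return smallest_kstar(k, lambda t: m <= 2 ** (t - k + 2) + 1)


def kstar_thm16(n, m):
    """Theorem 16 for (n, m, 2) circuits: n < 2^k* or m < 2^(k*-1) (2^k* - 1)."""
    return smallest_kstar(
        2, lambda t: n < 2 ** t or m < 2 ** (t - 1) * (2 ** t - 1))


# Table 6.1 rows: largest n and m covered by k* test signals when k = 2.
for t in range(2, 6):
    print(t, 2 ** t - 1, 2 ** (t - 1) * (2 ** t - 1) - 1)

print(kstar_thm16(8, 16))  # 3, as for the (8, 16, 2) circuit
print(kstar_thm16(4, 6))   # 3, as in Example 15
# Four outputs, k = 5: [3] needs k + 2, Theorem 17 k + 1, Conjecture 1 k.
print([kstar_akers(4, 5), kstar_thm17(4, 5), kstar_conj1(4, 5)])  # [7, 6, 5]
```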
We derive tight upper bounds for both LFSR/XOR and LFSR/SR structures and show that our bounds are better than those derived in [3] and [4].

Consider the $(n, m, k)$ circuit with the following notation. The $n$ inputs are denoted $\theta_i$, $i = 1, 2, \ldots, n$, and the $m$ outputs are denoted $O_j$, $j = 1, 2, \ldots, m$. Input $\theta_i$ is assigned a unique index $\pi_i$, where $1 \le \pi_i \le n$. A permutation of the inputs is specified completely by the $n$-tuple $(\pi_1, \pi_2, \ldots, \pi_n)$. The default permutation is given by $\pi_i = i$, $i = 1, 2, \ldots, n$. We assume the default permutation of inputs unless stated otherwise.

The input dependencies of an output are represented by an ordered set of inputs, arranged in increasing order of their indices. Consider output $O_j$ driven by $k$ inputs $\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}$. If $1 \le i_1 < i_2 < \cdots < i_k \le n$ under the default permutation of inputs, the input dependencies of $O_j$ are represented by the ordered set $\{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}\}$. Let $p_{i,j}$ denote the position of $\theta_i$ in the ordered dependency set of $O_j$. If $\theta_i$ drives $O_j$, then $p_{i,j}$ takes the appropriate value between 1 and $k$; otherwise $p_{i,j} = 0$. Let $p^*_i = \max\{p_{i,1}, p_{i,2}, \ldots, p_{i,m}\}$ denote the maximum position in which $\theta_i$ occurs among the input dependencies of all $m$ outputs. Let $f_{i,j}$ be a Boolean variable such that $f_{i,j} = 1$ if $p_{i,j} > 0$ and $f_{i,j} = 0$ if $p_{i,j} = 0$. Let $f^*_i = \sum_{j=1}^{m} f_{i,j}$ denote the number of occurrences (frequency) of $\theta_i$ among all $m$ outputs. The following example illustrates the notation.

Example 16 Consider a $(6, 6, 3)$ circuit with inputs $\theta_1$ through $\theta_6$ and outputs $O_1$ through $O_6$. Assume the default permutation, where $\theta_i$ is assigned index $\pi_i = i$. Let the input dependencies of the six outputs be the following ordered sets: $\{\theta_1, \theta_2, \theta_3\}$, $\{\theta_1, \theta_3, \theta_4\}$, $\{\theta_2, \theta_3, \theta_5\}$, $\{\theta_2, \theta_4, \theta_5\}$, $\{\theta_1, \theta_5, \theta_6\}$ and $\{\theta_4, \theta_5, \theta_6\}$, respectively.
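The parameters $p_{i,j}$, $p^*_i$ and $f^*_i$ can be tabulated mechanically from the ordered dependency sets. The sketch below is illustrative; its function names and its 1-based dictionary layout are assumptions. It also evaluates the left-hand side of Equation 6.5 (Theorem 19) for each input, where $\lceil 2^e \rceil$ equals 1 for negative $e$. It then searches for the smallest $k^*$ that satisfies the equation for every input under the default permutation. For this circuit it finds $k^* = 4$, whereas Equation 6.2 with $m = 6$ and $k = 3$ gives $k^* = 5$.

```python
def ceil_pow2(e):
    """ceil(2^e) for an integer exponent e (equals 1 when e < 0)."""
    return 2 ** e if e >= 0 else 1


def parameters(n, cones):
    """Return p[i] (list of p_{i,j}: 1-based position of input i in cone j,
    0 if absent), p*_i and f*_i for inputs 1..n. Each cone must list its
    inputs in increasing index order, as the ordered dependency sets do."""
    p = {i: [c.index(i) + 1 if i in c else 0 for c in cones]
         for i in range(1, n + 1)}
    pstar = {i: max(p[i]) for i in p}
    fstar = {i: sum(1 for v in p[i] if v > 0) for i in p}
    return p, pstar, fstar


def lhs_6_5(p_i, pstar_i, kstar):
    """Left-hand side of Equation 6.5 for one input."""
    total = ceil_pow2(2 * pstar_i - 2 - kstar)
    for pij in p_i:
        if pij > 0:  # f_{i,j} = 1
            total += 2 ** (pij - 1) - ceil_pow2(pstar_i + pij - 2 - kstar)
    return total


def kstar_6_5(n, cones, k):
    """Smallest k* >= k for which Equation 6.5 holds for every input."""
    p, pstar, _ = parameters(n, cones)
    kstar = k
    while any(lhs_6_5(p[i], pstar[i], kstar) >= 2 ** kstar for i in p):
        kstar += 1
    return kstar


# Example 16: a (6, 6, 3) circuit under the default permutation.
cones = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (2, 4, 5), (1, 5, 6), (4, 5, 6)]
p, pstar, fstar = parameters(6, cones)
print(p[2], pstar[2], fstar[2])  # [2, 0, 1, 1, 0, 0] 2 3
print(kstar_6_5(6, cones, 3))    # 4
```

With $k^* = 3$, input $\theta_5$ (which drives four outputs, two of them at position 3) gives a left-hand side of exactly 8. The strict inequality of Equation 6.5 then fails, and this input forces $k^* = 4$ under the default ordering.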
For $\theta_2$, the $p_{2,j}$ values ($j = 1, 2, \ldots, 6$) are 2, 0, 1, 1, 0 and 0, respectively, and the $f_{2,j}$ values ($j = 1, 2, \ldots, 6$) are 1, 0, 1, 1, 0 and 0, respectively. Hence $p^*_2 = 2$, and the frequency $f^*_2$ equals three. □

6.4.1 LFSR/XORs

We derive tight upper bounds on the pseudo-exhaustive test sets generated by LFSR/XOR structures for a given $(n, m, k)$ circuit. Since these bounds depend on the ordering of the circuit inputs, we also determine the best permutation of inputs to achieve the greatest improvement in these bounds.

Theorem 19 For an $(n, m, k)$ circuit, let $p_{i,j}$, $p^*_i$ and $f_{i,j}$ be the circuit parameters (defined above) that characterize the cone dependencies. Let $k^*$ be the smallest number satisfying the following inequality for all inputs $\theta_i$, $1 \le i \le n$:
$$\left\lceil 2^{2p^*_i-2-k^*} \right\rceil + \sum_{j=1}^{m} f_{i,j}\left\{2^{p_{i,j}-1} - \left\lceil 2^{p^*_i+p_{i,j}-2-k^*} \right\rceil\right\} < 2^{k^*}. \qquad (6.5)$$
Then $k^*$ independent test signals are sufficient for pseudo-exhaustive testing of the circuit.

Proof: An $(n, m, k)$ circuit can be pseudo-exhaustively tested by $k^*$ independent test signals if all inputs can be assigned proper residues from the $k^*$-dimensional space (say $S$). Inputs $\theta_1$ through $\theta_n$ are considered in succession for residue assignment. Assume that inputs $\theta_1$ through $\theta_{i-1}$ have been successfully assigned proper residues and input $\theta_i$ is under consideration. We examine whether a proper residue from $S$ can be assigned to $\theta_i$. Consider an output $O_{j^*}$ in which $\theta_i$ appears at position $p^*_i$ among the input dependencies. For this output, $\theta_i$ appears along with $(p^*_i - 1)$ inputs that have already been assigned proper residues. These $(p^*_i - 1)$ residues span a $(p^*_i - 1)$-dimensional subspace (say $S_{j^*}$), and all the elements of this subspace are prohibited residues for $\theta_i$. Consider another output $O_j$ with $p_{i,j} > 0$, and hence $f_{i,j} = 1$. For $O_j$, $\theta_i$ appears along with $(p_{i,j} - 1)$ inputs that have already been assigned proper residues.
These residues span a $(p_{i,j}-1)$-dimensional subspace (say $S_j$), and all the elements in this subspace are prohibited residues for $\theta_i$. From Theorem 10, we know that subspaces $S_{j^*}$ and $S_j$ have at least $\lceil 2^{p_i^*+p_{i,j}-2-k^*} \rceil$ common elements. Hence the number of prohibited residues for $\theta_i$ due to $O_{j^*}$ and $O_j$ is given by

$$|S_{j^*} \cup S_j| = |S_{j^*}| + |S_j| - |S_{j^*} \cap S_j| \le 2^{p_i^*-1} + 2^{p_{i,j}-1} - \left\lceil 2^{p_i^*+p_{i,j}-2-k^*} \right\rceil$$

Considering all outputs driven by $\theta_i$, the total number of prohibited residues for $\theta_i$ is given by

$$\begin{aligned}
\Bigl|\bigcup_{j:\,p_{i,j}>0} S_j\Bigr| &\le |S_{j^*}| + \sum_{j=1;\,j\ne j^*}^{m} f_{i,j}\left\{|S_j| - |S_{j^*} \cap S_j|\right\} \\
&\le 2^{p_i^*-1} + \sum_{j=1;\,j\ne j^*}^{m} f_{i,j}\left\{2^{p_{i,j}-1} - \left\lceil 2^{p_i^*+p_{i,j}-2-k^*}\right\rceil\right\} \\
&= \left\lceil 2^{2p_i^*-2-k^*}\right\rceil + \sum_{j=1}^{m} f_{i,j}\left\{2^{p_{i,j}-1} - \left\lceil 2^{p_i^*+p_{i,j}-2-k^*}\right\rceil\right\}
\end{aligned}$$

where the last step moves the $j = j^*$ term, $2^{p_i^*-1} - \lceil 2^{2p_i^*-2-k^*}\rceil$, into the sum. Thus the LHS expression of Equation 6.5 gives an upper bound on the total number of prohibited residues for $\theta_i$. As long as this expression is less than $2^{k^*}$, a proper residue from $S$ is guaranteed for $\theta_i$. Hence the satisfiability of Equation 6.5 for all inputs guarantees the existence of proper residues for all inputs in the space generated by $k^*$ independent test signals. □

Theorem 20 The cone dependent bound on the number of independent test signals given by Theorem 19 is tighter than the cone independent bound given by Theorem 17.

Proof: It is enough to show that the cone independent bound can be derived by assuming the worst case in the derivation of the cone dependent bound. For an input $\theta_i$ with $p_{i,j} = k$ for all $m$ outputs, we have $p_i^* = k$, and Equation 6.5 simplifies to

$$\begin{aligned}
&\left\lceil 2^{2k-2-k^*}\right\rceil + m \times \left(2^{k-1} - \left\lceil 2^{2k-2-k^*}\right\rceil\right) < 2^{k^*} \\
\Longrightarrow\; &(m-1)\left(2^{k-1} - \left\lceil 2^{2k-2-k^*}\right\rceil\right) < 2^{k^*} - 2^{k-1} \\
\Longrightarrow\; &m < 1 + \frac{2^{k^*} - 2^{k-1}}{2^{k-1} - \left\lceil 2^{2k-2-k^*}\right\rceil}
\end{aligned}$$

Thus the cone dependent bound is tighter than the cone independent bound. □

Theorem 21 For an $(n,m,k)$ circuit, let $I_i$ denote the set of inputs that drive exactly $i$ outputs in the circuit. Let $k^*$ be the smallest number satisfying the following inequality

$$\sum_{i=2^{k^*-k+1}+1}^{m} |I_i| \le (k^*+1) \qquad (6.6)$$

Then $k^*$ independent test signals are sufficient for pseudo-exhaustive testing of the circuit.

Proof: The number of inputs that drive more than $2^{k^*-k+1}$ outputs is given by the LHS expression of Equation 6.6. Assume the worst case, where the number of such inputs equals $(k^*+1)$. Assign the $(k^*+1)$ residues $\{1, x, x^2, \ldots, x^{k^*-1}, 1 + x + \cdots + x^{k^*-1}\}$ to those $(k^*+1)$ inputs. Any $k^*$ of these residues are linearly independent, and any output is driven by at most $k \le k^*$ of these $(k^*+1)$ inputs; hence this assignment ensures that all those $(k^*+1)$ inputs are assigned proper residues. From Lemma 13, we know that the inputs in $I_j$, $\forall j \le 2^{k^*-k+1}$, are guaranteed proper residues in a $k^*$-dimensional space. □

6.4.1.1 Improvement on Bounds by Input Permutation

Given an $(n,m,k)$ circuit, the bound on the number of independent test signals given by Theorem 19 can be improved by allowing permutation of inputs. We shall describe a permutation algorithm that assigns unique indices to circuit inputs, resulting in low (high) $p_{i,j}$ values for inputs driving many (few) outputs. The algorithm modifies the circuit parameters (that characterize the cone dependencies) and allows Equation 6.5 to be satisfied for a smaller value of $k^*$.

Procedure XORBound
Input: Output cone dependencies of an $(n,m,k)$ circuit.
Output: Upper bound on the number of independent test signals $k^*$ ($\ge k$).
1. Determine all dominating outputs and consider only the reduced circuit.
2. $k^* \leftarrow k$. /* $k^*$ is the number of independent test signals */
3. Determine $f_i^*$ for input $\theta_i$, $\forall i = 1,2,\ldots,n$. /* determine the frequencies of inputs */
4. $\pi_i \leftarrow 0$, $\forall i = 1,2,\ldots,n$; $n^* \leftarrow n$. /* initialize the indices of inputs; $n^*$ is the current highest index */
5. For each unassigned $\theta_i$ do
   (a) If $f_i^* \le 2^{k^*-k+1}$ then $\{\pi_i \leftarrow n^*;\ n^* \leftarrow n^* - 1\}$
6. While $n^*$ is decremented do
   (a) For each unassigned $\theta_i$ do
      i. $\pi_i \leftarrow n^*$.
      ii. Check the satisfiability of Equation 6.5 for $\theta_i$.
      iii. If the equation is satisfied then $n^* \leftarrow n^* - 1$; else $\pi_i \leftarrow 0$.
7. If $n^* > 0$ then
   (a) If $k^* = n$, go to Step 8.
   (b) $k^* \leftarrow k^* + 1$; go to Step 4.
8. Output the number of test signals ($k^*$).

The algorithm XORBound determines a minimal number of independent test signals that are sufficient for pseudo-exhaustive testing of a given circuit. Lemma 12 enables us to consider only dominating outputs for determining the bound on test length. Lemma 13 states that a set of $k^*$ test signals guarantees proper residues for each input that drives at most $2^{k^*-k+1}$ outputs, and hence all these inputs are assigned the highest possible indices. From the remaining set of unassigned inputs, an input (say $\theta_i$) is assigned the current highest index ($n^*$) provided it satisfies Equation 6.5. The $p_{i,j}$ values for $\theta_i$ are determined based on the fact that the remaining unassigned inputs can have indices only less than $n^*$. The unassigned inputs are repeatedly considered for assignment until there is no decrease in the value of $n^*$. Any further existence of unassigned inputs mandates an increment to the number of test signals and an iteration of the entire algorithm.

The complexity of the algorithm can be computed as follows. Every iteration of the while loop results in assigning proper indices to one or more inputs. The total number of satisfiability checks performed by the while loop is bounded above by $n(n+1)/2$, since in the worst case each iteration assigns a proper index to only one input. The satisfiability check for input $\theta_i$ involves determining $p_{i,j}$ values for all $m$ outputs. Thus the complexity of the while loop is $O(mn^2)$. The number of iterations of the entire algorithm is bounded above by $(n-k)$. Thus the complexity of the algorithm is $O(mn^3)$, where $n$ and $m$ are the number of inputs and outputs of the circuit, respectively. Note that, in general, applying Theorem 19 to all $n!$ permutations of inputs to determine the tightest possible bound has exponential complexity.
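Procedure XORBound is short enough to prototype directly. The Python sketch below is our own illustration, not the PET implementation, and its function names are hypothetical. It assumes that Step 1 (the dominating-output reduction) has already been applied, that inputs are scanned in increasing order of their original numbers (the procedure leaves this order open), and, following the description above, that the position of a tentatively placed input in each cone equals the number of still-unassigned inputs of that cone. On the circuit of Example 16 it reproduces the counts quoted in Example 17.

```python
# Illustrative sketch of Equation 6.5 and procedure XORBound (hypothetical names).
# Outputs are sets of 1-based input numbers; Step 1 (dominating outputs) is assumed done.

def ceil_pow2(e):
    """Return ceil(2**e) for an integer exponent e (equals 1 when e < 0)."""
    return 2 ** e if e >= 0 else 1


def eq65_holds(positions, k_star):
    """Check Equation 6.5 for one input, given its nonzero p_{i,j} values."""
    p_star = max(positions)
    lhs = ceil_pow2(2 * p_star - 2 - k_star)
    for p in positions:
        lhs += 2 ** (p - 1) - ceil_pow2(p_star + p - 2 - k_star)
    return lhs < 2 ** k_star


def default_bound(outputs, n, k):
    """Smallest k* satisfying Eq. 6.5 for every input under pi_i = i."""
    for k_star in range(k, n + 1):
        if all(
            eq65_holds([sorted(o).index(i) + 1 for o in outputs if i in o], k_star)
            for i in range(1, n + 1)
            if any(i in o for o in outputs)
        ):
            return k_star
    return n


def xor_bound(outputs, n, k):
    """Steps 2-8 of XORBound; returns (k*, {input: assigned index})."""
    freq = {i: sum(i in o for o in outputs) for i in range(1, n + 1)}
    for k_star in range(k, n + 1):
        index = {}      # pi_i of the inputs assigned so far
        n_star = n      # current highest free index
        # Step 5: by Lemma 13, low-frequency inputs take the highest indices.
        # Assumption: inputs are scanned in increasing original order.
        for i in range(1, n + 1):
            if freq[i] <= 2 ** (k_star - k + 1):
                index[i] = n_star
                n_star -= 1
        # Step 6: keep retrying unassigned inputs while n* keeps decreasing.
        decremented = True
        while decremented:
            decremented = False
            for i in range(1, n + 1):
                if i in index:
                    continue
                # At index n*, theta_i follows every still-unassigned input,
                # so its position in O_j is the count of unassigned inputs of O_j.
                positions = [sum(x not in index for x in o) for o in outputs if i in o]
                if eq65_holds(positions, k_star):
                    index[i] = n_star
                    n_star -= 1
                    decremented = True
        if n_star == 0:  # Step 7: every input received an index
            return k_star, index
    return n, None


# Example 16: dependency sets of outputs O_1..O_6.
EXAMPLE16 = [{1, 2, 3}, {1, 3, 4}, {2, 3, 5}, {2, 4, 5}, {1, 5, 6}, {4, 5, 6}]
print(default_bound(EXAMPLE16, 6, 3))  # 4
print(xor_bound(EXAMPLE16, 6, 3)[0])   # 3
```

With the default permutation, input $\theta_5$ makes the LHS of Equation 6.5 equal to 8 for $k^* = 3$, which is not less than $2^3$; after XORBound moves $\theta_6$ to the top index (it drives only two outputs) and reorders the rest, three test signals suffice.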
The following theorem states that our permutation algorithm of polynomial complexity is sufficient to find the tightest possible bound using Theorem 19.

Theorem 22 The XORBound algorithm (of polynomial complexity) determines the tightest possible bound on the number of test signals that can be achieved using Theorem 19.

Proof: Let $I$ denote the set of circuit inputs. Let $k^*$ be the number of test signals considered during some iteration of the XORBound algorithm. Assume that a subset of inputs (say $I_1$ with $|I_1| = n_1$) is not assigned indices after the completion of the while loop. This implies that all the $(n - n_1)$ inputs in $I - I_1$ have been successfully assigned indices greater than $n_1$. Let $\Pi_1$ denote the partial permutation of circuit inputs in which $n_1$ inputs are not assigned indices and the remaining $(n - n_1)$ inputs are assigned indices greater than $n_1$. The while loop must have terminated after determining that none of the inputs in $I_1$ can be assigned the index $n_1$. We claim that $I_1$ is the minimum such set under any partial permutation of inputs, and prove the claim as follows. Let $\Pi_2$ denote another partial permutation of circuit inputs that results in the minimum subset of inputs (say $I_2$ with $|I_2| = n_2$) such that (1) none of the $n_2$ inputs in $I_2$ can be assigned the index $n_2$ and satisfy Equation 6.5, and (2) all of the $(n - n_2)$ inputs in $I - I_2$ are assigned proper indices greater than $n_2$ and satisfy Equation 6.5. We shall prove that $I_1 = I_2$ by contradiction. Let $I_1' = I_1 - (I_1 \cap I_2)$ and consider an input $\theta_1 \in I_1'$. Input $\theta_1$ does not satisfy Equation 6.5 with index $n_1$ under $\Pi_1$, but satisfies the equation with an index greater than $n_2$ under $\Pi_2$. Hence for some output (say $O_j$), the $p_{1,j}$ value for $\theta_1$ under $\Pi_1$ must be greater than the $p_{1,j}$ value under $\Pi_2$.
This is possible only if there exists another input (say $\theta_2$) that appears before $\theta_1$ among the dependencies for $O_j$ under $\Pi_1$ and appears after $\theta_1$ among the dependencies for $O_j$ under $\Pi_2$. This implies (1) $\theta_2 \in I_1$ under $\Pi_1$; (2) $\theta_2 \notin I_2$ under $\Pi_2$; and (3) $\theta_2$ is assigned an index greater than that of $\theta_1$ under $\Pi_2$. By (1) and (2), $\theta_2 \in I_1'$. Repeating the argument for $\theta_2 \in I_1'$ leads to a third input $\theta_3 \in I_1'$ that is assigned an index greater than that of $\theta_2$ under $\Pi_2$. The argument can thus be repeated for all inputs in $I_1'$. The argument fails for the last input in $I_1'$, since there are no more inputs left in $I_1'$. This is a contradiction. Hence there exists no $\theta_1 \in I_1'$, and $I_1 \subseteq I_2$. Since $I_2$ is the minimum set by definition, $I_1 = I_2$. Thus the XORBound algorithm determines the minimum set of inputs that cannot be assigned indices and iterates with an increment to the number of test signals. Therefore the algorithm determines the tightest possible bound on the number of test signals that can be achieved using Theorem 19. □

Example 17 Consider the (6,6,3) circuit described in Example 16. Akers' bound using Equation 6.3 requires six signals. Our bound using Equation 6.5 without allowing a permutation of inputs requires four signals. Applying the XORBound algorithm reduces our bound to three test signals. The circuit can indeed be tested with three independent test signals: residues $\{1, x, x^2, 1+x, 1+x^2, x\}$ are assigned to inputs $\theta_1$ through $\theta_6$, respectively. □

Table 6.3 presents the upper bounds on test lengths for LFSR/XORs for the partitioned versions of the ISCAS combinational benchmark circuits.
The benchmark circuits are partitioned using our partitioning procedure [34] such that the output cones are driven by 20 or fewer inputs.

Ckt     (n,m,k)        Dominating   Bound on test length
                       outputs      Akers [3]   Default permutation   Best permutation
c432    (56,27,20)     20           2^25        2^20                  2^20
c499    (49,40,14)     40           2^20        2^16                  2^14
c880    (70,36,17)     29           2^22        2^18                  2^17
c1355   (49,40,14)     40           2^20        2^16                  2^14
c1908   (47,39,20)     26           2^25        2^20                  2^20
c2670   (262,169,20)   117          2^27        2^20                  2^20
c3540   (108,80,20)    57           2^26        2^21                  2^20
c5315   (215,160,20)   91           2^27        2^21                  2^20
c6288   (99,98,20)     39           2^26        2^22                  2^21
c7552   (286,187,20)   69           2^27        2^20                  2^20
Table 6.3: Upper bounds on pseudo-exhaustive test lengths for LFSR/XORs

Columns 2 and 3 present the $(n,m,k)$ characteristics and the number of dominating outputs of these circuits. The last three columns present the bounds on test lengths for the reduced circuits. Akers' bounds are determined using Equation 6.3. Our bounds with the default permutation of inputs are determined using Equation 6.5. The XORBound algorithm achieves tighter bounds on pseudo-exhaustive test lengths by determining one of the best permutations of inputs. The improvement of the bounds obtained by allowing permutation of inputs is evident from the table. Our bounds determined by allowing permutation of inputs are optimal (equal to the lower bound $2^k$) for all circuits except c6288, and are smaller than Akers' bounds by factors of $2^5$ to $2^7$. It should be noted that we exploited the information about the input dependencies of the output cones in the circuit.

6.4.2 LFSR/SRs

An $(n,m,k)$ circuit can be pseudo-exhaustively tested by an LFSR/SR if there exists a primitive feedback polynomial of degree $k^*$ ($\ge k$) such that the residues assigned to the inputs driving each output are linearly independent (Theorem 13). We shall assume the default permutation of inputs, where input $\theta_i$ is fed by the $i$th stage of the LFSR/SR. Input $\theta_i$ is assigned the residue $x^i \bmod P(x)$, where $P(x)$ is the primitive feedback polynomial of the LFSR/SR.
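The requirement that the residues of every cone be linearly independent can be tested mechanically. In the Python sketch below (an illustration with hypothetical helper names, not code from the thesis), a GF(2) polynomial is stored as an integer bit mask with bit $q$ holding the coefficient of $x^q$, LFSR/SR residues are computed as $x^i \bmod P(x)$ under the default permutation, and independence is checked by Gaussian elimination over GF(2). It checks the LFSR/XOR residues of Example 17 and the polynomial $x^4 + x + 1$ of Example 18 against the circuit of Example 16, and shows that the degree-3 primitive polynomial $x^3 + x + 1$ would be inapplicable in the sense of Definition 8.

```python
# GF(2) polynomials as int bit masks: bit q holds the coefficient of x**q.
# Illustrative sketch; helper names are hypothetical.

def poly_mod(a, p):
    """Remainder of a(x) divided by p(x) over GF(2)."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


def gf2_rank(vectors):
    """Rank of a list of bit-mask vectors over GF(2) (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def failing_outputs(outputs, residues):
    """1-based numbers of outputs whose input residues are linearly dependent."""
    return [j for j, o in enumerate(outputs, start=1)
            if gf2_rank([residues[i] for i in o]) < len(o)]


def lfsr_sr_residues(p, n):
    """Residues x**i mod P(x) for inputs theta_1..theta_n (default permutation)."""
    return {i: poly_mod(1 << i, p) for i in range(1, n + 1)}


EXAMPLE16 = [{1, 2, 3}, {1, 3, 4}, {2, 3, 5}, {2, 4, 5}, {1, 5, 6}, {4, 5, 6}]

# Example 17 (LFSR/XOR): residues 1, x, x^2, 1+x, 1+x^2, x for theta_1..theta_6.
xor_residues = dict(zip(range(1, 7), [0b001, 0b010, 0b100, 0b011, 0b101, 0b010]))
print(failing_outputs(EXAMPLE16, xor_residues))                  # []

# Example 18 (LFSR/SR): P(x) = x^4 + x + 1 versus P(x) = x^3 + x + 1.
print(failing_outputs(EXAMPLE16, lfsr_sr_residues(0b10011, 6)))  # []
print(failing_outputs(EXAMPLE16, lfsr_sr_residues(0b1011, 6)))   # [3, 5]
```

With $x^3 + x + 1$, the relations $x^2 + x^3 + x^5 \equiv 0$ and $x + x^5 + x^6 \equiv 0$ make the residues of $O_3$ and $O_5$ dependent, which is why the degree-4 polynomial is needed for this circuit.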
Definition 8 The primitive feedback polynomial of an LFSR/SR considered for a given circuit is said to be inapplicable if the polynomial results in a set of linearly dependent residues for the set of inputs driving some output of the circuit.

Theorem 23 (Golomb 1982) The total number of primitive polynomials of degree $k^*$ is given by $\Phi(2^{k^*}-1)/k^*$, where $\Phi$ is Euler's phi function.

Theorem 24 For an $(n,m,k)$ circuit, let $p_{i,j}$ and $f_{i,j}$ be the circuit parameters (defined earlier) characterizing the cone dependencies. Let $k^*$ be the smallest number satisfying the following inequality

$$\sum_{j=1}^{m}\sum_{i=k^*}^{n} i \times f_{i,j} \times \left(2^{p_{i,j}-1} - 1\right) < \Phi\left(2^{k^*}-1\right) \approx 2^{k^*} - 1 \qquad (6.7)$$

Then an LFSR/SR based on a degree-$k^*$ primitive polynomial is sufficient for pseudo-exhaustive testing of the circuit.

Proof: (The following is an extension of the arguments presented in [4].) Let us consider output $O_j$ and determine an upper bound on the number of inapplicable primitive polynomials of degree $k^*$ for this output. Let the input dependencies of $O_j$ contain input $\theta_i$ in the $p_{i,j}$th position. Let inputs $\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_{p_{i,j}-1}}$ appear in positions 1 through $(p_{i,j}-1)$, respectively, for this output. An applicable primitive polynomial $P(x)$ of degree $k^*$ should ensure that the residues $\{x^{i_1} \bmod P(x), x^{i_2} \bmod P(x), \ldots, x^{i_{p_{i,j}-1}} \bmod P(x), x^{i} \bmod P(x)\}$ are linearly independent. In other words, each polynomial $Q(x)$ of the form $x^i + \sum_{q=1}^{p_{i,j}-1} a_q x^{i_q}$ (where $a_q = 0$ or 1, and not all of them are zero) must not be divisible by $P(x)$. There are $(2^{p_{i,j}-1} - 1)$ such polynomials $Q(x)$ of degree $i$. Each polynomial $Q(x)$ is divisible by no more than $i/k^*$ distinct primitive polynomials of degree $k^*$. Therefore an upper bound on the number of inapplicable primitive polynomials of degree $k^*$ that may assign linearly dependent residues to some inputs in the set $\{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_{p_{i,j}-1}}, \theta_i\}$ driving $O_j$ is given by the expression $E_{i,j} = (i/k^*)(2^{p_{i,j}-1} - 1)$.
Summing up for all values of $i \ge k^*$ yields an upper bound on the number of inapplicable primitive polynomials of degree $k^*$ for $O_j$. There is no need to consider any polynomial $Q(x)$ of degree less than $k^*$, since the primitive polynomial $P(x)$ is of degree $k^*$. The Boolean variable $f_{i,j}$ ensures that only those inputs that drive $O_j$ are considered. Again, summing up for all values of $j$ yields an upper bound on the total number of inapplicable primitive polynomials of degree $k^*$ for all circuit outputs. This double summation, $\sum_{j=1}^{m}\sum_{i=k^*}^{n} f_{i,j} E_{i,j}$, equals the LHS expression of Equation 6.7 divided by $k^*$. Theorem 23 states that the total number of primitive polynomials of degree $k^*$ is given by $\Phi(2^{k^*}-1)/k^*$. To ensure that the total number of inapplicable primitive polynomials of degree $k^*$ is less than the total number of primitive polynomials of degree $k^*$, we must have

$$\begin{aligned}
&\sum_{j=1}^{m}\sum_{i=k^*}^{n} \frac{i}{k^*} \times f_{i,j} \times \left(2^{p_{i,j}-1} - 1\right) < \frac{\Phi\left(2^{k^*}-1\right)}{k^*} \\
\Longrightarrow\; &\sum_{j=1}^{m}\sum_{i=k^*}^{n} i \times f_{i,j} \times \left(2^{p_{i,j}-1} - 1\right) < \Phi\left(2^{k^*}-1\right) \approx 2^{k^*} - 1
\end{aligned}$$

Thus the satisfiability of Equation 6.7 guarantees a primitive polynomial of degree $k^*$ applicable to all outputs. □

Theorem 25 For any $(n,m,k)$ circuit, our bound on the degree of the LFSR/SR given by Theorem 24 is tighter than the bound derived in [4].

Proof: It has been shown in [4] that an LFSR/SR of degree $k^*$ is sufficient for pseudo-exhaustive testing of an $(n,m,k)$ circuit if $k^*$ satisfies the equation

$$n \times m \times \left(2^k - 1\right) < \Phi\left(2^{k^*}-1\right) \approx 2^{k^*} - 1 \qquad (6.8)$$

We shall show that the value of $k^*$ in Equation 6.7 is bounded above by the value of $k^*$ in Equation 6.8. Let $E_j$ denote the expression $\sum_{i=k^*}^{n} i \times f_{i,j} \times (2^{p_{i,j}-1} - 1)$. The LHS expression of Equation 6.7 can then be expressed as $\sum_{j=1}^{m} E_j$. Let us consider the input dependencies for output $O_j$ given by the ordered set $\{\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k}\}$.
For this output we compute $E_j$ as

$$\begin{aligned}
E_j &= \sum_{i=k^*}^{n} i \times f_{i,j} \times \left(2^{p_{i,j}-1} - 1\right) \\
&\le \sum_{q=1}^{k} i_q \times \left(2^{q-1} - 1\right) \\
&\le n \times \sum_{q=1}^{k} \left(2^{q-1} - 1\right) \qquad (\text{since } i_q \le n) \\
&\le n \times \left(2^k - 1\right)
\end{aligned}$$

Summing up $E_j$ for all values of $j$, we get

$$\sum_{j=1}^{m} E_j \le \sum_{j=1}^{m} n \times \left(2^k - 1\right) = m \times n \times \left(2^k - 1\right)$$

Thus the LHS expression of Equation 6.7 is no larger than the LHS expression of Equation 6.8. Hence the $k^*$ value in Equation 6.7 is bounded above by the $k^*$ value in Equation 6.8. □

6.4.2.1 Improvement on Bounds by Input Permutation

Given an $(n,m,k)$ circuit, the bound on the degree of the applicable primitive polynomial for an LFSR/SR given by Theorem 24 can be improved by permuting the inputs. We shall attempt to minimize the total number of inapplicable primitive polynomials given by the LHS expression of Equation 6.7, and thereby improve the bound on the degree of the applicable primitive polynomial for the LFSR/SR. This is similar to the improvement on the bound achieved for LFSR/XORs.

Procedure SRBound
Input: Output cone dependencies of an $(n,m,k)$ circuit.
Output: Upper bound on the degree $k^*$ ($\ge k$) of the applicable primitive polynomial.
1. Determine all dominating outputs and consider only the reduced circuit.
2. $k^* \leftarrow k$. /* $k^*$ is the degree of the primitive polynomial */
3. Assign indices to inputs according to the input permutation determined by the algorithm XORBound.
4. While Equation 6.7 is not satisfied do
   (a) If $k^* = n$, go to Step 5.
   (b) $k^* \leftarrow k^* + 1$.
5. Output the degree of the applicable primitive polynomial ($k^*$).

For a given circuit, the SRBound algorithm usually determines an applicable primitive polynomial of smaller degree than the default permutation does. Only dominating outputs are considered, as per Lemma 12. The input permutation determined by the XORBound algorithm is used to minimize the LHS expression of Equation 6.7. The satisfiability check involves computing $p_{i,j}$ values for all inputs driving each output.
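Barzilai's condition (Equation 6.8) depends only on $n$, $m$ and $k$, and Theorem 23 is a closed formula, so both are easy to evaluate. The Python sketch below uses hypothetical function names and, like Equations 6.7 and 6.8, replaces $\Phi(2^{k^*}-1)$ by the approximation $2^{k^*}-1$. With $m$ taken as the number of dominating outputs, this reproduces the eight signals of Example 18 and the Barzilai column of Table 6.4.

```python
# Illustrative sketch of Equation 6.8 and Theorem 23 (hypothetical names).

def euler_phi(n):
    """Euler's totient function by trial division."""
    result, rest, p = n, n, 2
    while p * p <= rest:
        if rest % p == 0:
            while rest % p == 0:
                rest //= p
            result -= result // p
        p += 1
    if rest > 1:
        result -= result // rest
    return result


def primitive_poly_count(degree):
    """Theorem 23: number of primitive polynomials of this degree over GF(2)."""
    return euler_phi(2 ** degree - 1) // degree


def barzilai_degree(n, m, k):
    """Smallest k* with n*m*(2^k - 1) < 2^k* - 1, i.e. Eq. 6.8 using the
    approximation Phi(2^k* - 1) ~ 2^k* - 1 adopted in the text."""
    lhs = n * m * (2 ** k - 1)
    k_star = k
    while lhs >= 2 ** k_star - 1:
        k_star += 1
    return k_star


print(barzilai_degree(6, 6, 3))     # 8, as in Example 18
print(barzilai_degree(56, 20, 20))  # 31, c432 row of Table 6.4 (20 dominating outputs)
print(primitive_poly_count(4))      # 2: x^4 + x + 1 and x^4 + x^3 + 1
```

The approximation matters for small circuits: $\Phi(2^8 - 1) = 128 < 252$, so with the exact totient Equation 6.8 would not be met at $k^* = 8$ for Example 18.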
Since the input permutation determined by the XORBound algorithm is used again in the SRBound algorithm, the complexity of the SRBound algorithm is the same as that of the XORBound algorithm. However, unlike the XORBound algorithm for LFSR/XORs, the SRBound algorithm for LFSR/SRs does not guarantee the tightest possible bound.

Example 18 Consider the (6,6,3) circuit described in Example 16. For LFSR/SRs, Barzilai's bound determined by Equation 6.8 requires eight test signals. The bound computed using Equation 6.7 without allowing permutation of inputs requires a primitive polynomial of degree five. The SRBound algorithm still requires a degree-five primitive polynomial. However, the circuit can be tested with an LFSR/SR using the primitive polynomial $x^4 + x + 1$. □

Table 6.4 presents the upper bounds on test lengths for LFSR/SRs for the partitioned versions of the ISCAS combinational benchmark circuits. The last three columns present the bounds on test lengths for the reduced circuits only. Barzilai's bounds are determined using Equation 6.8, and our bounds with the default permutation of inputs are determined using Equation 6.7. The SRBound algorithm results in tighter bounds by using the same permutation of inputs that was originally determined for LFSR/XORs.

Ckt     (n,m,k)        Dominating   Bound on test length
                       outputs      Barzilai [4]   Default permutation   Good permutation
c432    (56,27,20)     20           2^31           2^25                  2^26
c499    (49,40,14)     40           2^25           2^23                  2^22
c880    (70,36,17)     29           2^28           2^26                  2^25
c1355   (49,40,14)     40           2^25           2^23                  2^22
c1908   (47,39,20)     26           2^31           2^27                  2^26
c2670   (262,169,20)   117          2^35           2^30                  2^29
c3540   (108,80,20)    57           2^33           2^31                  2^30
c5315   (215,160,20)   91           2^35           2^32                  2^31
c6288   (99,98,20)     39           2^32           2^30                  2^30
c7552   (286,187,20)   69           2^35           2^32                  2^30
Table 6.4: Upper bounds on pseudo-exhaustive test lengths for LFSR/SRs

The improvement of the bounds obtained by allowing permutation of inputs is evident from the table.
It should be noted that our LFSR/SR bounds with the good permutation are smaller than Barzilai's bounds by factors of $2^2$ to $2^6$.

6.5 Summary

In this chapter we have first derived a few important algebraic results on the set union and intersection operations between vector subspaces. We have determined (a) the minimum overlap between distinct subspaces and (b) the minimum number of distinct subspaces contained in a vector space. These algebraic results are used in the derivation of the bounds on pseudo-exhaustive test lengths. We have determined a few generic bounds on test lengths that are independent of the structural information about the circuit output cones. We have shown that any circuit with fewer than six outputs is a maximal test concurrent circuit. We have derived expressions for the number of independent test signals that are sufficient for pseudo-exhaustive testing of any given $(n,m,2)$ circuit. The expressions are based on either the number of inputs or the number of outputs of the circuit. Similar expressions are derived for an $(n,m,k)$ circuit based on the number of outputs and the maximum cone size of the circuit.

We have also derived a few circuit-specific bounds utilizing the structural information about the circuit output cones. We have derived tight upper bounds on the test sets generated by LFSR/XORs and LFSR/SRs and shown that our bounds are better than those derived in [3] and [4]. We have developed algorithms of polynomial complexity to permute circuit inputs to obtain good improvements on these bounds. Our bounds provide good estimates of pseudo-exhaustive test lengths and can be used as guiding factors in designing circuit-specific TPGs. The computed theoretical bounds for the partitioned benchmark circuits agree well with the pseudo-exhaustive test lengths generated by circuit-specific TPGs, as reported in [35].
Chapter 7

Pseudo-Exhaustive Test System

7.1 Introduction

The results of the research work presented in this thesis are being implemented in a prototype software system named the PET (pseudo-exhaustive test) system. In this chapter we shall present the details of the PET system, which forms an integral part of the BIST system developed at USC [25]. We shall also list the significant contributions of, and possible extensions to, our research work.

PET is a test application system built upon Cbase [15], an object-oriented framework for VLSI design and test applications. The PET environment is given in Figure 7.1. Circuits that need to be made testable are first hierarchically reorganized and partitioned into clouds and registers by the CRETE system [13]. The test hierarchical view of the circuits produced by the CRETE system is compatible with both our scan and BIST systems, namely SIESTA [12] and BITS [25].

7.2 CRETE System

Circuit designers tend to perceive circuits as a collection of functional logic blocks. A functional logic block is usually composed of many levels of hierarchy and invariably contains storage elements such as latches or flip-flops. Test engineers tend to perceive circuits as a collection of combinational logic blocks separated by the storage elements. Thus the perception of circuits differs drastically between circuit designers and test engineers. Test engineers develop testable versions of the circuits designed by circuit designers.

Figure 7.1: PET environment (circuit designer's view, test engineer's view, scan)

In general, the hierarchical circuit description provided by the circuit designers may not be suited for the test engineers to apply various testable design methodologies (TDMs) [1]. Circuits may need to be hierarchically reorganized prior to the application of suitable TDMs. We have developed a methodology, named CRETE [13], for hierarchical reorganization of circuits.
CRETE is an acronym for Clouding, hierarchical Reorganization, Equivalence determination, Test methodology embedding and Editing. CRETE bridges the gap between circuit designers and test engineers and can be interpreted as a meta-methodology that enables application of and experimentation with various TDMs. The required hierarchical reorganization of circuits is carried out by CRETE, and various TDMs can then be applied to the hierarchically reorganized circuits. Scan-based approaches usually assume a gate and flip-flop model of the circuit, while BIST techniques assume a clustering based on blocks of logic separated by registers. Many TDMs result in very large combinational or sequential circuits. It is desirable to have a technique for partitioning circuits to reduce (1) the test generation effort and (2) the test application time. CRETE attempts to achieve these goals by partitioning and reorganizing the hierarchical descriptions of circuits. The combinational logic gates are clustered to form disjoint partitions referred to as clouds, and the storage elements are clustered to form registers. The clouds and the registers form canonical partitions of circuits. For scan designs, deterministic tests can be generated for the clouds, and the registers can be modified to have scan capabilities. For BIST designs, the clouds can be tested by modifying the registers into LFSRs.

7.3 PET Flowchart

The flowchart of the PET system is given in Figure 7.2. The input to the system consists of the following.
1. Gate-level description of a combinational circuit.
2. Constraint on hardware overhead, as a percentage of the total number of gates in the circuit (say $\le P\%$).
3. Constraint on test time, as the maximum number of patterns allowed (say $\le 2^{k^*}$).

The circuit is first processed to determine its maximum cone size. A necessary condition for satisfying the constraint on test time is that each output cone be driven by no more than $k^*$ inputs.
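Determining the maximum cone size amounts to collecting the primary inputs in the transitive fan-in of each output. The Python sketch below is illustrative only: the netlist, its signal names and the helper functions are hypothetical and do not reflect PET's or Cbase's actual data structures. It memoizes each signal's input support so that shared logic is traversed only once.

```python
from functools import lru_cache

# Hypothetical netlist: gate name -> fan-in signal names.
# Any signal that is not a gate is treated as a primary input.
NETLIST = {
    "g1": ("a", "b"),
    "g2": ("b", "c"),
    "g3": ("g1", "g2"),
    "g4": ("c", "d"),
    "out1": ("g3", "e"),
    "out2": ("g4", "g2"),
}
OUTPUTS = ("out1", "out2")


def make_support(netlist):
    """Return a memoized function giving the primary inputs in a signal's cone."""
    @lru_cache(maxsize=None)
    def support(signal):
        if signal not in netlist:  # primary input
            return frozenset([signal])
        return frozenset().union(*(support(s) for s in netlist[signal]))
    return support


def max_cone_size(netlist, outputs):
    support = make_support(netlist)
    return max(len(support(o)) for o in outputs)


def meets_test_time(netlist, outputs, k_star):
    """Necessary condition in the PET flow: every cone has at most k* inputs."""
    return max_cone_size(netlist, outputs) <= k_star


print(max_cone_size(NETLIST, OUTPUTS))       # 4 (out1 depends on a, b, c, e)
print(meets_test_time(NETLIST, OUTPUTS, 3))  # False, so partitioning is needed
```

Memoizing on the signal name keeps the cost roughly linear in the netlist size for this recursive formulation; a very deep netlist would call for an iterative topological traversal instead of recursion.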
If necessary, the circuit is partitioned to reduce its maximum cone size to a value less than or equal to $k^*$. Partitioning involves placement of segmentation cells on appropriate gate outputs in the circuit. During the iterative partitioning process, the constraint on hardware overhead is checked after the placement of each cell in the circuit. If the constraint is not satisfied, the user decides whether to modify the constraints or to exit the system. Test pattern generators are then designed for the partitioned circuit. Different LFSRs of the same degree are considered as part of the TPG designs. Although the maximum cone size of the partitioned circuit is no greater than $k^*$, it may sometimes not be possible to generate a pseudo-exhaustive test set of size $2^{k^*}$. TPG designs incur hardware overhead, and hence the constraints on both hardware overhead and test time are checked during the design process. Again, the user intervenes to make a decision if either of the two constraints is violated. The output of the system consists of the following.
1. Partitioned version of the original circuit.
2. Circuit-specific TPG design(s) for the partitioned circuit.
3. Hardware overhead and test time involved with the test strategy.

The PET system is written in C and C++ and is being integrated as a subsystem within the BITS system. The BITS system invokes the PET subsystem to determine the pseudo-exhaustively testable versions of the subcircuits that require comprehensive fault coverage.

7.4 Research Contributions

The research work presented in this thesis can be classified into three major categories, namely partitioning, test pattern generation and bounds on test lengths. We shall describe our significant contributions under each of these categories.

Figure 7.2: PET flowchart (inputs: combinational circuit, hardware-overhead constraint $\le P\%$, test-time constraint $\le 2^{k^*}$; steps: Partition Circuit and Design TPG, with user decisions at the checks "Cone sizes $\le k^*$?", "HO $\le P\%$?" and "TT $\le 2^{k^*}$?"; outputs: partitioned circuit, test pattern generator design, hardware overhead and test time)

7.4.1 Partitioning

The pseudo-exhaustive test strategy is attractive since, compared to exhaustive testing, it offers most of the benefits with significantly fewer test patterns. The maximum cone size of the given circuit dictates the lower bound on the pseudo-exhaustive test time. Pseudo-exhaustive testing may not be practical for circuits with output cones driven by a large number of inputs. Chapter 3 presents efficient partitioning strategies for reducing the maximum cone sizes of various classes of logic circuits. Reduction in the maximum cone size amounts to a reduction in the pseudo-exhaustive test time but results in additional hardware overhead. Thus there exists a trade-off between test time and hardware overhead in designing pseudo-exhaustive self-testable versions of logic circuits.

Logic circuits can be classified into various classes such as (1) circuits without any fanouts, (2) circuits with or without non-reconvergent fanouts, (3) two-level or multi-level circuits, and (4) iterative logic array (ILA) structures. We have developed efficient partitioning procedures for all these classes of circuits. Two-level and multi-level fanout-free circuits can be partitioned with the optimal number of segmentation cells using our procedures, which have polynomial complexity. We have determined a few important characteristics of circuits with reconvergent fanouts that justify the need for partitioning procedures of exponential complexity. We have derived upper bounds on the number of segmentation cells required for partitioning ILA structures. The expressions are derived based on the regularity of these structures.

We have formulated the partitioning problem as the classical integer linear programming (ILP) problem.
Optimal solutions for small circuits can be obtained by solving the ILP formulation. However, the formulation may not be computationally viable for large circuits, since the number of constraints grows non-linearly with the number of levels in the circuit. To handle large circuits with reconvergent fanouts, we have developed an efficient heuristic procedure of polynomial complexity based on the graph-theoretical concept of articulation nodes [10]. The quality of our heuristic approach is demonstrated on the ISCAS combinational benchmark circuits. We have extended the partitioning strategy to balanced sequential circuits [12].

Chapter 4 presents our partitioning method for logic circuits to achieve maximal test concurrency during the test mode. The heuristic procedure is based on the graph-theoretical concept of bridges [10]. Circuits partitioned for maximal test concurrency can be pseudo-exhaustively tested with the minimum test set in a single test session. Pseudo-exhaustive test sets can be easily generated using maximal-length LFSRs or counters. Circuits can first be partitioned for cone size reduction and then further partitioned to achieve maximal test concurrency.

Figure 7.3: Pseudo-exhaustive testing spectrum of an $(n,m,k)$ circuit (stuck-at fault coverage, 0% to 100%, versus number of patterns, with the pseudo-exhaustive testing region marked)

Figure 7.3 shows the stuck-at fault coverage curve with respect to the number of distinct test patterns applied to an $(n,m,k)$ circuit. Exhaustive testing applies the complete set of all possible $2^n$ patterns to the circuit. In contrast, the circuit can be pseudo-exhaustively tested (without any partitioning) by applying a test set of $2^w$ patterns, where $k \le w \le n$. The maximum cone size ($k$) of the circuit can be reduced to a desired value (say $r$) by partitioning the circuit with a minimal number of segmentation cells.
The circuit partitioned for cone size reduction (CSR) can be pseudo-exhaustively tested with $2^{r'}$ patterns, where $r \le r' \le k$. The circuit can be partitioned further to achieve maximal test concurrency (MTC) during the test mode. Thus partitioning for both CSR and MTC results in a pseudo-exhaustive test set containing $2^r$ patterns, where $r$ is the desired cone size limit. Partitioning therefore significantly reduces the size of a pseudo-exhaustive test set that ensures full coverage of stuck-at faults. However, a price is paid in terms of hardware overhead for the reduction in test time.

7.4.2 Test Pattern Generation

A pseudo-exhaustive test set for an $(n,m,k)$ circuit contains an exhaustive set of patterns for each of the $m$ output cones of the circuit. The size of the test set is bounded below by $2^k$ and bounded above by $2^n$. Circuit-specific TPGs [3, 4] utilize the cone dependency information of the given circuit and usually generate test sets much smaller than those of universal TPGs [37, 42, 43]. Chapter 5 describes our novel circuit-specific TPG designs that utilize minimal hardware and generate minimal pseudo-exhaustive test sets. Maximal-length LFSRs form the basic underlying structure of our TPG designs. Among our novel TPGs, convolved LFSR/SRs have great potential to generate minimum test sets. Convolved LFSR/SRs bridge the gap between LFSR/XORs [3] and (simple) LFSR/SRs [4]. Typically, convolved LFSR/SRs achieve low test lengths like LFSR/XORs and utilize low hardware like LFSR/SRs. The design procedure for convolved LFSR/SRs can also be used to determine other TPGs such as multiple LFSR/SRs and simple LFSR/SRs. The XOR network that forms the test signals makes convolved LFSR/SRs very efficient TPGs. The efficiency of convolved LFSR/SRs in terms of both hardware and test time is demonstrated by the experiments on the combinational benchmark circuits.
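The defining property of a pseudo-exhaustive test set, an exhaustive set of patterns on every output cone, can be observed directly by simulating a maximal-length LFSR/SR. The Python sketch below is illustrative only and rests on stated assumptions: the shift register is modeled as a linear recurrence whose characteristic polynomial is the feedback polynomial, and stage $i$ is assumed to feed input $\theta_i$ with the sequence advanced by $i$ clocks, so that $\theta_i$ corresponds to the residue $x^i \bmod P(x)$. It reuses the Example 16 circuit and the polynomial $x^4 + x + 1$ from Example 18.

```python
# Illustrative LFSR/SR simulation (hypothetical helper names).
# Assumption: input theta_i sees the m-sequence advanced by i clocks.

def m_sequence(taps, degree, length):
    """Bits of s[t + degree] = XOR of s[t + d] for d in taps, seeded 1, 0, ..., 0."""
    s = [1] + [0] * (degree - 1)
    while len(s) < length:
        t = len(s) - degree
        s.append(sum(s[t + d] for d in taps) % 2)
    return s


def patterns_seen(cone, taps, degree):
    """Distinct patterns a cone receives over one period of the LFSR."""
    period = 2 ** degree - 1
    s = m_sequence(taps, degree, period + max(cone))
    return {tuple(s[t + i] for i in cone) for t in range(period)}


EXAMPLE16 = [(1, 2, 3), (1, 3, 4), (2, 3, 5), (2, 4, 5), (1, 5, 6), (4, 5, 6)]

# x^4 + x + 1  <->  s[t+4] = s[t+1] XOR s[t]  (taps 0 and 1)
print([len(patterns_seen(c, (0, 1), 4)) for c in EXAMPLE16])  # [8, 8, 8, 8, 8, 8]
# x^3 + x + 1  <->  s[t+3] = s[t+1] XOR s[t]
print(len(patterns_seen((2, 3, 5), (0, 1), 3)))              # 4
```

In this simplified model the register never enters the all-zero state, so a cone whose $k$ inputs carry independent residues sees all $2^k$ patterns only when the degree exceeds $k$, as with $x^4 + x + 1$ here. With $x^3 + x + 1$, the cone of $O_3$ is further limited to the four even-parity patterns, because $x^2 + x^3 + x^5 \equiv 0 \pmod{x^3 + x + 1}$.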
We have derived the necessary and sufficient conditions for possible transformations among the following TPGs: LFSR/XORs, convolved LFSR/SRs, multiple LFSR/SRs and simple LFSR/SRs. Based on empirical observations, we have constructed a lattice among these TPG designs in terms of hardware overhead and test time.

7.4.3 Bounds on Test Lengths

We have derived tight upper bounds on the sizes of pseudo-exhaustive test sets. Chapter 6 derives a few important algebraic results on the set union and intersection operations between vector spaces. These algebraic results form the backbone of the derivation of the bounds on pseudo-exhaustive test lengths.

We have determined a few generic bounds on test lengths that are independent of the structural information about the circuit output cones. Generic bounds are used to characterize various classes of circuits. We have shown that any circuit with fewer than six outputs is an MTC circuit. This is a significant improvement over the existing result that any circuit with fewer than three outputs is an MTC circuit [27]. We have derived expressions for generic upper bounds on the pseudo-exhaustive test sets for (n, m, 2) circuits. These expressions are based on either the number of inputs or the number of outputs of the circuit. Similar expressions are derived for (n, m, k) circuits based on their number of outputs and their maximum cone size.

We have also derived a few circuit-specific bounds utilizing the structural information about the circuit output cones. Tight upper bounds on the sizes of pseudo-exhaustive test sets have been derived for LFSR/XORs and LFSR/SRs. We have shown, both theoretically and experimentally, that our bounds are significantly better than those derived in [3] and [4]. We have developed algorithms of polynomial complexity that permute the circuit inputs in order to obtain good improvements on these bounds.
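The vector-space view behind linear-sum TPGs such as LFSR/XORs [3] can be illustrated under the assumption that w independent test signals run through all $2^w$ values and that each circuit input is driven by the XOR of a subset of them, encoded as a w-bit mask. A cone is then exhaustively tested exactly when the masks of its inputs are linearly independent over GF(2), because the cone's input values are the image of a linear map from $GF(2)^w$ to $GF(2)^{|cone|}$, and that map is onto precisely in this case. The masks and function names below are hypothetical; this is not the bound derivation of Chapter 6.

```python
# Sketch of the linear-sums view: w test signals take all 2^w values and each
# circuit input is the XOR of a subset of them (a w-bit mask). A cone is
# exhaustively tested iff its masks are linearly independent over GF(2).
# The masks below are hypothetical, chosen by hand for the example.


def gf2_rank(vectors: list[int]) -> int:
    """Rank over GF(2) of bitmask vectors (Gaussian elimination on leading bits)."""
    pivots: dict[int, int] = {}  # leading-bit position -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]  # clears bit `lead`, changes only lower bits
    return len(pivots)


def exhaustive_cones(masks: dict[str, int], cones: dict[str, tuple[str, ...]]) -> dict[str, bool]:
    """For each cone, whether its inputs' masks are linearly independent."""
    return {name: gf2_rank([masks[i] for i in ins]) == len(ins) for name, ins in cones.items()}


def apply_signals(masks: dict[str, int], w: int) -> list[dict[str, int]]:
    """Expand the 2^w signal values into input patterns (parity of x & mask)."""
    return [{i: bin(x & m).count("1") % 2 for i, m in masks.items()} for x in range(1 << w)]


# A (3, 3, 2) circuit: every pair of the three inputs feeds some output.
masks = {"a": 0b01, "b": 0b10, "c": 0b11}
cones = {"y1": ("a", "b"), "y2": ("b", "c"), "y3": ("a", "c")}
print(exhaustive_cones(masks, cones))  # all True
pats = apply_signals(masks, 2)         # 4 patterns instead of 2^3 = 8
```

Here each of the three pairwise cones sees all four combinations from only $2^2 = 4$ patterns, whereas exhaustive testing of the three inputs would need $2^3 = 8$.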
Our bounds provide good estimates of pseudo-exhaustive test lengths and can be used as guiding factors in designing circuit-specific TPGs. The computed theoretical bounds agree well with the sizes of the test sets generated by our TPG designs for the combinational benchmark circuits.

7.5 Future Extensions

Future work on pseudo-exhaustive testing can proceed in several directions. We describe a few extensions of our research under each of the following categories: partitioning, test pattern generation, and bounds on test lengths.

7.5.1 Partitioning

We have considered pseudo-exhaustive testing of partitioned circuits in a single test session. During the test session, each segmentation cell is configured as both a TPG cell and an SA cell. Allowing multiple test sessions would provide the flexibility of configuring a segmentation cell as either a TPG cell or an SA cell during a given session. The hardware required for a segmentation cell could thus be reduced to some extent, since the cell need not act as both a TPG cell and an SA cell in the same session. The number of segmentation cells required for partitioning may also be reduced by letting one cell serve locations that require cells in different test sessions. However, the price is an increase in pseudo-exhaustive test time.

We have highlighted a few important characteristics of circuits with reconvergent fanouts, which justify partitioning procedures of exponential complexity. We have proposed heuristic partitioning procedures of polynomial complexity to obtain good suboptimal solutions for circuits with and without non-reconvergent fanouts. It would be interesting to characterize and classify these fanout circuits according to whether our heuristic procedures are guaranteed to obtain optimal solutions.
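A first step toward such a classification is deciding which fanout stems are reconvergent at all. The sketch below checks this directly from the definition (two branches of the same stem reach a common signal downstream) on a hypothetical dict-based netlist; it is one simple way to separate reconvergent from non-reconvergent stems, not a procedure from this dissertation.

```python
# Sketch: find fanout stems whose branches reconverge, using the textbook
# definition (two branches of one stem reach a common signal downstream).
# The netlist format (signal -> list of signals it drives) is hypothetical.


def reconvergent_stems(fanout: dict[str, list[str]]) -> list[str]:
    memo: dict[str, frozenset[str]] = {}

    def reach(sig: str) -> frozenset[str]:
        """All signals reachable from `sig`, including itself (acyclic netlist)."""
        if sig not in memo:
            r = {sig}
            for nxt in fanout.get(sig, []):
                r |= reach(nxt)
            memo[sig] = frozenset(r)
        return memo[sig]

    stems = []
    for stem, branches in fanout.items():
        seen: set[str] = set()
        for branch in branches:
            cone = reach(branch)
            if seen & cone:  # this branch meets an earlier branch downstream
                stems.append(stem)
                break
            seen |= cone
    return stems


# a fans out to g1 and g2, which both drive g3: a reconverges.
# c fans out to g2 and g4, whose downstream logic never meets: c does not.
fanout = {"a": ["g1", "g2"], "b": ["g1"], "c": ["g2", "g4"], "g1": ["g3"], "g2": ["g3"]}
print(reconvergent_stems(fanout))  # ['a']
```

Caching reachable sets avoids recomputing shared downstream logic, although storing them explicitly is only practical for small netlists.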
We have treated partitioning for cone size reduction and partitioning for maximal test concurrency as a two-step process. A combined strategy could be investigated in which the two requirements, (1) the sizes of the output cones are restricted to some user-defined value, and (2) the circuit is maximal test concurrent in test mode, are met simultaneously rather than in two steps. The combined strategy may require fewer segmentation cells than the two-step process.

We have extended the partitioning strategy to balanced sequential circuits [12]. The strategy can be extended in a similar fashion to unbalanced acyclic sequential circuits. Unbalanced acyclic circuits contain stuck-at faults that are not single-pattern testable, i.e., that can be detected only by sequences of test patterns. Extending the strategy therefore requires a detailed study of TPG designs that generate pseudo-exhaustive test sets for unbalanced acyclic circuits before these circuits can be partitioned to reduce test time.

Our strategies can be classified as hardware partitioning approaches for reducing pseudo-exhaustive test time. Sensitized partitioning [41] is another approach, in which the functionality of the logic gates in the circuit is exploited to reduce pseudo-exhaustive test time. In sensitized partitioning, some of the circuit inputs are held at constant values while the remaining inputs receive an exhaustive set of patterns. Thus a portion of the given circuit is exhaustively tested by sensitizing the remaining portion of the circuit. A combined strategy involving both hardware and sensitized partitioning could be investigated to reduce both hardware overhead and test time.

7.5.2 Test Pattern Generation

Our design procedure for convolved LFSR/SRs described in Chapter 5 considers only the minimum length requirement for the shift register segments.
This consideration helps to some extent to reduce the number of feed-forward stages in convolved LFSR/SR designs and thus reduces hardware. The hardware overhead can be minimized further by considering the following optimization criteria: (1) minimize the number of feed-forward stages, and (2) minimize the number of two-input XOR gates used to realize the feed-forward stages. Both criteria lead to optimization problems of exponential complexity, and efficient procedures for obtaining good suboptimal solutions could be investigated.

TPG design for balanced sequential circuits is complicated, since each pattern may need to be applied at the circuit inputs for a few successive clock cycles. In other words, the TPG may have to hold each pattern for a few successive clock cycles before generating the next pattern. Alternatively, TPG designs that do not hold each pattern for two or more clock cycles can be investigated. The most important criterion in the TPG design is to ensure the minimality of the pseudo-exhaustive test set in order to reduce the overall test application time. Generating a pseudo-exhaustive test set for unbalanced acyclic sequential circuits is even more complicated, since the test set must contain multiple sequences of patterns in order to test the stuck-at faults that are not single-pattern testable [12].

7.5.3 Bounds on Test Lengths

We have conjectured an important result in Chapter 6 that remains to be investigated and proven for arbitrary circuits. The conjecture states a generic (cone-independent) upper bound on the number of circuit outputs that can be exhaustively tested with a given number of independent test signals (allowing linear combinations of these signals). Note that the result is independent of the number of inputs to the circuit. The conjecture, if proven, will be a significant improvement over the proven upper bound given by Theorem 17.
The cone-dependent bound for LFSR/XORs given by Theorem 19 can be improved as follows. In the proof of the theorem, the number of prohibited residues is overestimated because some residues are counted more than once, which results in a loose upper bound. The upper bound can be improved by determining a minimal set of prohibited residues. The circuit inputs can then be permuted to improve the upper bound further. Similarly, the cone-dependent bound for LFSR/SRs given by Theorem 24 should be improved by determining a minimal set of inapplicable primitive polynomials, as described in the proof of the theorem. An input permutation algorithm that achieves the best improvement on the LFSR/SR bound, similar to the permutation algorithm described for LFSR/XORs, remains to be investigated.

Reference List

[1] M. S. Abadir and M. A. Breuer. A Knowledge-Based System for Designing Testable VLSI Chips. IEEE Design & Test, 2:56-68, August 1985.
[2] M. Abramovici, M. A. Breuer, and A. D. Friedman. Digital Systems Testing and Testable Design. IEEE Computer Science Press, 1992.
[3] S. B. Akers. On the Use of Linear Sums in Exhaustive Testing. In Proc. 15th Int'l Symp. on Fault-Tolerant Computing, pages 148-153, June 1985.
[4] Z. Barzilai, D. Coppersmith, and A. Rosenberg. Exhaustive Bit Pattern Generation in Discontiguous Positions with Applications to VLSI Testing. IEEE Trans. on Computers, C-32(2):190-194, February 1983.
[5] Z. Barzilai, J. Savir, G. Markowsky, and M. G. Smith. The Weighted Syndrome Sums Approach to VLSI Testing. IEEE Trans. on Computers, C-30(12):996-1000, December 1981.
[6] S. N. Bhatt, F. R. K. Chung, and A. L. Rosenberg. Partitioning Circuits for Improved Testability. In Proc. 4th MIT Conf. on Advanced Research in VLSI, pages 91-106, April 1986.
[7] F. Brglez and H. Fujiwara. A Neutral Netlist of Ten Combinational Benchmark Circuits and a Target Translator in FORTRAN. In Proc. Int'l Symp. on Circuits and Systems, pages 663-698, June 1985.
[8] C. H. Chen. BISTSYN - A Built-In Self-Test Synthesizer. In Proc. Int'l Conf. on Computer Aided Design, pages 240-243, 1991.
[9] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press and McGraw-Hill Book Company, 1990.
[10] N. Deo. Graph Theory with Applications to Engineering and Computer Science. Prentice Hall, Inc., 1974.
[11] S. W. Golomb. Shift Register Sequences. Aegean Park Press, 1982.
[12] R. Gupta. Advanced Serial Scan Design for Testability. Technical Report CENG 91-10, University of Southern California, 1991.
[13] R. Gupta, R. Srinivasan, and M. A. Breuer. Reorganizing Circuits to Aid Testability. IEEE Design & Test, 8(3):49-57, September 1991.
[14] Rajesh Gupta, Rajiv Gupta, and M. A. Breuer. The BALLAST Methodology for Structured Partial Scan Design. IEEE Trans. on Computers, C-39(4):538-543, April 1990.
[15] Rajiv Gupta, W. H. Cheng, Rajesh Gupta, I. Hardonag, and M. A. Breuer. An Object-Oriented VLSI CAD Framework: A Case-Study in Rapid Prototyping. IEEE Computer, 22(5):28-37, May 1989.
[16] S. Hellebrand and H-J. Wunderlich. Tools and Devices Supporting the Pseudo-Exhaustive Test. In Proc. 1st European Design Automation Conf., pages 13-17, March 1990.
[17] S. Hellebrand, H-J. Wunderlich, and O. F. Haberl. Generating Pseudo-Exhaustive Vectors for External Testing. In Proc. Int'l Test Conf., pages 670-679, September 1990.
[18] I. N. Herstein. Topics in Algebra. Xerox College Publishing, 1975.
[19] W. B. Jone. Methodology of Partitioning and Exhaustive Test Pattern Generation for Built-in Self-Testing of VLSI Circuits. Technical Report CES-88-xx, Case Western Reserve University, January 1988.
[20] W. B. Jone and C. A. Papachristou. A Coordinated Approach to Partitioning and Test Pattern Generation for Pseudoexhaustive Testing. In Proc. Design Automation Conf., pages 525-530, June 1989.
[21] D. Kagaris, F. Makedon, and S. Tragoudas. On Minimizing Hardware Overhead for Pseudo-Exhaustive Circuit Testability. In Proc. Int'l Conf. on Computer Design, pages 358-364, 1992.
[22] D. Kagaris and S. Tragoudas. Cost-Effective LFSR Synthesis for Optimal Pseudo-Exhaustive BIST Test Sets. IEEE Trans. on VLSI Systems, 1(4):526-536, December 1993.
[23] B. Konemann, J. Mucha, and G. Zwiehoff. Built-In Logic Block Observation Technique. In Proc. Int'l Test Conf., pages 37-41, October 1979.
[24] A. Lempel and M. Cohn. Design of Universal Test Sequences for VLSI. IEEE Trans. on Information Theory, IT-31(1):10-17, January 1985.
[25] S. P. Lin. A Design System to Support Built In Self Test of VLSI Circuits Using BILBO Oriented Test Methodologies. Technical Report CENG 94-11, University of Southern California, 1994.
[26] C. M. Maunder and R. E. Tulloss. The Standard Test Access Port and Boundary-Scan Architecture. Addison-Wesley Publishing Company, 1991.
[27] E. J. McCluskey. Verification Testing - A Pseudoexhaustive Test Technique. IEEE Trans. on Computers, C-33(6):541-546, June 1984.
[28] E. J. McCluskey and S. Bozorgui-Nesbat. Design for Autonomous Test. IEEE Trans. on Computers, C-30(11):866-875, November 1981.
[29] D. Mukherjee. An Integrated Test Controller Synthesis System. Technical Report CENG 94-21, University of Southern California, 1994.
[30] K. P. Parker. Integrating Design and Test: Using CAE Tools for ATE Programming. IEEE Computer Society Press, 1987.
[31] I. Parulkar, M. A. Breuer, and C. A. Njinda. Extraction of a High-Level Structural Representation from Circuit Descriptions with Applications to DFT/BIST. In Proc. Design Automation Conf., pages 345-350, June 1994.
[32] M. W. Roberts and P. K. Lala. An Algorithm for the Partitioning of Logic Circuits. IEE Proc., 131(4):113-118, July 1984.
[33] R. Srinivasan, S. K. Gupta, and M. A. Breuer. An Efficient Partitioning Strategy for Pseudo-Exhaustive Testing. Technical Report CENG 93-08, University of Southern California, 1993.
[34] R. Srinivasan, S. K. Gupta, and M. A. Breuer. An Efficient Partitioning Strategy for Pseudo-Exhaustive Testing. In Proc. Design Automation Conf., pages 242-248, June 1993.
[35] R. Srinivasan, S. K. Gupta, and M. A. Breuer. Novel Test Pattern Generators for Pseudo-Exhaustive Testing. In Proc. Int'l Test Conf., pages 1041-1050, October 1993.
[36] R. Srinivasan, C. A. Njinda, and M. A. Breuer. A Partitioning Method for Achieving Maximal Test Concurrency in Pseudo-Exhaustive Testing. In Proc. IEEE VLSI Test Symp., pages 34-39, April 1991.
[37] D. T. Tang and C. L. Chen. Logic Test Pattern Generation using Linear Codes. IEEE Trans. on Computers, C-33(9):845-850, September 1984.
[38] D. T. Tang and L. S. Woo. Exhaustive Test Pattern Generation with Constant Weight Vectors. IEEE Trans. on Computers, C-32(12):1145-1150, December 1983.
[39] J. G. Udell. Reconfigurable Hardware for Pseudo-Exhaustive Test. In Proc. Int'l Test Conf., pages 522-530, September 1988.
[40] J. G. Udell. Efficient Segmentation for Pseudo-Exhaustive BIST. In Proc. Custom Integrated Circuits Conf., pages 13.6.1-13.6.5, May 1992.
[41] J. G. Udell and E. J. McCluskey. Efficient Circuit Segmentation for Pseudo-Exhaustive Test. In Proc. Int'l Conf. on Computer Aided Design, pages 148-151, November 1987.
[42] L. T. Wang and E. J. McCluskey. Condensed Linear Feedback Shift Register (LFSR) Testing - A Pseudoexhaustive Test Technique. IEEE Trans. on Computers, C-35(4):367-370, April 1986.
[43] L. T. Wang and E. J. McCluskey. Circuits for Pseudoexhaustive Test Pattern Generation. IEEE Trans. on Computer-Aided Design, 7(10):1068-1080, October 1988.
Conceptually similar
Built-in self-test for interconnect faults via boundary scan
Consolidated logic and layout synthesis for interconnect-centric VLSI design
Induced hierarchical verification of asynchronous circuits using a partial order technique
Functional testing of constrained and unconstrained memory using march tests
Automatic code partitioning for distributed-memory multiprocessors (DMMs)
Array processing algorithms for multipath fading and co-channel interference in wireless systems
Test generation for capacitance and inductance induced noise on interconnects in VLSI logic
Efficient PIM (Processor-In-Memory) architectures for data-intensive applications
Alias analysis for Java with reference-set representation in high-performance computing
Testing for crosstalk- and bridge-induced delay faults
Error-rate testing to improve yield for error tolerant applications
Architectural support for network-based computing
Redundancy driven design of logic circuits for yield/area maximization in emerging technologies
Error-rate and significance based error-rate (SBER) estimation via built-in self-test in support of error-tolerance
Encoding techniques for energy-efficient and reliable communication in VLSI circuits
A framework for coarse grain parallel execution of functional programs
Automatic array partitioning and distributed-array compilation for efficient communication
Fault simulation and multiple scan chain design methodology for systems-on-chips (SOC)
A deep submicron drain-current and charge model for MOS transistors
Self-Organizing Neural Networks Based On Gaussian Mixture Model For Pdf Estimation And Pattern Classification
Asset Metadata
Creator
Srinivasan, Rajagopalan (author)
Core Title
Pseudo-Exhaustive Built-In Self-Test System For Logic Circuits
Degree
Doctor of Philosophy
Degree Program
Computer Engineering
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
engineering, electronics and electrical, OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Breuer, Melvin A. (committee chair), Baxendale, Peter H. (committee member), Gupta, Sandeep (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c20-581473
Unique identifier
UC11226893
Identifier
9601066.pdf (filename), usctheses-c20-581473 (legacy record id)
Legacy Identifier
9601066.pdf
Dmrecord
581473
Document Type
Dissertation
Rights
Srinivasan, Rajagopalan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
engineering, electronics and electrical