DESIGN OF HIERARCHICALLY TESTABLE AND MAINTAINABLE SYSTEMS

by

Jung-Cheun Lien

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Engineering)

August 1991

Copyright 1991 Jung-Cheun Lien

UMI Number: DP22825. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Dissertation Publishing. UMI DP22825. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007

This dissertation, written by JUNG-CHEUN LIEN under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies. Date. DISSERTATION COMMITTEE: Chairperson.

Dedication

To my wife Jung-Yu and our parents

Acknowledgements

I am very grateful to Prof. Melvin Breuer for his encouragement, guidance and support during my dissertation work. I consider it my privilege to have worked with him. I would also like to thank Prof. Irving Reed and Prof. Charles Lanski for serving on my dissertation committee.

During my years at USC, I benefited greatly from interacting with many colleagues and friends. In particular I would like to mention Kuen-Jong Lee, Dr.
Rajesh Gupta, Rajagopalan Srinivasan, Amitava Majumdar, Sing-Ban Tien, Dr. Rajiv Gupta and Dr. Charles Njinda.

I would like to acknowledge the financial support provided by the Defense Advanced Research Projects Agency through Contract No. N00014-87-K-0861, monitored by the Office of Naval Research, and No. JFBI90092, monitored by the Federal Bureau of Investigation.

Finally, I would like to thank my wife Jung-Yu for her endless love and support. This dissertation would never have been completed without her.

Contents

Dedication
Acknowledgements
Abstract
1 Introduction
1.1 Background
1.2 Previous Work
1.2.1 System Architecture
1.2.2 Testability Bus
1.2.3 Testable Chip Control Model
1.2.4 Testable Chip Design
1.2.5 Testable Module Design
1.2.6 Interconnect Test Generation
1.2.7 Test Scheduling
1.3 The Design Approach
1.4 The Hierarchical Test Methodology
1.5 Test Controllers for a Circuit
1.6 Thesis Outline
2 Controllers for Testable Chips
2.1 The Model
2.1.1 Boundary Scan Architecture
2.2 Bus-Dependent BIT Controllers
2.2.1 BIT Controller for a LSSD Kernel
2.2.2 BIT Controller for a BILBO Kernel
2.2.3 BIT Controller for a Complex Kernel
2.2.4 A Mapping Algorithm
2.3 Autonomous BIT Controller
2.3.1 Serial BIT Controllers
2.3.1.1 A Hard-Wired Serial BIT Controller
2.3.1.2 A Microprogrammed Serial BIT Controller
2.3.2 Parallel BIT Controllers
2.3.2.1 Interleaved FSM Controller
2.3.2.2 Tree of Counters Design
2.3.2.3 Counter Sharing Design
2.3.2.4 Comparison of Three Designs
3 Controller for Testable Modules
3.1 Requirements for an MMC
3.2 MMC Architecture
3.2.1 Test Channel Design
3.2.2 Bus Driver/Receiver
3.2.3 Functional Bus Interface
3.2.4 Testability Register
3.2.5 Analog Test Interface
3.2.6 L1-Slave
3.2.7 Processor
3.2.8 Memory
3.2.9 Stand-Alone MMC
3.3 MMC Self-Test
3.4 Discussion of MMC Design
3.5 An MMC Prototype
3.5.1 Test Channel
3.5.2 Processor and Memory
3.5.3 Processor and Test Channel Interface
3.5.4 Discussion
3.6 Testing a Kernel Using MMC and CMC
4 Test Program Synthesis
4.1 Languages
4.1.1 CTL - The Chip Test Language
4.1.1.1 Formal Definition of the CTL
4.1.2 MTL - The Module Test Language
4.1.2.1 Formal Definition of the MTL Syntax
4.2 Synthesizers
4.2.1 C2C Synthesizer
4.2.2 M2C Synthesizer
4.3 An Example
4.3.1 An MTL-file
4.3.2 CTL-files
4.3.3 The Interconnect Information
4.3.4 Synthesized Test Programs
4.3.5 Activities between Processor and Test Channel
4.3.6 Activities on the Test Bus
4.4 Results
5 Global Controller Minimization Using Test Program Synthesis
5.1 Tradeoff Curve-Based Minimization
5.2 Algorithm-Based Minimization
5.2.1 One Test Channel
5.2.2 Multiple Test Channels
5.2.3 Results
5.3 Discussions
6 Interconnect Test Generation
6.1 Introduction
6.2 Preliminaries
6.2.1 Fault Model
6.2.2 Notation and Definitions
6.2.3 Non-Diagnosable Faults
6.2.4 Diagnostic Resolution
6.2.5 Previous Results
6.2.6 Deficiencies in Previous Approaches
6.3 One-Step Diagnosis
6.4 Two-Step Diagnosis
6.4.1 Adaptive Algorithm A1
6.4.2 Adaptive Algorithm A2
6.4.3 Comparison with Other Adaptive Algorithms
6.5 Diagnosis Using Structural Information
7 Interconnect Test Scheduling
7.1 Introduction
7.2 Testing Model
7.3 The Problem
7.3.1 The Use of Multiple Scan Chains
7.3.2 Scheduling Problem in Testing Interconnects
7.4 Optimal Test Scheduling Theorems
7.5 An Algorithm for Generating Schedules
7.6 An Extension to Full Scan
8.1 On-Chip Test Controller
8.2 Module Test Controller
8.3 Test Program Synthesis
8.4 Controller Minimization
8.5 Interconnect Test Generation
8.6 Interconnect Test Scheduling
8.7 Future Research
8.7.1 On-chip Test Controller
8.7.2 Module Test Controller
8.7.3 Test Program Synthesis
8.7.4 Controller Minimization
8.7.5 Interconnect Test

List Of Tables

2.1 Test schedule for the complex kernel.
2.2 Microinstruction List for Controller.
3.1 Counter usage.
3.2 Processor instruction set.
4.1 The numbers used in the shifting.
4.2 Synthesis results for some modules.
7.1 Typical results; (a) N s, (b) saving S S (in %) on test time.

List Of Figures

1.1 The architecture of a HTM system.
1.2 Partitioning the test controller between the CMC and MMC.
2.1 A typical boundary scan cell.
2.2 A module having a boundary scan path.
2.3 Test bus architecture of a module.
2.4 The model of a chip with boundary scan architecture.
2.5 The boundary scan bus state transition diagram.
2.6 Control model for an addressable register.
2.7 The control signals during test mode.
2.8 A LSSD kernel; (a) control signals, (b) control graph.
2.9 A general model for the bus-dependent BIT controller.
2.10 A BILBO kernel: (a) control signals, (b) control graph.
2.11 A complex kernel; (a) control signals, (b) control graph.
2.12 An autonomous BIT controller for a LSSD kernel.
2.13 Testing many kernels in sequence.
2.14 Microprogram controller.
2.15 Interleaved FSM controller.
2.16 A controller for interleaved test execution.
2.17 A Tree-Of-Counter Design.
2.18 Three Trees-Of-Counters.
2.19 A counter-Sharing design.
2.20 Hardware complexity for different designs.
2.21 Time complexity for different designs.
3.1 The architecture of an MMC.
3.2 The architecture of the test channel.
3.3 The state transition diagram for a test channel.
3.4 The state transition diagram (cont.).
3.5 The Bus Driver/Receiver.
3.6 A Testability Register; (a) block diagram (b) circuit of bit i.
3.7 Analog Test Interface.
3.8 Control signals for MR and MRMW instructions.
3.9 Testable design features for a test channel.
3.10 Physical configuration of the MMC prototype.
3.11 The data bus adaptor.
3.12 Overview of the test control.
4.1 Overview of the test hardware/software hierarchy.
4.2 Test control model used in CTL.
4.3 Test control model used in MTL.
4.4 Scanning a data register in a ring.
4.5 Generating test programs for a module.
4.6 The structure of the M2C.
4.7 Model for calculating the number of shifting.
5.1 Possible partitions of test resources.
5.2 Test time versus controller complexity.
5.3 Tradeoff curve: Test time versus controller complexity.
5.4 Test time estimated by algorithm TC1.
6.1 A soft stuck-at 1 case.
6.2 A short to an opened net.
6.3 An open that is non-diagnosable.
6.4 A short that is non-diagnosable.
6.5 A short that cannot be identified by a diagonally independent sequence.
6.6 An open that cannot be identified by a diagonally independent sequence.
6.7 A short that cannot be identified by an independent test set.
6.8 Achieving maximal diagnosis using a set-cover independent sequence.
6.9 A deficiency in the W-Test Algorithm.
6.10 A deficiency in the C-Test Algorithm.
6.11 Example 6-2: (a) the NG, (b) the ANG, (c) the colored graph.
7.1 Test interconnect via two boundary scan chains; (a) block diagram, (b) graph model.
7.2 The test controller model.
7.3 Different classes of interconnect.
7.4 Two schemes for testing interconnect; (a) distributed control, (b) centralized control.
7.5 Deriving test schedules for several examples.
7.6 Example: Deriving an optimal schedule.
7.7 Testing a circuit via two scan chains; (a) block diagram, (b) graph model.

Abstract

The cost associated with the test and maintenance of a complex system, which includes test generation and the detection, location and repair of faulty components, represents a significant portion of the overall life-cycle cost. This cost can be drastically reduced if testability and maintainability techniques are properly incorporated in the system design. Despite their importance, most systems are built with insufficient or no concern for testability and maintainability. This is mainly due to the costs of design-for-test structures, of designing test controllers, and of developing test programs.

This thesis presents a tool called BOLD that assists in the design of a system with an extremely high degree of testability and maintainability. A system designed using BOLD is testable and maintainable at every level of the design hierarchy, i.e., chips, modules, subsystems and the system, since test controllers are employed in the hierarchy. Test busses are used to allow communication between test controllers. Methods for designing various test controllers are also presented along with some example designs. To describe the test aspects of various hardware units, a set of high-level languages is provided. These languages can be used by designers with little or no knowledge of testability. Tools are provided to synthesize test programs from these descriptions. These test programs are then compiled and executed by controllers so that hardware units can be tested. These test programs include tests for the interconnect between chips. The test synthesis approach also provides the capability of predicting test time before the controllers are actually built.
Given a bound on system test time and a requirement on system testability and maintainability, BOLD can quickly search the whole design space to provide the designer with the best solution satisfying the requirements of test time and hardware overhead.

Chapter 1

Introduction

1.1 Background

The increased complexity of modern digital systems has significantly increased the costs associated with test generation and with detecting, locating and repairing faulty components. These activities are referred to as the test and maintenance of a system. For a complex system, these costs constitute a significant portion of its overall life-cycle costs.

Current approaches to system test and maintenance often employ a three-level maintenance scheme. System self-test is initiated by personnel with little training. If a faulty field replaceable unit is found, it is replaced with a working spare. The faulty unit is then sent to a shop, where a technician uses sophisticated equipment to isolate the fault to a shop replaceable unit. The faulty unit is then discarded or sent back to a depot, where a well-trained technician can locate a faulty component down to a minimum replaceable unit, such as a chip, and repair the unit.

This form of testing often encounters several problems, such as: (1) the unreliability of the tester; (2) the increased time and skill required of maintenance personnel; (3) the loss of system capability when the tester fails; (4) the cost and quality of test software development; and (5) the so-called Cannot Duplicate (CND) and Retest Okay (RTOK) problems. The problem of Cannot Duplicate refers to the situation where one level of test and maintenance indicates a failure in a unit, while the next level cannot detect a fault in the unit. This is in part due to intermittent faults and/or differences in the test procedures used at various levels of testing.
The problem of Retest Okay refers to the situation where the fault isolation capability is insufficient. Therefore, instead of sending the faulty unit to the next level of test, a good unit is sent. Thus no fault can be identified at the next level.

It has been pointed out in [49] that the false alarm, which is the major factor in lowering system readiness and availability, is closely related to both Cannot Duplicate and Retest Okay. The major causes of these problems, which have been identified in [11], are: (1) mode of operation dependency, referring to test procedures which can be executed at one level of the system hierarchy but not at other levels; (2) environmental dependency, which means that a failure is caused by such conditions as temperature or vibration; (3) false alarms due to design errors or transient faults; (4) inadequate fault isolation; (5) incompatibility of tests and test tolerances, which is caused by inconsistent testability and maintainability techniques used at different levels of testing; (6) system parameters which are out of specification; (7) faults in Built-In Test (BIT) hardware; (8) inaccessible test data; and (9) other causes about which we have no knowledge.

To reduce the cost of test and maintenance, one needs to design a system so that the above-mentioned problems and their causes can be eliminated or at least reduced. Such a system would then have a high degree of testability and maintainability. In addition, the system should be able to perform self-test with a high degree of fault coverage. This leads to a two-level maintenance scheme, where the shop level is eliminated, resulting in a tremendous reduction in test and maintenance cost. In summary, the ideal system should be able to (1) improve fault isolation capability; (2) eliminate the mode of operation dependency; (3) unify tests and test tolerances among different levels of testing; and (4) detect faults in BIT hardware.
Various issues related to the design of such a system are addressed in this work.

1.2 Previous Work

Previous work provides partial solutions for the problem of designing a system with a high degree of testability and maintainability. Work has been done in the areas of system architecture, testability busses, testable chip design, testable module design, interconnect test and test scheduling. However, a complete solution that deals with both the hardware and software aspects of the problem is not available. The BOLD system presented in this thesis can provide such a complete solution. Previous work that addresses various aspects of the problem is briefly described below.

1.2.1 System Architecture

Several system architectures to support system-level design for testability and maintainability have been proposed. These architectures share features such as hierarchical design and component built-in self-test (BIST) capability.

Haedtke et al. [27] proposed a multilevel self-test architecture. Each level of integration of the system is assumed to have BIST capability and is controlled by a maintenance processor. A standardized test bus and a standardized BIST control protocol are required to enable simultaneous self-test of all chips. These same standardized interfaces and protocols are used by the maintenance processor in both factory and field test. The lower-level tests remain valid at each higher level of integration. Using this architecture, the system test and maintenance cost can be greatly reduced. The system self-test is carried out by the maintenance processor, which must first be tested by an external tester. It is not clear how the maintenance processor will be tested if no external tester is available. Thus the capability of system self-test is not assured.

IBM's Common Signal Processor architecture [19] is designed with built-in test and maintenance capability. Each chip has an on-chip monitor to control the chip's BIST circuitry.
Each functional module is associated with an element supervisor unit and an element control bus. The former can access the on-chip monitor through an element maintenance bus. A subsystem manager controls all the element supervisor units via a pi-bus. The Common Signal Processor architecture incorporates the test and maintenance hierarchy in accordance with the system functional hierarchy.

TRW employs a hierarchical architecture for system test and maintenance [48]. A system maintenance node controls several functional maintenance nodes via a subsystem maintenance bus. A functional maintenance node controls several module maintenance nodes via a module test bus. A module maintenance node controls several device maintenance nodes via a device test bus.

The TEA system [62] also employs a hierarchical test methodology. Each board is made testable by inserting a Test Switch between every pair of Ambiguity Groups. An Ambiguity Group is the basic unit of circuitry to be tested. The test process of the board is controlled by a maintenance node, which is controlled by a subsystem test control unit, which is in turn controlled by a system test control unit.

1.2.2 Testability Bus

A great amount of effort has recently been focused on interoperability among products from different vendors. Interoperability can be assured by using a standard test bus. Some major initiatives for the standardization of test busses are listed below.

TM/ETM initiatives [20, 60]. A VHSIC subcommittee consisting of IBM, Honeywell and TRW proposed the TM/ETM test busses. The Element Test and Maintenance (ETM) bus [20] was to be the standard testability bus for VHSIC chips. It consists of six signal lines, namely CLK, DI, DO, Mode, Control, and Interrupt. Recently the ETM bus protocol has been modified to be compatible with IEEE Std. 1149.1, which combines the Mode and Control lines into a single line (TMS) to reduce the pin-count overhead.
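
As an illustration of how a single mode line can replace separate Mode and Control lines, the following minimal sketch models the TAP controller state machine of IEEE Std. 1149.1, in which the value of TMS at each test clock selects the next state. The sketch is not taken from the dissertation: the state names and transitions follow the state diagram published in the standard, while the Python names (`TAP_NEXT`, `tap_step`, `tap_run`) are hypothetical and chosen only for this example.

```python
# Sketch of the IEEE Std. 1149.1 TAP controller state machine.
# Each entry maps a state to (next state if TMS = 0, next state if TMS = 1).
# All Python identifiers here are illustrative assumptions.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_step(state, tms):
    """Return the state entered on one test clock with the given TMS bit."""
    return TAP_NEXT[state][tms]


def tap_run(state, tms_bits):
    """Apply a sequence of TMS bits, one per test clock."""
    for bit in tms_bits:
        state = tap_step(state, bit)
    return state


if __name__ == "__main__":
    # From reset, TMS = 0, 1, 0, 0 walks to the data-register shift state.
    print(tap_run("Test-Logic-Reset", [0, 1, 0, 0]))  # prints Shift-DR
```

Holding TMS at 1 for five test clocks returns this state machine to Test-Logic-Reset from any state, which is why no separate reset or control pin is strictly needed.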
The Test and Maintenance (TM) bus [60] was proposed to be used as the backplane test bus. It consists of four lines, namely Clock, Control, MD and SD. The bus configuration uses a multi-drop architecture, so that the addition/removal of other modules to/from a backplane will not affect the testability operation of other modules.

IEEE Std. 1149.1 [33]. The Joint Test Action Group (JTAG) proposed a standard test architecture, which became IEEE Std. 1149.1. The bus consists of four signal lines, namely TCK, TMS, TDI and TDO. TCK is the test clock line; TDI is the serial data input line; TDO is the serial data output line; and TMS is the control line which controls the state of a Test Access Port (TAP) that resides on each chip. One of the main objectives of IEEE Std. 1149.1 was to minimize both the pin count and the area overhead associated with the test bus.

IEEE P1149.X initiatives [61]. The IEEE P1149 Testability Bus Standardization Committee was formed in an effort to standardize one or more testability busses [33, 61]. The P1149.X (x=2,3,4) proposal includes four subsets that could be used individually or in any combination. These subsets are the P1149.2 Extended Serial Digital Subset, the P1149.3 Real Time Digital Subset and the P1149.4 Real Time Analog Subset. These initiatives, once approved, will become the IEEE standard 1149.

1.2.3 Testable Chip Control Model

Craig et al. [18] proposed a hierarchical test control scheme. In this scheme, a piece of automatic test equipment (ATE) controls a level 2 supervisor, which controls several level 1 supervisors, which in turn control the Test Control Logic in self-testable units, which are the basic units for test scheduling and test application. In addition to the Test Control Logic, each self-testable unit contains several test resources and a circuit block under test.
A problem occurs when a test resource, such as a signature analyzer (SA), is shared among several units. Each Test Control Logic block in these units must provide a set of control signals to the test resource. Since the test resource can only be controlled by a single set of control signals, control signals provided by different Test Control Logic blocks must be combined (either wired-ORed or wired-ANDed). This not only increases area overhead but also slows down the functional operation.

Beausang and Albicki [7] proposed a model for self-testable chips. Their model describes test resources, the test distribution network, the test controller and the test procedure in mathematical form. Necessary and sufficient conditions for a test controller to implement a test procedure are also derived. The problem of interfacing this model to a standard test bus is not treated. Thus, it is not clear how to incorporate their work with an external test controller, such as an MMC.

1.2.4 Testable Chip Design

Avra [5] proposed a design for an ETM-BUS compatible on-chip test and maintenance controller (TMC). The controller, under the control of an ETM-BUS, can direct the chip under test to perform five different operations, namely functional, debug, reset, serial scan and built-in self-test. The controller consists of control logic, command decode logic, parity logic and three registers, namely the Transfer Register, the Command Register and the Status Register. The Transfer Register and the Status Register are converted into a test pattern generator (TPG) and a parallel signature analyzer (PSA) during built-in self-test mode. The overhead of a TMC is 6 I/O pins and about 500 gates. The problem of synchronizing the system functional clocks and the test bus clock is not considered.

Whetsel [33, 65] proposed a design for the Test Access Port used in the boundary scan architecture. In this design, edge-triggered flip-flops are used as the basic storage elements.
Clock inputs of flip-flops are allowed to be gated. The total gate count for this design is about 80 and the pin count is 4. Compared to the controller proposed by Avra, the overhead of this design is small, and the design is suitable for most chips. Whetsel also did not address the problem of synchronizing functional and test clocks.

LeBlanc [41] reported on a built-in self-test technique, called LOCST, for chips designed at IBM using level sensitive scan design (LSSD). LOCST utilizes on-chip pseudorandom pattern generation, on-chip signature analysis, boundary scan and an on-chip monitor as the test controller. The monitor includes a standard maintenance interface with seven dedicated signal lines. The major functions of a monitor are scan string control, error monitoring and reporting, chip configuration control, clock event control, run/stop single cycle and stop on error. The boundary scan chain can be used to test external logic, which cannot be tested by the monitor. The advantages of this technique are low area overhead (< 2%), design-independent implementation and effective static testing. The key drawback of using a monitor is the high I/O pin count overhead.

1.2.5 Testable Module Design

Budde [15] presented a board test controller called Testprocessor. A Testprocessor can control the test process of a chip through a dedicated test bus. Since a Testprocessor is designed for use on either a printed circuit board or a VLSI chip, its functionality is limited by area constraints. The only data processing unit in a Testprocessor is a fault-secure comparator. Due to its limited data processing capability, diagnostic programs cannot run on a Testprocessor.

The TEA [62] system employs a Maintenance Node to make a board testable. Test Switches are added to a board to increase its testability and controllability. Ambiguity groups are first identified. A Test Switch is then inserted between every pair of ambiguity groups.
The Maintenance Node uses 10 signal lines to control all the maintenance activities of the board.

Babiak et al. [48] reported on a module BIST scheme using Module Maintenance Nodes, one of which is embedded in every module and is controlled by a Maintenance And Diagnostic System that contains a processor to run both test and diagnosis programs.

1.2.6 Interconnect Test Generation

The IEEE Std. 1149.1 requires that every chip be built with the boundary scan architecture, where each I/O pin is associated with a scan cell. By shifting data along a chain consisting of these scan cells, the interconnect between chips can be easily tested. This helps subdivide the test problem and leads to increased fault isolation capability.

Kautz [37] derived a minimal test set for detecting opens and shorts in a wiring network. The number of tests required is $p - 1 + \lceil \log_2 q \rceil$, where p is the number of terminals in the largest interconnect net in the network, and q is the total number of nets. This result has become the foundation of later work on interconnect testing.

Wagner [64] presented a method for testing interconnections using boundary scan registers. Both stuck-at faults and shorts are considered. Stuck-at faults are tested for free while testing for shorts. By complementing the test vector set, faults can be located. For n 2-point nets, $2 \times \lceil \log_2 (n + 2) \rceil$ tests are required. The order of the I/O pins in the boundary scan chain must be given. If tristate pins and bi-directional pins are included, additional tests are required.

Hassan et al. [29, 30] extended the minimal test set for interconnect test developed by Wagner [64] to a generalized test set, where information on the order of the I/O pins is not required. They also presented several BIST schemes for interconnect test using the boundary scan architecture. Walking ones and zeroes sequences are proposed as efficient test patterns.
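The test count quoted above for n two-point nets can be illustrated with a small sketch. It assumes one common construction that is consistent with that count, a counting sequence that skips the all-0 and all-1 codes, followed by its complement. The function names are hypothetical, and the sketch is not taken from [64].

```python
def counting_sequence_tests(n):
    """Build true and complement test vectors for n two-point nets.

    Net k (0-based) is assigned the binary code of k + 1, so the all-0 and
    all-1 codes are never used. Each returned vector holds one bit per net.
    """
    if n < 1:
        raise ValueError("need at least one net")
    width = (n + 1).bit_length()       # equals ceil(log2(n + 2)) for n >= 1
    codes = [format(k + 1, "0{}b".format(width)) for k in range(n)]
    true_vectors = [[int(code[b]) for code in codes] for b in range(width)]
    complement_vectors = [[1 - bit for bit in vec] for vec in true_vectors]
    return true_vectors, complement_vectors


def net_codes(vectors):
    """Transpose a vector set into one code (a tuple of bits) per net."""
    return list(zip(*vectors))


true_vecs, comp_vecs = counting_sequence_tests(6)
print(len(true_vecs) + len(comp_vecs), "vectors for 6 nets")
```

Because every net receives a distinct code that contains both a 0 and a 1, each net is driven to both logic values (which exercises stuck-at faults) and no two nets see identical data (which exposes shorts between them).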
Several in-place diagnosis schemes are also presented, including a modifier sequence for area-efficient built-in diagnosis.

Jarwala and Yau [34] have developed a comprehensive framework for dealing with the test and diagnosis of interconnect. In addition, a diagonally independent property is identified. They also proposed the C-Test algorithm, which can generate a minimal test set for identifying all shorts and opens in a network. Their results are valid only if a net is not involved in both opens and shorts at the same time.

Cheng et al. [16] researched the self-diagnosis property. Constant weight codes are proposed to achieve self-diagnosis. In addition, several optimal adaptive algorithms are proposed for deriving test sets that achieve the self-diagnosis property.

All these results are based on the assumption that a net cannot be involved in both open and short faults. When this assumption is removed, these results are invalid.

1.2.7 Test Scheduling

Abadir and Breuer [3] solved the problem of optimizing the execution schedule of a test plan for a single test block. A resource conflict graph is used to indicate the sharing of a resource at different steps of a test plan. They showed that the lower bound on the time delay (D) between the initiation of two tests is equal to the chromatic number of the conflict graph. No-operation steps (no-ops) are inserted into a test plan to get an optimal test schedule. The problem of scheduling multiple test blocks was not addressed.

Craig, Kime and Saluja [18] addressed the problem of scheduling multiple test blocks. Unlike Abadir and Breuer's work, they assumed that all test blocks have their D values equal to one. Assuming all test blocks have the same test length, they constructed an algorithm to minimize the number of concurrent test sets, each of which is defined as a set of tests that can be executed in the same test session.
The same algorithm is then extended to solve the scheduling problem for test blocks with unequal test lengths. The problem of test blocks with D values greater than 1 was not dealt with.

Sayah and Kime [58] dealt with the problem of scheduling multiple test blocks with a complex test plan. Unlike previous work, which considers only a single aspect of test parallelism, a broader consideration including both time and space parallelism is taken. Based on a resource allocation graph and a so-called delta graph, they found a good heuristic algorithm for the scheduling of tests.

The work discussed above does not take into consideration the control structure that implements the scheduling process. Also, the problem of scheduling tests at the module and subsystem level has not yet been addressed.

1.3 The Design Approach

A systematic design technique, called the hierarchically testable and maintainable (HTM) design methodology, is addressed in this work. Adopting the HTM methodology at all levels of the physical hierarchy of a design, i.e., chip, module, subsystem and system, will increase system availability and significantly lower hardware life cycle costs. A system designed with such a methodology is called an HTM system.

A design-for-test tool called BOLD is presented in this work. BOLD is a tool that supports both the hardware and software design of an HTM system. For hardware support, a set of test controllers to execute the test process of different testable units is provided. For software support, a set of high level languages to describe the test aspects of these units is provided. Tools are also provided such that these descriptions can be translated into executable code for the test controller. BOLD also provides the necessary tool for automatically testing interconnects among different units.

In designing an HTM system, there exist tradeoffs between the test time and the hardware complexity of test controllers.
From a designer's point of view, the goals are to reduce both the test time and the hardware overhead. These are two conflicting requirements, since in general more hardware has to be added to the test controllers to reduce test time. Using the capability of automatic synthesis of test programs provided by BOLD, the overall test time of an HTM system can be quickly predicted. Hence the whole design space can be quickly explored in choosing a feasible solution.

1.4 The Hierarchical Test Methodology

A hierarchy of test controllers is employed in a system designed with the hierarchical test methodology. These controllers are distributed among each level of the physical hierarchy. In such a system, which is referred to as an HTM system, each VLSI chip has an on-chip test and maintenance controller (CMC); each module (or board) has a module test and maintenance controller (MMC); each subsystem has a subsystem test and maintenance controller (SuMP); and each system has a system test and maintenance controller (SMP). These controllers participate in all system test and maintenance activities, and communicate via test busses. Figure 1.1 shows part of a four-level test hierarchy. Different busses may be used for communication between different levels. The SMP communicates with SuMPs through a Level 2 bus (L2-bus); a SuMP communicates with MMCs through a Level 1 bus (L1-bus); and an MMC communicates with CMCs through a Level 0 bus (L0-bus). The IEEE 1149.1 [33] boundary scan bus is used as the L0-bus throughout this work.

Figure 1.1: The architecture of an HTM system.

Pros and cons

Advantages of the hierarchical test methodology include the following:

1. Lower level tests remain valid at higher levels of the design hierarchy.
2. Interoperability at each level of subassembly due to standardized interfaces.
3. Increased fault isolation capability provided by boundary scan.
4. Reduced testability and maintainability (T&M) design time.
5. Reduced maintenance time due to consistent T&M techniques.
6. Increased system availability due to reduced maintenance time.
7. Increased reliability due to increased test effectiveness.
8. Reduced overall test and maintenance cost.

The disadvantages of such a hierarchical test methodology are the controller hardware overhead and the requirement that components have standardized interfaces. The following example shows how a circuit can be tested when the HTM methodology is used.

1.5 Test Controllers for a Circuit

There are numerous testable design methodologies (TDMs) that can be used to make a circuit testable. In selecting a TDM, criteria such as total test time, area overhead, and circuit performance degradation must be considered. Tradeoffs often need to be made so that design constraints and goals are satisfied. It is possible to automate such a selection process by employing an expert system [2, 69]. Once a TDM for a circuit is selected, the associated BIT structure and test plan can be derived. To execute such a test plan a test controller is required. The test controller configures the circuit for testing and controls the execution of the test. In addition, it may also generate test data and analyze test results. Not all aspects of the controller need be on-chip. Some of the possible tradeoffs are considered in this example.
Figure 1.2: Partitioning the test controller between the CMC and MMC.

Partitioning the test controller: Suppose the BIT structure of an application circuit consists of one or more scan loops (see Figure 1.2). The test plan for a loop is as follows: (1) shift in a test vector; (2) latch the response data into the scan register; and (3) shift the test results out of the loop while shifting in a new test vector. This process is repeated t times, where t is the number of test vectors.

Assume test vectors are to be generated by a serial test pattern generator (TPG) and the results compressed by a serial signature analyzer (SA). A complete test controller must have the following hardware facilities (or test resources) to carry out this test process: a TPG to provide the test vectors, an SA to compress the test results, a counter TC to keep track of the number of test vectors, a counter SC1 and a register SC2 to keep track of the number of shifts for each vector, seed vectors for various registers such as the TPG and SA, a stack containing correct signatures, a signature comparator, a register SCS for scan chain selection, and a finite state machine to control the test process.

Several different configurations for these test resources are possible. Figure 1.2 shows four possible partitions of these resources between on-chip and off-chip controllers, i.e., CMC and MMC. Once the resources are partitioned along some boundary, an interface or bus is required to connect the two partitions. For example, a boundary scan bus is used for an MMC to communicate with a CMC. The CMC must have a slave interface and the MMC must have a master interface.

Partition 1 puts all resources into the CMC. Such a CMC is capable of executing a test process completely on its own, once initiated by the MMC. After t test vectors have been applied to the kernel, the signature is compared with the correct one. The CMC then reports only the Go/NoGo status to the MMC through the boundary scan bus.
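The three-step test plan and the Go/NoGo comparison of Partition 1 can be sketched in software. This is a minimal illustration under stated assumptions: the 4-bit shift registers standing in for the serial TPG and SA, their tap positions, the seed, and the example kernel are all hypothetical and do not come from the thesis.

```python
def lfsr_bits(seed, taps, width):
    """Yield a serial pseudorandom bit stream from a shift register (stand-in TPG)."""
    state = seed
    while True:
        yield state & 1
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = (state >> 1) | (feedback << (width - 1))


def serial_signature(bits, taps=(0, 1), width=4):
    """Compress a serial bit stream into a width-bit signature (stand-in SA)."""
    signature = 0
    for bit in bits:
        feedback = bit
        for tap in taps:
            feedback ^= (signature >> tap) & 1
        signature = (signature >> 1) | (feedback << (width - 1))
    return signature


def run_scan_test(kernel, scan_len, t, seed=0b1011):
    """Run the three-step test plan for t vectors and return the signature."""
    tpg = lfsr_bits(seed, taps=(0, 1), width=4)
    chain = [0] * scan_len                    # scan register contents
    shifted_out = []
    for vector_count in range(t + 1):         # TC; the extra pass flushes the last response
        for _ in range(scan_len):             # SC1 counts up to the length held in SC2
            shifted_out.append(chain[-1])     # step 3: shift a result bit out ...
            chain = [next(tpg)] + chain[:-1]  # ... step 1: while shifting a new bit in
        if vector_count < t:
            chain = kernel(chain)             # step 2: latch the response
    return serial_signature(shifted_out)


def xor_kernel(bits):
    """Hypothetical fault-free kernel: each output is the XOR of two neighbours."""
    n = len(bits)
    return [bits[i] ^ bits[(i + 1) % n] for i in range(n)]


def go_nogo(kernel, golden, scan_len, t):
    """Compare the computed signature with the stored correct one."""
    return "Go" if run_scan_test(kernel, scan_len, t) == golden else "NoGo"


golden = run_scan_test(xor_kernel, scan_len=4, t=5)
print(go_nogo(xor_kernel, golden, scan_len=4, t=5))
```

In Partition 1 all of these pieces would sit in the CMC, and only the string returned by `go_nogo` would cross the boundary scan bus. The other partitions move some of the same pieces to the MMC side of the bus.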
Partition 2 incorporates the seeds, correct signatures and the comparator into the MMC while leaving the rest of the resources in the CMC. The MMC first loads the seeds into the TPG, SA, SCS, TC, SC1 and SC2 registers in the CMC via the boundary scan bus. The CMC then generates and applies test vectors to the test kernel, and the test results are compressed in the SA. After t test vectors have been applied, the CMC requests that the MMC read the signature in the SA via the boundary scan bus. The MMC then compares the signature with the correct one and determines the health status of the kernel by generating a Go/NoGo indication.

Partition 3 keeps the TPG and the SA in the CMC while putting the rest of the test resources in the MMC. The MMC must provide control signals for the BIT structures, the TPG and the SA. All control signals must be derived from the test bus directly during the test process. The MMC keeps track of the number of shifts for each test vector and the total number of test vectors that have been applied to the kernel. After t vectors have been applied, the MMC reads the signature out of the CMC. A comparison is made against the correct signature to determine the health status of the kernel.

Partition 4 puts all the aforementioned test resources into the MMC. The CMC is simply a boundary scan bus slave interface. The complexity of this MMC is maximal and corresponds to the concept of a test channel, which will be described later. The MMC must provide test vectors and collect test results through the test bus. Control signals for the on-chip BIT structures are also provided by the MMC through the boundary scan bus.

An MMC can control several CMCs through a bus. The hardware partitions may be any of the four mentioned above. In fact they can be different for each chip.
To be able to control any CMC, the MMC must have not only the resources dictated by partition 4 but also additional capabilities in order to control BIT configurations other than just scan chains using random data. Test resources in the MMC can be shared by the CMCs, while resources in a CMC cannot be shared by other CMCs. Obviously, the more test resources there are in the CMC, the higher the degree of test execution parallelism that can be achieved, thus leading to a reduction in total test time.

1.6 Thesis Outline

This thesis is organized as follows. The designs of test controllers are presented first. On-chip test controllers are described in chapter 2, followed in chapter 3 by a test controller for testable modules. The generation of test programs is described in chapter 4, where test description languages are presented along with various synthesis tools. The results of test program synthesis for several examples are also shown. Chapter 5 deals with how the synthesis technique facilitates the minimization of test controller complexity. The issue of testing interconnect among different hardware units is described in chapters 6 and 7. In chapter 6, a new fault model is presented along with theorems and algorithms for deriving test sets to identify all faults. In chapter 7, the problem of actually applying a test set to test interconnect is investigated. Theorems and algorithms are provided so that a schedule can be constructed to achieve minimal test time. Conclusions and future work are given in chapter 8.

Chapter 2

Controllers for Testable Chips

2.1 The Model

In this chapter the design of one or more on-chip test controllers (CMCs) for testable chips is presented. A testable chip is assumed to have some DFT or BIST features which can be controlled through a boundary scan bus. It consists of a CMC and an application circuit.
The CMC in turn contains a bus interface, called the L0-slave, and a Built-In Test (BIT) controller. The IEEE Std. 1149.1 boundary scan bus is used as the test bus in this work. A detailed description of this standard can be found in [33]. For convenience, a brief introduction to the boundary scan architecture is given below.

2.1.1 Boundary Scan Architecture

The boundary scan technique requires the inclusion of a scan cell for each I/O pin of a chip. A typical boundary scan cell is shown in Figure 2.1. A boundary scan register is formed by cascading all the boundary scan cells. During normal operation, the scan cells are transparent to the operation of the chip except that a multiplexer delay is added to each I/O pin. During test operation, the logic values of these I/O pins can be captured into the first flip-flop (Q1) and then shifted out for observation; meanwhile, new values can be shifted in and transferred to the output of these cells.

Figure 2.1: A typical boundary scan cell.

A boundary scan module contains chips that have this boundary scan architecture. A scan path can be formed by cascading all the boundary scan registers (see Figure 2.2). This scan path can be used in two ways: (1) to allow the interconnects between the various chips to be tested, and (2) to allow the chips on the module to be tested.

Figure 2.2: A module having a boundary scan path.

The boundary scan bus consists of at least four signal lines, namely TDI, TDO, TMS and TCK. A fifth signal line, TRST*, which is not shown here, is optional. The TCK line provides the clock for the test logic in the L0-slave. The logic value of the TMS line is decoded by an on-chip test controller to control test operations.
The TDI line provides serial instructions and data to be received by the test logic of the chip. The TDO line is the serial output for test instructions and data from the test logic of the chip. One test bus architecture is shown in Figure 2.3.

Figure 2.3: Test bus architecture of a module.

The boundary scan architecture of a chip is shown in Figure 2.4. The shaded area labeled as application circuit is the circuit designed with a predefined BIT methodology. During the test mode, many scan registers are formed in the application circuit. These scan registers can be used to control the test process of the circuit. The unshaded area consists of the Test Access Port (TAP), which can also be referred to as the L0-bus slave, and is required in a chip with the boundary scan architecture. The TAP consists of a TAP controller, an instruction register (IR), a boundary scan register, a one-bit bypass register, an optional device identification register (ID), and multiplexers. The hatched area labeled as the BIT controller contains additional (optional) test facilities for controlling the test process of the application circuit. The CMC consists of the TAP and the BIT controller.

The state transition diagram of the TAP controller is shown in Figure 2.5. The states are represented by the values of the flip-flops used in the TAP controller. The state transition is controlled by the logic value of the TMS line. Each state has two possible next states, designated by the two outgoing directed edges. The state transition follows the edge with label 1 if the current value of the TMS line is 1; otherwise the edge with label 0 is followed.
The Reset state is entered whenever the TAP controller is reset, which can occur when the system power-on-reset is activated or the TMS line is held high for at least 5 consecutive TCK clock cycles. The Run-Test-Idle state is entered when executing the self-test activities, or when the chip is in test mode with no ongoing test activity.

Figure 2.4: The model of a chip with boundary scan architecture.

Two major branches are used to transmit instructions and data. When transmitting instructions, the number of activations of the state ShiftIR equals the number of instruction bits sent. A new instruction is loaded into the IR when the UpdateIR state is activated. The contents of the IR determine the operation of the on-chip test controller. One major function of the data in the IR is to select a data register for scanning. When transmitting data, both the CaptureDR and UpdateDR states are activated exactly once for each transmission. The number of activations of the ShiftDR state equals the number of data bits sent to the selected DR. By properly driving a TAP controller, a module level test controller can send/receive both instructions and data to/from a chip. This information controls the test execution of the chip.

Figure 2.5: The boundary scan bus state transition diagram.

During the test mode, all test control signals are controlled by the BIT controller, which in turn is controlled by the contents of the IR and the current state of the TAP controller.
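The TAP state transitions described above can be sketched as a table-driven state machine. The transition table follows the standard IEEE 1149.1 state diagram, written with the state names used in this chapter. The helper names `run` and `tms_for_scan` are hypothetical and exist only for this illustration.

```python
# Next-state table: state -> (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "Reset":       ("RunTestIdle", "Reset"),
    "RunTestIdle": ("RunTestIdle", "SelectDR"),
    "SelectDR":    ("CaptureDR",   "SelectIR"),
    "CaptureDR":   ("ShiftDR",     "Exit1DR"),
    "ShiftDR":     ("ShiftDR",     "Exit1DR"),
    "Exit1DR":     ("PauseDR",     "UpdateDR"),
    "PauseDR":     ("PauseDR",     "Exit2DR"),
    "Exit2DR":     ("ShiftDR",     "UpdateDR"),
    "UpdateDR":    ("RunTestIdle", "SelectDR"),
    "SelectIR":    ("CaptureIR",   "Reset"),
    "CaptureIR":   ("ShiftIR",     "Exit1IR"),
    "ShiftIR":     ("ShiftIR",     "Exit1IR"),
    "Exit1IR":     ("PauseIR",     "UpdateIR"),
    "PauseIR":     ("PauseIR",     "Exit2IR"),
    "Exit2IR":     ("ShiftIR",     "UpdateIR"),
    "UpdateIR":    ("RunTestIdle", "SelectDR"),
}


def run(state, tms_bits):
    """Apply one TMS value per TCK cycle; return the state entered after each cycle."""
    trace = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        trace.append(state)
    return trace


def tms_for_scan(n_bits, register):
    """TMS sequence that scans n_bits through the IR or a DR, starting and
    ending in RunTestIdle."""
    prefix = [1, 1, 0, 0] if register == "IR" else [1, 0, 0]
    return prefix + [0] * (n_bits - 1) + [1, 1, 0]


# Five consecutive TMS = 1 cycles reach Reset from every state.
print(all(run(s, [1] * 5)[-1] == "Reset" for s in TAP_NEXT))
```

A module level controller would generate sequences like those returned by `tms_for_scan` on the TMS line while presenting the instruction or data bits on TDI during the shift states.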
Signals generated by the TAP controller include RunTest, Capture, Shift and Update, which are active during the states Run-Test-Idle, CaptureDR, ShiftDR and UpdateDR, respectively. Only one of these four signals can be active at a time. Also, the sequence of activation of these signals must be consistent with the state transition diagram described previously.

When a register is selected for scan, the control signals must be designed in such a way that the logic values of the inputs to this register are captured when the TAP state is CaptureDR. The selected register is shifted whenever the TAP state is ShiftDR. Throughout this work it is assumed that the output of an addressable register is updated only when the TAP state is UpdateDR, i.e., an implicit HOLD mode is assumed for all addressable registers. Figure 2.6 shows a general model for an addressable register.

Figure 2.6: Control model for an addressable register.

The control signals of a testable chip during the test mode are shown in Figure 2.7. The signals C1, C2, ..., Cn control the test execution of the application circuit, which has been built using some BIT methodology. The signals IR1, IR2, ..., IRm are the outputs of the IR. Registers in the application circuit must hold their data when the TAP controller is in certain states, such as Exit(1)DR, PauseDR and SelectDR. This can be done using an explicit hold control signal or by disabling the clock.

Figure 2.7: The control signals during test mode.

Based on their dependence on the test bus, BIT controllers can be grouped into two categories: bus-dependent and autonomous BIT controllers.
During the testing of the application circuit, a bus-dependent BIT controller uses the lines Capture, Shift, Update and RunTest, while an autonomous BIT controller uses only the RunTest line. In other words, the operation of a bus-dependent BIT controller depends on the state transitions of the test bus during the entire test execution process. The operation of an autonomous BIT controller is independent of the bus state transitions once it has been properly initiated. In general, an autonomous controller has a higher hardware complexity than a bus-dependent controller.

2.2 Bus-Dependent BIT Controllers

In this section, the design of BIT controllers for various test structures is presented. These controllers use the state of the TAP controller as a source of the control signals. When testing a kernel, the state transitions of the BIT controller must follow the control graph of the kernel. A control graph that can be used to test an LSSD kernel [21] is shown in Figure 2.8(b).

In a control graph, a node Si represents a control state of a BIT controller, which is associated with a signal FSi (not shown in the graph) that is active in this state; an arc represents a state transition; and the label of an arc represents the number of iterations associated with that transition. A rectangular decision box determines the state transitions. The arc with label 1 in the box is taken when this box is first entered. The arc with the next highest number is taken only when a sufficient number of iterations of the currently selected arc have been completed. For example, in Figure 2.8(b), once state S2 is entered, the second arc (to state S1) is taken after the first arc (self-loop) has been taken s times. Similarly, the third arc (to exit) is taken after the second arc has been taken t times. For those states that have only one possible next state, no decision box is needed.
Thus the decision box corresponds to a nested loop of the form

    do once
        do for j = 1, ..., t
            do for i = 1, ..., s.

The control signals C1, ..., Cn are decoded from the signals FSi, i = 1, ..., k. Thus the implementation of a bus-dependent BIT controller deals with the activation of the signals FS1, ..., FSk in the sequence described by the control graph. A general model of the bus-dependent BIT controller is shown in Figure 2.9. The BIT controller consists of two combinational decoders and an optional finite state machine FSM. The decoder dec2, which generates the control signals C1, ..., Cn from the signals FS1, ..., FSk, consists of a set of OR gates. The decoder dec1 generates the signals FS1, ..., FSk from three sources, namely the contents of the IR, the TAP controller state signals and the outputs PH1, ..., PHp of the finite state machine FSM. Note that the finite state machine is not needed if the controller can be implemented using a combinational circuit.

Figure 2.8: An LSSD kernel: (a) control signals, (b) control graph.

Figure 2.9: A general model for the bus-dependent BIT controller.

The finite state machine is implemented as a programmable counter, which, when enabled, can count from 1 to c ($c \le p$) repeatedly. The signal PHi is active only when the counter value is i, where $1 \le i \le c$. Thus the signals on PH1, ..., PHc form a one-hot code, and these values are generated repeatedly as long as the FSM is enabled. When the finite state machine is disabled, the value of the counter is 0 and the outputs of the machine are all disabled. The value c is determined by a register in the machine, which can be modified by shifting a new value into it. Note that this register can be part of the instruction register or an addressable data register.
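The behaviour of the programmable counter described above can be sketched as a small class. This is a behavioural model only. The class and method names are hypothetical, and loading c is shown as a plain assignment rather than a serial shift.

```python
class PhaseCounter:
    """Behavioural sketch of the FSM: a counter that cycles 1..c while enabled."""

    def __init__(self, c, p):
        if not 1 <= c <= p:
            raise ValueError("c must satisfy 1 <= c <= p")
        self.c = c          # programmable cycle length
        self.p = p          # number of output lines PH1..PHp
        self.count = 0      # 0 means disabled: no output is active

    def load(self, c):
        """Model shifting a new value of c into the register."""
        if not 1 <= c <= self.p:
            raise ValueError("c must satisfy 1 <= c <= p")
        self.c = c

    def tick(self, enable):
        """Advance by one clock and return the outputs [PH1, ..., PHp]."""
        if enable:
            self.count = self.count % self.c + 1   # 1, 2, ..., c, 1, 2, ...
        else:
            self.count = 0
        return [int(self.count == i) for i in range(1, self.p + 1)]


counter = PhaseCounter(c=3, p=4)
for _ in range(4):
    print(counter.tick(enable=True))
```

With c = 3 and p = 4, PH4 is never activated, which reflects the requirement that only PH1 through PHc take part in the one-hot sequence.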
2.2.1 BIT Controller for an LSSD Kernel

If a kernel is made testable using the LSSD technique [21], the BIT structure consists of two registers (R1, R2) and a combinational circuit C (see Figure 2.8(a)). (The registers R1 and R2 can be combined into a single register.) These two registers form a scan register that is selected if the contents of the IR is 011 (denoted as [IR = 011]). During the test mode, the control signals of the LSSD kernel are LD1, LD2, SH1 and SH2. The signals LD1 and LD2, which control the parallel loading of new data into registers R1 and R2, respectively, are activated while in state S1. The signals SH1 and SH2, which control the shifting of data along registers R1 and R2, are activated in state S2. When none of these signals is active, both registers R1 and R2 retain their values, i.e., remain in the HOLD mode. To test the LSSD kernel properly, the control signals should be activated according to the control graph shown in Figure 2.8(b). To do so, it is necessary to derive the control signals as follows.

    C1 = LD1 = LD2 = FS1;
    C2 = SH1 = SH2 = FS2.

From the model described in Figure 2.7, it is clear that a circuit that implements the following functions can be used as a BIT controller.

    FS1 = Capture * [IR = 011];
    FS2 = Shift * [IR = 011].

When such a BIT controller is used, an external controller can test the LSSD kernel by first loading the IR with the proper value (011 in this case) and then driving the TAP to the states CaptureDR and ShiftDR, which in turn activate FS1 and FS2 according to the control graph. Therefore, the control signals LD1, LD2, SH1 and SH2 are properly activated and the LSSD kernel is tested. Note that the external controller must have at least two counters to keep track of the values of t and s required in the control graph.

The mapping between the TAP controller states and the test control of the application circuit may not be obvious in some cases. An algorithm that facilitates
An algorithm that facilitates .this mapping properly will be given later in this chapter. 2 .2 .2 B IT C on troller for a B IL B O K ern el In the case of a BILBO kernel [38] (see Figure 2.10(a)), the BIT structure consists of two BILBO registers (Rl, R 2). During the test mode, the control signals Lf R l are TPG and SHI, while those of register R2 are PSA and SH2. These two registers form a scan register that can be selected when the value in the instruction register is 101, that is [IR=101] . To test a BILBO kernel properly, the control signals m ust be activated as illustrated in the control graph in Figure 2.10(b). The signals ! sH1 and SH2, which control the shift operation in registers R l and R2, respectively, Lre both active in states Si and S3. When the signal TPG is active, register R l acts 26 Sout (a) (b ) Figure 2.10: A BILBO kernel: (a) control signals, (b) control graph. 27 as a test pattern generator. When the signal PSA is active, register R2 functions as L parallel signature analyzer. Both TPG and PSA are active in state S2. Note th at to correctly execute a test according to the control graph, different instructions must be used in states SI and S2. One reason for this is that in going from the Reset state to the ShiftIR state in the TAP control, one enters the RunTest state for at least one clock cycle. According to the control graph, it is clear th at the decoder iiec2 should be implemented as follows. Cl = TPG = PSA = FS2; C2 - SHI = SH2 = FSI + FS3. Using the control model shown in Figure 2.7, it is clear that the decoder d e c l should 3e implemented as follows. FS2 = RunTest * [IR =011]; FSI = FS3 = S h ift * [IR =101]. Note that in the above two examples, the BIT controller is simply a combi national circuit consists of two decoders. This is because the mapping mechanism is simple. If the mapping scheme is very complex, the algorithm presented in sec tion 2.2.4 can be used to construct a BIT controller. 
2.2.3 BIT Controller for a Complex Kernel

Consider the complex kernel in Figure 2.11(a), where the BIT structure consists of four registers (R1, R2, R3, R4), a combinational circuit C and a bus with its associated controls. During the test mode, the control signals are: LD1, SH1, TPG, LD2, LD3, LD4, SH4, PSA, G1 and G2. A test vector is generated when the TPG signal is active. The test vector is transferred to the register R2 and then applied to C by activating both signals G1 and LD2. The results are then transferred to register R3 by activating the signal LD3. Finally, the results can be compressed in the register R4 if both the signals G2 and PSA are activated. This process is repeated t times, where t is the number of vectors required to test C. To reduce the test time, a new test vector can be applied before the completion of the previous vector. However, resource conflicts must be avoided. For example, to avoid any conflict on the bus, the signals G1 and G2 cannot be activated in the same clock phase.

Figure 2.11: A complex kernel: (a) control signals, (b) control graph.

A minimal time test schedule for this kernel is shown in Table 2.1, where nine steps are required to apply four vectors (v1, v2, v3, v4). Each table entry represents a set of control signals that should be active at each time step for applying a test vector.

time  v1        v2        v3        v4
1     TPG
2     LD2, G1   TPG
3     LD3       LD2, G1
4     PSA, G2   LD3
5               PSA, G2   TPG
6                         LD2, G1   TPG
7                         LD3       LD2, G1
8                         PSA, G2   LD3
9                                   PSA, G2

Table 2.1: Test schedule for the complex kernel.

From the table, one can conclude that the activation of control signals can be classified into four phases. In the first phase, the activated control signals are TPG, G2 and PSA. In the second phase, the activated signals are TPG, LD2 and G1. In the third phase, the activated signals are LD2, G1 and LD3.
In the fourth phase, the activated signals are LD3, G2 and PSA. Two vectors are applied for each iteration of these four phases.

The control graph that can be used to execute the test schedule is shown in Figure 2.11(b). The BIT controller that implements this control graph can be either a sequential or a combinational circuit. These two approaches are described next.

Sequential Approach

In this approach, the finite state machine (FSM) is used to derive the required control signals. The control signals FS1 and FS6 are active when the instruction is [IR = 0010] and the bus signal Shift is active. The control signals FS2, FS3, FS4 and FS5 are derived from the signals PH1, PH2, PH3 and PH4, respectively. The finite state machine is programmed as a counter that repeatedly counts from 1 to 4, thus activating PH1, PH2, PH3 and PH4 in sequence when the instruction is [IR = 0111] and the bus state is RunTest. The counting continues until the signal PH4 has been activated t/2 times. This means that the test bus must stay in the RunTest state for exactly 2t clock cycles. The decoder dec1 is implemented as follows:

FS1 = FS6 = Shift * [IR = 0010];
FS2 = PH1;
FS3 = PH2;
FS4 = PH3;
FS5 = PH4;

The decoder dec2 is implemented as follows:

SH1 = SH4 = FS1 + FS6;
TPG = FS2 + FS3;
PSA = G2 = FS2 + FS5;
LD2 = G1 = FS3 + FS4;
LD3 = FS4 + FS5;

An external test controller can thus execute the test by driving the bus states according to the control graph. Note that counters are required for the external controller to keep track of the values of s and t. In this example, the required finite state machine is a simple sequencer which, upon activation, repeats a sequence of steps whose length equals the number of phases in the control graph.

Combinational Approach

In this case, the BIT controller is implemented using combinational circuits and the FSM part is not used. Each control signal must be generated by using a separate instruction.
For example, the control signals FS2, FS3, FS4 and FS5 can be generated using the instructions [IR = 0101], [IR = 0110], [IR = 0111] and [IR = 1000], respectively. In this case, the decoder dec1 is implemented as follows.

FS1 = FS6 = Shift * [IR = 0010];
FS2 = RunTest * [IR = 0101];
FS3 = RunTest * [IR = 0110];
FS4 = RunTest * [IR = 0111];
FS5 = RunTest * [IR = 1000].

The decoder dec2 is the same for both the sequential and combinational approaches.

An external controller can thus execute the test by driving the bus state according to the control graph. Again, counters are required for the external controller to keep track of the values of s and t. Note that a new instruction is required for the activation of each new control signal. Therefore, the test execution time is increased dramatically compared to the sequential approach.

From these two approaches, one can conclude that there is a relation between the test time and the complexity of the BIT controller. In general, the more complex the BIT controller is, the shorter the test time.

2.2.4 A Mapping Algorithm

When designing a BIT controller for a single kernel, as shown in the above examples, a mapping algorithm is required to ensure the correctness of the BIT controller. During the test mode, the signals C1, C2, ..., Cn should be controlled by the BIT controller so that the kernel can be properly tested. The test procedure for the kernel is described in a control graph, which has been previously defined. Each state (or node) of the control graph is associated with a set of control signals, which are the signals that are active in that state. The inputs to the BIT controller are the contents of the IR and the signals directly derived from the bus state, namely Capture, Shift, Update and RunTest (see Figure 2.7). The BIT controller outputs the signals C1, C2, ..., Cn.
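The association between control-graph states and control signals can be held in a small table, from which the dec2 equations follow mechanically. The sketch below is illustrative only: the dictionary layout and the function name `derive_dec2` are assumptions, and the state contents are read off the decoder equations given for the complex kernel of Section 2.2.3.

```python
from collections import defaultdict

# Control signals active in each state of the control graph of Figure 2.11(b),
# as implied by the dec2 equations of Section 2.2.3.
STATE_SIGNALS = {
    "S1": ["SH1", "SH4"],
    "S2": ["TPG", "G2", "PSA"],
    "S3": ["TPG", "LD2", "G1"],
    "S4": ["LD2", "G1", "LD3"],
    "S5": ["LD3", "G2", "PSA"],
    "S6": ["SH1", "SH4"],
}


def derive_dec2(state_signals):
    """Map each control signal to the state signals FSj whose OR drives it."""
    terms = defaultdict(list)
    for state, signals in state_signals.items():
        for sig in signals:
            terms[sig].append("F" + state)  # state S2 has state signal FS2
    return {sig: sorted(fs) for sig, fs in terms.items()}


if __name__ == "__main__":
    # Print one OR equation per control signal, e.g. "TPG = FS2 + FS3".
    for sig, fs in sorted(derive_dec2(STATE_SIGNALS).items()):
        print(f"{sig} = {' + '.join(fs)}")
```

Inverting the table this way gives one OR term per state in which a signal is active, which is the form of the dec2 decoders used in the examples above.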
The input of this algorithm is a control graph, where each state is associated with a set of control signals that should be active in that state. The output of this algorithm is a set of boolean functions along with a set of instructions for the TAP. All control signals are represented in terms of the bus signals Capture, Shift, Update, RunTest, the contents of the IR, and possibly the output of the FSM. A state signal (FSi) is associated with the state Si. The state signals are used as the intermediate form for generating the control signals C1, C2, ..., Cn. Once the state signals are generated, all control signals Ci associated with that state are active when FSi is active. If a control signal Ci is associated with several states (Si, Sj, Sk), then Ci = FSi + FSj + FSk.

The mapping algorithm is used for the design of a bus-dependent BIT controller. Therefore, it is assumed that one of the four bus state signals, namely RunTest, Shift, Update, Capture, must be used to derive any control signal.

The Mapping Algorithm:

Input: A control graph and a list of control signals associated with each state.
Output: Ci in terms of FSj's, which is a function of the IR contents, the bus state signals and (optionally) the output of the FSM.

1. Repeat this step for all shifting states.
   1.1. Assign a distinct instruction to each scan chain.
   1.2. For each state Si, FSi = Shift * [IR = assigned instruction], and mark this state.
2. Repeat this step for all unmarked self-loop states.
   2.1. Assign a distinct instruction to each self-loop state.
   2.2. For each state Si, FSi = RunTest * [IR = assigned instruction], and mark this state.
3. If the only next state of an unmarked state (Si) is a shifting state, let FSi = Capture * [IR = instruction of the shifting state], and mark this state.
4. If the only previous state of an unmarked state (Si) is a shifting state, let FSi = Update * [IR = instruction of the shifting state], and mark this state.
5. If all neighboring states of an unmarked state (Si) have been marked, let FSi = RunTest * [IR = instruction of one of its next states], and mark this state.
6. Do this step for all unmarked states.
   6.1. Partition these unmarked states into groups such that for an unmarked state Si in group j, if the next or previous states of Si are also unmarked then they are also in group j, i.e., every unmarked neighbor of a state in group j also belongs to group j. Do step 6.2 for each group.
   6.2. If the group contains more than one state, go to step 6.3; else assign a new instruction to this state Si, let FSi = RunTest * [IR = assigned instruction], and mark this state.
   6.3. Sequential Approach: Assign a new instruction to this group. Let the number of states in this group be c. Program the FSM such that it counts from 1 to c when Enable = RunTest * [IR = the selected instruction] is active. Assign the FSM state signals PH1, ..., PHc to the appropriate FSi's so that all FSi's in this group are properly activated. Mark all states in this group.
7. Instruction merging: The control graph can be reduced to a subgraph for a given set of nodes if all other nodes and their associated arcs are removed. Generate a subgraph for each group of nodes formed in step 6. For all subgraphs that are isomorphic and are associated with the same control signals, one instruction can be used for all these groups. The number of assigned instructions can thus be reduced.
8. Generate all control signals Ci from the FSj's. If a signal is active in several states, say Si, Sj and Sk, then Ci = FSi + FSj + FSk.

Initially, all states of the control graph are unmarked. After the execution of this algorithm, all states are marked and their associated signals are assigned a boolean function. Since data must be sent and received when shifting a register in the application circuit, the shifting states in the control graph can only be mapped into the bus state ShiftDR. In step 2, a self-loop state is mapped into the RunTest bus state since this state can be held for many consecutive clock cycles without changing the contents of the IR. In steps 3, 4 and 5, more intermediate signals are generated without assigning any extra instruction. In step 6, the state signals for all the unmarked states are generated by using RunTest. The BIT controller can be either a combinational circuit or a sequential circuit. If step 6.3 is entered, a sequencer is created and the BIT controller is a sequential circuit; otherwise it is a combinational circuit. After step 6, all the states have been marked (or processed). Instructions that can be shared among different groups formed in step 6 are found in step 7. The total number of instructions assigned is then further reduced. In step 8 all control signals (Ci) are generated from the intermediate state signals (FSi). The correctness of the BIT controller is guaranteed since all control signals Ci are controlled by the contents of the IR and the bus states.

Example

The Mapping Algorithm is illustrated using the complex kernel shown in Figure 2.11(a). The associated control graph is shown in Figure 2.11(b).

1.1. Assign instruction i1 to the only scan register in this example.
1.2. There are two shift states S1 and S6; therefore, FS1 = FS6 = Shift * [IR = i1]. Mark states S1 and S6.
2. This step is skipped since there are no unmarked self-loop nodes.
3. Skipped.
4. Skipped.
5. Skipped.
6.1. Only one group consisting of states S2, S3, S4 and S5 is formed.
6.2. Skipped.
6.3. (Use Sequential approach) Assign instruction i2 to this group. The FSM is enabled when Enable = RunTest * [IR = i2] is active. The FSM is programmed to keep counting from 1 to 4 when enabled. Let FS2 = PH1, FS3 = PH2, FS4 = PH3, FS5 = PH4.
7. No instruction merging can be done. Two instructions are assigned.
8. Assign all control signals. TPG = FS2 + FS3, PSA = FS2 + FS5, G2 = FS2 + FS5, LD2 = FS3 + FS4, G1 = FS3 + FS4, LD3 = FS4 + FS5, SH1 = FS1 + FS6, SH4 = FS1 + FS6. □

Let the number of scan chains formed by shift registers be Ic, the number of self-loop states be Is, and the number of groups formed in step 6 be Ig. The total number of instructions assigned in this algorithm is Iseq = Ic + Is + Ig. Note that the public instructions defined in the standard are not included here.

Lemma 1 The number of instructions assigned by the mapping algorithm is minimal.

Proof: The number of instructions is the sum of Ic, Is, and Ig. The first two numbers are defined by the structure of the control graph. The third number Ig is minimal since the number of groups derived from step 6.1 is minimal. Thus it is clear that the number of instructions (Iseq) assigned by the mapping algorithm is minimal. □

Combinational Approach: The BIT controller designed using the mapping algorithm may be a sequential circuit since a finite state machine may be used in step 6.3. However, because of design constraints it might be necessary to implement the BIT controller as a combinational circuit. If this is the case, step 6.3 can be modified as follows.

6.3. Combinational Approach: Assign a new instruction to each state in the group. For state Si in this group, let FSi = RunTest * [IR = assigned instruction]. Mark the state.

Let the number of unmarked states at the beginning of step 6 be Iu; then the number of instructions assigned in the combinational approach is Icom = Ic + Is + Iu. Since every group formed in step 6 contains at least one state, Iu ≥ Ig, and hence Icom ≥ Iseq for a given control graph. Since loading a new instruction to the IR of a chip also loads new instructions to all chips on the same test ring, i.e., all chips share the same TMS line, the test time is increased significantly if many instructions are required in testing a kernel. Therefore the sequential approach of designing the BIT controller can reduce the test time at the expense of adding a finite state machine to the controller.

2.3 Autonomous BIT Controller

An autonomous controller has all the test control facilities required to execute the test process of the kernel under test. Typical facilities include a test pattern generator, a signature analyzer, a T counter, which is used to keep track of the number of test vectors applied, an S counter, which is used to keep track of the number of bits shifted during a shifting operation, and a finite state machine to control the test sequence. For example, an autonomous BIT controller for testing an LSSD kernel (see Figure 2.8) is shown in Figure 2.12. The correct seeds must be loaded into both TPG and SA before the test. The START signal is generated by loading a special instruction to the IR. Once START is activated, the controller can execute the testing for the LSSD kernel autonomously. Upon completion, a signal DONE is activated. The signature can then be collected.

For the rest of this chapter, it will be assumed that the application circuit contains n BIT structures, which are all LSSD kernels. An autonomous BIT controller is needed to test these n BIT structures.
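The behaviour of a single autonomous controller of this kind can be sketched as two nested count-down loops driven by the S and T counters. This is only an illustration under stated assumptions: the function name is hypothetical, the state sequence of Figure 2.12(b) is not reproduced in the text, and the ordering used here (s SH pulses followed by one LD pulse per vector) is assumed.

```python
def run_autonomous_lssd(s: int, t: int) -> list:
    """Return the control signal issued on each clock after START.

    Assumed ordering: for each of the t test vectors, s SH pulses shift the
    vector in, then one LD pulse applies it. DONE would be raised once the
    returned trace has been played out.
    """
    trace = []
    t_counter = t                 # T counter: test vectors still to apply
    while t_counter > 0:
        s_counter = s             # S counter: bits still to shift
        while s_counter > 0:
            trace.append("SH")
            s_counter -= 1
        trace.append("LD")
        t_counter -= 1
    return trace
```

Under these assumptions the controller is busy for (s + 1) * t clock cycles between START and DONE.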
Depending on the approach used in testing these kernels (serial or parallel), the BIT controller used is called a Serial BIT controller or a Parallel BIT controller. The design philosophies of these two controllers are addressed next.

2.3.1 Serial BIT Controllers

2.3.1.1 A Hard-Wired Serial BIT Controller

A single BIT controller can be employed to test n BIT structures in sequence. Such a controller is called a serial BIT controller. A BIT controller that can be used to test n BIT structures (LSSD kernels) is illustrated in Figure 2.13. A counter SS is used to keep track of which BIT structure is to be activated. This counter requires only $\lceil \log n \rceil$ bits. Since the same controller is shared among all the BIT structures, the area required for it will not grow linearly with n and thus, the total area for the controller (not counting the area required for storing the $t_i$'s and $s_i$'s) will increase only logarithmically with n. However, the time required for testing is now proportional to $\sum_i s_i \times t_i$, which may become prohibitively large if the number of BIT structures to be exercised is significant.

Figure 2.12: An autonomous BIT controller for an LSSD kernel.

Figure 2.13: Testing many kernels in sequence.

2.3.1.2 A Microprogrammed Serial BIT Controller

The general architecture for a microprogrammable test controller, suitable for executing the test for an LSSD kernel, is shown in Figure 2.14. The design, which was first presented in [11], is similar to that of a conventional microcontroller, supplemented with circuitry for keeping track of the loops associated with t and s (shown within the dashed rectangle in the figure).
The register stack contains constants, such as t and s, which can be loaded into the accumulator register (ACC), decremented (DCR), and stored back into a temporary register in the stack. The logic C determines if the content of the accumulator is zero. The fields of the instruction register (IR) are (1) the opcode, (2) the control field, (3) the branch condition address, (4) the miscellaneous field, used for any environment-specific control signals, and (5) the address field. These fields may be either horizontally or vertically encoded. A few basic microinstructions are described in Table 2.2.

opcode (field 1)  address (field 5)  function
LOAD              I                  load ACC from register I of stack
STORE             I                  load register I of stack with contents of ACC
DBNZ              N                  decrement ACC; if ACC ≠ 0, branch to N
NOP               -                  no operation

Table 2.2: Microinstruction List for Controller.

For simplicity the branch condition address (field 3) has not been used. Hence for the DBNZ instruction, if ACC ≠ 0 then N is forced into ADR1, else ADR1 takes its normal next value, which is ADR1 + 1.

The microprogram for the control graph of Figure 2.12(b) is shown below. Here t and s are permanently stored in stack registers 0 and 2, respectively. The program requires only 7 instructions and 4 different operation codes.

No.  Opcode  Control  Address  Comments
1    LOAD    0        0        t in register 0
2    STORE   0        1        Test vector count in reg. 1
3    LOAD    0        2        s in register 2
4    DBNZ    SH       4        Issue SH; loop s times
5    NOP     LD       0        Issue LD signal
6    LOAD    0        1        Test vector count to ACC
7    DBNZ    0        2        Repeat major loop

Figure 2.14: Microprogram controller.

If the chip being tested already has a microprogram control unit, then the test control procedure requires very little additional overhead. The latch and shift controls already exist.
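The seven-instruction microprogram listed above can be checked with a small interpreter. The sketch below is a hedged illustration, not the controller hardware: the function name, the tuple encoding of the instructions and the Python representation of the register stack are assumptions, while the opcodes, control fields and addresses follow the listing. It assumes s ≥ 1 and t ≥ 1.

```python
def run_microprogram(t: int, s: int) -> list:
    """Interpret the 7-instruction test microprogram; return issued controls."""
    stack = {0: t, 1: 0, 2: s}      # reg. 0 holds t, reg. 2 holds s, reg. 1 is scratch
    program = [                     # (opcode, control, address); numbered from 1 in the text
        ("LOAD", None, 0),          # 1: ACC <- t
        ("STORE", None, 1),         # 2: remaining vector count -> reg. 1
        ("LOAD", None, 2),          # 3: ACC <- s
        ("DBNZ", "SH", 4),          # 4: issue SH; loop s times
        ("NOP", "LD", 0),           # 5: issue LD
        ("LOAD", None, 1),          # 6: ACC <- remaining vector count
        ("DBNZ", None, 2),          # 7: repeat major loop
    ]
    acc, pc, issued = 0, 1, []
    while pc <= len(program):
        opcode, control, addr = program[pc - 1]
        if control:
            issued.append(control)  # control field drives the BIT structure
        next_pc = pc + 1            # default: ADR takes its normal next value
        if opcode == "LOAD":
            acc = stack[addr]
        elif opcode == "STORE":
            stack[addr] = acc
        elif opcode == "DBNZ":
            acc -= 1
            if acc != 0:
                next_pc = addr      # branch address forced into ADR
        pc = next_pc
    return issued
```

Running it with small values shows s SH pulses followed by one LD pulse, repeated t times, which is the loop structure the comments in the listing describe.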
The data processor portion of the controller may need to be added to the chip if it does not already have an accumulator/ALU structure. An additional advantage of having a microprogrammed control is that one can implement microdiagnostic routines and functional tests. However, if the chip under test does not have a microprogrammable control unit, the hard-wired test controller previously described may be employed.

2.3.2 Parallel BIT Controllers

A parallel BIT controller can autonomously test multiple kernels in parallel. The test time is minimal since all kernels are tested simultaneously. Due to the similarity between the kernels, the complexity of the controller can be reduced if the controller is properly designed. The design technique for such controllers is illustrated by using an application circuit consisting of many BIT structures. All these BIT structures are assumed to be LSSD kernels. It is assumed that these kernels are independent, i.e., they do not share any resources such as busses or registers, and hence the test can be executed concurrently. The ith kernel requires $t_i$ test vectors and $s_i$ shifts per vector. Thus n independent controllers of the type shown in Figure 2.12 can be used, one for each structure. In this approach, the test time would be $\max_i [s_i \times t_i]$, and the controller area will increase linearly with n.

Three hard-wired designs are presented next. They neither incur the excessive test time found in the sequential controllers nor require the excessive area that may be needed when n controllers are used. The first design superimposes the FSMs of individual controllers into one FSM and interleaves the activation of the BIT structures. This design can potentially test all n BIT structures in the same time as required by n independent controllers. In addition the area overhead is comparable to that of a sequential controller which exercises the BIT structures one by one.
Unfortunately, in this scheme, the time required for testing depends on the problem at hand (i.e., the values of $t_i$ and $s_i$), and for some pathological cases this design may entail long test times. Two other designs to be presented rectify this problem by running all BIT structures simultaneously without interleaving. The controller area is reduced by sharing common factors among the $t_i$'s and $s_i$'s. These designs are referred to as the "Tree-of-Counters" and "Counter-Sharing" controller designs. It should be remarked at the outset that none of these designs is clearly superior to the others. That is, depending on the BIT structures to be controlled, area constraints, and test time objectives, one controller may be more beneficial than the others.

2.3.2.1 Interleaved FSM Controller

The design technique is illustrated with an example consisting of three BIT structures BIT1, BIT2 and BIT3. The extension to n BIT structures is straightforward. Assume that $s_1 = 41$, $s_2 = 48$, $s_3 = 62$, $t_1 = 900$, $t_2 = 1000$, and $t_3 = 1150$. To each BIT structure, the LD and SH control signals must be issued at appropriate times. For example, BIT2 must receive 48 consecutive SH pulses, followed by a LD pulse, in order to apply one test vector. It is assumed that if a register is not in the SH or LD mode then it is in the HOLD mode.

The FSMs for the individual BIT structures, which are similar to the FSM shown in Figure 2.12 with different values of $s_i$ and $t_i$, can be combined into a single FSM as follows. To interleave the execution of these FSMs, the controller can issue 41 SH pulses to all three BIT structures, then 7 SH pulses to BIT2 and BIT3, followed by 14 SH pulses to BIT3 alone. At this point all the test vectors are loaded into their proper registers and a LD pulse can be issued to the appropriate BIT structures. This process, when repeated 900 times, will apply all the test vectors to BIT1 and the first 900 vectors to BIT2 and BIT3.
This cycle, with control signals to BIT1 disabled, is repeated another 100 times to finish testing BIT2; and another 150 repetitions, with control signals to both BIT1 and BIT2 disabled, will conclude the testing process. The interleaved FSM is shown in Figure 2.15. It consists of three (in general n) phases. Each phase completely activates one BIT structure and executes some of the test vectors of other BIT structures which as yet have not completed their test cycle. In this example, the first phase (the top row of states) tests BIT1 and parts of BIT2 and BIT3; the second phase completes the testing of BIT2 and part of BIT3; and the third phase finishes testing BIT3.

Figure 2.15: Interleaved FSM controller.

A procedure for deriving an interleaved FSM for an arbitrary value of n is presented next. Without loss of generality, assume that $s_i \le s_{i+1}$, $1 \le i < n$. Also let $\sigma$ be a permutation of $(1, 2, 3, \ldots, n)$ such that $t_{\sigma(i)} \le t_{\sigma(i+1)}$, $1 \le i < n$. Define $t_0 = s_0 = 0$ and $\sigma(0) = 0$. The interleaved FSM has n phases. Phase i applies $t_{\sigma(i)} - t_{\sigma(i-1)}$ vectors and after it is over, BIT$\sigma(1)$ through BIT$\sigma(i)$ have been completely tested. If $t_{\sigma(i)} - t_{\sigma(i-1)} = 0$ this phase can be ignored.

The ith phase starts with loading $t_{\sigma(i)} - t_{\sigma(i-1)}$ and $s_1$ (which is the minimum $s_i$ value) into two working registers, T and S. Register S is then decremented and a SH signal is issued to the BIT structures that have not yet been completely tested, namely BIT$\sigma(i)$, BIT$\sigma(i+1)$, ..., BIT$\sigma(n)$. This is done repetitively until register S contains a zero. The value $s_2 - s_1$ is then loaded into S and this count is used to issue SH pulses to all those BIT structures for which a sufficient number of SH pulses have not been issued so far.
This process is continued until all the test vectors are shifted into their respective registers. A LD pulse is then applied to the appropriate BIT structures and the above cycle is repeated $t_{\sigma(i)} - t_{\sigma(i-1)} - 1$ more times.

In general, there are n phases, each with 2n + 1 states (2 for each difference $(s_i - s_{i-1}) > 0$ and one final state). Hence the total number of states in the combined FSM is $n \times (2n + 1)$, requiring $\lceil \log(n \times (2n + 1)) \rceil$ flip-flops. The interleaved FSM requires 2n constants (the differences $(s_i - s_{i-1})$ and $(t_{\sigma(i)} - t_{\sigma(i-1)})$ for i = 1 to n). We assume that they are stored off-chip and can be loaded by invocation of the appropriate LD signal.

It can be noticed that the above controller does not have the minimum number of states required to control the n BIT structures. In fact, the number of states can be reduced by almost a factor of two. For example, consider the second phase in Figure 2.15. Since BIT1 has already been tested, the S register can be loaded with 48, instead of 41 followed by 7. This results in a reduction of 2 states from the second phase. Similarly, 4 states can be deleted from the third phase. In general, the number of states which can be eliminated is given by $2 + 4 + \ldots + 2(n - 1) = n(n - 1)$. Thus, instead of requiring $\lceil \log n(2n + 1) \rceil$ flip-flops, $\lceil \log(n(2n + 1) - n(n - 1)) \rceil = \lceil \log n(n + 2) \rceil$ flip-flops are sufficient. Thus the saving in terms of storage is just one flip-flop. On the other hand, this change introduces asymmetry in the phases as each phase now has a different number of states. Because of this asymmetry, a different decoder will be needed to issue SH and LD signals in each phase, and hence the controller area will start increasing linearly with n. Therefore, the FSM shown in Figure 2.15 requires less area even though it does not have the minimum number of states.

Three factors contribute toward an efficient implementation of an interleaved controller.
First, only three registers S, S' and T are required as opposed to 2n registers in the case of n independent controllers. Secondly, the increase in the number of states in the FSM over a single controller does not lead to very much additional area. Since the state information can be encoded using just $\lceil \log(n \times (2n + 1)) \rceil$ flip-flops, the variable part of the design (with the constants stored off-chip) increases only logarithmically with n. Thirdly, the proposed FSM is highly symmetric. Thus one can use a single (2n + 1)-state machine in order to issue SH and LD pulses to all the BIT structures. A $\lceil \log n \rceil$-bit counter, referred to as the template register, may be used to store which phase is being executed and, depending on its value, the control signals to the BIT structures that have already been tested may be disabled. Figure 2.16 shows a schematic for this implementation.

Under certain conditions the interleaved controller may entail a penalty in test time. Since the application of test vectors to different BIT structures is interleaved, it takes $\max[s_i]$ units of time to apply one vector to all BIT structures. Thus the total test time is proportional to $\max[t_i] \times \max[s_i]$. Typically, the BIT structures with large values of $s_i$ will also require the largest number of test vectors, and hence $\max[t_i] \times \max[s_i]$ will equal $\max[t_i \times s_i]$, the time required by n independent controllers. However, if the $s_i$'s and $t_i$'s are not well matched, there may be an unacceptably high penalty in test time. For example, if $s_1 = 50$, $s_2 = 100$, $t_1 = 5000$ and $t_2 = 1000$, the interleaved controller will take 100 × 5000 units of time instead of the 50 × 5000 required by two independent controllers.

Figure 2.16: A controller for interleaved test execution.

2.3.2.2 Tree of Counters Design

The test time problem just discussed can be alleviated by executing all n BIT structures simultaneously without interleaving.
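The test-time penalty of the interleaved controller can be quantified with a few lines of arithmetic. The sketch below is illustrative: the function names are hypothetical, and, like the estimates in the text, it counts only shift cycles and ignores the single LD cycle per vector.

```python
def phase_vector_counts(t):
    """Vectors applied in each phase: t_sigma(i) - t_sigma(i-1), with t_0 = 0."""
    ordered = sorted(t)
    return [b - a for a, b in zip([0] + ordered, ordered)]


def interleaved_time(s, t):
    """Every vector cycle lasts max(s) shifts and there are max(t) cycles."""
    return max(s) * max(t)


def independent_time(s, t):
    """n independent controllers finish when the slowest kernel finishes."""
    return max(si * ti for si, ti in zip(s, t))


if __name__ == "__main__":
    print(phase_vector_counts([900, 1000, 1150]))            # phases of Figure 2.15
    print(interleaved_time([41, 48, 62], [900, 1000, 1150]),
          independent_time([41, 48, 62], [900, 1000, 1150]))  # well matched case
    print(interleaved_time([50, 100], [5000, 1000]),
          independent_time([50, 100], [5000, 1000]))          # mismatched case
```

For the three-kernel example the two times coincide, because BIT3 has both the longest scan path and the most vectors; for the mismatched two-kernel example the interleaved controller takes twice as long.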
The number of flip-flops used in the count-down registers can be reduced by sharing the common factors in the numbers $t_i$ and $s_i$ which need to be decremented. This scheme will be most effective when the number of common factors is large. Even in the worst case, this design methodology will be no worse than employing n independent controllers.

Consider once again the example given in the previous section. The controller has to provide a SH control signal to BIT1, BIT2, and BIT3 for 41, 48, and 62 clock cycles, respectively; provide a LD control signal to the respective BIT structures after every 41, 48, and 62 clock cycles, respectively; and repeat this process until all test vectors have been applied. To apply one test vector to BITi requires $s_i$ SHs followed by a LD, i.e., it takes $s_i + 1$ clock cycles. Notice that the LD signal is dependent on SH, i.e., LD = NOT(SH).

It is apparent from the above discussion that the task of generating appropriate control signals to the BIT structures is in essence that of repetitively (and simultaneously) counting a set of numbers. One needs to count from $s_i$ to 0 to generate the SH signals and from $(s_i + 1) \times t_i$ to zero to generate the DONE signals.

To illustrate this approach, assume that modulo 42, 49, and 63 counters are required (that is, $s_i + 1$ for $s_i$ = 41, 48 and 62). Since 7 is a common factor of these numbers, a 3-bit modulo-7 counter that can be shared among all three counters can be used. This 'root' counter may be a hard-wired modulo-7 counter. This shared counter produces a terminal count signal (tc) every 7 clock cycles that can be used to decrement modulo 6, 7 and 9 counters. Sharing of common factors can be continued recursively using factors of 6, 7, and 9, giving rise to a tree-like structure. This logic is called a Tree-of-Counters. Figure 2.17 shows the optimal Tree-of-Counters for this example.
No constants need be stored in this design approach since $t_i$ and $s_i$ are actually hard-wired into the modulo counters. Clearly, considerable savings can be accrued by such sharing if the numbers to be counted have many common factors. In this example, if one used three separate counters, then $\lceil \log 42 \rceil + \lceil \log 49 \rceil + \lceil \log 63 \rceil = 18$ flip-flops will be required. On the other hand, the proposed design uses only $\lceil \log 7 \rceil + \lceil \log 3 \rceil + \lceil \log 7 \rceil + \lceil \log 2 \rceil + \lceil \log 3 \rceil = 11$ flip-flops.

Figure 2.17: A Tree-of-Counters design.

The problem of designing an optimal Tree-of-Counters for a given set of numbers can be formalized as follows. Let the numbers to be counted be $s_1, s_2, \ldots, s_n$, and let $F_1, F_2, \ldots, F_n$ be the sets containing the factors of these numbers. ($F_i$'s may contain multiple factors.) Also let $f_{ij}$, $i = 1, 2, \ldots, n$ and $j = 1, \ldots, |F_i|$, denote the jth factor of $s_i$. The optimal Tree-of-Counters corresponds to a tree with n leaves, rooted at 1, such that

1. the product of the $f_{ij}$'s on the path from the root to the ith leaf is $s_i$, and
2. $\sum \lceil \log f_{ij} \rceil$ for all $f_{ij}$ in the tree is minimum over all possible trees.

The first item in the above definition ensures that the terminal count signal of the ith leaf counter is asserted once every $s_i$ clock cycles. Because common factors may be shared in different ways, several trees are possible for a given set of numbers. For example, Figure 2.18 shows three alternative trees for the $s_i$ values of 510, 714, 595, and 78. The dashed rectangles in the figure represent a counter for the product of the factors in the rectangle. The second item in the above definition requires that the total cost of constructing these trees in terms of the number of flip-flops be minimum.

The problem of constructing the optimal Tree-of-Counters appears to be computationally intractable. In fact it is not known whether it is even in the NP class.
A complete discussion of the computational complexity of this problem is beyond the scope of this work. In the remainder of this section, a ‘greedy’ heuristic is presented that attempts to obtain a good solution by locally optimizing the savings associated with any proposed counter sharing.

To make counters for $s_i$ and $s_j$, one can have a mod-$\alpha$ counter, $\alpha = \mathrm{GCD}(s_i, s_j)$, feeding counters for $s_i / \mathrm{GCD}(s_i, s_j)$ and $s_j / \mathrm{GCD}(s_i, s_j)$. This sharing results in a saving of $\lceil \log \mathrm{GCD}(s_i, s_j) \rceil$ flip-flops, and two counters having $\lceil \log (s_i / \mathrm{GCD}(s_i, s_j)) \rceil$ and $\lceil \log (s_j / \mathrm{GCD}(s_i, s_j)) \rceil$ bits, respectively, have to be constructed. These two counters will be driven by a signal from the mod-$\alpha$ counter. Thus, with each pair $s_i$ and $s_j$, we can associate a score $S$ given by $S(s_i, s_j) = \lceil \log \mathrm{GCD}(s_i, s_j) \rceil$, which weights the sharing of common factors between $s_i$ and $s_j$. A greedy procedure for generating a near-optimal Tree-of-Counters is presented below.

    TreeOfCounters(C) {
        if C = {} then return;
        else if C = {a} then { GenCounter(a, 1); return; }
        else {
            find a, b in C such that S(a, b) is maximum;
            if S(a, b) = 0 then { for all x in C, GenCounter(x, 1); return; }
            GenCounter(a/GCD(a, b), GCD(a, b));
            GenCounter(b/GCD(a, b), GCD(a, b));
            delete(a, C); delete(b, C);
            insert(GCD(a, b), C);
            TreeOfCounters(C);
        }
    }

[Figure 2.18: Three Trees-of-Counters, panels (a), (b), and (c).]

This algorithm works bottom-up, from the leaves to the root. For any given set C of numbers the algorithm computes the score $S(s_i, s_j)$ for all $s_i, s_j \in C$. It then picks the pair with the maximum score, generates a mod-$(s_i / \mathrm{GCD}(s_i, s_j))$ and a mod-$(s_j / \mathrm{GCD}(s_i, s_j))$ counter, deletes $s_i$ and $s_j$ from C, and inserts $\mathrm{GCD}(s_i, s_j)$ into C. These two counters are decremented by the terminal count signal of the counter for $\mathrm{GCD}(s_i, s_j)$. The procedure now recursively calls itself with the modified set C of numbers.
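The greedy procedure can be written as a short runnable sketch. It is an iterative restatement under stated assumptions: the function name `tree_of_counters`, the use of a Python list for C, the first-found tie-breaking of `max`, and the representation of each GenCounter(A, B) call as a tuple `(A, B)` are choices made here for illustration.

```python
from itertools import combinations
from math import gcd


def ceil_log2(n):
    """ceil(log2(n)) for n >= 1; the score S and the flip-flop count both use it."""
    return (n - 1).bit_length()


def tree_of_counters(numbers):
    """Greedy sharing. Returns (modulus, driver) pairs, one per GenCounter call.

    A driver of 1 means the counter is clocked directly; otherwise it is
    enabled by the terminal count of the counter for that value.
    """
    pool = list(numbers)
    counters = []
    while len(pool) > 1:
        # Pair with the largest score S(a, b) = ceil(log2(gcd(a, b))).
        a, b = max(combinations(pool, 2), key=lambda p: ceil_log2(gcd(*p)))
        g = gcd(a, b)
        if ceil_log2(g) == 0:       # every pair is relatively prime: stop sharing
            break
        counters.append((a // g, g))
        counters.append((b // g, g))
        pool.remove(a)
        pool.remove(b)
        pool.append(g)
    counters.extend((x, 1) for x in pool)
    return counters


result = tree_of_counters([42, 49, 63])
print(result)                                   # [(2, 21), (3, 21), (7, 7), (3, 7), (7, 1)]
print(sum(ceil_log2(m) for m, _ in result))     # 11
```

For this input the heuristic first shares the factor 21 between 42 and 63, then the factor 7 between 49 and 21, and ends with the same 11 flip-flops as the tree discussed earlier.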
It terminates when there is only one number left in C, or when the maximum score for any pair is 0, indicating that all elements of C are relatively prime. In the procedure, the routine GenCounter(A, B) generates a mod-A counter which is enabled by the terminal count signal of the counter for mod-B. Figure 2.18(c) shows an example Tree-of-Counters constructed using this procedure.

The step “find a, b in C such that S(a, b) is maximum” is the most complex step in the algorithm. Initially, $O(n^2)$ GCD computations will be required in order to find the maximum value of $S(a, b)$. Since in each recursive step the size of C diminishes by one, there will be n iterations and the above procedure can be easily executed in $O(n^3)$ time. It is possible to reduce this time complexity to $O(n^2)$ if in each iteration a and b are replaced by GCD(a, b). Thus in each successive iteration the score S need be recomputed only for the pairs formed by this new entry and the old numbers. Hence it is possible to find the maximum value of $S(a, b)$ in $O(n)$ time in all iterations except the first one, resulting in an $O(n^2)$ overall time complexity.

2.3.2.3 Counter Sharing Design

In the previous design, specific common factors of the form GCD(A, B) were employed and used to enable the counters for A/GCD(A, B) and B/GCD(A, B). In the Counter Sharing design scheme a set of common factors is obtained, and these factors are combined to derive the desired SH and LD signals. Let $tc_x$ denote the terminal count signal of a mod-$x$ counter. One can generate signals of any frequency by simply ANDing together appropriate factors of a desired frequency. For example, a signal that goes high every 100 clock cycles can be generated by ANDing $tc_{25}$ and $tc_4$, denoted by AND($tc_{25}$, $tc_4$). Notice that repeated prime factors, e.g. {5, 5} and {2, 2} in this case, have to be multiplied and a larger common factor counter must be used.
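The effect of ANDing terminal counts can be checked numerically. The sketch below assumes idealized counters: each mod-m counter free-runs from the same clock and reset, and its terminal count is a one-cycle pulse every m cycles. The names `terminal_count` and `and_period` are hypothetical helpers for this illustration.

```python
def terminal_count(t, modulus):
    """A free-running mod-`modulus` counter pulses high once every `modulus` cycles."""
    return t % modulus == modulus - 1


def and_period(moduli, horizon=1000):
    """Spacing between cycles where all the given terminal counts are high together."""
    hits = [t for t in range(horizon)
            if all(terminal_count(t, m) for m in moduli)]
    gaps = {later - earlier for earlier, later in zip(hits, hits[1:])}
    assert len(gaps) == 1, "expected a strictly periodic signal"
    return gaps.pop()


print(and_period([25, 4]))        # 100: coprime prime powers multiply
print(and_period([5, 5, 2, 2]))   # 10: repeated factors add nothing
```

The AND of pulses with periods m1, m2, ... repeats every lcm(m1, m2, ...) cycles, so the moduli must be pairwise coprime for the periods to multiply.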
It would be wrong to implement a 100-cycle signal as AND($tc_5$, $tc_5$, $tc_2$, $tc_2$), since this would produce a signal every 10 clock cycles instead of every 100 clock cycles. Figure 2.19 shows how to derive $tc_{10}$ and $tc_{100}$ from mod-2 and mod-5 counters. In general, if $N = \prod_{i=1}^{p} \alpha_i^{\beta_i}$ is the unique prime factorization of $N$, then a signal that goes high every $N$ cycles can be produced by AND($tc_{\alpha_1^{\beta_1}}, \ldots, tc_{\alpha_p^{\beta_p}}$).

[Figure 2.19: A Counter-Sharing design.]

Let $T_j = t_j \times (s_j + 1)$ denote the total test time for the $j$-th BIT structure. To control this structure, we need a signal $tc_{s_j+1}$ to issue appropriate LATCH signals and a signal $tc_{T_j}$ to mark the end of testing. Let $\mathrm{factor}(X) = \{\alpha^i \mid \alpha^i$ occurs in the prime factorization of $X\}$ and let $U = \bigcup_{j=1}^{n} (\mathrm{factor}(T_j) \cup \mathrm{factor}(s_j))$ be the set of all the factors. Here the union operation $\cup$ takes the maximum power for each replicated prime factor, i.e., $\cup(\alpha^i, \alpha^j) = \alpha^{\max(i,j)}$. The control signals for all the BIT structures can now be generated from the common pool $U$ of factors by ANDing the appropriate signals. For example, if $T_j = \alpha_{j1}^{\beta_{j1}} \alpha_{j2}^{\beta_{j2}} \cdots \alpha_{jr}^{\beta_{jr}}$ is the prime factorization of $T_j$, then $tc_{T_j} = tc_{t_j \times (s_j+1)} = \mathrm{AND}(tc_{\alpha_{j1}^{\beta_{j1}}}, \ldots, tc_{\alpha_{jr}^{\beta_{jr}}})$ will generate the DONE signal for BIT$_j$. The signals for the $s_j$ can be similarly generated. This design requires 2n AND gates (one for each $s_j$ and the other for each $T_j$). Notice that no other circuitry is required, either for decoding or for maintaining the state information. The total test time required by this design is $\max_{j=1}^{n} T_j = \max_{j=1}^{n} (t_j \times (s_j + 1))$, which is considerably better than $\sum_{j=1}^{n} T_j$, the time required by a sequential controller.

2.3.2.4 Comparison of Three Designs

The interleaved FSM controller requires on the order of $\max[s_i] \times \max[t_i]$ clock cycles to completely activate all BIT structures. For a ‘balanced’ problem, i.e., one in which $\max[s_i]$ and $\max[t_i]$ occur for the same BIT structure, this will be optimal.
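The test-time expressions above can be compared on small inputs. In this sketch the function name `schedule_times` and the $t_j$ values are hypothetical, and the interleaved figure uses $(\max s_j + 1) \times \max t_j$ as a stand-in for the "on the order of" estimate in the text.

```python
def schedule_times(schedules):
    """schedules: list of (s_j, t_j) pairs. Returns clock-cycle totals per scheme."""
    totals = [t * (s + 1) for s, t in schedules]        # T_j = t_j * (s_j + 1)
    max_s = max(s for s, _ in schedules)
    max_t = max(t for _, t in schedules)
    return {
        "parallel": max(totals),                        # counter-based designs
        "sequential": sum(totals),                      # one structure at a time
        "interleaved": (max_s + 1) * max_t,             # order-of-magnitude estimate
    }


# Balanced: the longest scan chain also has the most vectors.
print(schedule_times([(41, 100), (48, 50), (62, 200)]))
# {'parallel': 12600, 'sequential': 19250, 'interleaved': 12600}

# Unbalanced: max s_j and max t_j belong to different structures.
print(schedule_times([(62, 100), (41, 200)]))
# {'parallel': 8400, 'sequential': 14700, 'interleaved': 12600}
```

In the balanced case the interleaved estimate coincides with the parallel time; in the unbalanced case it falls between the parallel and sequential totals.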
In addition, the area occupied by the FSM, excluding the area required for storing the constants, increases only logarithmically with the number of BIT structures.

The Tree-of-Counters design and the Shared-Counter design are useful when $\max[s_i] \times \max[t_i]$ is much larger than $\max[s_i \times t_i]$, and when the $s_i$ and $t_i$ have many factors in common. Both designs operate the BIT structures concurrently, without any penalty in test time. Also, the area overhead for these designs is considerably less than that incurred when independent controllers are used. This is achieved through register sharing and elimination of decoding circuitry.

The basic idea in both the Tree-of-Counters and Counter-Sharing designs is the same. In the former, the terminal count signal of one counter is used to decrement a set of other counters, giving rise to a tree structure. The amount of sharing in this scheme depends heavily on the problem at hand and may be limited for some problem instances. For example, if counters for 11 x 13, 11 x 31, and 31 x 13 have to be designed, only two of these numbers can share a counter. The Counter-Sharing design alleviates this problem by constructing a common pool of counters which is shared among all numbers. For each number, appropriate terminal count signals are simply ANDed together. For most problems this design will be superior to the Tree-of-Counters design. However, there exist situations when a Tree-of-Counters is more desirable. For example, if counters for 11 x 13 and 11 x 31 are to be constructed, both design schemes will require modulo counters for 11, 13, and 31. The Tree-of-Counters design will use mod-11 to drive mod-13 and mod-31. On the other hand, the Counter-Sharing design will need extra AND gates to produce signals of appropriate frequencies. It is this saving in terms of AND gates which may make Tree-of-Counters more desirable for some problem instances.
For any BIT structure the value of $s_i$ is fixed by the length of its scan chain or internal shift registers. The number of vectors ($t_i$), however, can in general be increased, if doing so will result in reduced area without excessive penalty in test time. Since common factors in the $t_i$ are shared, by modifying the value of $t_i$ it should be possible to significantly reduce the controller area. The simulation experiments described below show that this indeed is the case. In particular, we will see that if the $t_i$ are increased to become multiples of some number, say 100, both the Tree-of-Counters and Counter-Sharing designs result in reduced area. This clearly increases test time. However, this increase may be acceptable in exchange for reduced area and the associated increase in fault coverage.

[Figure 2.20: Hardware complexity for different designs. Curves: independent, tree1, tree2, shared1, shared2, rounded, interleaved, sequential; horizontal axis: number of test schedules.]

Controller area and test times have been determined for each of the designs described. The results are shown in Figures 2.20 and 2.21. Figure 2.20 shows the average number of flip-flops required by the various designs relative to that of n independent controllers. In order to plot these curves, n random pairs of $s_i$ and $t_i$ values were generated. The number of flip-flops required by each design was then estimated (as described below) and averaged over 20 iterations. The results indicated correspond to the following cases, described from top to bottom in the figure. The values of $s_i$ range from 50 to 250; those of $t_i$ from 1000 to 50,000.

1. independent - represents the area occupied by n independent controllers with no optimization. The area is estimated to be $\sum_{i=1}^{n} (\lceil \log s_i \rceil + \lceil \log t_i \rceil)$ and is used as a normalization factor for all other curves. The area of the FSM is ignored.
2. tree1 - represents the area occupied by the tree generated using the procedure TreeOfCounters.
3. tree2 - same as above except that the $t_i$ are incremented to the next multiple of 100.
4. shared1 - indicates the area occupied by the Shared Counter design; the area for the AND gates is ignored.
5. shared2 - same as above with the $t_i$ advanced to the next multiple of 100.
6. rounded - same as 4 above with the $t_i$ increased to the next power of 2.
7. interleaved - represents the area occupied by the interleaved controller, estimated by $2\lceil \log(\max s_i) \rceil + \lceil \log(\max t_i) \rceil + \lceil \log n \rceil + \lceil \log(2n + 1) \rceil$. Here the terms represent the area occupied by the S and T registers, the state register, and the template register (see Figure 2.16). The area occupied by the decoder is neglected and all constants are assumed to be stored off-chip.
8. sequential - represents the area occupied by a sequential controller which is shared among all the BIT structures. In this case we need an additional counter to distinguish between various BIT structures and the area is given by $2\lceil \log(\max s_i) \rceil + \lceil \log(\max t_i) \rceil + \lceil \log n \rceil$.

It should be emphasized that the above area estimates only consider the area for counters and flip-flops required for storing the state information. All the constants are assumed to be stored off-chip and loaded through activation of appropriate LD signals. These estimates nonetheless are representative of the total controller area. When the area for storing the constants $t_i$ and $s_i$ is considered, those designs using mod counters become even more attractive because these constants are hard-wired into the counters themselves.

[Figure 2.21: Time complexity for different designs. Curves: sequential/10, rounded, interleaved, parallel; horizontal axis: number of test schedules.]

Figure 2.21 shows the test times for these designs.
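The closed-form area estimates for the independent, interleaved, and sequential cases listed above can be evaluated directly. This is a sketch only: the function names and the three $(s_i, t_i)$ pairs are hypothetical, and the tree and shared-counter curves are not modeled because they depend on the factorization of the inputs.

```python
def ceil_log2(n):
    """ceil(log2(n)) for n >= 1, as used in the flip-flop estimates."""
    return (n - 1).bit_length()


def area_independent(pairs):
    """Sum over all structures of ceil(log s_i) + ceil(log t_i)."""
    return sum(ceil_log2(s) + ceil_log2(t) for s, t in pairs)


def area_sequential(pairs):
    """2*ceil(log max s_i) + ceil(log max t_i) + ceil(log n)."""
    n = len(pairs)
    max_s = max(s for s, _ in pairs)
    max_t = max(t for _, t in pairs)
    return 2 * ceil_log2(max_s) + ceil_log2(max_t) + ceil_log2(n)


def area_interleaved(pairs):
    """Sequential estimate plus the template register, ceil(log(2n + 1))."""
    return area_sequential(pairs) + ceil_log2(2 * len(pairs) + 1)


pairs = [(41, 1000), (48, 2000), (62, 3000)]   # hypothetical (s_i, t_i) values
print(area_independent(pairs))   # 51
print(area_sequential(pairs))    # 26
print(area_interleaved(pairs))   # 29
```

Dividing each result by `area_independent(pairs)` gives the normalized quantity that the figure plots.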
Not surprisingly, the n independent controllers, the Tree-of-Counters design, and the Shared Counter design all take the same time, indicated by the bottommost curve (labeled parallel) in the figure. This time corresponds to the maximum time taken by any BIT structure and is given by $\max[(s_i + 1) \times t_i]$. The test time for the interleaved design (labeled interleaved) is proportional to $\max[(s_i + 1)] \times \max[t_i]$. If the values of the $t_i$ are changed to the next higher power of 2, the design area can be reduced, as shown by the curve marked rounded in Figure 2.20. The corresponding increase in test time is given by the curve labeled rounded in Figure 2.21. Finally, the time taken by a sequential controller, which is given by $\sum_i (s_i + 1) \times t_i$, is shown. This curve is labeled sequential/10 and is scaled by a factor of 1/10 in order to show more clearly its relationship with the other curves. The test time for the sequential controller is significantly greater than that of any other design, and increases monotonically with n.

Several conclusions can be drawn from these plots. First, all the designs occupy an area which is between that of using n independent controllers and that of the sequential design. The interleaved FSM design may be viewed as an enhancement to the sequential design. The area occupied by it is only marginally more than that of the sequential design. At the same time, its test time is comparable to that of parallel controllers. Secondly, on the average, the Shared Counter design performs better than the Tree-of-Counters design. However, the Tree-of-Counters design should not be totally ruled out, because in some cases both may use the same number of flip-flops while the Shared Counter design requires additional AND gates for decoding, which have not been accounted for in these plots. Thirdly, the area occupied by both the common factor sharing designs can be further reduced by appropriately selecting $t_i$ values.
The reduction in area appears to be proportional to the corresponding increase in test time. Lastly, these designs reduce the rapid rise in test time as n increases.

It is clear that if the $t_i$ can be either forced to have more common factors, or better yet to take on values from a small set, then the area overhead can be considerably reduced. The same is true for the $s_i$ values. Though it is not feasible to make a scan chain longer, pseudo test bits can be added to each vector to make the apparent length of a scan chain longer. For example, a test vector of 47 bits can be preceded by 3 garbage bits to produce a test vector of 50 bits. After 50 SHIFT pulses, only the desired 47 bits will reside in the corresponding 47-flip-flop shift register chain. Thus, by forcing the $s_i$ values also to be elements of a small set, the area of the controller can be further reduced.

Chapter 3

Controller for Testable Modules

This chapter deals with the design and implementation of an MMC. An MMC is used to control the self-test process of a module (or board) by accessing each chip’s BIT structures through an L0-bus. The proposed MMC is universal in the sense that the same basic design is used for all modules. MMCs differ by the test programs they execute, the number of test busses they control, and the expansion units they employ. Test programs are used to control the processor in an MMC in the execution of the Built-In Self-Test (BIST) process for the entire module. The test results are then reported to a SuMP via an L1-bus. A SuMP can initiate the self-test process of a module by sending a “begin test” command to the MMC on that module. The MMC then reports the “health status” of that module back to the SuMP.
An MMC contains bus interface units (such as an L1-slave and an L0-master), a processing unit (such as a processor), a memory unit (consisting of RAMs and ROMs), one or more test channels, a Bus Driver/Receiver, one or more expansion units (such as testability registers and an analog test interface), and a CMC.

A simple yet novel design, called the test channel, is used in an MMC. Since every testable chip has an L0-slave in its CMC, a test channel, which contains an L0-master, can communicate over an L0-bus with the CMC. The MMC’s processor can control a test channel by reading from or writing to its internal registers. Once initiated by the processor, a test channel can completely control an L0-bus and the testing of a chip. The separation of processor and test busses provided by test channels relieves the processor of detailed bus timing activities. A test channel translates processor instructions into proper timing sequences for an L0-bus. A test process can now be represented as high-level processor instructions.

In [15], Budde reported on the design of the Testprocessor, which is similar to our MMC. The Testprocessor is intended to carry out some of the functions of the CMC and the MMC. Since it may be part of an application chip, it must be simple. The Testprocessor is programmed at the microinstruction level. All peripheral devices are controlled directly by the control signals provided by these microinstructions. The number of expansion units is limited by the total number of control signals the control unit can provide. Data can be moved directly between the test pattern RAM and the test interfaces without going through the processor register. Obviously, this is an efficient approach for data movement. However, due to the limitation of the bus, only one serial interface can run at a time. Comparisons are done using a fault-secure comparator. There is no other data processing unit in the Testprocessor.
Due to the limited processing capability, diagnostic programs cannot run on the Testprocessor.

3.1 Requirements for an MMC

An MMC must be able to respond to requests from a SuMP, to carry out tests for every chip on the module, and to report test results to a SuMP. The requirements for an MMC are stated below, followed by a description of its architecture in the next section. In summary, an MMC should be able to support the following functions:

1. Access the on-chip BIT structures via an L0-bus.
2. Provide proper control sequences for the execution of a chip’s BIT structures.
3. Generate test data and collect test results if necessary.
4. Analyze test results to monitor the health status of chips.
5. Test the interconnects among different chips on the module via the boundary scan registers.
6. Provide controllability and observability for non-testable chips and analog circuits.
7. Interface with a SuMP or the control console.

In addition, an MMC must have memory to store test data and/or test results if deterministic test data is needed. This requirement on memory is relaxed for random or exhaustive test methodologies since only seed data and signatures need to be stored.

3.2 MMC Architecture

Figure 3.1 shows the architecture of an MMC. It consists of a 16-bit general or special purpose processor, a ROM, a RAM, a test channel, a CMC with an L0-slave, an L1-slave, and a Bus Driver/Receiver (BDR). The BDR supports an expansion bus, i.e., it allows extra units to be added to the MMC. For example, a functional bus interface, two testability registers, an analog test interface, several test channels, an expansion ROM, a control console interface and a disk interface are shown to communicate with the MMC through the BDR in the figure. The components shown in the shaded region (which can be implemented as a single ASIC chip) are required for every MMC. CMCs for these chips are not shown.
All units on the local and expansion bus are accessed by the processor in a memory-mapped scheme. That is, every accessible register of each unit occupies one location in the global address space. The processor can read from or write into these registers by first addressing the appropriate registers. Each unit must be able to decode the address lines. Once the register is selected, an enable signal is generated to initiate a read or write operation.

[Figure 3.1: The architecture of an MMC. The figure notes that IEEE 1149.1 is used as the L0-bus.]

3.2.1 Test Channel Design

A CMC may have a pseudorandom test pattern generator (TPG) and a signature analyzer (SA), which can be implemented using linear feedback shift registers [51]. In this case only control signals need be supplied by a test bus during self-test. An example of such a design is presented in [5]. However, if the chip does not have these facilities and is to be tested using pseudorandom test data, then a TPG and an SA must be made a part of the MMC. For chips tested by deterministic test vectors, an MMC must be able to provide test vectors and obtain test results via a test channel.

Once initialized by the processor, the primary function of the test channel is to control an L0-bus autonomously. The processor can then be used for other tasks. As a result, high test parallelism can be achieved through running several test channels at the same time. The major functions of the test channel are listed below.

1. Serve as an L0-master.
2. Transmit instructions to and receive status from chips.
3. Generate and transmit pseudorandom test data and receive and compact test results.
4. Transmit deterministic test vectors to and receive test results from chips.
5. Generate interrupts and also direct interrupts from chips to the processor.
6. Keep count of the number of tests applied, and the number of bits of each test or instruction transmitted.

[Figure 3.2: The architecture of the test channel.]

Organization of the Test Channel:

Figure 3.2 shows a block diagram of the test channel. The test channel consists of a Transmitter Register (TxR) for transmitting data over the TDI line; a Receiver Register (RxR) for receiving data on the TDO line; two polynomial control and buffer registers, PA and PB; a control register (CR) which specifies the operational mode, selection and function enabling information; a status register (SR) which contains the current chip status; three counters, namely TC, which stores the total number of test vectors to be sent, SC, which keeps track of the number of bits in a test vector which have been transmitted, and DC, which keeps track of the elapsed idle time between two vectors; a register CNR which contains the initial values for SC and DC; a register select circuit for processor read/write control; an interrupt circuit to request service from the processor; and a control unit FSM1 which implements the L0-master protocol and is used to send and receive information via an L0-bus under the control of the CR and the three counters. If the test channel is implemented as a stand-alone unit, then it should also have a CMC.
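The LFSR-based TPG and SA roles of the TxR and RxR can be illustrated with a small software model. This is a sketch under several assumptions not taken from the text: a Galois-form 16-bit LFSR, a tap mask standing in for the contents of PA or PB, a serial-input signature register, and the well-known maximal-length mask 0xB400 with seed 0xACE1 used purely as example values.

```python
def lfsr_step(state, poly):
    """One shift of a 16-bit Galois LFSR; `poly` stands in for the PA/PB tap mask."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= poly
    return state


def tpg_bits(seed, poly, count):
    """Pseudorandom bit stream: the bit shifted out of the register each cycle."""
    state, bits = seed, []
    for _ in range(count):
        bits.append(state & 1)
        state = lfsr_step(state, poly)
    return bits


def signature(bits, seed, poly):
    """Compact a serial response stream into a 16-bit signature."""
    state = seed
    for bit in bits:
        state = lfsr_step(state, poly) ^ (bit << 15)
    return state


POLY = 0xB400                       # example tap mask (assumption)
response = tpg_bits(0xACE1, POLY, 64)
faulty = list(response)
faulty[10] ^= 1                     # inject a single-bit error
print(hex(signature(response, 0, POLY)), hex(signature(faulty, 0, POLY)))
```

Because the register update is linear and invertible for this mask, a single flipped response bit always changes the final signature.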
The I/O pins of the test channel consist of /WR, /RD, /CS, PA, PD, Direct, TDI, TDO, TMSi, TCK, and other interrupt signals. The processor can write a word of data from the data bus PD to a register, addressed by PA, in the test channel by simultaneously activating the signals /WR and /CS. Similarly, the processor can read a word of data from a register in the test channel to the data bus by simultaneously activating the signals /RD and /CS.

Output signals, such as TDI and TMS, are all driven through a tri-state buffer, thus allowing two or more test channels to be connected to an L0-bus. This enhances the reliability of the test process and enables external testing of a module by another MMC [12]. A more detailed description of the major blocks follows.

1. TxR (Transmitter Register). The TxR is a 16-bit register with parallel LOAD, SHIFT and TPG capabilities. It is used to transmit data over the TDI line. During pseudorandom data transmission the TxR acts as a TPG. The feedback polynomial of the TPG is controlled by the PA. Any feedback polynomial can be realized since the PA is directly writable by the processor. The seed value for the TPG can also be loaded by the processor. During instruction or deterministic data transmission the TxR acts as a shift register. It must be loaded with a new word of data before transmission is initiated. The PA serves as a buffer for transmission. Once the TxR is empty, the next word of data, which is already in the PA, is copied into the TxR. Processor service is then requested in order to load a new word of data into the PA. Transmission over the L0-bus is not interrupted during the 16-clock-cycle window in which the PA may receive a new data word. If the data transfer rate is not fast enough, i.e., if the TxR is empty and the PA does not yet contain a new word of data, the L0-bus enters a pause state until the PA is loaded.

2. RxR (Receiver Register).
The RxR is a 16-bit register with parallel READ, SHIFT and SA capabilities. It is used to receive data from the TDO line. Received data is either read by the processor or compressed into a signature. During pseudorandom data transmission the RxR acts as an SA. The feedback polynomial is controlled by the PB. The final signature in the RxR can be read out via a processor read operation. During transmission of status or deterministic results, data on the TDO line is shifted into the RxR. The PB serves as a buffer. Once the RxR is full, its contents are copied into the PB. A service request is generated to signal the processor to read the PB and store the data in the RAM. If the previous result in the PB has not yet been read, the L0-bus enters a pause state. Transmission cannot start again until the PB is read and the RxR transfers its data to the PB.

3. PA, PB (polynomial control registers). Both registers are 16 bits wide and have parallel LOAD capability. They can be accessed by the processor via the data bus. Their functions have already been described.

4. CR (Control Register). The CR is a 7-bit register. Symbolic names used for the CR bits are FSMen, INTen, MS0, MS1, BS0, BS1 and Scan. FSMen and INTen enable FSM1 and the interrupt circuit, respectively; MS0 and MS1 specify the operational mode; BS0 and BS1 select one of the TMSi (i = 0, 1, 2, 3) signals; and Scan determines the scan or non-scan operation.

5. SR (Status Register). The SR consists of 4 bits, namely Finish, IRQ, Ready and Wait. The Ready bit is cleared whenever the content of the PA is copied into the TxR, and is set whenever the processor loads new data into the PA. The Finish bit is set only when the required information has been transferred, or TC reaches 0. The IRQ bit is set when the INT line from the test bus is active. The Wait bit is set when both the TxR and PA are empty, and is in the reset state when the TxR is loaded.
A processor SR read operation also reads the content of the CR, i.e., 11 bits are read. This operation can be performed independent of the state of FSM1. Bits Finish and IRQ are cleared whenever the SR is read.

6. TC (Test Counter). The TC counts the number of test vectors transmitted during the execution of one test session. The TC is a 22-bit down counter; it is able to count down to 0 from any number between 1 and 4,194,303. It requires two processor write operations to load: one of the write operations loads part of this counter and part of the CR, the other loads the rest of the counter.

7. SC (Scan Counter). The SC is used to keep count of the number of bits of a test vector or instruction which have been transmitted. The SC is a 10-bit down counter and can count down to 0 from any number between 1 and 1023. Its initial value is loaded from the CNR. A terminal count signal will be activated whenever the value in the SC reaches 0, and the value s in the CNR will be copied into the SC. In transmitting t test vectors to a chip during one test session, the SC must be re-initialized (to the value s) t times.

8. DC (Delay Counter). The DC is a 5-bit down counter and is used to count the number of clock cycles between the transmission of two consecutive test vectors. Its initial value can be loaded from the CNR. The DC can count down to 0 from any number between 1 and 31. A terminal count signal will be activated whenever the DC reaches 0, and the value d in the CNR will be copied into the DC.

9. CNR (Count Number Register). This buffer is used to store the initial values of the constants for both the SC and DC, i.e., s and d referred to above. As discussed above, these counters destroy their original contents after a test vector is transmitted. Thus the CNR is used to restore the values in both counters so that the next vector can be transmitted. The CNR is 15 bits long. It can be loaded by a single processor write operation.

10. Register Select Circuit.
This circuit is driven by the processor and is used to control the access to various registers in the test channel. Registers CNR, TC, CR, SR, TxR, RxR, PA, and PB are accessible to the processor. When the Direct signal is inactive, the registers are selected by address. When the Direct signal is active, this circuit interprets a processor read operation as a write to the PA, thus ignoring the address lines. In addition, the address and Read signals are used to read a word from the memory unit. Thus a word of data is transferred from the memory unit to the PA of the selected test channel. Similarly, when Direct is active a processor write operation is interpreted as a read from the PB. The address and Write signals are used to write the contents of the PB into the memory unit.

11. FSM1. This circuit controls the operation of the test channel and acts as an L0-master. It receives control signals from the CR and conditional signals from the counters TC, SC, and DC. When the FSMen bit is set, a processor-generated write operation is used to issue a Start signal which in turn initiates the FSM1.

The initialization of the test channel includes the loading of the following registers of the test channel: CR, TC, CNR, PA, PB, and TxR. The contents of the registers SR and RxR, which are the results of the previous operation, should be read before the test channel is again enabled for the next operation.

Operation of the Test Channel:

The operation of a test channel is controlled by its FSM1. The FSM1 controls the state of a test bus via signal line TMS (see Figure 3.2). The possible bus states are shown in Figure 3.3. A test channel provides two types of operation: RunTest and Scan. During RunTest, the test bus enters the Idle/RunTest state for a pre-determined number of clock cycles. The TC counter keeps track of this number. No data is transmitted on either the TDI or TDO lines.
This type of operation is used when a chip under test has BIST capability and the BIST hardware has been properly initialized through the test bus. The chip’s BIST controller runs the self-test as long as the bus stays in the Idle/RunTest state.

During Scan operation, the test channel transfers either pseudorandom test data (PTD), deterministic test data without results compression (DTD), deterministic test data with results compression (DRC), or instructions (INS). The operation of the test channel is controlled by the CR and three counters. These counters are used for all types of information transfer. During the different operational modes, these counters may be used for different purposes. For example, in PTD transmission, the TC keeps track of the number of test vectors applied, the SC keeps track of the number of bits transmitted, and the DC keeps track of the number of elapsed clock cycles between two consecutive test vectors. Table 3.1 indicates how these counters are used. The operational modes of the TxR and RxR are also shown in the table.

        PTD            DTD         DRC         INS         RunTest
TC      # tests        # tests     # tests     set to 1    # clk cycles
SC      # bits         # bits      # bits      # bits      -----
DC      # clk cycles   set to 15   set to 15   set to 15   -----
TxR     TPG            SHIFT       SHIFT       SHIFT       -----
RxR     SA             SHIFT       SA          SHIFT       -----

Table 3.1: Counter usage.

Figure 3.3 shows the state transitions carried out by the FSM1 of the test channel. The dashed rectangle represents a wait for processor service. The operations indicated in the solid rectangles are executed in one clock cycle. The protocol of this state transition diagram is consistent with the IEEE 1149.1 boundary scan protocol. The FSMen bit is cleared during the power-up process, and the test channel enters the idle state at this time. The processor can read from and write into internal registers of the test channel while in this state.
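The counter ranges quoted for TC, SC, and DC, and the way the SC is reloaded from the CNR once per vector, can be checked with a short model. It is a deliberately simplified sketch: the names `WIDTHS`, `max_start`, and `scan_session` are hypothetical, and the DC delay and the FSM1 state overhead between vectors are left out.

```python
WIDTHS = {"TC": 22, "SC": 10, "DC": 5}   # counter widths in bits, from the text


def max_start(bits):
    """Largest value an n-bit down counter can start from."""
    return 2 ** bits - 1


def scan_session(t, s):
    """Shift t vectors of s bits each; return (bits shifted, SC reloads)."""
    assert 1 <= t <= max_start(WIDTHS["TC"])
    assert 1 <= s <= max_start(WIDTHS["SC"])
    tc, sc = t, s
    bits_sent = reloads = 0
    while tc > 0:
        sc -= 1                # one bit shifted out on TDI
        bits_sent += 1
        if sc == 0:            # terminal count of the SC
            sc = s             # restore s from the CNR
            reloads += 1
            tc -= 1            # one more vector completed
    return bits_sent, reloads


print({name: max_start(bits) for name, bits in WIDTHS.items()})
# {'TC': 4194303, 'SC': 1023, 'DC': 31}
print(WIDTHS["SC"] + WIDTHS["DC"])   # 15, the width of the CNR
print(scan_session(3, 47))           # (141, 3)
```

The reload count equals t, matching the statement that the SC is re-initialized t times in one test session.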
After initializing the appropriate set of registers, setting the Start signal and FSMen bit will initiate the operation of the FSM1. Depending on the setting of the Scan, MS0 and MS1 bits, the FSM1 follows one of the five major branches shown in Figure 3.3.

[Figure 3.3: The state transition diagram for a test channel.]

[Figure 3.4: The state transition diagram (cont.).]

The branch labeled PTD is executed when pseudorandom testing is needed. Registers PA, PB, TxR, RxR, CNR and TC are assumed to have been initialized to their appropriate values, such as pa, pb, seed1, seed2, (s, d) and t. The TxR acts as a TPG with pa selecting the feedback polynomial and seed1 as its initial value; the RxR acts as an SA with pb selecting the feedback polynomial and seed2 as its initial value. The test channel then autonomously transmits t random test vectors generated by the TxR to TDI and compresses t test results in the RxR. Each test result is s bits long, and d clock cycles of delay exist between two consecutive test vectors. No service from the processor is required during pseudorandom testing. The Finish bit is set when the process is completed. The processor then reads the signature stored in the RxR to determine the test result.

The branch labeled DTD (see Figure 3.4) is executed when deterministic test data is used. Registers CNR and TC contain the values (s, d) and t. Note that d is always set to 15 for the DTD process. Its purpose is to clear the Ready bit after the transmission of 16 bits.
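The PTD branch described above pairs an LFSR-based pattern generator with a signature analyzer. The sketch below shows the general technique in C under stated assumptions: a 16-bit Fibonacci-style LFSR is used for both registers, the tap masks play the role that the text assigns to pa and pb, the delay d between vectors is ignored, and `wire_cut` is a trivial stand-in for the chip under test. None of these names or the exact feedback structure come from the original design.

```c
#include <stdint.h>

/* One shift of a 16-bit Fibonacci LFSR. 'taps' marks the stages XORed into
 * the feedback; 'in' is a serial input bit (0 for a pure pattern generator).
 * Returns the bit shifted out of stage 0. */
unsigned lfsr_step(uint16_t *state, uint16_t taps, unsigned in)
{
    unsigned out = *state & 1u;
    unsigned x = (unsigned)(*state & taps);
    /* Fold x down to the parity of the tapped stages. */
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    unsigned fb = (x & 1u) ^ (in & 1u);
    *state = (uint16_t)((*state >> 1) | (fb << 15));
    return out;
}

/* Hypothetical chip model: echoes TDI to TDO unchanged. */
unsigned wire_cut(unsigned tdi_bit) { return tdi_bit & 1u; }

/* Apply t vectors of s bits each and return the signature left in RxR.
 * TxR (seeded with seed1) generates the stimulus; RxR (seeded with seed2)
 * compresses the response returned by 'cut'. */
uint16_t ptd_session(uint16_t pa, uint16_t seed1, uint16_t pb, uint16_t seed2,
                     unsigned t, unsigned s, unsigned (*cut)(unsigned))
{
    uint16_t txr = seed1, rxr = seed2;
    for (unsigned v = 0; v < t; v++) {
        for (unsigned b = 0; b < s; b++) {
            unsigned tdi = lfsr_step(&txr, pa, 0u);  /* TPG role */
            unsigned tdo = cut(tdi);
            (void)lfsr_step(&rxr, pb, tdo);          /* SA role */
        }
    }
    return rxr;
}
```

A test program would compare the value returned by `ptd_session` with a known good signature, which is the comparison the processor performs after the Finish bit is set.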
For a test vector longer than 16 bits, the TxR is loaded with the first 16 bits of deterministic test data before the Start signal is activated. After 15 shift operations, the TxR contains the last bit of the test data. One clock cycle later the RxR is full. Two situations are then possible.

In the first, the PA is full (Ready=1) after these shift operations have occurred. The content of the PA is then copied into the TxR, and one clock cycle later the content of the RxR is copied into the PB. The Ready bit is cleared and transmission over TDI and TDO is not interrupted. The processor then has another 16 clock cycles to load the PA, read the PB and set the Ready bit.

In the second, the PA is empty (Ready=0). Transmission is then interrupted and the Wait bit is set to request service from the processor. Waiting for the processor to read the RxR and load the TxR is indicated by a dashed rectangle in Figure 3.4. The test bus is in the pause state during the wait period. Once the processor finishes the read/write process, it clears the Wait bit to allow the FSM1 to transfer another 16 bits of information. The Finish bit is set upon the completion of the DTD test, i.e., when the TC reaches zero.

The branch labeled DRC is selected when deterministic test data are used and the test results are compressed in the RxR. The volume of information flow between the memory unit and test channel is reduced by half compared with the DTD operation.

The branch labeled INS is followed when transmitting instructions. The content of TC is set to 1. The operations of the test channel are similar to those for DTD operations, except that the sequence of values on the TMS line is different.

The branch labeled RunTest is used when the RunTest operation is required.
The test channel transmits a specific sequence, as specified by the boundary scan protocol, over the TMS line such that all L0-slaves connected to the selected signal TMSi will enter the Idle/RunTest state for t clock cycles. The Finish bit is set before returning to the idle state again.

The loop conditions depend on the conditional signals (TC = 0, SC = 0, DC = 0 and DC > 1) generated by counters TC, SC and DC. The processor can stop or disable the operation of the FSM1 by loading a new word into the CR through a processor write operation. Resetting the FSMen bit will halt the operation of the FSM1. To maintain consistent operations, modifications of all other registers, except PA, PB, TxR and RxR, are prohibited until the Finish bit is set or an interrupt has occurred.

3.2.2 Bus Driver/Receiver

The Bus Driver/Receiver (BDR) is a bidirectional interface to the local bus of the MMC. It provides the driving capability for signals to/from the expansion bus. Figure 3.5 shows the basic architecture of the BDR. Signals in and out control the flow of information between the local bus and the expansion bus. These two signals are decoded from the address and control buses, which are subbuses of the local bus. When the addressed unit is not directly tied to the local bus, the BDR is used to select the appropriate unit on the expansion bus. To enable external interrupts from units tied to the expansion bus to reach the local bus, the expansion bus interrupt signals can also assert the in signal.

[Figure 3.5: The Bus Driver/Receiver.]

3.2.3 Functional Bus Interface

The Functional Bus Interface (FBI) allows communications between the module's functional bus and the MMC's expansion bus. Through the FBI, the MMC can execute functional tests for the module.
Details of this interface will not be presented here. Further information on related interfacing techniques can be found in [10].

3.2.4 Testability Register

This is a 16 bit register used to increase the testability of modules containing chips which are either not designed to be testable, or do not have a test bus interface. The boundary scan registers on testable chips can be used to increase the testability of non-testable chips. However, in many cases, no boundary scan registers can be found to access signals between non-testable chips. The Testability Register can be used to increase the testability of these chips and their signals in the following way. Signal points which need to be controlled (C) and/or observed (O) are cut and fed into the Testability Register. The O signals are connected to C signals during normal operation (see Figure 3.6). In test mode, the processor writes a word to the Testability Register, which in turn applies this data to the C signals. Signals which need to be observed (O signals) are loaded into the Testability Register and then read by the processor. Thus both the controllability and observability of these cut points are enhanced. A technique for selecting these signal points is presented in [17].

3.2.5 Analog Test Interface

This circuit is used when there are analog circuits on the module under test (see Figure 3.7). To generate an analog signal, the processor writes a word to the analog test interface; the D/A converter then converts this data into an analog signal. For observability, an analog signal is converted into a digital word which can then be read by the processor.

3.2.6 L1-Slave

The MMC communicates with its higher level SuMP controller via an L1-bus, thus it must have an L1-slave. The design of a TM-slave is given in [13].
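The cut-point behavior of the Testability Register in Section 3.2.4 can be modeled in a few lines of C. This is a minimal behavioral sketch, not the circuit of Figure 3.6: the struct layout and function names are hypothetical, and keeping the written word and the sampled word in two separate 16-bit latches is an assumption.

```c
#include <stdint.h>
#include <stdbool.h>

/* Behavioral model of the 16-bit Testability Register (hypothetical names). */
typedef struct {
    uint16_t drive;     /* word written by the processor, applied to C signals */
    uint16_t sampled;   /* O signals captured by the last sample */
    bool     test_mode; /* TestMode control */
} testability_reg;

/* C signals as seen by the downstream logic. In normal operation each
 * O signal passes straight through to its C signal; in test mode the
 * written word is applied instead. */
uint16_t treg_c_signals(const testability_reg *r, uint16_t o_signals)
{
    return r->test_mode ? r->drive : o_signals;
}

/* Processor write: sets the word that controls the C signals in test mode. */
void treg_write(testability_reg *r, uint16_t word) { r->drive = word; }

/* Capture the current O signals so the processor can observe them. */
void treg_sample(testability_reg *r, uint16_t o_signals) { r->sampled = o_signals; }

/* Processor read: returns the captured O signals. */
uint16_t treg_read(const testability_reg *r) { return r->sampled; }
```

In test mode, a test program would call `treg_write` to control the cut points, let the logic settle, then call `treg_sample` and `treg_read` to observe them.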
3.2.7 Processor

The processor's functions can be classified into five categories: (1) transfer data between memory and test channels; (2) transfer data between memory and an L1-slave; (3) compare test results with expected results; (4) transfer data between memory and expansion units; and (5) execute test and/or diagnostic programs.

A general or special purpose 16 bit processor can be used in the MMC. It controls all other units in the MMC. Through read/write operations, the processor can access internal registers of a peripheral device, such as the L1-slave and test channels. Operations of a peripheral device can thus be controlled by a processor write to the control register of the peripheral device. Data exchange between memory and a peripheral device is controlled by processor read/write operations. Any processor that can execute the instruction set shown in Table 3.2 is powerful enough for the application of an MMC.

[Figure 3.6: A Testability Register; (a) block diagram (b) circuit of bit D0.]

[Figure 3.7: Analog Test Interface.]

  instruction   meaning
  LDA Ri        Load Acc with Ri
  LDA M         Load Acc with memory (M)
  STA Ri        Store Acc to Ri
  STA M         Store Acc to memory (M)
  ADD Ri        Add Ri to Acc
  AND Ri        Bitwise And Ri with Acc
  CMP Ri        Compare Acc with Ri
  NEG           Complement Acc
  CLA           Clear Acc
  BRZ Ri        Branch to (Ri) if Acc not zero
  JMP Ri        Jump to (Ri)
  PUSH          Push Acc onto Stack
  POP           Pop Acc from Stack
  NOOP          No operation
  HALT          Halt the processor

Table 3.2: Processor instruction set.
The minimal architecture for a processor which is able to execute the above instruction set consists of an accumulator, four general purpose registers, an ALU, a program counter, a program status word, a stack with at least 4 words, an interrupt circuit, and a microprogrammed control unit.

If the MMC is implemented as a single chip ASIC, two additional instructions are useful to increase the data transfer efficiency between the memory unit and the test channel. The added instructions are MR (Multiple Read) and MRMW (Multiple Read and Multiple Write). The signal lines Direct, Finish, and Ready are used exclusively to support these two instructions (see Figure 3.8). Signal Direct is active when the microcontroller is executing either of these two instructions; signals Finish and Ready are used as conditional signals for the microcontroller of the processor. All Ready (Finish) signals from test channels are wired-ORed together.

[Figure 3.8: Control signals for MR and MRMW instructions.]

When executing an MR instruction, the processor waits until the Ready signal is cleared and then issues a read operation to the memory location addressed by the general purpose register R0. Meanwhile the test channel with its FSMen bit set and its operation mode being either DTD, DRC or INS can generate a load PA signal using signals Direct and Read. Thus a data word is moved directly from memory to a test channel. The value of R0 is increased by one after each read. The processor waits for the Ready signal to be deactivated, and then issues another read operation. This process is repeated until the Finish signal is set.
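The MR loop just described can be sketched as a small simulation in C. This is only an illustration under assumptions that are not in the original text: the `channel_model` struct, the rule that loading the PA marks it as full (Ready set), and the use of one `channel_consume` call to stand in for the 16 shift clocks are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define CH_LOG_WORDS 8  /* size of the illustrative transmit log */

/* Simplified model of the enabled test channel (hypothetical). */
typedef struct {
    uint16_t pa;                  /* PA buffer */
    bool     ready;               /* PA holds a word not yet taken by TxR */
    bool     finish;              /* all words have been transmitted */
    size_t   remaining;           /* words still to transmit */
    uint16_t sent[CH_LOG_WORDS];  /* first few transmitted words */
    size_t   n_sent;              /* total words transmitted */
} channel_model;

/* Channel side: move PA into the transmit path and clear Ready. */
static void channel_consume(channel_model *ch)
{
    if (!ch->ready)
        return;
    if (ch->n_sent < CH_LOG_WORDS)
        ch->sent[ch->n_sent] = ch->pa;
    ch->n_sent++;
    ch->ready = false;
    if (--ch->remaining == 0)
        ch->finish = true;
}

/* Processor side of MR: repeat reads from mem[R0] until Finish is set.
 * Returns the final value of R0. */
size_t mr_transfer(const uint16_t *mem, size_t r0, channel_model *ch)
{
    if (ch->remaining == 0)
        ch->finish = true;        /* nothing to send */
    while (!ch->finish) {
        if (!ch->ready) {
            ch->pa = mem[r0];     /* Direct + Read: memory word goes into PA */
            ch->ready = true;     /* assumption: loading PA marks it full */
            r0++;                 /* R0 is increased by one after each read */
        }
        channel_consume(ch);      /* stands in for 16 shift clocks */
    }
    return r0;
}
```

With a three-word block and `remaining = 3`, `mr_transfer` returns R0 advanced by three and leaves the three words in the transmit log in memory order.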
Thus a block of information can be moved from the memory unit to the selected test channel and transmitted to a chip without any interruption.

When executing an MRMW instruction, the processor waits until the Ready signal is deactivated and then issues a read operation to the memory location addressed by R0. Meanwhile, the enabled test channel generates a load PA signal, and the data word from memory is loaded into the PA. The value of R0 is increased by one. The processor then issues a write operation to the memory location addressed by R1; meanwhile the enabled test channel generates a read PB signal, and a data word is read out of the PB and sent to the memory. The value of R1 is incremented. The processor waits for the Ready signal to be deactivated again and then issues another read/write operation. This process is repeated until the Finish signal is set. Thus a block of deterministic test data is moved from the memory unit to the selected test channel, and a block of test results is moved from the selected test channel to the memory unit.

3.2.8 Memory

The memory unit in an MMC is composed of a RAM unit and a ROM unit. The ROM unit contains test programs to test the entire module. These programs are compiled separately before testing. Some crucial information about the chips on the module is stored here. This information includes the number of chips to be tested, the ordering of chips along the test bus ring, the number and length of scan chains for each chip, the number of random test vectors to apply to each chain, test instructions for each chip's CMC, and the TPG seeds and good signature for each test session. MMC functional self-test programs can also be stored in this unit. If the MMC is implemented using commercial ICs, then these programs are essential for MMC self-test. The Expansion ROM can be added whenever the module requires a large test program.

The RAM unit provides scratch pad memory for test program execution.
Response signatures are stored here for later evaluation. The RAM also provides storage for the Go/NoGo status for all chips, as well as for the entire module.

3.2.9 Stand-Alone MMC

The MMC can be used as a stand-alone mini ATE, provided that extra storage and console capabilities are added. For this application, a Console Control Interface and Disk Interface can be added to the MMC.

3.3 MMC Self-Test

If the MMC is implemented with an off-the-shelf "non-testable" processor, ROM and RAM, then some form of functional self-test is required. After finishing self-test, the MMC then reports its status to the control console or to a SuMP.

An MMC can also be tested either by an ATE or by another MMC. In the first case, an ATE can access the expansion bus of the MMC under test. The ATE invokes the self-test program of the MMC under test and waits until its completion. The test results, which are stored in the RAM, are then read by the ATE. In the second case, an MMC uses its Functional Bus Interface to access the expansion bus of the MMC under test. Again self-test programs can be invoked. Test results can be read and interpreted by the monitoring MMC.

If the MMC under test is implemented as a custom testable ASIC, then we assume it has a CMC. The MMC can thus be tested by another MMC. All units in the MMC, such as the processor, RAM, ROM, test channel and L1-slave, must be designed to be testable, and their BIT structures need to be accessible via the L0-slave. Some of the testability features of the test channel are described next.

Testable test channel:

The testable design features of a test channel are shown in Figure 3.9. Major combinational logic blocks are indicated by rectangles having dotted lines. Registers are indicated by rectangles having solid lines. Some logic is associated with these registers, since some are counters and LFSRs. Normal functional connections are not shown. Instead the two scan chains formed during self-test are shown. Scan chain 1 is the boundary scan chain. All I/O signals can be controlled and observed by shifting test data or results along this chain. All other registers make up scan chain 2. The state of the test channel is controlled by shifting data along this scan chain. If a functional clock is activated, the next state of the test channel also can be observed by shifting out the content of this chain. During testing, scan chain 1 is first loaded with test data, which is held in place while the logic associated with scan chain 2 is tested. The module I/Os are tested using the boundary scan chains of this chip and those to which it is connected.

[Figure 3.9: Testable design features for a test channel.]

3.4 Discussion of MMC Design

An MMC design suitable for controlling the self-test process of a module has been described. The design uses the concept of test channels, which can run a test autonomously (in the PTD case) once initialized by the processor. Because of the test channel, the processor need not deal with detailed control sequences over the JTAG boundary scan bus. Test execution sequences for chips can be generated in terms of processor read/write operations, which greatly simplifies the development of test programs. The MMC architecture is expandable. More test channels can be added so that more chips can be tested in parallel.
In addition, the MMC supports the functional testing of a module, the testing of clusters of chips which are not designed to be testable, and the testing of analog devices.

Clock Synchronization:

Four or more clocks may be applied to an MMC, viz. TCK for the CMC, FCK1 for the L1-slave, FCK2 for each test channel connected to an L0-bus, and FCK3 for the operation among the processor and other peripheral devices. Synchronization problems will occur in a test channel where both FCK2 and FCK3 may access the same component, such as TxR and RxR. Techniques to solve this problem can be found in [39, 10]. In the design presented here, we use a common clock to drive all the clocks mentioned above, thus avoiding the clock synchronization problem.

Portable Tester:

The proposed MMC is designed to be part of an HTM system. It is assumed that each module contains an MMC, which under request from a SuMP can test all chips on the module and report back test results. However, it is possible to build an MMC as a portable stand-alone unit. In this case the L1-slave can be replaced by a control panel. A stand-alone MMC can test any module having an L0-bus. The module's built-in MMC is tested first through its L0-slave. Application chips on the module can be tested either by the built-in MMC or by the stand-alone MMC. For the latter case, the built-in MMC must be disabled to allow the stand-alone MMC to take control of the module's L0-bus. An operator can start the test process via the control panel. Test programs stored in the ROM then take over control. After all chips have been tested, test results are shown on the control panel to indicate the Go/NoGo status of the module under test.

Overhead:

There are several ways of implementing an MMC. One or more test channels can be built on an ASIC chip. The processor, RAM and ROM can be implemented using standard chips.
The other functions, which are optional, can be implemented using standard parts or an ASIC chip, excluding the expansion ROM. The application chip requires overhead to support testability, such as scan registers, as well as an L0-slave. For double latch designs, scan area overhead usually varies from 2.8 to 6.3%, depending on the ratio of gates to latches [52]. The overhead for an L0-slave depends on the length of the Instruction Register and the number of I/O pins. Assuming each shift register latch (SRL) is equivalent to 10 gates, an L0-slave with a 16 bit instruction register and a 60 bit boundary scan register requires about 1600 gates. For a 50000 gate ASIC chip, the total overhead for testability will typically be between 5% and 10%. The boundary scan bus consists of 4 wires. Assuming 60 pins/chip prior to adding the bus, the routing overhead to support testability will be at least 4/60 * 100% = 6.7%. This is a lower bound, since most pins on a chip are tied to only 2 or 3 point nets, while the test bus goes to all ICs. The wiring overhead is estimated to be closer to 10%.

Fault Isolation:

One of the important attributes of boundary scan is the ability to test the interconnect between chips. Assuming chips are also designed to be testable via DFT or BIST techniques, the MMC should be able to accurately locate hardware faults to a chip or interconnect.

Analog Performance:

Since A/D and D/A conversion is much faster than data transfer over the bus, the speed of observing or controlling an analog signal is determined by the data bus bandwidth. For example, an Intel 80186 processor running at an 8 MHz clock rate can transfer 4 MByte of data from memory to the Analog Interface in 1 second.

3.5 An MMC Prototype

The implementation of an MMC prototype is described in this section. The major components of an MMC include a processor, a memory unit and a Test Channel.
The MMC prototype has been successfully used to execute the test procedures for a test chip. Programs that describe these test procedures are easy to develop since they can be written in a high-level language such as C.

3.5.1 Test Channel

The Test Channel is implemented using the Actel field programmable gate array (FPGA) technology. Some design changes have been made in the implementation. The main reasons for these changes are 1) the limited capacity of the device; 2) a change in the clocking scheme; and 3) the addition of DFT facilities. A detailed description of the changes in the design can be found in [44].

[Figure 3.10: Physical configuration of the MMC prototype.]

The implementation was aided by the Actel Action Logic System, which automatically performs placement, routing and the programming of ACT1020 devices. The Test Channel uses an ACT1020 device which is packaged in an 84 pin Plastic Leaded Chip Carrier (PLCC). The module utilization of this device is high: 513 out of a total of 548 logic modules are used, as are 47 out of 67 I/O modules. This chip can operate at 2.5 MHz and consumes less than 250 mW of power.

3.5.2 Processor and Memory

An IBM AT computer is used as the host computer of the prototype. It provides both the processor and the memory units required in an MMC. The physical configuration of the prototype is illustrated in Figure 3.10. A board which occupies a bus slot in the IBM AT is used to provide an extension of the I/O bus. Another board, called the proto-board and developed at Stanford University [23], is used to decode the bus signals and to accommodate the Test Channel chip.

3.5.3 Processor and Test Channel Interface

Interfacing the Test Channel with the host processor requires very little effort.
Only the following signal lines need to be connected: a 16 bit data bus, a 4 bit address bus, a chip enable line and two read/write control lines. These lines are available on the I/O bus of the host. Through these lines, the host can control the Test Channel by executing I/O read/write operations.

The proto-board provides I/O connection and bus decoding logic for interfacing with IBM XT or AT computers that can serve as a host for the Test Channel chip. The width of the data bus on the proto-board is only 8 bits, which is incompatible with the 16 bit design of the Test Channel chip. Hence, a data bus adaptor that provides data buffering between the 8 bit and the 16 bit bus is required. Figure 3.11 shows the data bus adaptor used in the prototype.

The signal lines available on the proto-board include HD[7:0], HA[3:0], /PIOR, /PIOW and /POR. HD[7:0] is the 8 bit data bus from the host. HA[3:0] is the 4 bit address bus from the host. The /PIOR and /PIOW signal lines are derived from the host signal lines /IOR, /IOW and the address lines. Both /PIOR and /PIOW are active only when I/O is selected in the address range from 300H to 30FH.

The signal lines of the Test Channel chip that need to be controlled by the host are PD[15:0], PA[3:0], /RD, /WR, /CS and Reset. PD[15:0] is a 16 bit data bus. PA[3:0] is a 4 bit address bus that selects the internal register to be accessed. A data buffer consisting of two 74LS373 devices is used to interface PD[15:8] with HD[7:0]. A 74LS245 bus transceiver is used to interface PD[7:0] with HD[7:0]. The address PA[3:0]=1111 is reserved for accessing the data buffer, which is used to store the high byte data.

Two write operations are required to move a 16 bit data word from the host to an internal register of the Test Channel. The first write operation, which is "outp(0x30f, high_byte_data);" in Microsoft C, loads the high byte data into the data buffer.
The second write operation "outp(0x30i, low_byte_data);" loads both the high byte and low byte data into the internal register that is addressed by PA[3:0]=i. Similarly, two read operations are required to move a 16 bit data word from an internal register of the Test Channel to the host. The first read operation "low_byte_data=inp(0x30i);" moves the low byte of the internal register onto the host data bus and the high byte into the data buffer. The second read operation "high_byte_data=inp(0x30f);" moves the high byte from the data buffer onto the host data bus.

[Figure 3.11: The data bus adaptor.]

3.5.4 Discussion

The prototype has been successfully used to execute the test procedures for a test chip. Since the programs describing these test procedures are written in a high-level language like C, they can be easily developed and designed. Also, the prototype has been implemented using a PC, which costs much less than a conventional ATE. Furthermore, the performance of the prototype is superior to an ATE in testing boundary scan devices, thanks to the Test Channel chip.

Using the developed Test Channel, it is possible to implement a minimal MMC by adding two chips, such as an off-the-shelf RAM chip (e.g. Hitachi 6116) and a micro-controller (e.g. Intel 8048). Thus at most three chips are required to make a board completely self-testable.

The major drawback of the field programmable gate array technology is its limited capacity. Because the maximal capacity of the selected device (ACT1020) is less than 2000 gates, many functions in the original designs have been omitted. It is possible to implement the MMC using a different technology that has a larger capacity than an ACT1020, such as the ACT 2 devices. In such a case, the performance of the MMC can be improved as follows.

1. Add the PA and PB registers, and the DC counter, omitted in this implementation, to the Test Channel.

2. Increase the length of the two counters TC and SC to account for a larger number of test vectors and more bits in each vector.

3. Incorporate a memory control circuit so that the Test Channel can access a local RAM. In this way it is possible for the Test Channel to send long sequences of instructions and data without interruption.

4. Integrate the processor, the Test Channel and a memory unit into a chip. A handshaking circuit between the processor and the Test Channel can be avoided, since they can be designed to be synchronous. However, to fit into a chip, the size of the processor instruction set and the on-chip memory should be bounded.

3.6 Testing a Kernel Using MMC and CMC

To test an application circuit consisting of one or more kernels, both the MMC and the CMC are required. Through a test bus the proposed MMC controls the execution of, and provides all necessary test data for, the testing of a chip having a CMC. During test mode the control signals of a kernel must be activated in the sequence specified by its associated control graph.

Figure 3.12 shows how the test control signals for a kernel can be generated and controlled by an MMC with the help of a CMC consisting of a TAP and a BIT controller. The signals Ci, Cj, Ck are controlled by the state signals Capture, Shift, Update, RunTest corresponding to their associated TAP controller states, which are in turn controlled by the TMS line. The value of the TMS line is determined by the state of the FSM1 in the test channel. The FSM1 can drive the TAP controller to any desired state via the TMS line. When sending data to the kernel, a predefined sequence of state transitions is generated by the FSM1 to activate the TAP states such that data can be received. Several predefined sequences of state transitions have been built into the FSM1.
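Selecting one of these sequences comes down to 16 bit register loads performed through the two-step byte protocol of Section 3.5.3. The following C sketch illustrates that protocol against a simulated adaptor so that it can run anywhere. The helper names (`port_out`, `port_in`, `tc_write16`, `tc_read16`) and the simulation arrays are hypothetical; on the actual prototype the two stand-in port functions would be replaced by the Microsoft C `outp` and `inp` calls quoted in the text.

```c
#include <stdint.h>

#define TC_BASE     0x300u  /* I/O range 300H to 30FH, as stated in the text */
#define TC_HIGH_BUF 0x0Fu   /* PA[3:0]=1111 selects the high-byte data buffer */

/* Simulated hardware: 15 internal 16-bit registers plus the high-byte buffer. */
static uint16_t sim_reg[15];
static uint8_t  sim_high;

/* Stand-in for outp(): a write to 0x30F fills the buffer, a write to 0x30i
 * loads register i with the buffered high byte and the low byte together. */
static void port_out(unsigned port, uint8_t value)
{
    unsigned a = port - TC_BASE;
    if (a == TC_HIGH_BUF)
        sim_high = value;
    else if (a < TC_HIGH_BUF)
        sim_reg[a] = (uint16_t)(((unsigned)sim_high << 8) | value);
}

/* Stand-in for inp(): a read from 0x30i returns the low byte and captures
 * the high byte in the buffer; a read from 0x30F returns the buffer. */
static uint8_t port_in(unsigned port)
{
    unsigned a = port - TC_BASE;
    if (a == TC_HIGH_BUF)
        return sim_high;
    if (a < TC_HIGH_BUF) {
        sim_high = (uint8_t)(sim_reg[a] >> 8);
        return (uint8_t)(sim_reg[a] & 0xFFu);
    }
    return 0xFFu;  /* outside the decoded range (arbitrary value) */
}

/* Write a 16-bit word to internal register 'reg' (0..14): high byte first. */
void tc_write16(unsigned reg, uint16_t word)
{
    port_out(TC_BASE + TC_HIGH_BUF, (uint8_t)(word >> 8));
    port_out(TC_BASE + reg, (uint8_t)(word & 0xFFu));
}

/* Read a 16-bit word from internal register 'reg' (0..14): low byte first. */
uint16_t tc_read16(unsigned reg)
{
    uint8_t low  = port_in(TC_BASE + reg);
    uint8_t high = port_in(TC_BASE + TC_HIGH_BUF);
    return (uint16_t)(((unsigned)high << 8) | low);
}
```

The ordering matters in both directions: the high byte must already be in the buffer when the low-byte write triggers the 16 bit load, and the low-byte read must come first because it is the access that captures the high byte.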
The processor in the MMC can select a sequence by loading proper data into the internal registers of the test channel. The loading of a register in the test channel is achieved by controlling the signals /CS, /WR, /RD, PA, PD. These signals can be controlled by the processor executing a piece of code that contains I/O instructions.

By executing a program stored in the memory unit of an MMC, the processor can control the test of a kernel in the application circuit. The program directs the processor to control the test channel by loading its internal registers. The FSM1 is then activated and produces the proper sequence of values on the TMS line, which drives the TAP controller to appropriate states. This activates the state signals and, with the help of the BIT controller, the control signals Ci, Cj, Ck are activated according to the control graph. The kernel is thus tested.

[Figure 3.12: Overview of the test control.]

Writing a program to test a kernel is a difficult problem, since it involves a high degree of complexity resulting from many details at various levels. To solve this problem a test program synthesis technique is used. This technique is presented in chapter 4, where test description languages are provided for describing test procedures at both chip and module levels. Software tools are also provided so that test programs can be automatically synthesized from files written in these languages. The synthesized test programs are compiled and loaded into the memory unit of the MMC. The processor can then test the kernel by executing the test programs.

Chapter 4

Test Program Synthesis

When the proposed design methodology is properly adopted, test programs for the system can be easily synthesized.
This is one of the most important aspects of the BOLD system. The ability to synthesize the test programs for a system represents a major advance in the reduction of the test development time.

The relationship between the test hardware and software in an HTM system is shown in Figure 4.1, where four axes are used to represent the hardware assembly units, the test description languages, the test programs, and the test controllers, respectively. Each axis in turn is represented by a hierarchy of four levels. The top hardware assembly unit is a system, which consists of several subsystems, each of which consists of several modules, each of which contains many chips. Each hardware assembly unit has a test controller associated with it. These test controllers include the system maintenance processor (SMP), the subsystem maintenance processor (SuMP), the module test and maintenance controller (MMC), and the chip test and maintenance controller (CMC). The test programs are classified into four levels according to their applications to the hardware units. These are the system test program (STP), the subsystem test program (SuTP), the module test program (MTP), and the chip test program (CTP). The test languages used to describe the test aspects of a hardware unit include the system test language (STL), the subsystem test language (SuTL), the module test language (MTL) and the chip test language (CTL).

[Figure 4.1: Overview of the test hardware/software hierarchy.]

The synthesis process starts with the preparation of the test description files by a designer.
These files are constructed using high level description languages that are easily understood by designers with little or no knowledge of testability. A set of synthesizers is then used to translate the input files into the appropriate format (a C program) for each hardware unit. These test programs are then translated down to executable code for the test controllers by an appropriate C compiler. A test controller can then test its associated hardware unit by executing the loaded executable code. In this manner, the entire system can be tested.

The major advantages of the synthesis approach are (1) a consistent test methodology, that is, a chip is tested using the same test set during the chip test, module test, subsystem test and system test; (2) reduced time, effort, and errors in test program development; (3) test programs can be prepared by designers with little knowledge of testability; and (4) interconnect testing is included automatically.

This chapter is organized as follows. In section 4.1 the test description languages are presented. In section 4.2 the test program synthesizers are described. In section 4.3 an example is used to show the details of the test program synthesis procedure. In section 4.4 results of synthesizing several modules are presented.

4.1 Languages

Four languages are required to describe the test aspects of a system, namely the Chip Test Language (CTL), the Module Test Language (MTL), the Subsystem Test Language (SuTL) and the System Test Language (STL). Currently, only CTL and MTL have been developed and are supported. These two languages are described next.

4.1.1 CTL - The Chip Test Language

Due to its wide acceptance, the BSDL [55] has been adopted as the framework of the CTL. Effort is currently under way to make BSDL a new IEEE standard to go along with the boundary scan standard.
The syntax of the BSDL follows that of VHDL [32], which has been widely accepted as a hardware description language. A very brief description of BSDL is given here. The BSDL can be used to describe the testability information of a boundary scan device (or chip) that conforms with the IEEE Std. 1149.1. The information described by the BSDL includes three major parts: the pin I/O, the Test Access Port (TAP) and the boundary register. The pin I/O part contains the definition of the logical port, the package pin mapping, and the definition of the scan port. The TAP part describes the instructions and the instruction register (IR), the identification register (ID) and the other registers that can be accessed. The boundary register part characterizes the cell type of each boundary cell, the ordering of the cells in the register and the control information for tri-state and bi-directional cells. A detailed description of the BSDL can be found in [55].

The BSDL describes the on-chip test hardware; however, it does not describe how the chip can be tested. To overcome this limitation, the CTL includes a part called the Test Procedure in addition to the original BSDL description. The Test Procedure provides the information required for testing the chip. The test procedure is incorporated by adding a VHDL attribute called TEST_PROC. The following example shows how a test procedure is incorporated in a CTL-file.

    attribute TEST_PROC of app1 : entity is
        "Test_Begin" &
        "TDM 1 = FULLSCAN;" &
        "REG=FBR, VECFILE=ap1_in2, RESFILE=ap1_out2;" &
        "REG=BOUNDARY, VECFILE=ap1_in1, RESFILE=ap1_out1;" &
        "CLOCK = FCK 1.0 CYCLES_IN RUN_TEST_IDLE;" &
        "Test_End";

It is also possible to omit this attribute and describe the test procedure in a separate file, where the quotes and & are omitted. For example, the above test procedure can be written in a file called app1.ctp as follows.
    TEST_BEGIN
    TDM 1 = FULLSCAN;
    REG=FBR, VECFILE=AP1_IN2, RESFILE=AP1_OUT2;
    REG=BOUNDARY, VECFILE=AP1_IN1, RESFILE=AP1_OUT1;
    CLOCK = FCK 1.0 CYCLES_IN RUN_TEST_IDLE;
    TEST_END

Figure 4.2: Test control model used in CTL.

The test control model used in CTL is shown in Figure 4.2, where a test controller executes the test process of the chip under test (CUT), which is also referred to as the device under test (DUT), via a four-line boundary scan test bus. A Test Procedure consists of one or more test sessions. During a test session, different parts of the chip are tested according to a predefined methodology, which is called a Testable Design Methodology (TDM). The procedure for testing a circuit designed with a specific TDM is itself referred to as a TDM in the Test Procedure. The procedure can be described as either a template-based or a user-defined TDM. A template-based TDM describes the procedure for testing a circuit designed with a commonly used TDM such as fullscan or BILBO. A user-defined TDM, on the other hand, describes an arbitrary procedure composed by the user from C code and some test-specific statements provided by the CTL. A more detailed description of these TDMs follows.

Template-Based TDMs

The template-based TDMs that are currently supported include Fullscan, FullscanN, BILBO, RUNBIST, and INTEST.

• Fullscan TDM: The circuit under test is designed with the full scan technique, where all storage elements in the circuit are made scannable and are cascaded to form a scan chain. The circuit is thus observable and controllable via the scan chain. The fullscan design structure of a circuit is shown in Figure 2.8, where the control graph for this circuit can also be found.
A circuit designed with the fullscan technique can be tested using the following procedure.

    Procedure to test a circuit using the Fullscan TDM:
    1. Load a vector to the scan chain by shifting s times.
    2. Repeat for t - 1 times
           Update the scan chain by applying a functional clock.
           Shift out the result while shifting in the next vector.
    3. Get the last result by shifting s times.

In addition to the number of test vectors (t) and the length of each vector (s), the test controller also needs to know where to get the test vectors and the correct response vectors so that they can be compared with the test responses. An example of the Fullscan TDM described in CTL is shown as follows.

    TDM <tdm_id> = Fullscan;
    REG = <reg1>, VECFILE = <file1>, RESFILE = <file2>;
    CLOCK = FCK <number1> CYCLES_IN RUN_TEST_IDLE;

The <tdm_id> identifies the TDM in a CTL-file. The selected scan register is <reg1>, which must be previously defined in the CTL-file. The test vectors are stored in the file <file1> and the expected result vectors are stored in the file <file2>. The number of test vectors and the length of each vector are contained in the file <file1>. After a test vector is loaded into the scan register <reg1>, it is necessary to apply <number1> cycles of the FCK clock to the circuit under test before the test results are available for shifting out. Note that the test bus must be maintained in the RUN_TEST_IDLE state during the application of the clock FCK.

• FullscanN TDM: This TDM is similar to the Fullscan TDM except that the scan registers are organized into more than one scan chain. All scan chains must be loaded with a new test vector before applying a system clock to capture the test result. The procedure for testing a circuit with full scan structure using multiple scan chains is as follows.
    Procedure for testing a circuit using the FullscanN TDM:
    1. Repeat for t times
       1.1 Load each scan chain with a test vector segment.
       1.2 Update all chains by applying a functional clock.
       1.3 Get a result segment from each scan chain.

Note that steps 1.1 and 1.3 can be executed simultaneously, i.e. a new vector segment can be shifted in while a result segment is shifted out, provided that all the scan chains can be updated simultaneously. This is not possible in the IEEE Std. 1149.1 protocol, where the capture state precedes the shift state. However, as shown in chapter 7, steps 1.1 and 1.3 can be overlapped for some scan chains if there is no data dependency between these chains. An in-depth analysis of this problem is presented in that chapter. As shown in [45], information about the data dependency among the boundary scan chains is needed. If this information is not available, one can separate the operations of steps 1.1 and 1.3 so that no conflict exists. However, in this case the test application time is not necessarily minimal. An example of the FullscanN TDM in CTL is shown below.

    TDM <tdm_id> = FullscanN;
    REG = <reg1>, VECFILE = <file1>, RESFILE = <file2>;
    REG = <reg2>, VECFILE = <file3>, RESFILE = <file4>;
    CLOCK = FCK 1.0 CYCLES_IN RUN_TEST_IDLE;

• BILBO TDM: The circuit under test has been designed using the BILBO methodology and consists of one or more BILBO kernels. For each BILBO kernel, the seed and the correct signature must be provided. In addition, the number of pseudorandom test vectors to be applied is required. A circuit designed using the BILBO TDM is shown in Figure 2.10, where the control graph can also be found. The procedure for testing a circuit using the BILBO TDM is as follows.

    Procedure for testing a circuit using the BILBO TDM:
    1. Load the seeds into the BILBO registers.
    2. Apply a number of cycles of test clocks.
    3. Get the signatures from the BILBO registers.

Note that more than one BILBO kernel can be tested in a single session. An example of the BILBO TDM in CTL is as follows.

    TDM <tdm_id> = BILBO;
    INITIALIZE <reg1> = <value1>, <reg2> = <value2>;
    USE_INSTRUCTION = <ins1>;
    CLOCK = <TCK or FCK> <number1> CYCLES_IN RUN_TEST_IDLE;
    EXPECTED_RESULT <reg3> = <value3>, <reg4> = <value4>;

It is necessary to load both registers <reg1> and <reg2> with the initial values <value1> and <value2>, respectively, at the beginning of the test. The instruction register (IR) of the TAP must be loaded with the instruction <ins1> during the execution of the test. After the execution of the test, the final signatures in the registers <reg3> and <reg4> must be <value3> and <value4>, respectively, for a fault-free circuit.

• RUNBIST TDM: The circuit can be tested using the public instruction RUNBIST defined in the IEEE 1149.1 standard. Once the RUNBIST instruction is loaded into the IR of the TAP, the self-test procedure can be executed by simply applying the test clock TCK during the RUN_TEST_IDLE bus state. The result of the test is stored in a register <reg1>. The <value1> represents the result that should be in the register when the circuit is fault free. An example of the RUNBIST TDM is listed below.

    TDM <tdm_id> = RUNBIST;
    CLOCK = TCK <number> CYCLES_IN RUN_TEST_IDLE;
    EXPECTED_RESULT <reg1> = <value1>;

• INTEST TDM: The circuit is tested using the INTEST instruction defined in the IEEE 1149.1 standard. The instruction INTEST must be loaded into the IR of the TAP before execution starts. This TDM differs from the Fullscan TDM in that only the boundary scan register is included in the scan chain and no internal storage elements are scanned.

    Procedure for testing a circuit using the INTEST TDM:
    1. Repeat for t times
       1.1 Shift a test vector into the boundary scan register.
       1.2 Apply one or more functional clock cycles.
       1.3 Shift the results out of the boundary scan register.

An example of the INTEST TDM is shown below.

    TDM <tdm_id> = INTEST;
    VECFILE = <file1>, RESFILE = <file2>;
    CLOCK = FCK <number1> CYCLES_IN RUN_TEST_IDLE;

User-Defined TDM

In addition to normal C code, some additional C functions are provided for describing user-defined TDMs. The following paragraphs describe these functions.

• ScanIR(outS);
The content of the instruction register (IR) of the TAP can be updated by executing this function. The string outS is loaded into the IR after this function is executed. When the new instruction is scanned in, a string of values called the status is also scanned out. The status is the logic value on the parallel data input to the instruction register before the shifting started. By executing this function, a test controller can control the test process of a chip. The format of the instruction is determined by the chip designer except for those instructions that are predefined by the IEEE Std. 1149.1. The format of the status is also defined by the chip designer.

• ScanDR(outS);
This function is similar to ScanIR except that the selected data register is scanned. The selection of the data register is determined by the current content of the instruction register, which can be altered by the ScanIR function. When a new string of data is scanned in, the resulting string from the selected data register is also scanned out. The result is the logic value on the parallel data input to the data register before the shifting started.

• ApplyClock(tck, n);
This function can be used to provide test clocks to the circuit under test. Both the test clock TCK and the functional clock FCK can be applied using this function.
For example, if the value of tck is 1 (or true), then the test clock TCK is applied for n cycles while the rest of the I/O pins are kept unchanged. If the value of tck is 0 (or false), then the functional clock FCK, which may consist of several phases, is applied for n cycles while the inputs to the rest of the I/O pins are kept unchanged.

• Bring2state(i);
The execution of this function drives the bus to state i, defined in the IEEE Std. 1149.1, regardless of the current state of the bus. It is easy to conclude, from the state transition diagram, that putting a 1 on the TMS line for five consecutive clock cycles will bring the TAP controller into the Reset state. The TAP controller can then be brought to any state i from the Reset state. If more than one state transition path exists, the one that goes through the smallest number of states is taken.

• State2state(i,j);
The execution of this instruction drives the bus from state i to state j. Again, the path taken is the one through the smallest number of states.

• RepeatState(i,n);
The execution of this instruction keeps the bus in state i for n consecutive clock cycles. Note that this instruction can be applied to only some of the states. For the shiftDR and shiftIR states, this instruction can be modified to RepeatState(i, n, Sout, Sin), where Sout is the string sent to the TDO line and Sin is the string received from the TDI line.

• RunTest(n);
The bus is driven into the RUN_TEST_IDLE state and held there for n consecutive test clock cycles. During this period, the on-chip test controller executes the predefined test process for the built-in self-test structures.

4.1.1.1 Formal Definition of the CTL

The CTL is a superset of the BSDL. The formal definition of the BSDL can be found in [55]. The definition for the chip test program is listed below using the YACC [35] input format.
    %%
    ctl: BSDL chip_test_program;
    chip_test_program: _TEST_BEGIN test_procs _TEST_END;
    test_procs: test_proc | test_procs test_proc;
    test_proc: _TDM _INT_NUM _EQ _RUNBIST _SEMICOLON runbist_tdm
        | _TDM _INT_NUM _EQ _INTEST _SEMICOLON intest_tdm
        | _TDM _INT_NUM _EQ _FULLSCAN _SEMICOLON fullscan_tdm
        | _TDM _INT_NUM _EQ _BILBO _SEMICOLON bilbo_tdm
        | _TDM _INT_NUM _EQ _USER_DEFINE _SEMICOLON user_def_proc;
    runbist_tdm: clock_part result_part;
    clock_part: _CLOCK _EQ _TCK nums _CYCLES_IN _RUN_TEST_IDLE _SEMICOLON
        | _CLOCK _EQ _FCK nums _CYCLES_IN _RUN_TEST_IDLE _SEMICOLON
        | _CLOCK _EQ _FCK _SHIFTED
        | _CLOCK _EQ _NONE;
    nums: _FLO_NUM | _INT_NUM;
    result_part: _EXPECTED_RESULT res_lists _SEMICOLON;
    res_lists: res_list | res_lists _COMMA res_list;
    res_list: _IDENTIFIER _EQ _BIN_NUM;
    intest_tdm: _VECFILE _EQ _IDENTIFIER _COMMA _RESFILE _EQ _IDENTIFIER _SEMICOLON clock_part;
    fullscan_tdm: reg_Flists clock_part;
    reg_Flists: reg_Flist | reg_Flists reg_Flist;
    reg_Flist: _REG _EQ _IDENTIFIER _COMMA _VECFILE _EQ _IDENTIFIER _COMMA
        _RESFILE _EQ _IDENTIFIER _SEMICOLON;
    bilbo_tdm: initialize_part use_ins_part clock_part result_part;
    initialize_part: _INITIALIZE ini_lists _SEMICOLON;
    ini_lists: ini_list | ini_lists _COMMA ini_list;
    ini_list: _IDENTIFIER _EQ _BIN_NUM;
    use_ins_part: _USE_INSTRUCTION _EQ _IDENTIFIER _SEMICOLON;
    user_def_proc: _TOP;
    %%

Note that the user-defined TDM is not described, since the statements in this TDM are translated directly by LEX [42]. In addition, the syntax of C is not listed. Interested readers are referred to the source code of the program.
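Although the user-defined TDM is outside the grammar above, the path selection described for Bring2state and State2state can be illustrated independently of the synthesizer. The following Python sketch is not taken from the CTL tools; it assumes the standard IEEE Std. 1149.1 state names (rather than the shortened names used in this chapter) and uses hypothetical helper names (tms_path, bring2state, apply_tms). It finds the shortest TMS sequence between two TAP states with a breadth-first search, and builds a Bring2state sequence as five 1s followed by the shortest path from the Reset state.

```python
from collections import deque

# TAP controller state table of IEEE Std 1149.1:
# state -> (next state when TMS = 0, next state when TMS = 1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tms_path(src, dst):
    """Shortest list of TMS values that drives the TAP from src to dst (State2state)."""
    parent = {src: None}  # state -> (previous state, TMS value used to reach it)
    queue = deque([src])
    while queue:
        state = queue.popleft()
        if state == dst:
            break
        for tms, nxt in enumerate(TAP[state]):
            if nxt not in parent:
                parent[nxt] = (state, tms)
                queue.append(nxt)
    # Walk back from dst to src, collecting the TMS values in reverse order.
    path = []
    state = dst
    while parent[state] is not None:
        state, tms = parent[state]
        path.append(tms)
    return path[::-1]


def bring2state(dst):
    """Five 1s force the Reset state from anywhere; then take the shortest path."""
    return [1] * 5 + tms_path("Test-Logic-Reset", dst)


def apply_tms(state, bits):
    """Follow a TMS sequence from a starting state and return the final state."""
    for bit in bits:
        state = TAP[state][bit]
    return state


if __name__ == "__main__":
    print(tms_path("Test-Logic-Reset", "Shift-DR"))  # [0, 1, 0, 0]
```

A breadth-first search is used here because every transition costs one TCK cycle, so the first path found to the target state is also one with the fewest states. How the actual synthesizer selects the path is not stated in the text.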
4.1.2 MTL - The Module Test Language

The MTL is a high level language that can be used to describe the test aspects of a module. The language has been designed in such a way that little testing expertise is required to use it. A module consists of many testable chips and a test controller, which can access these chips via one or more test busses. The controller can test these chips and the interconnect between them. A typical test control model used in MTL is shown in Figure 4.3, where five chips are organized into two test rings. The test clock TCK, which is connected to every chip, is not shown.

Figure 4.3: Test control model used in MTL.

Each module is associated with an MTL-file written in MTL. This file contains all the information required for testing the module. When an MTL-file and its associated CTL-files are processed by the synthesizer M2C, a module test program is synthesized. This test program is written in C. With the help of a C compiler, the test program can be translated to executable code, which can then be executed by the test controller. The entire module can thus be tested.

The test aspects of a module described by the MTL include the following parts: library_id, device_list, test bus configuration, net_list, and test procedure. The library_id points to the directory containing the CTL-files. The device_list associates every device used in the module with a CTL-file in the library. An error is flagged when a device is associated with a CTL-file that does not exist in the library. The device_list of an example module consisting of two devices is described as follows.

    device_list =
        (Chip1 adder) (Chip2 multiplier);

The test bus configuration describes how the devices on the module are connected via the boundary scan bus to the test channel.
In the boundary scan architecture, the test bus can be configured as a ring, a star or a combination of both. In MTL a test bus is modeled as a multiple ring, which can be mapped into any one of these three configurations. A ring configuration is formed when only a single ring is used. A star configuration is formed when every ring contains only one device. All DUTs in a ring are controlled by the same TMS line. The test bus shown in Figure 4.3 is described as follows.

    test_bus =
        ring 0: chip1 => chip2 => chip3,
        ring 1: chip4 => chip5;

The net_list describes how the devices on a module are interconnected functionally. The number of nets can be large. Each net has two or more terminals. Each terminal is specified by two names: the first gives the device name, while the second specifies the I/O port name or the pin number. A net_list consisting of two nets is described below.

    net_list =
        net 1: (Chip1 inp1) (Chip2 outp1),
        net 2: (Chip2 inp1) (Chip3 inp1) (Chip1 outp1);

The test procedure contains the information for testing a module by an MMC. This information is represented in terms of standard C code and some test-specific statements. These statements assume no knowledge about the test controller and can be translated to low level functions according to the architectural details of the MMC. The low level functions are fully supported by a library of C code containing machine-dependent I/O functions. These I/O functions are used to control the test channel, which is responsible for all the low level activities on the boundary scan test bus.

If every chip used in a module conforms with the IEEE 1149.1 boundary scan architecture, it is possible for a designer to write the test procedure using only the test-specific statements.
In such a situation, a test procedure can be easily written by a designer with little knowledge of testing, thus greatly reducing the program development time. However, if not all chips used in the module conform with the boundary scan architecture, then it is necessary for the designer to write the test procedure using C code in addition to the test-specific statements. In this case some knowledge about testing is required.

The test-specific statements that can be used to describe the test procedure in an MTL-file are listed below.

• Testchip(chip_id);
A test controller, such as an MMC, can fully test a chip by executing this statement. If more than one session is required for testing the chip, the test results are reported only after all sessions are completed.

• Testchip(chip_id) Use TDM(tdm_id);
A test controller can test part of a chip by executing this statement. Usually, a chip may be tested in more than one session. During each session, part of the chip is tested using a specific TDM. To fully test the chip, all test sessions must be executed. When a module under test cannot stay offline long enough to allow it to be fully tested, a partial testing approach is used. Also, a piece of circuit can be tested several times using different test patterns or different seeds. In this approach, the chip is tested in different intervals. One or more test sessions are executed in each interval without exceeding the time limit.

• TestInet();
A test controller can test the interconnect on a module by executing this statement. Every net connecting two boundary scan chips is tested. The test set used in this test is a counting sequence, which can determine whether the entire interconnect is fault free. However, only Go/NoGo information is produced in this test. No diagnostic information is provided.

• DiagnosisInet();
A test controller can test and diagnose the interconnect on a module by executing this statement.
To achieve the maximal diagnostic resolution, the test set is a universal test set, which includes a walking ones sequence, a walking zeroes sequence, and the all-0 and all-1 vectors. As shown in chapter 6, all diagnosable faults can be identified by this test.

• SampleRing(ring_id);
By executing this statement, a test controller can obtain a snapshot of the logic values on the I/O pins of all chips that are connected to the selected ring. The returned value of this function is a string of 1s and 0s representing the current status of these chips.

• ScanDR(ring_id, preDR, postDR, outS, inS);
By executing this statement, a test controller can exchange information with a selected data register in the DUTs on a selected ring specified by the ring_id. The data sent out is the string outS. The data received, which is referred to as the results, is the string inS. It is possible to exchange information with a particular DUT while bypassing all other DUTs in the ring. In this case the number of DUTs to be bypassed must be specified. In Figure 4.4, DR represents the data register of the selected DUT, preDR is the number of bypassed DUTs between the selected DUT and the TDO of the ring, and postDR is the number of bypassed DUTs between the TDI of the ring and the selected DUT. The number of shifts required for transmitting the string outS is calculated automatically once both preDR and postDR are known.

Figure 4.4: Scanning a data register in a ring.

• ScanIR(ring_id, preIR, postIR, outS, inS);
This statement is similar to ScanDR except that the information transmitted is the instructions to the DUTs and the received results are the status of the DUTs. When instructions are sent, all DUTs in the selected scan ring receive a new instruction. For those devices that are not being tested, the received instruction can either be a No-Op or the previous instruction.
• ApplyClock(ring_id, n);
By executing this statement, a test controller can apply the test clock to the DUTs in a selected scan ring for a fixed number of cycles. The bus state of the selected ring is kept in the RunTest state so that the last instruction can be executed. For example, when this statement is executed after the public instruction RUNBIST has been sent, the chips can be tested.

• ChangeClockFreq(n);
By executing this statement, a test controller can change the frequency of the test clock TCK. The default TCK frequency can be divided by a factor specified by n.

• EnableClock(on);
By executing this statement, a test controller can enable or halt the application of the test clock TCK. When this statement is executed with on = 0, the test clock TCK is disabled (or halted); otherwise the test clock is enabled (or running freely). By default the test clock is always enabled.

4.1.2.1 Formal Definition of the MTL Syntax

The formal definition of the module test language MTL is listed below. The language is described in the input format of YACC.

    %%
    mtl: conf_section test_section;
    conf_section: mtl_stmts;
    mtl_stmts: mtl_stmt | mtl_stmts mtl_stmt;
    mtl_stmt: module | lib | device_list | test_bus | net_list;
    module: _MODULE _EQ _IDENTIFIER _SEMICOLON;
    lib: _LIB _EQ _IDENTIFIER _LPR _INT_NUM _RPR _SEMICOLON;
    device_list: _DEVICE_LIST _EQ dev_pairs _SEMICOLON;
    dev_pairs: dev_pair | dev_pairs dev_pair;
    dev_pair: _LPR _IDENTIFIER chip_type _RPR;
    chip_type: _IDENTIFIER;
    test_bus: _TEST_BUS _EQ test_rings _SEMICOLON;
    test_rings: test_ring | test_rings _COMMA test_ring;
    test_ring: _RING ring_id _COLON device_chain;
    ring_id: _INT_NUM;
    device_chain: _IDENTIFIER | device_chain _RARROW _IDENTIFIER;
    net_list: _NET_LIST _EQ nets _SEMICOLON;
    nets: net | nets _COMMA net;
    net: _NET net_id _COLON pins;
    net_id: _INT_NUM;
    pins: pin | pins pin;
    pin: _LPR _IDENTIFIER _IDENTIFIER _RPR;
    test_section: _TEST_BEGIN mtp_stmts _TEST_END;
    mtp_stmts: mtp_stmt | mtp_stmts mtp_stmt;
    mtp_stmt: test_inet | diagnosis_inet | test_chip | reset_ring
        | sample_ring | apply_clock | ch_clock_freq;
    test_inet: _TESTINET _LPR _RPR _SEMICOLON;
    diagnosis_inet: _DIAGNOSISINET _LPR _RPR _SEMICOLON;
    test_chip: _TESTCHIP _LPR _IDENTIFIER _RPR _SEMICOLON
        | _TESTCHIP _LPR _IDENTIFIER _RPR _USE _TDM _INT_NUM _SEMICOLON;
    reset_ring: _RESETRING _LPR int_num _RPR _SEMICOLON;
    sample_ring: _SAMPLERING _LPR int_num _RPR _SEMICOLON;
    int_num: _INT_NUM;
    apply_clock: _APPLY_CLOCK clock_type int_num _CYCLES _SEMICOLON;
    ch_clock_freq: _CHANGE_CLOCK_FREQ _LPR int_num _RPR _SEMICOLON;
    clock_type: _TCK | _FCK;
    %%

Note that the mtp_stmt can also be described in the C language. The formal definition of this part is not listed. An example of an MTL description and some CTL descriptions are given in section 4.3.1.

4.2 Synthesizers

In the current implementation, only the CTL and MTL have been defined. Hence, only the C2C, a program that synthesizes a test program for a chip using CTL, and the M2C, a program that synthesizes a test program for a module using CTL and MTL, are described.

4.2.1 C2C Synthesizer

The C2C synthesizer generates a test program for a chip from the CTL description of the chip, i.e. a CTL-file. The output of the C2C is a program in ANSI C. A C compiler is needed to compile the test program into executable code for the test controller, such as an MMC.
Comparing the models used in the CTL (see Figure 4.2) and the MTL (see Figure 4.3), it is clear that when a module contains only one chip the two models are the same. Therefore the M2C can be used as a C2C. Since the C2C is built as a subset of the M2C, no further description of the C2C is given.

4.2.2 M2C Synthesizer

The M2C generates a test program for a module from the MTL description of that module and the CTL description of each chip on the module. The outputs of the M2C include a test program in ANSI C that can be used to test the module and a file describing the module interconnect. Testing of a module is considered complete when all individual chips on the module and the interconnect between these chips have been tested. As explained before, a C compiler is used to compile the test program down to executable code for the MMC. This process is depicted in Figure 4.5.

Figure 4.5: Generating test programs for a module.

The structure of the M2C is shown in Figure 4.6. The major components of the M2C include a multiple parser module, a template-based TDM module, a user-defined TDM module, an interconnect test and diagnosis module, a shift adjustment module and a test program manufacturing module. These components are described in more detail in the following paragraphs.

The Parsers

Both the MTL and CTL parsers are constructed using the well known compiler construction tools YACC [35] and LEX [42], developed at Bell Laboratories. When the syntax of a language is properly described, these two utilities can generate a parser for the language. Because the generated parsers use global variables, conflicts may arise in a program with more than one parser. This is a major limitation for applications using more than one input language, such as the M2C.
To resolve this problem, a parser management technique is used. The technique takes advantage of makefiles in the UNIX environment. Whenever a parser is generated, the names of its global variables are automatically changed before the compilation begins. Hence no two parsers have the same global variables. Using this technique, more than one parser can exist in the same program.

Figure 4.6: The structure of the M2C.

Template-based TDM Module

Procedures that can generate C programs from a template-based TDM are provided. These procedures are referred to as meta-procedures. Their input is a test procedure written for a template-based TDM, and they generate a C program for executing the test process of the selected TDM. These meta-procedures include callbilbo, callfullscan, callfullscanN, callintest, and callrunbist, which generate programs for the BILBO, Fullscan, FullscanN, INTEST, and RUNBIST TDMs, respectively.

All information required for generating a test program must be provided to these meta-procedures. For example, when using the meta-procedure callbilbo, the following information is required.
chipID: the name of the current chip under test;
chipType: the type name of the current chip under test;
ringID: the test bus ring where the chip under test is located;
pre: the number of cells between the CUT and the controller when shifting data (see Figure 4.4);
post: the number of cells between the controller and the CUT when shifting data;
ipre: the number of cells between the CUT and the controller when shifting instruction;
ipost: the number of cells between the controller and the CUT when shifting instruction;
iniNum: the number of registers that should be initialized;
iniIns: the instructions that are used to select the registers;
iniVal: the initial values to be loaded into these registers;
useIns: the instruction used during the test execution;
ckType: the type of clock used for executing the test;
ckNum: the number of clock cycles to execute the test;
resNum: the number of registers used to store the test results;
resins: the instructions that can be used to select these registers;
expVal: the expected results in these registers when the circuit under test is fault-free.

The values of items such as iniNum, iniIns, iniVal, useIns, ckType, ckNum, resNum, resins, and expVal are directly available from the CTL description. However, the values of other items, such as chipType, ringID, ipre, ipost, pre, and post, are not directly available and can only be obtained by processing the information in the MTL-file and the CTL-files.

User-defined TDM Module

A user-defined TDM is basically a C program plus some test-specific statements. It is necessary to translate these test-specific statements into normal C statements that can be readily executed by a test controller. Due to the differences in the test control model used in CTL (see Figure 4.2) and MTL (see Figure 4.3), the test-specific statements are modified to reflect the change in the control model.
For example, the test-specific statement scanIR(outS) is changed to scanIR(RingID, ipre, ipost, outS) so that the same string of data outS can now be properly sent to the chip under test. The values of ringID, ipre, and ipost are computed in the Shift Adjustment Module.

Shift Adjustment Module

Often it is desirable to send data to and receive data from only a single chip on a scan ring while keeping the other chips in their bypass mode. The model used here is shown in Figure 4.7. The data stored in register TxR is shifted into the chain, and the incoming data is shifted into register RxR. The lengths of both TxR and RxR are assumed to be unlimited since a buffering mechanism is employed to make sure that the TxR will never be empty and that the RxR will never be full.

When exchanging data with a single chip on a scan ring, the number of shifts needs to be adjusted so that (1) data in TxR is sent to the CUT properly, and (2) data in the CUT is received in RxR properly. The calculation of the number of shifts is done as follows. Let the length of DR in C2 be len, the number of chips between C2 and RxR be pre, and the number of chips between TxR and C2 be post.

Figure 4.7: Model for calculating the number of shifts. (Diagram labels: chips C1, C2, C3, C4; DR in C2; RxR and TxR in the test controller.)

For example, in Figure 4.7, pre = 2 and post = 1. Depending on the values of pre and post, three cases are possible. These cases are summarized in Table 4.1.

    cases                           pre > post    pre < post    pre = post
    # of shifts                     len + pre     len + post    len + post
    # TxR leading 0s added          pre - post    0             0
    # TxR trailing 0s added         post          post          post
    # RxR leading bits discarded    pre           pre           pre
    # RxR trailing bits discarded   0             post - pre    0

Table 4.1: The numbers used in the shifting.

Case 1: pre > post

In this case the total number of shifts is len + pre.
It is necessary to add pre - post leading 0s to the output string before loading it into the TxR, such that after the shifting the output string will be properly loaded into the DR. It is also necessary to discard the first pre bits received in the RxR such that the content of the DR is properly received.

Case 2: pre < post

In this case the total number of shifts is len + post. No leading 0s are needed for the output string. However, it is necessary to discard the first pre bits received in the RxR. It is also necessary to discard the last post - pre bits received in the RxR so that the content of the DR is properly received.

Case 3: pre = post

In this case the total number of shifts is len + post. No leading 0s are needed for the output string. However, it is necessary to discard the first pre bits received in the RxR such that the content of the DR is properly received.

Test Program Manufacturing Module

The synthesizer M2C generates C programs from the CTL and MTL descriptions. These C programs are then sent to an IBM AT computer where they are compiled and executed. The IBM AT serves as the host for the test controller in the present implementation of the MMC prototype. The purpose of the Test Program Manufacturing module is to reduce the manual operations in transferring and compiling the test programs, thereby reducing errors in the generation of the test programs. This module can generate a makefile that can manufacture the test program on the IBM AT. In addition, this module also copies all the C programs to a specific subdirectory so that minimal effort is required to transfer test programs from the SUN, which is used to synthesize the test programs, to the IBM AT.

Interconnect Test Module

This module deals with the testing and diagnosis of the module interconnect.
It contains four major parts, namely net_list generation, test generation, test application and results analysis.

In the net_list generation part, the net_list, which is described in the MTL-file, is first read and the data structure established. The drivers and receivers of these nets are all part of the boundary scan registers of the chips. The CTL-files of these chips are then read such that the physical configuration of these boundary scan registers is established. A mapping mechanism is then used to map the terminals of a net to the physical locations in the boundary scan register. The mapping information is then output to a file called infofile.net which is later transferred to the IBM AT. When executing the interconnect testing, this file is first read so that all information required to perform the test can be obtained without the need to consult both the MTL-files and CTL-files.

The syntax of the file infofile.net is as follows. The file consists of a head-line followed by one or more net-lines. The head-line consists of a net number which is the total number of nets in the module, a ring number which is the total number of test bus rings used in the module and one or more ring descriptions. A ring is described by three numbers; the first number is the identification of the ring, the second number is the sum of the IR lengths of all chips on the ring, and the third number is the sum of the lengths of the Boundary Registers of all chips on the ring. A net-line consists of a number which is the identification of the net, the number of drivers of the net followed by a list of drivers, and the number of receivers followed by a list of receivers. A driver is described by a number which identifies the ring on which the driver is located; a number which is the location of the driver in the ring; a flag which indicates whether the driver is 2-state or 3-state; and optional enabling information which is needed only when the driver is 3-state.
The first number of the enabling information represents the location of the control cell, which must be in the same ring as the driver. The second number is the value that should be loaded into the control cell in order to disable the driver. A receiver is described by two numbers; the first number identifies the ring on which the receiver is located and the second number gives its location in the ring.

For example, the file infofile.net of the module shown in Figure 4.3 is as follows.

    6 2 1 14 24 0 6 6
    1 1 1 19 0 1 1 23
    2 1 1 22 0 1 1 14
    3 1 1 21 0 1 1 20
    4 1 1 18 0 1 1 15
    5 1 1 7 1 16 1 1 0 5
    6 1 1 6 1 16 1 1 0 2

The first line indicates that there are six nets and two rings in the module. Ring 1 contains fourteen instruction register cells and twenty-four boundary scan cells. Ring 0 contains six instruction register cells and six boundary scan cells. The data on the second line indicates that net 1 (which is not shown) has one 2-state output driver which is located in ring 1 at the nineteenth location. Also there is one receiver on the net. The receiver is located on ring 1 at location 23. The data on the other lines can be interpreted similarly.

In the test generation part, a test set that can identify all diagnosable faults is generated. This test set contains a walking ones and a walking zeros sequence. Detailed analysis of fault models and the theorems and algorithms for generating the test set are presented in chapter 6.

In the test application part, a test schedule that can be used to apply all test vectors and collect the test results is produced. Depending on the number of boundary scan rings used, and the connectivity among these rings, the application sequence of the test vectors can be properly determined in order to achieve the minimal test application time. Detailed analysis of this problem along with some theorems and algorithms that lead to the generation of minimal time schedules are presented in chapter 7.
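The infofile.net layout described earlier in this section (a head-line with ring descriptions, then one net-line per net) can be read with a short routine. The following is only a minimal sketch, not the inetpc reader itself: the type names, the limits MAX_RINGS, MAX_NETS and MAX_TERMS, and the function parse_netinfo are hypothetical, and the convention that a driver flag of 1 means 3-state is inferred from the example file, where only drivers with flag 1 carry the two extra enabling numbers.

```c
#include <stdlib.h>

#define MAX_RINGS 4    /* illustrative limits, not taken from the source */
#define MAX_NETS  64
#define MAX_TERMS 4

typedef struct { int id, ir_len, br_len; } Ring;
typedef struct { int ring, loc, three_state, ctrl_loc, disable_val; } Driver;
typedef struct { int ring, loc; } Receiver;
typedef struct {
    int id, n_drv, n_rcv;
    Driver drv[MAX_TERMS];
    Receiver rcv[MAX_TERMS];
} Net;
typedef struct {
    int n_nets, n_rings;
    Ring ring[MAX_RINGS];
    Net net[MAX_NETS];
} NetInfo;

/* Read the next whitespace-separated integer; return 1 on success. */
static int next_int(const char **p, int *out)
{
    char *end;
    long v = strtol(*p, &end, 10);
    if (end == *p)
        return 0;
    *out = (int)v;
    *p = end;
    return 1;
}

/* Parse the text of an infofile.net; return 0 on success, -1 on error. */
int parse_netinfo(const char *text, NetInfo *info)
{
    const char *p = text;
    int i, j;

    /* head-line: net count, ring count, then one triple per ring */
    if (!next_int(&p, &info->n_nets) || !next_int(&p, &info->n_rings))
        return -1;
    if (info->n_nets < 1 || info->n_nets > MAX_NETS ||
        info->n_rings < 1 || info->n_rings > MAX_RINGS)
        return -1;
    for (i = 0; i < info->n_rings; i++) {
        Ring *r = &info->ring[i];
        if (!next_int(&p, &r->id) || !next_int(&p, &r->ir_len) ||
            !next_int(&p, &r->br_len))
            return -1;
    }

    /* net-lines: id, drivers, receivers */
    for (i = 0; i < info->n_nets; i++) {
        Net *n = &info->net[i];
        if (!next_int(&p, &n->id) || !next_int(&p, &n->n_drv) ||
            n->n_drv < 0 || n->n_drv > MAX_TERMS)
            return -1;
        for (j = 0; j < n->n_drv; j++) {
            Driver *d = &n->drv[j];
            d->ctrl_loc = -1;       /* -1 marks "no enabling information" */
            d->disable_val = -1;
            if (!next_int(&p, &d->ring) || !next_int(&p, &d->loc) ||
                !next_int(&p, &d->three_state))
                return -1;
            /* assumed: flag 1 = 3-state, followed by control cell and disable value */
            if (d->three_state &&
                (!next_int(&p, &d->ctrl_loc) || !next_int(&p, &d->disable_val)))
                return -1;
        }
        if (!next_int(&p, &n->n_rcv) || n->n_rcv < 0 || n->n_rcv > MAX_TERMS)
            return -1;
        for (j = 0; j < n->n_rcv; j++) {
            Receiver *r = &n->rcv[j];
            if (!next_int(&p, &r->ring) || !next_int(&p, &r->loc))
                return -1;
        }
    }
    return 0;
}
```

Because the routine reads a stream of integers and ignores line breaks, it accepts the file whether or not each net occupies its own line; a stricter reader could check line boundaries as well.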
In the results analysis part, the collected test results are compared with the test vectors and then analyzed. Based on this analysis, the faults in the module interconnects are identified. The analysis is closely related to the test generation techniques. In fact, both parts are treated in chapter 6.

Device Driver

The device driver consists of a set of C functions that control the operation of the test channel. Two functions that are used in the MMC prototype developed at USC [44] are shown. A function that loads a two-byte value (outword) into a register of the test channel, addressed by portid, is implemented as the writeReg function in Microsoft C as follows:

    void writeReg(portid, outword)
    unsigned int portid, outword;
    {
        int i;
        outp(0x30f, outword / 256);    /* high byte */
        outp(portid, outword % 256);   /* low byte */
        for (i=0; i<30; i++) ;         /* tc synchronization */
    }

Another function (readReg) that reads a two-byte value from a register of the test channel is implemented as:

    unsigned int readReg(portid)
    unsigned int portid;
    {
        unsigned int lowByte, highByte;
        lowByte = inp(portid);
        highByte = inp(0x30f);
        return(lowByte + highByte*256);
    }

These functions are hardware-dependent in that the I/O address portid is determined by the physical implementation of the test controller. For the MMC prototype built at USC [44], the valid addresses for portid range from 300 to 30f (hex). These functions are also C compiler-dependent. For example, if the Turbo C compiler is used, then the function outp used in writeReg should be replaced by outportb. Similarly, the function inp used in readReg should be replaced by inportb.

Other functions that are related to the device driver may also be required. For example, if the communication between the test channel and the processor of the controller is done through an interrupt mechanism, then interrupt service functions are needed.
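One way to confine the compiler dependence noted above is to route all port accesses through two function pointers, so that only the pointer assignments change between compilers. The sketch below illustrates this under stated assumptions: the names PortOut, PortIn, port_out, port_in, sim_out, sim_in and sim_ports are hypothetical, the simulated port array stands in for real hardware so the byte-splitting logic can be exercised on a development host, and the synchronization delay of the original writeReg is omitted because the simulation has nothing to wait for.

```c
/* Hypothetical port-access hooks; on the target these would point to thin
 * wrappers around the compiler's own routines (outp/inp or outportb/inportb). */
typedef void (*PortOut)(unsigned port, int value);
typedef int  (*PortIn)(unsigned port);

#define HIGH_BYTE_PORT 0x30f   /* address used for the high byte in the text */

/* Simulated I/O space (an assumption, for off-target testing only). */
unsigned char sim_ports[0x400];

void sim_out(unsigned port, int value)
{
    sim_ports[port & 0x3ff] = (unsigned char)value;
}

int sim_in(unsigned port)
{
    return sim_ports[port & 0x3ff];
}

PortOut port_out = sim_out;
PortIn  port_in  = sim_in;

/* Load a two-byte value into the test channel register at portid. */
void writeReg(unsigned portid, unsigned outword)
{
    port_out(HIGH_BYTE_PORT, (int)((outword >> 8) & 0xff));  /* high byte */
    port_out(portid, (int)(outword & 0xff));                 /* low byte  */
}

/* Read a two-byte value from the test channel register at portid. */
unsigned readReg(unsigned portid)
{
    unsigned lowByte  = (unsigned)port_in(portid) & 0xff;
    unsigned highByte = (unsigned)port_in(HIGH_BYTE_PORT) & 0xff;
    return lowByte + highByte * 256u;
}
```

With this arrangement the rest of the test program calls only writeReg and readReg, which matches the intent that everything outside the device driver stays portable.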
One of the major advantages of the MMC is that all test programs run by it can be written in portable code except the functions included in the device driver as shown above.

4.3 An Example

In the BOLD system, the process of testing a module consists of the following steps:

1. Prepare the input files. These files include an MTL-file and many CTL-files.
2. Synthesize the test program by running M2C. For convenience, the program source files are copied into a directory.
3. Transfer all files in the directory to the IBM AT computer, which serves as the host of the MMC prototype.
4. Manufacture the executable code using the MAKE utility.
5. Execute the test program on the MMC.

The process is demonstrated using the following simple example. The module consists of only three chips. The first chip, appl, was developed at USC [44]. The second chip, TI8374 [55], is a product of Texas Instruments. The third chip, bilbo1, is hypothetical.

To synthesize test programs for this module, four input files are required, namely ex1.mtl, appl.ctl, ti8374.ctl and bilbo1.ctl. These files are described below.

4.3.1 An MTL-file

The MTL description (ex1.mtl) of the module is shown below.

    MODULE = ex1;
    LIB = lib1(3);
    DEVICE_LIST = (Chip1 appl) (Chip2 TI8374) (Chip3 bilbo1);
    TEST_BUS = RING 0: Chip1,
               RING 1: Chip2 => Chip3;
    NET_LIST = NET 1: (Chip1 DIN) (Chip1 SUM),
               NET 2: (Chip1 CO) (Chip2 D2),
               NET 3: (Chip3 I1) (Chip2 Q2);
    MODULE_TEST
    #include <stdio.h>
    #include <string.h>
    #include "comp.h"
    main()
    {
        testinet();         /* test the entire interconnect network */
        testchip(Chip1);    /* test the entire chip */
        testchip(Chip2);
        testchip(Chip3);
    }
    END_TEST

The test bus is organized into two rings. Only one chip is located on ring 0. The other two chips are located on ring 1. To simplify the problem, only three nets are shown in the example.
The test procedure for the module consists of four statements, which are self-explanatory.

4.3.2 CTL-files

The CTL description of the chip appl is shown below. A detailed description of this chip can be found in [44].

    -- CTL description of the appl chip
    entity appl is
      generic (PHYSICAL_PIN_MAP : string := "DW_PACKAGE");
      port (RESET, DIN: in bit; CO, SUM: out bit; VCC, GND: linkage bit);
      use STD_1149_1_1990.all; -- 1149.1-1990 attributes and definitions
      attribute PIN_MAP of appl : entity is PHYSICAL_PIN_MAP;
      constant DW_PACKAGE : PIN_MAP_STRING := "RESET:48, DIN:8, CO:5, SUM:3, " &
          "TDI:45, TDO:44, TMS:46, TCK:52, TRST:47, VCC:25, GND:49";
      attribute TAP_SCAN_IN of TDI : signal is true;
      attribute TAP_SCAN_MODE of TMS : signal is true;
      attribute TAP_SCAN_OUT of TDO : signal is true;
      attribute TAP_SCAN_CLOCK of TCK : signal is (2.0e6, BOTH);
      attribute INSTRUCTION_LENGTH of appl : entity is 3;
      attribute INSTRUCTION_OPCODE of appl : entity is
          "BYPASS (001, 101, 011, 111)," &
          "EXTEST (000)," &
          "INTEST (100)," &
          "SAMPLE (010)," &
          "SCANFB (110)";
      attribute INSTRUCTION_CAPTURE of appl : entity is "101";
      -- attribute INSTRUCTION_DISABLE of appl : entity is "TRIBPY";
      attribute REGISTER_ACCESS of appl : entity is
          -- implicit "BOUNDARY (EXTEST, INTEST, SAMPLE)," &
          -- implicit "BYPASS (BYPASS)," &
          "FBR[2] (SCANFB)"; -- 2-bit FeedBack Register
      attribute BOUNDARY_CELLS of appl : entity is "BC_2";
      attribute BOUNDARY_LENGTH of appl : entity is 3;
      attribute BOUNDARY_REGISTER of appl : entity is
          -- num cell port function safe [ccell disval rslt]
          "2 (BC_2, DIN, input, X)," &
          "1 (BC_2, SUM, output2, X)," &
          "0 (BC_2, CO, output2, X)";
      attribute TEST_PROC of appl : entity is
          "Test_Begin" &
          "TDM 0 = FULLSCAN;" &
          "REG=FBR, VECFILE=ap1_in2, RESFILE=ap1_out2;" &
          "REG=BOUNDARY, VECFILE=ap1_in1, RESFILE=ap1_out1;" &
          "CLOCK = FCK 1.0 CYCLES_IN RUN_TEST_IDLE;" &
          "Test_End ";
    end appl;

The BSDL description of the chip TI8374 can be found in [55]. Hence only the test procedure part of the CTL-file is shown in the following:

    Test_Begin
    TDM 0 = USER_DEFINE;
    #include <stdio.h>
    #include <string.h>
    #define IR 1
    #define DR 0
    top()
    {
        char outs[20], ins[20];
        char *p1, *p2;
        sprintf(outs, "00000011");              /* INTEST */
        scanIR(outs);
        sprintf(outs, "000000001010101000");
        scanDR(outs);                           /* load D=10101010, clk=0 */
        sprintf(outs, "000000001010101001");
        scanDR(outs);                           /* load D=10101010, clk=1 */
        sprintf(outs, "000000000101010100");
        strcpy(ins, scanDR(outs));              /* load D=01010101, clk=0 and get previous result */
        p1 = ins+10; p2 = outs+2;
        if (strncmp(p1, p2, 8) != 0) {
            printf("error in 10101010 test.\n"); exit(1);
        }
        sprintf(outs, "000000000101010101");
        strcpy(ins, scanDR(outs));
        strcpy(ins, scanDR(outs));
        if (strncmp(p1, p2, 8) != 0) {
            printf("error in 01010101 test.\n"); exit(1);
        }
        else {
            printf("TI8374 is tested OK.\n");
        }
    }
    Test_End

For the chip bilbo1, only the test procedure part is shown below.

    Test_Begin
    TDM 0 = BILBO;
    INITIALIZE FBR=11B, BOUNDARY=100B;
    USE_INSTRUCTION = BYPASS;
    CLOCK = TCK 3000 CYCLES_IN RUN_TEST_IDLE;
    EXPECTED_RESULT FBR=11B, BOUNDARY=001B;
    Test_End

4.3.3 The Interconnect Information

The net list information that will be sent to the MMC is as follows:

    3 2 0 3 3 1 11 21
    1 1 0 1 0 1 0 2
    2 1 0 0 0 1 1 17
    3 1 1 9 1 19 1 1 1 2

4.3.4 Synthesized Test Programs

The synthesized test program for the module ex1 consists of seven files, namely comp.h, ex1main.c, TI8374tops.c, driver.c, inetpc.c, template.c and infofile.net.
The file comp.h, shown below, defines all variables used in the main program.

    #define BUFSIZE 1024
    char chipID[30];
    char chipType[30];
    char regins[BUFSIZ];
    char vecFID[BUFSIZ];
    char expFID[BUFSIZ];
    char iniIns[BUFSIZ];
    char iniVal[BUFSIZ];
    char useIns[BUFSIZ];
    char resins[BUFSIZ];
    char expVal[BUFSIZ];
    #define IR 1
    #define DR 0

The file ex1main.c contains the main function that is needed for any program. The prefix ex1 is extracted from the name of the MTL file ex1.mtl. Four function calls are contained in the main program. The first one, inetpc, is used to test the interconnect. The second one, fullscanN, is used to test the chip Chip1 by using the fullscan TDM. The third one, TI8374top0, is used to test the chip Chip2 by using a user-defined procedure that has the name top0. The prefix TI8374 is extracted from the name of the CTL-file TI8374.ctl. The last one, bilbo, is used to test the chip Chip3 by using the BILBO TDM. The file ex1main.c is as follows:

    #include <stdio.h>
    #include <string.h>
    #include "comp.h"
    main()
    {
        inetpc(1);    /* interconnect test */
        sprintf(chipID, "Chip1");
        sprintf(chipType, "appl");
        sprintf(regins, "110000");
        sprintf(vecFID, "AP1_IN2, AP1_IN1");
        sprintf(expFID, "AP1_OUT2,AP1_OUT1");
        fullscanN(chipID, chipType, 0, 0, 0, 0, 0, 2, regins, vecFID, expFID);
        TI8374top0(1, 3, 0, 1, 0);
        sprintf(chipID, "Chip3");
        sprintf(chipType, "bilbo1");
        sprintf(iniIns, "110000");
        sprintf(iniVal, "11100");
        sprintf(useIns, "001");
        sprintf(resins, "110000");
        sprintf(expVal, "11001");
        bilbo(chipID, chipType, 1, 0, 8, 0, 1, 2, iniIns, iniVal, useIns,
              1, 30000, 1.000000, 2, resins, expVal);
    }

The file TI8374tops.c is a translated version of the user-defined procedure for testing Chip2, which is of the type TI8374. Note that the statement scanIR(outs); in the file TI8374.ctl has been translated into scan(TI8374rid, IR, TI8374ipre, TI8374ipost, outs);. The added information is based on the physical organization of the test bus.

    #include <stdio.h>
    #include <string.h>
    #define IR 1
    #define DR 0
    TI8374top0(TI8374rid, TI8374ipre, TI8374ipost, TI8374pre, TI8374post)
    int TI8374rid, TI8374ipre, TI8374ipost, TI8374pre, TI8374post;
    {
        char outs[20], ins[20];
        char *p1, *p2;
        sprintf(outs, "00000011");
        scan(TI8374rid, IR, TI8374ipre, TI8374ipost, outs);
        sprintf(outs, "000000001010101000");
        scan(TI8374rid, DR, TI8374pre, TI8374post, outs);
        sprintf(outs, "000000001010101001");
        scan(TI8374rid, DR, TI8374pre, TI8374post, outs);
        sprintf(outs, "000000000101010100");
        strcpy(ins, scan(TI8374rid, DR, TI8374pre, TI8374post, outs));
        p1 = ins+10; p2 = outs+2;
        if (strncmp(p1, p2, 8) != 0) {
            printf("error in 10101010 test\n"); exit(1);
        }
        sprintf(outs, "000000000101010101");
        strcpy(ins, scan(TI8374rid, DR, TI8374pre, TI8374post, outs));
        strcpy(ins, scan(TI8374rid, DR, TI8374pre, TI8374post, outs));
        if (strncmp(p1, p2, 8) != 0) {
            printf("error in 01010101 test\n"); exit(1);
        }
        else {
            printf("TI8374 is tested OK.\n");
        }
    }

The files driver.c, inetpc.c and template.c are not shown here. The file driver.c contains all the functions that can be used to control a device. The file inetpc.c contains the interconnect test program and template.c contains all the template-based TDM functions. By reading the file infofile.net, the inetpc procedure can be used to test different interconnect networks. These three files remain unchanged even when the MTL-file or the CTL-files are changed.
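The pre and post arguments passed to scan in the listing above drive the shift bookkeeping of Table 4.1. As a minimal sketch of that bookkeeping, and not the Shift Adjustment Module itself, the following computes the five quantities of the table and pads an output string accordingly. The names ShiftPlan, plan_shift and pad_tx are hypothetical, and the sketch assumes that the first character of the string is the first bit shifted out of TxR.

```c
#include <string.h>

/* The five quantities of Table 4.1 for one exchange with a single chip. */
typedef struct {
    int shifts;               /* total number of shifts          */
    int tx_leading_zeros;     /* 0s added in front of the data   */
    int tx_trailing_zeros;    /* 0s added after the data         */
    int rx_leading_discard;   /* first received bits to discard  */
    int rx_trailing_discard;  /* last received bits to discard   */
} ShiftPlan;

/* len: length of the target register; pre, post: as defined for Figure 4.7. */
ShiftPlan plan_shift(int len, int pre, int post)
{
    ShiftPlan p;
    p.shifts              = len + (pre > post ? pre : post);
    p.tx_leading_zeros    = pre > post ? pre - post : 0;
    p.tx_trailing_zeros   = post;
    p.rx_leading_discard  = pre;
    p.rx_trailing_discard = post > pre ? post - pre : 0;
    return p;
}

/* Build the padded TxR string for data; return 0 on success, -1 on error. */
int pad_tx(const char *data, int pre, int post, char *out, size_t out_size)
{
    size_t len = strlen(data);
    ShiftPlan p;

    if (pre < 0 || post < 0)
        return -1;
    p = plan_shift((int)len, pre, post);
    if ((size_t)p.shifts + 1 > out_size)
        return -1;            /* not enough room for the padded string */
    memset(out, '0', (size_t)p.tx_leading_zeros);
    memcpy(out + p.tx_leading_zeros, data, len);
    memset(out + p.tx_leading_zeros + len, '0', (size_t)p.tx_trailing_zeros);
    out[p.shifts] = '\0';     /* padded length always equals the shift count */
    return 0;
}
```

In all three cases of the table the padded TxR string has exactly as many bits as there are shifts, and the bits kept from RxR (after the leading and trailing discards) number len, which is a convenient consistency check on the table.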
4.3.5 Activities between Processor and Test Channel

The test channel is controlled via processor read/write operations. For example, the function scan(0,0,0,0,010); represents the operation of sending a string of 010 to the data register of a chip which is the only chip located on ring 0. This function is further translated into the following sequence of read/write operations. Once the test channel is properly initialized, it can be started and it carries out the required operation. When the operation is finished, the results are read.

    writeReg(776, 0);       /* disable FEN */
    writeReg(768, 32);      /* load CR with binary 10000 */
    writeReg(769, 14);      /* load CNR with 14 */
    writeReg(772, 1);       /* load TC with 1 */
    writeReg(773, 16384);   /* load TxR */
    writeReg(777, 0);       /* clear SR */
    writeReg(774, 0);       /* clear RxR */
    writeReg(776, 1);       /* enable FEN */
    writeReg(776, 0);       /* disable FEN */
    writeReg(777, 0);       /* clear SR */
    readReg(774);           /* read RxR */

The entire process of testing the module described by ex1.mtl contains 1059 read/write operations. It is clear that without the synthesis of test programs, it would take a tremendous effort to write the test program for even a simple example.

4.3.6 Activities on the Test Bus

The testing of a chip is realized by controlling the data on the test bus, which consists of four lines, namely TCK, TMS, TDI and TDO. Using the same example, the process of sending instruction 010 to a chip (appl) located in ring 0 requires the following binary sequences.

    TCK 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    TMS 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0
    TDO 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0

The entire program for testing the module described in ex1.mtl executes for about 1000 TCK clock cycles. This kind of test program is hard to write and almost impossible for a user to comprehend.
Consequently, the cost associated with the development and maintenance of such a program is very high. This is also the case when testing a chip using automatic test equipment (ATE) that is not specifically designed with boundary scan control facilities. The ATE can control and observe a number of I/O pins in parallel in the form of binary values (or timing waveforms). In this case, a test engineer has to deal with the testing of a chip at the binary level. The superiority of the synthesis approach used in BOLD is clearly demonstrated in this example.

4.4 Results

Test programs for several modules have been synthesized. The results are presented in Table 4.2. To simplify the comparison, only two types of chips are used in the modules. Each chip is tested via either the BILBO TDM or the LSSD TDM. A BILBO chip represents a chip that is tested via a BILBO TDM, while an LSSD chip represents a chip that is tested via an LSSD TDM. For example, bilbo1 is a BILBO chip and appl is an LSSD chip. The TDM for bilbo1 requires the application of 3,000 test patterns when executing the test. On the other hand, the TDM for appl requires the application of only 8 test vectors via two scan chains, which have a total length of 5. The test time required for testing interconnects is based on the application of a counting sequence. The data stored in the memory of the controller includes the instructions, the seeds and signatures, and the test vectors and correct results for each chip.
    mod id   # ring   # BILBO   # LSSD   time to      program   # net   # rw ops   test time
                      chips     chips    synthesize   size
    mod0     1        0         2        0.3          57,344    2       1,518      621
    mod00    2        0         2        0.3          57,344    2       1,518      677
    mod1     1        1         1        0.4          57,344    2       858        3,333
    mod10    2        1         1        0.4          57,344    2       913        3,023
    mod2     1        10        1        0.7          73,728    21      1,935      31,800
    mod20    2        10        1        0.6          73,728    21      2,056      31,490
    mod3     1        1         10       0.6          73,728    6       7,288      9,126
    mod30    2        1         10       0.6          73,728    6       7,329      6,126
    mod4     1        10        10       0.9          73,728    30      8,391      36,710
    mod40    2        10        10       0.8          73,728    30      8,452      30,230

Table 4.2: Synthesis results for some modules.

The synthesis is done using a Sun 4/60 workstation. The time to synthesize the test program is shown in seconds. For a module containing twenty chips, the synthesis is completed in less than 1 second. The program size is shown in bytes. Due to the characteristics of the machine, the size of an executable on a Sun 4 is a multiple of 1024 bytes. This explains why the modules mod2, mod20, mod3, mod30, mod4 and mod40 all have the same program size. The column # rw ops represents the number of accesses from the processor to the test channel chip. Note that this number is not a good indication of test time. This is due to the inclusion of BILBO chips that have a long test time while requiring only a small number of read/write operations from/to the test channel to set up the test. The test time is the number of test clock cycles required to complete the test. The listed figures reflect the case where the test bus never enters the pause state. The pause state may be entered when the test controller cannot supply data to the test bus fast enough.

The size of the test vector files for these examples is very small; therefore their impact on the total storage size of the program is negligible. The number of nets is also small since both appl and bilbo1 have very few I/O pins. In general the cases where two test rings are used have shorter test times when compared to those that have only one ring.
The impact on the test time can be much more significant if an example that has a large number of test vectors is examined. One exception shown in the table is that the test time required for mod0 (one ring) is less than that of mod00 (two rings). The reason is as follows. Both modules contain two chips that are identical. These chips are tested using the same number of test vectors and the same length for each vector. If they are on the same ring (as in the case of mod0), fewer redundant bus states are traversed during the test. On the other hand, if each chip is located on a separate ring controlled by the same test channel, then only one ring can be tested at a time (this is a limitation of the MMC). Therefore, more redundant bus state transitions are traversed. This explains why the test time for the two-ring case is longer than that of the one-ring case.

If two test rings are employed in the module under test, then it is beneficial to arrange all LSSD chips on one ring and all BILBO chips on the other. In this way, the BILBO chips can be initialized, and while the self-testing is in progress the LSSD chips are tested. This scheme can greatly reduce the test time. However, it can only be applied if the BILBO chips are designed with an autonomous on-chip controller.

Chapter 5

Global Controller Minimization Using Test Program Synthesis

Testable chips built with the boundary scan architecture are used in the design of a self-testable module. A module test controller controls the testing of a chip through an on-chip test controller. These controllers are connected via a test bus. The module test time is affected by the overall complexity of these controllers and the configuration of the test busses.

The design of test controllers has been previously addressed. In chapter 3 the design of a universal module test controller (MMC) was presented.
The MMC controls the test process of chips which have the boundary scan architecture via one or more test channels. Each test channel controls a test bus, which can be organized in one or two test rings. Data communication can only occur in one ring at a time. A test bus consisting of two rings is shown in Figure 4.3. In chapter 2 the design of an on-chip controller for a circuit with various test structures, referred to as Testable Design Methodologies (TDMs), was presented.

Depending on the assistance from the bus during test execution, on-chip controllers can be designed in two different styles, namely autonomous and bus-dependent. An autonomous controller can execute the test without any assistance from the test bus, once it is properly initialized. Chips with such controllers can be tested concurrently using only one test ring. A bus-dependent controller requires the assistance of the test bus during the entire test execution process. Thus if only one test channel is used, two chips with such controllers must be tested in sequence even if they are on different test rings.

In a module, the more chips designed with autonomous controllers, the shorter the module test time. This is achieved at the expense of increased overall controller complexity since autonomous controllers usually have a higher complexity. An important objective for the module test design is to minimize the overall controller complexity while keeping the module test time within reasonable bounds. To achieve this it is crucial to quantify both the controller complexity and the test time for a module. The complexity of an on-chip controller can be easily calculated once its design is known. Since the complexity of an MMC is fixed, the overall controller complexity can be easily computed. On the other hand, the module test time cannot be directly obtained. This is true even when the time required to test each individual chip is known.
Worse still, a chip's test time is not always available. For example, if a chip is designed with a user-defined TDM (defined in chapter 4), it is usually not clear how to estimate its test time.

The test program synthesis technique provides a way to calculate both the module and chip test times. This is done by examining the number of test clock cycles required in their corresponding test programs. In this chapter two approaches for controller minimization, referred to as tradeoff curve-based minimization and algorithm-based minimization, are presented. These two approaches can determine not only the complexity of each on-chip controller but also the test bus configuration. The derived results guarantee that a module can be tested within a given time while the overall complexity of the controllers is minimal.

While both approaches require the synthesis of test programs, they differ in the way the module test time is obtained. In the first approach, the module test time is calculated only after the module test programs are synthesized. In the second approach, the test time for each individual chip is first calculated from the synthesis of the chip test programs. The module test time is then estimated based on the chip test times and the module test configuration.

5.1 Tradeoff Curve-Based Minimization

This approach requires the plotting of a curve that relates the module test time to the overall controller complexity. For each design point on the curve, the module test time is calculated after the module test programs have been synthesized. The complexity for each design can be calculated once the designs of the controllers are known. Using such a curve, a design point can be easily selected such that, within a time bound, the overall controller complexity is minimal.

This approach is illustrated by the example circuit in Figure 5.1, which has only one LSSD kernel.
The kernel includes a 16-bit input register, a 16-bit output register and a combinational circuit to be tested. The test vectors are generated using a test pattern generator (TPG) and results are compacted using a signature analyzer (SA). It is also assumed that the fault simulation task has been previously performed and that 1000 test vectors are needed to achieve the required fault coverage. The test control facilities for this kernel include: two 16-bit LFSRs (one used as a TPG and the other as an SA); a 10-bit counter TC used to count the number of test vectors applied; a 4-bit counter SC1 used to count the number of shift operations; a 4-bit register SC2 for holding the initial value for SC1; a 3-bit finite state machine FSM to control the LSSD test procedure; two 16-bit data words, one being the seed of the TPG, the other the correct signature of the SA; and a 16-bit comparator. The hardware overhead for each control facility is estimated based on the following assumption: TPG: 32 units, SA: 32 units, TC: 20 units, SC1: 8 units, SC2: 8 units, finite state machine: 6 units, seed and signature storage: 32 units, comparator: 16 units. The total complexity of the test control hardware is 152 units.

These test facilities are provided by the MMC and the on-chip controller. The facilities on the left-hand side of any one of the four partition lines shown in Figure 5.1 can be provided by the MMC, while those on the right-hand side are provided by the on-chip controllers. Since the design of an MMC includes all the possible test facilities, the complexity of an MMC is fixed. However, the complexity of the on-chip test controller varies as the partition line changes.

Figure 5.1: Possible partitions of test resources. (Partition lines 1 to 4 divide the facilities Go/No Go, Comparator (16), Correct Signature (16), Seed (16), SC2 (4), TC (10), FSM (3), SA (16), R1 (16) and the kernel. Costs: 48, 40, 64; total cost: 152.)
Let D1, D2, D3 and D4 be the four design styles representing the circuit on the right-hand side of each of the four partition lines. For example, D4 contains the chip to be tested and no test facilities. The complexity of the on-chip controller for D1 is the highest among the four, while that for D4 is the lowest. In addition, the controllers for D1 and D2 are autonomous, while the controllers for D3 and D4 are bus-dependent.

The module under consideration contains two chips, Chip1 and Chip2. Each of these chips can be designed using one of the four design styles. Therefore, the module can be designed in 10 different ways, namely d11, d12, d13, d14, d22, d23, d24, d33, d34, and d44, where dij represents the case when one chip has the design style Di and the other Dj. The objective is to find the design style for both chips such that the overall controller complexity is minimal and the test time is less than or equal to a given bound.

Test time versus controller complexity

Using BOLD, the required inputs are a CTL-file for each chip and an MTL-file for the module. These files can be easily prepared. The test programs are then automatically synthesized and the test time is calculated. Figure 5.2 shows the relationship between the test time and the controller complexity. In the figure there are four plots identified by the design style for Chip1. For example, in plot 1, denoted by “Chip1 uses D4”, the data points illustrate the cases where Chip1 has the design style D4 while Chip2 can take any of the four design styles (D1, D2, D3 and D4). The horizontal axis (controller complexity) represents the overall controller complexity of the module. The complexities of the bus interface (the Test Access Port) and the MMC are not included since they are fixed. The vertical axis represents the total test time for the module. Note that the interconnect test time is not included since it is fixed if the number of test rings remains unchanged.
The data points ‘+’ represent the cases where both Chip1 and Chip2 are located on the same test ring. The data points ‘o’ connected by the dashed line segments represent the cases where Chip1 and Chip2 are located on different test rings. In general the test time decreases as the complexity of the on-chip controller increases. Also, the number of test rings may affect the test time. For example, in plot 1, the test time is affected by the number of test rings only when Chip2 has the design style D1 or D2. In both cases, Chip2 has an autonomous controller. Once Chip2 is properly initiated, it can be tested concurrently with Chip1 since the two chips are located on different test rings. On the other hand, the test time is not affected by the number of test rings when Chip2 has the design style D3 or D4. In this case the two chips are tested in sequence since they both require the assistance of the test bus in executing the test. So the test time remains the same even when the two chips are located on different test rings. The results shown in plot 2 are similar to those of plot 1 since Chip1 also has a bus-dependent on-chip controller.

Figure 5.2: Test time versus controller complexity. (Four panels: 1. Chip1 uses D4; 2. Chip1 uses D3; 3. Chip1 uses D2; 4. Chip1 uses D1. Horizontal axis: complexity; vertical axis: test time, x10^4.)

In both plots 3 and 4, Chip1 has an autonomous on-chip controller. The number of test rings affects the test time only if Chip2 has the design style D3 or D4. If two test rings are used, Chip1 and Chip2 can be tested in parallel. On the other hand, these two chips must be tested in sequence if only one test ring is used.

Tradeoff Curve

In Figure 5.3 the test time and controller complexity of all the 10 possible design combinations for Chip1 and Chip2 are shown; the combinations are unordered, that is, dij = dji.
The horizontal axis represents the overall controller complexity of the module (in units). The vertical axis represents the total test time for the module in test clock cycles. The data points ‘+’ represent the cases where both chips are located on the same test ring. The data points ‘o’ along with the dashed line segments represent the cases where the two chips are located on different test rings. A curve consisting of these data points can be used to select an optimal design point for given constraints on both test time and overall controller complexity.

Figure 5.3: Tradeoff curve: Test time versus controller complexity. (Horizontal axis: controller complexity; vertical axis: test time, x10^4. One curve is labelled “single test ring”, the other “two test rings”.)

As shown in the figure, the test time when using two test rings is always less than or equal to that of the one test ring case. It is interesting to note that the design d33, even in the two test ring case, has both a higher controller complexity and a greater test time than the design d24. Similarly the design d14 has a much higher complexity than d24, but there is little difference between the test times of the two. Therefore both designs d33 and d14 are inferior to the design d24. Using the same argument, more inferior designs can be identified. These inferior designs include d33, d14, d22, d13, d12 and d11. A curve, referred to as the tradeoff curve for a module, consists of all designs that are not inferior. These superior designs include d44, d34, d24 and d23. This curve can be used in selecting the most appropriate design that satisfies the constraints on both test time and controller complexity.
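The selection of non-inferior designs can be sketched in code. The following Python fragment is only an illustration under stated assumptions: it treats a design as inferior when another design is no worse in both complexity and test time and strictly better in one (a stricter rule than the informal "little difference" judgment used in the text), the function names `tradeoff_curve` and `pick_design` are hypothetical, and the sample data points are invented for the demonstration and are not the values plotted in Figure 5.3.

```python
def tradeoff_curve(designs):
    """Return the names of the non-inferior designs, ordered by complexity.

    designs maps a design name to (controller complexity, test time).
    A design is dropped when some other design is at least as good in
    both measures and strictly better in at least one of them.
    """
    keep = []
    for name, (cx, t) in designs.items():
        dominated = any(
            ocx <= cx and ot <= t and (ocx < cx or ot < t)
            for other, (ocx, ot) in designs.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return sorted(keep, key=lambda n: designs[n][0])


def pick_design(designs, time_bound):
    """Return the least complex design whose test time meets the bound."""
    feasible = [n for n, (_, t) in designs.items() if t <= time_bound]
    if not feasible:
        return None
    return min(feasible, key=lambda n: designs[n][0])


# Invented sample points (complexity in units, test time in clock cycles).
sample = {
    "P": (0, 50),
    "Q": (64, 40),
    "R": (104, 20),
    "S": (128, 35),   # worse than R in both measures, hence inferior
    "T": (152, 19),
}
curve = tradeoff_curve(sample)
choice = pick_design(sample, time_bound=25)
```

With these sample points the curve keeps P, Q, R and T, and a time bound of 25 selects R, the least complex design that meets the bound.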
The analysis of this example shows that (1) putting both TPG and SA on chip can greatly reduce test time, (2) putting the test control facilities, including the FSM and two counters, on chip can also reduce test time; however, (3) putting the seed and correct signature on chip has little if any impact on reducing test time and often leads to an inferior design.

This approach can be generalized to the cases where a module contains many chips that have many different design styles. The optimal solution can be guaranteed if the design points include the entire design space. The drawback of this approach is that it is not feasible for a module containing a large number of chips, since the time required to plot the tradeoff curves in such cases grows exponentially as the problem size increases. To determine the optimal solution in such a case, an algorithmic approach is needed. Two algorithms that can reduce the time required to generate an optimal solution are presented next. These algorithms do not remove the need for test program synthesis since they assume the test time for each design is known. The test time calculated by these algorithms is only an approximate value since the model used is different from the final design. Therefore the final results have to be verified by the synthesis approach in order to make sure that the module test time is actually less than the given bound.

5.2 Algorithm-Based Minimization

To simplify the problem an MMC is assumed to have only one test channel. We make this assumption to ensure that the MMC processor bandwidth will match that of the test channel. If a module has m test channels, then it will have m MMCs. Suppose there are n chips in the module under test. Let T be the given upper bound of the module test time. The on-chip controller for each chip can be either autonomous or bus-dependent.
Let the controller hardware complexity for chip $i$ be $a_i$ if its on-chip controller is autonomous; otherwise, let the complexity be $b_i$. The test program synthesis technique can be used to calculate the test time for each individual chip. Suppose that the test time for chip $i$ is $t_{ai}$ if its controller is autonomous and $t_{bi}$ otherwise.

5.2.1 One Test Channel

Assume there is a single test channel and it can control either one or two test rings. However, it can only transmit data to one ring at a time. It is clear that $a_i > b_i$ for all chips. (This assumption can be justified from the discussion in chapter 2.) By connecting all chips that have autonomous on-chip controllers to one test ring, the self-test of all these chips can be executed in parallel with the testing of the chips with bus-dependent controllers. On the other hand, all chips that have bus-dependent controllers must be tested in sequence. The module test time is $T_m = \max(t_a, t_b)$, where $t_a$ is the longest test time required by any chip with an autonomous controller, and $t_b$ is the sum of the test times for all chips with bus-dependent controllers. In this work, it is assumed that $t_{ai} \le T, \forall i$, where $T$ is a given parameter representing the upper bound on the total test time. The objective is to determine the controller design style for each chip such that $T_m \le T$ and the total complexity of all the controllers is minimal.

Problem Formulation

Let $x_i$ be a variable associated with chip $i$ defined as follows.

$$x_i = \begin{cases} 1 & \text{if chip } i \text{ has a bus-dependent on-chip controller,} \\ 0 & \text{otherwise.} \end{cases}$$

The objective is to minimize

$$\sum_{i=1}^{n} \bigl( x_i b_i + (1 - x_i) a_i \bigr)$$

subject to

$$T \ge \sum_{i=1}^{n} x_i t_{bi}.$$

Note that the objective function can be reformulated as

$$\sum_{i=1}^{n} \bigl( x_i b_i + (1 - x_i) a_i \bigr) = \sum_{i=1}^{n} a_i - \sum_{i=1}^{n} x_i (a_i - b_i) = C - \sum_{i=1}^{n} x_i (a_i - b_i).$$

$C$ is a constant. Let $c_i = a_i - b_i$ be the difference in complexity between the autonomous and bus-dependent design for chip $i$.
It is clear that $c_i > 0$ since for a given chip the complexity of an autonomous controller is higher than that of a bus-dependent controller. The objective can be restated as follows. Maximize

$$\sum_{i=1}^{n} x_i c_i$$

subject to

$$T \ge \sum_{i=1}^{n} x_i t_{bi}.$$

This formula is equivalent to that of the Optimization 0-1 Knapsack [53] and can be efficiently solved using dynamic programming. The following algorithm, proposed in [53], is used to solve this problem.

Algorithm DP

1. Let $M_0 = \{(\emptyset, 0)\}$.
2. For $j = 1, \ldots, n$ do
(a) Let $M_j = \emptyset$.
(b) For each element $(S, c)$ of $M_{j-1}$, add to $M_j$ the element $(S, c)$ and also $(S \cup \{j\}, c + c_j)$ if $\sum_{i \in S} t_{bi} + t_{bj} \le T$.
(c) Examine $M_j$ for pairs of elements $(S, c)$ and $(S', c')$ with the same second component. For each such pair, delete $(S', c')$ if $\sum_{i \in S'} t_{bi} \ge \sum_{i \in S} t_{bi}$; otherwise delete $(S, c)$.
3. The optimal solution is $S$, where $(S, c)$ is the element of $M_n$ having the largest second component.

Starting with chip 1, the algorithm determines the design style for each chip in sequence. After the design style of chip $n$ has been determined, the optimal solution is obtained. A solution is represented by $(S, c)$, where $S$ is the set of chips which should have bus-dependent controllers and $c$ is the total complexity for this set of chips. A solution set is feasible if $\sum_{i \in S} t_{bi} \le T$. $M_j$ is the set of all feasible solutions after the design of the $j$th chip has been determined. It is obvious that the feasible set in $M_n$ with the largest $c$ is the optimal solution.

A C program has been implemented based on this algorithm. The program can solve this design problem in $O(n^2 c)$ time. This C program is used in the following algorithm to determine the design style for each chip and to connect each chip to one of the test rings of the test channel. The module can then be tested within the time bound $T$ by an MMC having a single test channel.
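The dynamic programming steps of Algorithm DP can be sketched as follows. This is a minimal Python illustration, not the C program mentioned above: the function name `algorithm_dp` and the dictionary representation of $M_j$ (one entry per distinct value $c$, which realizes step 2(c)) are assumptions made for the sketch. The chip data are those of Examples 5.1 and 5.2 in Section 5.2.3, where each chip is given as $(c_i, t_{bi})$.

```python
def algorithm_dp(chips, T):
    """Sketch of Algorithm DP for the 0-1 knapsack formulation.

    chips is a list of (c_i, t_bi) pairs for chips 1..n and T is the
    test time bound. Returns (S, c): the set of chips (1-based) to be
    given bus-dependent controllers and the total saving c.
    """
    # M maps a total value c to (total test time, chip set). Keeping a
    # single entry per value, the one with the smaller test time,
    # plays the role of step 2(c).
    M = {0: (0, frozenset())}
    for j, (c_j, t_j) in enumerate(chips, start=1):
        M_next = dict(M)                      # step 2(b): keep every (S, c)
        for c, (t, S) in M.items():
            if t + t_j <= T:                  # step 2(b): extension is feasible
                key = c + c_j
                if key not in M_next or t + t_j < M_next[key][0]:
                    M_next[key] = (t + t_j, S | {j})
        M = M_next
    best_c = max(M)                           # step 3: largest second component
    return set(M[best_c][1]), best_c


example_5_1 = [(6, 1), (11, 1), (17, 3), (3, 2), (9, 2)]
example_5_2 = [(299, 4), (73, 1), (159, 2), (221, 3), (137, 2), (89, 1), (157, 2)]

S1, c1 = algorithm_dp(example_5_1, T=5)
S2, c2 = algorithm_dp(example_5_2, T=10)
```

For these inputs the sketch returns the solution sets {1, 2, 3} and {1, 2, 3, 6, 7}, which agree with the sets reported for Algorithm TC1 in Examples 5.1 and 5.2.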
If only one test channel is used to control the testing of a set of chips, the following algorithm can be used to determine the design style for each chip such that the total test time is less than or equal to $T$ and the overall controller complexity is minimal. The algorithm also determines how the chips are connected to the test bus.

Algorithm TC1:

1. Find the test times $t_{bi}$ and $t_{ai}$ for each chip using the test program synthesis technique. The test time is calculated based on the model used in describing a CTL-file for a chip (see Figure 4.2 in chapter 4).
2. Formulate the problem as above and find an optimal solution set using Algorithm DP.
3. If a chip is in the solution set, its controller is bus-dependent; otherwise, it is autonomous.
4. Connect all chips that have bus-dependent controllers to ring 0 of the test channel. Connect all chips that have autonomous controllers to ring 1 of the test channel.

Lemma 2 The solution set found by Algorithm TC1 is optimal.

Proof: The proof is obvious since the solution set is found by using a dynamic programming procedure, which implicitly searches through the entire solution space. □

Using Algorithm TC1, one can quickly obtain optimal controller designs. Figure 5.4 shows the results obtained by using both approaches. The data points connected by solid line segments are the actual values obtained by using the test program synthesis technique. The data points connected by dashed line segments are obtained by using Algorithm TC1. As observed, there is very little difference between the results obtained from the two approaches except for the design d44. The discrepancy is caused by the simulation program that calculates the test time from the module test program.

Figure 5.4: Test time estimated by Algorithm TC1. (Horizontal axis: controller complexity; vertical axis: test time, x10^4. Legend: o, obtained from synthesis; *, estimated by Algorithm TC1.)
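Steps 3 and 4 of Algorithm TC1, together with the estimate $T_m = \max(t_a, t_b)$, can be sketched as below. This Python fragment is a hedged illustration: the function name `tc1_configuration` is hypothetical, the solution set is taken as given (for instance from Algorithm DP), the bus-dependent test times are those of Example 5.1, and the autonomous test times are invented for the demonstration.

```python
def tc1_configuration(solution_set, t_a, t_b):
    """Assign chips to rings and estimate the module test time.

    solution_set holds the chips chosen to be bus-dependent; t_a and
    t_b map each chip number to its autonomous and bus-dependent test
    time. Returns (ring 0 chips, ring 1 chips, estimated T_m).
    """
    ring0 = sorted(solution_set)                   # bus-dependent chips
    ring1 = sorted(set(t_a) - set(solution_set))   # autonomous chips
    tb = sum(t_b[i] for i in ring0)                # tested in sequence
    ta = max((t_a[i] for i in ring1), default=0)   # tested concurrently
    return ring0, ring1, max(ta, tb)


t_b = {1: 1, 2: 1, 3: 3, 4: 2, 5: 2}   # bus-dependent times from Example 5.1
t_a = {1: 2, 2: 2, 3: 4, 4: 3, 5: 4}   # invented autonomous times, each <= T = 5
ring0, ring1, T_m = tc1_configuration({1, 2, 3}, t_a, t_b)
```

Here chips 1, 2 and 3 go to ring 0 and chips 4 and 5 to ring 1; the estimated module test time is 5, set by the sequential bus-dependent tests.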
5.2.2 Multiple Test Channels

The test time for a module can be further reduced by increasing the number of test channels while the overall controller complexity remains minimal. The following algorithm determines the controller complexity for each chip and the number of test channels required such that the module test time is less than or equal to $T$ and the overall controller complexity is minimal. The controller complexity includes both the complexities of the on-chip controllers and of the test channels.

Let $c_{tc}$ be the complexity of a test channel. Let $G = (V, E)$ be a graph. A set of nodes $I \subseteq V$ is an independent set of $G$ if $\forall v_i, v_j \in I$, $e = (v_i, v_j) \notin E$. $Y$ is a maximal independent set of $G$ if (1) $Y$ is an independent set and (2) $Y$ is not contained in another independent set.

Algorithm TCM:

1. Formulate the problem as above and use Algorithm DP to solve it. However, replace step 3 of Algorithm DP by the following: For every solution set $(S_k, c_k)$ in $M_n$, if $c_k > c_{tc}$, mark it as a candidate. Let the number of candidates be $z$. Relabel the candidates such that they are represented as $(S_k, c_k)$, $k = 1, \ldots, z$.
2. Construct a weighted graph $G = (V, E, C)$ as follows: For each candidate $1 \le k \le z$ associated with $(S_k, c_k)$, there is a corresponding node $v_k \in V$ and a label $c_k \in C$. An edge $e = (v_i, v_j) \in E$ if $(S_i, c_i) \cap (S_j, c_j) \ne \emptyset$, i.e., if the two candidates have at least one chip in common.
3. Let $Q$ be the collection of all maximal independent sets of $G$. Find $X \in Q$ so that $\sum_{v_i \in X} c_i$ is maximal.
4. For each node $v_i \in X$, allocate a new test channel. For all chips in $S_i$ of the corresponding candidate, assign the design style for their on-chip controllers as bus-dependent. Connect these chips to the newly allocated test channel. Repeat this step until all nodes in $X$ are processed. Let $m$ be the number of test channels allocated in this step.
5. Let $U$ be the set of chips that do not belong to any candidate in $X$. If $U \ne \emptyset$ allocate a new test channel. All chips in $U$ are connected to this test channel.
Apply Algorithm TC1 to $U$ and find a set of chips $U_b$ that have bus-dependent controllers. Connect all chips in $U_b$ to test ring 0, and all chips in $U - U_b$ to test ring 1 of the newly allocated test channel. Note that all chips in $U - U_b$ have autonomous controllers. □

In step 1 a candidate is a feasible set found using Algorithm DP with $c_k > c_{tc}$. This means that making all chips in a candidate set bus-dependent can satisfy the time constraint, and the complexity is reduced since the cost of allocating a new test channel is less than that of making these chips autonomous. In step 2 the optimization problem is formulated as a graph so that it can be solved in step 3 by finding the maximal independent set for a graph. Computing $X$ in step 3 in general requires exponential time since finding a maximal independent set of a graph is a known NP-Complete problem [24]. An algorithm (Algorithm MIS) that uses a branch and bound technique to find $X$ efficiently is given below. Efficiency is obtained by pruning the solution space. The number of test channels allocated in step 4 is maximal. This leads to the minimization of the overall controller complexity. If the complexity of a test channel $c_{tc}$ is so large that no candidate can be found ($z = 0$), then steps 2, 3 and 4 are not executed and the set $U$ in step 5 consists of the entire set of chips. For this case both Algorithms TCM and TC1 generate the same result.

Algorithm MIS:

1. Initialization: Set $X = Y = \emptyset$. $X$ contains the current “best” independent set.
2. Call $BB(Y, 1)$;
3. $X$ is a maximal independent set and $\sum_{v_i \in X} c_i$ is maximal.

Function $BB(Y, i)$:

1. If $i > n$ return; otherwise continue.
2. $Y = Y \cup \{v_i\}$.
3. If $Y$ is an independent set then
(a) if $(|X| < |Y|)$ or $(|X| = |Y|$ and $\sum_{v_i \in X} c_i < \sum_{v_i \in Y} c_i)$, set $X = Y$.
(b) Call $BB(Y, i + 1)$.
4. Call $BB(Y - \{v_i\}, i + 1)$.
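The recursion of Algorithm MIS and Function BB above can be sketched in Python. The sketch rests on a few assumptions that are not in the original: the function name `best_independent_set` is hypothetical, nodes are numbered from 0, a candidate is given as a pair (chip set, value), and two nodes are joined by an edge when their chip sets intersect, as in step 2 of Algorithm TCM. The candidate lists are invented for the demonstration.

```python
def best_independent_set(candidates):
    """Branch and bound over candidate nodes, following Function BB.

    candidates is a list of (chip set, value) pairs. Returns the list
    of node indices of the independent set with the most nodes, ties
    being broken by the larger total value.
    """
    z = len(candidates)

    def conflicts(i, k):
        # An edge exists when two candidates share at least one chip.
        return bool(candidates[i][0] & candidates[k][0])

    def value(nodes):
        return sum(candidates[k][1] for k in nodes)

    best = []                                   # X, the best set so far

    def bb(y, i):
        nonlocal best
        if i >= z:
            return
        # Steps 2 and 3: add node i; continue only if Y stays independent.
        if not any(conflicts(i, k) for k in y):
            y_new = y + [i]
            if len(best) < len(y_new) or (
                len(best) == len(y_new) and value(best) < value(y_new)
            ):
                best = y_new
            bb(y_new, i + 1)
        # Step 4: explore the subspace that leaves node i out.
        bb(y, i + 1)

    bb([], 0)
    return best


demo = [({1, 2}, 17), ({3}, 17), ({1, 3}, 23), ({2, 3}, 28)]
picked = best_independent_set(demo)
```

For the invented list `demo`, nodes 0 and 1 are the only pair without a common chip, so the sketch selects them. Note that, as in Function BB, the number of nodes takes priority over the total value.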
□

Using a recursive approach, the function $BB$ searches through the entire solution space for $Y$ (branch) and intelligently prunes off a subspace if it does not contain a solution better than the best solution seen to date. Pruning is performed in step 3 of $BB$. When an independent set $Y$ is found, $X$ is updated only if an independent set with more nodes is found ($|X| < |Y|$) or an independent set with higher value is found ($|X| = |Y|$ and $\sum_{v_i \in X} c_i < \sum_{v_i \in Y} c_i$). If $Y$ is not independent, the subspace containing $Y$ is no longer considered, e.g., if $\{v_1, v_2\}$ is not independent, all sets of nodes that contain both $v_1$ and $v_2$ will not be checked for independence.

Lemma 3 The solution found by Algorithm TCM is optimal.

Proof: The solution set $X$ is optimal in that the test time for the resulting design is less than or equal to $T$ and the overall controller complexity is minimal. This is obvious since the algorithm implicitly searches through the entire space. □

5.2.3 Results

Let a chip be represented by a two-tuple $(c_i, t_{bi})$, where $c_i$ and $t_{bi}$ have been defined previously.

Example 5.1: Consider a module consisting of five chips, numbered from 1 to 5. These chips can be represented as (6,1), (11,1), (17,3), (3,2) and (9,2), respectively. Using Algorithm TC1, the solution set found is S={1, 2, 3}. Using Algorithm TCM, the results are as follows, where n=5 and T=5.

c_tc    num. of candidates z    num. of TC allocated m
10      10                      3
15      6                       2
20      3                       1
30      1                       1
40      0                       0

The value in the last column is the number of test channels allocated in step 4 of Algorithm TCM. Note that even with m=0, Algorithm TCM will still allocate one test channel in step 5.

Example 5.2: Consider a module consisting of seven chips, numbered from 1 to 7, represented as (299,4), (73,1), (159,2), (221,3), (137,2), (89,1) and (157,2), respectively. Using Algorithm TC1, the solution set is S={1,2,3,6,7}. Using Algorithm TCM, the results are as follows, where n=7 and T=10.

c_tc    num. of candidates z    num. of TC allocated m
500     44                      2
550     32                      1
600     27                      1
700     10                      1
800     0                       0

5.3 Discussions

Algorithm TCM can also be used in the design of system level controllers. For example, if each MMC can control only one test channel, then Algorithm TCM can determine the number of MMCs required such that a system can be tested in a predetermined time.

Chip Type Constraint

When two chips have the same application circuit, it is beneficial to let them have the same type of on-chip controller so that only one type of chip needs to be manufactured. The algorithms presented above can deal with this constraint if the problem is modeled as follows. Replace the set of chips that have the same application circuit with a new pseudo-chip. The test time and the complexity of the pseudo-chip are the sums of the test times and the complexities of the chips deleted, respectively. Repeat this process until the constraint on the chip type is completely removed. The new problem can then be solved by the previously proposed algorithms.

Chips Having Autonomous Controller

The test time required by chips having an autonomous controller is neglected in both Algorithm TC1 and TCM. It is assumed that these chips are connected to a test ring and their tests are executed concurrently while the test ring stays in the RunTest bus state. The assumption is invalid if an autonomous chip cannot stop the self-test process by itself. In this case, a test ring should be assigned to each chip. The controller complexity is thus increased. This problem can be solved if the test channel is modified such that it can support many test rings while data can be sent to only one ring at a time. In such a case, the complexity of the test channel remains low and all chips having autonomous controllers can be tested concurrently.
Controller with Multi-level Complexity

For both Algorithm TC1 and TCM, the on-chip controller is assumed to have only two levels of complexity, $a_i$ and $b_i$. However, it is possible to have controllers with many levels of complexity, as shown in section 5.1. For example, the complexity of chip $i$ can be further classified into $b_{ij}$, where $j = 1, \ldots, i_b$, if it has a bus-dependent controller, and $a_{ij}$, where $j = 1, \ldots, i_a$, if it has an autonomous controller. The test time for the chip is $t^{b}_{ij}$ if the complexity is $b_{ij}$, and $t^{a}_{ij}$ if the complexity is $a_{ij}$. Using one test channel, the problem can be formulated as an integer linear programming problem as follows. Let the variable $x_{ij}$ be defined as

$$x_{ij} = \begin{cases} 1 & \text{if the complexity of the on-chip controller is } a_{ij}, \\ 0 & \text{otherwise,} \end{cases}$$

and the variable $y_{ij}$ be defined as

$$y_{ij} = \begin{cases} 1 & \text{if the complexity of the on-chip controller is } b_{ij}, \\ 0 & \text{otherwise.} \end{cases}$$

The objective is to minimize

$$\sum_{i=1}^{n} \Bigl( \sum_{j=1}^{i_a} x_{ij} a_{ij} + \sum_{j=1}^{i_b} y_{ij} b_{ij} \Bigr)$$

subject to

$$\sum_{j=1}^{i_a} x_{ij} + \sum_{j=1}^{i_b} y_{ij} = 1, \ \forall i, \quad \text{and} \quad \sum_{i=1}^{n} \sum_{j=1}^{i_b} y_{ij} t^{b}_{ij} \le T.$$

Chapter 6

Interconnect Test Generation

Previous work on the diagnosis of faults in a wiring network has been based on the assumption that both open and short faults do not exist on the same net. When this assumption is relaxed, the results obtained fail to identify all diagnosable faults. The non-diagnosability of these faults, including shorts between nets, represents a deficiency. In this chapter the causes for this deficiency are analyzed and explained. New test algorithms and theoretical results for correcting this deficiency are also provided. Based on these results, a test set that is capable of identifying all diagnosable faults in a wiring network with arbitrary open and short faults is developed. Finally, two adaptive diagnosis algorithms which can reduce the number of test vectors while retaining the same level of diagnostic resolution are delineated.
6.1 Introduction

Detecting and locating faults in wiring networks on a printed circuit board has drawn much attention since the emergence of the boundary scan architecture [33]. In this architecture each primary input/output pin of a chip is associated with a boundary scan (B-S) cell. Each chip has a boundary scan register consisting of all the B-S cells. During the test mode a scan chain is formed by cascading the boundary scan registers of several chips. Through this chain a test controller can access the I/O pins of every chip. Thus a virtual bed-of-nails capability is achieved. With this capability the wiring nets can be isolated from the chips and tested without the need to physically probe the board. In this way it is possible to test the new generation of boards which allow limited probing due to the use of surface mounted devices and tape automated bonding technology. Note that if non-boundary scan devices are used, physical probes may still be required.

Many papers have dealt with the problem of finding test sets for detecting and locating faults in a wiring network [16, 25, 29, 30, 34, 37, 56, 64, 68]. Usually opens are modeled as stuck-at faults and are diagnosed separately from the shorts. This is based on the assumption that a net cannot be simultaneously associated with both open and short faults. Very comprehensive results are presented by Jarwala and Yau [34], where a framework for the detection and diagnosis of wiring networks is discussed. In particular, the diagonally independent property is identified. It is shown that a test set with this property is sufficient for the diagnosis of all shorts in one step, where the results are analyzed after all test vectors have been applied. However, as shown in this chapter, when the assumption concerning open and short faults is relaxed, a test set having the diagonally independent property is insufficient for achieving complete diagnosis without repair.
In fact, there are certain faults that cannot be diagnosed without repair. Some of these non-diagnosable faults are listed in section 6.2.3. Furthermore, it is shown that none of the previous results can identify all diagnosable faults. The causes for this deficiency are identified and characterized in Lemmas 4 and 5. A diagnostic level DR5, which refers to the case where all diagnosable faults can be identified, has been formulated. A term called maximal diagnosis is defined using two conditions. It can be shown that these conditions are both necessary and sufficient for achieving the diagnostic level DR5.

Both one-step and two-step diagnosis are addressed in this chapter. In the former case, responses are analyzed only after all test vectors have been applied. In the latter case, responses are analyzed after a fixed part of the test vectors have been applied. Based on this analysis, additional test vectors are then generated and applied. Final analysis is then carried out to identify the faults.

For one-step diagnosis, a property called set-cover independent is identified. Based on this property a fundamental theorem on diagnosing wiring faults is presented. The theorem gives both the necessary and sufficient conditions for identifying all diagnosable faults. A universal test set is also presented. This test set can achieve maximal diagnosis for an arbitrary network without assuming a specific fault model.

For the two-step diagnosis case, two adaptive diagnosis algorithms are presented. Compared with one-step diagnosis, these algorithms can reduce the number of test vectors while retaining the same level of diagnostic resolution by using a two-step scheme. In the first step, a detection sequence is applied and the responses are evaluated. Based on the initial results, a second sequence is applied to achieve the required diagnosis. The test vector size is reduced since some information about the network is employed in generating the second sequence.
6.2 Preliminaries

A wiring network consists of many nets. A net contains one or more drivers and one or more receivers. The logic value of a net can be controlled via one of its drivers and observed by all of its receivers. For a multi-driver net, only one driver can be enabled at a time; the others must be disabled. In addition, while testing a wiring network, only the drivers and receivers of nets are accessible. A fault-free net can transfer the logic value from an enabled driver to its receivers correctly. A receiver of a fault-free net can only receive from its associated drivers. The objective of diagnosis of a wiring network is to find a set of test vectors which can be applied to identify as many faults in the network as possible without repair.

6.2.1 Fault Model

Two types of physical faults, namely open and short, are assumed. More than one physical break is possible in an opened net. Ignoring fan-out nodes and shorts, multiple opens are modeled as a single open along a wire segment. Also, more than one physical bridge is possible between two shorted nets. Multiple shorts are modeled as a single short between two wire segments.

If two or more nets are shorted, the resulting behavior can be modeled as either (1) a wired-OR, (2) a wired-AND, or (3) a strong-driver, where one driver dominates the resulting behavior. In all cases, all nets involved in a short will have the same resulting logic value. The wired-OR fault model is assumed in this discussion unless otherwise stated. A net shorted to a power line VCC (GND) will exhibit a stuck-at 1 (stuck-at 0) behavior.

If a net contains an open, the logic value interpreted by all floating receivers of the opened net will be the same, which could be either a soft stuck-at 1 or a soft stuck-at 0. The logic value of a net shorted to a wire segment having a soft logic value cannot be forced to the soft value. The soft stuck-at 1 model is assumed for a floating net unless otherwise stated.
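The effect of the wired-OR and soft stuck-at 1 assumptions on the values observed at the receivers can be illustrated with a small simulation. The following Python sketch is an assumption-laden simplification: the function name `wired_or_response` and the net names are hypothetical, each net is driven by one bit sequence, a floating net is assumed not to be part of any short, and the combined open-and-short cases discussed in this chapter are deliberately left out.

```python
def wired_or_response(vectors, shorted_groups=(), floating=()):
    """Return the bit sequence seen by the receivers of every net.

    vectors maps a net name to the list of bits driven onto it.
    shorted_groups lists groups of nets shorted together (wired-OR).
    floating lists opened nets whose receivers float (soft stuck-at 1).
    """
    response = {net: list(bits) for net, bits in vectors.items()}
    for group in shorted_groups:
        # All nets in a short carry the bitwise OR of the driven values.
        merged = [int(any(bits)) for bits in zip(*(vectors[n] for n in group))]
        for net in group:
            response[net] = list(merged)
    for net in floating:
        # Floating receivers read a constant 1 under the soft stuck-at 1 model.
        response[net] = [1] * len(vectors[net])
    return response


driven = {"w1": [0, 0, 1], "w2": [0, 1, 0], "w3": [0, 1, 1]}
short_result = wired_or_response(driven, shorted_groups=[("w1", "w2")])
open_result = wired_or_response(driven, floating=["w2"])
```

In this invented example the short between w1 and w2 makes both nets respond with 0, 1, 1, which happens to equal the fault-free response of w3, while an open on w2 makes its receivers read all ones.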
In Figure 6.1, the logic value of the point A is soft stuck-at 1.

Figure 6.1: A soft stuck-at 1 case.

Both opens and shorts can occur on the same net. If a net contains both an open and a short, the logic value received by the receivers is determined by the combined effect of these faults. A short to an open net is illustrated in Figure 6.2, where A takes the logic value of B.

Figure 6.2: A short to an opened net.

6.2.2 Notation and Definitions

The notation and definitions used in this chapter follow the conventions established in [34]. For convenience, some of this information is repeated here.

• Parallel Test Vector (PTV): the vector applied to all nets of a wiring network in parallel.
• Sequential Test Vector (STV): the vector applied to a net, over a period of time, by a sequence of PTVs.
• Test Set (or test sequence) $S$: the collection of all STVs. Each column of $S$ is a PTV and each row of $S$ is an STV.
• Sequential Response Vector (SRV): the response of a net to an STV.
• Syndrome: the SRV of a faulty net.
• Aliasing syndrome: the resulting syndrome of a set of faulty nets is the same as the correct SRV of a net not in the set.
• Confounding syndrome: the syndromes that result from multiple independent faults are identical.

The following definitions are also used.

• OR-Cover: A vector $V_i$ OR-covers another vector $V_j$ if for every bit position in $V_j$ that is 1, the corresponding bit in $V_i$ is also 1. For example, $STV_i = (1101)$ OR-covers $STV_j = (0101)$, or $STV_j$ is OR-covered by $STV_i$. The OR-cover is used in the wired-OR fault model. In a similar fashion one can define an AND-cover for the wired-AND fault model. In this chapter, the term OR-cover is abbreviated as cover.
• Independent set: A test set S is an independent set if no STV_i is covered by another STV_j, j ≠ i.
• Set-cover: Let V_j be the result of wire-ORing a set of vectors V_j1, ..., V_jk. A vector V_i set-covers the vectors V_j1, ..., V_jk if V_i covers V_j.
• Set-covering syndrome: A set-covering syndrome is a syndrome that results from a set of shorted nets (W') that either covers a SRV or is covered by the SRV of some net w_i not in W'.
• Set-cover independent: Let S = (STV_1, ..., STV_n)^T be a test set for a set of nets W = (w_1, ..., w_n). S is set-cover independent if for i = 1, ..., n, STV_i is neither covered by nor covers the union (for wired-OR; intersection for wired-AND) of any subset of vectors in S − {STV_i}. In other words, for every SRV_i in S no set-covering syndrome can exist.

6.2.3 Non-Diagnosable Faults

A fault f is said to be non-diagnosable if there is no test set S and algorithm A such that by applying S to the network and processing the responses using algorithm A, f can be identified. Note that all single faults are diagnosable. Based on the fault model presented, some faults are non-diagnosable. Some of these non-diagnosable faults are listed below.

• In a set of nets that are shorted with each other, there are some opens that are non-diagnosable. For example, in Figure 6.3 the open fault f1 on net w_2 is non-diagnosable. Since w_1 and w_2 are electrically common, it is impossible to find a test to detect the open.

Figure 6.3: An open that is non-diagnosable.

• The short between a set of opened nets is non-diagnosable. For example, in Figure 6.4, it is impossible to identify the short f1 between w_1 and w_2 since no receivers are connected to the shorted wires.

Figure 6.4: A short that is non-diagnosable.
• There exist three possible reasons for a faulty net that has all-1 responses to a test, namely (1) the net is shorted with a VCC power line, (2) the net is opened and floating (soft stuck-at 1 model), and (3) the net is shorted with other nets such that the combined result is an all-1 vector (wired-OR model). The third case can be distinguished from the others by applying an all-0 PTV. The first two cases cannot be distinguished.

6.2.4 Diagnostic Resolution

Various levels of diagnostic resolution are possible in testing a wiring network without repair. Listed below are six such levels. They are listed in ascending order of their diagnostic resolution, i.e., DR1 has the lowest diagnostic resolution, and DR6 has the highest diagnostic resolution.

DR1: Determine whether the entire network is fault-free.
DR2: Identify all faulty nets.
DR3: For each and every net, determine whether it is fault-free without knowing the response of the other nets.
DR4: Identify all faulty nets. In addition, for nets without shorts, identify the existence of nets having opens. For a faulty net without open faults, identify all nets that are shorted to it.
DR5: Identify all faulty nets. In addition, identify all faults that are diagnosable.
DR6: Identify all faulty nets. In addition, identify all the opens and shorts in the network.

In DR1, one is only interested in determining the health status of the entire network. No further diagnostic information is provided. In DR2, all faulty nets are identified. No information about the type of faults associated with each net is provided. In DR3, all faulty nets are identified. In determining the health status of a net, only its response is required. No information about the responses of other nets is needed. This scheme is most suitable for a built-in self-test type of design. In DR4, all faulty nets are identified. The faults associated with each net can be identified if they belong to those cases described above.
In DR5, all faulty nets are identified. More faults can be identified than in the case of DR4. In DR6, all faulty nets are identified. In addition, all the faults, including opens and shorts, are identified.

For the purpose of repairing a wiring network, it is desirable that as many faults as possible be identified. Because some faults cannot be identified without first repairing other faults, DR6 cannot be reached. An example would be a net with multiple opens. In this work, the focus is to achieve DR5 without repair.

6.2.5 Previous Results

Previous results focus on diagnostic resolution ranging from DR1 to DR4. Some typical results for the testing of a network consisting of 4 nets are listed below. These results are based on the assumption that both opens and shorts cannot exist on the same net.

Counting Sequence: [37]

0 0 1
0 1 0
0 1 1
1 0 0

This test set consists of a simple counting sequence, where the all-0 and all-1 STVs are not used. This test can achieve the DR1 diagnostic level. The size of the test set is $\lceil \log_2(n+2) \rceil$, where n is the number of nets.

Complementary Counting Sequence: [64]

1 1 0 0
1 0 0 1
0 1 1 0
0 0 1 1

This test set consists of a counting sequence and its complement. The all-0 and all-1 STVs can be used. The size of the test set is $2\lceil \log_2 n \rceil$. This test set can achieve diagnostic level DR3. By eliminating the aliasing syndromes, the self-diagnosis property is achieved. This allows the determination of the health status of a net by examining only its SRV.

Maximal Independent Set: [16]

1 0 0 ' 0 1 0 0 0 1 1 1 0

Constant weight codes, where every STV has the same number of 1s, are a class of independent test sets. These test sets can achieve diagnostic level DR3. The size of the test set is minimal for self-diagnosis when the number of 1s in a SRV is half of the number of PTVs, which is referred to as a maximal independent set [16].
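The cover-based properties defined in Section 6.2.2 can be checked mechanically. The sketch below is illustrative only: all function names are hypothetical, vectors are bit strings, the wired-OR model is assumed, and the set-cover test is a brute-force reading of the definition that is practical only for small networks. The constant weight generator picks the first n vectors of weight ⌊p/2⌋, where p is the smallest integer whose central binomial coefficient is at least n; other choices of n vectors are equally valid.

```python
from itertools import combinations
from math import comb


def covers(a, b):
    """True if bit string a OR-covers bit string b."""
    return all(x == "1" for x, y in zip(a, b) if y == "1")


def is_independent(stvs):
    """No STV is covered by another STV."""
    return not any(covers(a, b)
                   for i, a in enumerate(stvs)
                   for j, b in enumerate(stvs) if i != j)


def or_all(vectors):
    """Bitwise OR (wired-OR) of a non-empty collection of bit strings."""
    width = len(vectors[0])
    return "".join("1" if any(v[k] == "1" for v in vectors) else "0"
                   for k in range(width))


def is_set_cover_independent(stvs):
    """No STV covers, or is covered by, the OR of any subset of the others."""
    for i, stv in enumerate(stvs):
        others = stvs[:i] + stvs[i + 1:]
        for size in range(1, len(others) + 1):
            for subset in combinations(others, size):
                union = or_all(subset)
                if covers(union, stv) or covers(stv, union):
                    return False
    return True


def constant_weight_set(n):
    """First n vectors of weight p//2, with p minimal such that C(p, p//2) >= n."""
    p = 1
    while comb(p, p // 2) < n:
        p += 1
    rows = []
    for ones in combinations(range(p), p // 2):
        rows.append("".join("1" if k in ones else "0" for k in range(p)))
        if len(rows) == n:
            break
    return rows


four_nets = constant_weight_set(4)
```

For four nets this yields a set that is independent but not set-cover independent, since the OR of two of its vectors covers a third, whereas a walking ones sequence such as 100, 010, 001 passes both checks.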
Diagonally Independent Sequence: [34]

x x x 1
x x 1 0
x 1 0 0
1 0 0 0

The x represents either a 0 or a 1. This test set can achieve the diagnostic level DR4. Both aliasing and confounding syndromes can be eliminated. All pairs of nets that are shorted are identified.

6.2.6 Deficiencies in Previous Approaches

Recall that these results are based on the assumption that both opens and shorts cannot exist on the same net. However, when this assumption is relaxed, there exist certain types of opens and shorts that cannot be identified. For example, the diagonally independent sequence

S =
0 1 0 1
0 0 1 0
0 1 0 0
1 0 0 0

cannot identify the short fault f1 in Figure 6.5 nor the open fault f2 in Figure 6.6.

Figure 6.5: A short that cannot be identified by a diagonally independent sequence. (STVs applied: 0101, 0010, 0100, 1000; SRVs observed: 0101, 0010, 1111, 1000.)

Figure 6.6: An open that cannot be identified by a diagonally independent sequence. (STVs applied: 0101, 0010, 0100, 1000; SRVs observed: 0101, 0010, 0101, 1000.)

The following lemmas summarize those cases which cannot be completely handled by the previously mentioned approaches.

Lemma 4 A test set S cannot identify the short between two nets w_i and w_j if (a) there is an open which is closer to the receiver of net w_i than the short, and (b) STV_i is covered by STV_j.

Proof: SRV_i is the all-1 vector since no logic value is transferred to the receiver. Furthermore, SRV_j = STV_j since STV_j covers STV_i. Therefore, it is impossible to know whether there is a short between w_i and w_j. □

Figure 6.5 is an example of Lemma 4.

Lemma 5 A test set S cannot identify the open in a net w_i if there exists another net w_j such that (a) there is a short between w_i and w_j which is closer to the receiver of w_i than the open, and (b) STV_j covers STV_i.

Proof: Since SRV_j = SRV_i = STV_j, the "contribution" from STV_i to SRV_i becomes indeterminate. Therefore, the open cannot be identified.
□

Figure 6.6 is an example of Lemma 5.

Both Lemma 4 and Lemma 5 can be generalized to multiple nets. For example, in Figure 6.7 the short fault f1 cannot be identified by a maximal independent set. This is because STV_3 is set-covered by SRV_1 = SRV_2 = 1110. Therefore it is impossible to determine whether the short f1 exists or not.

Figure 6.7: A short that cannot be identified by an independent test set. (STVs applied to w_1 through w_4: 1100, 1010, 0110, 0011; SRVs observed: 1110, 1110, 0011, 0011.)

In summary, none of the previous approaches, which include the diagonally independent sequence [34], the maximal independent set [16], and the complementary counting sequence [64], can identify all the faults described above in one pass without repair. The existence of unidentifiable shorts and opens in a network represents a deficiency in diagnosis. A test set that can be used to identify these faults is presented next.

6.3 One-Step Diagnosis

Due to the existence of non-diagnosable faults in a wiring network, it is impossible to identify all faults without repair or access to points of the nets other than the drivers and receivers. The term maximal diagnosis is defined as follows.

Let W = (w_1, w_2, ..., w_n) be a set of nets to be tested, D_i be the set of drivers of net w_i, and R_i be the set of receivers of net w_i. A test set S achieves maximal diagnosis for W if the following two conditions are verified.

• Condition C1: For i = 1, ..., n, by analyzing the responses obtained from the application of S, the existence of the connection (D_i, R_i) can be determined. The connection (D_i, R_i) exists if for all k, each driver d_ik ∈ D_i can transfer its logic value to all receivers in R_i correctly. A driver in D_i is said to transfer its logic value to a receiver in R_i if SRV_i covers STV_i. Note that only one driver can be enabled for a given net at one time.
In other words, if the connection (D_i, R_i) exists, then the application of STV_i to the enabled driver of w_i will make one of the following statements true: (1) SRV_i = STV_i, or (2) SRV_i covers STV_i.

• Condition C2: For all i, j, i ≠ j, by analyzing the responses obtained from the application of S, the existence of the connection (D_i, R_j) can be determined. The connection (D_i, R_j) does not exist if for all k, each driver d_ik ∈ D_i does not transfer its logic value to any receivers in R_j. In other words, if the connection (D_i, R_j) does not exist, the application of STV_i to the enabled driver of w_i will make one of the following statements true: (1) SRV_j ≠ STV_i, or (2) SRV_j does not cover STV_i.

Theorem 1 Diagnostic level DR5 can be achieved by a test set S iff S achieves maximal diagnosis.

Proof: (if part) The test set S verifies both Conditions C1 and C2 since S achieves maximal diagnosis for a network W. By definition DR5 is achieved if all diagnosable faults in W can be identified. There are two types of faults in W, namely opens and shorts. The diagnosability of these faults by S is discussed separately.

(1) opens: An open on a net w_i can be diagnosed if the connection (D_i, R_i) does not exist. Since S can achieve maximal diagnosis, the connectivity is determined in Condition C1. Thus S can identify all diagnosable opens in W.

(2) shorts: A short between two nets w_i and w_j can be diagnosed if one of the following is true: (a) there exists a driver set D_k such that both connections (D_k, R_i) and (D_k, R_j) exist; (b) there exists a receiver set R_k such that both connections (D_i, R_k) and (D_j, R_k) exist. The existence of these connections can be determined by S using Condition C2. Thus S can identify all diagnosable shorts in W.

From (1) and (2), all diagnosable faults can be identified. Therefore the diagnostic level DR5 can be achieved.

(only if part) Assume that S cannot achieve maximal diagnosis.
By definition, there exists at least one connection whose existence cannot be determined. Two cases are possible: (1) if the connection is of the form (D_i, R_i), then there can be an open fault on net w_i that cannot be identified; (2) if the connection is of the form (D_i, R_j), i ≠ j, then there can be a short between w_i and w_j that cannot be identified. In both cases, the faults can be identified by a walking ones sequence. Thus, by definition, these faults are diagnosable. Therefore the diagnostic level DR5 cannot be achieved. Thus the maximal diagnosis property is necessary. □

In this chapter the generation of test sets that achieve maximal diagnosis is discussed. The generated test sets thus achieve diagnostic level DR5, in which all diagnosable faults are identified. This is the best diagnosis possible without accessing points on the nets other than the drivers and receivers.

Lemma 6 For the wired-OR (wired-AND) model, any test set S is set-cover independent iff S has the walking ones (zeros) sequence as its subsequence.

Proof: (if part) This is obvious since the walking ones sequence is set-cover independent.

(only if part) Suppose that the test set S is set-cover independent. For a given STV_i, there must exist a PTV such that its ith bit is 1 and all other bits are 0. This is true for all STV_i, i = 1, ..., n. By arranging S properly (by swapping rows and columns), a walking ones sequence can be constructed. Therefore, S contains a walking ones subsequence. □

In the following, a theorem that characterizes the test set which achieves maximal diagnosis is presented.

Theorem 2 A test set S achieves maximal diagnosis for a network W in one step iff S is set-cover independent.

Proof: (if part) Suppose that S is set-cover independent. No set-covering syndrome can exist. For each and every net w_i, Condition C1 can be verified by checking whether SRV_i covers STV_i.
If this is not true then there is at least one open fault between the driver and the receivers of the net w_i. Since no set-covering syndrome can exist, no drivers of other nets can cover STV_i. Condition C2 can be verified as follows. If STV_i cannot cover SRV_i, then for every bit in SRV_i that is not covered by STV_i, there is a short between the receiver of w_i and a driver of a net w_j whose STV_j covers that bit.

Since both Conditions C1 and C2 can be verified, maximal diagnosis is achieved and the sufficiency aspect of the theorem has been demonstrated.

(only if part) Suppose that S is not set-cover independent. There exists at least one STV_i that is covered by another STV_j. When the following two faults occur, the existence of the connection (D_i, R_i) cannot be determined: (1) a short between w_i and w_j, and (2) an open on w_i that is closer to the driver than the short. This means that Condition C1 cannot be satisfied. Thus, by definition, maximal diagnosis is not achieved. □

From Lemma 6 and Theorem 2, one can conclude that any test set that can achieve maximal diagnosis must have a walking ones (zeros) sequence as its subsequence. From Theorems 1 and 2 it can be concluded that a set-cover independent test set can achieve the diagnostic level DR5.

Example 6-1: The short that could not be identified by a diagonally independent sequence in Figure 6.5 can be identified by a set-cover independent sequence (see Figure 6.8).

Figure 6.8: Achieving maximal diagnosis using a set-cover independent sequence. (STVs applied: 0001, 0010, 0100, 1000; SRVs observed: 0101, 0010, 1111, 1000.)

Universal Test Set: Assuming the wired-OR model and that a floating net is modeled as a soft stuck-at 1, a test set that can achieve maximal diagnosis for a network consisting of three nets can be constructed as follows.
0 0 0 1
0 0 1 0
0 1 0 0

The all-0 PTV is used to distinguish between the cases (1) all nets are shorted together (all 1s for SRVs) and (2) all nets are opened. Similarly, if the wired-AND model is used and an open and floating net is modeled as a soft stuck-at 0, a test set for maximal diagnosis can be constructed by a walking zeros sequence followed by an all-1 PTV. In summary, a universal test set for maximal diagnosis, without making any assumption on the nature of the faults, is as follows.

S_universal =
1 1 1 0 0 0 0 1
1 1 0 1 0 0 1 0
1 0 1 1 0 1 0 0

6.4 Two-Step Diagnosis

Two-step diagnosis refers to the fact that diagnosis is done by applying two test sequences. The results of the first test sequence are used to generate the second test sequence. This type of diagnosis is also known as adaptive diagnosis. Two adaptive algorithms that can achieve maximal diagnosis with a reduced number of PTVs are presented next. The test sets for both algorithms do not have the set-cover independent property since certain information about the network is employed in generating the second test sequence.

6.4.1 Adaptive Algorithm A1

1. Apply a maximal independent set (S_M). Collect and analyze the responses. Stop if no faults are detected.
2. Partition the nets into two groups. The partitioning is done as follows. For a net w_i, i = 1, ..., n, if (a) STV_i = SRV_i and (b) SRV_i is unique, include w_i into Group 0, else Group 1.
3. Apply a walking ones sequence S_F to all nets in Group 1, and all-0 vectors to all nets in Group 0.

The objective of the first sequence is to achieve the self-diagnosis property by eliminating the aliasing syndromes. The maximal independent set is the minimal size test set that can achieve this objective [16].
The number of PTVs required by the first sequence is p, where p is the smallest integer satisfying $C^{p}_{\lfloor p/2 \rfloor} \ge n$, and $C^{p}_{\lfloor p/2 \rfloor}$ represents the number of possible combinations choosing $\lfloor p/2 \rfloor$ items out of p items. The total number of PTVs required by this algorithm is p + F, where F is the number of nets in Group 1.

Theorem 3 The test set derived from Algorithm A1 achieves maximal diagnosis.

Proof: Let W_0 and W_1 be the sets of nets in Group 0 and Group 1, respectively, after the completion of step 2. Let D_0i, R_0i and D_0j, R_0j (i ≠ j) be the sets of drivers and receivers of nets w_0i and w_0j, respectively, where w_0i and w_0j ∈ W_0. Similarly, let D_1i, R_1i and D_1j, R_1j (i ≠ j) be the sets of drivers and receivers of nets w_1i and w_1j, respectively, where w_1i and w_1j ∈ W_1. There are six types of connections that need to be checked, namely (D_0i, R_0i), (D_0i, R_0j), (D_0i, R_1j), (D_1i, R_1i), (D_1i, R_0j), (D_1i, R_1j), for all i ≠ j.

The connections (D_0i, R_0i) do exist since (a) in the sequence S_M, it is impossible to find a subset from S − {STV_0i} whose wired-OR result equals STV_0i, (b) STV_0i = SRV_0i, and (c) SRV_0i is unique.

The connections (D_0i, R_1j) do not exist for the following reason. For each i, if the connections exist, SRV_0i must equal SRV_1j since there is no open on w_0i. However, SRV_0i is unique, so these connections do not exist.

The connections (D_0i, R_0j), ∀ i ≠ j, do not exist since their existence would make SRV_0i = SRV_0j, which contradicts the fact that SRV_0i ≠ SRV_0j.

The connections (D_1i, R_1i), (D_1i, R_0j) and (D_1i, R_1j) are checked by the walking ones sequence S_F.

Since the existence of all connections can be determined, Algorithm A1 achieves maximal diagnosis. □

Example 6-2: Let W be a network of four nets, where w_1 and w_2 are two nets in Group 0 and w_3, w_4 are in Group 1.
Let S_A1 be the test generated by Algorithm A1,

S_A1 = (S_F, S_M) =
0 0 1 1 0 0
0 0 1 0 1 0
0 1 1 0 0 1
1 0 0 1 1 0

Maximal diagnosis can be achieved in this example since the existence of all connections can be determined.

The efficiency of this algorithm depends on the value of F. In the case when F ≈ n, the number of PTVs is close to that of a walking ones sequence.

6.4.2 Adaptive Algorithm A2

The second adaptive algorithm follows.

1. Apply a maximal independent set (S_M). Collect and analyze the responses. Stop if no faults are detected.
2. Partition nets into Group 0 and Group 1 (as in Algorithm A1).
3. Partition nets in Group 1 such that all nets with the same SRV are in the same group. Number these new groups from 1 to G. Let K be the cardinality of the largest group.
4. Apply a walking ones sequence S_G to all groups in parallel, except to Group 0, for which the all-0 vectors are applied.
5. Apply another walking ones sequence S_K across groups. That is, all nets in the same group are modeled as a single net. Again the all-0 vectors are applied to all nets in Group 0. The number of PTVs is G.

The total number of PTVs for Algorithm A2 is p + G + K. In general this algorithm requires fewer PTVs than Algorithm A1. This is because the F nets in Group 1 of Algorithm A1 are now partitioned into G groups in Algorithm A2. In fact, F + 1 ≥ G + K.

Theorem 4 The test set derived from Algorithm A2 achieves maximal diagnosis.

Proof: Let W_0 be the group of fault-free nets and W_x, W_y be any two of the G groups formed in step 3. Let D_ai, R_ai and D_aj, R_aj (i ≠ j) be the sets of drivers and receivers of nets w_ai and w_aj, respectively, where w_ai and w_aj ∈ W_a, a = 0, x, y. There are twelve types of connections to check, namely (D_0i, R_0i), (D_0i, R_0j), (D_0i, R_xj), (D_0i, R_yj), (D_xi, R_xi), (D_xi, R_0j), (D_xi, R_xj), (D_xi, R_yj), (D_yi, R_yi), (D_yi, R_0j), (D_yi, R_yj), (D_yi, R_xj), ∀ i ≠ j.
The connections (D_0i, R_0i), (D_0i, R_0j), (D_0i, R_xj), (D_0i, R_yj), (D_xi, R_0j), (D_yi, R_0j) are checked by S_M for the reasons stated in the proof of Theorem 3.

The existence of connections (D_xi, R_xj) or (D_yj, R_xj) or both is denoted by (D_xi|D_yj, R_xj). After the application of S_G the existence of (D_xi|D_yj, R_xj) can be determined. The sequence S_K can then easily distinguish between the three possible cases covered by (D_xi|D_yj, R_xj). Thus, with the application of both S_G and S_K, the existence of the remaining six types of connections can be checked.

Both Conditions C1 and C2 are verified since all twelve types of connections are checked. This concludes that Algorithm A2 achieves maximal diagnosis. □

Example 6-3: Let W be a network with seven nets. After the application of S_M, three groups are formed. Let w_1, w_2 be in Group 0, w_3, w_4 be in Group 1, and w_5, w_6, w_7 be in Group 2. In this example n = 7, G = 2 and K = 3. According to Algorithm A2, the total test set S_A2 consists of a maximal independent set of 5 PTVs (S_M), followed by 3 walking ones PTVs (S_G), which again are followed by another 2 walking ones PTVs (S_K).

S_A2 =
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 1 0 1 0 0
0 1 0 0 1 1 0 0 1 0
0 1 0 1 0 1 0 0 0 1
1 0 0 0 1 0 1 1 0 0
1 0 0 1 0 0 1 0 1 0
1 0 1 0 0 0 1 0 0 1

6.4.3 Comparison with Other Adaptive Algorithms

Several adaptive algorithms have been proposed previously. In the following these algorithms will be briefly reviewed and compared to Algorithms A1 and A2.

Method 3: [16]

This algorithm first applies a counting sequence; and, based on the initial results, a second sequence is applied. The STV of the second sequence represents the number of 0s in the corresponding STVs of the first sequence. The purpose of the second sequence is to make sure that the overall test set S is independent. This algorithm can achieve self-diagnosis. No confounding syndromes can be identified.
Also, the fault f1 in Figure 6.7 cannot be detected by this algorithm.

W-Test Algorithm: [25]

This algorithm is similar to Algorithm A1 except that the first sequence is a counting sequence S_C, i.e., in the W-Test Algorithm the test set consists of only S_C and S_F. We next show an example in which the W-Test Algorithm cannot achieve maximal diagnosis.

Part of a wiring network is shown in Figure 6.9, where the counting sequence S_C has been applied. For i = 1, 2, STV_i = SRV_i and SRV_i is unique, thus both w_1 and w_2 will be put into Group 0. This means that the opens f1, f2 and the short f3 will not be identified, since during the application of a walking ones sequence all nets in Group 0 will be kept at 0.

Figure 6.9: Deficiency in the W-Test Algorithm. (STVs shown: 1011, 0011, 1001, 0001, 0010; SRVs shown: 1011, 0011, 1111, 1111, 1111.)

Therefore, to avoid putting a net into Group 0 by mistake, it is necessary to apply an independent set that can achieve the self-diagnosis property. Neither Algorithm A1 nor Algorithm A2 will put w_1 and w_2 into Group 0. Thus the faults associated with them can be identified by either S_F or (S_G, S_K).

C-Test Algorithm: [34]

The C-Test Algorithm first applies a counting sequence; then, based on the analysis of the syndromes, one or more PTVs are applied.

Part of a wiring network is shown in Figure 6.10, where a counting sequence has been applied. In the C-Test Algorithm the short faults f3 and f4 can be identified immediately to within an equivalence class. However, since no aliasing or confounding syndromes are related to SRV_1 = 1011, the faults f1 and f2 cannot be identified. Using the same example, the diagnosis sequence in both Algorithms A1 and A2 will apply a walking ones sequence to w_2, w_3, w_4 since they are in the same group. Thus all faults in the network can be identified.

Figure 6.10: Deficiency in the C-Test Algorithm.
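The grouping step shared by Algorithms A1 and A2, and the resulting length of the second test sequence, can be sketched as follows. This is an illustrative reading of steps 2 and 3 of the algorithms, not code from the original work; the function names are hypothetical and the response values in the usage lines are made up for the example.

```python
from collections import Counter, defaultdict


def partition_groups(stvs, srvs):
    """Split net indices into Group 0 (STV == SRV and SRV unique) and Group 1."""
    counts = Counter(srvs)
    group0 = [i for i, (t, r) in enumerate(zip(stvs, srvs))
              if t == r and counts[r] == 1]
    in_group0 = set(group0)
    group1 = [i for i in range(len(stvs)) if i not in in_group0]
    return group0, group1


def second_sequence_lengths(stvs, srvs):
    """Return (F, G + K): second-pass PTV counts for A1 and for A2."""
    _, group1 = partition_groups(stvs, srvs)
    by_srv = defaultdict(list)
    for i in group1:
        by_srv[srvs[i]].append(i)          # A2 step 3: same SRV, same group
    F = len(group1)
    G = len(by_srv)
    K = max((len(g) for g in by_srv.values()), default=0)
    return F, G + K


# Hypothetical responses to a four-net maximal independent set.
applied = ["1100", "1010", "0110", "0011"]
observed = ["1110", "1110", "0011", "0011"]
groups = partition_groups(applied, observed)
lengths = second_sequence_lengths(applied, observed)
```

Note that the last net in this illustration receives exactly its own STV but still lands in Group 1, because another net shows the same response; this is the uniqueness test that keeps suspicious nets out of Group 0.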
6.5 Diagnosis Using Structural Information

Up to this point all the test methods presented assume a net can be shorted to every other net in a network. In practice, however, one net can only be shorted to a set of neighboring nets. This set of nets is called its neighbors. The number of neighboring nets is usually much smaller than the number of nets in the network. Therefore, using neighborhood information it is possible to generate a reduced test set that can still achieve maximal diagnosis. Other researchers have considered using neighborhood information [16, 68], but do not obtain maximal diagnosis because they do not consider that both opens and shorts can be associated with the same net.

A one-step diagnosis algorithm that incorporates the neighborhood information is presented below. Let W = {w_1, ..., w_n} be a network under test, and let Nbr(w_i) ⊂ W be the set of the neighboring nets of w_i. The algorithm for constructing a test set that can achieve maximal diagnosis in one step is as follows.

Algorithm A3:

1. Construct a neighborhood graph NG = (V_1, E_1) as follows: (1) for each w_i ∈ W there is a corresponding v_i ∈ V_1, (2) E_1 = {e | e = (v_i, v_j), ∀ w_j ∈ Nbr(w_i), ∀ w_i}.
2. Construct an augmented neighborhood graph ANG = (V, E) as follows: (1) V = V_1, (2) E = E_1 ∪ E_2, where E_2 = {e | e = (v_j, v_k), ∀ v_j, v_k ∈ Nbr(w_i), ∀ v_i ∈ V_1}.
3. Label each node v_i of the ANG with a color Color(v_i) such that Color(v_i) ≠ Color(v_j) if e = (v_i, v_j) ∈ E and that the number of colors c is minimal (this number is also referred to as the chromatic number of the graph). Let the colors used be C_1, ..., C_c.
4. Associate each color C_i with a unique c-bit binary vector CV_{C_i}, i = 1, ..., c, such that only one bit in each vector is a 1.
5. Associate each net w_i ∈ W with a vector STV_i such that STV_i = CV_{Color(v_i)}. The test set S = (STV_1, ..., STV_n)^T.
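The steps of Algorithm A3 can be sketched as below. Two caveats apply: all names are hypothetical, and step 3 is replaced here by a simple greedy coloring, which yields a valid coloring but is not guaranteed to use the minimal number of colors that the algorithm calls for. The input is assumed to map every net to the set of its neighboring nets.

```python
def augmented_neighbor_graph(nbr):
    """Build the ANG adjacency sets from a net -> neighbor-set mapping."""
    adj = {}
    for v, ns in nbr.items():
        adj.setdefault(v, set())
        for u in ns:                      # E1: a net and each of its neighbors
            adj.setdefault(u, set())
            adj[v].add(u)
            adj[u].add(v)
        for a in ns:                      # E2: every pair of neighbors of one net
            for b in ns:
                if a != b:
                    adj[a].add(b)
    return adj


def greedy_coloring(adj):
    """Assign each node the smallest color unused by its colored neighbors.

    Greedy substitute for the exact minimal coloring of step 3.
    """
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def one_hot_test_set(nbr):
    """Steps 4 and 5: map each net's color to a one-hot STV."""
    color = greedy_coloring(augmented_neighbor_graph(nbr))
    width = max(color.values()) + 1
    return {v: "".join("1" if k == color[v] else "0" for k in range(width))
            for v in sorted(color)}


# Six nets in a row, each adjacent only to the next one.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
test_set = one_hot_test_set(chain)
```

For this chain the greedy order happens to need only three colors, so each net and its neighbors together see a walking ones pattern across three PTVs.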
The problem to be solved in step 3 is NP-Complete [24]. In general, solving this problem requires an exponential time algorithm. The algorithm listed below, referred to as Algorithm Coloring, can be used to solve this problem. Some efficiency is achieved by pruning the search space. This algorithm makes use of Algorithm MIS presented in Chapter 5.

Algorithm Coloring:

1. Construct a weighted graph G_2 = (V_2, E_2, C_2) such that (1) V_2 = V, (2) E_2 is the complement of E, i.e., E_2 = {e | e = (v_i, v_j) ∉ E, ∀ v_i, v_j ∈ V}, (3) C_2 = {c_i = 1, ∀ i}.
2. Apply Algorithm MIS using G_2 as input. Let c be the size of the solution set. The size of the maximal independent set of G_2 equals the size of the maximal clique of G. Therefore the chromatic number of G is c.
3. For nodes v_1, ..., v_n, assign a color to each of them. This process can be done in O(nc) time since the chromatic number c is known.

Theorem 5 The test set derived from Algorithm A3 achieves maximal diagnosis for W and is minimal in size.

Proof: First, it is necessary to show that S can achieve maximal diagnosis. From the way the test set is generated in Algorithm A3, it is clear that the STVs of a net w_i and its neighboring nets Nbr(w_i) contain a walking ones subsequence. All open faults in this set of nets and all shorts between any two nets in this set are diagnosable. The same argument can be applied to each net w_i ∈ W. Thus S achieves maximal diagnosis for the network W.

Second, it is necessary to show that S is of minimal size. The number of PTVs in S is c, which is the minimal number of colors required to color the ANG. For any test set with fewer PTVs than S, there exists at least one net w_i that, together with its neighboring nets Nbr(w_i), cannot be assigned distinct CVs. Hence the faults in this set of nets cannot be fully diagnosed. Thus S is a minimal test set that can achieve maximal diagnosis for the network W. □

Example 6-4: Let W = {w_1, ..., w_6} be a network under test.
Let Nbr(w_1) = {w_2}, Nbr(w_2) = {w_1, w_3}, Nbr(w_3) = {w_2, w_4}, Nbr(w_4) = {w_3, w_5}, Nbr(w_5) = {w_4, w_6} and Nbr(w_6) = {w_5}. The neighborhood graph NG is shown in Figure 6.11(a) and the augmented neighborhood graph ANG is shown in Figure 6.11(b). It is found that the chromatic number of ANG is 3 and one solution to the coloring problem is shown in Figure 6.11(c). The colors used in the graph are R, G and B.

Figure 6.11: Example 6-4: (a) the NG, (b) the ANG, (c) the colored graph.

Let the vectors associated with the colors be CV_R = (100), CV_G = (010) and CV_B = (001). Then the minimal test set S that can achieve maximal diagnosis is as follows.

S =
1 0 0
0 1 0
0 0 1
1 0 0
0 1 0
0 0 1

Note that there is a walking ones sequence for each net w_i and its neighboring nets Nbr(w_i). Therefore, it is clear that S achieves maximal diagnosis for the network W.

Chapter 7

Interconnect Test Scheduling

Many digital systems will soon be built with ICs that conform with the IEEE 1149.1 boundary scan architecture. Due to the hierarchical nature of such systems, they may contain many boundary scan chains. These chains can be used to test the system, subsystem and board interconnect. To reduce test time, the application of test vectors to these scan chains must be carefully scheduled. This chapter deals with problems related to finding an optimal schedule for testing interconnect. This problem is modeled using a directed graph. The following results are obtained: 1) upper and lower bounds on interconnect test time; 2) necessary and sufficient conditions for obtaining the optimal schedule when the graph is acyclic; 3) a sufficient condition for obtaining the optimal schedule when the graph is cyclic; and 4) an algorithm for constructing the optimal schedule for any graph.

7.1 Introduction

Testing interconnect between I/O pins of ICs on a printed circuit board is facilitated by including boundary scan in the design of the ICs.
In this chapter the term boundary scan refers to the IEEE 1149.1 boundary scan architecture [33]. For this architecture a boundary scan cell is associated with each I/O pin. A boundary scan register in an IC is formed by concatenating these cells. The logical values of the input (output) pins of an IC can be observed (set) by using the boundary scan register. By using the boundary scan registers a net connecting two I/O pins can be tested by scanning in test vectors and observing test results.

The impact of boundary scan on board test has been reported in [28, 50, 54]. This technique provides many benefits such as enhanced diagnosis, reduced test-repair looping, standardized testing, and reuse of tests. These benefits can be extended to the test of other levels of assembly of a system provided that the boundary scan philosophy is followed.

In many cases multiple boundary scan chains exist in a complex system composed of subsystems which, in turn, are composed of modules and boards. Therefore, when testing interconnect between these subassemblies, multiple boundary scan chains (BS-chains) are used. Test vectors can be applied to these BS-chains by using a test controller. To apply a test vector, the test controller must first partition the vector into segments and associate each BS-chain with a vector segment. The test controller then sequentially selects a BS-chain and applies the associated vector segment to it. The order in which chains are selected can significantly impact total test time. According to the IEEE 1149.1 protocol, for each scan operation the test result segment is loaded (in parallel) into the BS-chains immediately before a new test vector segment is shifted in. This result segment can be shifted out while shifting in a new test vector segment. The selection order is important since it determines the correctness of the test procedure and the overall test time.
The generation of tests to detect and diagnose interconnect faults on a board is discussed in [25, 29, 30, 34, 37, 64, 68]. In this chapter it is assumed that a set of test vectors has been generated and stored in a memory unit. Our main objective is to apply these test vectors so that the interconnect is correctly tested in minimal time. This objective can be achieved if the optimal schedule is found. For each test vector the schedule determines the order for applying test vector segments to the BS-chains. This implies that the interconnect is correctly tested by a test controller executing the schedule. If the optimal schedule is obtained, all test vectors can be applied in minimal time.

In this chapter several theorems which aid in identifying how to construct an optimal schedule are presented. It is shown that the optimal schedule can be achieved if a proper order is followed in applying test vectors to the BS-chains. An algorithm for deriving an optimal schedule is presented. This schedule can be executed by the Module Maintenance Controller (MMC) developed at USC [43], or by the Scan Bus Master (SBM) developed by Texas Instruments [63].

The test time is greatly reduced using an optimal schedule. A reduction in the range of 30 to 50 percent has been achieved for the examples considered so far. From the way the problem is defined, the least upper bound on the reduction in test time is 50 percent.

The algorithms presented can also be applied to schedule tests of chips that are designed with full scan capability where the chips have more than one internal scan chain. Typical examples are chips designed with both the boundary scan architecture and the multiple scan chain technique, such as the CBT method and the MAST method [22, 57].

7.2 Testing Model

Every chip has a boundary scan register since it is assumed that all chips under test have the boundary scan architecture.
A scan chain is formed by cascading the boundary scan registers of various chips during test mode. The length of the scan chain is the number of scan cells in the boundary scan registers. All cells in a scan chain share the same control inputs (except the Mode line) provided by the test controller. Therefore all scan cells in a scan chain always operate in the same mode. The control model of a boundary scan register is shown in Figure 2.6. The control inputs of the scan chain are assumed to be activated in a fixed order (see Figure 2.5), namely an activation of CaptureDR followed by zero or more activations of ShiftDR, and lastly an activation of UpdateDR. A test controller must follow this order to access a scan chain. Also, only one scan chain can be accessed at a time. These assumptions are in accordance with the IEEE P1149.1 standard.

A special function is defined which, when executed by a test controller, generates the proper sequences of control signals. This function is denoted as $Scan(chainID, vecS, resS)$, where $chainID$ is the chain selected to be scanned, $vecS$ is the new data string that will be shifted into the scan cells, and $resS$ is the data string originating from the scan chain that will be shifted out of it. Once this function is executed by a test controller, the logic values on the D inputs of the scan chain are collected as a string $resS$, and the Q outputs of the chain are updated to take on the values specified by the data string $vecS$. A test schedule represented by this function can be executed by a Test Channel operating in the DTUR mode [44].

Graph Model for Testing Interconnect

Figure 7.1: Test interconnect via two boundary scan chains; (a) block diagram, (b) graph model.

Figure 7.1(a) shows interconnects that can be tested via two boundary scan chains. Each scan chain is formed by cascading the boundary scan registers of two chips.
Figure 7.1(b) shows a directed graph that is used in scheduling tests. Each node represents a scan chain. Each edge represents the data flow direction between scan chains. For example, an edge exists from node 1 to node 2 because an output pin in scan chain 1 drives an input pin in scan chain 2, namely signal A.

Each net can be represented by one or more edges in the graph. For example, multi-input nets, multi-output nets and busses may all map into more than one edge in this graph model. When a bus structure is used, many edges may exist in the graph. To avoid dealing with a complete directed graph, the testing of a bus is carried out in several phases. During each phase, the bus is modeled as a single-input multi-output net.

Test Controller Model

Figure 7.2: The test controller model.

A model for the test controller used in this work is shown in Figure 7.2. The controller contains a data port, which includes a TDI line and a TDO line, and several TMS lines labeled as $TMS_i$, $i = 1, \ldots, n$. Only one TMS line is selected at a time. By controlling the selected TMS line and the data port, the test controller can send and receive information to/from the selected scan chain. The non-selected TMS lines are either (1) set high, which sets the associated scan chain to the "reset" state, or (2) set low, which sets the associated scan chain to the "run test/idle" state. In the latter state, the chips under test in that chain can execute a self-test. The controller can only alter the value of one TMS line at a time. At least two counters are included in the test controller. These counters are used to control the transmission of information between the data port and the selected scan chain. In addition, the controller can execute the scan function on a selected scan chain.

7.3 The Problem

In this section the reasons for having multiple scan chains in testing interconnect are first presented.
The scheduling problem associated with the application of tests via these scan chains is then identified. Solution methods for this problem will be presented in the next section.

7.3.1 The Use of Multiple Scan Chains

The system test hierarchy for a hierarchically testable and maintainable system has been discussed in [12, 27, 43]. With reference to the system discussed in [12, 43], each system has a system maintenance processor (SMP). Each subsystem, which consists of two or more modules, has a subsystem maintenance processor (SuMP). Each module (board) has a module maintenance controller (MMC). Each chip has an on-chip test controller (CMC), which can be as simple as a Test Access Port (TAP), plus a boundary scan register. Test busses are required to connect these controllers. The issue of checking these test busses is not dealt with in this chapter; only the testing of functional interconnect is considered.

Three classes of interconnect exist, namely system interconnect, subsystem interconnect and board (module) interconnect. A net can have one or more drivers and one or more receivers. These drivers and receivers can be located in one or more chips. If these chips are all located on the same board within a subsystem then this net belongs to the class of board interconnect. If these chips are located on different boards then this net belongs to the class of subsystem interconnect. If these chips are located on different subsystems then this net belongs to the class of system interconnect. Figure 7.3 shows nets belonging to the different classes.

Two control schemes, referred to as distributed control and centralized control, exist in testing a net connecting two or more units, where a unit can be either a subsystem, a board or a chip. The distributed control scheme is shown in Figure 7.4(a), where a net is tested by two local controllers LC1 and LC2 under the control of another controller C1.
C1 first instructs LC1 to execute a Scan function in order to set net $i$ to a logic value. It then instructs LC2 to read the value on the net via another Scan function. This value is then checked by C1. The centralized control scheme is shown in Figure 7.4(b), where the testing of the net is directly controlled by C2. C2 executes a Scan function to set net $i$ to a logic value, and then executes another Scan function to read the value of the net. The test bus is configured in a star configuration so that unit 1 can be tested even if unit 2 is removed. In both schemes the net is tested via two scan chains.

Figure 7.3: Different classes of interconnect (board interconnect: i; subsystem interconnect: j; system interconnect: k).

Figure 7.4: Two schemes for testing interconnect; (a) distributed control, (b) centralized control.

Because of the hierarchical nature of a system, it is very common to test interconnect, especially system and subsystem interconnect, using multiple scan chains. If a unit represents a chip, then Figure 7.4 represents a board having several chips that support boundary scan. In Figure 7.4(a) each chip has a fairly complex test controller. In Figure 7.4(b) each chip has a minimal test controller. It is believed that most designs will conform to the latter case. Several scan chains can be used to reduce test time. Chips associated with the same scan chain can be tested either in a sequential manner or concurrently. However, in the latter case the test procedure can be quite complex and inefficient. On the other hand, chips associated with different scan chains can more easily be tested concurrently and usually in an efficient manner. Therefore, it can be concluded that, in many
cases, interconnect is tested using multiple scan chains regardless of the class of interconnect.

7.3.2 Scheduling Problem in Testing Interconnects

The problem related to scheduling vectors to test interconnect is considered next. This problem is illustrated using both distributed and centralized control schemes. We assume that the test vectors have been formatted to conform with multiple scan chains.

Centralized Control Scheme

The scheduling problem is modeled using a directed graph. Figure 7.5 shows several directed graphs and their associated test schedules. Each test schedule consists of several Scan functions, where $t_{i,j}$ ($r_{i,j}$) represents the $i$th test vector (result) segment for scan chain $j$. For clarity and simplicity it is assumed that each schedule consists of only two test vectors. In simple cases such as those shown in Figure 7.5 (a), (b) and (c) the schedule is easy to construct. For example, in Figure 7.5(b) a test vector is only applied to scan chain 1, while a result vector consists of two segments collected from scan chains 1 and 2. The first function $Scan(1, t_{1,1}, -)$ loads chain 1 with the first test vector. The second function $Scan(2, -, r_{1,2})$ gets the result segment associated with the first test vector out of scan chain 2. The third function $Scan(1, t_{2,1}, r_{1,1})$ gets the result segment associated with the first test vector out of scan chain 1 and simultaneously loads the second test vector into scan chain 1. The fourth function $Scan(2, -, r_{2,2})$ gets the result segment associated with the second test vector out of scan chain 2. Finally, a fifth function $Scan(1, -, r_{2,1})$ is used to scan out the result segment associated with the second test vector in scan chain 1. The situation is more complex for the cases shown in Figure 7.5 (d), (e) and (f), where it is more difficult to construct a test schedule.
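The five-step schedule described above for Figure 7.5(b) can be checked mechanically. The following is a minimal sketch, not part of the original work: it assumes a simplified model in which a Scan first captures the result and then shifts in the new segment, and in which a captured result $r_{i,j}$ is valid only if every chain driving chain $j$ currently holds its segment of test vector $i$. The function name `check_schedule`, the triple encoding of Scan, and the driver map for Figure 7.5(b) (chain 1 drives itself and chain 2) are assumptions made for illustration.

```python
def check_schedule(drivers, schedule):
    """Return True if every collected result segment is valid.

    drivers[j] lists the chains whose outputs feed inputs of chain j.
    schedule is a list of (chain, vec, res) triples that mirror
    Scan(chain, t_vec, r_res); vec and res are test vector indices,
    or None for a don't-care ("-") argument.
    """
    held = {}  # chain -> index of the vector segment it currently drives
    for chain, vec, res in schedule:
        # Capture happens before shifting, so each driver of this chain
        # must still hold its segment of test vector `res`.
        if res is not None and any(held.get(d) != res for d in drivers[chain]):
            return False
        # Shifting replaces the chain contents (None models don't-care data).
        held[chain] = vec
    return True


# Assumed dependency structure for Figure 7.5(b).
drivers_b = {1: [1], 2: [1]}

# Scan(1,t11,-) Scan(2,-,r12) Scan(1,t21,r11) Scan(2,-,r22) Scan(1,-,r21)
schedule_b = [(1, 1, None), (2, None, 1), (1, 2, 1), (2, None, 2), (1, None, 2)]

# Same scans, but chain 1 is reloaded before chain 2 has been read.
bad_order = [(1, 1, None), (1, 2, 1), (2, None, 1), (2, None, 2), (1, None, 2)]

print(check_schedule(drivers_b, schedule_b))
print(check_schedule(drivers_b, bad_order))
```

Under this model the schedule from the text is accepted, while the reordered variant is rejected because chain 2 would capture a response to the second vector when the first was expected.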
In general, if the graph contains many nodes with complex connectivity, finding a "good" schedule is difficult; the problem of finding an optimal schedule is NP-Complete.

Figure 7.5: Deriving test schedules for several examples. Panels (a) through (f) each show a directed graph over scan chains 1 and 2 together with its schedule of Scan functions.

Distributed Control Scheme

In Figure 7.4(a) the execution of Scan functions by LC1 and LC2 can be either synchronous or asynchronous. In the former case, the scheduling problem does not exist. The interconnect can be tested by letting LC1 and LC2 execute the Scan functions at the same time. In the latter case, LC1 and LC2 must execute the Scan functions in sequence. The order in which LC1 and LC2 execute this function is important since it dictates the correctness of the test and the overall interconnect test time. In general it can be difficult to synchronize LC1 and LC2 because of 1) the inability of C1 to send data to both LC1 and LC2 at the same time; 2) clock skew between LC1 and LC2; and 3) the difference in the length of the scan chains of LC1 and LC2. Therefore, the scheduling problem exists in this distributed control scheme.

Since the scheduling problems for both control schemes can be modeled in the same way, only one of them will be considered.
For the rest of this chapter, the model under discussion is assumed to be the testing of board interconnect using the centralized control scheme shown in Figure 7.4(b). The theorems and procedures that lead to the generation of an optimal schedule are presented in the next section.

7.4 Optimal Test Scheduling Theorems

Throughout this section we shall use the following notation. Let $B$ be a board containing $n$ BS-chains, $t_1, \ldots, t_m$ be the test vectors for testing the interconnect on $B$, and $r_1, \ldots, r_m$ be the test results associated with $t_1, \ldots, t_m$, respectively. Each test vector $t_i$ is partitioned into $n$ segments $t_{i1}, \ldots, t_{in}$ in accordance with the $n$ BS-chains. Each vector segment is then applied to its associated BS-chain. The result $r_i$ consists of segments $r_{i1}, \ldots, r_{in}$ collected from the $n$ BS-chains. A BS-chain $j$ is scanned if $t_{ij}$ ($r_{ij}$) is to be applied (observed). A vector (result) segment is said to be required if it is in the test (result) vector and it has not yet been applied (observed).

Definition 1 A schedule for testing interconnects on a board $B$, denoted by $S(B)$, is a sequence of Scan functions that can apply all test vectors $t_1, \ldots, t_m$ and collect all test results $r_1, \ldots, r_m$.

The total number of shift operations of a schedule $S(B)$ is denoted by $N_S(B)$. This number is a good measure of the test time since most operations of a Scan are shift operations. This notation is used throughout the chapter. Also the argument $B$ will not be used if the meaning is clear.

Definition 2 A schedule $S$ is optimal if $N_S$ is minimal, i.e. given a schedule $S_1$ for $B$, $N_S \le N_{S_1}$.

Definition 3 For a pair of BS-chains $v_i, v_j$, a schedule $S$ has the scan order $v_i \succ v_j$ if for each test vector $t_k$ ($k = 1, \ldots, m$) that contains vector segments $t_{ki}$ and $t_{kj}$ for $v_i$ and $v_j$, respectively, $S$ applies $t_{ki}$ to $v_i$ before applying $t_{kj}$ to $v_j$.

Definition 4 Let $v_i$ and $v_j$ be two BS-chains on a board $B$.
If there is a net connecting $v_i$ and $v_j$ whose logic value can be set by a scan cell in $v_j$ and observed by a scan cell in $v_i$, then $v_i$ depends on $v_j$, denoted by $v_i D v_j$; otherwise, $v_i$ does not depend on $v_j$, denoted by $v_i \not D v_j$.

Definition 5 Let $X$ and $Y$ be two sets. $X + Y$ is the union of these two sets, $X - Y$ is the difference of these two sets and consists of all elements in $X$ which are not in $Y$.

Definition 6 A dependency graph $DG$ for a board $B$ is defined as a 3-tuple $(V, W, E)$, where $V$ is a set of nodes, $W$ is a set of labels associated with the nodes, and $E$ is a set of edges, such that (1) for each BS-chain $c$ in $B$, there is a corresponding node $v_c \in V$, (2) for each BS-chain $c$ in $B$, there is a $w_c \in W$ representing the length of the BS-chain $c$, and (3) for every pair of BS-chains $u, v$, there is a corresponding edge $e = (u, v) \in E$ if and only if $v D u$.

Two types of nodes exist in a $DG$, namely type-I and type-II nodes. Type-I nodes refer to those nodes that have a self-loop; type-II nodes refer to those nodes that do not have a self-loop. The set of all type-I nodes is denoted as $V_I$. The set of all type-II nodes is denoted as $V_{II}$. Thus $V = V_I + V_{II}$.

Definition 7 $DG' = (V', W', E')$ is the reduced form of $DG = (V, W, E)$ with respect to $U$, denoted as $DG' = DG \perp U$, where $V' = V - U$, $W' = W - \{w_i \mid v_i \in U\}$ and $E' = E - \{e \mid e = (v_i, v_j), (v_j, v_i) \text{ or } (v_i, v_k), \text{ where } v_i, v_k \in U, v_j \in V'\}$.

Let $V_I$ ($V_{II}$) be the set of all type-I (type-II) nodes in $DG$, $DG_I = DG \perp V_{II}$ and $DG_{II} = DG \perp V_I$.

Definition 8 $DG$ is type-I acyclic if $DG_I$ is acyclic when all self-loops are ignored. $DG$ is type-II acyclic if $DG_{II}$ is acyclic. $DG$ is type-acyclic if $DG$ is both type-I acyclic and type-II acyclic.

Definition 9 Let $DG$ be type-acyclic. A schedule $S$ is type-I proper if for every pair of nodes $v_i, v_j \in DG_I$ such that $v_j D v_i$, $S$ has the scan order $v_j \succ v_i$ (see Figure 7.5(a)).
A schedule $S$ is type-II proper if for every pair of nodes $v_i, v_j \in DG_{II}$ such that $v_i D v_j$, $S$ has the scan order $v_j \succ v_i$ (see Figure 7.5(c)). A schedule $S$ is proper if 1) it is both type-I proper and type-II proper and 2) for every pair of nodes $v_i \in V_I$, $v_j \in V_{II}$, $S$ has the scan order $v_i \succ v_j$ (see Figure 7.5(b)).

Note that if $DG$ is not type-acyclic, it is impossible to find a schedule that is proper. For example, the schedules in Figure 7.5(d) and (f) are not proper.

Let $DG$ be type-acyclic, $K$ be the cardinality of $V_I$, and $M$ be the cardinality of $V_{II}$. Let $T_1 = (v_1, v_2, \ldots, v_K)$ be a topological order for nodes in $DG_I$ when self-loops are ignored and $T_2 = (u_1, u_2, \ldots, u_M)$ be a topological order for nodes in $DG_{II}$.

Theorem 6 Let $DG$ be type-acyclic. A schedule $S$ is optimal iff $S$ is proper.

Proof: (if part) Let $S$ be a proper schedule. Suppose that $S$ is not optimal. There exists at least one node $v_i \in V$ such that either 1) $v_i \in V_I$ and the test vector segment for $v_i$ is applied twice for each test vector, or 2) $v_i \in V_{II}$ and there is at least one scan operation that does not contain both the required test vector segment and the required result segment. In case 1, there exists at least one node $v_j \in V_I$ such that $v_j D v_i$ and $S$ has the scan order $v_i \succ v_j$. This implies that $S$ is not type-I proper. In case 2, there exists at least one node $v_j \in V_{II}$ such that $v_j D v_i$ and $S$ has the scan order $v_j \succ v_i$. This implies that $S$ is not type-II proper. Both cases lead to the conclusion that $S$ is not a proper schedule. This contradicts the fact that $S$ is a proper schedule. This proves the if part of the theorem.

(only if part) Let $S$ be an optimal schedule. Suppose that $S$ is not a proper schedule. If $S$ is not type-I proper, then there exist $v_i, v_j \in V_I$ such that 1) $v_j D v_i$, and 2) $S$ has the scan order $v_i \succ v_j$. This means that $v_i$ must be scanned twice for each test vector, while all other $v_j$ need only be scanned once.
It is possible to find another schedule $S_1$ that has the same scan order as $S$ except that $S_1$ has the scan order $v_j \succ v_i$. This implies that $N_{S_1} < N_S$, thus $S$ is not optimal. This leads to a contradiction and thus proves the only if part of Theorem 6. □

Corollary 1 Let $DG$ be type-acyclic. If $S$ has the scan order $(v_K \succ \ldots \succ v_2 \succ v_1 \succ u_1 \succ u_2 \succ \ldots \succ u_M)$, then $S$ is optimal.

Proof: From Definition 9 it can be concluded that $S$ is proper. By Theorem 6, $S$ is an optimal schedule. □

Procedure P1, which is based on Corollary 1, constructs an optimal schedule $S$ for a type-acyclic $DG$. In this procedure, $T_1 = (v_1, v_2, \ldots, v_K)$ is a topological order for nodes in $DG_I$ when self-loops are ignored, and $T_2 = (u_1, u_2, \ldots, u_M)$ is a topological order for nodes in $DG_{II}$.

Procedure P1:
(1) For $i = 1$ to $m$ do
    (1.1) For $j = K$ down to 1 do $Scan(v_j, t_{i,v_j}, r_{i-1,v_j})$.
    (1.2) For $j = 1$ to $M$ do $Scan(u_j, t_{i,u_j}, r_{i,u_j})$.
(2) For $j = K$ down to 1 do $Scan(v_j, x, r_{m,v_j})$. □

Since the function $Scan(v_i, t_j, r_j)$ contains $w_i$ shift operations, from Procedure P1 it is obvious that

$$N_S = (m + 1) \sum_{v_i \in V_I} w_i + m \sum_{v_i \in V_{II}} w_i = \sum_{v_i \in V_I} w_i + m \sum_{v_i \in V} w_i \tag{7.1}$$

$$N_S = N_{ac} \tag{7.2}$$

Lemma 7 [Lower Bound] Let $S$ be a schedule for $DG$. $N_S \ge LB = m \sum_{v_i \in V} w_i$.

Proof: From Equation (7.1) it is obvious that $N_S = m \sum_{v_i \in V} w_i$ when $V = V_{II}$ and $DG$ is acyclic. For this case a schedule can be found such that each scan chain is scanned exactly once for each test vector. It thus follows that a schedule $S$ for any other $DG$ has a greater or equal number of scan operations. □

If a given $DG$ is not type-acyclic, then Theorem 6 cannot be applied. This type of $DG$ is dealt with next.

Definition 10 A type-I cycle in $DG_I$ is a directed cycle that consists of two or more type-I nodes. A type-II cycle in $DG_{II}$ is a directed cycle that consists of two or more type-II nodes. A cycle $C$ in $DG_I$ ($DG_{II}$) is broken if a node $v \in C$ is removed from $DG_I$ ($DG_{II}$).
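Procedure P1 and the shift count of Equation (7.1) can be sketched in a few lines. This is an illustrative sketch only: the chain names and lengths are hypothetical, Scan is encoded as a `(chain, vector_index, result_index)` triple with `None` standing for a don't-care argument, and the function names `procedure_p1` and `shift_count` are not from the original text.

```python
def procedure_p1(t1, t2, m):
    """Build the schedule of Procedure P1.

    t1: topological order of the type-I chains (self-loops ignored).
    t2: topological order of the type-II chains.
    m:  number of test vectors.
    """
    schedule = []
    for i in range(1, m + 1):
        prev = i - 1 if i > 1 else None   # no result exists before vector 1
        for v in reversed(t1):            # step (1.1): j = K down to 1
            schedule.append((v, i, prev))
        for u in t2:                      # step (1.2): j = 1 to M
            schedule.append((u, i, i))
    for v in reversed(t1):                # step (2): flush the last results
        schedule.append((v, None, m))
    return schedule


def shift_count(schedule, w):
    """Each Scan on a chain costs as many shifts as the chain is long."""
    return sum(w[chain] for chain, _, _ in schedule)


# Hypothetical board: two type-I chains and one type-II chain.
w = {"v1": 40, "v2": 30, "u1": 20}
m = 3
s = procedure_p1(["v1", "v2"], ["u1"], m)
n_s = shift_count(s, w)

# Right-hand side of Equation (7.1): sum over V_I plus m times sum over V.
n_ac = (w["v1"] + w["v2"]) + m * sum(w.values())
print(n_s, n_ac)
```

The type-I chains are visited in reverse topological order so that each one captures its previous result before any chain driving it is reloaded, which is the scan order required by Corollary 1.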
Removing $v$ from $DG_I$ is represented as $DG_I \perp \{v\}$ (Definition 7).

In a schedule $S$, the removal of a node $v_i$ from $DG_I$ is achieved by scanning the vector segment into chain $i$ twice for each test vector. The first scan operation loads a vector segment into the scan chain. The second one gets the result segment out of the scan chain while loading the same vector segment back into it. Thus node $v_i$ provides a correct test vector segment throughout the application of the rest of the vector segments. Hence it can be removed from $DG_I$.

The removal of a node $v_i$ from $DG_{II}$ is achieved by putting $v_i$ into the same category as type-I nodes when applying the vector segment for $v_i$. This enables the vector segment in scan chain $i$ to remain valid throughout the application of the remaining segments of the current test vector. Thus node $v_i$ can be treated the same as a type-I node.

Let $DG_I = DG \perp V_{II}$ and $DG_{II} = DG \perp V_I$ be two subgraphs of $DG$.

Definition 11 If $DG = (V, W, E)$ is cyclic and $DG' = DG \perp Z$ is acyclic when self-loops are ignored, then $Z$ is called a feedback vertex set (FVS) of the $DG$.

For example, in Figure 7.5(d), $Z = \{2\}$ is a FVS of the $DG$.

Definition 12 Given a $DG = (V, W, E)$, a set of nodes $Z \subseteq V$ is a minimal FVS of $DG$ if for any $Z' \subseteq V$ that is a FVS of $DG$, $\sum_{v_i \in Z} w_i \le \sum_{v_i \in Z'} w_i$.

Definition 13 Let $Z_I \subseteq V_I$, $Z_{II} \subseteq V_{II}$ be two sets of nodes in $DG$. If (1) $DG_I$ is cyclic and $DG_I \perp Z_I$ is acyclic when self-loops are ignored, and (2) $DG_{II}$ is cyclic and both $DG_{II} \perp Z_{II}$ and $DG \perp (V - (Z_{II} + V_I - Z_I))$ are acyclic, then $(Z_I, Z_{II})$ is called a joint-FVS of $DG$.

For example, in Figure 7.6(a), $Z_I = \{6\}$ and $Z_{II} = \{1\}$ is a joint-FVS of the $DG$.

Definition 14 Given a $DG$, $(Z_I, Z_{II})$ is called a minimal joint-FVS of $DG$ if (1) $(Z_I, Z_{II})$ is a joint-FVS of $DG$, and (2) for any $(Z'_I, Z'_{II})$ that is a joint-FVS of $DG$, $(m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i \le (m - 1) \sum_{v_i \in Z'_I} w_i + \sum_{v_i \in Z'_{II}} w_i$.

Let $(Z_I, Z_{II})$ be a joint-FVS of $DG$.
Let $T_1 = (u_1, u_2, \ldots, u_K)$ be a topological order for nodes in $DG \perp (V - (Z_{II} + V_I - Z_I))$ when self-loops are ignored, $T_2 = (v_1, v_2, \ldots, v_M)$ be a topological order for nodes in $DG \perp (V - (V_{II} - Z_{II}))$, and $T_3 = (y_1, y_2, \ldots, y_{z_1})$ be an arbitrary order for nodes in $DG \perp (V - Z_I)$. Also let $K$, $M$, $z_1$ be the cardinality of $Z_{II} + V_I - Z_I$, $V_{II} - Z_{II}$, and $Z_I$, respectively. A schedule $S$ for $DG$ can be constructed as follows.

Procedure P2:
(1) For $i = 1$ to $m$ do
    (1.1) For $j = K$ down to 1 do $Scan(u_j, t_{i,u_j}, r_{i-1,u_j})$.
    (1.2) For $j = 1$ to $z_1$ do $Scan(y_j, t_{i,y_j}, x)$.
    (1.3) For $j = 1$ to $M$ do $Scan(v_j, t_{i,v_j}, r_{i,v_j})$.
    (1.4) For $j = 1$ to $z_1$ do $Scan(y_j, t_{i,y_j}, r_{i,y_j})$.
(2) For $j = K$ down to 1 do $Scan(u_j, x, r_{m,u_j})$. □

Since the function $Scan(v_i, t_j, r_j)$ contains $w_i$ shift operations, it follows that

$$N_S = (m + 1) \sum_{v_i \in (Z_{II} + V_I - Z_I)} w_i + 2m \sum_{v_i \in Z_I} w_i + m \sum_{v_i \in (V_{II} - Z_{II})} w_i$$

$$= (m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i + \sum_{v_i \in V_I} w_i + m \sum_{v_i \in V} w_i$$

$$= (m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i + N_{ac} \tag{7.3}$$

Note that $N_{ac} = \sum_{v_i \in V_I} w_i + m \sum_{v_i \in V} w_i$ is independent of the selection of $(Z_I, Z_{II})$. If $DG$ is type-acyclic then $Z_I = Z_{II} = \emptyset$. In this case Equation (7.3) reduces to Equation (7.1).

Theorem 7 Let $(Z_I, Z_{II})$ be a minimal joint-FVS of $DG$. If $S$ is a schedule constructed by Procedure P2, then $S$ is optimal.

Proof: The schedule $S$ applies (collects) all required test vector (result) segments, so it is indeed a schedule. We still need to show that $S$ is optimal. It is obvious that $DG' = DG \perp (Z_I + Z_{II})$ is type-acyclic. Since by construction $S$ is proper with respect to $DG'$, $S$ is optimal with respect to $DG'$ (Theorem 6). Next we show that $S$ is optimal with respect to $DG$. From Definition 14 it is clear that for any schedule $S_1$ that removes another joint-FVS $(Z'_I, Z'_{II})$ of $DG$, $(m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i \le (m - 1) \sum_{v_i \in Z'_I} w_i + \sum_{v_i \in Z'_{II}} w_i$. Thus $N_S \le N_{S_1}$. From Definition 2, we conclude that $S$ is an optimal schedule.
□

Corollary 2 [Upper Bound] If $S$ is optimal then $N_S \le UB = 2m \sum_{v_i \in V} w_i$.

Proof: From Equation (7.3), it is obvious that $N_S$ is maximal when $Z_I = V$, i.e., all nodes in $DG$ are type-I and they must all be removed. But $(m + 1) \sum_{v_i \in (Z_{II} + V_I - Z_I)} w_i + 2m \sum_{v_i \in Z_I} w_i + m \sum_{v_i \in (V_{II} - Z_{II})} w_i \le 2m \sum_{v_i \in V} w_i$ is true for any $Z_I$ and $Z_{II}$. Thus $N_S \le 2m \sum_{v_i \in V} w_i$. □

Note that one can always construct a schedule for any $DG$ regardless of its connectivity. For example, if $Z_I = V$ then the scan order is not important. The problem is that the schedule constructed may not be optimal. Theorem 7 provides a way to find an optimal schedule for a board modeled as a cyclic $DG$. Since an acyclic $DG$ can be viewed as a cyclic $DG$ with an empty FVS, Theorem 7 is applicable to acyclic $DG$s as well. In this case, both Procedure P1 and P2 produce the same schedule.

Example: Optimal schedule for a cyclic $DG$. Figure 7.6(a) shows a cyclic $DG$ with 7 nodes. There are 4 type-I nodes, i.e., $V_I = \{4, 5, 6, 7\}$, and 3 type-II nodes, i.e., $V_{II} = \{1, 2, 3\}$. The minimal joint-FVS $(Z_I, Z_{II})$ found is $Z_I = \{6\}$ and $Z_{II} = \{1\}$. Using Procedure P2, an optimal schedule can be constructed (see Figure 7.6(c)). $N_S$ equals $(2 - 1) \cdot 90 + 50 + (80 + 90 + 100 + 110) + 2 \cdot 560 = 1640$. □

7.5 An Algorithm for Generating Schedules

A Test Scheduling Algorithm (TSA) based on Theorem 7 is described next. This algorithm can find an optimal schedule for an arbitrary $DG$. Because of its complexity ($O(n \cdot 2^n)$), it may not be suitable for problems where $n > 15$.
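The shift count of the Figure 7.6 example can be reproduced with a short sketch of Procedure P2. The chain lengths, $m = 2$, and the orders $T_1 = (1, 4, 7, 5)$ and $T_2 = (2, 3)$ are the values listed in Figure 7.6(b); the function name `procedure_p2` and the triple encoding `(chain, vector_index, result_index)`, with `None` for a don't-care argument, are assumptions made for illustration.

```python
def procedure_p2(t1, t2, t3, m):
    """Build the schedule of Procedure P2.

    t1: topological order of Z_II + V_I - Z_I (self-loops ignored).
    t2: topological order of V_II - Z_II.
    t3: arbitrary order of Z_I (these chains are scanned twice per vector).
    """
    s = []
    for i in range(1, m + 1):
        prev = i - 1 if i > 1 else None
        s += [(u, i, prev) for u in reversed(t1)]  # step (1.1)
        s += [(y, i, None) for y in t3]            # step (1.2): load only
        s += [(v, i, i) for v in t2]               # step (1.3)
        s += [(y, i, i) for y in t3]               # step (1.4): reload and read
    s += [(u, None, m) for u in reversed(t1)]      # step (2)
    return s


# Values from Figure 7.6(b).
w = {1: 50, 2: 60, 3: 70, 4: 80, 5: 100, 6: 90, 7: 110}
m = 2
schedule = procedure_p2(t1=[1, 4, 7, 5], t2=[2, 3], t3=[6], m=m)
n_s = sum(w[chain] for chain, _, _ in schedule)

# Equation (7.3) with Z_I = {6}, Z_II = {1}, V_I = {4, 5, 6, 7}.
n_s_formula = (m - 1) * w[6] + w[1] + (w[4] + w[5] + w[6] + w[7]) + m * sum(w.values())
print(len(schedule), n_s, n_s_formula)
```

Both the explicit schedule (20 Scan functions) and the closed form give 1640 shift operations, matching the example.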
For large problems the user can direct the TSA to find a sub-optimal schedule at a reduced complexity ($O(n \cdot MAX(n, e))$), thereby reducing computation time. This can be done by replacing Procedure Find-Min-Joint-FVS in step (3) by Procedure Find-Joint-FVS.

Figure 7.6: Example: Deriving an optimal schedule. (a) The cyclic $DG$. (b) Parameters: $m = 2$, $n = 7$; $w_1 = 50$, $w_2 = 60$, $w_3 = 70$, $w_4 = 80$, $w_5 = 100$, $w_6 = 90$, $w_7 = 110$; $V_I = \{4, 5, 6, 7\}$, $V_{II} = \{1, 2, 3\}$; $Z_I = \{6\}$, $Z_{II} = \{1\}$; $V_I - Z_I = \{4, 5, 7\}$, $V_{II} - Z_{II} = \{2, 3\}$; $T_1 = (1 \succ 4 \succ 7 \succ 5)$, $T_2 = (2 \succ 3)$. (c) Test schedule $S$:
    $Scan(5, t_{1,5}, -)$
    $Scan(7, t_{1,7}, -)$
    $Scan(4, t_{1,4}, -)$
    $Scan(1, t_{1,1}, -)$
    $Scan(6, t_{1,6}, -)$
    $Scan(2, t_{1,2}, r_{1,2})$
    $Scan(3, t_{1,3}, r_{1,3})$
    $Scan(6, t_{1,6}, r_{1,6})$
    $Scan(5, t_{2,5}, r_{1,5})$
    $Scan(7, t_{2,7}, r_{1,7})$
    $Scan(4, t_{2,4}, r_{1,4})$
    $Scan(1, t_{2,1}, r_{1,1})$
    $Scan(6, t_{2,6}, -)$
    $Scan(2, t_{2,2}, r_{2,2})$
    $Scan(3, t_{2,3}, r_{2,3})$
    $Scan(6, t_{2,6}, r_{2,6})$
    $Scan(5, -, r_{2,5})$
    $Scan(7, -, r_{2,7})$
    $Scan(4, -, r_{2,4})$
    $Scan(1, -, r_{2,1})$

Algorithm TSA:
Input: A $DG = (V, W, E)$, $V_I$, $V_{II}$ and a test set $t_1, \ldots, t_m$.
Output: A schedule $S$.
Method:
(1) For $i = 1$ to $m$ do the following:
    Partition $t_i$ into $t_{ik}$ ($k = 1, \ldots, n$) as follows. Let the first $w_1$ bits of $t_i$ be $t_{i1}$, let the next $w_2$ bits of $t_i$ be $t_{i2}$, ..., let the next $w_j$ bits of $t_i$ be $t_{ij}$, ..., let the last $w_n$ bits of $t_i$ be $t_{in}$.
(2) $DG_I = DG \perp V_{II}$, $DG_{II} = DG \perp V_I$.
(3) Run Procedure Find-Min-Joint-FVS (or Procedure Find-Joint-FVS). Let $(Z_I, Z_{II})$ be the derived output.
(4) Let $DG' = DG \perp (V - (Z_{II} + V_I - Z_I))$.
(5) Find the topological order $T_1$ of $DG'$.
(6) Let $DG'' = DG \perp (V - (V_{II} - Z_{II}))$.
(7) Find the topological order $T_2$ of $DG''$.
(8) Generate a schedule $S$ using Procedure P2. □

A problem is modeled as a $DG$ in TSA. Test vectors are partitioned into segments in step (1). The procedure Find-Min-Joint-FVS is used to find a minimal joint-FVS $(Z_I, Z_{II})$ for $DG$ in step (3). This procedure is described in more detail later on. In step (4) the $DG'$, which is acyclic, is derived.
The topological order of $DG'$ is found in step (5). The method listed in [67] is used to derive the topological order. In step (6) the $DG'' = DG \perp (V - (V_{II} - Z_{II}))$, which is acyclic, is derived. The topological order of $DG''$ is derived in step (7). Finally, Procedure P2 is used to derive a schedule for $DG$. According to Theorem 7 the derived schedule is optimal if the $(Z_I, Z_{II})$ found in step (3) is a minimal joint-FVS of $DG$.

Finding $Z_I$, $Z_{II}$ is still a problem. For clarity, the problem is restated as follows.

Problem: Find a minimal joint-FVS. Given a cyclic graph $DG = (V, W, E)$, find two sets of nodes $Z_I \subseteq V_I$, $Z_{II} \subseteq V_{II}$ such that 1) $DG' = (DG \perp V_{II}) \perp Z_I$ is acyclic, 2) $DG'' = (DG \perp V_I) \perp Z_{II}$ is acyclic, 3) $DG''' = DG \perp (V - (V_I - Z_I + Z_{II}))$ is acyclic, and 4) $(m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i$ is minimal.

This problem can be shown to be NP-Complete. Since 1) the problem of finding a minimal FVS is a special case of the problem of finding a minimal joint-FVS, and 2) the problem of finding a minimal FVS is NP-Complete, it can be concluded that the problem of finding a minimal joint-FVS is NP-Complete by using the restriction technique described in [24]. For a small problem, it is possible to compute the optimal solution exhaustively. The following procedure finds a minimal joint-FVS for a given $DG$.

Procedure Find-Min-Joint-FVS:
Input: $DG = (V, W, E)$, $V_I$ and $V_{II}$.
Output: A minimal joint-FVS $(Z_I, Z_{II})$.
Method:
(1) Let $n_1$ ($n_2$) be the cardinality of $V_I$ ($V_{II}$). Let $Z' = Z'' = \emptyset$, $min$ = a large number.
(2) For $i = 0$ to $n_1$ do
  (2.1) For all $C^{n_1}_i$ combinations do
    (2.1.1) Generate the next combination ($U_1$).
    (2.1.2) If $DG_I \perp U_1$ cyclic then goto (2.1.1)
    (2.1.3) else for $j = 0$ to $n_2$ do
      (2.1.3.1) For all $C^{n_2}_j$ combinations do
        (2.1.3.1.1) Generate the next combination ($U_2$).
        (2.1.3.1.2) If $DG_{II} \perp U_2$ cyclic then goto (2.1.3.1.1)
        (2.1.3.1.3) else if $DG \perp (V - (V_I - U_1 + U_2))$ cyclic then goto (2.1.3.1.1)
        (2.1.3.1.4) else if $min < (m - 1) \sum_{v_i \in U_1} w_i + \sum_{v_i \in U_2} w_i$ then goto (2.1.3.1.1)
        (2.1.3.1.5) else $min = (m - 1) \sum_{v_i \in U_1} w_i + \sum_{v_i \in U_2} w_i$, $Z' = U_1$, $Z'' = U_2$.
(3) $Z_I = Z'$, $Z_{II} = Z''$. □

The complexity of Procedure Find-Min-Joint-FVS is derived next. In the worst case $n_1 + n_2$ operations are needed in step (2.1.3.1.4), which can be repeated $2^{n_2}$ times in one pass of step (2.1.3). Also, step (2.1.3) can be repeated $2^{n_1}$ times in the worst case. So the complexity of Procedure Find-Min-Joint-FVS is $O(2^{n_1 + n_2}(n_1 + n_2))$ or $O(n 2^n)$. Since the most time consuming step in Algorithm TSA is Procedure Find-Min-Joint-FVS, the complexity of Algorithm TSA is $O(n 2^n)$.

For a large problem it is not computationally feasible to use Procedure Find-Min-Joint-FVS. Therefore, a heuristic procedure that can find a good solution (not necessarily optimal) in a reasonable amount of time is needed. A version of Procedure Find-Min-Joint-FVS that can be used in such cases is described next.

Procedure Find-Joint-FVS:
Input: $DG = (V, W, E)$, $V_I$ and $V_{II}$.
Output: A joint-FVS $(Z_I, Z_{II})$.
Method:
(1) Let $n_1$ ($n_2$) be the number of nodes in $V_I$ ($V_{II}$).
(2) Let $Z'_{II}$ be a FVS of $DG_{II}$ (use Procedure Find-FVS).
(3) $DG_{III} = DG \perp (V_{II} - Z'_{II})$.
(4) Let $Z'_I$ be a FVS of $DG_{III}$ (use Procedure Find-FVS).
(5) Let $Z_I = Z'_I$ and $Z_{II} = Z'_{II}$. □

The Procedure Find-Joint-FVS first uses Procedure Find-FVS to derive a FVS ($Z'_{II}$) for $DG_{II}$, then it derives a FVS ($Z'_I$) for $DG \perp (V_{II} - Z'_{II})$. Since $Z'_I$ and $Z'_{II}$ are derived separately, a minimal joint-FVS cannot be guaranteed. Furthermore, Procedure Find-FVS uses a greedy strategy to find a FVS. This means that whenever a node must be removed from a cycle, the one with least weight is selected. The Procedure Find-FVS is presented below.
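Before turning to Find-FVS, the exhaustive search of Procedure Find-Min-Joint-FVS can be sketched as follows. This is a minimal sketch under stated assumptions, not the original implementation: the graph is given as a list of `(driver, receiver)` edges, type-I nodes are detected by their self-loops, acyclicity is tested with Kahn's algorithm (a substitute for whatever method the original used), ties keep the first candidate found, and the three-chain example graph is hypothetical.

```python
from itertools import combinations


def is_acyclic(nodes, edges):
    """Kahn's algorithm on the subgraph induced by `nodes`, ignoring self-loops."""
    nodes = set(nodes)
    sub = [(u, v) for u, v in edges if u in nodes and v in nodes and u != v]
    indeg = {n: 0 for n in nodes}
    for _, v in sub:
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in sub:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return removed == len(nodes)


def subsets(items):
    """All subsets of `items`, smallest first (steps (2) and (2.1.3))."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def find_min_joint_fvs(edges, w, m):
    v1 = {u for u, v in edges if u == v}   # type-I: nodes with a self-loop
    v2 = set(w) - v1                       # type-II: all remaining nodes
    best = None
    for z1 in subsets(v1):
        if not is_acyclic(v1 - z1, edges):          # condition 1
            continue
        for z2 in subsets(v2):
            if not is_acyclic(v2 - z2, edges):      # condition 2
                continue
            if not is_acyclic((v1 - z1) | z2, edges):  # condition 3
                continue
            cost = (m - 1) * sum(w[n] for n in z1) + sum(w[n] for n in z2)
            if best is None or cost < best[0]:      # condition 4
                best = (cost, z1, z2)
    return best


# Hypothetical board: chains 1 and 2 drive each other, chain 3 drives itself.
edges = [(1, 2), (2, 1), (3, 3)]
w = {1: 50, 2: 60, 3: 70}
result = find_min_joint_fvs(edges, w, m=2)
print(result)
```

For this small graph the only cycle is between the two type-II chains, so the cheapest joint-FVS removes the shorter of the two and leaves the type-I set untouched. The nested enumeration makes the $O(n 2^n)$ growth visible: every subset of $V_I$ is paired with every subset of $V_{II}$.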
A strongly connected component (scc) is a set of nodes that have directed edges among them such that at least one directed path exists from each node to every other node. One or more cycles exist in any scc containing more than one node. If no cycle exists in a directed graph, then there is no scc containing more than one node in this graph. At least one node must be removed from an scc in order to break a cycle.

Procedure Find-FVS:
Input: A DG.
Output: Z, which is a FVS of the DG.
Method:
(a) Mark all nodes in V as “white”, and let Z be an empty set.
(b) Call H(DG).
(c) Put all nodes marked as “black” into Z.

Procedure H(DG):
(1) Find all scc of DG. Let SCC = {all scc found with |scc| > 1}.
(2) If |SCC| = 0, then RETURN.
(3) For each scc $\in$ SCC do the following:
  (3.1) Pick a node $v_i \in$ scc, such that $w_i \le w_j, \forall v_j \in$ scc.
  (3.2) Mark $v_i$ as “black”.
  (3.3) Remove $v_i$ from scc, i.e., scc' = scc $\perp \{v_i\}$.
  (3.4) Call H(scc'). □

In step (1) the algorithm given in [4] is used to find all scc of a DG. Procedure H calls itself recursively. The major function of Procedure H is to find all scc of a graph. A node having a minimal weight is removed from each scc. The remaining graph is again checked for scc. More nodes are removed if more scc are found. This process continues until no more scc containing more than one node are found. The solution set Z consists of all the nodes removed during the process, i.e., those nodes that are marked as “black”.

The complexity of Procedure Find-FVS is equal to that of Procedure H. The complexity of step (1) in Procedure H is $O(MAX(n, e))$ since the procedure described in [4] is used. In the worst case, only one node is removed from the scc in step (3). If there are n nodes in the DG, the complexity of Procedure H is $O(n \cdot MAX(n, e))$, i.e., the complexity of Procedure Find-FVS is $O(n \cdot MAX(n, e))$. The complexity of Procedure Find-Joint-FVS is $O(n_1 \cdot MAX(n_1, e_1)) + O(n_2 \cdot MAX(n_2, e_2))$, or $O(n \cdot MAX(n, e))$.
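The greedy strategy of Procedures Find-FVS and H can be sketched in Python as follows. This is a minimal illustration under assumptions, not the original code: the graph is held as a hypothetical successor dictionary `succ`, the names `sccs` and `find_fvs` are invented for the example, and Kosaraju's two-pass algorithm stands in for the scc algorithm of [4]. As in Procedure H, components with a single node are ignored, so self-loops are assumed absent.

```python
def sccs(nodes, succ):
    """Strongly connected components of the subgraph induced by `nodes`.

    Kosaraju's algorithm is used here as a stand-in for the method of [4].
    """
    nodes = set(nodes)
    order, seen = [], set()

    def visit(u):
        # First pass: record nodes in order of DFS completion.
        seen.add(u)
        for v in succ.get(u, ()):
            if v in nodes and v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)

    # Predecessor lists of the induced subgraph (the reversed graph).
    pred = {u: set() for u in nodes}
    for u in nodes:
        for v in succ.get(u, ()):
            if v in nodes:
                pred[v].add(u)

    comps, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        # Second pass: everything reachable in the reversed graph is one scc.
        comp, stack = set(), [u]
        assigned.add(u)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in pred[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def find_fvs(nodes, succ, weight):
    """Greedy FVS: repeatedly drop the lightest node of each scc with |scc| > 1."""
    black = set()                      # nodes marked "black" form Z

    def h(sub):
        for comp in sccs(sub, succ):
            if len(comp) > 1:                                # steps (1)-(3)
                v = min(comp, key=lambda x: weight[x])       # step (3.1)
                black.add(v)                                 # step (3.2)
                h(comp - {v})                                # steps (3.3)-(3.4)

    h(set(nodes))
    return black
```

With `succ = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["c"]}` and weights `{"a": 4, "b": 1, "c": 2, "d": 3}`, the sketch first removes b from the single four-node scc and then c from the remaining scc {c, d}, giving Z = {b, c} with total weight 3. Removing c alone (weight 2) would already break both cycles, which shows why the greedy result is not necessarily minimal.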
With Procedure Find-Joint-FVS used in step (3) of Algorithm TSA, the complexity of Algorithm TSA is therefore $O(n \cdot MAX(n, e))$. In conclusion, the complexity of Algorithm TSA is $O(n 2^n)$ when an optimal solution is required, and $O(n \cdot MAX(n, e))$ when a sub-optimal solution can be used.

TSA has been applied to several examples. The results are shown in Table 7.1(a). Each example is modeled as a DG and is described by a 4-tuple (n, e, w, m), where n is the number of nodes, e is the number of edges, w is the total weight of the DG, and m is the number of test vectors. The connectivity of the DG is not shown. Column UB indicates the worst case situation, i.e., where each node is scanned twice to make sure that test results correspond to the appropriate test vectors. The columns Heuristic and Optimal indicate the values for schedules derived by using Procedures Find-Joint-FVS and Find-Min-Joint-FVS, respectively. All the values found by Procedure Find-Min-Joint-FVS are minimal. For Examples 1, 2, 3, 4, 7 and 8, Procedure Find-Joint-FVS also finds the optimal solution. Table 7.1(b) shows the results as percentages. The saving SS is calculated from Equation (4).

$$SS = 1 - \frac{N_s}{UB} = 1 - \frac{(m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i + \sum_{v_i \in V_I} w_i + m \sum_{v_i \in V} w_i}{2m \sum_{v_i \in V} w_i} \qquad (4)$$

(a)
Ex.   n    m    e       w   Heuristic   Optimal       UB
1     3   15    1     500       7,500     7,500   15,000
2     3   20    2     700      14,700    14,700   28,000
3     3   20    3     300       8,100     8,100   12,000
4     4   20    6     355       8,880     8,880   14,200
5     5   30    8     500      23,562    21,068   30,000
6     6   15   11   1,900      40,200    35,100   57,000
7     5   25   12   1,350      39,420    39,420   67,500
8     4   30   11   1,450      60,900    60,900   87,000
9     7   20   15     830      25,505    22,845   33,200

(b)
Ex.   Heuristic   Optimal
1          50        50
2          47.5      47.5
3          32        32
4          37        37
5          21        30
6          29        38
7          42        42
8          30        30
9          23        31

Table 7.1: Typical results; (a) $N_s$, (b) saving SS (in %) on test time.

Note that SS cannot exceed 50 percent.
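As a quick arithmetic check, the savings in Table 7.1(b) can be recomputed from the $N_s$ and UB columns of Table 7.1(a) using $SS = 1 - N_s / UB$. The Python sketch below does this; the tuple layout and the names `TABLE_7_1A`, `saving` and `savings_percent` are illustrative choices, and the numbers are copied from the table.

```python
# Rows of Table 7.1(a): (example, m, w, heuristic N_s, optimal N_s, UB).
TABLE_7_1A = [
    (1, 15, 500, 7500, 7500, 15000),
    (2, 20, 700, 14700, 14700, 28000),
    (3, 20, 300, 8100, 8100, 12000),
    (4, 20, 355, 8880, 8880, 14200),
    (5, 30, 500, 23562, 21068, 30000),
    (6, 15, 1900, 40200, 35100, 57000),
    (7, 25, 1350, 39420, 39420, 67500),
    (8, 30, 1450, 60900, 60900, 87000),
    (9, 20, 830, 25505, 22845, 33200),
]


def saving(ns, ub):
    """Saving SS = 1 - N_s / UB from Equation (4), as a fraction."""
    return 1.0 - ns / ub


def savings_percent():
    """Return {example: (heuristic %, optimal %)} recomputed from the table."""
    return {
        ex: (100 * saving(heur, ub), 100 * saving(opt, ub))
        for ex, m, w, heur, opt, ub in TABLE_7_1A
    }


def ub_matches_formula():
    """Check that every UB entry equals 2 * m * w, the denominator of Equation (4)."""
    return all(ub == 2 * m * w for ex, m, w, heur, opt, ub in TABLE_7_1A)


if __name__ == "__main__":
    for ex, (heur, opt) in savings_percent().items():
        print(f"Example {ex}: heuristic {heur:.1f}%, optimal {opt:.1f}%")
```

Every UB entry equals 2mw, and the recomputed savings agree with Table 7.1(b) to the nearest percent, except that Example 3 evaluates to 32.5 percent where the table lists 32.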
The 50 percent bound on SS holds because at least one shift operation is required for applying each bit of the test data, and there are $m \sum_{v_i \in V} w_i$ bits in the test data. So SS is limited to $1 - \frac{m \sum_{v_i \in V} w_i}{2m \sum_{v_i \in V} w_i} = 0.5$. Since the connectivity of these examples differs, it is misleading to compare one example with another. However, it is clear that the smaller the value of $(m - 1) \sum_{v_i \in Z_I} w_i + \sum_{v_i \in Z_{II}} w_i + \sum_{v_i \in V_I} w_i$, the larger the saving.

7.6 An Extension to Full Scan

The TSA Algorithm can also be used to schedule tests for a chip designed with full scan capability. Figure 7.7(a) shows a chip designed with the boundary scan architecture. Two scan chains are used to test the circuit C. Scan chain 1 is the boundary scan register. Scan chain 2 consists of all internal scan cells. Figure 7.7(b) shows a directed graph that can be used to schedule tests. Test vectors for C are applied and observed via these scan chains. Once a test vector is generated, it must be partitioned into two segments before being applied. The Scan function described earlier can be used to apply test vectors to these scan chains. It is necessary to find a schedule which can properly apply vectors to test C in minimal time. This problem belongs to this new class of scheduling problems, and can be solved using Algorithm TSA.

Figure 7.7: Testing a circuit via two scan chains; (a) block diagram, (b) graph model.

Chapter 8

Conclusions and Future Research

In this work a design-for-test tool, called BOLD, has been described. BOLD is applicable at the chip, module, subsystem and system level. Using a hierarchical design methodology, BOLD can deal with both hardware and software test issues and achieves a very high degree of testability and maintainability. A system designed using BOLD can support fault detection and isolation in a timely and cost effective manner. Hence system availability is increased and the hardware life-cycle costs are decreased.
Various issues related to the design of a hierarchically testable and maintainable system are dealt with in BOLD. These issues include the support of both test hardware and software. In particular, the following material has been described in depth: (1) design of on-chip and module test controllers; (2) definitions of test languages and the support of test synthesizers for these languages; (3) algorithms to evaluate various tradeoffs between test time and controller complexity; and (4) algorithms that lead to enhanced interconnect testing.

8.1 On-Chip Test Controller

The design of the on-chip test controllers presented is based on the boundary scan architecture so as to conform to the IEEE Std. 1149.1. These designs include both bus-dependent and autonomous controllers. A bus-dependent controller requires assistance from the test bus during the entire test process. Controllers for two commonly used kernels and a more complex kernel have been designed. When designing such a controller, there exists a problem of mapping the test bus states to the test control signals of the kernel. A mapping algorithm has been provided to solve this problem. Using this algorithm, a controller can be designed that requires a minimal number of instructions.

Two design styles (serial or parallel) for the autonomous on-chip controllers have been presented. A serial controller can test many kernels in sequence. It requires less hardware overhead than a parallel controller. Designs for both the hard-wired and microprogrammed serial controllers have been illustrated. A parallel controller can test many kernels simultaneously and reduces the overall chip test time at the expense of extra hardware. Three techniques have been illustrated in the design of a parallel controller for testing many scan-type kernels. These techniques, referred to as interleaved design, tree-of-counters design, and counter sharing design, reduce the hardware overhead by sharing resources.
A comparison of the performance of these three designs shows that no one design is always better than the other two.

8.2 Module Test Controller

The design of a family of universal module test controllers was presented. The module controllers differ by the test programs they execute, the number of test busses they control, and the expansion units they employ. One important aspect of their design is the use of a test channel. A test channel contains a boundary scan master that can communicate with an on-chip controller over the boundary scan bus. In addition, test vectors can be generated and results can be compressed in the test channel. The processor used in the module controller can control the test channel by reading from or writing to its internal registers. Once initiated by a processor, a test channel can completely control a boundary scan bus, thus eliminating the need for the processor to deal with detailed bus activities. The test process can thus be represented as high level processor instructions.

A prototype of the module test controller has been built and tested. The prototype is based on an IBM AT computer and a test channel built using the Actel field programmable gate array technology. Compared to most conventional automatic test equipment, the cost of the proposed module test controller is much lower and the performance is far superior.

8.3 Test Program Synthesis

One of the major contributions of this work is the synthesis of test programs for the system under test. The synthesis process starts with the preparation of test description files, represented in high level languages, for each chip and module. Synthesis software has been provided to translate these descriptions into test programs, which can then be used to drive the test controllers embedded in various hardware units. These controllers can then control the testing process of their associated hardware units. The entire system is therefore tested.
The languages have been designed in such a way that test files can be understood by a designer with little or no knowledge of testing. Major advantages of this approach include the reduced time in developing test software, and the increased maintainability and reliability of the test software since it can be checked and synthesized quickly.

8.4 Controller Minimization

The time required to test a module is related to the complexity of the module and chip test controllers. In general, the more complex the controllers are, the shorter the module test time. Tradeoffs can be made so that the module test time is bounded and the overall controller complexity is minimized. The test program synthesis technique provides a way to calculate the time required to test a module or chip. This ability facilitates the design tradeoff between the module test time and the overall controller complexity. Two approaches that can be used to determine the complexity for each controller have been presented. Both approaches can minimize the overall controller complexity while keeping the module test time bounded.

8.5 Interconnect Test Generation

The results presented for test and diagnosis of interconnects are superior to all previous approaches in that all diagnosable faults can be identified. It has been shown that there exist diagnosable faults in a wiring network which cannot be identified by any of the previous approaches, including the complementary counting sequence [64], the independent set [16], the diagonally independent sequence [34], the W-Test Algorithm [25], the C-Test Algorithm [34] and Method 3 in [16]. The faults that lead to the deficiencies in these previous approaches are summarized and explained. Various levels of diagnostic resolution have been defined. In particular, a diagnostic level where all diagnosable faults are identified is defined.
Two maximal diagnosis conditions have been presented and proved to be both necessary and sufficient for identifying all diagnosable faults. A property called set-cover independent is introduced. A test sequence that is set-cover independent must have a walking ones sequence (for the wired-OR model) as its subsequence. It has been shown that a set-cover independent set is both necessary and sufficient for achieving maximal diagnosis. In addition, a universal test set has been proposed to identify all diagnosable faults in a network regardless of the fault model used.

Two adaptive algorithms that achieve maximal diagnosis have been presented. They can reduce the size of the test set by employing a two-step diagnosis scheme. Both algorithms first apply a maximal independent set to eliminate aliasing syndromes. The responses are analyzed and, based on the initial results, the second part of the test set is generated. Without the information from the first part, it is impossible to reduce the size of the test set. However, it is not clear whether these algorithms can generate minimal test sets.

In practice, a net can only be shorted to a set of neighboring nets due to the physical structure of the network. When neighborhood information is employed, it is possible to generate a reduced test set. A one-step diagnosis algorithm that uses this information has been presented. It has been shown that this algorithm can generate a minimal size test set to achieve maximal diagnosis.

8.6 Interconnect Test Scheduling

The problem of applying interconnect test vectors via multiple boundary scan chains was investigated. The objective is to apply test vectors in such a way as to minimize the total test time. This problem leads to a new class of scheduling problems. Theorems pertaining to optimal schedules are derived. Based on these results, an algorithm has been constructed that generates an optimal schedule.
The test time is greatly reduced when the optimal schedule is adopted. A reduction in the range of 30 to 50% has been achieved in the examples examined so far. The search procedure used in this algorithm can be further improved. However, no effort has been made to find a better search procedure, for two reasons: (1) the problem is NP-Complete; and (2) the current procedure performs well when the problem size n is less than 15, which includes most foreseeable applications.

8.7 Future Research

The enhancement of BOLD includes both hardware and software aspects. The proposed MMC is not as efficient as possible due to limitations in the test channel. The inefficiency could be a problem if the volume of data transfer between the memory and the test channel is large. Also, the test channel does not support an arbitrary sequence of values on the TMS line. This could be a problem in some applications. To make the BOLD system more general, some suggestions are listed below. These include the realization of a more efficient MMC, which can be done through the redesign of the test channel.

8.7.1 On-Chip Test Controller

Currently, BOLD supports the synthesis of test software but not of the test controllers. The system can be improved if the CMCs can be automatically synthesized. The synthesis of a CMC should contain two parts, namely the Test Access Port (TAP) and the BIT controller.

TAP: The inclusion of a TAP in each chip should be done automatically, i.e., independent of the logic design of the chip. The synthesis of a TAP can be achieved as follows.

1. Generate the TAP controller, which is a machine defined by the IEEE Std. 1149.1.

2. Generate the boundary register consisting of a boundary scan cell for each I/O pin of the chip. The scan cells should be properly connected and controlled so that they satisfy the IEEE Std. 1149.1 requirements.
   In particular, the Boundary Register should support the predefined public instructions EXTEST, INTEST, and SAMPLE.

3. Generate the instruction register (IR). The length of the IR should be determined first. This can be done if the total number of instructions required in controlling the BIT controller is known. The opcodes of the instructions should also be determined to allow the BIT controller to be synthesized. The mapping algorithm proposed in Chapter 2 can be used to determine the total number of instructions required.

4. Generate the bypass register and the identification register. The latter is optional.

The synthesis of the TAP should be treated as a development project since it is not very difficult and requires little research.

BIT Controller: The synthesis of a BIT controller should start with the representation of test schedules. Given a circuit that has been partitioned into testable kernels, it is necessary to organize the test into sessions. During each session a test schedule is required. The difficult part of the synthesis is the minimization of the area overhead of the BIT controller. This problem is further complicated by the fact that the area overhead of the TAP, which varies as the length of the IR changes, should also be considered. The proposed control graph, which represents a test procedure in terms of test control signals, can be used to represent a test schedule. Further research is needed to find the best way of representing test schedules.

CTL Generation: The CTL description of a chip can be automatically generated along with the CMC. This will be an important feature in the integration of both chip and module testing.

8.7.2 Module Test Controller

Improved Test Channel

The current implementation of the MMC prototype is not ideal. The major reason is that the test channel of the MMC is implemented using an ACT1020 device, which has a very limited capacity.
The interface between the processor and the test channel is also not ideal, since the data transfer between the memory and the test channel is not fast enough to proceed without interrupting the test activities. In addition, many proposed features of the test channel designs are not included. An ideal test channel should contain the following features.

1. The test channel should contain a large memory so that all the test data for each test session can be loaded into the test channel without the need to access the external memory.

2. The data transfer rate between the test channel and the external memory unit should be very high. Either direct memory access (DMA) operations or the use of the Direct signal proposed in this work is desirable.

3. The FSM1 of the test channel should not be limited to only some predefined state transition sequences. The test channel should be able to set an arbitrary sequence of values on the TMS line.

4. The test channel itself should be testable, i.e., it should contain a CMC.

5. Clock control circuitry should be added to the test channel. This circuitry should be able to control the application of the test clock TCK and the system clock.

This ideal test channel can be realized using either full-custom or standard cell VLSI design approaches. Both approaches allow for a large number of gates and a large memory to be built onto a single chip.

MMC for a Self-Testable Module

To make a module fully self-testable, a complete MMC should be built into the module. To be practical, it is desirable to have a complete MMC, including a test channel, a processor and a large memory unit, packaged in one to three chips. When both BILBO and RUNBIST TDMs are used in a module as the primary means of testing, the memory requirements can be small. In this case, an MMC can be built from two chips, namely a microcontroller chip that contains an internal RAM, and a test channel chip.
MMC in a Chip

A more ambitious project would be to develop a single chip MMC. This MMC should contain three major units, i.e., a processor, a RAM and a test channel. The instruction set of the processor can be very small. In fact, the processor and the test channel can be closely coupled, i.e., no obvious boundary need exist between these two units.

8.7.3 Test Program Synthesis

The test program synthesis aspect of BOLD can be improved as follows.

1. The synthesis of test programs in BOLD requires test description files as the inputs. Currently, these files are manually prepared. The automatic generation of these description files should be a major goal for improving the capability of BOLD. Initial investigation indicates that the test description of a chip can be generated as a by-product of the on-chip test controller synthesis, and that the test description of a module can be generated from a computer-aided design system. The latter assumes that the module contains only boundary scan devices.

2. Currently, the synthesizers do not fully support the capability of putting an arbitrary sequence of values on the TMS line. The reason for this is that the implemented test channel is not ideal. The required operations are only partially supported by the hardware. When the test channel is properly redesigned, the synthesizers will be enhanced to support this capability.

3. Currently, the lowest level of hardware that can be dealt with using BOLD is a chip. However, most design-for-test tools, such as TDES [2] and SIESTA [26], operate on circuit blocks or kernels of a chip. More work needs to be done to enhance BOLD with the capability of describing the test aspects of a circuit block. By so doing, BOLD can be integrated with these design-for-test tools.

8.7.4 Controller Minimization

The minimization algorithms presented in this work assume that an MMC contains a single test channel.
However, when employing a very fast processor, an MMC can control more than one test channel simultaneously without interrupting their operations. New controller minimization algorithms should be developed to incorporate these changes.

8.7.5 Interconnect Test

1. The two-step diagnosis algorithms, presented in Chapter 6, do not guarantee the generation of a minimal size test set. Further work is required to develop an algorithm that can generate a minimal size test set for the two-step diagnosis approach.

2. The presented interconnect test methods are based on the assumption that all chips contained in the module have the boundary scan architecture. However, most existing modules do not satisfy this requirement. New test methods need to be developed such that the interconnect can be tested in an incomplete boundary scan environment. In addition, the testing of the glue logic between boundary scan chips should be considered.

3. The proposed test methods investigate only the DC behavior of the interconnect. Modern systems are designed to operate at such a high speed that testing for the DC correctness of the interconnect is no longer sufficient. Therefore faults related to the AC behavior, such as crosstalk between signal lines, transmission line effects, and line delay faults, should be dealt with. Future work should investigate the possibility of testing for these faults in the boundary scan framework.

Reference List

[1] M.S. Abadir and M.A. Breuer, “Constructing Optimal Test Schedules for VLSI Circuits Having Built-In Test Hardware”, Proc. 15th Int’l Symp. on Fault-Tolerant Computing, pp. 165-170, June 1985.
[2] M.S. Abadir and M.A. Breuer, “A Knowledge-Based System for Designing Testable VLSI Chips”, IEEE Design & Test of Computers, pp. 56-68, August 1985.
[3] M.S. Abadir and M.A. Breuer, “Test Schedules for VLSI Circuits Having Built-In Test Hardware”, IEEE Trans. on Computers, Vol. C-35, No. 4, pp. 361-367, April 1986.
[4] A.V. Aho, J.E. Hopcroft and J.D. Ullman, “The Design and Analysis of Computer Algorithms”, Addison-Wesley, Reading, Massachusetts, pp. 193-194, 1974.
[5] L. Avra, “A VHSIC ETM-BUS Compatible Test and Maintenance Interface”, Proc. Int’l Test Conf., pp. 964-971, 1987.
[6] P.H. Bardell and W. McAnney, “Self-Testing of Multichip Logic Modules”, Proc. Int’l Test Conf., pp. 200-204, 1982.
[7] J. Beausang and A. Albicki, “A Methodology for Designing Self-Testable VLSI Chips: Synthesis, Part I, A Model for Self-Testable Chips”, Technical Report EL-87-05, Department of EE, University of Rochester, 1987.
[8] F. Beenker, K. Eerdewijk, R. Gerritsen, F. Peacock and M. van der Star, “Macro Testing: Unifying IC and Board Test”, IEEE Design & Test of Computers, pp. 26-32, December 1986.
[9] F. Beenker, “Systematic and Structured Methods for Digital Board Testing”, VLSI System Design, pp. 50-58, January 1987.
[10] G. Borriello and R.H. Katz, “Synthesis and Optimization of Interface Transducer Logic”, Proc. Int’l Conf. Computer-Aided Design, pp. 274-277, 1987.
[11] M.A. Breuer, “On-Chip Controller Design for Built-In-Test”, Technical Report CRI-88-04, Department of EE-Systems, University of Southern California, December 1985.
[12] M.A. Breuer and J.C. Lien, “A Methodology for the Design of Hierarchically Testable and Maintainable Digital Systems”, Proc. 8th Digital Avionics System Conf., pp. 40-47, 1988.
[13] M.A. Breuer and J.C. Lien, “A Test and Maintenance Controller for a Module Containing Testable Chips”, Proc. Int’l Test Conf., pp. 502-513, 1988.
[14] M.A. Breuer, R. Gupta, and J.C. Lien, “Concurrent Control of Multiple BIT Structures”, Proc. Int’l Test Conf., pp. 431-442, 1988.
[15] W.O. Budde, “Modular Testprocessor for VLSI Chips and High-Density PC Boards”, IEEE Trans. on CAD, Vol. 7, No. 10, pp. 1118-1124, October 1988.
[16] W.T. Cheng, J.L. Lewandowski and E. Wu, “Diagnosis for Wiring Interconnects”, Proc. Int’l Test Conf., pp. 565-571, 1990.
[17] K.K. Chua and C.R. Kime, “Selective I/O Scan: A Diagnosable Design Technique for VLSI Systems”, Comput. Math. Applic., Vol. 13, No. 5/6, pp. 485-502, 1987.
[18] G.L. Craig, C.R. Kime, and K.K. Saluja, “Test Scheduling and Control for VLSI Built-In Self-Test”, IEEE Trans. on Computers, Vol. C-37, No. 9, pp. 1099-1109, September 1988.
[19] C.A. Dennis, “Common Signal Processor: Application and Design”, IBM Technical Directions, Federal Systems Division, Vol. 13, No. 1, pp. 14-20, 1987.
[20] IBM, Honeywell and TRW, “VHSIC Phase 2 INTEROPERABILITY STANDARDS”, ETM-BUS Specification, December 1986.
[21] E.B. Eichelberger and T.W. Williams, “A Logic Design Structure for LSI Testability”, Proc. 14th Design Automation Conf., pp. 462-467, June 1977.
[22] P.P. Fasang, J.P. Shen, M.A. Schuette and W.A. Gwaltney, “Automated Design for Testability of Semicustom Integrated Circuits”, Proc. Int’l Test Conf., pp. 558-564, 1985.
[23] A. El Gamal, “Protozone: The PC-Based ASIC Design Frame”, EE218 Handout No. 7, Stanford University, Winter 1990.
[24] M.R. Garey and D.S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness”, W.H. Freeman and Company, New York, 1974.
[25] P. Goel and M.T. McMahon, “Electronic Chip-In-Place Test”, Proc. Int’l Test Conf., pp. 83-90, 1982.
[26] Rajesh Gupta, “Advanced Serial Scan Design for Testability”, Ph.D. Dissertation, Department of EE-Systems, University of Southern California, 1991.
[27] J.E. Haedtke and W.R. Olson, “Multilevel Self-Test for the Factory and Field”, Proc. Annual Reliability and Maintainability Symp., pp. 274-279, 1987.
[28] P. Hansen, “The Impact of Boundary-Scan on Board Test Strategies”, Proc. ATE & Instruments Conf. East, pp. 35-40, Boston, June 1989.
[29] A. Hassan, J. Rajski, and V.K. Agarwal, “Testing and Diagnosis of Interconnects using Boundary Scan Architecture”, Proc. Int’l Test Conf., pp. 126-137, 1988.
[30] A. Hassan, V.K. Agarwal, J. Rajski and B.N. Dostie, “Testing of Glue Logic Interconnects Using Boundary Scan Architecture”, Proc. Int’l Test Conf., pp. 700-771, 1989.
[31] C.L. Hudson, Jr. and G.D. Peterson, “Parallel Self-Test with Pseudo-Random Test Patterns”, Proc. Int’l Test Conf., pp. 954-963, 1987.
[32] IEEE Standard 1076-1987, “IEEE Standard VHDL Language Reference”, IEEE Standards Board, 345 East 47th Street, New York, NY 10017, March 1988.
[33] IEEE Standard 1149.1-1990, “IEEE Standard Test Access Port and Boundary Scan Architecture”, IEEE Standards Board, 345 East 47th Street, New York, NY 10017, May 1989.
[34] N. Jarwala and C.W. Yau, “A New Framework for Analyzing Test Generation and Diagnosis Algorithms for Wiring Interconnects”, Proc. Int’l Test Conf., pp. 63-70, 1989.
[35] S.C. Johnson, “Yacc: Yet Another Compiler-Compiler”, in B.W. Kernighan and M.D. McIlroy, UNIX Programmer’s Manual, Bell Laboratories, 7th Edition, 1978.
[36] N. Kanopoulos, et al., “A New Implementation of Signature Analysis for Board Fault Isolation Testing”, Proc. Int’l Test Conf., pp. 730-736, 1987.
[37] W.H. Kautz, “Testing for Faults in Wiring Networks”, IEEE Trans. on Computers, Vol. C-23, No. 4, pp. 358-363, April 1974.
[38] B. Konemann, J. Mucha and G. Zwiehoff, “Built-In Logic Block Observation Techniques”, Proc. Int’l Test Conf., pp. 37-41, 1979.
[39] S.Y. Kung, S.C. Lo, S.N. Jean and J.N. Hwang, “Wavefront Array Processors - Concept to Implementation”, IEEE Computer, pp. 18-33, July 1987.
[40] D. van de Lagemaat and H. Bleeker, “Testing a Board with Boundary Scan”, Proc. Int’l Test Conf., pp. 724-729, 1987.
[41] J.J. LeBlanc, “LOCST: A Built-In Self-Test Technique”, IEEE Design & Test of Computers, pp. 45-52, November 1984.
[42] M.E. Lesk and E. Schmidt, “Lex: A Lexical Analyzer Generator”, in B.W. Kernighan and M.D. McIlroy, UNIX Programmer’s Manual, Bell Laboratories, 7th Edition, 1978.
[43] J.C. Lien and M.A. Breuer, “A Universal Test and Maintenance Controller for Modules and Boards”, IEEE Trans. on Industrial Electronics, Vol. 36, No. 2, pp. 231-240, May 1989.
[44] J.C. Lien, “A Module Maintenance Controller Prototype”, Technical Report CENG 90-14, Department of EE-Systems, University of Southern California, June 1990.
[45] J.C. Lien and M.A. Breuer, “An Optimal Scheduling Algorithm for Testing Interconnect Using Boundary Scan”, Journal of Electronic Testing: Theory and Applications, Vol. 2, No. 1, pp. 117-130, March 1991.
[46] J.C. Lien and M.A. Breuer, “Maximal Diagnosis of Wiring Networks”, Technical Report CENG 91-2, Department of EE-Systems, University of Southern California, February 1991.
[47] T.S. Liu, “The Role of a Maintenance Processor for a General Purpose Computer System”, IEEE Trans. on Computers, Vol. C-33, No. 6, pp. 507-517, June 1984.
[48] TRW, “MMN Architecture”, Private Correspondence.
[49] J.G. Malcolm, “BIT False Alarms: An Important Factor In Operational Readiness”, Proc. Annual Reliability and Maintainability Symp., pp. 206-212, 1982.
[50] C. Maunder and F. Beenker, “BOUNDARY-SCAN: A Framework for Structured Design-For-Test”, Proc. Int’l Test Conf., pp. 714-723, 1987.
[51] E.J. McCluskey, “Built-In Self-Test Techniques”, IEEE Design & Test of Computers, pp. 21-28, April 1985.
[52] M.J. Ohletz, T.W. Williams and J.P. Mucha, “Overhead in Scan and Self-Testing Designs”, Proc. Int’l Test Conf., pp. 460-470, 1987.
[53] C.H. Papadimitriou and K. Steiglitz, “Combinatorial Optimization, Algorithms and Complexity”, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, pp. 421-422, 1982.
[54] K.P. Parker, “The Impact of Boundary Scan on Board Test”, IEEE Design & Test of Computers, pp. 18-30, August 1989.
[55] K.P. Parker and S. Oresjo, “A Language for Describing Boundary Scan Devices”, Proc. Int’l Test Conf., pp. 222-234, 1990.
[56] G.D. Robinson and J.G. Deshayes, “Interconnect Testing of Boards with Partial Boundary Scan”, Proc. Int’l Test Conf., pp. 572-581, 1990.
[57] K. Sakashita, T. Hashizume, T. Ohya, I. Takimoto and S. Kato, “Cell-Based Test Design Method”, Proc. Int’l Test Conf., pp. 909-916, 1989.
[58] J. Sayah and C.R. Kime, “Test Scheduling For High Performance VLSI System Implementations”, Proc. Int’l Test Conf., pp. 421-430, 1988.
[59] J.H. Stewart, “Application of Scan/Set for Error Detection and Diagnostics”, Proc. Semiconductor Test Conf., pp. 152-158, 1978.
[60] IBM, Honeywell and TRW, “VHSIC Phase 2 INTEROPERATABILITY STANDARDS”, TM-BUS Specification, December 1986.
[61] J. Turino, “IEEE P1149 Proposed Standard Testability Bus - An Update with Case Histories”, Proc. Int’l Conf. Computer Design, pp. 334-337, 1988.
[62] N. Vasanthavada, “TEA Design Review, Built-In Test”, Research Triangle Institute, June 1987.
[63] S. Vining, “Tradeoff Decisions Made for P1149.1 Controller Design”, Proc. Int’l Test Conf., pp. 47-54, 1989.
[64] P.T. Wagner, “Interconnection Testing with Boundary Scan”, Proc. Int’l Test Conf., pp. 52-57, 1987.
[65] L. Whetsel, “A Proposed Standard Test Bus and Boundary Scan Architecture”, Proc. Int’l Conf. on Computer Design, pp. 330-333, 1988.
[66] T.W. Williams and K.P. Parker, “Design For Testability - A Survey”, Proc. of the IEEE, Vol. 71, No. 1, pp. 98-112, January 1983.
[67] N. Wirth, “Algorithms + Data Structures = Programs”, Prentice-Hall, Englewood Cliffs, New Jersey, pp. 188-189, 1976.
[68] C.W. Yau and N. Jarwala, “A Unified Theory for Designing Optimal Test Generation and Diagnosis Algorithms for Board Interconnects”, Proc. Int’l Test Conf., pp. 71-77, 1989.
[69] M.A. Breuer and X. Zhu, “A Knowledge-Based System for Selecting Test Methodology for a PLA”, Proc. 22nd Design Automation Conf., pp. 259-265, June 1985.
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11257190
Unique identifier
UC11257190
Legacy Identifier
DP22825