Built-In Self-Test for Modeled Faults

by

Mody Lempel

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

December 1994

Copyright 1994 Mody Lempel

Dedication

To my parents, my guiding light;
To Irit, at lee ron.

Acknowledgements

This thesis owes a great deal to Professors Melvin A. Breuer, Sandeep K. Gupta, Leonard M. Adelman, Douglas J. Ierardi and Lloyd R. Welch. I would like to thank you for your time, your wisdom and your patience. It was an honor and a pleasure learning from you.

This work was supported by the Advanced Research Projects Agency and monitored by the Federal Bureau of Investigation under Contract No. JFBI90092.

Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Pattern generator design
  1.2 Response analyzer design

2 Pattern generator design
  2.1 The probability of 100% fault coverage with a random test sequence of length L
  2.2 Detectability profile models
  2.3 Experimental results on finding short pseudo-random test sequences
  2.4 Discrete logarithms and LFSR sequences
  2.5 The test embedding problem
  2.6 Identifying hard faults
    2.6.1 The heuristic
    2.6.2 Probabilistic analysis of the heuristic
  2.7 Experimental results
  2.8 Conclusions
  2.9 Miscellaneous issues
    2.9.1 Multiple seeds
    2.9.2 Lower fault coverage
    2.9.3 Embedding subsets of test patterns
    2.9.4 Using wider pattern generators
    2.9.5 Different windows from one polynomial

3 Response analyzer design
  3.1 Bounds on the least degree non-factor of a set of polynomials
    3.1.1 The worst case bounds
    3.1.2 The expected bounds
  3.2 Polynomial operations in GF[2]
    3.2.1 Polynomial multiplication, division and gcd
    3.2.2 x^{2^m} modulo b(x) and x^t modulo b(x)
  3.3 Finding a non-factor of smallest degree for a given set of polynomials
    3.3.1 The product of all distinct factors of the same degree for a given polynomial
    3.3.2 The number of all distinct factors, of the same degree, for a set of polynomials
    3.3.3 Finding a non-factor
  3.4 Practical scenarios
    3.4.1 Finding a non-factor of a pre-specified degree
    3.4.2 Finding a non-factor fast
    3.4.3 Exhaustive search
  3.5 Experimental results
    3.5.1 Random selections based on the absolute bounds
    3.5.2 Random selections based on the expected bounds
    3.5.3 Experiments on benchmark circuits
  3.6 Conclusions

Reference List

Appendix A: The least prime in an arithmetic progression
  A.1 General bounds
  A.2 Some evidence for the likelihood of k < 2n
  A.3 Finding the least prime in the arithmetic progression k2^n + 1

Appendix B: Generating irreducible polynomials

Appendix C: Factoring a product of distinct irreducible polynomials of the same degree

List of Tables

2.1 Fault distribution and probability bounds for a test sequence of length L
2.2 Probability bounds for a test sequence of length 2L
2.3 Number of random sequences of length less than (2)L that detected all faults
2.4 Experimental results for some ISCAS85 combinational benchmarks
2.5 Characteristics of synthesized circuits
2.6 RS results for the randomly testable synthesized circuits
2.7 RS results for the random resistant synthesized circuits
2.8 Hard fault identification for synthesized circuits
2.9 Embedding results for the synthesized circuits
2.10 p_nd(t, L) values for t = 2^j, k-1 <= j <= k+5
2.11 Comparison of embedding results and random selection results
2.12 Distribution of simulation length vs. embedding length ratio
2.13 Results of additional RS experiments to get equal time comparisons
3.1 Number of primitive elements in GF[2^m] and accumulated number of primitive elements in GF[2^2] ... GF[2^m]
3.2 Least prime p of the form β·2^m + 1, with smallest generator α and 2^m-th root of unity ω
3.3 Summary of results
A.1 Smallest 1 <= k <= 100 for which k2^n + 1 is prime, 0 <= n <= 206

List of Figures

1.1 A LFSR-based PG for a 4 input CUT. The feedback polynomial is f(x) = x^4 + x + 1
1.2 A MISR-based RA for a 4 output CUT. The feedback polynomial is f(x) = x^4 + x + 1
2.1 A LFSR with the feedback polynomial f(x) = x^4 + x + 1
2.2 The procedure for the windowing stage of EP
2.3 The procedure for the second phase of identifying hard faults
3.1 Procedure distinct-factors(h, j). Computes the product of all distinct factors of degree j, for 1 <= j <= u, of the polynomial h
3.2 Procedure distinct-primitive(g_{i,j}). Sifts out the non-primitive factors of g_{i,j}

Abstract

Built-In Self-Test (BIST) is the capability of a circuit to test itself. The idea behind BIST is to create pattern generators (PGs) to generate patterns for the circuit and response analyzers (RAs) to compact the circuit response. The PGs need to generate patterns that detect the faults of interest while keeping the length of the test sequence under user specified constraints. The compaction function of the RAs is a many-to-one function, hence certain faulty responses might be mapped to the same signature as the good response. This is known as aliasing. When aliasing occurs, a faulty circuit is presumed to be fault free. The RA mechanism should minimize the aliasing probability.
Both the PG and the RA mechanism should keep the hardware overhead to a minimum, thus reducing the area and delay penalty the circuit suffers. Most circuits are too large to be tested as one entity, hence they are partitioned for test purposes. The partition is usually done at register boundaries. Thus, two popular mechanisms are a linear feedback shift register (LFSR) for the PG and a multiple input shift register (MISR) for the RA. The properties of these structures are determined by their feedback polynomials. In this work we focus on selecting feedback polynomials for LFSR-based PGs in order to achieve 100% fault coverage in minimum time, and on selecting feedback polynomials for MISR-based RAs in order to achieve zero-aliasing. When a register acts as both a PG and a RA, we select a polynomial that achieves both goals simultaneously.

The selection of a feedback polynomial for a PG is based on random selections for randomly testable circuits, or on a guided search using the theory of discrete logarithms for random resistant circuits. For these circuits the set of faults is partitioned into easy and hard faults, the test patterns of the hard faults are generated, their positions in the sequence generated by the LFSR are computed, and a minimum length test sequence is found.

A zero-aliasing feedback polynomial for a RA is found by considering the error sequences for the faults of interest. A bound is computed that guarantees that a zero-aliasing polynomial of degree less than the bound exists. The factors of the error polynomials whose degrees are less than the bound are extracted, and a polynomial that is a non-factor is searched for.

When the same polynomial is to serve both objectives, the polynomial chosen is the one that generates the shortest test sequence from a set of zero-aliasing polynomials.

Chapter 1

Introduction

Built-In Self-Test (BIST) is the capability of a circuit to test itself. The idea behind BIST is to create pattern generators (PGs) to generate test patterns for the circuit and response analyzers (RAs) to compact the circuit response to the inputs that are applied.

It is the responsibility of the PG to generate test patterns that detect all the faults of interest. The faults of interest depend on the fault model that is adopted. An example of a fault model is the single stuck-at fault, for which the faults of interest are the non-equivalent single stuck-at faults. Under this model it is assumed that the circuit under test (CUT) has just one fault, and the fault is a single line which is either a constant "1" or a constant "0", independent of the circuit inputs. Other fault models include bridging faults, open lines, transistor shorts and transistor stuck-on (off). By knowing the fault model it is possible to simulate the effect of the fault, and thereby to evaluate whether a certain test pattern propagates the effect of the fault to an output, i.e. whether the output in the presence of the fault is different from the output of a fault free circuit. If one abandons fault models and is interested in detecting any possible non-sequential fault, the PG must generate either an exhaustive test set, in which all possible patterns are applied to the circuit, or a pseudo-exhaustive test set, in which all logic cones (a logic cone is the portion of logic that drives a single output bit) are tested exhaustively (this excludes faults that change the cone configuration of the circuit, e.g. certain bridging faults).
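To make the stuck-at model concrete, the following Python sketch injects a single stuck-at fault on an internal line of a tiny two-gate circuit and lists the input patterns that detect it, i.e. the patterns for which the faulty output differs from the fault-free output. The circuit, the line name "ab" and the fault chosen are invented for illustration and are not taken from the thesis.

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """Tiny example CUT: y = (a AND b) OR c, with an optional
    single stuck-at fault injected on the internal line 'ab'."""
    ab = a & b
    if fault == ("ab", 0):    # line ab stuck-at-0
        ab = 0
    elif fault == ("ab", 1):  # line ab stuck-at-1
        ab = 1
    return ab | c

# A pattern detects the fault iff the faulty output differs
# from the fault-free output for that pattern.
fault = ("ab", 0)
tests = [p for p in product((0, 1), repeat=3)
         if circuit(*p) != circuit(*p, fault=fault)]
print(tests)   # [(1, 1, 0)] -- the only test for ab stuck-at-0
```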
Once the PG mechanism is known, since it is deterministic and always initialized to the same state, the sequence of test patterns is also known. By simulation, the circuit response can be found. It is the responsibility of the RA to analyze the response and provide a go/no-go signal as to whether the CUT is fault free or faulty. The circuit response, which may consist of thousands of bits, is compacted into a signature which consists of only tens of bits. The compacting function is a many-to-one function, and as a result some erroneous responses might be mapped to the same signature as the good response. This is known as aliasing. When aliasing occurs, a faulty circuit passes the test and is presumed to be fault free. When all erroneous responses are mapped to a different signature than the good response, we have zero-aliasing.

When designing a PG or a RA it is important that the design be effective and efficient. The effectiveness of a PG is measured in terms of the fault coverage it achieves, i.e. the percentage of faults of interest that are detected by the test sequence that is generated, and efficiency is measured in terms of test time, i.e. the length of the test sequence that needs to be applied in order to achieve the stated fault coverage. The effectiveness of a RA is measured in terms of the aliasing probability, i.e. the probability that a faulty circuit passes as fault free. It is also important that the design add minimal area and delay overhead to the circuit.

Testing a large circuit is done by partitioning the circuit into smaller subcircuits. Usually these subcircuits are combinational, hence they are bounded by registers. Thus most PG and RA designs are based on reconfiguring existing registers. Two popular mechanisms are the linear feedback shift register (LFSR) for the PG and the multiple input shift register (MISR) for the RA.

An example of a LFSR-based PG is shown in Figure 1.1. Number the cells of a k-stage LFSR D_0, D_1, ..., D_{k-1}, with the feedback coming out of cell D_{k-1}. The feedback function is represented as a polynomial f(x) = x^k + \sum_{i=0}^{k-1} f_i x^i, and the feedback feeds cell D_i iff f_i = 1. The feedback polynomial of the LFSR in Figure 1.1 is f(x) = x^4 + x + 1. When the feedback polynomial is primitive, all 2^k - 1 non-zero binary k-tuples are generated by the PG, hence it has the ability to generate all possible non-zero test patterns.

Figure 1.1: A LFSR-based PG for a 4 input CUT. The feedback polynomial is f(x) = x^4 + x + 1.

Figure 1.2: A MISR-based RA for a 4 output CUT. The feedback polynomial is f(x) = x^4 + x + 1.

An example of a MISR-based RA is shown in Figure 1.2. The difference between a MISR and a LFSR is that the CUT also feeds the stages of the shift register. The compacting function of a MISR is polynomial division over GF[2]. The effective output polynomial is divided by the feedback polynomial. The signature is the remainder of the division. If the CUT has k outputs, it has k output sequences. Denote these sequences o_0, o_1, ..., o_{k-1}, where o_i feeds D_i. If the input sequence is of length n, then each o_i can be viewed as a polynomial o_i = \sum_{j=0}^{n-1} o_{ij} x^j, where o_{ij} is the value of the i-th output at time j. The effective polynomial is then o = \sum_{i=0}^{k-1} o_i x^i.
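As an illustration of the two mechanisms, the following Python sketch simulates a 4-stage Galois-style LFSR/MISR with the feedback polynomial f(x) = x^4 + x + 1 of Figures 1.1 and 1.2. The cell ordering, the particular CUT response used to drive the MISR, and the single-bit error are assumptions made for the example only.

```python
N = 4                     # LFSR/MISR length = number of CUT outputs
POLY = (1 << N) | 0b0011  # f(x) = x^4 + x + 1, bit i = coefficient of x^i

def step(state, data=0):
    """One clock of a Galois-style LFSR/MISR with feedback f(x).
    'data' is the word driven by the CUT outputs (0 for a pure PG)."""
    state <<= 1                      # shift: D_i -> D_{i+1}, i.e. multiply by x
    if state & (1 << N):             # feedback tap out of D_{N-1}
        state ^= POLY                # replace x^N by h(x) = x + 1
    return state ^ data              # MISR: CUT outputs are XORed into the cells

# PG: a primitive f(x) cycles through all 2^N - 1 non-zero states.
state, seen = 1, []
for _ in range((1 << N) - 1):
    seen.append(state)
    state = step(state)
assert sorted(seen) == list(range(1, 1 << N))   # all 15 non-zero 4-tuples

# RA: compact a made-up 8-cycle response into a 4-bit signature.
response = [0b1010, 0b0111, 0b0001, 0b1100, 0b0110, 0b1011, 0b0000, 0b1111]
def signature(words):
    s = 0
    for w in words:
        s = step(s, w)
    return s

good = signature(response)
bad = signature([response[0] ^ 0b0100] + response[1:])  # single-bit error
assert bad != good   # a single-bit error never aliases, since f(0) = 1
```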
The major drawback of a LFSR-based PG is that with random selections of the feedback polynomial and the initial state, it may take a long time for all the necessary patterns to be generated. Several designs have been proposed to alleviate this problem. The drawback of these schemes is the added area overhead over the LFSR design.

Several researchers have analyzed the aliasing probability of MISR-based RAs. This probability comes out to be 2^{-k} for a k-stage MISR. If the MISR is short (less than 20 stages), this probability might not be satisfactory. Several researchers proposed zero-aliasing designs. These designs can detect any error in the output sequence, but incur a high area penalty.

As a result of the partitioning into subcircuits being done at register boundaries, in many instances a register will serve as both a PG and a RA. The designs of non-LFSR PGs do not take into account the need for RA capability as well, and vice versa, hence the circuit will be subject to the overhead of both designs.

The purpose of this work is threefold. First, the design of LFSR-based PGs that achieve 100% fault coverage in minimum time. Second, the design of zero-aliasing MISR-based RAs for the set of modeled faults we are interested in detecting. These two tasks are achieved by proper selection of the feedback polynomials and seeds; hence the third aspect of this work is to select polynomials that achieve both goals simultaneously, thus freeing the circuit of the overhead of reconfigurable designs.

In the next two sections of this chapter we elaborate on past work regarding pattern generator and response analyzer design and outline our approach.

1.1 Pattern generator design

A n-stage LFSR with a primitive feedback polynomial generates a permutation of all the non-zero binary n-tuples. Changing the polynomial changes the permutation. The seed specifies the starting position for scanning the permutation. To achieve 100% detection of the faults of interest, the LFSR must generate patterns until the set of patterns contains at least one test pattern for every fault. The problem is that the permutation might not contain a relatively short subsequence of test patterns for all the faults, and even if it does, it is usually not known where this short subsequence begins.

There are two approaches to deal with the problem of generating all the required test patterns. The first is to estimate the number of test patterns required to achieve 100% fault coverage. The second is to add hardware to guarantee 100% detection with a small test set. While the first approach keeps the hardware at a minimum, it significantly adds to the required test time. The second approach results in a very short test time, but requires significant hardware overhead. We will first expand on the above approaches and then discuss our ideas for the design of LFSR-based PGs.

To estimate the number of patterns required to achieve 100% fault coverage, a number of authors [26] [39] [45] asked how many test vectors need be applied to achieve a given fault coverage with a specified degree of confidence. The answers depend on the detectability profile of the circuit, i.e. the number of test patterns for each fault. For example, to achieve 100% fault coverage when the probability of detection of the hardest fault is p, Savir and Bardell [39] suggest using a test sequence of length 11/p. The main motivation behind this question is that it eliminates the need for test generation and for fault simulation. With the advances in test generation and fault simulation tools (not to mention platform speed), we believe this is not as important as it used to be.
On the other hand, shorter test sequences for combinational subcircuits allow for shorter test sessions for the whole circuit and allow for easier synthesis of zero-aliasing response analyzers (Chapter 3).

As opposed to trying to detect all the faults of interest with one pseudo-random sequence, a second approach is to either use special hardware to generate small test sets, or to use a short pseudo-random sequence to detect most of the faults and then use special hardware to detect the remaining faults. The use of non-linear feedback functions is suggested in [11] and [12]. Using this approach, one first creates a test set T that achieves 100% fault coverage and then synthesizes logic to generate the test patterns. In [11] the patterns in T are viewed as states of a finite state machine. Logic is built around the register such that the states of the register generate these patterns. In [12] patterns are added to T such that T can be ordered in a way that consecutive patterns differ in only one bit position. A ROM is used to store these positions. A related approach is that of weighted random patterns [46], [4], [27]. Logic is added to the register to affect the probability that each register cell outputs either a "1" or a "0". The probabilities are based on the test patterns which detect the faults of interest. A store-and-generate approach is suggested in [2]. Precomputed patterns are stored in a ROM. Each pattern is loaded into a LFSR and the LFSR proceeds to generate test patterns. After a certain number of patterns have been generated, the next pattern is loaded into the LFSR. This can be seen as a multiple seeding of the LFSR, which at the extreme can be used to detect all but a certain number of faults with the first seed, while the remaining seeds are patterns for the undetected faults. One implementation of this idea is suggested in [1]. A ROM, counter and additional linear logic is used to generate the patterns of T. The authors note that it is very unlikely that T will be a set for which their scheme will work. Another implementation is suggested in [20]. In this scheme not only is the PG reseeded, but with each seed there is a corresponding feedback polynomial. Decoding logic is used to reconfigure the feedback connections of the register, depending on the encoding of the polynomial. Along similar lines, [13] suggests dividing T into subsets of linearly independent patterns. The linear connections of the LFSR can be modified such that each connection generates a different subset of T. In [44] it is argued that one can do with just one such subset of T, which includes the patterns required to detect the random resistant faults. Remaining faults will be detected by the succeeding patterns generated by the LFSR. A different scheme is suggested in [14], where the patterns are stored in a ROM and the outputs of the CUT act as addresses to the ROM.

All the above PG schemes cause additional area and delay overhead compared to a LFSR-based PG, although they reduce the size of the test set, in some cases considerably. Another drawback of some of these schemes is that changing the feedback function of the LFSR makes the compaction properties of the resulting circuits extremely difficult to analyze.

In this work we try to find short one-seed, pseudo-random test sequences that achieve 100% fault coverage. The term short is relative to the probability of detecting the hardest fault of a circuit. If this probability is p, then short will mean a test length of L = 1/p.
By using just one seed the BIST control circuitry is kept at a minimum. We first show that a pseudo-random test sequence of length at most 2L has a high probability of achieving 100% fault coverage. Thus, a random process of selecting a random primitive polynomial and a random seed for a LFSR-based PG is likely to produce the desired test length. By simulating up to 2L test patterns, one knows whether the desired sequence is found or not. If not, another random selection is made. This scheme is best suited for randomly testable circuits, i.e. those circuits with high values of p.

For random resistant circuits we suggest a more sophisticated way of selecting the feedback polynomials and seeds that will produce shorter test sequences and require less computation time. We use the theory of discrete logarithms to embed a subset of test patterns in a LFSR sequence, from which we produce the test sequence for all faults. Our algorithm is based on the fact that a pseudo-random test sequence that detects the random resistant faults will, with very high probability, include test patterns for the randomly testable faults as well. By embedding test patterns only for the random resistant faults the complexity of the algorithm is greatly reduced. At the same time, it requires a method to differentiate between random resistant and randomly testable faults.

Our scheme puts a premium on reducing hardware and delay overhead, whereas previous schemes concentrate on either reducing test time at the expense of area and delay or vice versa. We believe that simplicity and functionality are more important, although within the realm of our approach we also minimize test time. In fact, when combined with our approach for zero-aliasing, it is crucial that test time be reduced as much as possible. The applicability of these schemes is dependent on the computational effort one is willing to expend and on the time a test sequence is allowed to run. The tests we embed can either be one-pattern tests for non-sequential faults, or, using schemes such as in [43], two-pattern tests for sequential faults. As the pattern sequences of LFSRs and one-dimensional Cellular Automata (CA) with the same primitive characteristic function are isomorphic [42], our algorithms will also work for CAs.

1.2 Response analyzer design

The compacting function of a RA is a many-to-one function, hence the issue of aliasing must be dealt with. Much work has been done on computing the aliasing probability. While different researchers considered different error models, all results showed the probability to converge to 2^{-k}, where k is the length of the RA. The first model to be considered was the uniform error model, in which all error sequences for a single output circuit are equally likely. Under this model it is straightforward, using polynomial division, to show that the aliasing probability is 2^{-k}. This error model is not very realistic. Williams et al. [47] considered the independent error model (also referred to as the symmetric error model [48] and the Bernoulli model [38]) for a single output circuit, where the probability of an erroneous output bit is p and the errors are independent of one another. They showed that the aliasing probability converges to 2^{-k} when the degree of the feedback polynomial is k, and that it converges faster for primitive polynomials. Using the same error model, Saxena et al.
[38] computed upper bounds on the aliasing probability that are dependent on the length of the test sequence and the order of the feedback polynomial, L_c. For test lengths smaller than the order, the upper bound is (1 + c)/L_c. For test lengths equal to L_c the upper bound is 1, and for lengths greater than L_c the bound is 2/(L_c + 1). Xavier et al. used the asymmetric error model for single output circuits. In this model the probabilities of a 1 -> 0 error and a 0 -> 1 error are not necessarily the same, as they are in the symmetric error model. They used simulation data to show that this error model is more accurate than the symmetric error model. Their experiments tend to show that under this error model, the aliasing probability converges to 2^{-k}. For multiple output circuits, Pradhan et al. [31] used the q-ary symmetric error model, in which the probability of no error is p and all 2^k - 1 error patterns (at time j) are equally likely, each with probability (1 - p)/(2^k - 1). The aliasing probability under this model converges to 2^{-k}. Kameda et al. [22] show that as long as there are at least 2 different error patterns with probability greater than zero and the feedback polynomial is irreducible, the aliasing probability converges to 2^{-k}, independent of the probability distribution on the error patterns.

When trying to achieve zero-aliasing, there are two approaches. The first assumes that any error can occur, while the second considers only errors that result from a specific fault model. There are two previous schemes that try to detect any deviation from the good circuit response. The first is by Gupta et al. [17]. In this scheme the RA is a LFSR and the compacting function is polynomial division of the good response by the feedback polynomial. The scheme requires the quotient of the good response to be periodic. This is achieved by proper selection of the LFSR feedback polynomial once the good response is known. They give a bound of n/2 on the length of the required register, for a sequence of length n. The second scheme is by Chakrabarty and Hayes [8]. They use non-linear logic to detect errors in the response. The number of memory cells in their RA is ⌈log n⌉, but they have no bound on the extra logic required to implement their scheme.

The compacting function of a MISR is polynomial division over GF[2]. The effective output polynomial is divided by the feedback polynomial. The signature is the remainder of the division. Our objective is to select a feedback polynomial for the compacting MISR, given a set of modeled faults, such that an erroneous response resulting from any modeled fault is mapped to a different signature than the good response. The major difference between our scheme and the aforementioned zero-aliasing schemes is that we target a specific set of possible faults and try to achieve zero-aliasing for the error sequences resulting only from these faults. We do not try to recognize all possible error sequences, mainly because most of them will never occur. The fault model lets us focus on the probable error sequences. As a result, we use less hardware than the aforementioned schemes.

A previous method for finding zero-aliasing feedback polynomials for modeled faults was presented by Pomeranz et al. [30]. Different heuristics for finding a zero-aliasing polynomial are suggested, but these heuristics will not necessarily find an irreducible or primitive polynomial, which is very important if the register is also to function as a PG.
They do not give bounds on the resulting or necessary degrees of their zero-aliasing polynomials, nor do they present results on the computational complexity of their methods.

For a CUT with few outputs, the available register might be too short to achieve zero-aliasing. In this case we need to lengthen the register by adding memory elements. To keep the hardware overhead at a minimum, we want to add as few flip-flops as possible, hence we are interested in a feedback polynomial of smallest degree that achieves our objective. When a register is to serve both as a PG and a RA, it is advantageous to have the feedback polynomial of the same degree as the available register, hence we are interested in a feedback polynomial of a pre-specified degree. At times, we might want to find a feedback polynomial fast, even if the resulting MISR requires extra flip-flops over the optimum.

We assume the following test scenario. The input sequence to the CUT has been designed so that the effective output polynomial due to any target fault is different from the effective polynomial of the good response. Denote the effective polynomial of the good response by r; then the effective polynomial due to fault i can be represented as r + h_i. By the linearity of the remaindering operation, we get a different remainder for this erroneous polynomial iff h_i is not divisible by the feedback polynomial. We assume we are given the error polynomials for each of the target faults. The problem we deal with is the following: given a set of polynomials H = {h_1, h_2, ..., h_{|H|}}, find a polynomial that is relatively prime to all the polynomials of H. Such a polynomial will be referred to as a non-factor of H. If a non-factor is used as the feedback polynomial for the compacting MISR, zero-aliasing is achieved for the set of target faults.

In particular, for irreducible and primitive feedback polynomials we propose to find (1) upper bounds on the smallest degree zero-aliasing MISR; (2) based on the expected number of factors of a given degree for a given polynomial, a refinement of this upper bound to an expected bound; (3) procedures for selecting a zero-aliasing MISR with the smallest degree; (4) procedures for selecting a zero-aliasing MISR of a pre-specified degree (under the condition that one exists); and (5) procedures for fast selection of a zero-aliasing MISR. The computational complexity, as well as the expected complexity, of all the proposed procedures is analyzed, based on the upper bounds for the smallest degree non-factor and the expected bounds.
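To make the non-factor condition concrete, the following Python sketch represents polynomials over GF[2] as integer bit masks and checks whether a candidate feedback polynomial divides any error polynomial h_i; a candidate that divides none of them is a non-factor of H and therefore gives zero-aliasing for the targeted faults. The error polynomials and the candidate here are made up for illustration and are not taken from the thesis.

```python
def gf2_mod(a, b):
    """Remainder of a(x) divided by b(x) over GF(2); bit i = coefficient of x^i."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)   # cancel the leading term of a
    return a

def is_non_factor(candidate, H):
    """True iff candidate(x) divides none of the error polynomials in H,
    i.e. every faulty signature differs from the good signature."""
    return all(gf2_mod(h, candidate) != 0 for h in H)

# Made-up error polynomials h_i (one per modeled fault) for illustration.
H = [0b1011010011, 0b1100101, 0b111000111001]
f = 0b10011          # candidate feedback polynomial f(x) = x^4 + x + 1
print(is_non_factor(f, H))   # True for this H: f is a non-factor
```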
Chapter 2

Pattern generator design

In this chapter we deal with the design of efficient and effective pattern generators based on linear feedback shift registers (LFSR). Their effectiveness is measured in terms of generating test patterns for all the faults of interest, and their efficiency in terms of minimum test length (time). Both ends will be accomplished by proper selection of the feedback polynomial (configuration) and the initial seed (state) of the LFSR.

Our goal is to find short one-seed pseudo-random test sequences that achieve 100% fault coverage (fc). The term short is relative to the probability of detecting the hardest fault of a circuit. If this probability is p, then short will mean L = 1/p. By using just one seed the BIST control circuitry is kept at a minimum. We first show that a pseudo-random test sequence of length at most 2L has a high probability of achieving 100% fc. Thus, a random process of selecting a random primitive polynomial and a random seed for a LFSR-based PG is likely to produce the desired test length. By simulating up to 2L test patterns, one knows whether the desired sequence is found or not. If not, another random selection is made. This scheme is best suited for randomly testable circuits, i.e. those circuits with high values of p. For random resistant circuits we suggest a more sophisticated way of selecting the feedback polynomials and seeds that will produce shorter test sequences in less time. We use the theory of discrete logarithms to embed a subset of test patterns in a LFSR sequence, from which we produce the test sequence for all faults. The applicability of these schemes is dependent on the computational effort one is willing to expend and on the time a test sequence is allowed to run. The tests we embed can either be one-pattern tests for non-sequential faults, or, using schemes such as in [43], two-pattern tests. As the pattern sequences of LFSRs and one-dimensional Cellular Automata with the same primitive characteristic function are isomorphic [42], our algorithms will also work for CAs.

The rest of this chapter is organized as follows. Throughout the chapter, n denotes the number of inputs to the circuit under test (CUT). In Section 2.1 we analyze the probability of detecting a fault having 2^{k+i} test patterns, given a test sequence of length L = 2^{n-k}, and we give lower and upper bounds on the probability of detecting all the faults of interest. In Section 2.2 we assume two different detectability profile models and derive the probability bound values for 100% fc for circuits that abide by these models. In Section 2.3 we find the actual detectability profile for some example circuits, derive the probability of drawing short test sequences and conduct random experiments which validate our analytic results. These sections provide the basis for our claim that short test sequences can be found within acceptable time constraints. We then proceed to introduce our procedures for selecting primitive feedback polynomials and seeds for the LFSRs. In Section 2.4 we introduce the notion of discrete logarithms and show its relation to the sequencing of patterns by a LFSR. In Section 2.5 we state the test embedding problem and propose our solution. Section 2.6 defines and characterizes faults we classify as hard faults, which are of major importance to our algorithm. In Section 2.7 we present experimental results. We present our conclusions in Section 2.8 and some miscellaneous issues in Section 2.9.

2.1 The probability of 100% fault coverage with a random test sequence of length L

Consider a combinational circuit with n inputs. In this section we address the following two questions. Given a fault with K = 2^k test patterns, out of N = 2^n possible input patterns to the circuit, what is the probability that a random test sequence of length L = 2^{n-k} does not detect the fault, assuming the patterns are drawn with no replacement? What is the probability that a random sequence of length L does not detect a fault with 2^{k+i} test patterns, where i >= -1?

To answer these questions we define the function p_nd(t, L), which is the probability that a sequence of length L does not detect a fault with t test patterns. Hence, we are looking for the values of p_nd(t, L) when t equals 2^{k+i} for i >= -1. The total number of sequences (with no importance to order) of length L is $\binom{N}{L}$.
Of those, the number of sequences that do not detect a given fault with t test patterns is $\binom{N-t}{L}$. Hence,

$$p_{nd}(t, L) = \binom{N-t}{L} \Big/ \binom{N}{L}.$$

Writing this expression in factorial form,

$$p_{nd}(t, L) = \frac{(N-t)!\,(N-L)!}{(N-t-L)!\,N!}.$$

After cancelling the appropriate terms,

$$p_{nd}(t, L) = \frac{(N-L)(N-L-1)\cdots(N-L-t+1)}{N(N-1)\cdots(N-t+1)}
= \left(1-\frac{L}{N}\right)\left(1-\frac{L}{N-1}\right)\cdots\left(1-\frac{L}{N-t+1}\right). \qquad (2.1)$$

Since each factor is at most 1 - L/N,

$$p_{nd}(t, L) \le \left(1-\frac{L}{N}\right)^{t}. \qquad (2.2)$$

For t = 2^k = K,

$$p_{nd}(2^k, L) \le \left(1-\frac{1}{K}\right)^{K} < \frac{1}{e}.$$

For t = 2^{k-1} = K/2,

$$p_{nd}(2^{k-1}, L) \le \left(1-\frac{1}{K}\right)^{K/2} < \left(\frac{1}{e}\right)^{1/2}.$$

For t = 2^{k+i} = K \cdot 2^i, i \ge 1,

$$p_{nd}(2^{k+i}, L) \le \left(1-\frac{1}{K}\right)^{K \cdot 2^i} < \left(\frac{1}{e}\right)^{2^i}.$$

In general, for t = wK,

$$p_{nd}(t, L) \le \left(1-\frac{1}{K}\right)^{wK} < \left(\frac{1}{e}\right)^{w}. \qquad (2.3)$$

In most cases of interest to us, t will be (much) less than L. For these cases, this upper bound is tight. By Equation (2.1),

$$p_{nd}(t, L) > \left(1-\frac{L}{N-t+1}\right)^{t},$$

and by the assumption that t \le L,

$$p_{nd}(t, L) > \left(1-\frac{L}{N-L}\right)^{t} = \left(1-\frac{1}{K-1}\right)^{t} = \left(\frac{K-2}{K-1}\right)^{t}.$$

For t = wK, this becomes

$$p_{nd}(t, L) > \left(\frac{K-2}{K-1}\right)^{wK}. \qquad (2.4)$$

When K = 32,

$$p_{nd}(t, L) > \left(0.9519 \cdot \frac{1}{e}\right)^{w},$$

and for K = 64,

$$p_{nd}(t, L) > \left(0.9762 \cdot \frac{1}{e}\right)^{w}.$$

Combining Equations (2.3) and (2.4) under the assumption t = wK \le L results in

$$\left[\left(1-\frac{1}{K-1}\right)^{K}\right]^{w} < p_{nd}(t, L) \le \left[\left(1-\frac{1}{K}\right)^{K}\right]^{w},$$

and as K increases, the bounds become tighter.

We defined the sequence length to be L = 2^{n-k}. If the sequence length is doubled, the probability of missing a fault with t test patterns is (approximately) squared. By Equation (2.2),

$$p_{nd}(t, 2L) \le \left(1-\frac{2L}{N}\right)^{t} = \left(1-\frac{2}{K}\right)^{t} = \left[\left(1-\frac{1}{K/2}\right)^{K/2}\right]^{2w} \approx \left(\frac{1}{e}\right)^{2w}.$$

Similarly, if the test sequence length is halved (L/2), the probability is the square root of its value for a sequence of length L.
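The exact expression (2.1) and the bounds (2.3)-(2.4) are easy to check numerically. The sketch below evaluates p_nd(t, L) for the illustrative parameters n = 16 and k = 5 (chosen only for the example, not taken from the thesis) and confirms that the exact value lies between the two bounds and below (1/e)^w.

```python
from math import comb, exp

def p_nd(t, L, N):
    """Probability that a random sequence of L distinct patterns (out of N)
    misses all t test patterns of a fault; Equation (2.1)."""
    return comb(N - t, L) / comb(N, L)

n, k = 16, 5
N, K, L = 2 ** n, 2 ** k, 2 ** (n - k)

for i in range(-1, 6):                  # t = 2^(k+i) = w*K with w = 2^i
    t, w = 2 ** (k + i), 2.0 ** i
    exact = p_nd(t, L, N)
    upper = (1 - 1 / K) ** t            # Equation (2.3): < (1/e)^w
    lower = ((K - 2) / (K - 1)) ** t    # Equation (2.4)
    assert lower < exact <= upper < exp(-w)
    print(f"t = 2^{k + i}: {exact:.3e}  (bounds {lower:.3e} .. {upper:.3e})")
```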
Using the values of p_nd(2^{k+i}, L), we would like to bound from above and below the probability that a sequence of length L detects all the faults of interest. We do this by taking, for each value of t, the closest power of 2 greater than or equal to t and the closest power of 2 less than or equal to t. By considering the power of 2 greater than t we derive the upper bounds, and by considering the power of 2 less than t we derive the lower bounds.

Let F = {f_1, f_2, ..., f_s} be the set of all the faults of interest in the CUT. Let t_i be the number of tests for fault f_i and let k_i = ⌈log t_i⌉. Let k_min = min_i {k_i} and k_max = max_i {k_i}. Group the faults into subgroups C_{k_min}, C_{k_min+1}, ..., C_{k_max}, where f_i ∈ C_j iff k_i = j. Let k = k_min. We are interested in finding the probability that a random sequence of length L detects all the faults of F.

Let p_j be the probability that a fault in C_j is not detected by the sequence. Since a fault in C_j has between 2^{j-1} and 2^j test patterns, the probability p_j is lower bounded by p_j >= p_nd(2^j, L). The value of p_j is either greater than, less than, or equal to e^{-2^{j-k}}. Our goal is to get a pessimistic upper bound on the probability of detection, hence we will assume that p_j >= e^{-2^{j-k}}. The probability, q_j, that a fault in C_j is detected by the sequence is then upper bounded, by our above assumption, by

$$q_j \le 1 - \left(\frac{1}{e}\right)^{2^{j-k}}.$$

Notice that without our assumption, the upper bound on q_j can be greater. Hence, the probability, q, that a random sequence of length L detects all the faults of F is upper bounded by

$$q \le \prod_{j=k}^{k_{max}} \left(1 - \left(\frac{1}{e}\right)^{2^{j-k}}\right)^{|C_j|}.$$

In the above expression we assume that the detectabilities of any two faults are independent.

Similarly, we can define d_i = ⌊log t_i⌋ and the subgroups {F_j}, where f_i ∈ F_j iff d_i = j. Let d = min_i {d_i} and d_max = max_i {d_i}; then d is either k or k-1, and d_max is either k_max or k_max - 1. The probability r_j that a fault in F_j is not detected is upper bounded by

$$r_j \le \left(\frac{1}{e}\right)^{2^{j-k}},$$

hence the probability, s_j, that a fault in F_j is detected is lower bounded by

$$s_j \ge 1 - \left(\frac{1}{e}\right)^{2^{j-k}}.$$

Thus, the probability q of detecting all the faults of interest is bounded by

$$\prod_{j=d}^{d_{max}} \left(1 - \left(\frac{1}{e}\right)^{2^{j-k}}\right)^{|F_j|} \le q \le \prod_{j=k}^{k_{max}} \left(1 - \left(\frac{1}{e}\right)^{2^{j-k}}\right)^{|C_j|}. \qquad (2.5)$$

The actual value of q will be closer to the upper (lower) bound when, for most of the harder faults, the number of test patterns is closer to the power of 2 from above (below).

The super-exponential decrease in the probability of not detecting a fault, as we move from C_j to C_{j+1}, allows us to consider only the first few subgroups of faults when estimating the probability q. This is an analytic explanation of a similar observation made in [39]. To see this, consider the following question. How many faults x_j, each with 2^{j+1} test patterns, are needed such that the product of their detection probabilities equals the detection probability of a single fault with 2^j test patterns? This can be written as

$$\left(1 - e^{-2^{j+1-k}}\right)^{x_j} = 1 - e^{-2^{j-k}}.$$

Substituting y for e^{2^{j-k}}, we get

$$x_j = \frac{\ln\left(1 - \frac{1}{y}\right)}{\ln\left(1 - \frac{1}{y^2}\right)}. \qquad (2.6)$$

The power series expansion of ln(1+z), where -1 < z \le 1, is

$$\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots.$$

Substituting z for 1/y in Equation (2.6), and expanding to the first 2M terms, we get

$$x_j \approx \frac{z + \frac{z^2}{2} + \frac{z^3}{3} + \frac{z^4}{4} + \cdots}{z^2 + \frac{z^4}{2} + \frac{z^6}{3} + \frac{z^8}{4} + \cdots}
= \frac{1}{z} \cdot \frac{z + \frac{z^2}{2} + \frac{z^3}{3} + \cdots}{z + \frac{z^3}{2} + \frac{z^5}{3} + \cdots} \ge \frac{1}{z} = e^{2^{j-k}}.$$

For j - k = -1, 0, 1, 2, 3 and 4, the values of x_j are greater than 2 (> e^{2^{-1}}), 3 (> e), 7 (> e^2), 55 (> e^{2^2}), 2981 (> e^{2^3}), and 8886114 (> e^{2^4}), respectively. This means that when computing the bounds on q, the significance of one fault with 2^{k+4} test patterns to the probability of success is greater than the contribution of e^{16} faults with 2^{k+5} test patterns. Equivalently, the probability of detecting a single fault with 2^{k+4} test patterns is less than that of detecting e^{16} faults with 2^{k+5} test patterns. Thus, the faults in the subgroups C_{k+5}, ..., C_{k_max} and F_{d+5}, ..., F_{d_max} do not have to be taken into account when analyzing the probability of success.
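A direct way to use Equation (2.5) is to plug in the subgroup sizes measured for a circuit. The sketch below evaluates the lower and upper bounds on q given the |F_j| and |C_j| counts; the profile numbers and the value of k are invented for illustration and are not taken from Table 2.1.

```python
from math import exp

def q_bounds(F_sizes, C_sizes, k, d):
    """Bounds of Equation (2.5) on the probability q that a random length-L
    sequence detects every fault, given |F_j| for j = d, d+1, ... and
    |C_j| for j = k, k+1, ..."""
    lower = 1.0
    for offset, size in enumerate(F_sizes):          # j = d + offset
        j = d + offset
        lower *= (1 - exp(-2.0 ** (j - k))) ** size
    upper = 1.0
    for offset, size in enumerate(C_sizes):          # j = k + offset
        j = k + offset
        upper *= (1 - exp(-2.0 ** (j - k))) ** size
    return lower, upper

# Invented detectability profile: |F_{k-1}| .. |F_{k+4}| and |C_k| .. |C_{k+5}|.
k = 5
F_sizes = [2, 8, 15, 20, 40, 60]     # subgroups F_d, ..., with d = k - 1
C_sizes = [6, 12, 18, 25, 45, 70]    # subgroups C_k, ..., C_{k+5}
lo, hi = q_bounds(F_sizes, C_sizes, k, d=k - 1)
print(f"{lo:.3e} <= q <= {hi:.3e}")
```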
The last subgroup, C k m ax, may not be completely full, hence I r-il 2 r n a x ~ |F | < ve thus, km ax k ln jF j < I n n + 2 k m a x — k > log (In |F | — In v). Similarly, the subgroup Fdm a x - 1 is full, hence o^rnax— d— 1 _ ■ , |F | > ue 2 and hence d m a x - d < log(2(In \ F \ — lnu) + 1) + 1 ( l ^ k y lM2ilnlFh'nU) + 1)+1) < < ^ 1 F|-In«) Setting, for example, u and v to equal 10, and |F | = 5000, we get 9 • 10~2 ° < q < 6.61 • 10"6. By doubling the test length the probability bounds become 4.34 • 10-6 < q < 2.28 • 10-2. 21 The second distribution is the linear distribution, in which Iftl = U , \Ck\ = V Ift+ll 1 — e*u > ICWi I = ev IC m l + £ H n I I > ICW2 I = e\Ck+i\ \f m \ — e\Fd+2\ ) | Ck+31 — e\Ck+2\ t ^i+41 = e\Fd+3\ , ICTm I = e\Ck+z\ 1 Fd^-5 | = e|Fd+4| , |C 7 fc+5| — e | C fsdr 41 Using this distribution km ax k |F | = » £ i— 0 W ith this distribution, the product of the probability of detecting all the faults in Fd+2+i (o < i < 3) is (much) greater than the probability of detecting one fault in Fd+2+i-i■ The probability of detecting all the faults in Fd+i is greater than the probability of detecting all the faults in Fd. The same applies to the faults in the subgroups {Cj}. Thus, we can approximate the bound on the probability of detecting all the faults by Assuming u — v = 10, |F | is irrelevant in this case, 7.91 • 10“9 < q < 10"4. If we double the length of the test sequence, the bounds become 10"4 < q < 5.45 • 10"2. The upper bounds, for both models, on the probability that a random test se quence of length 2L achieves 100% fc is better than These results, of course, are dependent on our choice of values for u, v and F. Irrespective of these values is the effect that doubling the test length has on the probability of 100% fc. This effect is a result of the exponential decrease in the probability of not detecting a fault. While these two detectability profiles seem artificial, experiment circuits show them to be very pessimistic, i.e. actual profiles give rise to detection probabilities that are better than the models ' bounds. 2.3 Experimental results on finding short pseudo random test sequences We synthesized 13 circuits from the Berkeley [ 6 ] benchmarks as multilevel circuits. For each circuit we found its k value. Using a modified version of the ATALANTA [18] test generation system, we generated a list of all the non-equivalent faults of the circuit and proceeded to generate all possible test patterns for each fault. If the pattern count exceeded 2 fc+5, we discarded the fault and stopped the test generation procedure for the fault. Otherwise, we recorded the number of test patterns for the fault. Having iterated through all the faults, we found the respective sizes of the subgroups {Fj} and {Cj}. We then used Equation (2.5) to compute upper and lower bounds on the probability of finding a test sequence of length L = 2n~k which detects all the faults. These results are presented in Table 2.1. The first row for each circuit is the number of faults in the subgroups Fk-i through Fk+ 5 and the second row is the number of faults in the subgroups Ck- 1 through Ck+5 - The column labeled q in each row is the bound derived from the row using Equation (2.5). In the first row is the lower bound on q and in the second the upper bound. The column labeled lin. bnd. represents the bounds derived using the linear detectability profile model (Equation (2.7)). The values of u and w , the number of faults in Fk-i and Ck, respectively, are taken from their entries in the table. 
When u = 0, as is the case for the first eight circuits, we cannot calculate the lower bound, hence the entry is left blank. Notice that only for circuit m5, where the num ber of faults in Ck was only 1 , did the model give a bound that was higher than the one given by Equation (2.5). We also computed the bounds on the probability that a sequence of length 2L will detect all the faults. The results are in Table 2.2. Having computed the probability bounds, we conduct 100 experiments (for circuit 23 Table 2.1: Fault distribution and probability bounds for a test sequence of length L circuit k - 1 k k+ 1 k + 2 k + 3 k -f 4 k + 5 q lin. bnd. bcO 0 10 18 9 21 30 274 0.00063 0 10 14 10 17 30 281 0.0011 0.0001 b3 0 3 6 20 18 33 138 0.0725 0 3 3 12 16 20 164 0.1301 0.0638 chkn 0 5 18 37 30 31 127 0.0037 0 4 5 30 34 23 152 0.0438 0.0255 cps 0 25 33 39 172 131 269 4 * 10~8 0 25 33 35 163 101 312 4.2 * 10~8 1 . 1 * io - lu exep 0 12 27 42 54 80 75 3.6* 10-5 0 12 22 37 42 72 105 8.3 * 10~5 1.6 * 10~& in3 0 3 7 9 2 2 193 0.0772 0 3 3 5 8 2 195 0.1485 0.0638 in4 0 3 6 22 25 32 158 0.1078 0 3 3 12 22 25 181 0.1298 0.0638 in5 0 1 14 46 55 49 56 0.0346 0 1 12 32 42 39 95 0.0602 0.4 in7 0 5 0 0 7 9 77 0.1007 0 5 0 0 7 2 84 0.1007 0.01 vg2 5 9 1 9 9 0 33 0.0001 8.9* 10"5 0 12 3 0 9 9 33 0.0026 1.6* 10~5 vtxl 8 12 0 9 15 17 12 2 * 10“6 3.3* 10~7 0 16 4 0 9 15 29 3.6* 10“4 4.2* 10"7 xldn 8 12 0 9 15 17 12 2 * 10~6 3.3* 10"7 0 16 4 0 9 15 29 3.6* 10~4 4.2 * 10“7 x9dn 8 6 18 20 5 9 17 1.8* 10"6 3.3* 10“^ 0 14 2 27 9 5 26 7.3* lO"4 2.6* 10“6 24 Table 2.2: Probability bounds for a test sequence of length 2L circuit lower bound upper bound bcO 0.167 0.18 b3 0.5747 0.6091 chkn 0.3422 0.5045 cps 0.0141 0.0142 exep 0.1045 0.1149 in3 0.3207 0.6106 in4 0.5743 0.6091 in5 0.6573 0.6852 in7 0.4833 0.4833 vg2 0.0267 0.1650 vtxl 0.0044 0.0907 xldn 0.0044 0.0907 x9dn 0.00759 0.1247 chkn only 50 experiments were conducted) of random selections of polynomials and seeds to produce 100 test sequences. We ran each sequence for at most 2L patterns (for circuit chkn at most 1.2L), stopping whenever 100% detection was achieved. We recorded the number of random sequences of length at most L and of length between L and 2L that detected all the faults. The results are in Table 2.3. The first column shows the expected number of sequences of length at most L that detected all faults. These numbers are based on the probability values from Table 2.1. The second column shows the actual number of such sequences. Column three shows the number of sequences of length greater than L and at most 2L th at detected all faults. Column four gives the total number of sequences of length at most 2L that detected all faults and column five gives the expected number of such sequences, based on the probability values from Table 2.2. For 24 cases (omitting circuit chkn), we have both expected and actual results. For 11 of the 24 cases, the actual results were in the expected range. Omitting the 5 cases in which expected and actual results were 0, of the remaining 6 cases, in 5 the actual result was closer to the higher expected value and in 1 case it was closer to the lower expected value. Of the 13 cases in which the actual value was not in the expected range, in 10 cases the actual results were higher than the high end of the expected range. 
25 Table 2.3: Number of random sequences of length less than (2)L, that detected all faults circuit expc< L < L L < ,< 2L < 2 L expc< 2L bcO 0 1 23 24 16-18 b3 7-13 12 49 61 57-61 chkn 0-4 1 2 * 34-50 cps 0 0 36 36 1 exep 0 0 14 14 10-12 in3 7-14 0 50 50 32-61 in4 10-13 17 47 64 57-61 in5 3-6 8 64 72 66-69 in7 10 9 37 46 48 vg2 0 0 18 18 2-17 vtxl 0 0 6 6 0-9 xldn 0 0 2 2 0-9 x9dn 0 2 10 12 0-12 Our first conclusion, based on the empirical results, is that the probability bounds given by Equation (2.5) are fairly accurate. The fact that the actual results were usually in the high end of the expected range can be explained by the fact that (1) we used pessimistic assumptions to derive Equation (2.5) and (2) certain correlations between the detectability of some faults may exist but were not considered. Our second conclusion from both the analytic and empirical results is that the bounds given by the linear distribution model are overly pessimistic and Equation (2.5) will give more optim istic results. Given that the actual results tended to be in the high end of the expected range, we conclude that with only a few random selections sequences of length at most 2L can be found that produce 100% fc. 26 2.4 Discrete logarithms and LFSR sequences Up to this point it was shown that the probability that a pseudo-random test se quence of length at most 2L achieves 100% fc is typically greater than In the sequel we show a more sophisticated way of selecting the primitive feedback poly nomial and seed of the LFSR-based PG which finds shorter sequences in less or competitive time. Given a specific feedback polynomial, we guide the search for the optimal seed by using the theory of discrete logarithms. When initialized to a non-zero state, a n-stage LFSR with a primitive feedback polynomial cycles through all 2n — 1 non-zero binary n-tuples. W hen the feedback polynomial is changed from one primitive polynomial to another the order in which the patterns appear changes. Hence, a n-stage LFSR with a primitive feedback polynomial defines a permutation over the non-zero binary n-tuples, each polynomial corresponding to a different permutation. Given a subset of binary n-tuples, the minimum subsequence of a LFSR needed to cover all the tuples of the subset will vary depending on the perm utation defined by the feedback polynomial. If the position of the tuples in the perm utation was known, it would be straight forward to find the minimum covering subsequence. The position of a tuple in a sequence can be obtained from the theory of discrete logarithms. Let f ( x ) = Y17=ofix % — xTl + h(x) be a primitive polynomial of degree n over GF[2] and let a be a root of / . The non-zero elements of GF\2n] can all be expressed as distinct powers of a, i.e. \j(3± 0 € GF[2n], 3 0 < j < T - 2 s.t. (3 = (2.8) Since o; is a root of / we have an = h(a), hence (3 can be represented as a unique non-zero polynomial in a of degree less than n. If j3 is the j-th power of a , j is said to be the discrete logarithm of (3 to the base a. The discrete logarithm problem can be stated as follows: given a (equivalently, given / ) and j3, find j. 27 Figure 2.1: A LFSR with the feedback polynomial f(x ) = x 4 + x -f 1. Discrete logarithms relate to the order in which patterns are generated by a LFSR as follows. Consider a LFSR with / as its feedback polynomial, as in Figure 2.1. We number the cells D0, D i, ..., Dn_i, with the feedback value coming out of cell D n - 1 and feeding D{ iff /, = 1. 
Consider the following mapping between non zero patterns of the register and non-zero polynomials in a of degree less than n. A “1” in cell Dj represents o*. Since every non-zero element of GF\2n] corresponds to a unique polynomial in a of degree less than n, the non-zero elements of GF[2n] correspond to the patterns of the register. Assume the initial state has a “1” in the least significant cell and 0 in all other cells. A shift will move the “1” to the second cell, corresponding to multiplication by a. After the n-th shift, the “1” is fed back, corresponding to substituting h(a) for an. Thus, the j-th pattern, interpreted as a polynomial in a , represents the element a3. If we want to know when a pattern will appear we need to find the discrete logarithm of the element corresponding to the pattern. We considered two algorithms for solving the discrete logarithm problem. The first is due to Pohlig and Heilman [32] and the second is due to Coppersmith [9]. An excellent expository of these two algorithms and of the discrete logarithm problem can be found in [28]. To avoid a lengthy discussion on the exact analysis of the computational complex ity of these two algorithms we refer the reader to the above references. We just state the issues which influenced our decision to prefer the Pohlig-Hellman algorithm over Coppersm ith’s. While Coppersm ith’s algorithm has an asym ptotic run tim e which is better than the Pohlig-Hellman algorithm, it is also much more complex. The computational complexity of the Pohlig-Hellman algorithm is dependent on the size and multiplicity of the prime factors of 2n — 1 (the num ber of non-zero elements in the field of computation). When the number of circuit inputs is less than 64, which 28 is true for the cases we are targeting, the prime factors of 2n — 1 are small enough (except for n = 62,61,59, 49,41,37,31) for the Pohlig-Hellman algorithm to run faster than Coppersm ith’s. 2.5 The test embedding problem In this section we define and present an algorithm to solve the test embedding prob lem. We analyze the computational complexity of our algorithm and discuss a strong lim itation. We then present a way to overcome this lim itation. The test embedding problem is defined as follows. Given (1) a set of faults F = { / i , . . . , / s}, (2) a set of test patterns T = with T; = { ^ ,i,. . . , being the set of all tests for fault and Utj being a binary n-tuple, and (3) a primitive polynomial, p, of degree n, find a minimum length subsequence of patterns generated by the corresponding LFSR that includes at least one test pattern for each /,• in F. When given a set P = {px(x),... ,pu{x)} of primitive polynomials we would like to select the polynomial that generates the shortest such subsequence. Having found the polynomial, the initial state of the LFSR and the sequence length, we have embedded a test set for F in the sequence that will be generated by the LFSR. The resulting test sequence is referred to as the embedded (test) sequence. The faults in F are referred to as embedded faults and the test patterns of T are the embedded test patterns. To solve this problem we propose a two-stage procedure, referred to as the em bedding procedure (EP). The first stage is referred to as the discrete logarithm stage and the second stage is referred to as the windowing stage. In the discrete logarithm stage we first find the logarithm of all the test patterns tij with respect to a root a of the primitive polynomial p. 
We then sort the loga rithm s and for each logarithm we create a list of faults detected by the corresponding pattern. 29 In the windowing stage the idea is to use a sliding window on the cycle of loga rithm s, identifying those windows that detect all the faults, and selecting the smallest window (or one of the windows of shortest length when there are more than one). The outline of the procedure for this stage is given in Figure 2.2. The array covered[] keeps track of the number of patterns in the window that detect each fault. The counter not-covered keeps count of the num ber of faults that are not detected by the patterns in the window. The array logJable[] stores the ordered logarithms, while the array pattern-table\\ stores the patterns corresponding to the ordered logarithms. The procedure begins with the window containing the first logarithm. It is then extended, until all the faults are detected. The size of the window is recorded. It is then compared with the previous best and the smaller of the two is kept. At the end of each iteration the tail is advanced. In the following iteration the head is adjusted, if necessary, so that the window detects all the faults. The procedure term inates when all the logarithms have been considered as tails. Denote the ordered logarithm cycle by lg\ylg2,..., lg\T\ where Igi is followed by Igi+i and lg^\ is followed by lg\. Denote the window whose tail is Igi by Wi and denote the head of by hi. We have the following Lemma. L e m m a 1: The head of tu,+1, h{+i, is in the subcycle [hi. .. Igi+i), i.e. it is not in the subcycle [lgi+i ... hi). P roof: Assume /il+i is in the subcycle [lgi+i ... hi), then the window [Igi... hi+ 1] detects all faults, contradicting the minimality of Wi. □ As a result of Lemma 1, Procedure windowQ finds the shortest window associated with each tail. Since the procedure considers windows beginning at all the logarithms and considers the logarithms of all the test patterns for all the faults, the procedure finds a smallest subsequence that detects all faults. The complexity of Procedure windowQ is a function of the num ber of tail and head movements taken. There are at most |Xj logarithms, hence at most |T | tail movements. Each window contains at most |Tj logarithms, hence there are at most 2|T | head movements. For each tail or head movement there are at most |F | ac counting operations that are needed, hence the complexity is 0 ( |T ||F |) . 30 P ro c e d u re 1: window(n, # -o f~ flts, j^-of dogs, log -table, pattern-table) 1. best = 2n - 1 2. for (i = 0;i < # - o f - fl t s ; i + + ) (a) covered[i] — 0 3. not-covered = # - o /-fits 4. end = # -o fJo g s — 1 5. for (start = 0; start < # - o f Jogs; start + + ) (a) tai/ = logJable[start] (b) whi\e(not-covered > 0) i. head = log-table[(+ + end) (mod jf-o f -logs)] ii. for all faults j covered by head A. if(ccwered[j] = = 0), not-covered----- B. covered[j] + + (c) size = head — tail + 1 (mod 2n — 1) (d) if best > size i. best = size ii. seed = patternJable[start] (e) for all faults j covered by tail i. if(couered[j] = = 1), not-covered + + ii. covered[j] ---- Figure 2.2: The procedure for the windowing stage of EP 31 The complexity of EP is given by 0 (P P D L + \T\(DL + log \T\ + |^ |) ), where P P D L is the preprocessing effort required for the Pohlig-Hellman algorithm, DL is the tim e required for one logarithm computation and \T\ log |T| is the tim e required to sort the logarithms. 
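Procedure window() can be rendered compactly in Python. The sketch below follows the sliding-window idea described above; it assumes that every embedded fault is detected by at least one pattern in the table (otherwise the window can never close), and the data-structure names are illustrative rather than those used in the actual implementation.

from collections import defaultdict

def smallest_covering_window(log_table, fault_lists, num_faults, period):
    """Sliding-window sketch of Procedure window().

    log_table   -- sorted discrete logarithms of the embedded test patterns
    fault_lists -- fault_lists[i] = faults detected by the pattern at log_table[i]
    num_faults  -- number of embedded faults
    period      -- 2^n - 1, the LFSR cycle length
    Returns (best_length, tail_index) of a shortest subsequence detecting all faults.
    """
    m = len(log_table)
    covered = defaultdict(int)                 # per-fault count inside the window
    not_covered = num_faults
    best, best_tail = period, None
    head = -1                                  # index of the last logarithm in the window

    for tail in range(m):                      # every logarithm is tried as the tail
        while not_covered > 0:                 # extend the head until all faults covered
            head += 1
            for flt in fault_lists[head % m]:  # wrap around the cycle when needed
                if covered[flt] == 0:
                    not_covered -= 1
                covered[flt] += 1
        size = (log_table[head % m] - log_table[tail]) % period + 1
        if size < best:
            best, best_tail = size, tail       # pattern_table[best_tail] would be the seed
        for flt in fault_lists[tail]:          # drop the tail before the next iteration
            covered[flt] -= 1
            if covered[flt] == 0:
                not_covered += 1
    return best, best_tail

By Lemma 1 the head never has to move backwards, so each logarithm is examined at most twice as a head and once as a tail, which is the O(|T||F|) accounting bound given above.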
As mentioned in the previous section, for the cases we are targeting P P D L and DL will not require much effort. The bulk of the work required by P P D L is the construction of a hash table of size 2 • X D£li qi, where m is the num ber of distinct prime factors of 2n — 1 and the qfs are the distinct prime factors. Each entry involves one polynomial multiplication and one polynomial reduction modulo p and one in sert operation into the table. The work required for one DL operation is 0 {m n ) polynomial multiplications and reductions modulo p and m integer multiplications and reductions. The m ajor factor in the complexity expression is the number of test patterns, |T |, which might be overwhelming, rendering the algorithm impractical. We must, therefore, find a way to limit the size of T. This is done in two ways, based on a user-set limit on the effort allowed for embedding a fault. The first is by considering only a subset of all possible faults, those which will be classified as hard. For some circuits the set of hard faults will be empty. These circuits are classified as randomly testable circuits and no embedding is done for them. A short test sequence for these circuits is found by random selections. The second is by limiting for each hard fault the number of test patterns that will participate in EP (whose logarithms we compute). This limit will typically apply to only a small subset (if at all) of the hard faults. In the next section we describe our heuristic for identifying hard faults. 2.6 Identifying hard faults The amount of work needed for EP is strongly affected by the num ber of test pat terns. We have to modify the procedure in a way that will reduce the test set to a manageable size. The modification is based on partitioning the set of faults in a circuit into two sets - randomly testable faults and random resistant faults, with each set being possibly empty. By randomly testable faults we mean faults th at can be 32 detected by choosing a primitive feedback polynomial and an initial seed at random, and using a test sequence of acceptable length (this term will be defined later). By random resistant faults we mean faults that cause the test length (for 100% fault coverage) to dramatically increase beyond the acceptable length when the feedback polynomial and the initial seed are chosen at random. This partition is justified by the super-exponential decrease in the probability th at a fault is missed by a random test sequence as the number of test patterns for the fault is doubled. Our thesis is that by proper identification of the random resistant faults (referred to as hard faults in the sequel), and by embedding test patterns for each of these faults in a LFSR subsequence, the subsequence will also detect all the randomly testable faults. As was shown in Section 2.1, the probability that a random sequence does not detect a fault in Ck or Ck+i is (much) greater than the probability that the sequence does not detect any other fault, hence any procedure that identifies hard faults must be able to identify the faults in both these subsets. We refer to the union of Ck and Ck+i as HF. Before presenting our procedure for identifying hard faults, we present a brief discussion on the applicability of our algorithm. Every circuit can be associated with param eters (n, k ). These param eters determine the work factor for EP and the predicted length of the embedded test sequence. We expect the embedded sequence to be of length between L j 2 = 2n~k~1 and L = 2n~k. 
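Returning to the discrete-logarithm stage, the Pohlig-Hellman computation can be sketched as follows. This is a generic textbook rendering (trial-division factoring of 2^n - 1, a baby-step/giant-step search inside each prime-order subgroup, and a CRT combination), not the implementation used in this work; the 4-bit field and the helper names are illustrative, and the trial division is adequate only for the small input widths targeted here.

from math import isqrt

F, N = 0b10011, 4                  # illustrative field: f(x) = x^4 + x + 1
ORDER = (1 << N) - 1               # |GF(2^n)*| = 2^n - 1

def gmul(a, b):                    # carry-less multiplication modulo f(x)
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N & 1:
            a ^= F
    return r

def gpow(a, e):                    # square-and-multiply in GF(2^n)
    r = 1
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def factorize(m):                  # trial division; fine for the small n considered here
    fac, d = {}, 2
    while d * d <= m:
        while m % d == 0:
            fac[d] = fac.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        fac[m] = fac.get(m, 0) + 1
    return fac

def bsgs(g, h, p):                 # baby-step/giant-step in the subgroup of prime order p
    m = isqrt(p - 1) + 1
    baby, e = {}, 1
    for j in range(m):
        baby.setdefault(e, j)
        e = gmul(e, g)
    step, y = gpow(g, p - m), h    # g^(p-m) = g^(-m) since g has order p
    for i in range(m):
        if y in baby:
            return i * m + baby[y]
        y = gmul(y, step)
    raise ValueError("h is not in the subgroup generated by g")

def pohlig_hellman(beta, alpha=0b0010):
    """Discrete logarithm of beta to the base alpha (alpha = x by default)."""
    residues = []                              # log mod p^e for each prime power of ORDER
    for p, e in factorize(ORDER).items():
        pe, x = p ** e, 0
        gamma = gpow(alpha, ORDER // p)        # an element of order p
        for k in range(e):                     # lift the solution digit by digit
            h_k = gpow(gmul(beta, gpow(alpha, ORDER - x)), ORDER // p ** (k + 1))
            x += bsgs(gamma, h_k, p) * p ** k
        residues.append((x, pe))
    x = 0                                      # combine via the Chinese Remainder Theorem
    for r, pe in residues:
        m = ORDER // pe
        x = (x + r * m * pow(m, -1, pe)) % ORDER   # pow(m, -1, pe) needs Python >= 3.8
    return x

assert gpow(0b0010, pohlig_hellman(0b0011)) == 0b0011   # recovers log(alpha^4) = 4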
W hen embedding test patterns we ensure that in the resulting test sequence the embedded faults will be detected and we rely on chance that the non-embedded faults will also be detected. It follows that we must embed the faults in H F. Assuming little overlap between the test sets for each of the faults, the embedded test set is at least of order |Tem (,| = 2k • \Ck\ + 2fc + 1 ■ \Ck+i\- The size of Temb increases with k , whereas the predicted test length decreases. The applicability of our algorithm depends on the amount of preprocessing work (embedding) we are willing to do and the amount of tim e we are willing to allow the test session (test length). 33 The bound we place on the amount of pre-processing work determines what we consider as a hard circuit and what we consider as an easy circuit. If we allow each embedded fault at most 2s test patterns, then for a circuit for which k = S we expect to find an embedded sequence of length longer than 2n~s~1. Thus, if for a given circuit we find a random test sequence of length less than 2n~s~1, we consider the circuit to be randomly testable, i.e. easy. In our experiments we chose to allow each embedded fault at most 21 3 — 21 4 test patterns. This led us to classify circuits as easy when we found a random test length that was less than 2n~1 5 (the acceptable length). The bound we impose on the test length determines whether or not the resulting embedded sequence is applicable to a circuit. For example, if for a 32 input circuit we find that the hardest faults have 26 test patterns, we expect to find a test sequence of length 22 5 — 226. If the maximum allowed test length is 220, then our scheme is not practical for this circuit. On the other hand, no other scheme that uses just one seed and no reconfiguration of the LFSR will detect 100% of the faults within the imposed tim e constraints. We denote the maximum allowed test length by 2max:. In the remainder of this section we present our heuristic for identifying hard faults followed by a probabilistic analysis of its effectiveness. The experimental results validate the proposed process. 2.6.1 T h e h eu ristic The process by which we identify hard faults (referred to as IHF) is m ade of three phases. In the first phase we determine which of the faults of interest are redun dant. They are eliminated from the set. The second and third phases are based on simulation (sampling) experiments. The goal of the second phase is to determine an estim ate I for k. The goal of the third phase is to identify the random resistant faults using our estim ate for k. The second phase should be a fast process that comes as close as possible to identifying k. The outline of this phase is given in Figure 2.3. The procedure determines a cutoff value, undet, which we set to be the minimum between 50 34 P ro c e d u re 2: second-phase(circuit,k,m ax,#irredundant-flts) 1. undet = min{50, 0.05 • #irredundant-flts} 2. for (j = n — 15 ; j < max ; j + +) (a) N D = { faults not detected by a random sequence of length 2J } (b) if((i = n — 15) and (N D = < f> )) return(-2) /* the circuit is easy */ (c) if |iVD| < undet, break 3. if [j > m ax ) return( — 1) /* the expected sequence length is unacceptable */ 4. U N D = N D 5. for(i = 0 ; i < 4 ; i + + ) (a) N D = { faults not detected by a random sequence of length 2J } (b) U N D = U ND u N D 6. t = min,-{|Ti|}J-^'D^ where T ,- is the set of all test patterns for the i-th fault in UND. 7. 
return([logt]) Figure 2.3: The procedure for the second phase of identifying hard faults 35 and 5% of the irredundant faults. It applies random test sequences on the circuit, with increasing lengths, until a sequence that misses at most undet faults is found. The initial sequence length is 23 = 2n~15. It is understood th at (n — 15) < m ax, otherwise the procedure will not result in an embedded test sequence we are willing to use, hence there is no sense in carrying it out. If during this iteration a sequence is generated that detects all faults then a satisfying embedding for all the faults is found and the circuit is considered easy. Otherwise, j is incremented and another fault simulation is performed. This process is repeated (each tim e simulating all faults) until the num ber of undetected faults is less than undet or until j is greater than max. If j > max the procedure stops, it will not produce a satisfying test sequence. Otherwise four more fault simulation experiments of length 23 are conducted. For each of the five experiments the set of undetected faults is recorded and at the end of the experiments the union of these sets is constructed. For each fault in the union all test cubes are generated in order to find the fault with the fewest number of test patterns. 1 If the number of test patterns for this fault is f , then the estim ate of k is I — [log <]. R e m a rk 1: Although test generation is considered a hard problem, and, a for tiori, generation of all test patterns, this proved to be a very low cost task in our experiments, where the faults of interest were single stuck-at faults. After the execution of Procedure second-phase (), a circuit is classified as either easy or hard. It is classified as easy if one of two conditions is met: (1) a random test sequence of length less than 2 ” - 1 5 was found th at detects all faults; or (2 ) the estim ate I is greater or equal to 15. In all other instances a circuit is classified as hard. For circuits classified as hard, we turn to the third phase. We run a set of fault simulation experiments (we ran 20) with L j 2 = 2n~l~1 test patterns. The set of faults which are not detected by at least one experiment constitute the set of embedded faults (EF). These are the faults for which we embed test patterns. It is 1A test pattern is a binary vector, indicating exact values in every position, whereas a test cube is a ternary vector, the third symbol being a d o n ’t c a re symbol. 36 essential that the faults in E F include all the faults of H F. The param eters used in selecting the number of simulation runs in phases two and three were chosen to ensure that the probability that E F includes H F is very close to 1 (this is shown in the next section). If a fault / is included in E F , we say it is classified as hard. Otherwise, we say it is classified as easy. R e m a rk 2: We conduct 5 fault simulation experiments in phase two in order to increase the probability that at least one fault of Ck will be in the union of the undetected faults. It is im portant that I be as close as possible to k because this will reduce the number of faults we have in E F (the smaller I is, the longer the simulation lengths in phase three are, decreasing the probability of not detecting a random fault, hence fewer faults are in E F ), thus reducing the num ber of test patterns we need to embed. While this also reduces the probability of detection for some of the faults (as will be seen in the next section), it is a tradeoff that m ust be made. 
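The classification flow of Procedure second_phase() can be summarized by the following Python skeleton. Here fault_simulate and min_num_tests are placeholders for the ATALANTA-based fault simulation and all-test-cube generation steps, so the sketch shows the control structure only and is not a drop-in implementation.

import math

def second_phase(num_irredundant, n, max_exp, fault_simulate, min_num_tests,
                 num_repeats=5):
    """Skeleton of Procedure second_phase().

    fault_simulate(length) -- placeholder: chooses a random primitive polynomial and
                              seed, simulates `length` patterns, and returns the set
                              of undetected irredundant faults
    min_num_tests(faults)  -- placeholder: smallest |T_i| over the given faults,
                              found by generating all test cubes for each fault
    Returns -2 (circuit is easy), -1 (expected sequence length unacceptable),
    or the estimate l of k.
    """
    undet = min(50, int(0.05 * num_irredundant))
    j = n - 15
    while True:
        nd = fault_simulate(2 ** j)
        if j == n - 15 and not nd:
            return -2                       # a length-2^(n-15) sequence covers everything
        if len(nd) <= undet:
            break
        j += 1
        if j > max_exp:
            return -1                       # the embedded sequence would be too long
    union_nd = set(nd)
    for _ in range(num_repeats - 1):        # four more experiments of the same length
        union_nd |= fault_simulate(2 ** j)
    t = min_num_tests(union_nd)
    return math.floor(math.log2(t))         # l, the estimate of k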
In the sequel, whenever EP is mentioned, it is understood that IHF was used in a preprocessing step (stage 0, if you will) to create a reduced target fault set.

2.6.2 Probabilistic analysis of the heuristic

The purpose of IHF is twofold: first, to find all the faults in HF; second, by our thesis, embedding test patterns for these hard faults results in a test sequence that detects all the faults of interest. We must answer two questions regarding our heuristic.

1. What is the probability that a fault that should be classified as hard (i.e., has fewer than 2^{k+1} test patterns) is classified as easy?

2. What is the probability that an easy fault will not be detected by the embedded sequence?

In answering these questions we assume that the embedded sequence is made of a totally random set of patterns, although, given that the patterns are generated by a LFSR, this might not be completely accurate.

In answering the first question we compute the probability that a fault with t test patterns will not be classified as hard by IHF. We will be interested in the probability value when t is less than or equal to 2^{l+1}, where l is the IHF estimate for k.

Let p_miss(2^{n-l-1}, t) be the probability that a test sequence of length 2^{n-l-1} does not detect a fault with t test patterns. This requires that the first pattern not detect the fault, the second not detect the fault, and so on. Assuming the patterns are completely random and uncorrelated, with selections done without replacement, and that the all-zero pattern does not participate in the drawing of patterns (and is not one of the test patterns), we have

    p_miss(2^{n-l-1}, t) = [(2^n - 1 - t)(2^n - 2 - t) ... (2^n - 2^{n-l-1} - t)] / [(2^n - 1)(2^n - 2) ... (2^n - 2^{n-l-1})]
                         = (1 - t/(2^n - 1)) (1 - t/(2^n - 2)) ... (1 - t/(2^n - 2^{n-l-1})).

Let p_detect(2^{n-l-1}, t) denote the probability that a sequence of length 2^{n-l-1} detects a fault with t test patterns; then

    p_detect(2^{n-l-1}, t) = 1 - p_miss(2^{n-l-1}, t).

Let #sim denote the number of simulation experiments. Then p_all-det(2^{n-l-1}, t), the probability that a fault with t test patterns is detected by all the experiments, is

    p_all-det(2^{n-l-1}, t) = [p_detect(2^{n-l-1}, t)]^{#sim}.

p_all-det(2^{n-l-1}, t) is also the probability that a fault with t test patterns will not be classified as hard. This probability can be increased or decreased by changing the number of simulation experiments. In our experiments, p_all-det(2^{n-l-1}, 2^l) ranged from 7.9 * 10^{-9} to 10^{-8}, and p_all-det(2^{n-l-1}, 2^{l+1}) ranged from 10^{-4} to 1.25 * 10^{-4}. This suggests that with very high probability EF will include all the faults in HF. EF will typically include some other faults, those with higher detection probabilities than the faults in HF; thus, although EF might vary from one run of EP to another, all such sets will share the same core of "hardest" faults.

We turn to the second question, namely, what is the probability, p_nd(el, t), that an embedded sequence, es, of length el does not detect a fault f for which there are t possible test patterns. We first answer the complement question, that is, what is the probability that es detects f.

    prob(es detects f) = prob(es detects f | f is classified hard) * prob(f is classified hard)
                       + prob(es detects f | f is classified easy) * prob(f is classified easy).

By construction, es contains at least one test pattern for each of the hard faults, hence prob(es detects f | f is classified hard) = 1. By definition, prob(f is classified hard) = 1 - p_all-det(2^{n-l-1}, t).
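Before completing the derivation, note that the quantities defined above are straightforward to evaluate numerically. The sketch below (a log-space product to avoid underflow) reproduces the order of magnitude of the ranges quoted above when #sim is 20, the number of phase-three experiments; the parameters in the example call (n = 24, l = 10, as for circuit in5) are illustrative.

import math

def p_miss(n, length, t):
    """Probability that `length` distinct random non-zero patterns all miss a fault
    with t test patterns (drawing without replacement from the 2^n - 1 patterns).
    Assumes t is much smaller than 2^n."""
    log_p = sum(math.log1p(-t / (2 ** n - i)) for i in range(1, length + 1))
    return math.exp(log_p)

def p_all_det(n, l, t, num_sim=20):
    """Probability that all num_sim sequences of length 2^(n-l-1) detect the fault,
    i.e. the probability that the fault is NOT classified as hard."""
    return (1.0 - p_miss(n, 2 ** (n - l - 1), t)) ** num_sim

# e.g. roughly 7.9e-9 and 1.0e-4, matching the magnitudes quoted above
print(p_all_det(24, 10, 2 ** 10), p_all_det(24, 10, 2 ** 11))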
The probability prob(es detects f | f is classified easy) is the probability that a random sequence of length el detects f. By definition, prob(f is classified easy) = p_all-det(2^{n-l-1}, t). Thus,

    prob(es detects f) = 1 * (1 - p_all-det(2^{n-l-1}, t)) + p_detect(el, t) * p_all-det(2^{n-l-1}, t)
                       = 1 - p_all-det(2^{n-l-1}, t) + p_all-det(2^{n-l-1}, t) - p_miss(el, t) * p_all-det(2^{n-l-1}, t)
                       = 1 - p_miss(el, t) * p_all-det(2^{n-l-1}, t).

Hence

    p_nd(el, t) = p_miss(el, t) * p_all-det(2^{n-l-1}, t).

From this expression, the probability of f not being detected is the probability that f is classified as easy (p_all-det) times the probability that a random sequence of length el fails to detect f (p_miss).

The probability p_nd(el, t) is a product of two functions of t; whereas one of these functions (p_all-det) increases with t, the other (p_miss) decreases. We focus our attention on the behavior of p_nd as a function of t. As will be seen in Theorem 2, p_nd has one maximum value and no local maxima, i.e., as t increases, p_nd rises, reaches a maximum value, and decreases from there on. The overall probability that the embedded sequence detects all the faults depends on the probability of occurrence of faults whose number of test patterns is in the peak area. The values of p_nd(el, t) for t = 2^j, l <= j <= l+5, from our experiments are tabulated in Table 2.10. The peaks can clearly be seen.

Let N = 2^n, r = 2^{n-l-1}, q = el and s = #sim. We can rewrite p_nd as follows:

    p_nd(N, q, t, r, s) = [ ∏_{u=N-q+1}^{N} (u - t)/u ] * [ 1 - ∏_{u=N-r+1}^{N} (u - t)/u ]^s.

We assume t <= N - q, since if t > N - q then a test sequence of length q must include at least one of the t test patterns, hence p_nd is zero. Theorem 2 shows that once p_nd, as an integer function of t, either levels off or starts falling, it keeps falling; hence it has one maximum value with no local maxima.

Theorem 2: If p_nd(N, q, t, r, s) >= p_nd(N, q, t+1, r, s) then

1. for all i such that t+1 <= i < min{N-q, N-r},
       p_nd(N, q, i, r, s) >= p_nd(N, q, i+1, r, s);

2. if N-r < N-q, then for all i such that t+1 <= N-r <= i < N-q,
       p_nd(N, q, i, r, s) >= p_nd(N, q, i+1, r, s). □

Proof: We look at the ratio p_nd(N, q, i, r, s) / p_nd(N, q, i+1, r, s) and show that if this ratio is greater than or equal to 1 for i = t, then all succeeding ratios are also at least 1. We begin with the first part of the theorem.

    p_nd(N, q, i, r, s) / p_nd(N, q, i+1, r, s)
        = [ ∏_{u=N-q+1}^{N} (u-i)/u ] [ 1 - ∏_{u=N-r+1}^{N} (u-i)/u ]^s
          / ( [ ∏_{u=N-q+1}^{N} (u-i-1)/u ] [ 1 - ∏_{u=N-r+1}^{N} (u-i-1)/u ]^s ).

Denote

    U_i = ∏_{u=N-r+1}^{N} (u - i)/u;

then

    p_nd(N, q, i, r, s) / p_nd(N, q, i+1, r, s) = [(N - i)/(N - i - q)] * [(1 - U_i)/(1 - U_{i+1})]^s.

By the hypothesis, for i = t the ratio is greater than or equal to 1, hence

    [(1 - U_t)/(1 - U_{t+1})]^s >= (N - t - q)/(N - t).

Denote the above inequality as M_t >= Q_t, where M_i = [(1 - U_i)/(1 - U_{i+1})]^s and Q_i = (N - i - q)/(N - i). For i >= t, Q_i > Q_{i+1}. If we show that M_{i+1} >= M_i, then we get M_{i+1} >= M_i >= Q_i > Q_{i+1}, which proves the first statement. To show that M_{i+1} >= M_i we need to show that

    (1 - U_{i+1})^2 - (1 - U_i)(1 - U_{i+2}) >= 0.                                   (2.9)

We use the facts that

    U_i / U_{i+1} = (N - i)/(N - i - r) = 1 + r/(N - i - r)      and      U_{i+2} / U_{i+1} = (N - i - r - 1)/(N - i - 1) = 1 - r/(N - i - 1)

to show that

    (1 - U_{i+1})^2 - (1 - U_i)(1 - U_{i+2}) = r * U_{i+1} * (U_{i+1} + r - 1) / [(N - i - 1)(N - i - r)] > 0.

The last inequality is based on the fact that i < min(N-q, N-r). To prove the second statement, notice that for i = N-r, U_{i+1} and U_{i+2} are both equal to zero, hence Equation (2.9) becomes U_i >= 0 and there is nothing to prove.
For i > N — r, a test sequence of length r will always include at least one of the i test patterns, hence paii_ M et is equal to 1. The ratio becomes n f L iT ) N - i n S + 1 N - 1 - i > 42 □ 2.7 Experimental results Experiments were conducted using a modified versions of the ATALANTA fault simulation and test generation system [18]. These modifications included fault sim ulating random patterns generated by LFSRs with primitive feedback polynomials, and generating all test cubes per fault (or as many as ATALANTA could find, de pending on the allowed backtrack limit) instead of stopping once the first test pattern is found. We first tried EP on the ISCAS85 benchmark circuits. The characteristics of some of these circuits, as well as the results of the experiments are given in Table 2.4. The first five columns of the table are self explanatory. The sixth column (#nr.flts.) gives the number of irredundant faults for each circuit. All of these circuit were classified as easy by Procedure second jphaseQ. Column seven (sp_time) is the CPU tim e (in seconds) required for Procedure second^phaseQ on a SPARC 2 workstation. Since these circuits were classified as easy, we did not continue on with EP. To get an idea of the required pseudo-random test length for these circuits, we conducted 10 random selection experiments (referred to as R S experiments in the sequel) for each circuit. In each RS experiment a primitive feedback polynomial and an initial seed were chosen at random and test patterns generated until 100% fault coverage was achieved. Columns eight and nine give the length of the best (shortest) and worst (longest) sequence that was required for each of the circuits. By looking at the values in column ten, we see that the sequence lengths that were found are much smaller than 2"~15. We did not try to embed test sequences for circuits c2670(n = 233), c5315(n = 178), and c7552(n = 207) because the number of inputs for each of these circuits made the embedding impractical. These circuits are either randomly testable or the embedded test length would be so long (greater than 2n~15) that test set embedding would not be a viable test option. 43 Table 2.4: Experimental results for some ISCAS85 combinational benchmarks. circuit #pi #PO Agates depth #nr.flts sp_time best worst Hpi — 15 c432 36 7 196 18 520 1 480 3136 21 c499 41 32 243 12 750 1 896 1536 21 c880 60 26 443 25 942 1 7520 34176 45 cl355 41 32 587 25 1566 2 1344 2848 26 cl908 33 25 913 41 1870 4 5216 33152 18 c3540 50 22 1719 48 3291 3 8960 50016 35 c6288 32 32 2448 125 7710 13 64 256 17 Since the ISCAS circuits do not provide a good test bed for our algorithm, we turned to the Berkeley circuits [6]. We synthesized circuits with 20 to 40 inputs as multilevel circuit. Some circuits were eliminated for pathological reasons. The characteristics of the synthesized circuits are given in columns two (# p i) to six (#nr.flts) of Table 2.5. Column eight (sp_time) is the CPU tim e, in seconds, required to run the second phase on a SPARC 2 workstation. The second and third phase of IHF were combined for circuit xparc. The reason for this was the long simulation tim e required for this circuit. Thus, no tim e is reported for this circuit in Table 2.5. To identify the randomly testable circuits from the random resistant ones we ran each circuit through IHF. The simulation experiments of the second and third phase of ///F u se d a pool of 150 primitive polynomials. 
For a circuit with n inputs we used the first 150 primitive polynomials of degree n, when ordered lexicographically (i.e. (a:n + :r + l) > (xn + 1)). The results of Procedure second-phaseQ on these circuits is given in column seven (/) of Table 2.5. For each of the easy circuits we ran 10 RS experiments to find a short pseudo-random test sequence. We also conducted a more thorough search for k for these circuits. We allowed each RS experiment to generate at most 2n-/+1 test patterns, where I is the estim ate for k. The results of these experiments are in Table 2.6. The value of n — I is given in column two. The best (shortest sequence) and worst (longest sequence) results are given in columns three and four. None of the worst sequences achieved 100% fault coverage. The total CPU tim e needed for 44 Table 2.5: Characteristics of synthesized circuits. circuit # p i # p o Agates depth #nr.flts I sp-time b4 33 22 171 9 562 easy 2 in6 33 23 172 9 564 easy 2 jbp 36 57 353 13 1051 easy 3 misj 35 12 63 5 89 easy 1 signet 39 8 243 8 709 easy 1 x6dn 38 5 207 13 651 easy 2 bcO 21 11 372 15 1434 4 21 b3 32 19 415 17 1283 12 63 chkn 29 7 282 13 959 5 997 cps 24 109 486 18 1837 3 890 exep 28 62 256 12 995 11 21 in3 34 29 240 14 714 14 77 in4 32 20 422 23 1303 12 70 in5 24 14 192 14 630 10 6 in7 26 10 117 12 330 9 5 vg2 25 8 189 12 576 7 8 v tx l 27 6 207 12 616 7 72 xparc 39 69 729 21 2614 12 * xldn 27 6 207 12 616 7 34 x9dn 27 7 215 12 636 7 83 Table 2.6: RS results for the randomly testable synthesized circuits. circuit n — I best worst n -- 15 b4 15 48,244 130,016 18 in6 15 55,712 130,016 18 jbp 13 20,064 32,768 21 misj 10 672 4096 20 signet 13 7,232 32,768 24 x6dn 14 22,368 65,536 23 these experiments was 1 — 2 minutes, as reported by ATALANTA on a SPARC 1 workstation. The best random sequence that was found was always shorter than 2n_i+i (2L). Also, n — / + 1 was always less than n — 15 (column five). For the circuits classified as random resistant circuits we tried to find short test sequences in two ways. The first was by RS experiments and the second was by continuing our embedding procedure (the first two phases of IHF were already executed). The purpose of finding a minimum sequence in two ways was to evaluate the quality of the result of EP in term s of test length and processing time. We ran 100 RS experiments for each circuit, except for circuit chkn for which we ran just 50 experiments and circuit xparc for which we ran just 3 experiments. The results of these experiments are in Table 2.7. Column rs-time represents the total CPU tim e (read as hrs : m in ) needed for these experiments, as reported by ATALANTA on a SPARC 2 workstation. We allowed each RS experiment to run for at most 2n~l+l test patterns. For all the circuits, the worst sequence lengths are the m aximum allowed lengths per experiment and all such sequences did not detect all the faults. Comparing the best sequence length and the value 2n~l, we see that the best value was less than 2n~l (L ) for 8 of the circuits, while it was between L and 2L for the remaining 6 circuits. For the random resistant circuits we completed the execution of IHF and pro ceeded to embed the test patterns for these faults. In Table 2.8, the results of IHF are shown. For each circuit, we show the total num ber of irredundant faults, the 46 Table 2.7: RS results for the random resistant synthesized circuits. circuit ^sim . 
best 2n-i worst rs_time bcO 100 114,624 131,072 262,144 0:36 b3 100 487,168 1,048,576 2,097,152 6:00 chkn 50 15,561,920 16,777,216 20,000,000 25:15 cps 100 2,292,512 2,097,152 4,194,304 11:46 exep 100 155,584 131,072 2621448 0:32 in3 100 660,224 1,048,576 2,097,152 4:29 in4 100 548,480 1,048,576 2,097,152 5:19 in5 100 11,424 16,384 32,768 0:04 in7 100 56,544 131,072 262,144 0:25 vg2 100 267,552 262,144 524,288 1:01 vtxl 100 1,345,152 1,048,576 2,097,152 4:29 xparc 3 248,802,208 134,217,728 268,435,456 41:36 xldn 100 1,948,192 1,048,576 2,097,152 4:30 x9dn 100 1,022,944 1,048,576 2,097,152 4:49 num ber of faults that were classified as hard and the percentage of hard faults. The total number of faults that were classified as hard, over all circuits, make up 7% of the total number of faults in the circuits. Thus, when also considering that the hard faults are those with the fewest test patterns, several orders of m agnitude of reduction in the work effort is achieved when embedding only the hard faults. In Table 2.9 we show the results of EP for each of the random resistant circuits. The second column shows the total number of embedded patterns, i.e. the union of the sets of test patterns for the hard faults. Column three shows the number of different primitive polynomials considered, i.e. the m inimum length embedded sequence was computed for each of these polynomials. The polynomials we used were the first in the lexicographical ordering. Columns four and five give the best and worst embedding results for these polynomials, and column six gives the total CPU tim e (read as hrs : min) required for EP, using an HP-700 workstation. When comparing the best sequence length with the value 2n~l for each of the circuits (see Table 2.7), we see that the length of the best embedded sequence was always less than 2n~l except for the circuit exep. For six of the circuits, the worst sequence length was also less than 2n~l. When comparing the worst embedding length with 47 Table 2.8: Hard fault identification for synthesized circuits. circuit #nr.flts. #h.flts. % circuit #nr.flts. #h.flts. % bcO 1434 65 4.53 b3 1283 29 2.26 chkn 959 88 9.18 cps 1837 388 21.12 exep 995 86 8.64 in3 714 16 2.24 in4 1303 28 2.15 in5 630 77 12.22 in7 330 8 2.42 vg2 576 15 2.60 v txl 616 29 4.71 xparc 2614 130 4.97 xldn 616 28 4.55 x9dn 636 43 6.76 the best RS length, for 6 of the circuits the worst embedding length was shorter than the best RS length. From the embedding length for each circuit we can calculate the probability P n d { e l , t ) that an embedded sequence of length el will not detect a fault that has t test patterns, for various values of t. This is shown in Table 2.10. The values in the parenthesis in each table entry are exponents of 10. For example, the value of pnd(el,2l) for circuit bcO is 5.67 * 10~9. Depending on the values, one might decide that the probability of not detecting a fault is small enough such th at verification by simulation is not necessary. Having performed the RS experiments and the embedding experiments for the random resistant circuits, we proceeded to compare the results in term s of the short est sequence length found and the processing tim e for the experiments. These com parisons are shown in Table 2.11. Columns two and three give the best test length and the tim e required for the RS experiments. Columns four and five give these values for the embedding experiments. In column six we give the ratio between the best RS sequence length and the best embedding sequence length. 
Column seven gives the ratio between the processing tim e of the RS experiments and the embed ding experiments. These ratios were normalized, so they would reflect the ratio if both experiments were run on a SPARC 2. To find the normalization factor we ran EP for circuit chkn on a SPARC 2 workstation and compared the run tim e with the run tim e for the same procedure and circuit on the HP-700. The normalization 48 Table 2.9: Embedding results for the synthesized circuits. circuit #em d.vecs. #pols. best worst emb_time bcO 3,120 50 74,240 196,741 0:30 b3 171,707 4 385,824 614,368 7:01 chkn 13,328 20 9,459,968 12,551,540 1:02 cps 9,024 25 1,254,272 3,464,351 1:27 exep 281,235 2 149,120 150,912 4:22 in3 148,683 4 176,960 239,296 3:27 in4 194,556 3 530,752 617,600 5:51 in5 176,310 3 6,530 7,416 3:22 in7 9,728 15 9,274 29,488 0:24 vg2 2,556 100 111,456 320,438 0:35 vtxl 5,216 25 838,176 1,621,458 0:41 xparc 536,818 1 133,667,296 * 13:20 xldn 5,216 25 838,176 1,621,458 0:41 x9dn 6,144 25 446,624 1,200,397 0:45 Table 2.10: pnd(el,t) values for t = 2 * , l < j < l + 5. circuit (n,l) (e/,2') (el, 2'+1) (e/,2'+2) (el,2l+3) (el, 2'+4) (el, 2 +5) bcO (21,4) 5.67(-9) 3.90(-5) 5.99(-3) 7.01(-3) * * b3 (32,12) 5.48(-9) 5.00(-5) 1.25(-2) 3.64(-2) 2.76(-3) 8.00(-6) chkn (29,5) 5.06(-9) 3.64(-5) 5.89(-3) 7.38(-3) l.ll(-4 ) 1.24(-8) cps (24,3) 6.96(-9) 4.33(-5) 5.53(-3) 5.00(-3) 4.77(-5) 2.30(-9) exep (28,11) 2.54(-9) 1.07(-5) 5.76(-4) 7.68(-5) 1.23(-8) 1.52(-16) in3 (34,14) 6.68(-9) 7.40(-5) 2.78(-2) 1.79(-1) 6.67(-2) 4.51(-3) in4 (32,12) 4.77(-9) 3.80(-5) 7.21(-3) 1.20(-2) 3.02(-4) 9.23(-8) in5 (24,10) 5.33(-9) 4.69(-5) 1-11 (-2) 2.85(-2) 1.68(-3) 2.85(-6) in7 (26,9) 7.43(-9) 9.06(-5) 4.12(-2) 3.93(-l) 3.20(-l) 1.04(-1) vg2 (25,7) 5.33(-9) 4.52(-5) 1.00(-2) 2.30(-2) 1.09(-3) 1.21(-6) v txl (27,7) 3.66(-9) 2.10(-5) 2.24(-3) 1.13(-3) 3.00(-6) 7.18(-12) xparc (39,12) 2.93(-9) 1.42(-5) 1.02(-3) 2.39(-4) 1.19(-7) 1.44(-14) xldn (27,7) 3.66(-9) 2.10(-5) 2.24(-3) 1.13(-3) 3.00(-6) 7.18(-12) x9dn (27,7) 5.32(-9) 4.50(-5) 1.00(-2) 2.28(-2) 1.08(-3) 1.18(-6) 49 Table 2.11: Comparison of embedding results and random selection result. random selection embedding rs/emb circuit test length time test length time test length time I n — I n — 21 in5 11,424 0:04 6,530 3:22 1.75 0.01 10 14 4 exep 155,584 0:32 149,120 4:22 1.04 0.06 11 18 7 b3 487,168 6:00 385,824 7:01 1.26 0.43 12 20 8 in4 548,480 5:19 530,752 5:51 1.03 0.46 12 20 8 in7 56,544 0:25 9,274 0:24 6.10 0.52 9 17 8 bcO 114,624 0:36 74,240 0:30 1.54 0.60 4 17 13 in3 660,224 4:39 176,960 3:27 3.73 0.68 14 20 6 vg2 267,552 1:01 111,456 0:35 2.40 0.87 7 18 11 xparc 248,802,208 41:36 133,667,296 13:20 1.86 1.56 12 27 15 x9dn 1,022,944 4:49 446,624 0:45 2.29 3.21 7 20 13 vtxl 1,345,152 4:29 838,176 0:41 1.60 3.28 7 20 13 xldn 1,948,192 4:30 838,176 0:41 2.32 3.29 7 20 13 cps 2,169,248 11:46 1,254,272 1:27 1.73 4.06 3 21 18 chkn 15,561,920 25:15 9,459,968 1:02 1.65 12.22 5 24 19 factor came out to be 2. Columns eight, nine and ten show the values of /, n — I, and n — 21 respectively. The first fact we can notice from Table 2.11 is that the test length resulting from EP is always shorter than the test length resulting from the RS experiments. The second fact we notice is th at the ratio of the processing times required by the two procedures varies. This can be explained as follows. The factor th at affects the processing tim e for EP is the number of embedded patterns, which is strongly affected by the value of /. 
EP will require less tim e for circuits with lower values of I (om itting the obvious influence of the number of embedded faults and the num ber of polynomials). The factor that affects the processing tim e for the RS experiments is the length of each experiment, i.e. the value of (n — i). The higher the value of I, the lower the value of (n — I), resulting in lower processing tim e for the RS experiments. Notice that for the eight circuits whose /-value is less than 10, the tim e ratio is greater than 1 for five and greater than 0.5 for all eight. In absolute tim e, EP required an hour or less for seven of the eight and an hour and a half for the eighth. Of the eight circuits for which the value n — 21 was greater than 10, for six of the eight, E P required less tim e than the RS experiments. Define the product of the test length ratio and the processing tim e ratio (columns six and seven) 50 as a measure of EP effectiveness. When this measure is greater than 1 then any relative advantage R S has over EP in either length or tim e is negated by the bigger advantage E P has in the other. Except for circuit 6c0 (whose measure was 0.924), the circuits with n — 21 > 10 all had effectiveness measures greater than 1. Of those, only vg2 had a time ratio less than 1. Circuits in i and m3 also had effectiveness measures greater than 1. Altogether, 9 of the 14 circuits had effectiveness measures greater than 1. R e m a rk 3 : The tim e reported for EP is the tim e required for the logarithm computations. It does not include the tim e required for phase three of IHF, the tim e it took to generate the test patterns and sort them before computing the logarithms, nor the tim e for Procedure window(). This is because these times were very small when compared with the time required by the RS experiments or the tim e required for logarithm computations. The time required for phase three was always between 5% — 10% of the RS time. For circuits b3 and inS, generating and sorting the test patterns took just over 2 : 30 minutes on a SPARC 2 and Procedure windowQ took just over 4 : 00 minutes for b3 and less than that for inS on an HP-700. These times are insignificant when compared with the RS and logarithm computation times. One can ask whether we really needed all the 100 RS experiments, or would 20 or 50 have been enough to produce a test sequence of length comparable with the one found by EP. A partial answer to this can be found in Table 2.12. The columns of Table 2.12 represent ratios between the sequence lengths from the RS experiments and the best embedding result. An integer entry in position (i , j ) indicates the num ber of experiments for circuit i where the ratio was between the value of the headings for the j-th and (j + l)-th columns. For example, for circuit cps there was one RS experiments whose ratio was between 1.50 and 1.75. The last non-zero entry in each row includes all the experiments whose ratio was greater or equal to the ratio of the column it sits in, e.g. for circuit in4 70 experiments had a ratio greater than 2.50. These 70 experiments include all those that ran for the maximum allowed tim e and failed to detect all the faults. 51 Table 2.12: Distribution of simulation length vs. embedding length ratio. 
circuit 1.00 1.25 1.50 ratio 1.75 2.00 2.25 2.50 bcO * * 1 1 2 3 93 b3 * 2 1 1 1 3 92 chkn * * 1 2 47 * * cps * * 1 0 4 4 91 exep 1 5 8 86 * * * in3 * * * * * * 100 in4 2 3 6 6 8 5 70 in5 * * 1 3 1 3 92 in7 * * * * * * 100 v g 2 * * * * * 2 98 vtxl * * 3 0 2 1 94 xparc * * * 1 2 * * xldn * * * * * 2 98 x9dn * * * * * 2 98 The results in this table tend to support the notion that we didn’t get “lucky” with random selections and usually all the experiments were necessary to ensure that we find a short sequence length. For those circuits whose tim e ratio was less than one, we conducted additional RS experiments to bring the ratio up to one. The results of these experiments are in Table 2.13. For four of the eight circuits the additional experiments did not find shorter sequences. For two circuit, the additional experiments found shorter se quences, but still longer than the embedded sequence. For two circuits the additional experiments produced two sequences that had shorter lengths than the embedded sequences. Looking closer at these circuits, the embedding were done relative to only two and three polynomials and their length ratio relative to the first 100 RS experiments was poor to begin with. Thus, there was already indication that the polynomials used for the embeddings were poor choices. R e m a rk 4: As mentioned in Section 2.6, our thesis is that the test sequence found for the hard faults will also detect the easy faults. This was true in all the embedding experiments except for one polynomial for the circuit m5, in which the embedded 52 Table 2.13: Results of additional RS experiments to get equal tim e comparisons. circuit add. exp. best result best embedding in5 9,900 8,192 6,530 exep 1,666 134,848 149,120 144,864 b3 132 611,296 385,824 in4 117 456,704 530,752 501,536 in7 92 68,064 9,274 bcO 66 163,680 74,240 in3 47 495,168 176,960 vg2 15 456,448 111,456 sequence missed nine irredundant faults, and three polynomials for circuit in7, of which two sequences missed one fault and one sequence missed eleven faults. R e m a rk 5: For circuits inS, exep, b3, in4, in5, ini and xparc, some of the hard faults had more than the maximum allowed number of test patterns (21 4 for inS, 21 2 for exep and 21 3 for the others). For these faults we embedded only the maximum allowed num ber of tests. This caused the predicted length (the distance between the head and tail of the shortest window) for some of the embeddings to be greater than the actual length. This is a result of the fact that we did not embed all possible tests, but only a subset. Additional patterns were found in the embedded sequence, rendering some of the embedded patterns unnecessary, hence the shorter length. Thus, when embedding only subsets of test sets, the length of the embedded sequence is an upper bound on the actual test length. When the complete test sets are embedded the length of the embedded sequence is the actual test length. 2.8 Conclusions In this chapter we address the question of finding short pseudo-random test sequences that achieve 100% fault coverage for LFSR-based PGs. We first show th at if the probability of detecting the hardest fault in the circuit is jo, then the probability 53 th at a pseudo-random test sequence of length ^ will achieve 100% fault coverage is typically greater than We then present an algorithm for embedding test patterns in test sequences generated by LFSRs. The algorithm is based on the theory of discrete logarithms. 
It produces a one-seed test sequence th at detects all the faults of interest. The algorithm can also embed two-pattern tests and can also embed test patterns in sequences generated by CAs. The advantage of our approach over existing schemes that achieve 100% fault coverage is the low overhead we incur in term s of both circuit area and delay. The applicability of the embedding algorithm depends on two user specified con straints. The first is the computational effort one is willing to expend (the param eter £ in Section 2.6.1) and the second is the length of the test session one is willing to allow (the param eter max in Section 2.6.1). Thus, for circuits where the hardest faults have more than 2s test patterns the computational effort required will be too high and for circuits with too many inputs, i.e. when n — S — 1 > m ax , the resulting sequence will be too long to be useful. We find short test sequences either by the embedding algorithm or with random selections. This is decided by classifying circuits as either randomly estable or random resistant. This classification is based on the number of test patterns for the hardest fault in the circuit. If the logarithm of this number, denoted by k, is greater than S, or if during the classification process a random sequence of length shorter than 2n~5~i that achieves 100% fc is found, the circuit is classified as randomly testable. In all other cases it is classified as random resistant. For randomly testable circuits our probabilistic and empirical results show that short test sequences can be found with few random selections. For random resistant circuits we compared the results achieved by the embed ding algorithm with those achieved by random selections of primitive polynomials and seeds for the LFSRs. In all cases the sequences found by our algorithm were shorter than those found by the random experiments. In almost half the cases, our algorithm was also faster than the random experiments. W hen the PG register is also to function as a RA, by considering only polynomials that achieve zero-aliasing (Chapter 3) as candidates for E P , both objectives are satisfied with one polynomial. 54 The circuits on which we ran the experiments are relatively small (only several hundred gates). Most of the effort for our algorithm is spent on logarithm computa tions, and only a small portion is spent on simulation. As circuit sizes increase, the cost of our algorithm will be only slightly affected whereas the cost of the random experiments will increase. Therefore, we expect the effectiveness of the algorithm to increase as circuit size increases. 2.9 Miscellaneous issues The main disadvantage of our embedding scheme, as presented in the paper, is the fact that it is applicable only to a limited family of circuits, namely those for which the hardest fault has less than 2s test patterns and for which L < 2max. While these two param eters are self imposed by each individual user, they still lim it the family of circuits. By allowing for higher values of 8 and m ax , the family increases, at a cost of more preprocessing effort and longer test sessions. The m ajor limiting factor, besides 100% fault coverage, is the requirement of one seed and no reconfiguration of the hardware. These requirements were set so as to lim it to a bare minimum the hardware and delay overhead of the BIST circuitry. By relaxing these restrictions, some tradeoffs arise. 2.9.1 M u ltip le seed s First we consider the use of m ultiple seeds for the LFSR. 
A num ber of complications arise in this scenario. First, this will require higher control complexity of the BIST circuitry to allow multiple seeding at the desired points in time. A more subtle complication is that by using m ultiple seeds, the overall test length, i.e. total num ber of patterns that are applied to the CUT, is reduced, in some cases drastically. This affects the probability of detecting the easy faults. The higher the reduction in test length, the higher the decrease in detection probability, resulting 55 in a need to embed test patterns for more faults. This, in turn, would increase the preprocessing effort. A third problem is the computation complexity of finding the optim al subse quence for an s-seed test sequence. This problem is N P — hard, hence exponential in the number of seeds. To show that the problem is N P H we consider the N P C problem minimum cover [15, p. 222]. The instance of the problem is a collection C of subsets of a finite set S, and a positive integer K < \C\. The question asked is whether exists a subset C' C C with \C'\ < K such that every element of S belongs to at least one element of C"? The reduction to our problem is as follows. Let S be the set of faults. Let C = { c i,. . . , C|c|} where each c * is the set of faults detected by the *-th test pattern, when sorted according to their logarithms. We assume the distance between consecutive test patterns is K. We apply our algorithm that finds the optim al selection of K seeds to detect all faults. If the resulting test sequence length is at most A', then each seed was used as a test pattern and immediately changed, i.e. the pattern generator was not allowed to run in autonomous mode. This is due to the fact that in autonomous mode, from test pattern i to test pattern i + 1 K + 1 test pattern are applied to the circuit. Thus, if the optimal test sequence is of length at most K, there exists a cover for S with \C'\ < K and C is the subsets that are represented by the selected seeds. 2.9.2 L ow er fau lt coverage A second thought is to lower the fault coverage requirement. This can easily be achieved by modifying the windowing procedure to consider subsequence that detects the desired percentage of the hard faults. The result will be shorter test sequences, which will lower the probability of detecting some of the easy faults. There are two problems with this scheme. The first is what percentage of the hard faults need to be detected to guarantee a required overall coverage, i.e. if we 56 require an overall fault coverage of y%, what is the percentage of the hard faults that need to be embedded? The second issue is the effectiveness of the embedding procedure for certain values of y , i.e. is the reduced effort in computation worth the resulting reduction in test length? As y decreases the required test length also decreases and a random selection is more likely to achieve the required coverage with a competitive length. 2 .9 .3 E m b ed d in g su b se ts o f te s t p a tte r n s W hen the complete sets for the hard faults are too large, one might consider em bedding only a subset of the test patterns for these faults. The analysis in Section 2.1, regarding the probability of not detecting a given fault, is really the probability of not generating any pattern out of a given set of patterns. W hat m atters in the analysis is the size of the smallest set of patterns, and the param eter L is relative to this size. 
The length of the embedded sequences is relative to L , and by empir ical observations it is between L / 2 and L. Therefore, we expect th at when given only subsets of the test patterns, the embedded sequence length will be longer than necessary, the ratio inversely proportional to the size of the subset relative to the full set. If this ratio is greater than 11, then we achieved no improvement over the recommended test length of [39]. 2 .9 .4 U sin g w id er p a tte r n g en era to rs Wagner et al. [45] looked at the question of the required test length when the PG width is wider than the input width. This has the effect of increasing the num ber of possible inputs and the number of test patterns (N and K respectively) by the same factor, hence the expected test length (L ) is unchanged. From our perspective the resulting embedded sequence length remains the same but the preprocessing work (number of logarithm computations per hard fault) increases. Thus, there is no advantage to using wider PGs. 57 2 .9 .5 D ifferen t w in d o w s from o n e p o ly n o m ia l By considering different windows from the same polynomial, the preprocessing work can be reduced without paying too much in sequence length. If the smallest window for a given polynomial turns out not to achieve 100% fault coverage, try the second best window and so forth until a satisfying window is found. By our experimental results, for six of thirteen circuits the worse embedded sequence was still better than the best RS sequence. For one circuit there was only one embedded sequence and for the remaining seven circuits, the ratio of embedded length versus RS lenght was 1.72, 1.51, and the rest were less than 1.21. For circuits exep and in5 the RS experiments took less tim e than embedding with one polynomial, but for the remaining circuits the ratio of RS tim e versus embedding tim e for one polynomial ranged from 1.7 to over 60. 58 Chapter 3 Response analyzer design The compacting function of a MISR is polynomial division over GF\2]. The effec tive output polynomial is divided by the feedback polynomial. The signature is the rem ainder of the division. Our objective is to select a feedback polynomial for the compacting MISR, given a set of modeled faults, such that an erroneous response resulting from any modeled fault is m apped to a different signature than the good response. We assume the following test scenario. The input sequence to the CUT has been designed so that the effective output polynomial due to any target fault is different than the effective polynomial of the good response. Let r be the effective polynomial of the good response, then the effective polynomial due to fault i can be represent as r + hi. By the linearity of the remaindering operation, we get a different remainder for this erroneous polynomial iff hi is not divisible by the feedback polynomial. We assume we are given the error polynomials for each of the target faults. The problem we deal with in this chapter is the following: given a set of poly nomials H = {hj, ^ 2 ,.. ., h|#|} find a polynomial that is relatively prime to all the polynomials of H. Such a polynomial will be referred to as a non-factor of H. If a non-factor is used as the feedback polynomial for the compacting MISR, zero- aliasing is achieved for the set of target faults. 
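A minimal sketch of the compaction view just described: the signature is the remainder of the effective response polynomial modulo the feedback polynomial, so a fault aliases exactly when its error polynomial is divisible by that feedback polynomial. The particular polynomials used below are illustrative only.

def poly_mod(dividend, divisor):
    """Remainder of GF[2] polynomial division; polynomials are ints, bit i <-> x^i."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

def misr_signature(response_poly, feedback_poly):
    """The compacting MISR leaves the remainder of the effective response
    polynomial modulo the feedback polynomial (both over GF[2])."""
    return poly_mod(response_poly, feedback_poly)

feedback = 0b1011             # x^3 + x + 1 (illustrative)
good     = 0b110101101        # some effective good-response polynomial
error    = feedback << 2      # an error polynomial divisible by the feedback
faulty   = good ^ error       # faulty response r + h_i

# Identical signatures despite a non-zero error polynomial: this fault would alias.
assert misr_signature(good, feedback) == misr_signature(faulty, feedback)

# Any error polynomial NOT divisible by the feedback gives a different signature,
# which is exactly the zero-aliasing condition sought in this chapter.
assert misr_signature(good ^ 0b101, feedback) != misr_signature(good, feedback)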
In particular, for irreducible and prim itive feedback polynomials we present (1) upper bounds on the smallest de gree zero-aliasing MISR; (2) procedures for selecting a zero-aliasing MISR with the 59 smallest degree; (3) procedures for selecting a zero-aliasing MISR of a pre-specified degree (under the condition that one exists); and (4) procedures for fast selection of a zero-aliasing MISR. We analyze the worst case as well as expected tim e complexity of the proposed procedures. A note on notation. The polynomials {A4} represent the error polynomials. The degree of hi is represented by The product of the polynomials in II is denoted by A, and the degree of h is d For each A,, the product of the distinct irreducible factors of degree j of hi is denoted by gtJ, with being the degree of gij. The product, over all i, of the polynomials gij is denoted by g3. The non-factor we seek will be referred to as a with da representing the degree of a. The rest of this chapter is organized as follows. In Section 3.1 we establish upper bounds on the degree of a non-factor. In Section 3.2 we review polynomial operations over GF[2] and their complexities. Section 3.3 presents procedures for finding a non factor of smallest degree for the set H . Section 3.4 presents procedures for finding a non-factor of a pre-specified degree and for finding a non-factor fast. We also discuss the effectiveness of conducting an exhaustive search for a least degree non-factor. Section 3.5 presents some experimental data. We conclude with Section 3.6. 3.1 Bounds on the least degree non-factor of a set of polynomials Consider the following problem. P ro b le m 1: Let H be a set of \H\ polynomials h \,... , A|#| with deg(h{) = d{ and for all 1 < i < |AT|, d{ < n. Let A = ni=i h% - Then deg(h) — d* = < \H\n. Give an upper bound s(dh) on the degree of an irreducible polynomial and an upper bound p(dh) on the degree of a primitive polynomial that does not divide A, i.e. there exists an irreducible (primitive) polynomial of degree at most £(<4) (p(dh)) that does not divide A . D 60 Similarly, let es(H) (ep(H)) be the expected degree of an irreducible (primitive) polynomial that is a non-factor of H. The bounds 5 (d) and p(d) will be referred to as the worst case bounds while the bounds es(H ) and ep(H) will be referred to as the expected bounds. We first establish the worst case bounds and then proceed with the expected bounds. 3.1.1 T h e w orst ca se b o u n d s For the bound on s(d) we follow [24]. Let I2U) denote the num ber of irreducible poly nomials of degree j over GF[2]. The degree of the product of all irreducible polynomi als of degree j is j l2{j)- Let s(d) denote the least integer such th at 4 ^2 (4 ) > d. Let Qs(d) be the product of all the irreducible polynomials of degree less than or equal to s(d). The degree of Qs(d) is greater than d. Replacing d with dh1 Qs(dh) Fas at least one root that is not a root of h, hence Qs{dh) has at least one irreducible factor th at is not a factor of h. Thus, s(dh) is an upper bound on the degree of an irreducible polynomial that is relatively prime to all the polynomials in the set H. The following lemma provides a bound on s(dh). L e m m a 3: [24, Lemma 4, p.293] s(d) < [log(d + 1)] □ We turn to find the bound on p(d). The number of primitive polynomials of degree m over GF[2] is - W ” - 1 ) m where < j> (q) is the Euler function denoting the number of integers less than and relatively prime to q and ([25, p. 
We turn to find the bound on p(d). The number of primitive polynomials of degree m over GF[2] is φ(2^m − 1)/m, where φ(q) is the Euler function denoting the number of integers less than and relatively prime to q, and ([25, p. 37])

    φ(q) = q ∏_{i=1}^{r} (1 − 1/p_i),

where the p_i's are the distinct prime factors of q.

Lemma 4: [35, p. 173]

    φ(q) > e^{−γ} q / (ln ln q + 5/(2 e^{γ} ln ln q))

for all q ≥ 3, with the only exception being q = 223,092,870 (the product of the first nine primes), for which 5/2 is replaced by 2.50637. γ = 0.577215665... is Euler's constant. □

For q > 65 we have

    φ(q) > q / (2.08 log log q).    (3.1)

To help us derive the bound on p(d) we introduce the value r(t). Let r(t) denote the least integer such that the ratio between r(t) times the number of primitive polynomials of degree r(t) and t times the number of irreducible polynomials of degree t is greater than 1, i.e.,

    φ(2^{r(t)} − 1) / (t·I_2(t)) > 1.

Lemma 5: [24, Lemma 3, p. 293] For t ≥ 3, 2^{t−1} ≤ t·I_2(t) ≤ 2^t − 2. □

Lemma 6: For t ≥ 6, r(t) ≤ t + ⌈1 + log log 2t⌉.

Proof: For q ≥ 7, the function q/(2.08 log log q) is an increasing function. Also,

    q/(2.08 log log q) > 1  and  1/(2.08 log log(q − 1)) ≥ 1/(2.08 log log q),

hence

    (q − 1)/(2.08 log log(q − 1)) ≥ q/(2.08 log log q) − 1.

Let r'(t) be the least integer such that

    2^{r'(t)} / (2.08 log log 2^{r'(t)}) ≥ 2^t.

Thus, using the above relation,

    (2^{r'(t)} − 1) / (2.08 log log(2^{r'(t)} − 1)) ≥ 2^t − 1.

By Equation (3.1) we get φ(2^{r'(t)} − 1) ≥ 2^t − 1, and due to Lemma 5

    φ(2^{r'(t)} − 1) > t·I_2(t).

Thus, by the definition of r(t), we have that r'(t) ≥ r(t). To bound r(t) from above, we solve for r'(t). By its definition, and since log log 2^{r'(t)} = log r'(t), r'(t) must satisfy

    2^{r'(t)} / (2.08 log r'(t)) ≥ 2^t  ⟹  2^{r'(t) − t} ≥ 2.08 log r'(t)  ⟹  r'(t) − t ≥ log 2.08 + log log r'(t).

By setting r'(t) = t + ⌈1 + log log 2t⌉, we have r'(t) − t = ⌈1 + log log(2t)⌉, and for t ≥ 6

    ⌈1 + log log(2t)⌉ ≥ log 2.08 + log log r'(t). □

Lemma 7: Let p(d) denote the least integer such that Σ_{j=1}^{p(d)} φ(2^j − 1) > d. Then for d ≥ 65

    p(d) ≤ ⌈log(d + 1)⌉ + ⌈1 + log log(2⌈log(d + 1)⌉)⌉.

Table 3.1: Number of primitive elements in GF[2^m] and accumulated number of primitive elements in GF[2^2] ... GF[2^m]

 m   φ(2^m − 1)   Σ_{i=2}^{m} φ(2^i − 1)  |  m   φ(2^m − 1)              Σ_{i=2}^{m} φ(2^i − 1)
 2   2            2                       | 28   132,765,696             343,973,802
 3   6            8                       | 29   533,826,432             877,800,234
 4   8            16                      | 30   534,600,000             1,412,400,234
 5   30           46                      | 31   2,147,483,646           3,559,883,880
 6   36           82                      | 32   2,147,483,648           5,707,367,528
 7   126          208                     | 33   6,963,536,448           12,670,903,976
 8   128          336                     | 34   11,452,896,600          24,123,800,576
 9   432          768                     | 35   32,524,320,000          56,648,432,576
10   600          1,368                   | 36   26,121,388,032          82,769,820,608
11   1,936        3,304                   | 37   136,822,635,072         219,592,455,680
12   1,728        5,032                   | 38   183,250,539,864         402,842,995,544
13   8,190        13,222                  | 39   465,193,834,560         868,036,830,104
14   10,584       23,806                  | 40   473,702,400,000         1,341,739,230,104
15   27,000       50,806                  | 41   2,198,858,730,832       3,540,597,960,936
16   32,768       83,574                  | 42   2,427,720,325,632       5,968,318,286,568
17   131,070      214,644                 | 43   8,774,777,333,880       14,743,095,620,448
18   139,968      354,612                 | 44   8,834,232,287,232       23,577,327,907,680
19   524,286      878,898                 | 45   28,548,223,200,000      52,125,551,107,680
20   480,000      1,358,898               | 46   45,914,084,232,320      98,039,635,340,000
21   1,778,112    3,137,010               | 47   140,646,443,289,600     238,686,078,629,600
22   2,640,704    5,777,714               | 48   109,586,090,557,440     348,272,169,187,040
23   8,210,080    13,987,794              | 49   558,517,276,622,592     906,789,445,809,632
24   6,635,520    20,623,314              | 50   656,100,000,000,000     1,562,889,445,809,632
25   32,400,000   53,023,314              | 51   1,910,296,842,179,040   3,473,186,287,988,672
26   44,717,400   97,740,714              | 52   2,338,996,194,662,400   5,812,182,482,651,072
27   113,467,392  211,208,106             | 53   9,005,653,101,120,000   14,817,835,583,771,072

Proof: By the definition of r(t) and Lemma 6,

    Σ_{j=1}^{⌈log(d+1)⌉ + ⌈1 + log log(2⌈log(d+1)⌉)⌉} φ(2^j − 1) ≥ Σ_{j=1}^{⌈log(d+1)⌉} j·I_2(j).

By Lemma 3 and the definition of s(d),

    Σ_{j=1}^{⌈log(d+1)⌉} j·I_2(j) ≥ Σ_{j=1}^{s(d)} j·I_2(j) > d. □

Example 1: In Table 3.1 the values of φ(2^m − 1) (the degree of the product of all the primitive polynomials of degree m) and Σ_{i=2}^{m} φ(2^i − 1) (the degree of the product of all the primitive polynomials of degree 2 ≤ i ≤ m) are tabulated for 2 ≤ m ≤ 53. As long as d is less than the maximum value in the table, p(d) can be read from the table instead of using Lemma 7. For example, if the number of modeled faults in the CUT is |H| = 10^4 and the length of the test sequence is n = 10^6, then d_h ≤ 10^10. The degree of the product of all primitive polynomials of degree less than or equal to 33 is the first which is greater than 10^10, hence p(d_h) ≤ 33. Thus, a zero-aliasing LFSR with a primitive feedback polynomial of degree at most 33 exists for the CUT. On the other hand, using the bound of Lemma 7 we get p(d_h) ≤ 38. □

A closer look at Table 3.1 shows that the product of all the primitive polynomials of degree less than or equal to 53 has degree D greater than 1.4 · 10^16. Thus, as long as the product of the number of faults and the test sequence length is less than D (which is the case for all practical test applications), a zero-aliasing MISR of degree less than or equal to 53 exists.

3.1.2 The expected bounds

In deriving the expected bounds we assume that the polynomials {h_i} are random polynomials. Denote the product of the distinct irreducible factors of degree j of h_i by g_{i,j}. Denote the number of distinct irreducible factors of h_i, of degree j, by v. The value of v can range from 0 to min{⌊d_i/j⌋, I_2(j)}.

Lemma 8: For j ≥ 2, the expected value of v (the number of irreducible factors of g_{i,j}) is less than or equal to 1/j.

Proof: Let IR_2(j) = {p_i}_{i=1}^{I_2(j)} be the set of irreducible polynomials of degree j over GF[2]. For a given polynomial q, of degree greater than or equal to j, define the indicator function d(p_i, q) to be one if p_i divides q and zero otherwise. The probability that a polynomial of degree j divides a random polynomial of degree greater than or equal to j is 2^{−j}, hence the probability that d(p_i, q) = 1 is equal to 2^{−j}. Thus

    E[v] = E[ Σ_{i=1}^{I_2(j)} d(p_i, q) ] = I_2(j)/2^j ≤ 1/j. □

The same type of analysis can be used to bound Var[v], the variance of v, and σ_v, the standard deviation of v.

Lemma 9: For j ≥ 2, the variance of the number of irreducible factors of g_{i,j} is less than 2/j. The standard deviation is less than (2/j)^{1/2}.

Proof: The variance of v is given by Var[v] = E[v²] − E[v]².

    E[v²] = E[ Σ_{i=1}^{I_2(j)} d(p_i, q)² + 2 Σ_{p_i, p_k ∈ IR_2(j), i<k} d(p_i, q) d(p_k, q) ]
          = Σ_{i=1}^{I_2(j)} E[d(p_i, q)] + 2 Σ_{i<k} E[d(p_i, q) d(p_k, q)]
          ≤ I_2(j)/2^j + 2 · (I_2(j)²/2) · 2^{−2j}
          ≤ 1/j + 1/j².

For j ≥ 2 we have E[v²] ≤ 2/j, hence Var[v] < 2/j and σ_v < (2/j)^{1/2}. □

Having computed the mean and variance of the number of irreducible factors of degree j per polynomial, we can compute a confidence measure for these results.

Lemma 10: For j ≥ 8, the expected number of polynomials g_{i,j} with more than 5 (50) factors is less than |H|/100 (|H|/10,000).

Proof: Using the Chebyshev inequality [16, p. 376], Pr(|v − E[v]| ≥ cσ_v) ≤ 1/c², for j ≥ 8 the probability that v is greater than 5 is less than 0.01. Using this result we can define a second random process in which the random variable x is 1 iff v is greater than 5 and 0 otherwise. This process is a Bernoulli experiment [10, Sec. 6.4]. The expected number of g_{i,j}'s with more than 5 factors is upper bounded by |H|/100, as is the variance.
Similarly, the probability that v is greater than 50 is less than 0.0001, and the expected number of g_{i,j}'s with more than 50 factors is bounded by |H|/10,000. □

Lemma 11: The expected degree of the smallest irreducible non-factor of the set of polynomials H is bounded from above by ⌈log |H|⌉ + 1.

Proof: Denote the product of the polynomials g_{i,j}, 1 ≤ i ≤ |H|, by g_j. By Lemma 8, the expected number of (not necessarily distinct) factors of g_j is less than |H|/j. The smallest j for which I_2(j) exceeds this value is an upper bound on the expected degree d_a of a non-factor of H. Since j·I_2(j) ≥ 2^{j−1} (Lemma 5), the upper bound δ on d_a satisfies |H| ≤ 2^{δ−1}, and d_a ≤ ⌈log |H|⌉ + 1. □

By applying Lemma 6 to the result of Lemma 11, we have:

Corollary 12: The expected degree of the smallest primitive non-factor of the set of polynomials H is bounded from above by 2 + ⌈log |H|⌉ + ⌈log log(2 + 2⌈log |H|⌉)⌉. □

Example 2: Using the numbers of Example 1, let |H| = 10^4 and n = 10^6. The first j for which φ(2^j − 1)/j exceeds |H|/j is j = 14 (Table 3.1), hence we expect to find a zero-aliasing MISR with a primitive feedback polynomial of degree less than or equal to 14, as opposed to the worst case of 33. Corollary 12 would give us an upper bound of 19. □

As the expected bound is only a function of the number of faults and not of the length of the test sequence, the expected degree of a zero-aliasing MISR will never exceed 53. In fact, as long as the number of faults is less than 1 million, we expect to find a zero-aliasing MISR of degree less than or equal to 21.

3.2 Polynomial operations in GF[2]

In the search for a (least degree) non-factor of H we use procedures that sift out the factors of the same degree from a given polynomial. These procedures are based on the following lemma.

Lemma 13: [25, Lemma 2.13, p. 48] x^{2^m} − x is the product of all irreducible polynomials of degree l, where l is a divisor of m. □

Thus, a basic step in finding the distinct irreducible factors of a polynomial b(x) is the computation of g(x) = gcd(b(x), x^{2^m} − x). The result of this operation is the product of all the irreducible factors of degree l, where l | m, of b(x). For most polynomials b(x) of interest to us, 2^m >> deg(b(x)). Therefore, we first compute r(x) = (x^{2^m} − x) mod b(x) and then g(x) = gcd(b(x), r(x)). We first discuss the complexity of polynomial operations in GF[2] and then review a well known approach to reduce x^{2^m} modulo b(x).

3.2.1 Polynomial multiplication, division and gcd

The complexity of a polynomial gcd operation is O(M(s) log s) [3, pp. 300-308], where s is the degree of the larger polynomial operand and M(s) is the complexity of polynomial multiplication, where the product has degree s. The complexity of polynomial division is also O(M(s)) [3, Ch. 8]. Hence, it is crucial to find an efficient multiplication algorithm.

We consider two multiplication algorithms. Both algorithms are based on FFT techniques [3, Ch. 7], [10, Ch. 32]. For these algorithms to work they need a root of unity whose order has small prime factors. In most cases, when the product polynomial has degree s, a root of order 2^m ≥ s is used. This poses a problem, since fields of characteristic 2 do not contain such roots. The first algorithm is due to Schonhage [41]. It uses roots of order 3^{m+1} to multiply polynomials of degree s ≤ 3^m. Its complexity is O(s log s log log s). The second algorithm is suggested by Cormen et al. [10, p. 799].
To multiply polynomials of degree s/2 ≤ 2^{m−1}, they suggest working in the field GF[p], where p is a prime of the form β·2^m + 1. The multiplication is done over GF[p] and the coefficients of the product are reduced modulo 2 to give the correct result over GF[2]. The question that naturally arises is how big p is. The best provable bound on the size of p is that it is less than 2^{5.5m} [19]. It is widely believed that β = O(m²) [35, p. 221]. A detailed discussion of the bounds on p, as well as how to find the least prime p and the smallest primitive element in GF[p], is given in Appendix A.

The complexity, per multiplication, of the Cormen algorithm is O(s log s) operations in GF[p]. If the word size of a machine is greater than log p, then word operations can be performed in O(1) machine instructions. To further cut down on time, we construct logarithmic tables relative to a primitive element, α, of GF[p] for multiplication and addition. With these tables, multiplication modulo p is addition modulo p of the logarithms. The addition table stores the Jacobi logarithm Z(i) [25, p. 69], i.e. Z(i) = log_α(α^i + 1), where log_α(0) is defined as ∞. Thus, adding α^μ + α^ν, for μ ≤ ν, is actually the multiplication operation α^μ(1 + α^{ν−μ}), and the logarithm of the result equals μ + Z(ν − μ). Table 3.2 shows the smallest p for m = 6, 7, ..., 20, along with the smallest primitive element α and the smallest 2^m-th root of unity ω.

Table 3.2: Least prime p, of the form β·2^m + 1, with smallest generator α and 2^m-th root of unity ω.

 m   p      α   ω   |  m   p       α   ω   |  m   p         α   ω
 6   193    5   11  | 11   12289   11  7   | 16   65537     3   3
 7   257    3   9   | 12   12289   11  41  | 17   786433    10  8
 8   257    3   3   | 13   40961   3   12  | 18   786433    10  5
 9   7681   17  62  | 14   65537   3   15  | 19   5767169   3   12
10   12289  11  49  | 15   65537   3   9   | 20   7340033   3   5

In the sequel we shall use the notation O(M(s)) for the complexity of polynomial multiplication. Whenever possible it will mean s log s; otherwise it should be taken as s log s log log s. Similarly, the notation L(s) will denote either log s or log s log log s, as appropriate.

3.2.2 x^{2^m} modulo b(x) and x^t modulo b(x)

We review a well known approach [5] [33] to find the remainder of x^{2^m} when divided by b(x) without actually carrying out the division. Let s = deg(b(x)) and let R^{(j)}(x) = x^{2^j} mod b(x). Then

    R^{(j+1)}(x) = (x^{2^j})² mod b(x) = x^{2^{j+1}} mod b(x) = (R^{(j)}(x))² mod b(x).

Squaring over GF[2] is easy (r(x)² = r(x²)), thus by repeatedly squaring and reducing modulo b(x), in time O(m·M(s)) we can compute R^{(m)}(x). Note that the maximum degree (R^{(j)}(x))² can have is 2(s − 1). Once we have R^{(m)}(x), we can compute g(x) = gcd(b(x), R^{(m)}(x) − x) in time O(M(s) log s). The overall time needed to compute g(x) is O(m·M(s) + M(s) log s).

Let t = Σ_{j=0}^{m} t_j 2^j. To compute r(x) = x^t mod b(x) we compute R^{(m)}(x) as described above. This costs O(m·M(s)). We initialize r(x) to equal 1. As we compute R^{(m)}(x), for each intermediate value R^{(j)}(x) for which t_j = 1, we set r(x) to the product r(x)·R^{(j)}(x) mod b(x). Each such computation costs O(M(s)), hence the cost of computing x^t mod b(x) is O(m·M(s)).
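To make the operations of Sections 3.2.1 and 3.2.2 concrete, the following sketch encodes polynomials over GF[2] as Python integers (bit i is the coefficient of x^i). The encoding and the function names are ours, and schoolbook shift-and-add multiplication is used in place of the FFT-based multiplication discussed above; the sketch only illustrates the repeated-squaring computation of x^{2^m} mod b(x) and the gcd computation gcd(b(x), x^{2^m} − x) associated with Lemma 13.

def deg(a):
    # Degree of the polynomial encoded by the integer a; deg(0) == -1.
    return a.bit_length() - 1

def pmod(a, b):
    # Remainder of a(x) divided by b(x) over GF[2] (b != 0).
    db = deg(b)
    while deg(a) >= db:
        a ^= b << (deg(a) - db)
    return a

def pmulmod(a, b, m):
    # a(x) * b(x) mod m(x) over GF[2], schoolbook shift-and-add.
    r = 0
    a = pmod(a, m)
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, m)
    return r

def pgcd(a, b):
    # Euclidean gcd of two GF[2] polynomials.
    while b:
        a, b = b, pmod(a, b)
    return a

def x_pow_2m_mod(m, b):
    # R^(m)(x) = x^(2^m) mod b(x), computed by m successive squarings.
    r = pmod(0b10, b)                  # the polynomial x, reduced mod b
    for _ in range(m):
        r = pmulmod(r, r, b)
    return r

def degree_m_factors(b, m):
    # gcd(b(x), x^(2^m) - x): the product of all irreducible factors of b
    # whose degree divides m (Lemma 13).
    return pgcd(b, x_pow_2m_mod(m, b) ^ 0b10)

Calling degree_m_factors(h_i, j) for j = u, u−1, ... is exactly the gcd computation performed in Step 1 of Procedure distinct-factors() in Section 3.3.1.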
3.3 Finding a non-factor of smallest degree for a given set of polynomials

After establishing the bounds on the least degree non-factor of H in Section 3.1, this section addresses the question of finding a least degree non-factor for H.

Problem 2: Given a set of polynomials H = {h_1(x), h_2(x), ..., h_{|H|}(x)} with deg(h_i) = d_i ≤ n, let

    h(x) = ∏_{i=1}^{|H|} h_i(x),    deg(h) = Σ_{i=1}^{|H|} d_i = d_h.

Find an irreducible (primitive) polynomial a(x), with deg(a) = d_a, such that

1. For all 1 ≤ i ≤ |H|, h_i ≢ 0 mod a (equivalently, h ≢ 0 mod a).
2. For all irreducible (primitive) polynomials b(x) with deg(b) < d_a, h ≡ 0 mod b (or equivalently, there exists an i for which h_i ≡ 0 mod b). □

One way of solving the problem is by factoring the polynomials of H. This would require too much work, since we do not need to know all the factors in order to find a non-factor. We only need to know the "small" factors. In this section we present algorithms for solving Problem 2 and analyze their complexity. The complexity is given in two forms. The first is with worst case complexity bounds, referred to as the worst case complexity. The second is with expected complexity bounds, referred to as the expected complexity. The expected complexity is a refinement of the worst case complexity based on the expected size of the results of our procedures.

By Lemmas 3 and 7 (Section 3.1), we have an upper bound u = s(d_h) or u = p(d_h) on d_a, depending on whether we are looking for an irreducible or a primitive non-factor. Using this bound, we begin our search process, which is made up of three phases.

1. For all h_i ∈ H, find g_{i,j}(x), the product of all distinct irreducible (primitive) factors of h_i of degree j.
2. Having found the polynomials g_{i,j}, determine whether all irreducible (primitive) polynomials of degree j are factors of H.
3. If not all irreducible (primitive) polynomials of degree j are factors of H, find one that is not.

The worst case complexities of the three phases for the irreducible case are O(|H|u²M(n)), O(|H|²M(n) log n) and O(|H|²n²u²M(u)). The dominant term is O(|H|²n²u²M(u)). The worst case complexities of the three phases for the primitive case are O(|H|u³M(n)), O(|H|²M(n) log n) and O(|H|²n²u³M(u) log log u). The dominant term is O(|H|²n²u³M(u) log log u).

The expected complexities of the first two phases are O(|H|u²M(n)) and O(|H| log |H| u²L(n) log n). The expected complexity of the third phase is O(|H| log |H| d_a M(d_a)) to find an irreducible non-factor and O(|H| log |H| d_a² log log d_a M(d_a)) to find a primitive non-factor. The dominant term is O(|H|u²M(n)).

The worst case complexity is a function of |H|²n² multiplied by terms that are logarithmic in |H| and n, whereas the expected complexity is a function of |H|n multiplied by terms that are logarithmic in |H| and n.

3.3.1 The product of all distinct factors of the same degree for a given polynomial

Given the polynomial h_i(x) and the upper bound u, we wish to compute the product of all distinct factors of h_i of degree j, for 1 ≤ j ≤ u. The procedure for computing the polynomials g_{i,j} is given in Figure 3.1. The polynomials g_{i,j} are computed in three steps. First, for u/2 < j ≤ u, compute g_{i,j} = gcd(h_i(x), x^{2^j} − x). Each g_{i,j} is a product of all the distinct irreducible factors of h_i(x) of degree j and of degree l, where l | j. When j is less than or equal to u/2, we have 2j ≤ u. By Lemma 13, g_{i,2j} contains the product of all irreducible factors of degree l, where l | j, of h_i. Since the degree of g_{i,2j} is (much) less than the degree of h_i, it is more efficient to compute g_{i,j} from g_{i,2j} than from h_i. Thus, in Step 2, for 1 ≤ j ≤ u/2 compute g_{i,j} = gcd(g_{i,2j}, x^{2^j} − x). At the end of Step 2, each g_{i,j} contains all the factors of degree l | j of h_i. To sift out the factors of degree less than j from g_{i,j}, we need to divide g_{i,j} by the polynomials g_{i,l}, where l ranges over the set of proper divisors of j.
This is carried out in Step 3.

Procedure distinct-factors() is not enough when we are looking for a primitive non-factor. At the end of the procedure, each g_{i,j} is the product of all distinct irreducible polynomials of degree j that are factors of h_i. From g_{i,j} we need to sift out the non-primitive factors. Before describing this aspect, we introduce the notion of maximal divisors.

Definition 1: Let q = ∏_{i=1}^{r} p_i^{v_i}, with p_1, ..., p_r being the distinct prime factors of q. The set of maximal divisors of q is the set md(q) = {m_i}_{i=1}^{r}, where m_i = q/p_i. □

For example, 20 = 2²·5, hence md(20) = {10, 4}. 16 = 2^4, therefore md(16) = {8}. Since 7 has only one prime factor, md(7) = {1}.

Procedure 3: distinct-factors(h_i)
1. For (j = u; j > u/2; j−−)
   (a) g_{i,j} = gcd(h_i(x), x^{2^j} − x)
2. For (j = ⌊u/2⌋; j > 0; j−−)
   (a) g_{i,j} = gcd(g_{i,2j}, x^{2^j} − x)
3. For (j = 2; j ≤ u; j++)
   (a) For all l | j, l < j
       i. g_{i,j} = g_{i,j} / g_{i,l}

Figure 3.1: Procedure distinct-factors(h_i). Computes the product of all distinct factors of degree j, for 1 ≤ j ≤ u, of the polynomial h_i.

Procedure 4: distinct-primitive(g_{i,j})
/* sifts out the non-primitive factors of g_{i,j} */
1. For all l in md(2^j − 1)
   (a) q_l = gcd(g_{i,j}, x^l − 1)
   (b) g_{i,j} = g_{i,j} / q_l

Figure 3.2: Procedure distinct-primitive(g_{i,j}). Sifts out the non-primitive factors of g_{i,j}.

A polynomial over GF[q] of degree m is irreducible iff it divides x^{q^m − 1} − 1 and does not divide x^{q^k − 1} − 1 for any proper divisor k of m. It is primitive of degree m iff it is irreducible and does not divide x^l − 1 for all l in md(q^m − 1) [25, Ch. 3]. Procedure distinct-primitive(), shown in Figure 3.2, sifts out the non-primitive factors of g_{i,j}.

Lemma 14:
1. The complexity of Procedure distinct-factors() is O(u²M(n)).
2. The complexity of Procedure distinct-primitive() is O(u³M(n)).
The value of u is either s(dh) or p(dh) corresponding to either the irreducible or prim itive case. □ L e m m a 15: The expected complexity of the first phase is 0{\H\u2 M{n))) with u equal to either es(H ) or ep(H). P roof: The expected complexity of Procedure distinct-f actor s{) is dom inated by the complexity of Step 1, which is 0 (u 2M(n)). The difference in the complexity of the other steps, over the worst case, comes from using the expected size of the dijs, instead of their worst case size, which is equal to n. The expected complexity of the procedure (including Procedure distinct-primitiveQ), over the set H is, thus, 0(\H\u2M(n)), with u equal to either es(H) or ep(H). □ 3 .3 .2 T h e n u m b er o f all d istin c t fa cto rs, o f th e sa m e d eg re e, for a se t o f p o ly n o m ia ls After the first phase, for all degrees 1 < j < u, we have \H\ polynomials gij, each a product of the distinct irreducible (primitive) factors of degree j of hi. Some of the < 7; j ’s might equal 1 while some pairs might have factors in common. Our goal is to find a least degree non-factor of H . The first thing we m ust determ ine is whether all irreducible polynomials of degree j appear in gj = n i ^ 9i,j • This is the second of our three phases (page 72). A simple test is to compare de9{9 ^- = - S< de9^% ,}) hU). If ^ < I2{j) then there is a non-factor of degree j. For the prim itive case we compare with 76 If > J2(j), the only way to determine whether all irreducible (primitive) polynomials of degree j are factors of gj is to find those factors th at appear in more than one of the gij’s and to eliminate all their appearances except for one. We considered two methods for removing repeated factors. The first is referred to as the Icm method and the second is referred to as the gcd method. The 1cm method will be shown to be faster, but it also requires more space, which might not be available. In the 1cm method we first sort the gijs according to their degrees and then place them in the sets s,t, where gij € s iff 2fc _ 1 < deg(gij) < 2k. The sets {s*,} are ordered according to their index, in increasing order. We then begin computing lems of two polynomials taken from the first set. If this set has only one polynomial we take the second polynomial from the next set. The resulting Icm polynomial is placed in the set corresponding to its degree. This process ends when we are left with one polynomial, representing the Icm of all the polynomials gij. In the gcd method the polynomials gij are sorted by their degrees. In each iteration the polynomial with the highest degree is taken out of the set and and all pairwise gcds between itself and the other polynomials are taken. If the gcd is greater than 1, the other polynomial is divided by this gcd. At the end of the iteration none of the remaining polynomials in the set has a factor in common with the polynomial that was taken out. Thus, when the procedure ends, no factor appears in more than one of the gijs. L em m a 16: 1. The complexity of the second phase is 0(\H\2M(n) log ra). 2. The expected complexity of the second phase is 0 (\H | log3 \H\L(n) log(|i?|n)). P roof: 1. We can bound the work required for the Icm method as follows. First assume \H\ and dij are powers of 2 (if they are not, for bounding purposes increase 77 them to the nearest power of 2). Also, assume the polynomials are leaves of a binary tree. All the polynomials in the same level have the same degree (each level corresponds to a different set s*,). 
Assume th at in every Icm step, the degree of the Icm is the sum of the degrees of its two operands (i.e. the operands are relatively prime). The maximum degree the final Icm can have is \H\n and computing this Icm costs 0(M{\H\n) lo g d ^ ln )). Computing the two /cm ’s of the next to last level costs at most 0 (2 -M(\H\n/2)-log(\H\n/2)). In each lower level there are at most twice as many /cm ’s being computed but each costs less than half the cost of the level above it, hence the total cost is bounded by 0(\og(\H\n)M(\H\n)\og(\H\n)) < 0 (u 2 M(\H\n)). To use the Icm m ethod we need enough memory to store the final Icm. If we do not have the required memory, we use the gcd method. The work required is 0{\H\2 M(n) log n). 2. When taking into account the expected size of the polynomials gij, factoriza tion becomes practical. The factoring algorithm used is that of Cantor and Zassenhaus [7]. The complexity for factoring a product of r distinct irreducible polynomials of the degree j is given by 0{rM (rj)(j + log(rj)) (see Appendix C). By Lemma 10, the expected num ber of polynomials gij th at have more than 5 • 10* factors is less than \H\/102k+2. If we take the num ber of polynomi als with 5 • 10* factors to be (i.e. all polynomials with at m ost 5 factors are assumed to have 5, all polynomials with 6 ... 50 factors are assumed to have 50, etc.), then the expected work required to factor all the polynomials is bounded by / 1 io g 10 l ^ i - i q q . t j i 0 E 733d r 5 ' • io‘ )(j + iog(5j ■ io‘ )> By using the fact that 5j • 10* < n and by writing M(5j ■ 10*) as 5j ■ 10* • L(n), we can bound the sum by | l o g 10 \h \- \ \ 2 O j 25\H\jL(n)(j -f- logn) k=0 O (|ff|lo g |ff|jL (n )0 '+ lo g n )).(3 .2 ) 78 W hen the factorization is completed, all the irreducible factors can be sorted in tim e 0 (\H | • log |/f|) and the unique factors can be counted. Summing over j — 1 ... u{— es(H )) we get 0(\H\\og \H\u2L(n)(u + logn)). Since u ss log \H\, the expression becomes 0(1771 log3 \H\L{n) log(|J7|n)). □ 3.3.3 F in d in g a n o n -fa cto r We are now at the third phase, where we know the smallest degree da for which there exists a non-factor for h. We also have, m < \H\ polynomials gi,da th at are products of distinct irreducible (primitive) factors of h, all gi,djs are pairwise relatively prime and every irreducible (primitive) factor of degree da of h is a factor of one of these polynomials. We want to find an irreducible (primitive) polynomial of degree da th at is a non-factor of H . One approach is to divide the product of all irreducible (primitive) polynomials of degree da by the product of all m polynomials and find a factor of the result. This might pose a problem if we do not have the product at hand, i.e. only the polynomials gitda, or if the product is too large to handle as one polynomial. Another way is to randomly select irreducible (primitive) polynomials and check whether they are factors or non-factors. The only way to check is by doing the actual division. This division, however, will be regular long division, and not F F T division, whenever the divisor has very small degree compared to the degree of the dividend. If an irreducible (primitive) polynomial is relatively prim e to all of the fi'i.do’s, it is a non-factor. If it divides at least one of the polynomials, we can keep the result of the division and reduce our work in upcoming trials. This reduction requires th at polynomials do not repeat in the selection process. L em m a 17: 1 . 
The complexity of finding a non-factor once da is known is 0(\H \2n2c P aM(da)) for the irreducible case and 0(\H\2n2d^M(da) log log da) for the primitive case. 79 2. The expected complexity is 0(\H\ log \H\daM(da)) for the irreducible case and 0(\H\ ■ log \H\ ■ d2 loglog daM(da)) for the primitive case. P roof: 1. The procedure generates random polynomials, checks them for irreducibility (prim itivity) and whether they are factors or not. The expected num ber of ran dom polynomials that are tested for irreducibility (prim itivity) before an irre ducible (primitive) polynomial of degree da is found is da/2 (^-loglog da) [33]. The work required to test each polynomial for irreducibility is 0(daM(d a)) (0(d2M(da))) [33]. The sum of the d;,/s cannot exceed |if |n, therefore after at most irreducible polynomials are tried, a non-factor is found. The work involved with each try is \H\n ■ da (long division). Thus, the expected work required to find a non-factor is 0(\H \2n2 • dlM(da)). For the prim itive case the work is 0(\H\2n2d ? aM{da) log log da). For a more detailed discussion see Appendix B. 2. If the polynomials gij were factored (see proof of Lemma 16,(2)), once da is known, we draw irreducible (primitive) polynomials until a non-factor is found. We expect no more than \H\/da factors. When an irreducible (prim itive) polynomial is drawn, it takes 0 (log |if|) to check whether it is a fac tor or not. Hence, the expected work required to find a non-factor, once da is known, is bounded by 0 (\H \log \H\daM(da)) for the irreducible case and 0(1#! log \H\d%M(da) • log log da) for the primitive case. □ 3.4 Practical scenarios In this section we discuss some practical scenarios for finding zero-aliasing polyno mials. First, when we want a non-factor of a pre-specified degree. Second, when we want to find a non-factor fast. Third, we compare our algorithm for finding 80 a least degree non-factor with an exhaustive search over all irreducible (primitive) polynomials in ascending degrees. In some cases, this type of search will be faster. 3.4.1 F in d in g a n o n -fa cto r o f a p re-sp ec ified d eg ree In cases where the register is required to function as both a RA and a PG, a non factor of a pre-specified degree is needed. Thus P r o b lem 3: Given a set of polynomials H = {fix, h < i ,..., fi|jy|}, with deg(hi) < n, find an irreducible (primitive) non-factor of degree t for H. □ This problem is exactly the same as finding the least degree non-factor, except that we only need to consider the case of j = t, instead of iterating over all 1 < j < u. We first compute the polynomials then determine whether a non-factor of degree t exists, and if so find one. L em m a 18: 1. The complexity of finding a non-factor of degree t is 0{\H\2n2t2 M {t)) for the irreducible case and 0(\H\2n2t3M(t) log log t) for the primitive case. 2. The expected complexity is 0(\H\M{n)(t -f log n)). P roof: 1 . Computing the polynomials involves computing < 7,- > t = gcd(hi,x2t — a;) and for each I < E md{t) computing fi = gcd(gij, x2‘ — x) and gitt = fi- The cost of the first gcd computation is 0 (tM (d ,)-f M (dt) log d,). The cost of the |m d(i)| subsequent gcd and divisions is bounded by 0(log t(tM(dij) + M(ditt) log dj,*)). Substituting n for d^ and ditt we get 0 (lo g f • M(n)(t + logra)). Once we have the polynomials gi< t, we need to sift out m ultiple instances of the same irreducible polynomial. When using the gcd method, this has a worst case cost of 0(\H\2M(n) • logn). 
81 At this stage, we know whether a non-factor of degree t exists or not. If one exists, we carry out phase 3. This has a worst case complexity of 0(\H \2n2t2 • M(t)). This is the dominant term for the whole process. The analysis is the same for the primitive case, hence the worst case complexity of finding an irreducible (primitive) non-factor of a given degree t for a set of polynomials H is 0(\H \2n2t2M(t)) (0(\H\2n2t3M(t) loglogt)). 2. We turn to analyze the expected complexity. For each A t , we compute gi> t — gcd(hi,x2t — x). This costs 0(\H\M(n)(t + logn)). The cost of sifting out the factors of degree less than t from the g^t’s, based on the expected number of factors for each degree, will be insignificant. Factoring and sorting the polynomials in the second phase has expected cost of 0(\H\ log \H\t • L(n)(t + logn)) (Eq. (3.2)). The expected number of distinct irreducible factors of degree t of H is bounded by \H\/t. Thus, the cost of finding a non-factor at this stage which consists of drawing at most ^ irreducible (primitive) polynomials, each at an expected cost of 0(^tM (t)) (O (|loglog£ • t2M(t))), and checking it against the list of factors, is bounded by 0 {^^tM (t)\o g (\H \/t)) for the irreducible case and O (|i/|(log \H\ — log t)t2M(t) log log t) for the primitive case. Hence, the expected complexity of finding a non-factor of degree t for H is bounded by 0(\H\M(n)(t + logn)). □ 3 .4 .2 F in d in g a n o n -fa cto r fast P r o b le m 4: Given a set of polynomials H = {hi, h2, . . ., with deg(hi) = di 5: n i S i= l — dh, find an irreducible (primitive) non-factor of H in less than 2° tries. D The sum of the degrees of all irreducible (primitive) polynomials of degree less than or equal to s(dh) (p(dh)) is greater than <4. If we look at u = s (u = p then > 2 ~c > 2 ~c) and if we draw 82 uniformly from all irreducible (primitive) polynomials of degree u, after 2 C draw ings we expect to find a non-factor. The expected work cost for this case is 0 ( 2° • (u2M(u) -f u\H\n)) = 0{2cu\H\n) which is the cost of 2C iterations of drawing a polynomial and testing for irreducibility, and once one is found dividing all \H\ polynomials by this candidate non-factor, using long division. For the primitive case this becomes 0 ( 2C • (u3M(u) log log u -f u\H\n)) = 0(2cu\H\n). E x a m p le 3: Using the numbers in Example 1 again, say we want to find a non factor in no more than 8 tries. We compute the bound p ( |1 0 10) and draw from all the primitive polynomials up to the computed bound. If we use Table 3.1, we see that instead of looking at the polynomials of degree less than or equal to 33, we need to consider all primitive polynomials of degree up to 34. In general, 5: 2, hence by Lemma 7, we only have to consider polynomials of degree greater by at most 2 than for the case when we want the minimum degree non-factor. □ We can also use the expected bounds es(d) and ep(d) to lower the degrees of the candidate non-factors. 3 .4 .3 E x h a u stiv e search In this subsection, we compare our algorithms with an exhaustive search for a least degree non-factor. We will look at the irreducible case. Assume the least degree irreducible non-factor has degree da. Also, assume we have a list of all irreducible polynomials in ascending order. The num ber of irre ducible roots of degree j is less than 2U We can bound the work required to find the non-factor, by an exhaustive search, by 0(\H\n2da+1). Using the expected bound on da (da = 0{log |f7|)), we can bound the work by 0(\H \2n). 
The expected work required to find the least degree non-factor, by our algorithms, is 0(\H\u2M (n )), which becomes 0(\H\\og2 \H\ • nlo g n ) when we substitute in the value of u. Not taking into account any of the constants involved with these two results, their ratio Assuming \H\ = 1024, this ratio is less than 1 for n > 1210. Assuming I#! = 2048, the ratio is less than 1 for n > 124,500. For \H\ = 4096, the ratio is less than 1 for n > 365,284,284. This suggests that depending on the num ber of target faults and the length of the test sequence, an exhaustive search might be more effective. Assuming the number of faults is less than 4096, based on the expected bound on the degree of a non-factor, we would need to store all the irreducible polynomials of degree at most 13. The number of these polynomials is 1389. 3.5 Experimental results The following experiments were conducted to verify our results. The experiments were conducted on a HP-700 workstation. 3.5.1 R a n d o m se le c tio n s b a sed on th e a b so lu te b o u n d s We generated a set of 1000 random polynomials of degree at most 200,000. The degree of the product of these polynomials was less than or equal to 200,000,000. We wanted a probability greater than 1 / 2 of finding a non-factor with ju st one drawing of a prim itive polynomial. By looking at Table 3.1, we can achieve this by selecting from the set of all primitive polynomials of degree less than or equal to 29. In fact, the probability of success with the first drawing is greater than 3/4. We drew the polynomials in a 2 step process. The first step selected the degree of the primitive candidate, the second selected the candidate. In the first step we selected a 32 bit num ber and took its value modulo the number of primitive roots in the fields GF[2] through G\F[229]. The result was used to determine the degree of the primitive candidate, by looking at the first field GF[2d] such that the num ber of primitive roots in the fields GF^2\ through GF[2d] is greater than the result. The selection of the actual polynomial was done by setting the coefficients of x , x 2, . . . , by a LFSR with a primitive feedback polynomial of degree d — 1 that was initialized to a random state. This guarantees th at no candidate will be selected twice and all candidates will have a chance at being considered. The candidates were tested for 84 prim itivity and if they were, they were tested for being non-factors. If at some point they would turn out to be factors, the search continued from the current state of the degree d — 1 LFSR. We generated 200 different sets of 1000 polynomials, and for each set we searched for a non-factor. In all 200 experiments the first primitive candidate turned out to be a non-factor. Of the non-factors that were found, 1 was of degree 2 1 , 2 were of degree 22, 3 of degree 23, 2 of degree 24, 7 of degree 25, 13 of degree 26, 32 of degree 27, 35 of degree 28 and 105 were of degree 29. The num ber of polynomials that were tested for prim itivity before one was found ranged from 1 to 160. The average number was 16. The tim e it took to find a primitive polynomial ranged from 0.01 seconds to 0.79 seconds. The average tim e was 0.104 seconds. It took between 153.25 and 166.68 seconds to find a non-factor, with the average being 160.50 seconds. 3.5.2 R a n d o m se le c tio n s b a sed on th e e x p e c te d b o u n d s Based on our expected bounds, Corollary 12, we should be able to find a non-factor of degree at most 14. 
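For illustration, the first step of the two-step selection above, choosing the degree of the primitive candidate in proportion to the number of primitive roots in GF[2^2] through GF[2^29], can be sketched as follows. The function names are ours, φ(2^d − 1) is computed by trial-division factorization rather than read from Table 3.1, and the second step (generating the candidate polynomial and testing it for primitivity) is omitted.

import random

def euler_phi(n):
    # phi(n) by trial-division factorization; fast enough for n = 2^m - 1, m <= 29.
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def pick_candidate_degree(max_degree=29, rng=random.random):
    # Choose degree d in 2..max_degree with probability proportional to the
    # number of primitive roots phi(2^d - 1) of GF[2^d].
    weights = [(d, euler_phi(2 ** d - 1)) for d in range(2, max_degree + 1)]
    total = sum(w for _, w in weights)
    r = rng() * total
    acc = 0
    for d, w in weights:
        acc += w
        if r < acc:
            return d
    return max_degree

Since φ(2^d − 1) grows roughly like 2^d, most draws return a degree close to 29, which is consistent with the degree distribution observed in the 200 experiments above.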
We ran 100 experiment as above, only this time, we selected only prim itive polynomials of degree 11 (the expected bound based on Table 3.1). The first prim itive candidate that was selected was a non-factor in 6 6 of the 100 experiments. 19 experiments found the non-factor with the second candidate, 1 1 with the third, 2 with the fourth, 1 with the fifth and 1 with the sixth. We ran 100 experiments selecting only primitive candidates of degree 9. The num ber of primitive candidates that were tried before a non-factor was found ranged from 1 to 28. The average number of candidates was 7.5. To test the tightness of our expected bound, we ran 126 experiments in which 1024 random polynomials of degree at most 200,000 were generated and an exhaus tive search, in increasing order of degrees, was conducted to find the least degree non-factor. By our expected bound (Corollary 12), this least degree should be at most 14 and by Table 3.1 at most 11. In one experiment, the least degree was 7. In 35 it was 8 and in the remaining 90 experiments, the least degree was 9. 85 3 .5 .3 E x p e r im e n ts on b en ch m ark cir c u its We tried our worst case and expected bounds on error sequences of two circuits from the Berkeley synthesis benchmarks. The first circuit was m5, the second was ini. We used a fault simulator th at did not take into account any fault collapsing, hence the number of faults was twice the number of lines in the circuit (for stuck-at-0 and stuck-at-1 faults on each line). For circuit in5 there were 1092 faults, six of which were redundant, hence there were 1086 detectable faults. The circuit had 14 prim ary outputs. We used a test sequence of length 6530 that detects all the non-redundant faults and computed the effective output polynomials of all the faults. All were non-zero, hence there were no cancellation of errors from one output by errors of another output. Thus we had 1086 error polynomials of degree at most 6543. From Table 3.1, the worst case bound on the degree of a primitive non-factor is 23. To draw a prim itive non-factor with probability greater than | we need to consider all prim itive polynomials of degree 24 or less. We conducted 20 experiments of drawing zero-aliasing primitive polynomials, based on our worst case bounds. In all experiments, the first candidate was a non-factor. We then conducted another 20 experiments, this tim e drawing prim itive polynomials of degree 14, the size of the register available at the circuit outputs. In all experiments the first candidate was a non-factor. Based on our expected bounds (Table 3.1), we should find a non-factor of degree 11 or less. We tried finding non-factor of degree 11,9 and 7. In 17 of 20 degree 11 experiments, the first prim itive candidate was a non-factor. Two experiments found the non-factor with the second try, one with the third. We conducted 15 degree 9 experiments before considering all 48 primitive polynomials of degree 9. Of the 48 primitive polynomials of degree 9, 33 were factors, and 15 were non-factors. The average num ber of candidates tried before a non-factor was found was 3 |. All 18 primitive polynomials of degree 7 were factors. For circuit in i there were 568 faults, 567 of which were non-redundant. The circuit has 10 primary outputs and we used a test sequence of length 9280. Using the worst case bounds, to ensure selection of a primitive non-factor with probability greater than | , we considered all primitive polynomials of degree 24 or less. 
All 20 86 experiments found a non-factor with the first candidate. The expected bound (Table 3.1) for the degree of a primitive non-factor was 10. We tried to find non-factors of degree 11 and 10 (the size of the registers available at the outputs). All 20 degree 11 experiments found a non-factor with the first try. Of the 20 degree 10 experiments, 13 found a non-factor with the first try, 6 with the second and one with the third. For both circuits we tried to find the least degree non-factor using an exhaustive search. Since the fault extractor we used did not do any fault collapsing, some of the error polynomials were identical. By summing the values of all non-zero erroneous output words for each simulated fault, we found at least 292 different error polynomials in in7 and at least 566 different error polynomials in in5. This would make our expected bounds (Table 3.1) to be 9 for in7 and 10 for in5. For both circuits the least degree non-factor had degree 8 . It took 11 CPU minutes to find each of these polynomials. 3.6 Conclusions In this paper we presented procedures for selecting zero-aliasing feedback polynomi als for MISR-based RAs. When both PGs and RAs are designed as LFSRs/M ISRs, our scheme, combined with algorithms for selecting efficient feedback polynomials for pattern generation (Chapter 2 ), enables the selection of one feedback polynomial th at serves both tasks, thus reducing the overhead of reconfigurable registers. We presented upper bounds on the least degree irreducible and prim itive zero- aliasing polynomial for a set of modeled faults. We showed that in all practical test applications such a polynomial will always be of degree less than 53. In fact, by our expected bounds, when the number of faults is less than 1 0 6, this degree will be at most 21. In the experiments that were conducted, a zero-aliasing polynomial of degree less than the expected bound was always found. We also presented procedures for finding a zero-aliasing polynomial, when the objective is to minimize the degree, to have a specific degree or speed. We ana lyzed the computational effort that is required both under worst case conditions 87 worst case irreducible primitive bounds (u) smallest non-factor degree t non-factor s(dh) = [log(|/f|n+ 1 )] \H\2n2u2M(u) \H\2n2t2M(t) p{dh) = s(dh) + fl + loglog(2 s(4 t))] \H\2n2u3M(u) log log u \H\2n2t3M(t) log log t expected irreducible primitive bounds (-u) smallest non-factor degree t non-factor es(H) = flog |1 \H\M(n)u2 | H\M(n) (t + log n) ep(H) = 1 + es(H) + [loglog(2es(#))l | H\M(n)u2 \H\M(n)(t + logra) Table 3.3: Summary of Results and expected conditions. A (partial) summary of the results is presented in Table 3.3. For both the worst case analysis and expected analysis, Table 3.3 shows the upper bounds on the smallest non-factor, the computational complexity of finding a smallest non-factor and the complexity of finding a factor of a given degree. If one is willing to look at two signatures, a great deal of com putation effort can be saved due to the fact that most of the faults are detected early in the test sequence. For those faults, after the first signature we have low degree error polyno mials which can be processed fast. Only the remaining faults need the (potentially) high degree error polynomials which take more time to process. A dynamic pro gramming algorithm for finding the optim um placement of k signatures is given by Lambidonis et al. [23]. 
Based on our analysis and on our experiments, it is our conclusion th at when the error polynomials of the modeled target faults are available, zero-aliasing is an easily achievable goal. Thus, to ensure high quality tests, a prem ium should be put on fault modeling, autom ated test pattern generator design and fault simulation. W ith these tools available, zero-aliasing is not a problem. 88 Reference List [1] M.E. Aboulhamid and E. Cerny, A Class of Test Generators for Built-In Test ing, IEEE Trans. Computers, Vol. C-32, No. 10, October 1983, pp. 957-959 [2] V.K. Agarwal and E. Cerny, Store and Generate Built-In Testing Approach, Proc. Intl. Sym. on Fault-Tolerant Computing, pp. 35-40, June 1981 [3] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Com puter Algorithms, Addison-Wesley Publishing Company, 1974 [4] F. Berglez, C. Gloster and G. Kedem, Built-In Self-Test with Weighted Random Pattern Hardware, Proc. Intl. Conf. on Computer Design, pp. 161-166, 1990 [5] E.R. Berlekamp, Factoring Polynomials Over Large Finite Fields, M athem atics of Com putation, Vol. 24, No. I l l , pp. 713-735, July, 1970 [6 ] R.K. Brayton, G.D. Hachtel, C. McMullen and A. Sangiovanni-Vincentelli, Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publish ers, Boston, MA, 1984 [7] D.G. Cantor and H. Zassenhaus, A New Algorithm for Factoring Polynomials over Finite Fields, M athematics of Com putation, Vol. 36, No. 154, pp. 587-592, April, 1981 [8 ] K. Chakrabarty and J.P. Hayes, Aliasing-Free Error Detection (ALFRED), Proc. 11th IEEE VLSI Test Sym., pp. 260-266, 1993 [9] D. Coppersmith, Fast Evaluation of Logarithms in Fields of Characteristic Two, IEEE Trans. Inform. Theory, Vol. IT-30, No. 4, pp. 587-594, July 1984 [10] T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to Algorithms, MIT Press, McGraw-Hill, 1990 [11] W. Daehn and J. Mucha, Hardware Test Pattern Generation for Built-In Test ing, Proc. Intl. Test Conf., pp. 110-113, 1981 [12] R. Dandapani, J.H. Patel and J.A. Abraham, Design of Test Pattern Generators for Built-In Test, Proc. Intl. Test Conf., pp. 315-319, 1984 89 [13] C. Dufaza and G. Cambon, LFSR based Deterministic and Pseudo Random Test Pattern Generator Structures, Proc. 2nd European Test Conf., pp. 27-34, 1991 [14] G. Edirisooriya and J.P. Robinson, A New Built-In Self-Test Method Based on Prestored Testing, Proc. VLSI Test Sym., pp. 10-16, 1993 [15] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979. [16] R.L. Graham, D.E. Knuth and 0 . Patashnik, Concrete Mathematics, Addison- Wesley Publishing Company, 1989 [17] S.K. Gupta, D.K. Pradhan and S.M. Reddy, Zero Aliasing Compression, Proc. 2 0 th Int. Sym. on Fault-Tolerant Computing, pp. 254-263, 1990 [18] D. Ha, Automatic Test Pattern Generators and Fault Simulators for Combi national and Sequential Circuits, Technical Report, Virginia Polytechnic, June 1 9 9 2 [19] D.R. Heath-Brown, Zero-free regions for Dirichlet L-functions, and the least prime in an arithmetic progression, Proc. London Math. Soc., Vol. 64, No. 3, p p . 2 6 5 - 3 3 8 , 1 9 9 2 [20] S. Hellebrand, S. Tarnick and J. Rajski, Generation of Vector Patterns Through Reseeding of Multiple-Polynomial Linear Feedback Shift Registers, Proc. Intl. Test Conf., pp. 120-129, 1992 [21] K. Ireland and M. Rosen, A Classical Introduction to Modern Number Theory, Springer-Verlag New York Inc., 1982 [22] T. Kameda, S. Pilarski and A. 
Ivanov, Notes on Multiple Input Signature Anal ysis, IEEE Trans. Computer, Vol. 42, No. 2, pp. 228-234, February 1993 [23] D. Lambidonis, V.K. Agarwal, A. Ivanov and D. Xavier, Computation of Exact Fault Coverage for Compact Testing Schemes, Proc. Intl. Sym. Computers and Systems, pp. 1873-1876, 1991 [24] A. Lempel, G. Seroussi and S. Winograd, Complexity of multiplication infinite fields, Theoretical Computer Science, Vol. 22, pp. 285-296, 1983 [25] R. Lidl and H. Niederreiter, Introduction to finite fields and their applications, Cambridge University Press, 1986 [26] A. M ajumdar and S. Sastry, On the Distribution of Fault Coverage and Test Length in Random Testing of Combinational Circuits, Proc. Design Automation Conf., pp. 341-346, 1992 90 [27] F. Muradali, V.K. Agarwal and B. Nadeau-Dostie, A New Procedure for Weighted Random Built-In Self-Test, Proc. Intl. Test Conf., pp. 660-669, 1990 [28] A.M. Odlyzko, Discrete Logarithms in Finite Fields and Their Cryptographic Significance, Adv. in Cryptology (Proc. of Eurocrypt ’84), Lecture Notes in Com puter Science, Vol. 209, Springer-Verlag, New York, pp. 224-314, 1984 [29] C. Pomerance, A Note on the Least Prime in an Arithmetic Progression, J. Number Theory, Vol. 12, pp.218-223, 1980 [30] I. Pomeranz, S.M. Reddy and R. Tangirala, On Achieving Zero Aliasing for Modeled Faults, Proc. European Design Automation Conf., 1992 [31] D.K. Pradhan, S.K. G upta and M.G. Karpovsky, Aliasing Probability for Mul tiple Input Signature Analyzer, IEEE Trans. Computer, Vol. 39, No. 4, pp. 586-591, April 1990 [32] S.C Pohlig and M. Heilman, An Improved Algorithm for Computing Logarithms over GF[p] and Its Cryptographic Significance, IEEE Trans. Inform. Theory, Vol. IT-24, No. 1, pp. 106-110, January 1978 [33] M.O. Rabin, Probabilistic Algorithms in Finite Fields, SIAM J. of Computing, Vol. 9, No. 2, pp. 273-280, May 1980 [34] I.S. Reed and T.K. Truong, The Use of Finite Fields to Compute Convolutions, IEEE Trans. Inf. Th., Vol. IT-21, No. 2, pp. 208-213, March 1975 [35] P. Ribenboim, The Book of Prime Number Records, 2nd Edition, Springer- Verlag, 1989 [36] R.M. Robinson, The Converse of Fermat’ s Theorem, Amer. Math. Monthly, Vol. 64, pp. 703-710, 1957 [37] R.M. Robinson, A Report on Primes of the Form k2n -f 1 and on Factors of Fermat Numbers, Proc. Amer. Math. Soc., Vol. 9, pp. 673-680, 1958 [38] N.R. Saxena, P. France and E.J. McCluskey, Simple Bounds on Serial Signature Aliasing for Random Testing, IEEE Trans. Computer, Vol. 41, No. 5, pp. 638- 645, May 1992 [39] J. Savir and P.H. Bardell, On Random Pattern Test Length, IEEE Trans. Com puters, Vol. 33, No. 6 , pp. 467-474, June 1984 [40] J. Savir and P.H. Bardell, Built-In Self-Test: Milestones and Challenges, VLSI Design, Vol. 1 , No. 1 , pp. 23-44, 1993 91 [41] A. Schonhage, Schnelle Multiplikation von Polynomen iiber Korpern der Charakteristik 2, Acta Informatica, Vol. 7, 1977, pp. 395-398 [42] M. Serra, T. Slater, D.M. Miller and J.C. Munzio, The Analysis of One- Dimensional Cellular Automata and their Aliasing Properties, IEEE Trans. CAD, Vol. 9, No. 7, pp. 767-778, July 1990 [43] C.W. Starke, Built-In Test for CMOS Circuits, Proc. Intl. Test Conf., pp. 309- 314, 1984 [44] B. Vasudevan, D.E. Ross, M. Gala and K.L. Watson, LFSR Based Deterministic Hardware for At-Speed BIST, Proc. VLSI Test Sym., pp. 201-207, 1993 [45] K.D. Wagner, C.K. Chin and E.J. McCluskey, Pseudorandom Testing, IEEE Trans. Computers, Vol. 36, No. 3, pp. 332-343, March 1987 [46] J.A. Waicukauski, E. Lindenbloom, E.B. 
Eichelberger and O.P. Forlenza, A Method For Generating Weighted Random Patterns, IBM J. Res. Develop., pp. 149-161, March 1989 [47] T.W . Williams, W. Daehn, M. Gruetzner and C.W. Starke, Aliasing Errors in Signature Analysis Registers, IEEE Design and Test, pp. 39-45, April 1987 [48] D. Xavier, R.C. Aitken, A. Ivanov and V.K. Agarwal, Using an Asymmetric Error Model to Study Aliasing in Signature Analysis Registers, IEEE Trans. Computer-Aided Design, Vol. 11, No. 1, pp. 16-25, January 1992 92 Appendix A The least prime in an arithmetic progression Given an integer n, we would like to know what is the smallest prime of the form p = k T + 1. (A .l) Alternatively, what is the smallest k such that p is a prime. In Subsection A .l we review some of the known upper bounds on the least prime in a general arithm etic progression. In Subsection A.2 we show additional evidence that, for most cases, k is not too large. In Subsection A.3, assuming k is not too large, we review known methods to find p and a primitive element in GF\p\. Finally, in Table A .l, for 0 < n < 206, we show the least 1 < k < 100, if one exists, such that k2n + 1 is a prime. A .l General Bounds The question of what is the smallest prime of the form of Eq. A .l has received a lot of attention under the topic of the smallest prime in an arithmetic progression. We review some known facts, as given in [35, Ch. 4, Sec. IV.B]. For d > 2, a > 1 , relatively prime, let p(d, a) be the smallest prim e in the arithm etic progression {a -f kd | k > 0}. Can one find an upper bound, depending 93 only on a and d for p(d,a)? Let p(d) = max{p(d,a) | 1 < a < d,gcd(a,d) = 1}. Can one find an upper bound on p(d) depending only on dl In 1950, Erdos showed For every d > 2 and for every C > 0, there exists C' > 0 (depending on C), such that #{a | 1 < a < d , gcd(a,d) = 1 , p(d,a) < C<f>(d)\ogd} ^ n , — — ,> t While this states that for every d there are values of a that satisfy the inequality for p(d, a), it says nothing about a = 1 . In 1971, Elliot and Halberstam showed the following. For every e > 0, for every d > 2, not belonging to a set of density 0 # { 1 < a < d , gcd(a,d) = 1 , p(d,a) < 0 (d)(logd)1+£} = < j> (d ) — o(< j> (d)) The above bound is not valid for every a and excludes some of the d’s. In 1977, Kumar Murty established that p(d) < d2 + E for every e > 0 and d not belonging to a sequence of density 0 . Again, the bound may not be valid for some powers of 2 . In 1987, Bombieri, Friedlander and Iwaniec proved a theorem from which the fol lowing estimate for p(d, a) can be deduced p(d,a) < d2/(logd)k for every k > 0 , d > 2 , gcd(a, d) — 1 and d is outside a set of density 0 . 94 W ith the generalized Reimann hypothesis, Heath-Brown showed that p(d) < C{<f>{d))\\ogd)\ In 1944, Linnik proved the following theorem. There exists L > 1 such that p(d) < dL for every sufficiently large d > 2. A lot of effort has gone into estimating L, called Linnik’s constant. The best result is th at of Heath-Brown which showed that Linnik’s theorem holds unconditionally for L = 5.5 [19]. Heath-Brown (1978) conjectured th at p(d) < Cd(logd)2 and WagstafF (1979), based on heuristic arguments, showed that p(d) 4>{d)(logd)2. While there is plenty of proof that there’s a very good chance of finding a small prime given (A .l), the only guarantee on its size is given by Heath-Brown’s esti mation of Linnik’s constant, and this, too, is only for sufficiently large powers of 2. Most researchers believe the Heath-Brown conjecture. 
A.2 Some evidence for the likelihood of k < 2n Let q be a prime. Looking at the values of p — k2n + 1, p will be divisible by q iff k2n — — 1 (mod q). Since q and 2n are relatively prime, the first k for which q divides p is kg = —2~n (mod q) , kq < q. After that, q will divide p for k = kq (mod q). Let Q(n) be the set of primes q such that 2 < q < 2n. We would like to know whether there exists a 1 < k < 2 n that will set (A .l) to be prime. Since p will be 95 less than 2 2n + 2 , if for any k, p is composite, it is divisible by one of the primes in Q(n). Hence, numbering the elements of Q(n) as qx, . . . , q\Q(n)\, P is prime iff p ^ k q i (mod qi) Vi € { 1 ... |(?(n)|}. (A.2) If we look at all the numbers in the interval [1 ... 2n], and m ark those th at are congruent to kqi (mod qt) for all qi € Q(n), then the first unmarked number (if one exists) is the least k that will produce a prime number. For example, if n = 3 then Q(n) = {3,5,7} and k3 = 1, A :5 = 3 and £ 7 = 6 . The marked numbers are 1,3,4, 6 and 7 and for k ~ 2 we have p = 17, a prime. We would like to show that an unmarked number always exists. This is dependent on the starting positions of the kqi’s. The kq > ’s can not take on any value, only values that are additive inverses of values in the multiplicative group of 2 modulo qi. We will look at a broader question, and assume that the fc 9l’s can take on any value. Let Q — Q i- Mark off all the numbers in the interval I = [0 ... < 5 3 that are multiples of any of the c/,’s. By the Chinese Remainder Theorem, any non zero number in I has a unique representation modulo all the qi s, and in particular one such number, r, can be represented as (qx — kqi,. . . , q\Q{n)\ — & g|Q(„)|)- I*1 the interval [r ... r + 2n] the marked numbers are of the form ki + hence when substituting an unmarked numbers for k , we get a number in the form of (A.2). An integer i is said to be Q(n)-anti-smooth if it has no prime factors in Q(n). We would like to show that in /, every sub-interval [i... i + 2n] has at least one Q(n)~ anti-smooth number. In particular, this would imply that the interval [ r .. .r + 2n] has at least one Q(n)-anti-smooth number and, hence, the least k that would set (A .l) to a prime is less than or equal to 2n. A broader question, which would imply this result, was conjectured to be true [35, pp. 219-220] [29]. Unfortunately, it was proven by Pomerance [29] to be false. For our case, this means that there exists some N , such th at for n > T V , there is at least one interval with no <5(n)-anti-smooth numbers. This does not rule out the possibility that k is less than 2 n even for those large n ’s, because the “ bad” interval is not necessarily our interval. In fact, we will show that the average distance between 96 unmarked numbers is less than n, implying that not only is it probable th at our interval contains an unmarked number, but that one is likely to be found very fast, hence k is small. L e m m a 19: The marked off numbers in I are symmetric about Q/2. P ro o f: For all i, 0 and Q are marked, qz and Q — qi are marked, and for all 0 < j < QI qi, j • qi and Q — j • qi are marked. □ The number of Q(n)-anti-smooth numbers in the interval [1 ... Q] is the number of integers that are relatively prime to Q. This number is given by Euler’ s function 4>{n) = n J J ( 1 - - ) v P for all distinct prime divisors p of n. Rosser and Schoenfeld proved the following lower bound on 4>(n) [35, p. 
172] 71 <f(n) > e~'y- — ------------- £ ---- for all n > 3 (A.3) “ In In n + = — T - ~ v ; 1 2e^ In Inn j = 0.577215665... [35, p. 162] is Euler’ s constant. The only exception to the bound is for n = 2 • 3 • 5 • 7 ■ 11 • 13 • 17 • 19 • 23, for which | is replaced by 2.50637. Using the lower bound for on Q , we find the average difference between Q(n)-anti-sm ooth number to be Q K Q <KQ) ~ e-'v S L In In in q 2e'Y (lnln<5) 2 + 5 ~ 2 e_'ye') ' In In Q 97 < ln ln Q for Q > e2* = es. Since Q ft£ e2" (lin^n^oo Q — e2", [ 2 1 , p.27]), the average difference between un marked numbers is less than n. A.3 Finding the Least Prime in the Arithmetic Progression k2n + 1 The procedure for finding the least prime in the arithm etic progression k2n + 1 is basically a sequential search over k. The bounds on the size of the smallest prime provide “good” evidence that we will not have to search for long. Once we find the prime, we also need to find a generator of the multiplicative group. The basic step is a primality test due to Proth (1878) [35, p. 40] [36, Th. 9] T h e o re m 2 0 : Let N = k2n + 1, where 0 < k < 2n. Suppose th at (a /N ) = — 1. Then N is prime iff a (w- * ) / 2 ^ (moci jv) (a/N) is the Jacobi symbol [35, pp. 35-36]. □ A corollary to Theorem 20 is given by Robinson [36]. C o ro lla ry 21: Let N = k2n 1, where n > 1,0 < k < 2n, and 3 J(k. Then N is prime iff 3 (7V—1)/2 = ( m o d □ Thus, in most cases, we can choose a = 3 in Theorem 20. W hen 3|A;, we need another value for a. If N is not a perfect square (if it is, then it’s not a prime) then the probability that the Jacobi symbol of a random choice for a will be — 1 is 1/2, hence we can easily find a value for a that will satisfy the hypothesis of Theorem 20. 98 Once we’ve found a prime N, we need to find a generator for its multiplicative group. If a is a generator, than so is for all (j, N — 1) = 1 , hence the probability of a random choice being a generator is Assuming k < 2” , then by (A.4), the probability is greater than ln^+I for n > 6 . In testing whether an element is a generator, we’ll need the factorization of N — 1. Again, assuming N < 22n, we can afford the work. In constructing addition and multiplication tables modulo p, it is advantageous to choose the smallest generator. Assuming k is small and the generators are uni formly distributed, we can test the elements sequentially until we find the smallest generator. In [37], Robinson found all the primes of the form & 2 n + l with k odd, 1 < k < 100 and 0 < n < 512. From the tables of [37], we constructed Table A .l, which for 0 < n < 206 gives the least k < 1 0 0 , if one exists, such that k2n + 1 is a prime. While Robinson considered only odd A r ’s, we wanted the smallest k , be it odd or even. From Table A .l, we see that for 1 < n < 100, there are only 8 cases for which k > n. For 6 of those k < 2n. For the other 2, for n = 100 k is at most 232 and for n = 99 k is at most 464. For 101 < n < 206, there are 26 cases where we do not know the least k. For 8 of those we know it is less than n, and for another 8 we know it is less than 2n. For the remaining 10 values of n , we can only say that k is (much) less that 17n. 
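The search outlined in this appendix is straightforward to prototype. In the sketch below (the function names and the cap on the witness a are ours), is_proth_prime applies Theorem 20: for N = k·2^n + 1 with 0 < k < 2^n, it looks for a small a with Jacobi symbol (a/N) = −1 and declares N prime iff a^{(N−1)/2} ≡ −1 (mod N); by Corollary 21, when 3 does not divide k the single base a = 3 already settles the question.

def jacobi(a, n):
    # Jacobi symbol (a/n) for odd n > 0, by the standard binary algorithm.
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_proth_prime(N):
    # Proth test for N = k*2^n + 1 with 0 < k < 2^n (Theorem 20).
    for a in range(3, 100):
        j = jacobi(a, N)
        if j == 0:                 # a shares a factor with N (N > a here): composite
            return False
        if j == -1:
            return pow(a, (N - 1) // 2, N) == N - 1
    return False                   # no witness found among small a (unlikely for prime N)

def least_k(n, k_max=None):
    # Smallest k < 2^n such that k*2^n + 1 is prime (cf. Table A.1); assumes n >= 2.
    for k in range(1, k_max or 2 ** n):
        if is_proth_prime(k * 2 ** n + 1):
            return k
    return None

For example, least_k(5) returns 3: k = 1 and k = 2 give the composites 33 = 3·11 and 65 = 5·13, while 3·2^5 + 1 = 97 is prime.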
In [37], Robinson found all the primes of the form k·2^n + 1 with k odd, 1 ≤ k < 100 and 0 ≤ n ≤ 512. From the tables of [37] we constructed Table A.1, which for 0 ≤ n ≤ 206 gives the least k < 100, if one exists, such that k·2^n + 1 is a prime. While Robinson considered only odd k's, we wanted the smallest k, be it odd or even. From Table A.1 we see that for 1 ≤ n ≤ 100 there are only 8 cases for which k > n. For 6 of those, k < 2n. For the other 2, k is at most 232 for n = 100 and at most 464 for n = 99. For 101 ≤ n ≤ 206, there are 26 cases where we do not know the least k. For 8 of those we know it is less than n, and for another 8 we know it is less than 2n. For the remaining 10 values of n, we can only say that k is (much) less than 17n.

Table A.1: Smallest 1 ≤ k < 100 for which k·2^n + 1 is prime, 0 ≤ n ≤ 206. Entries for which the least k is not determined are marked with an asterisk.

Appendix B

Generating irreducible polynomials

There are two ways to randomly select irreducible polynomials. The first is by randomly selecting polynomials and checking whether they are irreducible. The second is by randomly selecting elements of GF[2^d], verifying that they are not in any of its subfields, and constructing their minimal polynomials.

The first method is discussed by Rabin [33]. An irreducible polynomial of degree d has d + 1 coefficients, some of which are 0. Both the most significant and the least significant coefficients are non-zero, in our case "1". The number of non-zero coefficients must be odd, otherwise the polynomial is divisible by (x + 1). Thus we have 2^{d-2} possible choices for the remaining coefficients. Call a particular assignment f.

Lemma 22: [33, Lemma 1, p. 275] Let l_1, ..., l_k be the distinct prime divisors of d. Let m_i = d/l_i and let md(d) = {m_i}, i = 1, ..., k. A polynomial f ∈ GF[2][x] of degree d is irreducible in GF[2][x] iff
(1) f | x^{2^d} - x, and
(2) gcd(f, x^{2^m} - x) = 1 for all m ∈ md(d). □

The work required for Step (1) is one reduction modulo f. This costs O(dM(d)) operations. Step (2) requires k gcd operations. Notice that while reducing (x^{2^d} - x) modulo f (Section 3.2.2), the intermediate results include the values of (x^{2^{m_i}} - x) modulo f, hence these reductions need not be computed again. The cost is then O(kM(d) log d). Since k ≤ log d, and for d ≥ 16 we have d ≥ log^2 d, the overall complexity of checking whether a degree-d polynomial is irreducible is O(dM(d)).

For the primitive case, the test is [25, Ch. 3]
(1) f | x^{2^d - 1} - 1, and
(2) gcd(f, x^v - 1) = 1 for all v ∈ md(2^d - 1).

There is more work involved in this case. First, we must compute x^{2^j} mod f = r_j for 1 ≤ j ≤ d, and then multiply and reduce the appropriate r_j's to get the equivalent of x^v. Computing the r_j's can be done in O(dM(d)). Denoting |md(2^d - 1)| by q, computing the x^v's costs O(qdM(d)) ≤ O(d^2 M(d)). The gcd operations add another O(qM(d) log d). Thus the overall work is O(d^2 M(d)).
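As a concrete companion to Lemma 22, the sketch below tests irreducibility over GF[2] with polynomials encoded as Python integers (bit i holds the coefficient of x^i). The encoding and all names are our own choices for illustration; the thesis also reuses the intermediate squarings across steps (1) and (2), which this sketch recomputes for clarity.

```python
# Rabin's irreducibility test (Lemma 22) over GF[2]; polynomials are integers,
# bit i = coefficient of x^i.  Illustrative sketch only.
import random

def pmod(a, f):
    """Remainder of a modulo f over GF[2]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

def pmulmod(a, b, f):
    """Product a*b modulo f over GF[2]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, f)
    return pmod(r, f)

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def prime_divisors(d):
    ps, m, p = [], d, 2
    while p * p <= m:
        if m % p == 0:
            ps.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        ps.append(m)
    return ps

def x_pow_2k_mod(k, f):
    r = 2                                   # the polynomial x
    for _ in range(k):                      # k successive squarings
        r = pmulmod(r, r, f)
    return r

def is_irreducible(f):
    d = f.bit_length() - 1
    if x_pow_2k_mod(d, f) != pmod(2, f):    # step (1): f | x^(2^d) - x
        return False
    for l in prime_divisors(d):             # step (2)
        if pgcd(f, x_pow_2k_mod(d // l, f) ^ 2) != 1:   # x^(2^(d/l)) - x
            return False
    return True

def random_irreducible(d):
    """First method of this appendix (d >= 2): draw candidates until one passes."""
    while True:
        f = (1 << d) | (random.getrandbits(d - 1) << 1) | 1
        if bin(f).count("1") % 2 and is_irreducible(f):  # odd weight: not divisible by x+1
            return f

print(is_irreducible(0b10011), is_irreducible(0b10001))  # x^4+x+1 -> True, x^4+1 -> False
print(bin(random_irreducible(8)))
```

By the probability bounds that follow, random_irreducible is expected to test only O(d) candidates before one passes.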
By Lemma 5, 2^{d-1}/d ≤ I_2(d) ≤ 2^d/d, where I_2(d) is the number of irreducible polynomials of degree d over GF[2]; thus the probability pr_ir of choosing an irreducible polynomial satisfies

2/d ≤ pr_ir = I_2(d)/2^{d-2} ≤ 4/d,

and the expected number of tries before an irreducible polynomial is found is less than d/2. For the primitive case we have, by Lemma 4, for d ≥ 6,

2^d/(d · 2.08 · log log d) ≤ φ(2^d - 1)/d ≤ 2^d/d;

thus the probability pr_pr of choosing a primitive polynomial satisfies

4/(d · 2.08 · log log d) ≤ pr_pr = φ(2^d - 1)/(2^{d-2} d) ≤ 4/d,

and the expected number of tries before a primitive polynomial is found is less than (d · 2.08 · log log d)/4. If care is taken not to generate the same f more than once, there is no need to keep track of the irreducible polynomials tested.

The second method for selecting irreducible polynomials is by selecting elements of GF[2^d], verifying that they are not elements of any subfield of GF[2^d], and constructing the corresponding minimal polynomial. We assume we are given a primitive polynomial g that defines the arithmetic of GF[2^d] and a primitive element α, a root of g. We randomly select an element by randomly selecting a power of α.

Two construction methods for minimal polynomials are discussed in [25, p. 94] and [33]. The first involves reducing α^s mod g, O(dM(d)); constructing the matrix whose j-th row is α^{sj} mod g, for 0 ≤ j ≤ d, O(dM(d)); and finding a set of dependent rows that includes rows 0 and d, O(d^3). The major workload is in this last step, which dominates the primitive case as well. The second construction is based on the fact that

m_{α^s}(x) = ∏_{i=0}^{d-1} (x + α^{s·2^i}),

where m_{α^s} is the minimal polynomial of α^s. We first need to represent α^s in terms of a polynomial in α of degree < d, O(dM(d)). Next, we need to square and reduce this polynomial d - 1 times, O(dM(d)). Finally, we need to perform the multiplication. We start from the left, multiplying the accumulated result by the next linear factor:

(∑_{i=0}^{j} a_i x^i)(x + α^{s·2^{j+1}}) = a_j x^{j+1} + ∑_{i=1}^{j} (a_{i-1} + a_i α^{s·2^{j+1}}) x^i + a_0 α^{s·2^{j+1}}.

Each multiplication costs M(d), each addition costs d. There are d(d-1)/2 + (d-1) multiplications and d(d-1)/2 additions, for a total cost of O(d^2 M(d)). In addition to the higher cost compared with randomly selecting polynomials, we must keep track of the generated polynomials: even if each power is generated just once, every irreducible polynomial has d roots and thus has a chance of being generated up to d times.
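The conjugate-product construction is easy to exercise in a small field. The sketch below is our own illustration: it fixes GF[2^4] through the primitive polynomial g(x) = x^4 + x + 1, represents field elements as 4-bit integers, and multiplies out the product of (x + β^{2^i}) for β = α^s. The degree of the result is the size of the conjugacy class of β, so the subfield check of the second method amounts to checking that this degree equals d.

```python
# Minimal polynomial of beta = alpha^s in GF[2^4] via the conjugate product
# (an illustrative sketch; the field, encoding and names are our own choices).

D = 4
G = 0b10011                      # g(x) = x^4 + x + 1, primitive over GF[2]

def gf_mul(a, b):
    """Multiply two field elements, reducing modulo g."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> D:
            a ^= G
    return r

def minimal_poly(beta):
    """Coefficients (over GF[2], lowest degree first) of the minimal
    polynomial of beta; its degree is the number of distinct conjugates."""
    conjugates, c = [], beta
    while True:
        conjugates.append(c)
        c = gf_mul(c, c)                 # Frobenius: next conjugate
        if c == beta:
            break
    poly = [1]                           # the constant polynomial 1
    for c in conjugates:                 # multiply the accumulated result by (x + c)
        shifted = [0] + poly                         # x * poly
        scaled = [gf_mul(c, t) for t in poly] + [0]  # c * poly
        poly = [s ^ t for s, t in zip(shifted, scaled)]
    return poly

alpha = 0b0010                                # alpha itself
beta = gf_mul(gf_mul(alpha, alpha), alpha)    # beta = alpha^3
m = minimal_poly(beta)
assert len(m) == D + 1                        # beta lies in no proper subfield
print(m)                                      # [1, 1, 1, 1, 1] -> x^4 + x^3 + x^2 + x + 1
```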
But for cti, 0 2 roots of fi, with probability 1/2, T r(5«i) ^ Tr(8a2). □ Rabin comments that the actual probability is nearly 1 — 1/2*, where k = deg(fi), but he can not prove this. The root finding algorithm is to compute /i, select 8 randomly, and compute f$. If deg(fs) < ^deg(fi), set f 2 = fs , otherwise f 2 = f i / f s ■ In at most logn iterations a root is found and all that is left is to construct its minimal polynomial. The problem we have with this algorithm is that it involves use of the Trace function, thus arithm etic in GF[2d] instead of GF[2] and the construction of minimal polynomials. Although this can be dealt with, it would complicate the required programs. The second method we consider is that of Cantor and Zassenhaus [7]. We first review the m ethod for factoring polynomials over GF[q] for odd q, followed by Cantor and Zassenhaus’ modification when q is a power of 2, and in particular when q — 2. We follow this review with our analysis of the algorithm ’s complexity for factoring a product of distinct irreducible polynomials of the same degree. We then show that when d, the degree of the irreducible factors, and q are even, then the case where q = 2 mod 3 can be treated in the same manner as the case where q = 1 mod 3. Let / = rii=i u* ke a polynomial over GF[q], q odd, with the Ui being distinct irreducible polynomials of degree d. By Euclid’ s algorithm, there exist gi,Ci such that 9iUi + ct - I J u j = 1 . (C .l) Since c ,- = (IIj^i uj)~l mod u,-, we can assume deg(ci) < d. 105 Set e, = C i n j¥i Uj, then 1 mod U i e i ■ 0 mod uj , j ± i with deg(ej) < deg(f). Let 6 ;, 1 < i < r, be a set of polynomials of degree < d. By the Chinese Remainder Theorem (CRT) [3, Ch. 8 ], there exists a unique polynomial b, deg(b) < rd, s.t. for all i b = bi mod Ui. Furthermore, r b = Y^,(biei mod /) . i Suppose we have a polynomial r a(x ) — J 2 aiei 1 = 1 with ai € {0 , ± 1 } and suppose a ( x ) ^ 0 , ± 1 . Let a be a root of / , hence a is a root of uk, for some k. Therefore, a ( ° 0 = 2 ^ a *e * '( ° 0 = akek{a). i= 1 By (C .l), we have 1 = gk{a)uk(a) + ek(a) = ek(a) which yields a(a) = ak hence n(o;) — ak = 0 and uk divides a(x) — ak. Therefore, let S = {i : a; = 0} , T — {i : a{ = 1} , R = {i : a,i = — 1}. Then, gcd(f,a(x) - 1 ) = J J i£ T gcd(f,a(x)) = ut i€S 106 gcd(f,a(x) + 1) = f j u; At least one of the above equations will result in a non-trivial factorization of / . The question is how to find such a polynomial a. A polynomial a is found by a random search. Choose a random non-constant polynomial b from GF[q][x], with all such polynomials given the same probability, l/(qn — q). By the CRT, b can be written as r b = ^2(b;et - mod / ) t= 1 with bi = b mod ut - (note we do not know the u,-’s). Set m = (qd — l)/2 . & ” = ( X > r e .) m o d / with b™ equal to 0,1 or — 1. Thus, bm mod / is of the form of the polynomial a(x) that we are looking for. It remains to be seen what is the probability of bm mod / turning out to be ± 1 (since b was chosen to be non-constant, it was not 0, hence bm ^ 0). For each iq, there are m & ,- ’s such that b™ = 1 mod U { and m & * • ’s such that 6-71 = — 1 mod Ui. For bm to equal ±1 mod / , we need all h™ to equal 1 mod iq or all b™ to equal — 1 mod Ui. Thus, there are 2m r — (q — 1 ) non-constant polynomials b th at, when raised to the m — th power, will equal ±1 mod / . 
Turning to the case where q is a power of 2: if q ≡ 1 mod 3, then GF[q] contains a third root of unity ρ. Setting m = (q^d - 1)/3 and choosing b as before, define

a = b^m = ∑_{i=1}^{r} a_i e_i mod f,

where a_i ∈ GF[4]. If a ∉ GF[4], it is guaranteed that one of gcd(f, a - γ), γ ∈ GF[4], results in a non-trivial factor of f. The probability that a turns out to be in GF[4] is

(3m^r - q + 1)/(q^n - q) < 3^{1-r} ≤ 1/3.

When q ≡ 2 mod 3 (as is the case when q = 2), Cantor and Zassenhaus suggest factoring f in the quadratic extension GF[q^2] of GF[q] and combining conjugate factors over GF[q]. Conjugate factors can be matched in the following way. For the case q = 2, if we denote a root of an irreducible polynomial of degree d by α, then the remaining d - 1 roots are α^{2^j} for 1 ≤ j ≤ d - 1. Over GF[4], this irreducible polynomial factors into two polynomials of degree d/2, the roots of one being α^{2^j} for even j and the roots of the other being α^{2^j} for odd j. Denote these polynomials by f' and f''. Since α is a root of f', f'(α) = 0, and therefore f'(α)^2 = (∑_{i=0}^{d/2} f'_i α^i)^2 = 0. But over fields of characteristic 2 we have

(∑_{i=0}^{d/2} f'_i α^i)^2 = ∑_{i=0}^{d/2} (f'_i α^i)^2 = ∑_{i=0}^{d/2} (f'_i)^2 (α^2)^i,

hence

f''(x) = ∑_{i=0}^{d/2} (f'_i)^2 x^i.

The complexity of the algorithm is as follows. The polynomial b is raised to the power m and reduced modulo f. This can be done in O(dM(n)) operations. On average, b is chosen three times, so the complexity remains the same. The gcd operations that follow can be done in O(M(n) log n) operations. There are r factors, hence this process repeats r times. The complexity, without combining factors, is thus O(rM(n)(d + log n)). Combining conjugate factors requires O(2r log(2r)) to sort the factors and O(r(d + log(2r) + M(d))) to create the combined factors, by selecting a random factor, squaring its coefficients to obtain its conjugate, and multiplying the two. This is majorized by the factoring term.
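The coefficient-squaring rule for matching conjugate factors is a one-liner once GF[4] arithmetic is in place. In the sketch below (our own encoding: GF[4] elements written as the integers 0, 1, 2, 3, with 2 standing for δ and 3 for δ^2 = δ + 1), conjugating a factor squares each coefficient, and multiplying a factor by its conjugate recovers a polynomial with coefficients in GF[2]; the example uses x^2 + x + 1 = (x + δ)(x + δ^2).

```python
# Matching conjugate factors over GF[4] by squaring coefficients.
# Encoding (our choice): 0, 1, 2, 3 stand for 0, 1, delta, delta^2 = delta + 1.

GF4_MUL = [[0, 0, 0, 0],
           [0, 1, 2, 3],
           [0, 2, 3, 1],
           [0, 3, 1, 2]]                 # multiplication table of GF[4]

def gf4_mul(a, b):
    return GF4_MUL[a][b]

def gf4_sq(a):
    return gf4_mul(a, a)                 # the Frobenius map of GF[4] over GF[2]

def conjugate(poly):
    """f'' from f': square every coefficient (lists are lowest degree first)."""
    return [gf4_sq(c) for c in poly]

def gf4_poly_mul(a, b):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] ^= gf4_mul(ai, bj)  # addition in GF[4] is bitwise XOR
    return r

f1 = [2, 1]                              # f'  = x + delta
f2 = conjugate(f1)                       # f'' = x + delta^2
print(f2)                                # [3, 1]
print(gf4_poly_mul(f1, f2))              # [1, 1, 1] -> x^2 + x + 1, back over GF[2]
```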
We now show that when d is even, or when r/d is large enough, we can achieve a high probability of success without working in the quadratic extension GF[q^2] of GF[q] (i.e., we can choose the coefficients of b from GF[q]). We consider the case q = 2, but the discussion remains similar when q is a power of 2. The polynomial b is chosen at random from the set of non-zero polynomials of degree less than dr = n over GF[2], all given the same probability. There are two cases to consider: d odd and d even.

For the case when d is odd, the u_i's remain unchanged when working over GF[4], i.e., they do not factor, and the irreducible factors of f over GF[4] are exactly those over GF[2]. Setting m = (4^d - 1)/3, we have m = (2^d - 1)(2^d + 1)/3 = k(2^d - 1) for an integer k. The b_i's can be seen as elements of GF[2^d]. Such an element, if not 0, equals 1 when raised to the m-th power. We get a non-trivial factor only when not all the reduced values are 1, i.e., when at least one b_i^m mod u_i is 0. This happens only if b_i ≡ 0 mod u_i, making b divisible by u_i. So the fraction of b's that result in a non-trivial factorization is the fraction of b's that are divisible by at least one of the u_i's. By the inclusion-exclusion counting principle, the number of b's that are divisible by at least one u_i is

∑_{j=1}^{r} (-1)^{j+1} C(r, j) (2^{d(r-j)} - 1),

which is equal to

2^{dr} - (2^d - 1)^r - 1.

The total number of b's is 2^{dr} - 1, hence our chance of finding a non-trivial factor, roughly 1 - (1 - 2^{-d})^r, is dependent on the values of r and d.

When d is even, the situation is much better. This time each u_i factors into u_{i,1} and u_{i,2}, each of degree d/2. Consequently, we have

m = (4^{d/2} - 1)/3 = (2^d - 1)/3.

When working in the quadratic extension, the elements of GF[4] are scalars, hence we can find a non-trivial factor by looking at gcd(f, a - γ), γ ∈ GF[4]. Over GF[2], the only scalars we can use are 0 and 1. In order to get a non-trivial factor we first need a ∉ GF[4], and second we need at least one b_i^m mod u_i to equal 0 or 1. Another way of looking at this is that, to fail to get a non-trivial factor, either all (b_i^m mod u_i) equal 1 or all (b_i^m mod u_i) lie in {δ, δ^2}, for δ primitive in GF[4]. Thus

prob(failure) = prob(∀i : b_i^m mod u_i = 1) + prob(∀i : b_i^m mod u_i ∈ {δ, δ^2})
             = m^r/(2^{dr} - 1) + (2m)^r/(2^{dr} - 1)
             = [((2^d - 1)/3)^r + (2(2^d - 1)/3)^r]/(2^{dr} - 1)
             < (1/3)^r + (2/3)^r,

which is at most 5/9 for r ≥ 2 and decreases exponentially with r.
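A closely related way to avoid the quadratic extension altogether, worth noting for comparison (it is a standard characteristic-2 variant of equal-degree splitting, not the procedure analyzed above), is to apply the trace map to a random polynomial: with T(b) = b + b^2 + ... + b^{2^{d-1}} mod f, each residue T(b) mod u_i lies in GF[2], and gcd(f, T(b)) is a proper factor unless all r residues agree, which for a random b happens with probability about 2^{1-r}. A self-contained sketch over GF[2] (bit-vector encoding and names are our own):

```python
# Characteristic-2 equal-degree splitting via the trace map, entirely within
# GF[2][x].  Polynomials are integers (bit i = coefficient of x^i); this is an
# illustrative standard variant, not the procedure analyzed in this appendix.
import random

def pmod(a, f):
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)
    return a

def pmulmod(a, b, f):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, f)
    return pmod(r, f)

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def trace_split(f, d):
    """One attempt to split f, a product of r >= 2 distinct irreducible
    polynomials of degree d over GF[2]; returns a proper factor or None."""
    deg_f = f.bit_length() - 1
    b = random.getrandbits(deg_f)            # deg(b) < deg(f)
    t, power = 0, pmod(b, f)
    for _ in range(d):                       # T(b) = b + b^2 + ... + b^(2^(d-1))
        t ^= power
        power = pmulmod(power, power, f)
    g = pgcd(f, t)
    if 0 < g.bit_length() - 1 < deg_f:
        return g
    return None

f, d = 0b1111111, 3      # (x^3 + x + 1)(x^3 + x^2 + 1)
factor = None
while factor is None:
    factor = trace_split(f, d)
print(bin(factor))       # 0b1011 or 0b1101, one of the two cubic factors
```

For r = 2 this succeeds with probability about 1/2 per attempt, in the same spirit as the bounds derived above.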