OPTIMAL DEFECT-TOLERANT SRAM DESIGNS IN TERMS OF YIELD-PER-AREA
UNDER CONSTRAINTS ON SOFT-ERROR RESILIENCE AND PERFORMANCE
by
Jae Chul Cha
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2010
Copyright 2010 Jae Chul Cha
Dedication
To My Family
Acknowledgements
I have a long list of people including colleagues and friends to whom I am deeply
grateful for helping me directly and indirectly throughout my studies at the University of
Southern California. However, I would like to mention a few words of personal
gratitude.
I would like to express my deepest gratitude and appreciation to my research
advisor, Professor Sandeep K. Gupta for his invaluable guidance and helpful
suggestions throughout this research. He has been a great source of motivation and
inspiration for me, for which I cannot thank him enough.
I would also like to extend my appreciation to my committee members, Professor
Massoud Pedram and Professor Aiichiro Nakano for their valuable time and
insightful suggestions.
I would like to extend my sincere gratitude to my parents, Jun Hwan Cha and
Kwang Ja Myung, for the love and support that they have given to me throughout my life. I
would not have been able to achieve my goals without their continuous
encouragement and support.
I wish to express my deep gratitude to my parents-in-law, Se Moon Jahng and
Soon Hee Lee, for their help and kind encouragement. Their sincere support of my
family has been crucial in achieving my Ph.D. goals.
Words cannot express my heartfelt gratitude, appreciation and thanks to my wife,
Jung Yoon Jahng, who stood beside me and encouraged me constantly. She is my
source of strength and without her support this thesis would never have been
accomplished.
I am greatly indebted to my sister, Min Kyung Cha and brother-in-law, Yong
Min Kim, whose affection and support is the source of inspiration and
encouragement for my studies.
Last, but by no means least, it gives me immense pleasure to thank my children,
Andrew and Claire, for giving me happiness and joy in life.
Table of Contents
Dedication ……………………………………………………… ii
Acknowledgements …………………………………………… iii
List of Tables …………………………………………………… viii
List of Figures ………………………………………………… ix
Abstract ………………………………………………………… xi
Chapter 1  Introduction ………………………………………… 1
    1.1 Trends of Fabrication Technology ……………………… 1
    1.2 Objective and Scope of Our Study ……………………… 5
Chapter 2  Characterization of Granularity and Redundancy for SRAMs for Optimal Yield-per-Area ……………………… 8
    2.1 Introduction ……………………………………………… 8
    2.2 Related Research ………………………………………… 9
    2.3 Background on Yield and Critical Area ………………… 11
    2.4 Proposed Yield Model …………………………………… 13
        A. Generalized Memory Architecture …………………… 14
        B. An Overview of Our Model Development …………… 16
        C. Cell-level Analysis …………………………………… 17
        D. Sub-array-level Analysis ……………………………… 20
        E. Novelty of Our Approaches …………………………… 23
        F. Yield for H-tree Inter-sub-array Interconnect ………… 24
        G. Area Overheads ………………………………………… 25
    2.5 Architectural Considerations and Tradeoffs ……………… 26
        A. Redundancy Architectures …………………………… 26
        B. Tradeoffs Associated with Granularity ………………… 26
    2.6 Experimental Results ……………………………………… 29
        A. Validation of Our Yield Model ………………………… 29
        B. Yield vs. Spares ………………………………………… 30
        C. Effect of Defect Rates and Memory Sizes on Optimality … 32
    2.7 Concluding Remarks ……………………………………… 34
Chapter 3  Characterization of Overheads due to Spare Switching Schemes ……………………… 36
    3.1 Introduction ……………………………………………… 36
    3.2 Spare Switching Schemes ………………………………… 36
        A. Column Switching ……………………………………… 37
            a. Two Possible Switching Schemes …………………… 37
            b. Area Estimates and Performance Penalties ………… 38
        B. Row Switching ………………………………………… 41
    3.3 Concluding Remarks ……………………………………… 43
Chapter 4  Sub-array Interconnect Optimizations for SRAM Architecture that Maximize Yield-per-Area ……………………… 44
    4.1 Introduction ……………………………………………… 44
    4.2 Methodology ……………………………………………… 45
    4.3 Experiments on Optimal Spacing and Width ……………… 51
    4.4 Impact of Buffer Insertion on Yield and Area ……………… 54
    4.5 SRAM Bit Cell Modification ……………………………… 55
    4.6 Concluding Remarks ……………………………………… 57
Chapter 5  An Integrated Approach to Exploit Spares and ECC to Efficiently Combat High Defect and Soft-error Rates in Future SRAMs ……………………… 59
    5.1 Introduction ……………………………………………… 59
        A. Motivations ……………………………………………… 59
        B. Basic Ideas: Integrated Use of Spares and ECC ………… 61
    5.2 Previous Research ………………………………………… 61
        A. ECC Protection Schemes ……………………………… 61
        B. Integrated View of ECC and Spare Switching Schemes … 62
    5.3 Integrated Approach and Yield-Resilience Model ………… 63
        A. Integrated Memory Reconfiguration Approach ………… 63
        B. Chips Produced under the Integrated Reconfiguration Approach … 64
        C. Characterization of Defect Types ……………………… 64
        D. Additional Assumptions ………………………………… 65
        E. An Integrated Yield-Resilience Model ………………… 65
    5.4 Illustrating the Benefits of Partial Use of ECC for Masking Hard Defects … 68
    5.5 Tradeoff Analysis of ECC Schemes ……………………… 71
        A. Scope of ECC Scheme ………………………………… 72
        B. Code Length …………………………………………… 73
        C. Strength of Code ………………………………………… 73
    5.6 Characterization of Overheads Associated with ECC Implementation … 74
        A. Area Estimates and Performance Penalties ……………… 74
        B. Observations …………………………………………… 76
    5.7 Characterization of Design Tradeoffs: Guidelines and Optimal Designs … 76
        A. Objective Function ……………………………………… 77
        B. Case Study and Design ………………………………… 77
    5.8 Concluding Remarks ……………………………………… 80
Chapter 6  Summary and Future Research ……………………… 81
    6.1 Summary …………………………………………………… 81
    6.2 Future Works ……………………………………………… 83
Bibliography ……………………………………………………… 85
Appendices:
    Appendix A: Consideration on Performance ………………… 96
    Appendix B: Case Study ……………………………………… 100
List of Tables
Table 2.1. Mapping relations between the defect types and replacement requirements ……… 18
Table 2.2. Critical area calculation within a single cell (defect profiles are from [63][64]) ……… 19
Table 2.3. Number of spare rows and columns required for optimal yield and yield/area for 3 SRAM sizes ……… 31
Table 3.1. Comparisons of fan-in/-out capacitances in critical paths for two schemes ……… 40
Table 4.1. Optimal spacing (and width) values for various defect densities and number of banks, and comparisons of yield and yield-per-area values between optimal interconnect designs and minimum rule designs ……… 52
Table 5.1. Collection of notations for deriving integrated yield-resilience model ……… 65
Table A. Yield-per-area values for each possible configuration when the number of spares are varied ……… 102
List of Figures
Figure 1.1. Yield history for several fabrication process generations [7] ……… 2
Figure 1.2. Trends of defect density (Intel, [105]) ……… 3
Figure 1.3. Memory area vs. years ……… 4
Figure 1.4. Soft error rate trends for SRAMs (MOSYS [84]) ……… 4
Figure 2.1. Parameters that characterize sub-arrays ……… 15
Figure 2.2. Levels of granularity for memory array: the aspect ratios 1:1 or 2:1 ……… 15
Figure 2.3. Proposed yield calculation for optimal SRAM design ……… 16
Figure 2.4. Netlist of a SRAM cell ……… 17
Figure 2.5. SRAM cell's physical layout [15] and dimensions: a=4λ, b=6λ, c=4λ, d=27λ, e=3λ, f=38λ, g=4λ, h=5λ, i=12λ, j=4λ, k=28λ, l=2λ ……… 18
Figure 2.6. (a) Column multiplexing structure. (b) Normal operation when there is no fault. (c) Repairable operation when the 4th column must be replaced due to one or more defects ……… 27
Figure 2.7. A (1×1) and a (2×1) architecture ……… 27
Figure 2.8. Yields estimated by three methods when the spare rows are used for repairs, while varying the size of SRAM – the three methods are respectively based on (1) multivariate negative binomial distributions, (2) our hierarchical method, and (3) Poisson distributions ……… 30
Figure 2.9. Yield/Area for 32MB for different numbers of spares ……… 32
Figure 2.10. Yield-per-area trends of 4Mb SRAM when sub-array dimensions are varied for several defect rates ……… 33
Figure 2.11. Comparisons of yield-per-area trends for two sizes of SRAM (defect densities are set to 1000 defects/cm²) ……… 34
Figure 3.1. Two spare column switching schemes ……… 38
Figure 3.2. Area overheads due to spare column switching schemes for varying numbers of spares ……… 39
Figure 3.3. Comparison of access time penalties for two schemes for various word lengths ……… 41
Figure 3.4. (a) Spare row switching circuitry and (b) its area overheads ……… 42
Figure 3.5. Trends of increase of row access time due to spare switching circuitries ……… 42
Figure 4.1. Two parallel conductors, two types of point defects (short and open), and dimensions ……… 46
Figure 4.2. Predicted general trends of yield-per-area for two parallel conductors (units are not real) ……… 48
Figure 4.3. Diagonal buffer insertion scheme for given bus wires without incurring area penalty ……… 55
Figure 5.1. A view of memory columns in terms of code blocks ……… 66
Figure 5.2. Chip yield improvements due to partial use of ECC ……… 70
Figure 5.3. Comparison of yield improvement with spares for two approaches ……… 71
Figure 5.4. The impact of changes of scope of ECC scheme on resilience ……… 72
Figure 5.5. Code-to-check bit ratio for various code lengths and strengths ……… 73
Figure 5.6. Comparison of soft-error resilience of different code strengths for a fixed code length, for different sizes of SRAMs ……… 74
Figure 5.7. Area overheads, decoding, and decoding delay for different BCH codes ……… 75
Figure A. (a) A detailed view of row decoders and a single cell in sub-array blocks and (b) sub-array interconnect topology ……… 96
Figure B. (a) Discharging mechanism of NOR row decoder and its equivalent circuitry, and (b) bitline discharging path and its equivalent distributed RC circuitry ……… 98
Figure C. Access time comparisons for different bank sizes and aspect ratios ……… 101
Abstract
Since the advent of computer-aided design (CAD) of digital systems, technological
constraints and advancements as well as market forces have been major drivers of
the direction of design methodologies. We have seen the emphasis shift from area
minimization in the LSI era, to delay minimization in the early part of VLSI era, and
to power minimization in the recent decade. Given current technology trends, the IC
industry will soon need to address the next paradigm shift in CAD of digital systems,
since digital system design will soon confront computing technologies and fabrication
processes with extremely high levels of variations in the values of key parameters,
such as defect densities and soft error rates. While defect-tolerance (DT) and fault-
tolerance (FT) techniques have matured over the past 50 years, they have been
applied to a limited class of digital subsystems and in an era of relatively low defect
densities and low soft error rates. Furthermore, FT techniques have been largely
confined to avionics and other critical systems where cost is not a main objective. In
contrast, we must soon apply these approaches to the entire range of digital systems,
including those with strict constraints on cost, performance, and power. A direct
extrapolation of how DT/FT techniques are currently applied will erode many of the
benefits of new processes/technologies. Our early results clearly show that careful
application of DT/FT techniques will provide significant gains in the near future and
will become increasingly important thereafter. Our results also show that a large
space of possible ways of applying these techniques must be searched carefully to
obtain efficient designs. This makes it imperative to develop new systematic
approaches to efficiently apply current and new DT and FT techniques to all digital
systems.
The objective of this dissertation is to develop new comprehensive methodologies
for designing defect-tolerant memory devices optimized in terms of yield-per-area
under high defect rates and high soft error rates where soft-error resilience and
performance are given as constraints. Memories constitute a significant proportion of most
digital systems, and memory-intensive chips continue to lead the migration to new
nanometer fabrication processes. Thus, improving the reliability and robustness of
such memory devices is a crucial issue. As for hard defects, competitive
pressures require SRAM and µ-processor vendors to adopt the latest process before it
has matured and hence when it suffers from high defect rates. Also, SRAMs are
becoming more susceptible to soft errors with each process generation [95][101]
[103]. SRAMs may thus require increasing numbers of spares and stronger error
correcting codes. Under those circumstances, we explore optimal SRAM architecture
in terms of yield-per-area for given constraints on performance and soft error rates,
by considering various design alternatives, including granularities (characterized by
sizes of memory sub-arrays), spare switching schemes, error correcting codes, and
physical layouts.
Our models and experiments demonstrate that 1) the optimal granularity tends to
be finer with the increase of defect densities, while it tends to be almost independent
of memory size for a given defect density, 2) the use of spares is more useful for
optimization than physical layout changes, 3) interconnects between memory sub-arrays
tend to have an impact on memory yield that is much greater than their proportion
of memory area, especially for high defect densities and large numbers of sub-arrays,
and 4) the integrated use of ECC and spares for hard defects can reduce the required
number of spares, thereby decreasing area and performance penalties.
Chapter 1
Introduction
1.1 Trends of Fabrication Technology
Scaling – including reductions in feature sizes, separations, and voltages – has
been ongoing since the inception of MOS. The motivating force behind scaling has
been the benefits it provides, including lower costs, lower delays, lower power, and
increased system capabilities, such as larger memories, larger numbers of functional
units, and higher precision arithmetic units. Hence, scaling has significantly
contributed to the IT revolution by enabling continuous improvements in capabilities
and performance of CMOS, accompanied by reductions in cost. Feature sizes,
separations, the number of dopant atoms per device, etc. are approaching, or in some
cases, surpassing, the capabilities of economically viable fabrication processes. For
example, the minimum feature sizes used today are below the wavelength of the light
used for lithography. Only a few hundred dopant atoms are used per transistor.
Due to such reasons, many qualitatively new concerns are coming to the fore,
including higher soft error rates, higher defect densities, and higher susceptibility to
internal and external noise [3]. It is important to note that the above assessment of
impending problems is made despite assuming that research and development in
fabrication process technology will continue unabated. Hence, unless we develop
new methodologies for design and test of circuits and architectures, the time needed
for a new technology to become economically viable will be significantly extended.
In turn, this will slow down the entire IT revolution.
Due to the aforementioned phenomenon in fabrication technologies, yield loss
has become problematic with continued reduction in feature size. Trends shown in
Figure 1.1 illustrate reductions in yield with shrinking technologies [7]. The figure
also shows that it takes substantial time for newly introduced (i.e., immature)
manufacturing lines to achieve certain levels of yield. This means that, in order to
achieve desirable yields for chips from such new fabrication lines, the chips may need
to include a significant amount of redundancy in some form. Intel has shown similar
trends in defect density, where defect densities in immature fabrication lines are
much higher than those in mature fabrication lines (Figure 1.2), thereby
requiring a large amount of redundancy to achieve certain levels of yield. Note that
in Figure 1.2 the y-axis is on a log scale, and the actual values are concealed by the
original authors.
Figure 1.1. Yield history for several fabrication process generations [7].
To confirm that the defect density can be hundreds of defects per cm² at the initial
stage of a new technology, we use data from SEMATECH [7] and ITRS [3], choosing
180nm technology as an example case. Assuming that the initial yield for the
technology is around 0.1 [7], that the estimated SRAM area in a chip is 400mm² [57],
and that the numbers of spare rows and columns are each two (chosen based on the
number of spare rows and columns supported by prevailing SRAM compilers [82]-[86]),
we can estimate that the defect density at the initial stage of high-volume fabrication
is several hundred defects per cm².
Figure 1.2. Trends of defect density (Intel, [105]).
The importance of SRAM in digital systems is growing, and it is expected that
SRAM will take 50-90% of the die area in some SoCs [72]. Moreover, the general
trend is that the ratio of the area of memory modules to the overall chip area is
increasing, as shown in Figure 1.3 [72]. This trend indicates that the yield of SRAM
will have significant and increasing impact on the chip yield.
Hard defects describe permanent changes in device behaviors, while soft errors
are defined as temporary upset(s) in values at circuit nodes (latches, flip-flops, and
SRAM). Occasional incidences of data loss due to soft errors are attributable to alpha
particles, high-energy neutrons, thermal neutrons, and so forth, hitting the silicon
surface [101][104]. Soft error rates (SER) for SRAM are also increasing [95][101]
[103]. Figure 1.4 shows the trends of increasing SER in SRAMs presented by
MOSYS [84]. As the feature size shrinks, the amount of charge per device decreases,
thereby allowing particles with lower energies to cause soft errors. Accordingly, we
need stronger error correction schemes with each process generation.
Figure 1.3. Memory area vs. years (ITRS 2000: memory block share to die area).
Figure 1.4. Soft error rate trends for SRAMs (MOSYS [84]).
1.2 Objective and Scope of Our Study
In this dissertation, we develop a framework to design self-repairable SRAMs
that are optimized in terms of yield-per-area while satisfying constraints on
performance and soft error resilience. The scope of research proposed in this
dissertation includes the characterization and exploration of the following design
alternatives during designs of optimal SRAM architectures.
1. Granularity: For the given memory size, we determine how it
should be partitioned into sub-arrays (i.e., banks) with some
aspect ratios. Note that many important properties of memory
are affected by granularities, such as access time, power
consumption, the number of spares needed to achieve certain
levels of yield, and spare reconfiguration overheads.
2. Spare-switching Schemes: Spare columns or rows of memory
cells are introduced in conjunction with the corresponding
switching circuitries. Spare-switching schemes are used for
repairing permanent hard faults either in a fixed manner (during
post-manufacturing testing) or in an adaptive manner (during
operational life of the chip). Installation of spare switching
schemes improves yields, while incurring additional area
overheads and delay penalties.
3. Physical Layouts: Two different and competing approaches are
studied for layout optimizations, namely layout de-compaction
(better yield via defect avoidance) and layout compaction (via
smaller area). We explore not only the optimal layout changes
of basic cells/modules, but also the optimal layouts of inter-
connects and global signals.
4. Soft Error-tolerance Schemes: ECC (Error Correcting Codes)
use check bits to enhance data integrity. ECC schemes are
typically used to handle transient soft-errors caused by radiation
and alpha particles. However, ECC can also be used for fixing
some types of hard-defects with some loss of soft-error
resilience. In this regard, we quantify the benefit and the cost
associated with the integrated use of spares and ECC protection
schemes against hard defects.
For carrying out the analysis on the aforementioned design alternatives, we
create a realistic SRAM yield model where all important parameters are incorporated.
This yield model serves a number of purposes, including explorations of tradeoffs
between various alternatives of SRAM configurations, and quantifications of yield
improvement associated with the use of spares. We also quantify the hardware cost
due to implementations of spares and ECC protection schemes.
Incorporation of defect-tolerant schemes is likely to affect the access time in
a negative way. Therefore, while we are trying to optimize memory systems in terms
of yield and area using various defect/error correcting schemes, we must ensure that
the access time penalties stay within acceptable ranges.
We also consider the partial use of ECC for correcting some types of hard defects
(more later). In this case, we have to ensure that the remaining soft-error correction
capability provided by the ECC protection scheme exceeds the desired level of resilience against
soft errors.
Chapter 2
Characterization of Granularity and Redundancy for SRAMs for
Optimal Yield-per-Area
2.1 Introduction
Reduction in feature sizes and increase in chip area continue to put a downward
pressure on fabrication yield [7]. While state-of-the-art fabrication techniques have
helped somewhat mitigate this trend [65], overall yields are decreasing.
SRAMs constitute large fractions of chip areas for many modern digital systems
under mass production. Therefore, instead of discarding all chips with non-working
memory modules, defect tolerance circuitry is incorporated such that memories can
work despite having certain combinations of defects. One of the most common
defect-tolerance approaches is to add redundant memory elements [63][106][107]
[108][109][110][111] in the form of spare rows and columns and corresponding
switching circuitry. To date, in typical memory architectures, the number of
redundant elements used is extremely small compared to the number of elements in
the original array, reflecting the relatively low defect rates of past and present
fabrication processes [71][112]. Hence, researchers have not had to pay much
attention to overheads due to redundancy. However, process variations, defect rates,
and susceptibility to noise are now rapidly increasing as fabrication processes move
deep into nanoscale. As a result, overheads due to redundant circuitries are growing
and will be substantial in the near future. Hence, it is now important to study the
impact of the overheads of redundancies during the design of repairable memory
architectures, to achieve optimality in terms of yield, area, delay, and power.
In this chapter, we characterize tradeoffs between granularity and redundancy in
SRAMs in terms of yield-per-area. Using a hierarchical yield model that we develop,
we identify optimal designs. In particular, we demonstrate that at very coarse and
fine granularities overall overheads are extremely high due to high complexities of
routing and number of spares, respectively. We then identify designs that maximize
yield-per-area with respect to granularity and the number of spares.
2.2 Related Research
Characterization of granularity, hierarchy and redundancy for a SRAM requires a
yield model. Stapper [63] developed a mixed Poisson yield model for memory with
redundancy, where gamma distribution is used as a mixing function for compound
poisson statistics. The resulting probability model is called the negative binomial
model, which is commonly used for yield calculations. This model introduces a
clustering factor, whose exact value is obtained by fitting empirical yield data from
the manufacturing line.
Schiano [113] proposed a Markov based model for fault-tolerant memories. This
model represents each state as an operating chip configuration, where manufacturing
defects are interpreted as causes of state transitions. This model is often used when
fault coverage must be incorporated into reliability expressions [115]. However,
Markov model is subject to state space explosion [114] and hence is not suitable for
technologies with high defect rates.
Monte Carlo simulations [116][117] are a well-known yield prediction method.
This approach calculates yield via repeated random defect-insertion experiments.
Therefore, the accuracy is dependent on the number of experiments, which can
impose significant computational burden when defect rates are high. This is due to
the fact that, as the defect rate increases, multiple defects must be considered and
more redundancy is required. More importantly, this approach cannot be used in an
analytical manner and hence is not conducive to identification of optimal design
configurations.
Several variants of the aforementioned methodologies have also been developed.
Wang [118] described a tool that calculates the yield of repairable embedded
SRAMs that are generated by a compiler. Fault pattern and reconfiguration method,
which is based on a stochastic method and is an improved version of a Markov
model, was proposed by Battaglini [108]. Koren [6] developed a unified negative
binomial distribution method.
From a hardware perspective, there exist several defect-tolerant schemes for
memories. Redundant circuits can be implemented and activated using fusible links
that are programmable using lasers. Alternatively, defective rows can be replaced by
spare rows by switching between decoders [109]. Other BISR (Built in Self-Repair)
techniques have also been proposed [110][111].
For analysis of defects in SRAM, authors in [119] classify defects as opens/shorts
within a cell, and opens/shorts at bit and word lines. They then translate the faults
into functional fault models. In [71], 18 types of opens/shorts in SRAM are presented.
Among the 18 types, only one type of defect is associated with bitlines, while the rest of
them are related to single cells. In [63], the author visually inspected faulty chips to
construct the relations between defect types – missing/extra patterns of metal,
polysilicon, etc., and fault types, namely single cells, double cells, single wordlines,
etc. Authors in [120] study resistive shorts in pull-up and pull-down transistors in
memory cells.
2.3 Background on Yield and Critical Area
One widely adopted integrated circuit yield formula is the negative
binomial yield model [121], which can be written as
\[
Y = \left(1 + \frac{\varphi}{\alpha}\right)^{-\alpha}, \qquad (2.1)
\]
where φ is the average number of defects in a chip and α is a clustering factor. When
α is very large, the defect distribution is uniform and the yield in (2.1) becomes
Poisson statistics:
\[
\lim_{\alpha \to \infty} Y = \lim_{\alpha \to \infty}\left(1 + \frac{\varphi}{\alpha}\right)^{-\alpha} = e^{-\varphi}. \qquad (2.2)
\]
It is believed that equation (2.2) provides a pessimistic estimate of yield [121], and
we will examine this claim in Section 2.4. Smaller values of α correspond to higher
clustering of defects.
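As an illustration with arbitrarily chosen numbers (for exposition only, not drawn from any process data): for an average of φ = 2 defects per chip and a clustering factor α = 2, equation (2.1) gives Y = (1 + 2/2)^{-2} = 0.25, whereas the Poisson limit (2.2) gives Y = e^{-2} ≈ 0.135; for the same φ, stronger clustering (smaller α) predicts a higher yield.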
Next we consider critical area analysis [121]. Assuming that a spot defect is
circular and has diameter l, the density function f(l) is expressed as [121]
\[
f(l) =
\begin{cases}
\dfrac{k}{l^{p}}, & \text{when } l_o \le l \le l_M,\\[4pt]
0, & \text{otherwise,}
\end{cases} \qquad (2.3)
\]
where $k = (p-1)\,l_o^{\,p-1}\,l_M^{\,p-1}/(l_M^{\,p-1} - l_o^{\,p-1})$, $l_o$ and $l_M$ are respectively the feature size
(expressed as a multiple of lambda) and the largest defect size, and the value of p is
between 2 and 3.5 [121]. Note that defects smaller than lambda do not cause
functional errors, but may affect performance; hence they are not in the scope of
this dissertation, but are a subject of our future research. The value of $l_M$ can be
obtained empirically [121], depending on the defect profiles.
Critical area for a defect of type i and diameter l is defined as $A_i(l)$, and the average
over all defect sizes is defined as $A_i$ and is calculated as [121]
\[
A_i = \int_{l_o}^{l_M} A_i(l)\, f(l)\, dl. \qquad (2.4)
\]
The critical area for a missing material defect of size l in a conductor of length L
and width w is given by
\[
A_i(l) =
\begin{cases}
(l - w)\,L + \tfrac{1}{2}\left(l^{2} - w^{2}\right), & \text{when } l \ge w,\\[4pt]
0, & \text{otherwise.}
\end{cases} \qquad (2.5)
\]
We can also use the above equation for extra material defect of size l between two
parallel conductors with a spacing of w.
Let $d_i$ denote the average number of defects of type i per unit area. The average
number of defects of type i is expressed as $A_i \cdot d_i$. We can now calculate φ as [121]
\[
\varphi = \sum_i A_i\, d_i. \qquad (2.6)
\]
The yield and critical area equations that we have briefly reviewed above will be
used extensively by our model to capture the probabilities of failure.
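The following minimal sketch (ours, not the dissertation's tool; all parameters are illustrative, and the per-size critical-area function is a simplified open-defect model rather than equation (2.5)) shows how the pieces reviewed above fit together in code:

// Sketch of the building blocks reviewed above: the defect-size density (2.3),
// the average critical area (2.4), the chip-level defect count (2.6), and the
// negative binomial / Poisson yields (2.1)-(2.2). Numeric values are illustrative.
#include <cmath>
#include <cstdio>

// Defect-size density f(l) of eq. (2.3), defined for l_o <= l <= l_M.
double defectSizeDensity(double l, double lo, double lM, double p) {
    double k = (p - 1.0) * std::pow(lo, p - 1.0) * std::pow(lM, p - 1.0)
               / (std::pow(lM, p - 1.0) - std::pow(lo, p - 1.0));
    return (l >= lo && l <= lM) ? k / std::pow(l, p) : 0.0;
}

// Average critical area of eq. (2.4): numerically integrate A(l)*f(l) over
// defect sizes. 'criticalArea' is any per-size critical-area function A_i(l).
template <typename F>
double averageCriticalArea(F criticalArea, double lo, double lM, double p,
                           int steps = 10000) {
    double sum = 0.0, dl = (lM - lo) / steps;
    for (int i = 0; i < steps; ++i) {
        double l = lo + (i + 0.5) * dl;                 // midpoint rule
        sum += criticalArea(l) * defectSizeDensity(l, lo, lM, p) * dl;
    }
    return sum;
}

// Negative binomial yield of eq. (2.1); the Poisson yield of eq. (2.2) is the
// limit of large alpha.
double negBinomialYield(double phi, double alpha) {
    return std::pow(1.0 + phi / alpha, -alpha);
}
double poissonYield(double phi) { return std::exp(-phi); }

int main() {
    // Illustrative parameters (in lambda units): a conductor of length L and
    // width w, with a simplified critical area L*(l - w) for defects wider
    // than the conductor (end effects ignored).
    double L = 1.0e5, w = 4.0, lo = 2.0, lM = 100.0, p = 3.0;
    auto openArea = [&](double l) { return l > w ? L * (l - w) : 0.0; };

    double A = averageCriticalArea(openArea, lo, lM, p);   // eq. (2.4)
    double d = 1.0e-8;              // assumed defects per lambda^2 for this type
    double phi = A * d;             // eq. (2.6) with a single defect type

    std::printf("avg critical area = %.3e, phi = %.4f\n", A, phi);
    std::printf("neg. binomial yield (alpha=2) = %.4f\n", negBinomialYield(phi, 2.0));
    std::printf("poisson yield                 = %.4f\n", poissonYield(phi));
    return 0;
}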
The difference between the yield of a design without redundancy and that of a
design with redundancy is that the former is the probability that no defect occurs,
which is difficult to achieve at high defect rates, while the latter is the probability
that all defects are repairable by the available redundant resources, such as a given
number of spare rows and columns. Thus, the yield estimate for designs with
redundancy is a generalization of that for designs without any redundancy.
2.4 Proposed Yield Model
We have developed a yield model for SRAMs that serves as the key tool for our
characterization of new tradeoffs in design of SRAMs with redundancy. Our model
has three novel characteristics that are necessary due to specific characteristics of our
analysis and the trends in design and fabrication processes.
1. Minimize spare requirements for repairing multiple defects: Increases in
defect density and memory size make multiple defects increasingly likely. To
minimize pessimism, we capture more precisely (compared to [63][64]) the
combinations of multiple defects that can be repaired by the spares available in
a SRAM design (details later). Our model accomplishes this by enumerating
the spatial distribution of various combinations of defects that can be repaired
using available spares. This is in contrast with previous approaches that only
identify a much smaller subset of these combinations of defects as being
repairable [63][64].
2. Consider defects in spares: As the number of spares required to optimize
yield-per-area is increasing, we can no longer ignore the probability of spares
being defective.
3. Capture a wide range of architectures: The main purpose of our yield
model is to study design tradeoffs that will necessitate adoption of alternative
architectures, such as SRAM designs with various levels of granularities (i.e.,
SRAM designs with one large array, with multiple smaller sub-arrays, and so
on), hierarchical use of spares, up-sizing (i.e., increase in feature sizes and/or
inter-feature spacings) to reduce the probability of misbehavior due to likely
defects, and so on. Hence, we have developed our memory yield model to
consider such architectures.
In this chapter, like previous models, we focus on the storage cell array and its
spares, since this constitutes a large proportion of area of typical SRAMs. Our
discussion will not include the defect tolerant schemes and the overheads of other
parts, such as decoders, sense amplifiers, and precharging circuit. We hence assume
that such parts of the memory are either defect-free or have fixed (and high) yield
values.
A. Generalized Memory Architecture
One of the basic tradeoffs we study in this paper is between the granularities and
the number of spares. At the coarsest granularity, the entire memory is implemented
as a single array. At finer granularities, the memory array is partitioned into sub-
arrays, which have the aspect ratios (numbers of rows and columns in sub-arrays) of
1:1 or 2:1 (shown in Figure 2.2) to optimize area and access time [122]. We denote
by H and W the height and the width in each sub-array, respectively (including spare
rows and columns). 2^a and 2^b are respectively the numbers of horizontal and vertical
partitions of the original SRAM array (Figure 2.1). Therefore, the overall
architecture has 2^a × 2^b sub-arrays. To maintain the aspect ratio of 1:1 or 2:1, either
a = b, or a = b+1. The numbers of spare rows and columns within each sub-array are
respectively R_s and C_s. The type of interconnect between sub-arrays is an H-tree
topology (Figure 2.2). We develop our yield model to consider such architectural
generalizations.
Figure 2.1. Parameters that characterize sub-arrays.
Figure 2.2. Levels of granularity for memory array: the aspect ratios 1:1 or 2:1.
B. An Overview of Our Model Development
We develop our model hierarchically at two levels, as illustrated in Figure 2.3. At
the cell level, we use layout of a SRAM cell (and its neighborhood) and its transistor-
level netlist as inputs. We perform two tasks. First, we perform critical area analysis
using the layout of a cell (and its neighborhood) to identify the likely defects at the
transistor level and estimate the probability of occurrence of each defect. Second, we
analyze the transistor-level netlist for each of the above likely defects to categorize
the defect according to the spares required for its repair. In particular, we identify
whether a defect necessitates replacement of a cell, a row, a column, a row and a
column, two adjacent columns, or two adjacent rows. We then sum up the critical
areas for all defects that have the same spare requirements. Probabilities of failure
are then computed for each category of spare requirement.
Figure 2.3. Proposed yield calculation for optimal SRAM design.
Subsequently, we continue the analysis at the array-level. At this level, we
consider the architecture of the sub-array, including the available spares, to compute
array-level probabilities. In particular, we use the probabilities computed during cell-
level analysis, along with enumeration of all combinations of repairable defects to
compute the probability that a memory will be useable (after repair, if necessary).
C. Cell-level Analysis
Transistor level defect types are classified as gate-drain open/short, gate-source
open/short, and drain-source open/short. Other types of defects at the lowest level are
opens and shorts associated with bit-line(s), word-line(s), and power and ground
within a single cell. We then calculate critical areas for each defect type using the
layout shown in Figure 2.5 [123]. The critical area calculation results are shown in
Table 2.2. This classification is further processed to determine whether spare row(s)
and/or column(s) are needed to mask the effect of the defect [63][119]. For instance,
a drain-source short and a gate-source short in transistor 1 (Figure 2.4) have different
spare requirements. The former corrupts the bit-line (spare column required for
repair), while the latter corrupts the word-line (spare row required).
Figure 2.4. Netlist of a SRAM cell.
Figure 2.5. SRAM cell’s physical layout [15] and dimensions:
a=4 λ, b=6 λ, c=4 λ, d=27 λ, e=3 λ, f=38 λ, g=4 λ, h=5 λ, i=12 λ, j=4 λ, k=28 λ, l=2 λ
Complete mapping relations between transistor level defect types and the required
redundancy types are listed in Table 2.1. In the case of a single cell defect, we can
replace such a faulty cell by either a spare row or a spare column; we opt to always use
a spare column or a spare row, depending on which is smaller, to mask a single cell
defect.
Table 2.1. Mapping relations between the defect types and replacement requirements.
Table 2.2. Critical area calculation within a single cell (defect profiles are from [63][64])*.
* Note: the probability of open defects of wordlines becomes small when poly wordlines are
strapped by metals for performance reasons. This fact was not reflected in the yield
calculations in this thesis, thereby making the calculated yield slightly pessimistic.
We then sum up individual critical areas at the layout level that correspond to
each classification to calculate critical areas for type I through type VI (see Table 2.1
and Table 2.2). Probabilities of failure in terms of type I-VI are then calculated by
using the yield formula in Section 2.3. P_B is defined as the probability that a column,
i.e., a bit-line, needs to be replaced to mask one or more defects; P_W as the
probability that a row, i.e., a word-line, needs to be replaced; P_C as the probability
that a single cell needs to be replaced; P_BW as the probability that a row and a column
both need to be replaced; P_DB as the probability that two adjacent columns need to be
replaced; and P_DW as the probability that two adjacent rows need to be replaced.
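One plausible way to organize this cell-level step in code is sketched below; it assumes the per-category failure probability is obtained from the summed critical area A and defect density d as 1 − e^(−A·d), following the Poisson form of Section 2.3, and the critical-area numbers used are placeholders rather than the values of Table 2.2:

// Sketch of the cell-level step: per-category critical areas (types I-VI of
// Tables 2.1-2.2) and a defect density are mapped to per-cell probabilities.
#include <cmath>
#include <cstdio>

struct CellProbabilities {
    double pB, pW, pC, pBW, pDB, pDW;   // P_B, P_W, P_C, P_BW, P_DB, P_DW
};

CellProbabilities cellLevelProbabilities(const double areaByType[6],
                                         double defectDensity) {
    auto prob = [&](double area) { return 1.0 - std::exp(-area * defectDensity); };
    return { prob(areaByType[0]),   // type I   : column (bit-line) replacement
             prob(areaByType[1]),   // type II  : row (word-line) replacement
             prob(areaByType[5]),   // type VI  : single-cell replacement
             prob(areaByType[2]),   // type III : row and column
             prob(areaByType[3]),   // type IV  : two adjacent columns
             prob(areaByType[4]) }; // type V   : two adjacent rows
}

int main() {
    double area[6] = {120.0, 150.0, 30.0, 25.0, 25.0, 400.0}; // lambda^2, placeholders
    double d = 1.0e-6;                                        // defects per lambda^2
    CellProbabilities p = cellLevelProbabilities(area, d);
    std::printf("P_B=%.2e P_W=%.2e P_C=%.2e P_BW=%.2e P_DB=%.2e P_DW=%.2e\n",
                p.pB, p.pW, p.pC, p.pBW, p.pDB, p.pDW);
    return 0;
}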
D. Sub-array-level Analysis
We then transform the individual cell level probabilities to the next higher level of
abstraction, namely column/row level probabilities with respect to an entire sub-
array of cells. Given that SRAM sub-array has respectively H and W rows and
columns of cells, including spare rows and columns, we calculate the following six
types of probabilities.
1. Probability that a column needs to be replaced: At least one cell in the column has
a defect that corrupts a bitline, and none of the cells in the column has a defect that
corrupts either a wordline (type II), both a bitline and a wordline (type III), a double
bitline (type IV), or a double wordline (type V). Thus,
\[
P_{Col} = \sum_{K=1}^{H} \binom{H}{K}\, (P_B)^{K}\, \left(1 - P_B - P_W - P_{BW} - P_{DB} - P_{DW}\right)^{H-K}. \qquad (2.7)
\]
Note that for single cell defects we may choose to use either spare columns or spare
rows, depending on the replacement policy applied (more later). If single cell defects
are covered by spare columns by policy, single cell defects are treated as column
defects. In such case, P_B terms in eqn (2.7) should be replaced by P_B + P_C.
2. Probability that a row needs to be replaced: At least one cell in the row has a
defect that corrupts a wordline, and none of the cells in the row has defects that
corrupt either a bitline (type I), both a bitline and a wordline (type III), a double
bitline (type IV), or a double wordline (type V). Therefore,
\[
P_{Row} = \sum_{K=1}^{W} \binom{W}{K}\, (P_W)^{K}\, \left(1 - P_W - P_B - P_{BW} - P_{DB} - P_{DW}\right)^{W-K}. \qquad (2.8)
\]
Note also here that if single cell defects are handled by spare rows according to policy,
then we consider single cell defects as row defects. In that case, P_W terms in eqn (2.8)
should be replaced by P_W + P_C.
3. Probability that a row and a column both need to be replaced: A cell at the
intersection between the row and the column has a type III defect, while none of
cells in the row has either type I, III, IV, or V defect, and none of cells in the column
has either type II, III, IV, or V defect. Therefore,
\[
P_{RC} = P_{BW}\,\left(1 - P_B - P_{BW} - P_{DB} - P_{DW}\right)^{W-1}\,\left(1 - P_W - P_{BW} - P_{DB} - P_{DW}\right)^{H-1}. \qquad (2.9)
\]
4. Probability that two consecutive columns need to be replaced: At least one pair of
cells that are in the same row has a type IV defect with respect to each other. None of
the cells in the two columns has a defect of either type II, III, or V. Therefore,
\[
P_{DC} = \sum_{K=1}^{H} \binom{H}{K}\, (P_{DB})^{K}\, \left(1 - P_W - P_{BW} - P_{DW}\right)^{2(H-K)}. \qquad (2.10)
\]
5. Probability that two consecutive rows need to be replaced: At least one pair of
cells that are in the same column has a type V fault with respect to each other. None
of the cells in the two rows has a fault of either type I, III, or IV. Therefore,
\[
P_{DR} = \sum_{K=1}^{W} \binom{W}{K}\, (P_{DW})^{K}\, \left(1 - P_B - P_{BW} - P_{DB}\right)^{2(W-K)}. \qquad (2.11)
\]
In equations above, recall that the parameter H includes not only the original rows
but also the spare rows. Likewise, the parameter W includes both original and spare
columns (if any). This allows us to capture the impact of defects on spare
rows/columns, which is becoming increasingly important as defect densities increase
and so do the numbers of spare rows and columns.
Now using the equations (2.7)-(2.11), we can calculate the probability mass
function, i.e., the probability that exactly C_s spare columns and R_s spare rows are
required, as follows:
\[
P(\text{no. of spare rows} = R_s \ \text{AND no. of spare columns} = C_s) =
\sum_{x=0}^{\lfloor R_s/2 \rfloor}\; \sum_{y=0}^{\lfloor C_s/2 \rfloor}\; \sum_{rc=0}^{\min(C_s-2y,\,R_s-2x)}
\binom{H-1}{x} P_{DR}^{\,x}\;
\binom{W-1}{y} P_{DC}^{\,y}\;
\binom{(H-2x)(W-2y)}{rc} P_{RC}^{\,rc}\;
\binom{H-2x-rc}{R_s-2x-rc} P_{Row}^{\,R_s-2x-rc}\;
\binom{W-2y-rc}{C_s-2y-rc} P_{Col}^{\,C_s-2y-rc}\;
\left(1 - P_C - P_W - P_B - P_{BW} - P_{DB} - P_{DW}\right)^{(H-R_s)(W-C_s)}. \qquad (2.12)
\]
The equation that we need ultimately is a cumulative distribution function:
\[
P(\text{no. of spare rows} \le R_s \ \text{and no. of spare columns} \le C_s) =
\sum_{r=0}^{R_s} \sum_{c=0}^{C_s} P(\text{no. of spare rows} = r \ \text{AND no. of spare columns} = c). \qquad (2.13)
\]
Based on such a cumulative distribution function, we can estimate how many
spare rows and columns are required for the given memory configuration to
maximize yield-per-area.
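As a minimal illustration of this sub-array-level step (a sketch based on the forms of equations (2.7) and (2.8) given above, not the dissertation's own C++ implementation; all numeric inputs are placeholders), the column- and row-replacement probabilities can be evaluated in closed form using the binomial theorem:

// Probability that a particular column (or row) of a sub-array with H rows and
// W columns (spares included) must be replaced. Assumes single-cell defects are
// repaired by spare columns, so P_C is folded into the bit-line term as
// discussed after eq. (2.7); a cell with any defect type is treated as not
// defect-free.
#include <cmath>
#include <cstdio>

// sum_{K=1..n} C(n,K) pHit^K pNone^(n-K) = (pHit + pNone)^n - pNone^n.
double atLeastOneOf(int n, double pHit, double pNone) {
    return std::pow(pHit + pNone, n) - std::pow(pNone, n);
}

int main() {
    // Placeholder cell-level probabilities (P_B, P_W, P_C, P_BW, P_DB, P_DW).
    double pB = 1e-6, pW = 1e-6, pC = 3e-6, pBW = 1e-8, pDB = 1e-7, pDW = 1e-7;
    int H = 1024, W = 1024;          // sub-array dimensions, spares included

    double pNone = 1.0 - (pB + pC) - pW - pBW - pDB - pDW;   // defect-free cell
    double pCol  = atLeastOneOf(H, pB + pC, pNone);          // eq. (2.7) form
    double pRow  = atLeastOneOf(W, pW, pNone);               // eq. (2.8) form

    std::printf("P(column replaced) = %.4e\n", pCol);
    std::printf("P(row replaced)    = %.4e\n", pRow);
    return 0;
}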
Our model does not use a clustering factor; however, it avoids pessimism by taking
into account the regular structure of SRAM and identifying more precisely the spares
required to repair a given combination of defects. We will later show that our
approach produces estimates quite close to those of other methods that use clustering factors.
E. Novelty of Our Approach
Our observation is that most yield models introduce and rely heavily on clustering
factors largely to compensate for inaccuracies in their array-level analysis. It is
known that defect distributions tend to be clustered, but, the level of clustering is
exaggerated by previous approaches. Moreover, process-dependent clustering factors
are empirically computed using the yield measured from actual production in
conjunction with that estimated by the model. Furthermore, the value of clustering
factor is very unpredictable, especially for immature manufacturing lines as well as
new fabrication processes. In such cases, all yield predictions provided by existing
models, before the actual mass production, can be very inaccurate.
Clustering factors are typically used to explain why fewer spares are generally
needed than the number of defects. However, in the case of the regularly structured
architectures like SRAM, even without clustered defects (i.e., even when all defects
are independent), there exist many defect combinations for which the number of
spares is less than the number of defects. For example, multiple single-cell defects in
a row require only a single spare row. In contrast to all previous approaches, our
model uses the simple Poisson statistics; however, our array-level analysis considers
specific structural characteristics of SRAM arrays (or sub-arrays), enabling much
more accurate yield estimations than the previous models. These considerations are
captured in equations (2.7)-(2.12). Most previous yield models, however, do not
accurately capture structural characteristics. For instance, [63] and [64] account for
only an incomplete subset of the defect combinations that are fixable by given spare rows and columns.
As an example, assuming that one spare row is given, the number of possible defect
cases that are fixable by a spare row is counted as three, namely, 1) zero defects, 2)
one single cell defect, and 3) a wordline defect. There are, however, numerous other
cases that can be covered by a single spare row in addition to these three, as
discussed above. Due to such reasons, the previous approaches tend to heavily rely
on the clustering factors to compensate for the inaccuracies of their array-level
analyses. This results in clustering factors that are higher than the actual clustering
factors for the fabrication processes. Our results show that when we do not consider
any clustering effects, our yield estimates are somewhat pessimistic, but very close
to the actual yield (as shown in Section 2.6.A).
F. Yield for H-tree Inter-sub-array Interconnect
Now consider the simplified interconnect model shown in Figure 2.2. We can
estimate the yield of such interconnects. The total length of interconnects, L, is given
in closed form (equation (2.14)) as a function of the spacing S between sub-arrays, the
numbers of rows and columns of cells within each sub-array (H and W, respectively),
and the numbers of rows and columns of sub-arrays (2^a and 2^b, respectively), with a > 0.
We can now derive a yield formula for the bus interconnect by using critical area
analysis for two metal wires with length L, width g, and spacing w. The critical area
A_C for extra and missing metals in the two wires is derived as
\[
A_C =
\begin{cases}
\dfrac{L}{2w}, & \text{for extra metals,}\\[4pt]
\dfrac{L}{2g}, & \text{for missing metals.}
\end{cases} \qquad (2.15)
\]
Given that the defect densities of missing and extra metals are respectively d_m and d_e,
the yield formula for a bus of width N is obtained as
\[
Y_{bus} = e^{-\left(\frac{d_e}{2w} + \frac{d_m}{2g}\right) L\,(N+1)}. \qquad (2.16)
\]
G. Area Overheads
The height and the width of a single 6T SRAM cell are respectively 38λ and 28λ,
giving a cell area of 1064λ². Hence, the total area of the array is H·2^a·W·2^b·1064λ².
When the array is partitioned into sub-arrays and the H-tree routing is incorporated,
the area can be computed as [W·2^b·28λ + S·(2^a − 1)]·[H·2^a·38λ + S·(2^b − 1)]. The size and
the number of multiplexers to switch between faulty and non-faulty rows/columns
are determined by the number of spare elements and the size of the sub-array. Given the
numbers of spare rows and columns in each sub-array as respectively R_s and C_s,
(R_s+1)-to-1 and (C_s+1)-to-1 multiplexers are needed at each column and row,
respectively. The number of column multiplexers in each sub-array is H − R_s + 1, while
that of row multiplexers in each sub-array is W − C_s + 1. The area overhead of each
multiplexer can be approximated as 3000λ² per multiplexer input [124][125]. The
areas of a single spare row and column are respectively W·28λ and H·38λ.
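A small sketch of this area bookkeeping is given below; the structure and variable names are ours, while the constants (the 38λ × 28λ cell and 3000λ² per multiplexer input) are the ones quoted above, and the configuration in main() is illustrative only:

// Area bookkeeping in lambda^2 for a partitioned SRAM with spare switching.
#include <cstdio>
#include <cmath>

double sramArea(int H, int W,      // rows/columns of cells per sub-array (spares included)
                int a, int b,      // 2^a x 2^b sub-arrays
                int Rs, int Cs,    // spare rows/columns per sub-array
                double S) {        // spacing between sub-arrays, in lambda
    double twoA = std::pow(2.0, a), twoB = std::pow(2.0, b);

    // Partitioned array with H-tree routing, as in the expression above.
    double width  = W * twoB * 28.0 + S * (twoA - 1.0);
    double height = H * twoA * 38.0 + S * (twoB - 1.0);
    double arrayArea = width * height;

    // Spare-switching multiplexers: (Rs+1)-to-1 and (Cs+1)-to-1 muxes,
    // approximated at 3000 lambda^2 per multiplexer input.
    double perSubArrayMux = (H - Rs + 1) * (Rs + 1) * 3000.0     // column muxes
                          + (W - Cs + 1) * (Cs + 1) * 3000.0;    // row muxes
    double muxArea = perSubArrayMux * twoA * twoB;

    return arrayArea + muxArea;
}

int main() {
    // Illustrative configuration: 4 sub-arrays of 1024x1024 cells with
    // 8 spare rows, 13 spare columns each, and 100-lambda spacing.
    double area = sramArea(1024 + 8, 1024 + 13, 1, 1, 8, 13, 100.0);
    std::printf("estimated area = %.3e lambda^2\n", area);
    return 0;
}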
The above model can be easily used to consider partially good SRAMs as being
sellable memory modules. In subsequent chapters, we will generalize our model to
consider other architectures that use redundancy, including those that use redundancy
hierarchically and those that use up-sizing to mitigate the effects of likely defects.
2.5 Architectural Considerations and Tradeoffs
A. Redundancy Architectures
Spares and ECC (error correcting code) are two major techniques to increase yield
and reliability of memory systems [57][61][106]. Spares and ECC are primarily
designed for hardware defects and soft errors, respectively. Since our focus in this
chapter is on defects, we will limit our discussion to hardware redundancy (i.e.,
spares). Among possible designs for repairable memory systems that use redundancy,
we focus on techniques that use spare rows and columns. Figure 2.6 illustrates
operational principles of multiplexing circuitry. As described in Section 2.4.F, the
sizes of multiplexers are dependent on the numbers of spare rows and columns.
B. Tradeoffs Associated with Granularity
For large memories, partitioning a single memory array into several sub-arrays can
reduce the access time and power [107]. It can also improve the yield-per-area. Let
us undertake a first-order analysis of the effects of granularity on yield and area by
considering the (1×1) and (2×1) architectures shown in Figure 2.7. Assume that for
the (1×1) architecture a certain yield-per-area is obtained when the number of spare
rows and columns, i.e., R_s and C_s, are respectively equal to p and q. Next consider a
(2×1) architecture, which is comprised of two sub-arrays designed with sufficient
spares to provide the same yield-per-area as the (1×1) architecture. As shown in
Figure 2.7, assume that for this architecture R_s = r and C_s = s. Let us now examine
some characteristics of the overheads.
Figure 2.6. (a) Column multiplexing structure. (b) Normal operation when there is no fault.
(c) Repairable operation when the 4th column must be replaced due to one or more defects.
Figure 2.7. A (1×1) and a (2×1) architecture.
1. For the (1×1) architecture, the spare rows (R_s = p) can be used to mask defects in
the entire array that require spare rows. In contrast, in the (2×1) architecture, each set
of spare rows (R_s = r) can be used to mask defects only in the corresponding sub-
array. If, for the sake of a first-order analysis, we assume that the probability of a
row having defects is identical for both architectures, it is easy to see that the (2×1)
architecture will require a larger number of spare rows, i.e., 2r > p. In other words, to
a first order, the area overhead due to spare rows will be higher for the (2×1)
architecture.
2. Now consider the multiplexers required to replace faulty rows by spare rows. In
the (1×1) architecture, at each row we require a (p+1)-to-1 multiplexer. In contrast,
for the (2×1) architecture, at each row we require a (r+1)-to-1 multiplexer, where
typically r < p/2. Hence, the overheads associated with row multiplexers will
decrease.
3. Next consider the spare columns. Each column in the (1×1) architecture is, to a
first-order approximation, twice as long as a column in each sub-array in the (2×1)
architecture. Consequently, the probability of a column replacement is much higher
for the (1×1) architecture. Hence, to achieve comparable yield-per-area, higher number
of spare columns would be required for the (1×1) architecture, when compared to the
(2×1) architecture. In other words, q > s.
4. As in the case of row multiplexers, this will cause the (1×1) architecture to have
higher column multiplexing overhead.
5. However, the (2×1) architecture has additional overheads associated with the
additional H-tree routing required to connect the sub-arrays.
The above discussion illustrates – only to a first-order approximation – five of the
many differences between the two architectures. Now consider two extreme cases,
namely very large and very small memory capacities. For an extremely large
capacity memory, the large number of spare columns and the high complexity of
multiplexers (row and column) will cause the (1×1) architecture to have very high
overhead. On the other extreme, for extremely small memory capacities, the benefits
of (2×1) architecture will vanish while its overheads in terms of additional spare
rows and H-tree interconnects will cause it to be uncompetitive. From another
perspective, not all overheads decrease with granularity. Hence, for moderately
large memories, we expect the optimal architecture to be at some intermediate level
of granularity.
Next we use our yield model to determine the actual (not first order) trends and
tradeoffs and derive optimal designs with respect to the granularity.
2.6 Experimental Results
The yield model we have developed has been implemented in C++ and evaluated.
We then use this model to explore the tradeoffs between various granularities. Spare
rows and columns are assumed to be non-ideal, i.e., they can be defective. For single
cell defects, spare columns are used for replacement.
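The overall search that our tool performs can be pictured with the following sketch (a hypothetical skeleton, not the actual implementation): enumerate granularities and spare counts, evaluate each configuration with the yield and area models of Section 2.4, and keep the configuration with the best yield-per-area. The two callbacks passed in main() are toy stand-ins for those models:

// Design-space sweep over granularity (a, b) and spares (Rs, Cs).
#include <cstdio>
#include <cmath>
#include <algorithm>
#include <functional>

using YieldFn = std::function<double(int H, int W, int Rs, int Cs)>;
using AreaFn  = std::function<double(int H, int W, int a, int b, int Rs, int Cs)>;

void findOptimal(long long totalCells, const YieldFn& subArrayYield, const AreaFn& area) {
    double bestYPA = 0.0;
    for (int a = 0; a <= 4; ++a)
        for (int b = a; b <= a + 1; ++b) {                // aspect ratio 1:1 or 2:1
            long long cells = totalCells >> (a + b);      // cells per sub-array
            int H = (int)std::llround(std::sqrt((double)cells)), W = (int)(cells / H);
            for (int Rs = 0; Rs <= 16; ++Rs)
                for (int Cs = 0; Cs <= 16; ++Cs) {
                    // All 2^(a+b) sub-arrays must be repairable.
                    double y = std::pow(subArrayYield(H + Rs, W + Cs, Rs, Cs), 1 << (a + b));
                    double ypa = y / area(H + Rs, W + Cs, a, b, Rs, Cs);
                    if (ypa > bestYPA) {
                        bestYPA = ypa;
                        std::printf("2^%d x 2^%d sub-arrays, Rs=%d, Cs=%d, Y/A=%.3e\n",
                                    a, b, Rs, Cs, ypa);
                    }
                }
        }
}

int main() {
    // Toy stand-ins so the sketch runs; replace with the real models.
    YieldFn toyYield = [](int H, int W, int Rs, int Cs) {
        double pCellFails = 1e-6;
        double expectedDefects = pCellFails * H * W;
        // Crude stand-in: more spares tolerate more defects.
        return std::exp(-std::max(0.0, expectedDefects - 0.5 * (Rs + Cs)));
    };
    AreaFn toyArea = [](int H, int W, int a, int b, int Rs, int Cs) {
        return (double)H * W * 1064.0 * (1 << a) * (1 << b)
               + (H + W) * (Rs + Cs) * 3000.0 * (1 << (a + b));
    };
    findOptimal(1LL << 22, toyYield, toyArea);   // 4Mb example
    return 0;
}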
A. Validation of Our Yield Model
We compare our approach (curve 2 in Figure 2.8) for array-level analysis with
other previous approaches based on the array-level analysis in [63][64] that use
poisson statistics (curve 3 in Figure 2.8) and multivariate negative binomial
distribution (curve 1 in Figure 2.8) for different memory sizes. For SRAMs of
relatively small sizes, all three methods estimate similar yields. However, as the size
of memory increases, when used in conjunction with precise approaches for array-
level analysis, the degree of pessimism of Poisson statistics is so severe that, when
the size of SRAM is 8MB, the yield difference is nearly 50%. In contrast, the yield
values estimated by our method (curve 2) remain very close to the yield of the
multivariate negative binomial distribution (curve 1), independent of the size of SRAM. For
the negative binomial distribution, the clustering factor was empirically chosen as 3, a
value believed to lie in the range of 0.3 to 5 [65]. These results clearly show that
our approach can accurately estimate yields without introducing clustering factors,
which must be computed empirically. This makes our model especially suitable for
predicting yield for a new fabrication process. In turn, this makes our yield model
suitable to derive optimal SRAM designs to be fabricated in a new process.
Figure 2.8. Yields estimated by three methods when the spare rows are used for repairs,
while varying the size of SRAM – three methods are respectively based on (1) multivariate
negative binomial distributions, (2) our hierarchical method, and (3) poisson distributions.
B. Yield vs. Spares
Figure 2.9 shows the effect of spares on memory yield, where we can see that the
yield improves significantly with a relatively small number of spares. However,
beyond a certain number of spares, additional spares provide infinitesimal yield
improvements; when the yield gain due to spares begins to decrease, we need to
decide where to stop adding spares by using yield-per-area analysis. Table 2.3 shows
the numbers of spare rows and columns that optimize yield and yield-per-area for
three sizes of SRAMs for some granularities. Note that the numbers of spares that
maximize yield are larger than those that maximize yield-per-area, since the former
ignores the adverse effect of area overheads. Also note that simulation results shown
in Table 2.3 match with our first-order analysis in Section 2.5.B. As an example, for
the granularity (2×1) a 32MB SRAM requires 8 spare rows and 13 spare columns
compared to 12 spare rows and 21 spare columns required by the granularity (1×1).
As deduced by first-order analysis, the total number of spare rows (for both sub-
arrays) in (2×1) is 8×2=16, which is indeed greater than the total number of spare
rows for (1×1), namely 12. Also as expected, the number of spare columns required
by (2×1) architecture is lower than that for (1×1). Also note that for each size of
SRAMs in Table 2.3, the granularities that achieve the maximum yield-per-area are
respectively (1×1), (2×2), and (2×2), indicating that designs with finer
granularities tend to achieve better yield-per-area with the increase of the memory
size. (More in the subsequent subsection.)
Table 2.3. Number of spare rows and columns required for optimal yield and yield/area for 3
SRAM sizes (spares listed per sub-array, as (rows, columns)).

Type of division | 16Mb Yield* | 16Mb Yield/Area | 32Mb Yield* | 32Mb Yield/Area | 64Mb Yield* | 64Mb Yield/Area
(1 × 1)          | (11,16)     | (8,13)          | (15,25)     | (12,21)         | (23,39)     | (26,39)
(2 × 1)          | (7,12)      | (6,8)           | (11,15)     | (8,13)          | (16,26)     | (13,20)
(2 × 2)          | (6,8)       | (4,5)           | (8,11)      | (5,8)           | (11,16)     | (8,13)
(4 × 2)          | (4,7)       | (3,4)           | (5,6)       | (3,4)           | (7,12)      | (6,8)
(4 × 4)          | (4,5)       | (2,3)           | (4,7)       | (3,4)           | (6,8)       | (4,5)

* We stop adding spares when additional spares provide infinitesimal yield improvement.
Figure 2.9. Yield/Area (in λ⁻²) for 32MB for different numbers of spare rows and spare columns.
C. Effect of Defect Rates and Memory Sizes on Optimality
We here discuss the impact of defect densities and memory sizes on optimal
granularities. Figure 2.10 shows the trends of yield-per-area for various granularities
while defect densities are varied. Our evaluation results indicate that increases in defect densities enable designs with finer granularities to achieve better yield-per-area. In the figure, for defect densities of 20 defects/cm² and 100 defects/cm², the coarsest design achieves optimality, while, for defect densities of 500 defects/cm² and 1000 defects/cm², the configurations with sub-array sizes of 1024×1024 (i.e., a memory device with 4 sub-arrays) and 512×1024 (i.e., a memory device with 8 sub-arrays) achieve optimality, respectively. We attribute this phenomenon to the fact that under high defect rates, memories with coarse granularities tend to incur large area overheads due to spares and the associated spare switching circuitry. Note that in Figure 2.10(a), we confine the candidate designs to a small set of configurations, i.e., from 128×256 to 2048×2048, since we know that other design candidates with finer granularities are inferior to these coarser designs in terms of yield-per-area. Figure 2.10(b) shows the predicted general trends over the full set of granularities as defect densities are varied. As a minor note, it is intuitively clear that yield-per-area values under higher defect densities tend to be lower than those under lower defect densities, as shown in Figure 2.10(b), due to the higher cost of spares and hence higher area overheads.
As briefly stated above, designs with finer granularities tend to achieve better
yield-per-area with the increase of the memory size for given defect densities.
However, note that the optimal size of sub-array remains more or less the same
regardless of the memory size. Figure 2.11 compares the yield-per-area trends for three sizes of SRAM, namely 1Mb, 4Mb and 16Mb, under a defect density of 1000 defects/cm². For the 1Mb SRAM, optimality is achieved with 2 banks, while the optimal 4Mb design has 8 banks and the optimal 16Mb design has 32 banks.
[Figure 2.10 plots: (a) simulation results – yield-per-area (λ⁻²) vs. sub-array dimension, from 128×256 to 2048×2048, for 20, 100, 500, and 1000 defects/cm²; (b) predicted general trends – yield-per-area vs. granularity (coarse to fine) for low and high defect rates.]
(a) Simulation results (b) Predicted general trends
Figure 2.10. Yield-per-area trends of 4Mb SRAM when sub-array dimensions are varied for
several defect rates.
[Figure 2.11 plots: yield-per-area vs. sub-array dimension for panels (a) 1Mb, (b) 4Mb, and (c) 16Mb.]
Figure 2.11. Comparison of yield-per-area trends for three sizes of SRAM (defect densities are set to 1000 defects/cm²).
2.7. Concluding Remarks
We have developed a new yield model for SRAMs that does not require an
empirically computed clustering factor. We showed that this model computes yields
that are close to other methods that require such clustering factors. This is attributed
to the fact that our yield model takes into account the architectural characteristics of
SRAM and carries out a more accurate array-level analysis of the capabilities of
available spares. This makes our model particularly suitable for predicting yield for a
completely new fabrication process.
We characterized tradeoffs between granularity and redundancy for SRAM using
our new hierarchical yield model. Our analysis allows us to determine how the
original array should be partitioned into sub-arrays, and to estimate the numbers of
redundant rows and columns that maximize yield-per-area. We carried out tradeoff
analysis for several SRAM sizes, and examined yield enhancements due to
redundancy and associated area overhead. We also explored their tradeoffs for
different memory sizes. Our study shows that optimal yield-per-area is typically
achieved at intermediate levels of granularity.
In this chapter, all the hard defects were assumed to be handled by spares. In a
later chapter, ECC (Error Correcting Codes), introduced to combat soft-errors, is
used to handle some of the hard defects. We can then obtain higher yield returns
when ECC schemes are used in conjunction with spares.
In the next chapter, we discuss several spare switching schemes and analyze
their overheads in terms of area and performance. This study is a part of our effort to
identify optimal yield-per-area SRAM configurations for given constraints on
performance and soft-error resilience.
Chapter 3
Characterization of Overheads due to Spare Switching Schemes
3.1 Introduction
As stated earlier, SRAM devices occupy most of the die area in many SOCs. Therefore, the yield of SOCs is primarily dependent on SRAM yield. In this regard, many memory products implement spares to improve yield, where spare rows and columns are used for replacing faulty rows and columns [109][132][133]. Commercial SRAM compilers support only a small number of spares since they mainly target mature processes with a few defects per cm² [82]-[86]. We, however, consider wide ranges of defect rates, since increasing numbers of SRAM manufacturers are forced to adopt immature fabrication processes due to market pressures, which requires increasing numbers of spares to maintain desired levels of yield in immature fabrication lines.
In this chapter, we explore several types of spare switching schemes and identify
their characteristics. We quantify area and performance overheads due to
incorporation of spares, and then derive models to estimate overheads. Such
quantitative models will be later used for identifying optimal SRAM configuration in
conjunction with the yield model developed in the previous chapter.
3.2 Spare Switching Schemes
Switching to spares is accomplished using multiplexers, where appropriate
control inputs need to be applied. The value applied to control inputs can be stored in
registers that are programmable during the chip's lifetime. As an alternative, we can
disconnect faulty elements by blowing out appropriate fuses, which permanently
fixes the configuration during manufacturing. We opt to consider the former
approach, to take advantage of its flexibility and re-programmability.
A. Column Switching
a. Two Possible Switching Schemes
As for multiplexing scheme for introducing spare columns, we consider two types
of implementations illustrated in Figure 3.1. These differ in how multiplexers are
connected for column multiplexing. The first scheme connects the first input of each
multiplexer to the corresponding column in the original array, while the rest of the inputs are connected to the adjacent columns to its right (Figure 3.1). The columns near the
end of the original array use the spare columns as their neighbors to their right. If a
column in the original array is faulty, the corresponding multiplexer configures
around the faulty column by switching between the original and the adjacent column.
If both the original and the adjacent column are faulty, the third input of the multiplexer is selected, and so forth. On the other hand, in the second scheme, the first input of each multiplexer is connected to the corresponding original column as in the first scheme. However, in this configuration, the rest of the inputs are connected to spare columns, not to neighboring columns (Figure 3.1).
[Figure 3.1 shows, for Scheme 1 and Scheme 2, how each column's spare-switching multiplexer connects the i-th column, its neighboring columns, and the spare columns, together with the row decoder, column decoder, word-select signals W[0 : word width − 1], spare-select signals S[0 : no. of spares − 1], sense amplifiers, and a configuration ROM.]
Figure 3.1. Two spare column switching schemes
We next analyze and quantify area overheads and access time penalties for the above two schemes and carry out a comparative study.
b. Area Estimates and Performance Penalties
We use Design Compiler for logic synthesis in conjunction with layout information
[124][145][147] to estimate the area overheads associated with spare column
switching circuitry implementations under 180nm TSMC technology. We then
carried out curve fitting to obtain an approximate model (Equation (3.1)) in terms of N_R, N_C, N_SC and L_W, which respectively denote the number of rows, the number of columns, the number of spare columns, and the word width. In the equation, the terms respectively represent the cost of spare columns, configuration registers, and multiplexers. Figure 3.2 compares the area overheads obtained from Verilog synthesis with the approximate model that we derived.
Note that the two abovementioned column switching schemes have identical numbers and sizes of multiplexers and registers, and hence the same area complexity. They do, however, incur different performance penalties, as explained below.
[Figure 3.2 plots: area overheads (λ²) vs. number of spares (0 to 8) for 128×128, 256×256, and 512×512 arrays with word widths of 32 and 64 bits; V = Verilog synthesis, A = approximate model.]
Figure 3.2. Area overheads due to spare column switching schemes for various numbers of
spares.
Area_col = 1064·N_R·N_SC + 1600·N_C·log₂(N_SC + 1) + 1400·L_W·log₂(N_SC + 1)   (λ²)        (3.1)
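As an illustration (not part of the thesis flow), the short Python sketch below evaluates the fitted column-redundancy area model, using the coefficients of Equation (3.1) as reconstructed above; the sub-array dimensions in the example are assumptions chosen only for demonstration.

```python
import math

def spare_col_area(n_rows, n_cols, word_width, n_spare_cols):
    """Column-redundancy area overhead in lambda^2, following Eq. (3.1):
    spare-column cells, configuration registers, and switching multiplexers."""
    sel_bits = math.log2(n_spare_cols + 1)
    return (1064 * n_rows * n_spare_cols
            + 1600 * n_cols * sel_bits
            + 1400 * word_width * sel_bits)

# e.g. a 256x256 sub-array with a 32-bit word and 4 spare columns (illustrative)
print(spare_col_area(256, 256, 32, 4))
```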
We next analyze access time penalties for two schemes. The differences in the
inputs to the spare switching multiplexers change the fanin and fanout capacitances
seen by signals in the path. Table 3.1 summarizes fanin/fanout capacitances for the
two schemes in a categorized manner. For Scheme 1, note that the fan-in and fan-out capacitances due to the spare switching multiplexers are identical for signals on original and spare columns. However, the critical path always lies on an original-column access, since signals on an original column must go through the word-select multiplexers, while signals on a spare column do not. For Scheme 2, the critical path can lie either on original columns or on spare columns, depending on the word length and the number of words – for large word lengths, access to spare columns tends to be the critical path, while for a large number of words, access to original columns is likely to become the critical path, as summarized in Table 3.1.
Table 3.1. Comparison of fan-in/fan-out capacitances in the critical paths for the two schemes.

                                     Scheme 1               Scheme 2
Critical path type                   Original and spare     Original             Spare
Spare multiplexer     Fan-in         no. of spares + 1      no. of spares + 1    no. of spares + 1
                      Fan-out        no. of spares + 1      1                    word length
Word select           Fan-in         no. of words           no. of words         -
                      Fan-out        1                      1                    -
We have implemented delay models for the two schemes in C++ on a Sun4U SPARC with 512MB of main memory (see the Appendix for details). The parameters used in our experiments are based on 0.18µm TSMC technology [146]. Figure 3.3
compares access time overheads for two schemes for varying word sizes when the
size of memory is 1Mb. We see that for the word length of 32 bits, Scheme 2 is
always faster than Scheme 1, independent of the number of spares. This is due to the
fact that the critical path for Scheme 2 is always on the original column accesses for
the word length of 32 bits, whose delay is always faster than Scheme 1, as shown in
Table 3.1. When the word length becomes 64 bits, Scheme 1 is faster than Scheme 2
only when the number of spares is not so significant, as illustrated in Figure 3.3.
However, as the number of spares increases, Scheme 2 becomes faster. In this case,
the critical path for Scheme 2 is always on the spare column accesses, and its delay
can be either larger or smaller than that of Scheme 1, depending on the number of words, the number of spares, and the word length, as illustrated in Table 3.1. For the word length of 128 bits, we observe that Scheme 1 is faster than Scheme 2 over a wide range of numbers of spares. The above explanation for the word length of 64 bits is also applicable to this case; note that Scheme 1 tends to be faster for large word lengths. In conclusion, these experiments indicate that the optimal spare switching scheme is dependent on the memory configuration and the defect rate.
[Figure 3.3 plots: column access time (ns) vs. number of spares (0 to 100) for Scheme 1 and Scheme 2, for a 1Mb (1024×1024) SRAM with word widths of (a) 32 bits, (b) 64 bits, and (c) 128 bits.]
Figure 3.3. Comparison of column access times for the two schemes for several word lengths as the number of spares is varied.
B. Row Switching
Major components of spare row switching circuitry are a ROM that stores faulty
row addresses and a wordline activation/deactivation signal generation logic that
disables (enables) original wordlines and enables (disables) a spare wordline when
an incoming row address signal tries to access a faulty (non-faulty) wordline, as
illustrated in Figure 3.4(a). Trends of area overheads due to such spare row switching
schemes are shown in Figure 3.4(b), based on Verilog synthesis and layout
information [124][145][147]. The approximate area model is presented in equation
(3.2), where the first term represents the cost of spare rows, while the second one
indicates the overhead of the ROM that stores the faulty row addresses. The rest of the terms are due to the overheads associated with the wordline activation/deactivation circuitry. Delay penalties due to such spare row switching circuitry are mostly due to the ROM access for retrieving the faulty address information and the driving of the enable/disable signals of the original wordlines, which can be estimated using the Elmore delay model in conjunction with progressive sizing [124][145][147] (see the Appendix for details). Figure 3.5 shows the trends of row access time with spares for several sizes
of SRAM.
Area_row = 1064·N_C·N_SR + 380·log₂(N_R)·N_SR + 722·N_R + 1115·N_SR + 35·N_C   (λ²)        (3.2)
where N_SR denotes the number of spare rows.
[Figure 3.4(a) shows the memory array with its row decoder, spare rows, and a programmable fault-address decoder with enable/disable flag generation driven by the row address; Figure 3.4(b) plots area overheads (λ²) vs. number of spares (0 to 15) for 256×256, 512×512, and 1024×1024 arrays, comparing Verilog synthesis (V) with the approximation (A).]
Figure 3.4. (a) Spare row switching circuitry and (b) its area overheads.
[Figure 3.5 plots: row access time increase (ns) vs. number of spares (1 to 20) for 256Kb, 1Mb, 4Mb, and 16Mb SRAMs.]
Figure 3.5. Trends of increase of row access time due to spare switching circuitries.
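For reference, the Elmore estimate mentioned above amounts to summing, over the nodes of an RC ladder, the resistance of the path from the driver to each node times that node's capacitance. The sketch below is a minimal illustration of this computation; the per-segment resistance and capacitance values are hypothetical and are not the parameters used in our experiments.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: for each node, the total resistance on the
    path from the driver to that node times the node capacitance, summed over
    all nodes. Lists are ordered from the driver outward."""
    delay, path_r = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        path_r += r
        delay += path_r * c
    return delay

# Hypothetical enable/disable line modeled as 4 identical RC segments
# (per-segment values are illustrative only).
r_seg, c_seg = 50.0, 2e-15      # ohms, farads
print(elmore_delay([r_seg] * 4, [c_seg] * 4))
```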
3.3 Concluding Remarks
Spares can improve yield, while imposing overheads on area and performance. In
the previous chapter, we presented a yield model that quantifies the yield improvement associated with spares. In this chapter, we explored several spare switching schemes and analyzed their overheads in terms of area and performance. We also developed models that estimate these area and performance overheads, which will later be used for identifying the optimal SRAM design that maximizes yield-per-area for given constraints on
performance and soft-error resilience. As for spare column switching schemes, we
discussed two types of switching schemes and explored their differences and
similarities in terms of performance and area overheads – we showed that either
scheme can be better than the other depending on memory configurations and defect
rates.
In the next chapter, we study optimal sub-array interconnect designs, and
explore the impact of interconnects on memory yield and area.
Chapter 4
Sub-array Interconnect Optimizations for SRAM Architecture that
maximize Yield-per-Area.
4.1 Introduction
The impact of interconnects on the system yield and area keeps increasing 1) as
the number of banks tends to increase with the size of embedded SRAM and 2) as
the word length increases. Moreover, with the scaling of fabrication technology,
global interconnect delay increases [3], while gate delay and local wire delay decrease. These facts indicate that interconnect problems are becoming a critical issue.
In order to achieve high yield (as an effort to maximize yield-per-area) in
immature fabrication lines, designers may need to use wide wires and wide spacing
between wires. Such wide wires and wide spacings are respectively helpful for
reducing missing material type defects and extra material type defects [63].
Increasing the width and the spacing of interconnects, however, will result in more
area overheads and increased line capacitances per unit length [142]. In order to reduce
area overheads (as an effort to maximize yield-per-area), we may consider using
minimum size width and spacing. In such a case, the probability of failure due to
missing materials and extra materials will increase. From the delay related
perspective, line capacitances will decrease, while line resistances tend to increase.
This chapter introduces a new methodology for determining the optimum width
and spacing of interconnects for sub-arrays, i.e., banks, which maximize yield-per-area. As a first step, we develop an analytical expression for yield-per-area as a function of defect densities, wire length, width, and spacing, where the defect size distribution presented in [65] is used.
presented in [65] is used. We then carry out partial derivatives on the analytical
expression with respect to width and spacing to identify the optimal values of width
and spacing that achieve the maximum yield-per-area. We then carry out several case
studies that include the case where analytically obtained optimal width and/or
spacing is less than the minimum width and/or spacing rule specified by
manufacturers.
We then briefly explore the impact of buffer insertion on system yield and area.
We also discuss incremental changes to the SRAM bit cell layout to investigate the possibility of improving yield-per-area for a given bit cell. Our experiments show that little improvement can be made through bit cell layout modifications.
4.2 Methodology
We first carry out critical area analysis to estimate yield. We assume 1) a circular
spot defect, whose diameter is denoted l, and 2) the following defect size distribution
f(l) [63]
f(l) = k / l^p,   for l_o ≤ l ≤ l_M,        (4.1)
where k = (p − 1)·l_o^{p−1}·l_M^{p−1} / (l_M^{p−1} − l_o^{p−1}), and l_o and l_M are the minimum feature size (i.e., lambda) and the largest defect size, respectively. Assuming that l_o << l_M [63], the value of k can be simplified to (p − 1)·l_o^{p−1}. The typical range of the value of p is known to be between 2 and 3.5 [63].
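As a quick sanity check (not part of the derivation in [63]), the following Python sketch verifies that f(l) integrates to 1 over [l_o, l_M] with the exact constant k, and compares it with the simplified constant (p − 1)·l_o^{p−1}; the numerical values of p, l_o, and l_M are illustrative assumptions.

```python
def k_exact(p, l_o, l_M):
    """Normalization constant of f(l) = k / l^p on [l_o, l_M] (Eq. (4.1))."""
    return (p - 1) * l_o ** (p - 1) * l_M ** (p - 1) / (l_M ** (p - 1) - l_o ** (p - 1))

def k_approx(p, l_o):
    """Simplified constant when l_o << l_M."""
    return (p - 1) * l_o ** (p - 1)

p, l_o, l_M = 3.0, 1.0, 1000.0       # p in [2, 3.5]; sizes in lambda (illustrative)
k = k_exact(p, l_o, l_M)

# Closed-form integral of k / l^p over [l_o, l_M]; it should equal 1 exactly.
integral = k * (l_o ** (1 - p) - l_M ** (1 - p)) / (p - 1)
print(integral)                      # -> 1.0 (up to rounding)
print(k, k_approx(p, l_o))           # exact vs. simplified constant
```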
Let us now start from two disjoint parallel conductors (Figure 4.1). Let us then denote the critical areas for extra and missing material defect types with diameters l_1 and l_2 as CA_e(l_1) and CA_m(l_2), respectively. Their analytical expressions can be derived as follows [63].
CA_e(l_1) = (l_1 − s)·L + (1/2)·(l_1 − s)²,  when l_1 ≥ s,  for extra material,
CA_m(l_2) = (l_2 − w)·L + (1/2)·(l_2 − w)²,  when l_2 ≥ w,  for missing material,        (4.2)
where the conductor length and width are denoted L and w, respectively.
We then integrate to compute the averaged critical areas over the possible defect sizes as [63]
CA_e = ∫_{l_o}^{l_M} CA_e(l_1)·f(l_1) dl_1   for extra material,
CA_m = ∫_{l_o}^{l_M} CA_m(l_2)·f(l_2) dl_2   for missing material.        (4.3)
Figure 4.1. Two parallel conductors, two types of point defects (short and open), and
dimensions.
Let d_e and d_m denote the average numbers of extra and missing material type defects per unit area. The average numbers of extra and missing material type defects are then respectively expressed as CA_e·d_e and CA_m·d_m.
The quadratic term in equation (4.2) becomes negligible when the length of the conductor is much greater than its width, which is the case for our long interconnects. Therefore, when the defect sizes for extra and missing materials are l_1 and l_2, respectively, the critical areas can be simplified as follows.
CA_e(l_1) = (l_1 − s)·L  when s ≤ l_1, and 0 otherwise,  for extra materials,
CA_m(l_2) = (l_2 − w)·L  when w ≤ l_2, and 0 otherwise,  for missing materials,        (4.4)
where s, w and L denote the spacing, the width and the length of the conductors, respectively.
Given that the defect densities for extra and missing materials are d_e and d_m, respectively, and that the defect size distribution is k/l^p, we can formulate the yield equation as:
Yield = exp(−(CA_e·d_e + 2·CA_m·d_m))
      = exp(−( ∫_{l_o}^{l_M} (k/l^p)·(l − s) dl · L · d_e + 2·∫_{l_o}^{l_M} (k/l^p)·(l − w) dl · L · d_m )).        (4.5)
Note that in Figure 4.1 there are two sources of missing material type defects, while there is one source of extra material type defects. Also note that the area of the two parallel conductors is (2w + s)·L.
We then obtain a yield-per-area value as:
Yield/Area = exp(−( ∫_{l_o}^{l_M} (k/l^p)·(l − s) dl · L · d_e + 2·∫_{l_o}^{l_M} (k/l^p)·(l − w) dl · L · d_m )) / ((2w + s)·L).        (4.6)
Typically, l_o << l_M and s << l_M hold, and we use these facts to approximate the results of the integrations as follows:
Yield/Area ≈ exp(−( (k·L·d_e / ((p − 2)(p − 1)))·s^{2−p} + 2·(k·L·d_m / ((p − 2)(p − 1)))·w^{2−p} )) / ((2w + s)·L)
           = exp(−( k·L / ((p − 2)(p − 1)) )·( d_e·s^{2−p} + 2·d_m·w^{2−p} )) / ((2w + s)·L)
           = exp(−( l_o^{p−1}·L / (p − 2) )·( d_e·s^{2−p} + 2·d_m·w^{2−p} )) / ((2w + s)·L).        (4.7)
The general shape of yield-per-area equation is concave (Figure 4.2) – when the
wire spacing and/or width is set to very small sizes, yield will become substantially
low, causing the yield-per-area to be extremely low, despite the area saving. In the
opposite extreme, where the sizes of wire spacing and width are increased
excessively, area overheads will become substantially high, making the yield-per-
area sub-optimal, even though yield is close to 1.
[Figure 4.2 plots: Y/A vs. spacing S for a fixed value of w, with an inset showing the two conductors and the dimensions s, w, and L.]
Figure 4.2. Predicted general trends of yield-per-area for two parallel conductors (units are
not real.).
We now derive analytical values of s and w that maximize yield-per-area using
two partial derivatives:
∂(Yield/Area)/∂w = 0  ⇒  2·l_o^{p−1}·L·d_m·w^{1−p}·(2w + s) − 2 = 0,
∂(Yield/Area)/∂s = 0  ⇒  l_o^{p−1}·L·d_e·s^{1−p}·(2w + s) − 1 = 0.        (4.8)
Solving the above two equations we get:
s*_optimum = [ l_o^{p−1}·L·d_e·( 2·(d_m/d_e)^{1/(p−1)} + 1 ) ]^{1/(p−2)},
w*_optimum = [ l_o^{p−1}·L·d_m·( (d_e/d_m)^{1/(p−1)} + 2 ) ]^{1/(p−2)}.        (4.9)
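As an illustration, the following Python sketch evaluates the approximate yield-per-area of Equation (4.7) and the closed-form optimum of Equation (4.9), as reconstructed above, for hypothetical λ-normalized values of L, d_e, and d_m (these numbers are assumptions, not the thesis data); perturbing either dimension away from the computed optimum should not improve the objective.

```python
import math

def yield_per_area(s, w, L, d_e, d_m, l_o=1.0, p=3.0):
    """Approximate yield-per-area of two parallel conductors (last form of Eq. (4.7))."""
    expo = -(l_o ** (p - 1) * L / (p - 2)) * (d_e * s ** (2 - p) + 2 * d_m * w ** (2 - p))
    return math.exp(expo) / ((2 * w + s) * L)

def optimal_s_w(L, d_e, d_m, l_o=1.0, p=3.0):
    """Closed-form optimum of Eq. (4.9)."""
    s = (l_o ** (p - 1) * L * d_e * (2 * (d_m / d_e) ** (1 / (p - 1)) + 1)) ** (1 / (p - 2))
    w = (l_o ** (p - 1) * L * d_m * ((d_e / d_m) ** (1 / (p - 1)) + 2)) ** (1 / (p - 2))
    return s, w

# Hypothetical, lambda-normalized values: conductor length L (in lambda) and
# defect densities d_e, d_m (defects per lambda^2); not taken from the thesis.
L, d_e, d_m = 1.0e5, 1.0e-5, 1.0e-5
s_opt, w_opt = optimal_s_w(L, d_e, d_m)
print(s_opt, w_opt, yield_per_area(s_opt, w_opt, L, d_e, d_m))

# Perturbing either dimension away from the optimum should not help:
print(yield_per_area(1.2 * s_opt, w_opt, L, d_e, d_m) <= yield_per_area(s_opt, w_opt, L, d_e, d_m))
print(yield_per_area(s_opt, 0.8 * w_opt, L, d_e, d_m) <= yield_per_area(s_opt, w_opt, L, d_e, d_m))
```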
Based on the above results, we can make some direct observations as follows:
1. As the length of the two conductors (L) increases, so do the optimal
width and spacing. This is because, as the length increases, the probability that defects cause missing/extra material failures increases. Therefore, to achieve optimality, the width and spacing must be increased.
2. As the defect rates (d_e or d_m) increase, so do the optimal width and spacing.
Let us then explore specific cases where the optimal width and/or spacing are smaller than the minimum dimensions specified by design rules. We denote by S_m and W_m the minimum spacing and the minimum width according to the design rules (for instance, the minimum spacing rule between polys is 3λ). When the width (or the spacing) that maximizes the yield-per-area, namely w*_optimum (or s*_optimum), is smaller than W_m (or S_m), w*_optimum (or s*_optimum) should be replaced by W_m (or S_m) in the analysis, as follows.
Case 1) The analytically derived spacing is smaller than the dimension specified by the design rule, i.e., s*_optimum = S_m.
The value of w*_optimum can then be calculated after fixing s to be S_m:
∂(Yield/Area)/∂w |_{s = S_m} = 0
⇒ (2·k·L·d_m / (p − 1))·w^{1−p}·(2w + S_m) = 2
⇒ 2·l_o^{p−1}·L·d_m·w^{2−p} + l_o^{p−1}·L·d_m·S_m·w^{1−p} = 1.        (4.10)
Assuming from this point on that p = 3, which is one of the typical values [4], the optimal value of w can be obtained:
w² − 2·l_o²·L·d_m·w − l_o²·L·d_m·S_m = 0,
w*_optimum = l_o²·L·d_m + sqrt( (l_o²·L·d_m)² + l_o²·L·d_m·S_m ).        (4.11)
Case 2) The analytically derived width is smaller than the dimension specified by the design rule, i.e., w*_optimum = W_m.
The value of s*_optimum can then be calculated after fixing w to be W_m:
∂(Yield/Area)/∂s |_{w = W_m} = 0
⇒ (k·L·d_e / (p − 1))·s^{1−p}·(2·W_m + s) = 1
⇒ l_o^{p−1}·L·d_e·s^{2−p} + 2·l_o^{p−1}·L·d_e·W_m·s^{1−p} = 1,
and, for p = 3,
s² − l_o²·L·d_e·s − 2·l_o²·L·d_e·W_m = 0,
s*_optimum = ( l_o²·L·d_e + sqrt( (l_o²·L·d_e)² + 8·l_o²·L·d_e·W_m ) ) / 2.        (4.12)
Note that the above analysis considered only the area overhead due to the interconnect in the denominator of the yield-per-area expression. We now include the area overhead due to the SRAM cell arrays, denoted A, by adding this term in the denominator. Also note that the factor of 2 multiplying the missing material term in Eq. (4.5) is removed. We then carry out partial derivatives with respect to width and spacing, and solve the two equations to obtain the following:
s*_optimum = ( l_o²·L'·K'·d_e + sqrt( (l_o²·L'·K'·d_e)² + 4·l_o²·A·d_e ) ) / 2,        (4.13)
w*_optimum = ( l_o²·L'·K''·d_m + sqrt( (l_o²·L'·K''·d_m)² + 4·l_o²·A·d_m ) ) / 2,        (4.14)
where K' = 1 + (d_m/d_e)^{1/2}, K'' = 1 + (d_e/d_m)^{1/2}, and L' is the interconnect length multiplied by the word length.
4.3 Experiments on Optimal Spacing and Width
We next present experimental results on sub-array interconnect optimizations for
various defect densities and numbers of banks. We assume that the chip area is 5 cm² and that 75% of the chip area is used for the SRAM device, considering that SRAM takes 50-90% of the die area in many SOCs [3][72]. We also assume that the word length is 64 bits. Table 4.1(a) then lists optimal spacings and widths for various numbers of banks and
defect densities. The values in the table are rounded to integers, to make them integer
multiples of λs. Note that when defect levels are low, analytically derived optimal
spacings can be smaller than the minimum spacing and width rules, which are
typically 3λ. In such cases, we fix the optimal values to 3λ. We also assume d_e and d_m to be equal for simplicity of the experimental setup. We see in the table that the optimal spacing and width tend to increase with the number of banks,
although the values remain fixed for low defect densities – in fact there are increases
in optimal values with the number of banks, but the amount of increase is small
enough to be rounded off to the same integers for low defect densities. As the defect
densities increase, however, we see noticeable increases in optimal values with the
number of banks. Increases in the number of banks tend to increase the complexity of the interconnects, thereby lowering interconnect yield unless sufficient spacing and width are secured. Tables 4.1(b), (c), (d) and (e) show the differences in yield and yield-per-area values when spacing and width are optimally sized and when they are minimum sized. Note that sub-array interconnects tend to have a greater impact on memory yield than on memory area for low defect densities and small (or intermediate) numbers of banks, while their impact on memory area grows with the defect densities and the number of banks.
Table 4.1. Optimal spacing (and width) values for various defect densities and number of
banks, and comparisons of yield and yield-per-area values between optimal interconnect
designs and minimum rule designs
(a) Optimal spacing and width values in terms of yield-per-area.

No. of banks     Number of defects per cm²
                 1       10      100     1000
4                3 λ     6 λ     19 λ    27 λ
8                3 λ     6 λ     19 λ    28 λ
16               3 λ     6 λ     20 λ    28 λ
32               3 λ     6 λ     20 λ    28 λ
64               3 λ     6 λ     20 λ    29 λ
128              3 λ     6 λ     21 λ    30 λ
256              3 λ     6 λ     21 λ    31 λ
512              3 λ     6 λ     22 λ    33 λ
(b) Corresponding yield values for optimal spacing and width.

No. of banks     Number of defects per cm²
                 1        10       100      1000
4                1.000    0.998    0.994    0.991
8                0.999    0.995    0.983    0.977
16               0.998    0.990    0.970    0.957
32               0.997    0.983    0.950    0.929
64               0.995    0.974    0.923    0.895
128              0.992    0.960    0.891    0.851
256              0.988    0.942    0.843    0.794
512              0.983    0.917    0.789    0.729
(c) Yield values when spacing and width are fixed at the minimum size.

No. of banks     Number of defects per cm²
                 1        10       100      1000
4                1.000    0.996    0.960    0.921
8                0.999    0.989    0.896    0.803
16               0.998    0.980    0.814    0.663
32               0.997    0.966    0.710    0.504
64               0.995    0.948    0.586    0.343
128              0.992    0.922    0.446    0.199
256              0.988    0.887    0.303    0.092
512              0.983    0.840    0.176    0.031
(d) Corresponding yield-per-area values when width and spacing are optimized (units are λ⁻²).

No. of banks     Number of defects per cm²
                 1           10          100         1000
4                2.681E-11   2.674E-11   2.651E-11   2.637E-11
8                2.675E-11   2.656E-11   2.595E-11   2.559E-11
16               2.666E-11   2.631E-11   2.519E-11   2.455E-11
32               2.654E-11   2.596E-11   2.417E-11   2.316E-11
64               2.636E-11   2.548E-11   2.281E-11   2.136E-11
128              2.612E-11   2.481E-11   2.104E-11   1.911E-11
256              2.578E-11   2.391E-11   1.883E-11   1.641E-11
512              2.532E-11   2.270E-11   1.619E-11   1.338E-11
(e) Yield-per-area values when spacing and width are fixed at the minimum size (units are λ⁻²).

No. of banks     Number of defects per cm²
                 1           10          100         1000
4                2.681E-11   2.671E-11   2.574E-11   2.470E-11
8                2.675E-11   2.648E-11   2.400E-11   2.151E-11
16               2.666E-11   2.617E-11   2.174E-11   1.770E-11
32               2.654E-11   2.573E-11   1.891E-11   1.343E-11
64               2.636E-11   2.512E-11   1.552E-11   9.086E-12
128              2.612E-11   2.429E-11   1.174E-11   5.231E-12
256              2.578E-11   2.316E-11   7.906E-12   2.396E-12
512              2.532E-11   2.165E-11   4.523E-12   7.941E-13
(f) Relative area overheads of interconnect with respect to the whole memory array (units are %).

No. of banks     Number of defects per cm²
                 1       10      100      1000
4                0.10    0.20    0.63     0.89
8                0.26    0.53    1.65     2.41
16               0.49    0.99    3.21     4.44
32               0.82    1.63    5.23     7.17
64               1.28    2.52    7.94     11.11
128              1.92    3.76    12.02    16.34
256              2.80    5.46    16.80    22.97
512              4.03    7.76    23.56    31.62
4.4 Impact of Buffer Insertion on Yield and Area
For long interconnects, buffers are sometimes inserted to improve performance
[143][144]. Buffer insertions in sub-array interconnect can be implemented without
incurring additional area overheads. Figure 4.3 shows an exemplary buffer insertion
scheme with no area penalty. If we assume that 1) inverters don’t use metal layer 2
as shown in the figure, and 2) bus interconnects use metal layer 2, buffers can be
placed without widening the spacing between bus wires by putting them diagonally
as shown in the figure. Therefore, even considering buffer insertions, area overheads
as shown in the figure. Therefore, even considering buffer insertion, area overheads can be kept at the same level as for no-buffer interconnects. As for the yield loss due to buffer insertion, the impact on yield tends to be insignificant – assuming that the bus width is 128 bits and buffers are implemented at 100 locations, the partial yield associated with buffer insertion is computed as 0.998 using an analysis similar to that in
[94]. If buffers are placed at 1000 locations, the yield drops to 0.98.
Figure 4.3. Diagonal buffer insertion scheme for given bus wires without incurring area
penalty.
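The partial-yield figures quoted above follow from treating buffer failures as independent. The sketch below reproduces numbers of the same order using a hypothetical per-buffer failure probability; the value of p_buf is an illustrative assumption and is not taken from [94].

```python
def buffer_partial_yield(p_buf, bus_width, sites):
    """Partial yield of all inserted buffers, assuming independent failures:
    (1 - p_buf) raised to the total number of buffers (bus_width * sites)."""
    return (1.0 - p_buf) ** (bus_width * sites)

p_buf = 1.6e-7                               # hypothetical per-buffer failure probability
print(buffer_partial_yield(p_buf, 128, 100))   # ~0.998 for 100 insertion locations
print(buffer_partial_yield(p_buf, 128, 1000))  # ~0.98 for 1000 insertion locations
```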
4.5 SRAM Bit Cell Modification
We now turn our attention to SRAM cell layout as shown in Figure 2.5.
Referring to the layout discussed in [94], we explore the possibility of improving the
layout in terms of yield-per-area.
As discussed in previous chapters, defects affect the memory in a variety of ways. Some
defects cause an entire bitline (or wordline) to be corrupted; some others cause
multiple bitlines (wordlines) to be corrupted; some cause a pair of a bitline and a
wordline to be corrupted simultaneously; and some cause only a certain single bit
cell to be inoperable.
The basic approaches for improving yield-per-area are as follows: based on the
detailed defect data in [94], we find that the major defect types that cause a bitline to be inoperable are missing and extra material type defects on bitlines and bitbarlines.
We will thus incrementally change the spacing and width of bitlines and bitbarlines
to see the effect on the yield-per-area values. As for defects that cause a wordline to be inoperable, the major sources are extra material type defects on wordlines, whose dimensions will also be changed incrementally for possible improvement. In the case of single bit cell defects, these are caused by defects at diverse locations such as the diffusion, polysilicon, and gate oxide, which are not easy to control by layout modifications, especially since transistor sizes are determined to ensure correct write and read operations. For similar reasons, defects that cause a pair of a bitline and a wordline to be corrupted are not easily controllable. In the case of defects that corrupt multiple bitlines (wordlines),
their critical areas are relatively small compared to other types of defects. They are
thus excluded from our discussion.
As stated earlier, we now carry out incremental changes on layouts, rather than
radical changes like changes to the orientation of components in layouts. We start
with changes to the bitline width. We assume the size of the SRAM to be 1Mb. The original yield of the SRAM and its yield-per-area are calculated as 0.6114 and 5.48×10⁻¹⁰ (λ⁻²) based on the yield equation in [6]. We first increase the width of the bitline by 1λ, which makes the bit cell 2λ wider. The new yield becomes 0.619, while the area is increased by 7.14%. The new yield-per-area value is computed as 5.18×10⁻¹⁰ (λ⁻²), which is smaller than the original yield-per-area value. Therefore, increasing the width of the bitlines is not helpful for improving yield-per-area. We next try to increase the spacing between the bitline (or bitbarline) and GND by 1λ. The overall yield, however, drops to 0.601, which decreases yield-per-area. We then try to widen the distance between the bitline and the bitbarline by 1λ. The new yield becomes 0.613, while the area increases by 3.57%. The new yield-per-area is 5.30×10⁻¹⁰ (λ⁻²), which is worse than the original yield-per-area. We next consider increasing the height of the cell in order to reduce the probability of defects that short wordlines to neighboring elements within a cell. An increase of 1λ in the height of the cell, however, results in a decrease of the yield-per-area to 5.34×10⁻¹⁰ (λ⁻²), mainly due to increases in the defect probabilities associated with bitline corruptions.
We can now conclude that no improvement can be made through layout modifications, suggesting that a bit cell array with the smallest area, in conjunction with spare rows and columns, is the preferred method of improving yield-per-area under current manufacturing technology. However, if a radically new nano-fabrication technology with unprecedentedly high defect densities is introduced, cell-level modifications can play a crucial part in yield-per-area improvement. As explored in Chapter 2, finer granularities tend to be needed to achieve optimality in terms of yield-per-area as defect rates increase, indicating that under extremely high defect rates the level of granularity for achieving optimality can be scaled down to the bit cell level. In such cases, spares would have to be provided at the bit cell level, indicating that cell-level layout optimizations would become the better alternative. The associated study of these issues will be the subject of our future research.
4.6 Concluding Remarks
We presented a novel methodology for optimizing width and spacing
simultaneously for sub-array interconnects. The impact of interconnect width and spacing on interconnect yield and area was studied and analyzed. Tradeoffs between yield
and area are considered and yield-per-area is used as a figure of merit. Analytical
expressions for the optimal width and spacing are derived. Our experiments explore
various trends associated with numbers of sub-arrays (i.e., banks) and defect rates.
We also show that buffer insertion can be done without additional area overheads
and significant yield penalties. We then look into the possibility of improving SRAM
bit cell layout in terms of yield-per-area.
In the next chapter, we will study ECC protection schemes against soft errors and
their partial use against hard defects. Yield enhancement as well as changes in soft-
error resilience due to the integrated use of ECC schemes and spares will be discussed
and analyzed.
Chapter 5
An Integrated Approach to Exploit Spares and ECC to Efficiently
Combat High Defect and Soft-error Rates in Future SRAMs
5.1 Introduction
A. Motivations
As stated earlier, SRAMs have a significant impact on chip yield and reliability, because they occupy most of the die area in many SOCs. According to the ITRS roadmap [3], SRAMs are projected to occupy more than 90% of SOC area by 2011.
Exploiting the structural regularity of SRAM architectures, many companies and
researchers have developed SRAM compilers [81]-[89]. These compilers are mainly
designed for mature fabrication processes with a few defects per cm² and are typically used to design relatively small embedded memories. However, along with the trends
of denser and larger SRAMs, to maintain market share manufacturers of leading-
edge chips are forced to use immature processes, where defect densities are much
higher than for mature processes. Using the trends reported by ITRS [3],
SEMATECH [7], and Intel [126] regarding SRAM size and yield trends and for
various stages of process maturity, we show that during the initial stage of newer
nanoscale processes, defect densities can be of the order of hundreds of defects per cm² (see Section 5.3.E).
Soft error rate (SER) is also reported to be increasing for SRAMs, while
decreasing for DRAMs [95][101][103]. Moreover, operation at high altitudes, e.g.,
for aircraft and space-borne vehicles, increases SER significantly compared to
terrestrial operation [127][128]. Hence, it is imperative to explore wide ranges of
strengths and lengths of ECC codes.
As mentioned before, our overall goal is to identify optimal SRAM
configuration that maximizes yield-per-area under high defect densities and high soft
error rates, subject to constraints on delay and resilience to soft errors. To achieve
this goal, we need to consider various design alternatives. In this chapter, we focus
on integrated use of ECC protection schemes and spares to combat hard defects and
soft errors, as measured by yield enhancement and resilience to soft errors. As
detailed in Section 5.2, this is in clear contrast with previous combined approaches,
since they only consider the benefits of using spares and ECC to combat defects by
quantifying yield improvements, but ignore the corresponding reduction in the
ECC’s capability to combat soft errors.
B. Basic Ideas: Integrated Use of Spares and ECC
Spares and ECC protection schemes are classically intended for mitigating hard
defects and soft errors, respectively. However, in this work, we view them in an
integrated manner. For instance, instead of discarding all chips that have too many
defects to be repaired by the available spare rows and columns, in some (many) such
chips, all defects not handled by spares can be masked by the ECC protection
schemes. Of course, such chips will see some loss in their ECC’s error correcting
capabilities.
The degree of loss of such a chip’s error correcting capability depends on the
types of hard defects that are masked by the ECC (and not by spares). A defect can
cause either a single cell, a whole column (or row), or multiple columns (or rows), or
both a column and a row to be defective [63][94]. Among these types of defects,
defects that require replacement of one or more rows cannot be handled by ECC,
since the entire code block is corrupted by such defects. In contrast, single cell
defects and single and double column replacement defects can be handled by ECC.
Single cell defects and column replacement defects affect the ECC’s error correcting
capability to very different extents, however. The use of ECC to mask a single cell
defect causes an infinitesimal loss of its error correcting capability, while the use of
ECC to mask a defect that necessitates column replacement can lead to higher loss of
its error correcting capability.
This chapter explores this idea in detail and develops detailed models and metrics
for yield and resilience to soft errors. It also clearly demonstrates the benefits of
taking an integrated view of spares and ECC. We then quantify the overheads
associated with ECC implementation.
5.2 Previous Research
A. ECC Protection Schemes
An ECC scheme adds check bits to data bits and creates coded bits. Depending on
the error correcting capability, an ECC is classified as SEC (single error correction),
SEC-DED (single error correction and double error detection), DEC (double error
correction), and so on. Most research has focused on SEC-DED schemes, since
currently these are commonly used [60]-[62]. However, soft error rates for SRAM
are growing and will continue to grow [95][101][103][104][127][130][131] largely
due to shrinking feature sizes, lower supply voltages, and higher speeds.
B. Integrated View of ECC and Spare Switching Schemes
Several previous approaches have taken a ‘synergistic’ or ‘combined’ view of
spares and ECC. However, all these approaches have focused only on the ability of
spares and ECC to jointly combat defects. Hence, these approaches only compute the
yield enhancement enabled by such an approach but do not quantify the
corresponding decrease in resilience to soft errors. In particular, in [134], the authors
propose such a synergistic approach, apply it to a large memory (for its era), and
estimate the yield benefits but ignore any reduction in ECC’s ability to combat soft
errors. This approach is extended in [61], but the focus is still only on yield
enhancement. The approach in [135] also ignores the reduction in ECC’s ability to
combat soft errors. In [60], authors present a combined ECC and spares based
mechanism for self-repair. The authors realize that a combined approach causes a
reduction in the ECC’s capability to combat soft errors, and avoid this by
recommending that ECC not be used to combat hard defects during fabrication.
Instead, they focus on detecting hard faults online using ECC, and repair them using
available spares. In contrast, our approach uses spares and ECC in an integrated
manner to enhance yield and reduce any corresponding reduction in resilience to soft
errors.
In [136], authors suggest a method to exploit unused spare columns to improve
resilience to soft errors by storing additional check bits in unused spare columns. In
contrast, our approach targets the upcoming era of high defect rates where, for a
large proportion of chips, the defects are expected to exceed the repair capacity of
available spares.
5.3 Integrated Approach and Yield-Resilience Model
As stated earlier, ECC protection schemes, which are intended for correcting soft
errors, can handle some types of hard defects, namely, defects that require single cell
or column replacement. As we will show ahead, the use of ECC against single cell
defects tends to result in infinitesimal degradation of the remaining resilience to soft
errors, while the use of ECC against column defects tend to have larger degradation
of resilience to soft errors. Whether the chips that need the partial use of ECC due to
insufficient spare columns are acceptable or not depends on the soft error resilience
requirements of specific applications in which the chip is to be employed. Also note
that the installation of stronger ECC scheme than needed, primarily for using ECC
against majority of hard defects, tends to result in sub-optimality (in terms of yield-
per-area) due to high area complexities of ECC compared to spare schemes.
We now significantly enhance the yield model presented in [94] to incorporate the
yield improvement associated with the partial use of ECC protection scheme for hard
defects. The impact of such use of ECC on the existing error correcting capability is
also considered in our analysis.
A. Integrated Memory Reconfiguration Approach
We apply the following integrated approach to reconfigure defective memories.
1) When all hard defects in a memory can be handled by available spare rows and
columns, ECC does not play a role in masking hard defects. In such a chip, ECC
protection scheme is solely used for detecting/correcting soft errors and soft error
resilience is unaffected.
2) When hard defects in a memory exceed the capabilities of available spares, ECC
protection scheme is used in part to correct errors caused by the defects not fixed
by spares. This diminishes the error correction capability of ECC.
For instance, suppose that a memory chip has two column defects, but has only
one spare column and DEC ECC. In this case, one of the two defects is handled by
the spare column. The other column defect is masked by the ECC. This diminishes
the ECC’s capability to combat soft errors that affect any of the words located in the
particular blocks. In this example, the ECC can correct a single soft error in the
words of that block, but it can still detect two errors. Hence, the words that include
the defective column can be considered as having an SEC-DED ECC. More
importantly, note that the ECC capability is undegraded (DEC ECC) for all words
that do not include the defective column.
B. Chips produced under integrated reconfiguration approach
Our integrated reconfiguration approach allows us to sell chips that would have
otherwise been discarded. The additional chips sold in this manner can have various
levels of resilience to soft errors, depending on how much of the ECC’s capability is
used to mask hard defects. This makes it possible for us to ‘bin’ the additional chips
sold based on their resilience to soft errors. Chips with high resilience to soft-errors
may be binned for use in mission-critical space applications or server applications,
while chips with low resilience to soft errors may be binned for use in low end
terrestrial applications.
C. Characterization of Defect Types
When ECC is used to combat a single cell hard defect, it only diminishes the
resilience to soft errors for the particular word that contains the defective cell. This
diminishes the overall resilience to soft errors for the entire SRAM by a negligible
amount. For instance, for a 1Mb SRAM that has a DEC with a code length of 256 under an SER of 10⁷ FITs/Mb, where the original soft-error resilience is 0.99999, the decreases in soft-error resilience due to partial use of ECC for fixing single-cell hard defects at 5, 10, 15, and 20 memory locations are respectively 8.0×10⁻⁶, 1.7×10⁻⁵, 2.5×10⁻⁵, and 3.4×10⁻⁵. In contrast, under the same experimental setup, if one, two, three, and four column defects are covered by ECC, where each column defect is assumed to occur in a different code block, the decreases in soft-error resilience are respectively 0.00172, 0.00345, 0.00517, and 0.00689.
D. Additional Assumptions
We assume that for a non-ECC memory, single cell hard defects are masked using
spare rows or columns, whichever has lower cost. For instance, if a memory has
1024 rows and 512 columns, single cell defects are handled by spare rows, as each
spare row’s cost is 512 cells.
We further assume that for an ECC memory, single cell defects are handled by
ECC protection scheme. This allows us to use spares to combat hard defects with
more severe effects, such as those affecting single or double rows and columns.
E. An Integrated Yield-Resilience Model
We now derive an extended yield model that captures the benefits of using ECC
for fixing hard defects. Following is the list of notations that we use.
Table 5.1. Collection of notations for deriving integrated yield-resilience model.
Notation:
M : Number of code blocks; each code block has a number of bits equal to the code length of the ECC.
C_S : Number of available spare columns.
R_S : Number of available spare rows.
T : ECC's error correcting capability per code block.
C_i : Number of spare columns used for repairing code block i.
C_i,used : Actual number of columns used for repairing code block i. This includes the part of the ECC's error correcting capability that is used.
X_i : Number of column defects that are handled by ECC in code block i.
R_i : Number of spare rows used for repairing code block i.
e_i : Number of soft errors in code word i.
L_C : Code length of the ECC.
t : The length of the mission for which the system is used (or the refresh interval; see ahead), measured in days.
F_b : FIT per bit.
First, we view the memory as blocks of width equal to the code length of the
ECC protection scheme (Figure 5.1). We then consider each code block as having as
many as T additional spare columns, where only a few tend to be used to combat
hard defects to minimize the loss of soft error capability. The sum of the number of
spare columns and the number of spare rows used for repairing each code block are
constrained by the numbers of available spare columns and rows as:
Σ_{i=1}^{M} C_i ≤ C_S,        (5.1)
Σ_{i=1}^{M} R_i ≤ R_S.        (5.2)
[Figure 5.1 shows code blocks 1, 2, …, M, with block i viewed as having (R_i, C_i + T) spare resources.]
Figure 5.1. A view of memory columns in terms of code blocks.
The actual number of columns used for repairing code block i can be larger than C_i, but cannot exceed C_i + T. Since we may partially (not entirely) use the ECC's capability to combat defects, so as not to significantly diminish its soft error correcting capability, the following constraint is used:
C_i ≤ C_{i,used} = C_i + X_i,        (5.3)
where X_i < T should hold.
We now compute an integrated probability that exactly R_i spare rows and C_i spare columns are used:
P_{code block i}(R_i, C_i, X_i) = Σ_{C_{i,used} = C_i}^{C_i + X_i} P_{hard defects}(R_i, C_{i,used}).        (5.4)
We then multiply the individual probabilities of each code block to compute a
probability mass function:
p_{mass}(C_1, …, C_M, R_1, …, R_M, X_1, …, X_M) = Π_{i=1}^{M} P_{code block i}(R_i, C_i, X_i).        (5.5)
We now obtain an accumulated probability by summing over all feasible repair configurations:
P_{acc} = Σ_{C_1, …, C_M} Σ_{R_1, …, R_M} p_{mass}(C_1, …, C_M, R_1, …, R_M, X_1, …, X_M).        (5.6)
As for P_{hard defects}(R_i, C_{i,used}) in equation (5.4), we borrow the probability mass
function derived in [94]. The only difference is that the terms related to single cell
defects are excluded in yield calculations when T ≠ 0, since single cell defects are
handled by ECC by default. Note that the number of single cell defects located on a
code word might exceed the (remaining) error correcting capability of the code word,
while our yield calculation considers chips with such defects to be working.
However, such a probability is negligible, especially when the memory has millions
of cells and defects are assumed independent.
Metric for soft error resilience: We now define the metric we use for soft-error
resilience. The probability (with respect to a single cell) that no soft error occurs
during an interval [0, t] is calculated as exp(−Λ(t)) [138], where Λ(t) = F_b · (t · 24 hrs) / 10⁹ hrs. The time t can be either the length of the mission or the memory refresh interval [139], depending on how the system is designed. The probability of hard defects that cause a single cell to be defective is added to this quantity when T ≠ 0, since ECC handles such defects. For a code word of length L_c, the probability that the number of soft errors occurring during [0, t] is less than or equal to T − X_i can be computed as:
P_{soft error}^{code word i} = P(e_i ≤ T − X_i) = Σ_{p=0}^{T − X_i} C(L_c, p)·(1 − exp(−Λ(t)))^p·(exp(−Λ(t)))^{L_c − p},        (5.7)
where C(L_c, p) denotes the binomial coefficient.
The soft-error resilience with respect to a whole code block (whole memory) is
simply the product of probabilities of all the words within the code block (whole
memory).
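A minimal Python sketch of this resilience metric follows; the block/word organization, mission length, and SER values are illustrative assumptions and only loosely follow the examples in the text.

```python
from math import comb, exp

def cell_no_error_prob(fit_per_bit, t_days):
    """Per-bit probability of no soft error in [0, t]: exp(-Lambda(t)),
    with Lambda(t) = F_b * (t * 24 hrs) / 1e9 hrs."""
    return exp(-fit_per_bit * (t_days * 24.0) / 1e9)

def codeword_resilience(code_len, t_correct, x_used, fit_per_bit, t_days):
    """P(e_i <= T - X_i) for one code word of length L_c, as in Eq. (5.7)."""
    q = cell_no_error_prob(fit_per_bit, t_days)
    p = 1.0 - q
    budget = t_correct - x_used          # remaining correctable soft errors
    return sum(comb(code_len, e) * p ** e * q ** (code_len - e)
               for e in range(budget + 1))

def memory_resilience(words_per_block, x_per_block, code_len, t_correct,
                      fit_per_bit, t_days):
    """Product over code blocks; block i has X_i column defects masked by ECC,
    so each of its words tolerates only T - X_i soft errors."""
    r = 1.0
    for x_i in x_per_block:
        r *= codeword_resilience(code_len, t_correct, x_i,
                                 fit_per_bit, t_days) ** words_per_block
    return r

# Illustrative setup: a 1Mb memory viewed as 4 blocks of 1024 words of 256 bits,
# DEC (T = 2), SER of 1e7 FITs/Mb, 30-day mission (all values are assumptions).
fit_per_bit = 1e7 / (1024 * 1024)
print(memory_resilience(1024, [0, 0, 0, 0], 256, 2, fit_per_bit, 30))
print(memory_resilience(1024, [1, 0, 0, 0], 256, 2, fit_per_bit, 30))  # one column defect masked by ECC
```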
5.4 Illustrating the Benefits of Partial Use of ECC for Masking Hard Defects
In this section, we show some example cases to illustrate the benefits of partial
use of ECC for masking hard defects. As a first example, Figure 5.2 shows yield
enhancements for various categories according to which types of hard defects are masked by the ECC protection scheme. The size of the SRAM and the SER are assumed to be 1Mb with 8 banks and 5×10⁷ FITs/Mb, respectively. We further assume that the number of spares is given as one row and one column per bank. We also assume that the SRAM has a DEC with a code length of 256.
We then conduct experiments for three distinct defect rates as shown on the x-
axis of Figure 5.2. The part of each column in the figure labeled (1) represents the
percentage of chips where hard defects are completely covered by available spares.
We denote it as bin (1). The column labeled (2) indicates the percentage
improvements in yield when ECC is allowed to handle single cell defects, while the
part of each column labeled (3) represents the case where only one column defect is
allowed to be handled by ECC. These two cases are denoted as bin (2) and bin (3),
respectively. As expected, bin (1) has no degradation in soft-error resilience and it
corresponds to the classical yield. In contrast, bin (2) and bin (3) represent additional
yield enabled by the integrated approach we propose. The resilience against soft
error for bin (1) is 0.999. This value remains virtually unchanged for bin (2). (More
accurately, for bin (2) the decrease in resilience does not affect the first three decimal
places.) The resilience decreases to 0.957 for chips in bin (3).
Note that the yield improvement associated with the integrated use of spares and ECC increases with the defect rate for a given number of spares, as shown in the figure. Depending on the applications in which the fabricated SRAM can be used, some may choose to use chips only from bin (1), from the top two bins, namely bins (1) and (2), or from all three bins, namely bins (1), (2), and (3).
[Figure 5.2 plots: percentage of chips (%) vs. defect density (100, 200, and 400 defects/cm²), with each bar divided into (1) spares handle all the defects, (2) ECC used against single cell defects, and (3) ECC handles one column defect.]
Figure 5.2. Chip yield improvements due to partial use of ECC to combat hard defects.
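A minimal sketch of the binning policy described above is given below; the function name and the limit of one ECC-masked column defect are assumptions made for illustration (mirroring the DEC example in this section), not a fixed rule of the method.

```python
def bin_chip(uncovered_row_defects, uncovered_col_defects, uncovered_cell_defects):
    """Assign a repaired chip to a bin. The arguments are the defects remaining
    AFTER spare rows/columns have been allocated; the one-column limit mirrors
    the example above (DEC ECC masking at most one column defect)."""
    if uncovered_row_defects > 0:
        return "discard"        # row defects cannot be masked by ECC
    if uncovered_col_defects == 0 and uncovered_cell_defects == 0:
        return "bin (1): spares handle all the defects"
    if uncovered_col_defects == 0:
        return "bin (2): ECC masks single cell defects only"
    if uncovered_col_defects == 1:
        return "bin (3): ECC masks one column defect"
    return "discard"            # beyond the resilience budget assumed here

print(bin_chip(0, 0, 0))   # -> bin (1)
print(bin_chip(0, 0, 3))   # -> bin (2)
print(bin_chip(0, 1, 2))   # -> bin (3)
```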
As another example, for a given code block of size 512 × 512 under DEC
protection, suppose that ECC’s capability is partially used for handling zero, one, or
two column defects. The remaining resiliences to soft errors for the three cases are 0.999, 0.996, and 0.1514, respectively. The chip in the last case cannot be used for a one-month mission, but, if we use it for a daily mission, the resilience to soft errors becomes 0.961. Alternatively, this chip may be used for applications where the SRAM is
refreshed every day.
We next show another example case where yield and soft-error resilience are
given as constraints. Consider a 4Mb SRAM where the target yield and resilience are
both 0.99. Assume that the SRAM has a DEC with a code length of 256 under an SER of 10⁷ FITs/Mb and 100 defects/cm². Note that the number of column replacement defects
that is allowed to be handled by ECC is constrained by the soft-error resilience
constraints. We then compare the yield for two cases; the first case (Figure 5.3 (a))
uses ECC partially while maintaining the specified level of soft error resilience,
namely 0.99. The second case (Figure 5.3 (b)) does not use ECC for fixing hard
defects. We can see in the figures that our integrated use of ECC improves yield
faster than the independent approach as spares are added. For the integrated approach,
yield of 0.99 is achievable by adding (7, 8) spares, while (11, 8) spares are needed for
the independent approach. It is important to note that the difference between the two approaches grows with the defect rate. For instance, when the defect rate is doubled,
the above two schemes need (12, 11) and (18, 11) spares respectively to achieve the
target yield of 0.99. This reduction in spares significantly decrease hardware
complexity and delay penalty associated with spare switching multiplexers and their
controlling circuitries, as discussed in previous chapters.
[Figure 5.3(a) and (b) plot yield versus the number of spares for the integrated approach and the independent approach, respectively.]
Figure 5.3. Comparison of yield improvement with spares for two approaches.
5.5 Tradeoff Analysis of ECC Schemes
We now carry out an analysis of the design alternatives and tradeoffs associated with
the incorporation of ECC protection schemes. Selecting an appropriate ECC for the given
constraints is one of the key parts of identifying an optimal memory configuration.
A. Scope of ECC Scheme
When we need an MEC (multiple error correction) code, we can implement it in
many alternative ways, namely (1) use multiple Hamming code blocks, each with a
fraction of the bits, (2) use a single powerful MEC block of the desired code length,
and so on. For instance, 4×SECs, each operating on (approximately) one quarter of
the given code length; 2×DECs, each operating on (approximately) one half of the
original code length; and 1×QEC, of the given code length, can each cover four soft
errors. However, the scopes of error correcting capability are different. For example,
in the case of 4×SECs, if two erroneous bits fall within the same SEC block, ECC
protection will fail. On the other hand, using multiple less powerful codes
helps reduce encoder/decoder area overheads and delay penalties. In terms
of data-to-code ratio, using multiple Hamming codes is the most efficient, while using
a single, most powerful MEC code is the least efficient. Figure 5.4 shows the impact
of changes of scopes of use of ECCs (SEC-128, DEC-256, QEC-512) on soft-error
resilience for different sizes of SRAMs (1Mb, 4Mb) and different SERs (10^7, 10^8, and 10^9 FITs/Mb).
[Figure 5.4 plots soft-error resilience (y-axis, 0-1) of SEC-128, DEC-256, and QEC-512 for SRAM sizes of 1Mb and 4Mb and SERs of 10^7, 10^8, and 10^9 FITs/Mb.]
Figure 5.4. The impact of changes of scope of ECC scheme on resilience.
Under 10^7 FITs/Mb, the scope of use of ECC does not affect soft-error resilience. In
this case, we prefer SEC-128, which has the minimum area overhead and the
smallest delay penalty. As the SER increases further, however, resilience begins to
differ for different scopes of use of ECC.
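As a rough illustration of the data-to-code tradeoff among these scopes, the C++ sketch below compares the total check-bit count when four soft errors are covered by 4×SEC, 2×DEC, or 1×QEC over roughly 512 coded bits. The block lengths and the check-bit bound of T times ceil(log2(n+1)) per block are stated assumptions used only for this sketch, not values taken from this dissertation.

#include <cmath>
#include <cstdio>

// Approximate number of check bits for a binary BCH block of length n
// correcting T errors: T * ceil(log2(n + 1)).  Used here for illustration only.
int checkBits(int n, int T) {
    return T * static_cast<int>(std::ceil(std::log2(n + 1.0)));
}

int main() {
    struct Scope { const char* name; int blocks; int blockLen; int T; };
    // Three ways of covering four soft errors over roughly 512 coded bits.
    const Scope scopes[] = {
        {"4 x SEC-128", 4, 127, 1},
        {"2 x DEC-256", 2, 255, 2},
        {"1 x QEC-512", 1, 511, 4},
    };
    for (const Scope& s : scopes) {
        int check = s.blocks * checkBits(s.blockLen, s.T);
        int coded = s.blocks * s.blockLen;
        std::printf("%s: %3d check bits, data-to-code ratio %.3f\n",
                    s.name, check, 1.0 - static_cast<double>(check) / coded);
    }
    return 0;
}

Under these assumptions the multiple-Hamming option needs the fewest check bits and the single QEC block the most, consistent with the data-to-code ordering described above, while the QEC block offers the widest correction scope.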
B. Code Length
For a fixed error correcting capability, if we increase code length, (1) check bit
array overheads decrease, (2) delay and hardware complexities of encoders and
decoders increase, and (3) resilience to soft errors diminishes. Figure 5.5 illustrates
that the code-to-check bit ratio decreases with the code length for a given strength,
while the ratio tends to increase with the strength of code for a given code length.
[Figure 5.5 plots the code-to-check bit ratio (y-axis, 0-0.7) against code length (up to 1000) for code strengths T = 1, 2, 3, and 4, where T denotes the strength of the code.]
Figure 5.5. Code-to-check bit ratio for various code lengths and strengths.
C. Strength of Code
For a fixed code length, increase in code strength enhances soft-error resilience at
the cost of area overheads and delay penalties. Note that differences in resilience
between various strengths tend to grow with the size of the SRAM. For instance, the soft-error
resilience values of DEC, TEC, and QEC are similar for the case of 1Mb (code length
of 512 and SER of 10^8 FITs/Mb), as shown in Figure 5.6. As the size of the SRAM
increases, however, the soft-error resilience of DEC begins to drop noticeably,
while the resilience values of TEC and QEC remain almost unchanged.
[Figure 5.6 plots soft-error resilience (y-axis, 0-1) of SEC, DEC, TEC, and QEC for SRAM sizes of 1Mb, 4Mb, and 16Mb.]
Figure 5.6. Comparison of soft-error resilience of different code strengths for a fixed code
length, for different sizes of SRAMs.
5.6 Characterization of Overheads Associated with ECC Implementation
The major components of an ECC protection scheme are decoders, encoders, and check-bit
arrays. For a given (L_C, L_D), where L_C and L_D are respectively the lengths of the coded
bits and data bits, the error correcting capability depends on the number of parity bits,
namely L_C - L_D. One of the most popular codes used in memories is the Hamming code [141].
In its extended form, it is designed to either correct a single error or detect two errors.
With increases in SER, more powerful codes, e.g., BCH codes, the Golay code, RS codes, and
RM codes [141], can be used against soft errors. In this dissertation, we use BCH codes,
and treat the Hamming code as a specific form of BCH code.
A. Area Estimates and Performance Penalties
We now quantify the area overheads and delay penalties due to ECC protection
schemes. We again used the Design Compiler tool from Synopsys for Verilog synthesis
and the Nanosim simulator for circuit simulations under the TSMC 180nm process to predict
area overheads and delay penalties (Figure 5.7). We then developed approximate models
for estimating area overheads (Equations (5.8) and (5.9)) as well as delay overheads
(Equations (5.10) and (5.11)). Note that T is the parameter that specifies the number
of soft errors that can be fixed by a code. In these equations, the area and performance
overheads associated with decoders come mostly from 1) syndrome bit generation
from the parity check matrix and the received code word, and 2) the access of a lookup table
(indexed by the syndrome bits) that contains the correctable error patterns. The overheads
due to encoders come mostly from generating the code bits from the generator matrix
and the data bits.
[Figure 5.7 compares Verilog synthesis results against the approximate models for the BCH codes (7, 4), (15, 11), (15, 7), (31, 26), (31, 21), (63, 57), (127, 120), and (255, 247): (a) area overheads (λ^2, up to about 1.6E+06), (b) decoder delay (ns), and (c) encoder delay (ns).]
Figure 5.7. Area overheads, decoder delays, and encoder delays for different BCH codes.
Area_decoder = 375·(L_C - L_D)·L_C + 345·2^(L_C - L_D) + 16024·T   (λ^2).   (5.8)

Area_encoder = 112·(L_C - L_D)·L_D + 1063·L_D + 4981   (λ^2).   (5.9)

Delay_decoder = 0.002·L_C + 0.164·T + 0.292 + 0.101·(L_C - L_D)   (ns).   (5.10)

Delay_encoder = 0.089·log2(L_D) + 0.004·L_D   (ns).   (5.11)
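To show how these approximate models are used, the C++ fragment below simply evaluates them for a few of the BCH codes in Figure 5.7. The coefficient arrangement follows Equations (5.8)-(5.11) as reconstructed above from the transcript, so the absolute values should be treated as indicative only; the code illustrates the evaluation, not the models' derivation.

#include <cmath>
#include <cstdio>

// Approximate ECC overhead models, following Equations (5.8)-(5.11) as
// reconstructed above (coefficient arrangement is a best-effort reading).
double areaDecoder(int Lc, int Ld, int T) {              // lambda^2
    return 375.0 * (Lc - Ld) * Lc + 345.0 * std::pow(2.0, Lc - Ld) + 16024.0 * T;
}
double areaEncoder(int Lc, int Ld) {                     // lambda^2
    return 112.0 * (Lc - Ld) * Ld + 1063.0 * Ld + 4981.0;
}
double delayDecoder(int Lc, int Ld, int T) {             // ns
    return 0.002 * Lc + 0.164 * T + 0.292 + 0.101 * (Lc - Ld);
}
double delayEncoder(int Ld) {                            // ns
    return 0.089 * std::log2(static_cast<double>(Ld)) + 0.004 * Ld;
}

int main() {
    // (Lc, Ld, T) for a few of the BCH codes shown in Figure 5.7.
    const int codes[][3] = { {7, 4, 1}, {31, 21, 2}, {127, 120, 1}, {255, 247, 1} };
    for (const auto& c : codes) {
        std::printf("(%3d,%3d) T=%d: dec %.2e / enc %.2e lambda^2, dec %.2f / enc %.2f ns\n",
                    c[0], c[1], c[2],
                    areaDecoder(c[0], c[1], c[2]), areaEncoder(c[0], c[1]),
                    delayDecoder(c[0], c[1], c[2]), delayEncoder(c[1]));
    }
    return 0;
}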
B. Observations
Based on the approximate area and delay models obtained above, we make some
observations. For a fixed error correcting capability, encoder and decoder
complexities increase with the code length, while check-bit array overheads are
reduced. The ratio of check bits to coded bits, which can be taken as a measure of the
relative check-bit array overhead, can be expressed analytically as (L_C - L_D)/L_C,
which is approximately log2(L_C)·T/L_C. It is straightforward to show that this function is
decreasing when the length of the coded bits is ≥ 8. For a fixed code length, the complexities
of encoders and decoders grow with the error correcting capability. In terms of area
overheads, the check-bit array is dominant for large memories. However, for small and
intermediate size memories, the area overheads associated with encoders/decoders and
check-bit arrays tend to be comparable to each other. In terms of delay overheads,
when the code length is small or intermediate, decoder delays tend to be larger than
encoder delays. As the code length increases, encoder delays can become
comparable to decoder delays.
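A quick numerical check of the decreasing-ratio claim above, assuming the check-bit count is approximated by T·log2(L_C) as in the text, is sketched below in C++; the sweep range is an illustrative choice.

#include <cmath>
#include <cstdio>

// Relative check-bit array overhead, approximated as T*log2(Lc)/Lc; the sweep
// illustrates that it decreases with code length for Lc >= 8 at fixed T.
int main() {
    for (int T = 1; T <= 4; ++T) {
        std::printf("T=%d:", T);
        for (int Lc = 8; Lc <= 1024; Lc *= 2)
            std::printf("  Lc=%4d -> %.3f", Lc, T * std::log2(Lc) / Lc);
        std::printf("\n");
    }
    return 0;
}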
5.7 Characterization of Design Tradeoffs: Guidelines and Optimal Designs
So far, we have characterized area and delay overheads associated with ECC and
spare switching schemes. We now carry out tradeoff analysis on various
combinations of spare switching schemes and ECC protection schemes to identify
optimal designs in terms of overall yield-per-area for given constraints on soft-error
resilience and performance.
A. Objective Function
For non-ECC memories, our design objective is to maximize yield-per-area under
given delay constraints, whereas for ECC memory, it is to maximize the following
modified objective for given constraints on delay and soft-error resilience:
(Yield / Area) × (Ratio of data bits to coded bits).   (5.12)
The ratio of data bits to coded bits (i.e., density) tends to differ when the code length
and the strength of ECC are varied. Therefore, the density factor is incorporated for fair
comparisons between design alternatives.
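As a sketch of how this objective drives the search in the case study that follows, the C++ fragment below scores candidate configurations by yield divided by area and scaled by the data-to-code ratio, discards candidates that violate the resilience constraint, and keeps the best one. The candidate names mirror the designs discussed below, but every yield, area, and resilience value in the list is a placeholder chosen only to illustrate the selection step, not a result from this dissertation.

#include <cstdio>

// A candidate memory configuration and the quantities needed to evaluate the
// modified objective of Equation (5.12).  All numeric values are placeholders.
struct Candidate {
    const char* name;
    double yield;          // estimated chip yield
    double area;           // estimated area in lambda^2
    double dataToCode;     // Ld / Lc for the chosen ECC
    double resilience;     // remaining soft-error resilience
};

int main() {
    const double resilienceConstraint = 0.99;
    const Candidate candidates[] = {
        {"DEC-255, (3,2) spares", 0.93, 3.3e8, 239.0 / 255.0, 0.990},
        {"DEC-127, (2,2) spares", 0.95, 3.0e8, 113.0 / 127.0, 0.999},
        {"DEC-63,  (1,2) spares", 0.92, 3.2e8,  51.0 /  63.0, 0.9999},
    };

    const Candidate* best = nullptr;
    double bestScore = 0.0;
    for (const Candidate& c : candidates) {
        if (c.resilience < resilienceConstraint) continue;   // constraint check
        double score = c.yield / c.area * c.dataToCode;      // Equation (5.12)
        if (score > bestScore) { bestScore = score; best = &c; }
    }
    if (best)
        std::printf("Best: %s (objective %.3e per lambda^2)\n", best->name, bestScore);
    return 0;
}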
B. Case Study and Design
We now show the process of identifying an optimal design in terms of yield-per-area for
given constraints on soft-error correcting capability. For simplicity of the
experimental setup, we do not include performance constraints in this case study.
Note that experiments that include the performance constraints are carried out in the
appendix. Given a defect rate of 100 defects/cm^2 and an SER of 10^8 FITs/Mb,
we identify the optimal design for the given memory size of 256Kb and the
word width of 32 bits. The constraint on soft-error resilience is given as 0.99. We
now take the following steps to identify the optimal design.
1. We first consider six different configurations of ECC protections schemes,
namely DEC with the code length of 255 (DEC-255), SEC with the code
length of 255 (SEC-255), DEC with the code length of 127 (DEC-127),
SEC with the code length of 127 (SEC-127), DEC with the code length of
63 (DEC-63), and SEC with the code length of 63 (SEC-63). Their soft-error
resilience values for the given SER are 0.99, 0.84, 0.999,
0.917, 0.9999, and 0.958, respectively. Considering the constraints on soft-
error resilience, we exclude three candidates, namely SEC-255, SEC-127,
and SEC-63.
2. For the remaining candidates, we look into the possibility of using ECC
protection schemes for handling one hard defect that causes the whole
column to be defective. If ECC schemes are used for covering such a
defective column, their soft-error resiliencies drop to 0.918 for DEC-255,
0.978 for DEC-127, and 0.994 for DEC-63. Considering the constraint on
soft-error resilience, the first two schemes cannot be used to fix a defective
column. Note that all of the three schemes can be used to handle hard
defects that cause single cells to be defective, since the use of ECC against
this type of hard defect results in an infinitesimal loss of the soft-error
correcting capability of the ECC protection scheme. In this example, the loss
in soft-error resilience due to such use appears only in the fourth digit
after the decimal point.
3. We then compute optimal numbers of spares for the remaining three
candidates that can achieve the maximum values of yield-per-area. The
optimal numbers of spares for the three candidates are (3, 2) spares for
DEC-255, (2, 2) spares for DEC-127, and (1, 2) spares for DEC-63. The
maximum yield-per-area values for each of them are 2.91×10^-9, 3.09×10^-9,
and 2.87×10^-9, respectively. Therefore, we finally pick the design that has
DEC with the code length of 127 and (2, 2) spares.
So far, we have shown how to find the optimal design that achieves the maximum yield-per-area
while the constraints on soft-error resilience are met. The following are some
observations made during the evaluations.
1. As for the performance overheads due to implementing spares and ECC
protection schemes, the overheads are respectively 2.87ns (DEC-255),
2.29ns (DEC-127), and 1.83ns (DEC-63). The dominant overheads are
attributed to the ECC protection schemes, since the number of spares is small,
which keeps the delay overheads associated with the spare switching
circuitry small. If tight performance constraints are given, we may consider using
SEC code schemes to meet the timing constraints. For instance, SEC-255, SEC-127,
and SEC-63 have timing overheads of 1.85ns, 1.43ns, and 1.14ns,
respectively.
2. For a fixed error correcting capability, the dominant area overhead is the area of
the check-bit array for a wide range of code lengths. However, the complexities
of the encoder/decoder grow with code length, and the encoder/decoder overheads
can be comparable to check-bit array overheads. For instance, the relative
area overheads of encoder/decoder logic over check-bit array for SEC for
code lengths of 63, 127, and 255 are respectively 0.97%, 3.7%, and 14.4%.
3. For a given code length, the complexities of both the check-bit array and the
encoder/decoder logic grow with the correcting capability. However, the
degree of increase in encoder/decoder complexities is much larger than that
of the check-bit array. For instance, comparing SEC and DEC for code length
255, the percentage increases in area overheads for encoder/decoder logic and
check-bit array are respectively 249% and 89%.
5.8 Concluding Remarks
We propose an integrated approach to use ECC and spares and quantify benefits
associated with their integrated use. We derive a new model that captures not only
the classical benefits of spares and ECC in terms of yield and resilience to soft errors,
respectively, but also ECC’s capabilities to handle hard defects. Our approach is the
first to consider yield enhancement as well as any decrease in soft error resilience.
We develop the first yield-resilience model that quantifies all effects of our
integrated approach. We also quantify the overheads associated with ECC
implementations. We then use the quantified models to demonstrate the benefits of
our framework by applying our integrated approach to various example cases.
The case study presented in the last part of this dissertation illustrates the overall
optimization process that incorporates all the design alternatives that we have
explored throughout this dissertation.
Chapter 6
Summary and Future Research
6.1 Summary
This thesis explores optimal self-repairable memory designs that maximize yield-
per-area for given constraints on performance and soft-error resilience. We first
develop a yield model that not only accurately captures the structural characteristics
of memory array but also reflects yield improvement associated with spares. We
showed that our yield model behaves similarly to the yield model based on
the multivariate binomial distribution that includes empirically-computed clustering
factors. We then use this model to identify the optimal granularity and the optimal
numbers of spares that maximize yield-per-area for a given memory size. Our
experiments show that the optimal size of a sub-array (i.e., the optimal granularity) tends
to become finer as the defect density increases, while it remains almost
unchanged regardless of memory size for a given defect density.
We then explore the optimal sub-array interconnect design that maximizes yield-
per-area. Our experiments indicate that, for low levels of defect densities and small
(or intermediate) number of banks, sub-array interconnects impact memory yield
significantly, well in excess of their proportion of the overall area. For very high
defect densities and large number of banks, however, both yield and area overheads
of interconnects significantly affect memory’s overall yield-per-area values.
As for bit-cell level physical layout changes for optimizing yield-per-area, we
showed that the array with the smallest possible cell size with spare rows and
columns is always better than bit-cell level layout changes for a wide range of
defect densities. However, for extremely high defect densities, which are as yet
unprecedented but likely to occur in radically new nano technologies, bit-cell
level layout changes can be as effective as spare-based approaches. This is directly associated
with the observations made on optimal granularities: finer granularity tends to
achieve better yield-per-area with the increase of defect densities. If the optimal level
of granularity goes down to the level of bit-cells, the corresponding spares sizes will
become comparable to the sizes of bit-cells. In such cases, bit-cell level layout
changes will be useful for yield-per-area optimization.
We next study the integrated use of ECC schemes and spares, which enables a reduction in
the number of required spares, thereby decreasing the performance and area overheads
due to spares. We showed that an ECC scheme is effective for handling single-cell
defects, resulting in minimal loss of the ECC's error correcting capability, while the
use of ECC against other types of defects, namely column defects, might penalize
ECC’s error correcting capability significantly. Since ECC schemes tend to incur
much higher area overheads compared to spare circuitries, the use of unnecessarily
strong ECC protection schemes primarily for handling hard defects can make yield-
per-area sub-optimal.
6.2 Future Research
When an ECC protection scheme is designated to handle some hard defects, these defects
differ from soft errors in that their physical locations are known,
while soft errors tend to occur at random and unpredictable locations. Thus, hard
defects are erasures rather than random errors. As erasures can be corrected using
less powerful ECC than random errors [137], we can treat each hard defect fixed by
ECC as an erasure to obtain more precise and higher estimates of the remaining
resilience to soft errors. This has not been explored in this dissertation, and is a
subject of our ongoing research.
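For reference, the standard coding-theory bound behind this observation (not a result of this dissertation) is that a code with minimum distance d can simultaneously correct t random errors and e erasures whenever 2t + e ≤ d - 1, so a hard defect treated as an erasure consumes only about half of the correction budget that a random error would.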
The integrated use of ECC and spares enables many additional new tradeoffs. For
example, for applications such as multimedia, where errors in low-significant bits are
more acceptable than errors in high-significant bits, if the number of hard defects
exceeds the available spares, spares can be prioritized to fix defects in high order bits.
Low-significant bits will then likely receive help from ECC, while sacrificing the
soft error correcting capability for low order bits. The detailed analysis associated
with such applications will be subject of our future research.
In Chapter 4, we showed that bit-cell level layout optimizations cannot improve
yield-per-area, while spares can. However, such a conclusion may not be valid if a
radically new nano-fabrication technology with unprecedentedly high defect
densities is introduced, as discussed in Chapter 2, Chapter 4, and Section 6.1. In this
regard, research on layout modifications for improving yield-per-area (as an
alternative to spare approaches) under extremely high defect densities will be carried
out in our future study.
We explored the impact of granularity on yield-per-area and performance in this
dissertation. In addition to those studies, our ongoing research includes a study of power
trends for various memory configurations. Note that power is considered one of the
constraints to be met while pursuing an optimal yield-per-area
design. According to our preliminary results, much finer granularity is needed to
achieve the optimality in terms of power, compared to the optimal granularity for
yield-per-area and performance.
Bibliography
[18] N. Abdulrazzaq. Performance Testing of Data-Path Circuits. Ph. D. Dissertation,
Department of Electrical Engineering - Systems, University of Southern California,
May 2001.
[19] N. Abdulrazzaq and S. K. Gupta. Test Generation for Path-Delay Faults in One-
Dimensional Iterative Logic Circuits. In Proceedings of International Test
Conference, pages 326-335, 2000.
[20] N. Abdulrazzaq and S. K. Gupta. Path-Delay Fault Simulation for Circuits with
Large Numbers of Paths for Very Large Test Sets. In Proceedings of VLSI Test
Symposium, pages 186-193, 2003.
[21] S. Al-Harbi. Functional testing of Constrained and Unconstrained Memories
Using March Tests. Ph. D. Dissertation, Department of Electrical Engineering -
Systems, University of Southern California, August 2001.
[22] S. Al-Harbi and S. K. Gupta. A New Methodology for Automatic Generation of
Optimal March Tests for Unlinked Faults. In Proceedings of VLSI Test Symposium,
2001.
[23] S. Al-Harbi and S. K. Gupta. Generating Complete and Optimal March Tests for
Linked Faults in Memories. In Proceedings of VLSI Test Symposium, pages 254-261,
2003.
[24] S. Al-Harbi and S. K. Gupta. An Efficient Methodology for Generating Optimal
and Uniform March Tests for Unlinked Faults in Constrained and Unconstrained
Memories. Submitted to IEEE Transactions on CAD.
[66] B. Amrutur and M. Horowitz, "Speed and power scaling of SRAM's," IEEE
Journal of Solid-State Circuits, pp. 175--185, Feb 2000.
[138] Michael A. Bajura, et al., “Models and Algorithmic Limits for an ECC-Based
Approach to Hardening Sub-100-nm SRAMs”, IEEE Trans. on Nuclear Science, vol.
54, No. 4, p. 935-945, 2007
[108] G. Battaglini, B. Ciciani, “An Improved Analytical Yield Evaluation Method
for Redundant RAMs,” IEEE Int. Workshop on Memory Technology Design and
Testing (MTDT), pp. 117-123, 1998.
[101] Baumann, “Soft Error Characterization and Modeling Methodologies at TI”,
Texas Instrument, 2000
[141] Richard E. Blahut, Algebraic Codes for Data Transmission, Cambridge
University Press, 2003
[1] M. A. Breuer, S. K. Gupta, and T. M. Mak, “Defect and Error Tolerance in the
Presence of Massive Numbers of Defects”, IEEE Design and Test of Computers, vol.
21, pp. 216-227, 2004.
[44] M. A. Breuer, Let's Think Analog. In Annual Symposium on VLSI, pp. 2-5,
March 2005.
[45] M. A. Breuer, Multi-media Applications and Imprecise Computation. In 8th
EUROMICRO Conference on Digital System Design, 2005.
[71] Kanad Chakraborty , Shriram Kulkarni , Mayukh Bhattacharya , Pinaki
Mazumder , Anurag Gupta, A physical design tool for built-in self-repairable RAMs,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, v.9 n.2, p.352-
364, April 2001
[94] Jae Chul Cha, Sandeep K. Gupta, Characterization of granularity and
redundancy for SRAMs for optimal yield-per-area, In Proceedings of International
Conference on Computer Design, pages 219-226, 2008
[115] Yung-Yuan Chen, Shambhu J. Upadhyaya, “Modeling the Reliability of a
Class of Fault-Tolerant VLSI/WSI Systems Based on Multiple-Level Redundancy”,
IEEE Transaction on Computers (TC), vol. 43, no. 6, 1994
[56] T. Chen and G. Sunada, "Design of a Self-Testing and Self-Repairing Structure
for Highly Hierarchical Ultra-Large Capacity Memory Chips", IEEE Trans.on VLSI
Systems, Vol. 1, No. 2, June 1993, pp. 88-97
[62] C. L. Chen and M. Y. Hsiao, "Error-correcting codes for semiconductor
memory applications: A state-of-the-art-review," IBM J. Res. Develop., vol. 28, no.
2, pp. 124-134, Mar. 1984.
[142] C.-K. Cheng, J. Lillis, S. Lin, and N. Chang, Interconnect Analysis and
Synthesis. New York: Wiley, 1999.
[25] H. Cheung, Logic and Current Testing of Bridging Faults Using Severity
Metrics. Ph. D. Dissertation Proposal, Department of Electrical Engineering -
Systems, University of Southern California, 2005.
[26] H. Cheung and S. K. Gupta. A Framework to Minimize Test Escape and Yield
Loss during IDDQ Testing: A Case Study. In Proceedings of VLSI Test Symposium,
pages 89-96, 2000.
[27] H. Cheung and S. K. Gupta. IDDQ Profiles: A Technique to Reduce Test
Escape and Yield Loss during IDDQ Testing. In DBT, pages 45-50, 2000.
[46] I. S. Chong and A. Ortega, Hardware Testing for Error Tolerant Multimedia
Compression Based on Linear Transforms. In Defect and Fault Tolerance in VLSI
Systems Symposium, 2005.
[47] H. Chung and A. Ortega, Analysis and Testing for Error Tolerant Motion
Estimation. In Defect and Fault Tolerance in VLSI Systems Symposium, 2005.
[28] K.-Y. Chung and S. K. Gupta. Structural Delay Testing of Latch-Based High-
Speed Pipelines with Time Borrowing. In Proceedings of International Test
Conference, pages 1089-1097, 2003.
[29] K.-Y. Chung and S. K. Gupta. Structural Delay Testing of Latch-Based
Pipelines with Time Borrowing Assuming Restricted Scan. In Proceedings of VLSI
Test Symposium, 2006.
[2] I. L. Chuang and M. A. Nielsen. Quantum Computation and Quantum
Information. Cambridge Series on Information, 2000.
[136] Rudrajit Datta, Nur A. Touba, "Exploiting Unused Spare Columns to Improve
Memory ECC,", pp.47-52, 2009 27th IEEE VLSI Test Symposium
[67] R.J. Evans and P.D. Franzon. "Energy Consumption Modeling and
Optimization for SRAM's," IEEE Journal of Solid-State Circuits, Vol. 30, no. 5, pp.
571-9, May 1995
[72] B. Gao and R. Mehrotra, “Physical Compiler Unlimited”, Synopsys User Group
Meeting, 2005.
[98] A. B. Glaser and G. E. Subak-Sharpe, Integrated Circuit Engineering,Reading,
MA: Addison-Wesley, vol. 10, 1997, pp. 769–795.
[49] N. K. Gokli, Software Interface, Hardware Architecture, and Test Methodology
for the Nvidia NV34 Graphics Processor. M.S. Thesis, Department of Electrical
Engineering - Systems, University of Southern California, 2006.
[130] Graham, “Soft errors a problem as SRAM geometries shrink” ebn, 28 Jan 2002
[119] Said Hamdioui, A. J. van de Goor, “An experimental analysis of spot defects
in SRAMs: realistic fault models and tests”, in Proc. IEEE Asian Test Symp. (ATS),
pp.131-138, 2000
[75] H. H. Hana et al., “High speed multi-port static RAM silicon compiler,” in Proc.
IEEE Custom Integrated Circuits Conf., 1989, pp23.6.1-4
[131] Harling, “Embedded DRAM Has a Home in the Network Processing World”
Integrated System Design, 2001
[124] David Harris, http://www.cmosvlsi.com
[59] V.G. Hemmady, S.M. Reddy, "On the Repair of Redundant RAMS", 26th
ACWIEEE Design Automation Conference, pp. 710- 712, 1989.
[133] M. Horiguchi, et al., “A flexible redundancy technique for high-density
DRAM,” IEEE J. Solid-State Circuits, vol. 26, p. 12–17, 1991.
[48] T-Y Hsieh, K-J Lee and M.A. Breuer, An Error-oriented Test Methodology to
Improve Yield with Errortolerance. In VLSI Test Symposium, 2006.
[68] H. Ikeda and H. Inukai, "High-Speed DRAM Architecture Development," IEEE
J. Solid-State Circuits,vol. 34, May 1999, pp. 685-692.
[3] International Technology Roadmap for Semiconductors (ITRS), 2001.
[4] A. Jee, and F. J. Ferguson, “Carafe: An Inductive Fault Analysis Tool for CMOS
VLSI Circuits”, Proc. IEEE VLSI Test Symposium, pp. 92-98, 1993.
[30] N. Jha and S. K. Gupta. Testing of Digital Systems. Cambridge University
Press, 2004.
[31] Z. Jiang. Advanced Test Generation Techniques: Improving Yield and
Protecting Intellectual Property. Ph. D. Dissertation, Department of Electrical
Engineering - Systems, University of Southern California, 2005.
[32] Z. Jiang and S. K. Gupta. An ATPG for Threshold Testing: Obtaining
Acceptable Yield in Future Processes. In Proceedings of International Test
Conference, pages 824-833, 2002.
[33] Z. Jiang and S. K. Gupta. A Test Generation Approach for Systems-on-Chip
that Use Intellectual Property Cores. In Proceedings of Asian Test Symposium, pages
278-281, 2003.
[50] Z. Jiang and S. Gupta. Threshold Testing: Covering Bridging and Other
Realistic Faults. In Asian Test Symposium, 2005.
[95] A.H. Johnston, “Scaling and Technology Issues for Soft Error Rates,” 4th
Annual Research Conference on Reliability, Stanford University ,pp. 4-5 (October
2000).
[137] L.L. Joiner and J.J. Komo, “Errors and Erasures Decoding of BCH and Reed-
Solomon Codes for Reduced M-Ary Orthogonal Signaling,” IEEE Trans. Comm.,
vol. 51, no. 1, p. 57-62, 2003
[111] T. Kawagoe and J. Ohtani and M. Niiro and T. Ooisih and M. Hamada and H.
Hidaka, ”A Built-In Self-Repair Analyzer (CRESTA) for Embedded DRAMs”, in
Proc. IEEE International Test Conference (ITC), pp.567-574, 2000
[69] O. Kebichi and M. Nicolaidis, “A tool for automatic generation of BISTed and
transparent BISTed RAMs,” in Proc. Int. Test Conf., Oct. 1992, pp. 570–575.
[57] Ilyoung Kim , Yervant Zorian , Goh Komoriya , Hai Pham , Frank P. Higgins ,
Jim L. Lewandowski, “Built in self repair for embedded high density SRAM”,
Proceedings of the 1998 IEEE International Test Conference, p.1112-1119, October
18-22, 1998
[100] T. Kim and W. Kuo, IEEE Transactions on Semi. Manufactu. Vol. 12, 1999,
pp. 485.
[6] I. Koren, Z. Koren and C.H. Stapper, “A Unified Negative Binomial Distribution
for Yield Analysis of Defect-Tolerant Circuits”, IEEE Trans. Computers, vol. 42, no.
6, pp. 724-734, June 1993.
[65] I. Koren and Z. Koren, “Defect tolerance in VLSI circuits: Techniques and yield
analysis,” Proc. IEEE, vol. 86, no. 9, pp. 1819–1836, 1998.
[121] I. Koren and C.H. Stapper, "Yield Models for Defect Tolerant VLSI Circuits:
A Review," in Proc. IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems
(DFT), Vol. 1, pp. 1-21, 1989.
[51] K. J. Lee, T. Y. Hsieh, and M. A. Breuer. A Novel Testing Methodology Based
on Error-rate to Support Errortolerance. In International Test Conference, 2005.
[7] R. C. Leachman and C. N. Berglund, “Systematic Mechanisms Limited Yield
(SMLY) Assessment Study”, International SEMATECH, Doc. #03034383A-ENG,
March 2003.
[114] D. Lee, J.A. Abraham, and D. Rennels, “A numerical technique for the
hierarchical evaluation of large, closed fault tolerant systems,” in Dependable
Computing for Critical Applications 2, R. Schlichting and J. Meyer, Eds New York:
Springer-Verlag, pp95-114, 1992
[106] J.-F. Li, J.-C. Yeh, R.-F. Huang, C.-W.Wu, P.-Y. Tsai, A. Hsu, and E. Chow,
“A built-in self-repair scheme for semiconductor memories with 2-D redundancy”, in
Proc. Int. Test Conf. (ITC), pp. 393–402, 2003
[8] T.-M. Mak, “Adaptive Computing: How to Test?”, June 2005. (T.-M. Mak of
Intel Corporation presented this seminar at the EE-Systems Department, University
of Southern California, on June 14, 2005.)
[109] T. Mano and M. Wada and N. Ieda, M. Tanimoto, ”A Redundancy Circuit for
a Fault-Tolerant 256K MOS RAM”, IEEE Journal of Solid State Circuits, vol.SC-17,
no.4, pp.726-731,1982
[9] F.J. Meyer and D.K. Pradhan, “Modeling Defect Spatial Distributions”, IEEE
Trans. Computers, vol. 38, no. 4, pp. 538-546, April 1989.
[99] B. T. Murphy, Proc. IEEE, vol. 52, , Dec. 1964, pp. 1537–1545.
[15] J. Von Neumann, “Probabilistic Logic and Synthesis of Reliable Organisms
from Unreliable Components”, pp. 43-98, in Automata Studies, ed. C. E. Shannon
and J. McCarthy, in Princeton University Press, Princeton N. J., 1956.
[135] Michael Nicolaidis, et al., "A Diversified Memory Built-In Self-Repair
Approach for Nanotechnologies," p. 313, IEEE VLSI Test Symposium, 2004
[140] E. Normand, “Single-event effects in avionics,” IEEE Trans. Nucl. Sci., vol.
43, pp. 461–474, 1996.
[112] M. Ottavi, L. Schiano, X. Wang, Y.B. Kim, F. Meyer, and F. Lombardi, "Yield
evaluation methods of SRAM arrays: A comparative study," in Proc. IEEE Instrum.
Measurement Tech. Conf. (IMTC), vol.2, pp.1525–1530, 2004
[10] W.A. Pleskacz and W. Maly, “Improved Yield Model for Submicron Domain”,
Proc. IEEE Symp. Defect and Fault Tolerance in VLSI Systems, pp. 2-10, October
1997.
[34] Md. S. Quasem, Fault Simulation and Multiple Scan Chain Design
Methodology for Systems-on-Chip (SOC). Ph. D. Dissertation, Department of
Electrical Engineering - Systems, University of Southern California, 2005.
[35] Md. S. Quasem and S. K. Gupta, Test Information for Cores: Comparative
Analysis and Recommendations. In Testing Embedded Core Systems, 2000. 18
[36] Md. S. Quasem and S. K. Gupta, An Exact Fault Simulation for Systems on
Silicon that Protects Each Core's Intellectual Property. A poster presentation at
Design and Test in Europe, 2001.
[37] Md. S. Quasem and S. K. Gupta, Designing Multiple Scan Chains for Systems-
on-Chip. In Proceedings of Asian Test Symposium, pages 424-427, 2003.
[38] Md. S. Quasem and S. K. Gupta, Designing Multiple Reconfigurable Scan
Chains for Systems on-Chip. In Proceedings of VLSI Test Symposium, 2004.
[39] Md. S. Quasem, Z. Jiang, and S. K. Gupta, Benefits of a SoC-Specific Test
Methodology. In IEEE Design and Test, vol. 20, pages 68-77, Jun. 2003.
[145] Jan M. Rabaey, Digital integrated circuits: a design perspective, Prentice-Hall,
Inc., Upper Saddle River, NJ, 1996
[103] Rodgers et al, “Advanced Memories for Space Applications”, BAE Systems,
December 2001
[73] Karem A. Sakallah, Mark Roberts, Richard B. Brown, C. David Kibler, Ajay
Chandna, "The Aurora RAM Compiler," dac, pp. 261-266, 32nd ACM/IEEE
Conference on Design Automation Conference (DAC'95), 1995
[139] A.M. Saleh, et al., “Reliability of Scrubbing Recovery Techniques for Memory
Systems,” IEEE Trans. on Reliability, Vol.39, No.1, 1990
[79] Scaling and Technology Issues for Soft Error Rates - A Johnston - 4th Annual
Research Conference on Reliability Stanford University, October 2000
[113] L. Schiano, M. Ottavi, F. Lombardi, “Markov Models of Fault-Tolerant
Memory Systems under SEU ", IEEE Int. Workshop on Memory Technology,
Design and Testing (MTDT), pp. 38 – 43, 2004.
[110] V. Schober and S. Paul and O. Picot, ”Memory Built-In Self-Repair using
redundant word”, in Proc. IEEE International Test Conference (ITC), pp.995- 1001,
2001
[132] Stanley Schuster, ”Multiple Word/Bit Line Redundancy for Semiconductor
Memories”, IEEE Journal of Solid State Circuits, Vol.SC-13, No.5, p.698-703, 1978
[144] H. Shah, P. Shiu, B. Bell, M. Aldredge, N. Sopory, and J. Davis, “Repeater
insertion and wire sizing optimization for throughput-centric VLSI global
interconnects,” in Proc. IEEE/ACM Int.l Conf. Computer- Aided Design, 2002, pp.
280–284.
[52] S. Shahidi and S. Gupta. Estimating Error Rate During Self-Test via One's
Counting. In International Test Conference, 2006.
[53] S. Shahidi and S. Gupta. A Theory of Error-Rate Testing. In International
Conference on Computer Design, 2006. 19
[70] H. Shinohara, N. Matsumoto, K. Fujimori, Y. Tsujihashi, H. Nakao, S. Kato, Y.
Horiba, and A. Tada, “A flexible multiport RAM compiler for data path,” IEEE J.
Solid-State Circuits, vol. 26, pp. 343–349, Mar. 1991.
[78] P. Shivakumar and N. P. Jouppi, “CACTI 3.0: an integrated cache timing,
power, and area model,” Aug. 2001.
[40] W. Sirisaengtaksin and S. K. Gupta, Enhanced Crosstalk Fault Model and
Methodology to Generate Tests for Arbitrary Inter-Core Interconnect Topology. In
Proceedings of Asian Test Symposium, pages 163-169, 2002.
[41] W. Sirisaengtaksin and S. K. Gupta, Modeling and Testing Crosstalk Faults in
Inter-Core Interconnects that Include Tri-State and Bi-Directional nets. In
Proceedings of Asian Test Symposium, pages 132-139, 2004.
[42] W. Sirisaengtaksin and S. K. Gupta, A Methodology to Compute Bounds on
Crosstalk Effects in Arbitrary Interconnects. In Proceedings of Asian Test
Symposium, 2005.
[43] W. Sirisaengtaksin, Modeling and Testing Faults in Arbitrary Inter-Core
Interconnects that Include Tri-State and Bi-Directional Nets. Ph. D. Dissertation,
Department of Electrical Engineering - Systems, University of Southern California,
2007.
[125] Michael John Sebastian Smith, Application-specific integrated circuits,
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1998
[11] C.H. Stapper, “On Yield, Fault Distributions, and Clustering of Particles”, IBM
J. Research and Development, vol. 30, pp. 326-338, May 1986.
[12] C.H. Stapper, “Correlation Analysis of Particle Clusters on Integrated Circuit
Wafers”, IBM J. Research and Development, vol. 31, pp. 641-650, November 1987.
[13] C.H. Stapper, “Large-Area Fault Clusters and Fault Tolerance in VLSI Circuits”,
IBM J. Research and Development, vol. 33, pp. 162-173, March 1989.
[14] C.H. Stapper, “Small-Area Fault Clusters and Fault Tolerance in VLSI Circuits”,
IBM J. Research and Development, vol. 33, pp. 174-177, March 1989.
[60] Chin-Lung Su , Yi-Ting Yeh , Cheng-Wen Wu, “An Integrated ECC and
Redundancy Repair Scheme for Memory Reliability Enhancement”, Proceedings of
the 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI
Systems (DFT'05), p.81-92, October 03-05, 2005
[63] C.H. Stapper, A.N.McLaren, and M.Dreckmann, “Yield Model for Productivity
Optimization of VLSI. Memory Chips with Redundancy and Partially Good
Product,” IBM Journal of Research and Development, Vo1.24, No.3, 1980
[64] C. H. Stapper, Improved Yield Models for Fault-Tolerant Memory Chips, IEEE
Transactions on Computers, v.42 n.7, p.872-881, July 1993
[97] C.H. Stapper, "Fact and Fiction in Yield. Modeling", J. of Microelectronics, vol.
20, , 1989, pp. 129.
[134] C.H. Stapper, et al., ”Synergistic Fault-Tolerance for Memory Chips”,
IEEE Trans. on Computers, Vol. 41, No. 9, pp. 1078-1087, 1992
[122] Steinweg, R.L.; Zampaglione, M.; Lin, P. “A user-configurable RAM compiler
for gate arrays”, in Proc. Second Annual IEEE ASIC Seminar and Exhibit,
Page(s):P12 - 3/1-4, 1989
[74] W. Swartz, C. Giuffre, W. Banzhaf, M. deWit, H. Khan, C. McIntosh, T. Pavey,
and D. Thomas, "CMOS RAM, ROM, and PLA Generators for ASIC Applications,"
Proceedings of the 1986 IEEE Custom Integrated Circuits Conference, pp. 334 - 338.
[143] S. Takahashi, M. Edahiro, and Y. Hayashi, “Interconnect design
strategy:Structures, repeaters and materials with strategic system performance
analysis (S2PAL) model,” IEEE Trans. Electron Devices, vol. 48, no. 2, pp. 239–251,
Feb. 2001.
[90] J. Tou, P. Gee, J. Duh, and R. Eesley, "A Sub-micron CMOS Embedded SRAM
Compiler," Proceedings of the 1991 IEEE Custom Integrated Circuits Conference,
pp. 22.3.1-22.3.4.
[58] R. Treuer, V.K. Agarwal, "Built-In Self-Diagnosis for Repairable Embedded
RAMS", IEEE Design & Test of Computers, Vol. 10, No. 2, pp. 24-33, June 1993.
[16] D. M. H. Walker and S. W. Director, “VLASIC: A Catastrophic Fault and Yield
Simulator for Integrated Circuits”, IEEE Trans. on CAD of Integrated Circuits and
Systems, vol. 5, pp. 541-556, October 1986.
[105] Clair Webb, et al., 45nm Design for Manufacturing, Intel Technology Journal,
Vol 12, Issue 02, ISSN 1535-864x, 2008
[147] Neil H.E. Weste, David Harris, CMOS VLSI Design: A Circuits and Systems
Perspective 3rd Ed., Addison Wesley 2006
[61] C. Wickman , D. Elliott , Bruce F. Cockburn, “Cost Models for Large File
Memory DRAMs with ECC and Bad Block Marking”, Proceedings of the 14th
International Symposium on Defect and Fault-Tolerance in VLSI Systems, p.319,
November 01-03, 1999
[107] T. Yamagata, H. Sato, K. Fujita, Y. Nishmura and K. Anami, ``A Distributed
Globally Replaceable Redundancy Scheme for Sub-Half-micron ULSI Memories
and Beyond," IEEE J. of Solid-State Circuits, vol. 31, pp. 195-201, 1996.
[116] K. Yamashita, S. Ikehara, "A Design and Yield Evaluation Technique for
Wafer-Scale Memory", IEEE Computer, vol. 25, no. 4, 1992.
[54] O.W. Yeung and K. M. Chugg. An Iterative Algorithm and Low Complexity
Hardware Architecture for Fast Acquisition of PN codes in UWB systems. In
Springer J. VLSI Signal Processing (special issue on Ultrawideband Systems), 2006.
[117] H. Walker, S.W. Director, "VLASIC: A Catastrophic Fault Yield Simulator for
Integrated circuits", IEEE Transactions on Computer Aided Design (TCAD), vol. 5,
no. 4, 1986.
[118] X. Wang, M. Ottavi, and F. Lombardi, “Yield analysis of compiler-based
arrays of embedded SRAMs,” in Proc. IEEE Int. Symp. Defect Fault Tolerance
VLSI Systems(DFT), pp.3-10, 2003.
[126] Clair Webb, et al., 45nm Design for Manufacturing, Intel Technology Journal,
Vol 12, Issue 02, ISSN 1535-864x, 2008
[127] T. Granlund et al., “Soft Error Rate Increase for New Generations of
SRAMs,” IEEE Trans. Nuclear Science, vol. 50, pp. 2065-2068, Dec. 2003.
[120] K. Zarrineh, A. P. Deo, and R. D. Adams, “Defect analysis and realistic fault
model extensions for static random access memories”, in Proc. IEEE Int. Workshop
on Memory Technology, Design and Testing (MTDT), pp. 119–124, 2000
[96] Morelos Zaragoza, Robert H., The Art of Error Correcting Codes, J. Wiley &
Sons, 2002.
[55] H. Zhu. Error-tolerance in Digital Speech Recording Systems. M.S. Thesis,
Department of Electrical Engineering - Systems, University of Southern California,
2006.
[104] J. F. Ziegler, “Terrestrial cosmic rays,” IBM J. Res. Develop., vol. 40, no. 1,
pp. 19–39, Jan. 1996
[17] Ziegler, et al., “IBM Experiments in Soft Fails in Computer Electronics (1978 -
1994)”, IBM J. of Research and Development, vol. 40, no. 1, January 1996. 17
[76] http://www.mosys.com/Technology/Technology-Overview.aspx
[77] Zhu, Z.; Johguchi, K.; Mattausch, H.J.; Koide, T.; Hironaka, T., "Low power
bank-based multi-port SRAM design due to bank standby mode," Circuits and
Systems, 2004. MWSCAS '04. The 2004 47th Midwest Symposium on , vol.1, no.,
pp. I-569-72 vol.1, 25-28 July 2004
[80] http://download.intel.com/technology/architecture-silicon/65nm-technology/tech
nology-manufacturing-leadership.pdf
[81] download.intel.com/technology/itj/q41997/pdf/manufacturing.pdf
[82] www.viragelogic.com
[83] www.artisan.com
[84] www.mosys.com
[85] www.novelics.com
[86] www.ti.com
[87] www.faraday-tech.com
[88] www.uniramtech.com
[89] www.dolphin-ic.com
[91] http://www.eetasia.com/ART_8800474643_499486_NP_f76d0612.HTM
[92] http://www.chipworks.com/blogs.aspx?id=2706&blogid=86
[93] http://notes.sematech.org/index4.htm
[123] paradise.ucsd.edu/class/ece165/notes/lecC.pdf
[128] Measurement and Reporting of Alpha Particles and Terrestrial Cosmic Ray-
Induced Soft Errors in Semiconductor Devices, JEDEC JESD 89.
[146] http://www.mosis.com/Technical/Testdata/tsmc-018-prm.html
[5] The Electrical Engineering Handbook, R. C. Dorf, editor-in-Chief, Fault
Tolerance, Chapter 87, by B. W. Johnson, CRC Press, 1993, pp. 2020.
Appendix A:
Consideration on Performance
Overall access time overhead is composed of row and column access times. Row
access time is the time required for a certain wordline to be activated, while column
access time is associated with discharging a pre-charged bitline. As for row decoders,
we use dynamic NOR type decoders [124][145], which can facilitate matching the
pitches of decoders to the dimensions of SRAM cells (Figure A(a)).
[Figure A(a) shows the dynamic row decoder (precharge device, address signal inputs, wordline drivers) together with a single SRAM cell connected to its wordline and bitline; Figure A(b) shows the cell arrays, the multiplexers, and the data in/out path of the sub-array interconnect.]
Figure A. (a) A detailed view of row decoders and a single cell in sub-array blocks and (b)
sub-array interconnect topology.
As shown in the figure, only one output of the dynamic row decoder remains
unchanged (i.e., maintained as charged) while the others are discharged by address
signals. The output of dynamic row decoders should drive large capacitive wordline,
which is mostly due to gate capacitances connected to wordline (as shown in Figure
A(a)) and wire capacitances of wordline. Therefore, inverter chains with progressive
sizing are incorporated for delay optimization. Transistors in inverter chains are
oriented horizontally to match pitches with SRAM core cells. Input address signals
also need inverter chains to drive large numbers of gate capacitances in row decoders.
As for the column access time, the discharging time of a precharged bitline mostly
depends on the source/drain capacitances (from individual cells) attached to the bitline,
wire capacitance of bitline, and multiplexers that select desired data from sub-arrays
(Figure A(b)). As for precharging time, it is the time it takes for recovering
discharged bitlines back to the precharged state.
As for sense amplifiers, we use latch-type devices [145], and, to keep the sense
amplifiers as small as possible, isolation transistors (i.e., on/off
switches) are installed at the ends of the bitlines (but before the sense
amplifiers) [145], which can physically isolate most of the capacitive load due to the
bitlines when necessary (Figure B(b)). This significantly reduces the load
seen by the sense amplifiers, thereby making the time for the sense amplifying phase
negligible. Such small amplifiers, consisting of five transistors, are
comparable with six-transistor SRAM core cells in terms of area.
Therefore, even if such a sense amplifier is attached at the end of each bitline, the
total area overhead can be estimated as no more than a single row of the SRAM array.
Inverter chains for the input address signals and wordline activation are sized
following the FO4 rule, and their delays can be computed using logical effort
analysis. As for the delay of the dynamic NOR row decoder, the discharging time can be
computed using distributed RC model analysis (the Elmore delay model), as shown in
Figure B(a). As for bitline discharging, it is accomplished through the ground path in the
SRAM cell, as shown in Figure B(b). Note that the bitline is also loaded by the word-select
multiplexers and sub-array select multiplexers as well as the SRAM transistors.
These contributions are incorporated into the equivalent distributed RC model
shown in Figure B(b).
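For reference, the Elmore delay of such a distributed RC chain is simply the sum, over every node, of the node capacitance times the total resistance between that node and the driver. The C++ sketch below computes it for an arbitrary RC ladder; the element values are illustrative placeholders, not values extracted from the TSMC 180nm models used in this dissertation.

#include <cassert>
#include <cstdio>
#include <vector>

// Elmore delay of a simple RC ladder: node i is reached through resistances
// R[0..i] and contributes C[i] times the sum of those resistances.
double elmoreDelay(const std::vector<double>& R, const std::vector<double>& C) {
    assert(R.size() == C.size());
    double delay = 0.0, rUpstream = 0.0;
    for (std::size_t i = 0; i < R.size(); ++i) {
        rUpstream += R[i];          // total resistance from driver to node i
        delay += rUpstream * C[i];  // contribution of the capacitance at node i
    }
    return delay;
}

int main() {
    // Illustrative bitline discharge path: SRAM pull-down, wire segment,
    // word-select mux, interconnect muxes, isolation transistor.
    std::vector<double> R = {8e3, 2e3, 4e3, 3e3, 3e3, 2e3};                    // ohms
    std::vector<double> C = {5e-15, 200e-15, 10e-15, 15e-15, 15e-15, 8e-15};   // farads
    std::printf("Elmore delay: %.3f ns\n", elmoreDelay(R, C) * 1e9);
    return 0;
}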
[Figure B(a) shows the discharging path of the dynamic NOR row decoder (precharge device and wordlines 0-3) and its equivalent circuitry (R_Ntype,prech, R_Ntype,dec, C_eq,vertical, C_eq,horizontal); Figure B(b) shows the column discharging path through the SRAM cell, the word-select multiplexer, the sub-array interconnect multiplexers, the isolation transistors, and the latch-type sense amplifier with regenerative feedback, together with its equivalent distributed RC circuitry (R_SRAM,eq; R_wire, C_wire; R_mux,word, C_mux,word; R_mux,int 0 ... R_mux,int N, C_mux,int 0 ... C_mux,int N; R_isolation trans, C_isolation trans).]
Figure B. (a) Discharging mechanism of the NOR row decoder and its equivalent circuitry, and
(b) the bitline discharging path and its equivalent distributed RC circuitry.
The above approaches for estimating the access time are implemented in C++ under the
TSMC 180nm technology. This performance model is used in conjunction with the yield,
area, and soft-error rate models to identify optimal designs in terms of yield-per-area
for given constraints on performance and soft-error resilience, as shown in the case
study in Appendix B.
Appendix B:
Case Study
In this case study, we show the overall process of identifying an optimal SRAM
configuration under given performance and soft-error resilience constraints, by
integrating all the methodologies that we have developed in this dissertation.
Objective: 16Mb SRAM that maximizes yield-per-area
Specification: access time < 9ns, Soft-error resilience > 0.9, and word length = 64bits
Environment: defect rate of 100 defects/cm^2 and soft-error rate of 10^7 FITs/Mb
Step 1. For many possible bank sizes and aspect ratios, architectures with the access
time of less than 9 ns are chosen as candidate designs. Figure C shows access time
overheads for different configurations of bank sizes and their aspect ratios. Note that
the aspect ratios of the bank modules are constrained to 1:1, 2:1, or 1:2, considering
floorplanning. Among the list of configurations, architectures with the bank sizes of
512 x 1024, 1024 x 1024, and 1024 x 2048 are selected, which respectively have access
time overheads of 8.39 ns, 8.16 ns, and 7.64 ns.
[Figure C plots access time (ns, up to about 25 ns) against bank size (64K through 16M) for height-to-width ratios of 1:1, 2:1, and 1:2.]
Figure C. Access time comparisons for different bank sizes and aspect ratios.
Step 2. As for the ECC code, double error correcting codes are used, since the use of
single error correcting codes fails to achieve the soft-error resilience of 0.9 for 16Mb.
DECs with code lengths of 512, 256, and 128 achieve resiliences of 0.7675, 0.9355,
and 0.9836, respectively. We therefore constrain our design candidates to DEC-256
and DEC-128.
Step 3-a. We now identify the optimal numbers of spare rows and columns that achieve
the maximum yield-per-area for each possible configuration. We then find the overall
optimal design among them. Table A shows the yield-per-area values for the possible
configurations with various numbers of spares. According to the table, the optimum is
achieved when the architecture is composed of the bank size of 1024x2048 (i.e., 8 banks
for the 16Mb system), DEC with the code length of 256, and 7 spare columns and 6 spare
rows. The optimal spacing and width values of the sub-array interconnects are computed
as 14 λ. (See Step 3-b.)
Table A. Yield-per-area values (λ^-2) for each possible configuration when the number of
spares is varied.

Spares          Bank size 512x1024       Bank size 1024x1024      Bank size 1024x2048
(Cols, Rows)    DEC-256     DEC-128      DEC-256     DEC-128      DEC-256     DEC-128
(0, 0)          2.75E-17    2.57E-17     2.85E-17    2.66E-17     3.00E-17    2.80E-17
(1, 0)          4.83E-15    4.52E-15     3.49E-15    3.26E-15     2.10E-15    1.96E-15
(0, 1)          1.11E-15    1.04E-15     9.50E-16    8.88E-16     7.29E-16    6.81E-16
(1, 1)          2.91E-12    2.72E-12     9.28E-13    8.67E-13     1.97E-13    1.84E-13
(1, 2)          7.06E-12    6.60E-12     2.80E-12    2.62E-12     7.27E-13    6.79E-13
(2, 1)          9.59E-12    8.97E-12     4.40E-12    4.11E-12     1.26E-12    1.17E-12
(2, 2)          2.77E-11    2.59E-11     1.74E-11    1.63E-11     6.41E-12    5.99E-12
(2, 3)          3.15E-11    2.95E-11     2.28E-11    2.13E-11     1.04E-11    9.75E-12
(3, 2)          3.34E-11    3.13E-11     2.61E-11    2.44E-11     1.36E-11    1.27E-11
(3, 3)          3.84E-11    3.59E-11     3.53E-11    3.30E-11     2.39E-11    2.24E-11
(3, 4)          3.90E-11    3.65E-11     3.72E-11    3.48E-11     2.80E-11    2.62E-11
(4, 3)          3.92E-11    3.67E-11     3.84E-11    3.59E-11     3.12E-11    2.91E-11
(4, 4)          3.98E-11    3.72E-11     4.07E-11    3.80E-11     3.71E-11    3.47E-11
(4, 5)          3.98E-11    3.72E-11     4.09E-11    3.83E-11     3.88E-11    3.62E-11
(5, 4)          3.99E-11    3.73E-11     4.12E-11    3.85E-11     4.02E-11    3.76E-11
(5, 5)          3.99E-11    3.73E-11     4.16E-11    3.88E-11     4.21E-11    3.94E-11
(5, 6)          3.98E-11    3.72E-11     4.16E-11    3.88E-11     4.26E-11    3.98E-11
(6, 5)          3.99E-11    3.73E-11     4.16E-11    3.89E-11     4.30E-11    4.02E-11
(6, 6)          3.98E-11    3.72E-11     4.16E-11    3.89E-11     4.35E-11    4.06E-11
(6, 7)          3.97E-11    3.72E-11     4.16E-11    3.89E-11     4.35E-11    4.07E-11
(7, 6)          3.98E-11    3.72E-11     4.16E-11    3.89E-11     4.37E-11    4.08E-11
(7, 7)          3.97E-11    3.71E-11     4.16E-11    3.89E-11     4.37E-11    4.09E-11

- The data-to-code ratio is included in the yield-per-area calculations.
Step 3-b. The yields and area overheads associated with the sub-array (i.e., bank)
interconnects are considered in conjunction with Step 3-a. For the given memory size
(i.e., 16Mb) with the fixed number of banks (i.e., 8), the total area, including the ECC
schemes (DEC-128 and DEC-256) and spares (up to 7 spare rows and 7 spare columns),
varies over the range of 18898375842 λ^2 to 19256333358 λ^2. For this range of areas,
the optimal spacing and width values for the interconnects are between 13.867 λ and
13.996 λ, indicating that the optimal width and spacing are fixed at 14 λ, regardless of
the choice of ECC scheme and spares.
Abstract
Since the advent of computer-aided design (CAD) of digital systems, technological constraints and advancements as well as market forces have been major drivers of the direction of design methodologies. We have seen the emphasis shift from area minimization in the LSI era, to delay minimization in the early part of VLSI era, and to power minimization in the recent decade. Given current technology trends, the IC industry will soon need to address the next paradigm shift in CAD of digital systems, since digital system design will soon confront computing technologies and fabrication processes with extremely high levels of variations in the values of key parameters, such as defect densities and soft error rates. While defect-tolerance (DT) and fault-tolerance (FT) techniques have matured over the past 50 years, they have been applied to a limited class of digital subsystems and in an era of relatively low defect densities and low soft error rates. Furthermore, FT techniques have been largely confined to avionics and other critical systems where cost is not a main objective. In contrast, we must soon apply these approaches to the entire range of digital systems, including those with strict constraints on cost, performance, and power. A direct extrapolation of how DT/FT techniques are currently applied will erode much of the benefits of new processes/technologies. Our early results clearly show that careful application of DT/FT techniques will provide significant gains in the near future and will become increasingly important thereafter. Our results also show that a large space of possible ways of applying these techniques must be searched carefully to obtain efficient designs. This makes it imperative to develop new systematic approaches to efficiently apply current and new DT and FT techniques to all digital systems.