PERFORMANCE ISSUES IN NETWORK ON CHIP FIFO
QUEUES
by
Aniket Kadkol
A Thesis Presented to the
FACULTY OF THE SCHOOL OF ENGINEERING
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(ELECTRICAL ENGINEERING)
December 2004
Copyright 2004 Aniket Kadkol
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my thesis advisor and committee
chair Dr. Alice C. Parker whose constant guidance and encouragement has been a
major force behind the materialization of this thesis project. A word of thanks to
Dr. Won Namgoong and Dr. Hossein Hashemi for agreeing to be on the committee
and for giving helpful advice from time to time. I would like to thank Dr. Won Namgoong
especially for his lectures and notes on SRAM design which have been really
helpful for this research work.
I would like to thank Praveen Krishnanunni, my project partner and a close friend,
whose discussions and ideas from his research on Area comparison of Network
Processor Network Queues gave me more insight into this work and were helpful
for the completion of this thesis.
Finally, I would like to thank my parents and my brother Anup for constantly
being there for me through my Master’s studies at USC. This thesis is dedicated
to them.
TABLE OF CONTENTS
Acknowledgements
List of Tables
List of Figures
List of Graphs
List of Nomenclatures
Abstract
Chapter 1: Introduction
Chapter 2: Related Work
Chapter 3: FIFO System Architecture
Chapter 4: Memory
Chapter 5: Address Generation Block
Chapter 6: Flag Generation Block
Chapter 7: Analysis and Results
Chapter 8: Conclusion and Future Work
Bibliography
LIST OF TABLES
Table 1 Read Cycle time for Counter and Decoder type of addressing for 0.18µ
Table 2 Read Cycle time for Counter and Decoder type of addressing for 0.15µ
Table 3 Read Cycle time for Counter and Decoder type of addressing for 0.13µ
Table 4 Read Cycle time for Counter and Decoder type of addressing for 0.09µ
Table 5 Read Cycle time for ring pointer type of addressing for 0.18µ
Table 6 Read Cycle time for ring pointer type of addressing for 0.15µ
Table 7 Read Cycle time for ring pointer type of addressing for 0.13µ
Table 8 Read Cycle time for ring pointer type of addressing for 0.09µ
Table 9 Read Cycle time for counter and decoder type of addressing for 0.18µ with reduced power supply
Table 10 Write Cycle time for Counter and Decoder type of addressing for 0.18µ
Table 11 Write Cycle time for Counter and Decoder type of addressing for 0.15µ
Table 12 Write Cycle time for Counter and Decoder type of addressing for 0.13µ
Table 13 Write Cycle time for Counter and Decoder type of addressing for 0.09µ
Table 14 Write Cycle time for ring pointer type of addressing for 0.18µ
Table 15 Write Cycle time for ring pointer type of addressing for 0.15µ
Table 16 Write Cycle time for ring pointer type of addressing for 0.13µ
Table 17 Write Cycle time for ring pointer type of addressing for 0.09µ
Table 18 Read Cycle time for ring pointer type of addressing for 0.18µ
Table 19 Read Cycle time for ring pointer type of addressing for 0.15µ
Table 20 Read Cycle time for ring pointer type of addressing for 0.13µ
Table 21 Read Cycle time for ring pointer type of addressing for 0.09µ
Table 22 Write Cycle time for ring pointer type of addressing for 0.18µ
Table 23 Write Cycle time for ring pointer type of addressing for 0.15µ
Table 24 Write Cycle time for ring pointer type of addressing for 0.13µ
Table 25 Write Cycle time for ring pointer type of addressing for 0.09µ
LIST OF FIGURES
Figure 1 Basic FIFO block
Figure 2 Generic Shift-Register Cell Composition
Figure 3 Ring Pointer FIFO Architecture
Figure 4 6 Transistor SRAM Cell
Figure 5 Write circuit for the SRAM cell
Figure 6 Dual Port SRAM Cell
Figure 7 1 Transistor DRAM Cell
Figure 8 3 Port 3 Transistor DRAM Cell
Figure 9 Refresh Control Logic
Figure 10 Logic inside the comparator block in the Refresh Control Logic Block
Figure 11 Sense Amplifier Circuitry for SRAM Cell
Figure 12 Open bit-line architecture with dummy cells
Figure 13 Pre-decoder and Decoder chain to drive the wordlines for a 1024 location memory
Figure 14 2 bit Ring Pointer circuit
Figure 15 Flag generator block
Figure 16 Flag Generation Logic
Figure 17 Read Clock for SRAM
Figure 18 Write Clock for SRAM
LIST OF GRAPHS
Graph 1 Delay variations with process changes
Graph 2 Variations in Read Cycle times for different number of locations with decoder and counter type of addressing method
Graph 3 Delay comparison graph for different addressing methods
Graph 4 Variations in Write Cycle times for different number of locations with decoder and counter type of addressing method
Graph 5 Variations in Write Cycle times with process changes
Graph 6 Variations in Read/Write Cycle times with process changes
Graph 7 Variations in Read Cycle times for different types of memory
Graph 8 Comparison of Read Cycle times and area for a 2-port DRAM with the 3-port DRAM
Graph 9 Comparison of Read Cycle times and area for a 3-port DRAM with the 2-port SRAM
Graph 10 Variations in Write Cycle times for different types of memory
LIST OF NOMENCLATURES
Mbps - megabits per second
Gbps - gigabits per second
kb - kilobit
ns - nanosecond
TR - transmission block
Vdd - power supply
BL - bit line
~BL - bit bar line
Cs - storage capacitance
CBL - bit line capacitance
fF - femtofarads
µs - microseconds
LSB - least significant bit
MSB - most significant bit
WCLK - write clock
RCLK - read clock
λ - process
µ - micron
Abstract
Increasing line rates require higher memory bandwidth and access speeds. With
line rates expected to reach OC-192 (10 Gbps) values soon, memory bandwidth
and access speeds are becoming a bottleneck in network performance. Memory
cores in FIFO buffers frequently use Dual port SRAM or DRAM cores. We have
looked at the performance-related issues of these memory cores while proposing a
3-port DRAM cell to decrease the refresh rates and hence eventually allow for
faster access times. This thesis also compares the different methods of addressing
with respect to speed.
Chapter 1: Introduction
High-performance systems like routers and high-speed processors use buffers
to synchronize data streams between two asynchronously operating
subsystems. A FIFO memory provides high-speed data buffering between
systems that operate at different speeds [16]. The three types of memory
implementation in FIFOs commonly used are the shifting type, SRAM and
DRAM memories [17]. The most commonly used SRAM memory core
FIFOs employ a dual-port SRAM cell where simultaneous reads and writes
are possible, resulting in faster operation than the single-port RAMs. The
problem with the dual-port SRAM-type FIFOs is that they are not dense and
occupy large areas, so they are generally avoided in big FIFO applications. In
such situations, DRAM memory cores are used, where a big core of DRAM
memory facilitates main data storage and a small SRAM buffer coupled with
it allows for faster access. Normally, ring pointer circuits control the
operation of these memory buffers. The ring pointer circuits keep track of the
locations where data is to be written or from where data has to be read. These
ring pointer circuits are used for addressing, as they are much faster than the
decoder addressing circuits. Basic implementations of ring pointers include
the shift register, where logic 1 is shifted along the chain. Hence, at every
clock, the next location of the memory is pointed to as the logic 1 keeps
shifting. The problem with such shift register circuits is the large power
consumption due to the large cumulative capacitance in the clock line of the
register chain.
A low-power pointer circuit using a single pass-transistor for a transmission
gate is another kind of ring pointer circuit. D-latches are often used, and
double edge-triggered pointer circuits are used as the ring pointers [16].
This report concentrates on the performance issues of these different memory
modules while also looking at a proposed 3-port DRAM cell that allows for
faster refresh operation and hence faster DRAM access time.
1.1 Applications:
FIFO memories find use in communication and signal processing applications;
without these FIFOs, asynchronous data transfer would be a challenging task.
From ATM switches to Ethernet applications,
we find FIFOs in use everywhere. ATM switches are used in digital
communication technology due to their capability of handling the conflicting
requirements of voice, data and video transmission. An ATM data unit
consists of fixed 53-byte blocks, which are suitable for transmission on both
local area networks (LANs) and wide area networks (WANs). The expected data rates
defined by the SONET standard include transmission rates exceeding gigabits
per second (Gbps). Future standards are likely to reach speeds in the range of
tens of gigabits per second. Now consider a conflict in such a high-speed
system when two packets arrive at a switch at the same time from two
different sources. During the resolution of conflict, a buffer implemented as a
FIFO is required to store the data. After the conflict is resolved, the FIFO is
used again to store the packet that was not transmitted immediately for
deferred transmission. As the data rates increase, FIFO performance typically
limits the overall data rate that can be handled by the ATM switch. FIFO
speeds become the bottleneck in these fast transfer operations. Ring pointers
usually keep track of where the data is to be written or read from in a RAM
type FIFO memory [12]. Ring pointer circuits are required to give higher
throughput at lower voltages and consume low power in order to optimize the
FIFO performance.
1.2 Thesis Overview:
Increasing network traffic is placing a big demand on the bandwidth of
queues and buffer systems. FIFO bandwidth/performance plays an important
role in determining the system performance. Current line speeds are in the
OC-12 (622 Mbps) to OC-48 (2.5 Gbps) range but in the near future we will
be faced with OC-192 (10 Gbps) line rates.
When a packet comes in on a line, buffering of that packet in the FIFO
involves the following steps:
• The packet is stored or written in one location of the memory buffer.
• The packet is read and processed by the various functional units.
• The processed packet is then stored in the memory buffer again.
• Finally, the processed packet is read out.
The memory throughput here has to be 4 times the line rate, meaning 40 Gbps
assuming we are operating at a line rate of 10 Gbps. This memory throughput
requirement can be halved by using dual-port memory modules, which is the current trend
in FIFO memories, as they simplify the write/read operation from the memory
buffer. The Netchip project group had set the internal packet size at 1 Kb for
simulation purposes; hence the memory access time is 1 Kb/20 Gbps = 47 ns per
packet. This 47 ns of memory access time includes the packet buffering and
processing times. We set the target time for buffering a packet to 5 ns. We
have looked at on-chip RAMs that will allow this buffering time. This thesis
discusses performance parameters of network-on-chip FIFO queues using
dual-port SRAM and DRAM type memory modules used in the FIFO design
to reach the target of 5 ns buffering time. We have tried to provide
performance numbers for the memory modules and have proposed an
innovative 3-port DRAM cell, which is faster than the current DRAM
memories used in FIFOs.
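As a rough back-of-the-envelope check of this budget, the following Python sketch works through the same arithmetic. It is only an illustration of the numbers above, not part of the simulated design; the 1024-bit packet size and the constant names are assumptions made for the example.

# Back-of-the-envelope check of the buffering budget described above.
# Assumptions: OC-192 line rate, 4 memory accesses per packet, a dual-port
# memory halving the required throughput, and a 1 Kb internal packet.
LINE_RATE_BPS = 10e9            # OC-192 (10 Gbps)
ACCESSES_PER_PACKET = 4         # store, read, store processed, read out

required_throughput = ACCESSES_PER_PACKET * LINE_RATE_BPS    # 40 Gbps
dual_port_throughput = required_throughput / 2               # 20 Gbps effective

PACKET_BITS = 1024              # 1 Kb internal packet (Netchip assumption)
packet_budget_ns = PACKET_BITS / dual_port_throughput * 1e9  # ~51 ns per packet
                                                             # (the thesis quotes 47 ns)

print(f"required memory throughput: {required_throughput / 1e9:.0f} Gbps")
print(f"with a dual-port memory:    {dual_port_throughput / 1e9:.0f} Gbps")
print(f"per-packet access budget:   {packet_budget_ns:.1f} ns")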
Chapter 2 gives an overview of research being done on FIFO memory design.
Chapter 3 goes over the various blocks that are part of the FIFO
architecture. Chapter 4 gives a brief understanding of the memories currently
used and the modifications made in the DRAM memory to improve
performance. Chapter 5 discusses the addressing schemes used. Chapter 6 gives
an insight into how the ‘Empty’ and ‘Full’ flags are generated in a FIFO.
Chapter 7 presents the numbers and results from the simulations. Chapter 8
gives an idea of where FIFO memory design is heading in the future.
Chapter 2: Related Work
Research in the area of on-chip memory design to improve buffer
performance and hence the data transfer speeds in high-speed communication
networks is an ongoing process. This chapter looks at the various areas of
research in FIFO memory design, providing a prelude to the new 3-port
DRAM memory module we proposed which is faster than the conventional
DRAMs currently in use in on-chip memory buffers. On-chip memory
buffers use SRAM, DRAM, or sometimes both forms of memory modules.
Bharadwaj S. Amrutur in his thesis on “Design and Analysis of Fast low
power SRAMs” [10] describes the design of SRAM memories and the related
circuitry like the decoders and the sense amplifiers. This thesis is an excellent
reference when embarking on SRAM memory design. We used this thesis as
a basic reference when looking at SRAM memory and related circuitry.
Shumao Xie, Vijay Krishnan and M.J. Irwin in their paper on FIFO memory
[17] look at a 7-transistor implementation of the dual port SRAM cell, which
is said to increase performance and reduce power consumption. They say that
this 7-transistor implementation need not be sized and hence saves on chip
area as compared to the 6-transistor SRAM cell, which needs to be sized for
correct read and write operations. The 7-transistor memory cell
implementation needs control logic in the circuitry, which increases the area,
though. Their work gives good insight into a novel method of dual port
SRAM design and includes the advantages over designs with 6-transistor
implementation with and without sense amplifier circuitry.
Two important aspects that FIFO designers have been working on are how to
increase the amount of buffer space on chip and how to make buffers faster.
DRAM memories are denser but slower due to the refresh needed in each
DRAM memory cell and hence are generally not used in FIFO memories.
SRAM memories are faster but since they are not as dense as DRAM
memories, a large amount of SRAM memory cannot be used on chip.
Takayasu Sakurai et al. in their work on “Transparent Refresh DRAM
(TReD) Using Dual Port DRAM cell” [15], talk about a methodology that is
supposed to make the DRAM memory almost refresh free. This transparent
refresh DRAM is a dual port DRAM cell in which one port is used for refresh
and the other port is used for read/write operations. This work provides
insight into how DRAMs could be used instead of SRAM memories in
FIFOs as the transparent refresh almost removes the refresh time. We studied
the dual port DRAM proposed by Takayasu Sakurai et al. In our FIFO we
have made modifications to Sakurai’s DRAM and proposed a 3-port DRAM
cell with a refresh bit line. Some designers, wanting to make use of the
advantages of each form of memory, use both SRAM and DRAM in their
memory modules: a large DRAM core to hold most of the data and a small but
fast SRAM cache for fast data transfer operations. Masashi Hashimoto et al.,
in their paper on a 256K × 4 FIFO memory [8], discuss the use of such
architecture in order to have a large memory buffer. These memories need
complex circuitry to control the flow of data.
FIFO memory buffers need low-power addressing circuits in order to point to
the location where the data is to be written to or read from. Masashi
Hashimoto et al., in their memory design, also describe a low-power,
high-speed address pointer. This address pointer or ring-pointer circuitry has been
used in our design and has been explained in Chapter 5. Another relevant
publication, which talks about the different address pointer designs, is the
research work of Haibo Wang and Sarma Vrudhula [16]. They describe a
new address-pointer design, which can be used in FIFO memory designs that
operate at low supply voltages. An important aspect of FIFO memory design
is the generation of empty and full flags to indicate the status of the buffer.
Clifford E. Cummings has done relevant research in this area. His publication
on Asynchronous FIFO design [17] is of great aid when designing the flag
generator block. The work on asynchronous FIFO design looks at various
issues to be considered when the clock domain for read and write are
different. The Flag generator block used in this thesis is a result of
referencing the work done by Clifford E. Cummings [17] and coming up with
some new ideas in collaboration with Praveen Krishnanunni [20].
Chapter 3: FIFO System Architecture
3.1 Basic FIFO block:
The term FIFO stands for first in, first out: the data that is stored in the buffer
first comes out first. Shown below is a primitive FIFO block diagram,
and the functionality of the FIFO is explained.
Figure 1: Basic FIFO block
Data comes in and is stored in the memory buffer, but the Data Out transfer
goes in the order the data came in; in this case Data 1 came in first,
then Data 2, Data 3 and so on. Data 1 is stored in location 1, the next incoming
data is routed to location 2, and when the readout occurs, the data read starts
from location 1. The FIFO functioning is such that, when the readout is done,
Data 1 will be read out first and so on.
Since FIFO buffers form such an integral part of high-speed systems,
different implementations are tried for faster performance. The two
architectures commonly used in FIFO designs are the shift register design and
the ring-pointer based design with RAM buffers. This report looks at the
different kinds of RAM used for memory buffers with the different ring-
pointer circuits and discusses the tradeoffs of each type of pointer circuit and
the different memory implementations.
3.2 Shift Register Based FIFO Design:
Shift-register cells are commonly used in sequentially accessed memories to
provide very fast write and read data rates [12].
The generic CMOS shift-register cell is shown in Fig 2 [12].
Figure 2: Generic Shift-Register Cell Composition
To store and transfer data, a shift register cell requires a control signal with
two distinct phases P1 and P2, two transmission devices TR1 and TR2, and
two storage and amplifying blocks in each cell. The working of the shift-
register cell is as follows. During the first phase P1, TR1 pushes data through
to the first storage element. The data in transfer might lose its signal value, and
hence an amplifier is needed to boost the data value. Similarly, during the
second phase P2, TR2 (transmission block 2) pushes the value through to
the second storage element and amplifier block. During phase P2, TR1 is switched off,
and hence the new data coming in is not put into the first storage element block.
The Shift Register cell is considered to be made up of two halves (TR1 and
the Storage and Amp block as the first half and TR2 and a second Storage
and Amp block as the second half). The shift register cell operation is fast,
meaning the data transfer is fast because of the minimum load on the output
of each half cell. Though the shift register architecture is fast, it is not used
because of the high power dissipation, which is caused due to the switching
on of one-half of all shift register cells at each phase. Another major
disadvantage of the shift-register architecture is the number of transmission
and storage blocks needed. For an N-bit array, 2N transmission and storage
blocks are needed which limits the packing densities.
The ring-pointer architecture is preferred since it has the advantages of low
power and low latency. Dynamic power dissipation due to transitions from
shifting data along the length of a shift register is eliminated. Latency is
reduced to the sum of the write time and the access time of the memory rather
than the time required for data to move the length of a shift register.
However, throughput is reduced in the ring-pointer architecture and is limited
by the charging of the write bit lines in the write path and the memory cycle
time in the read path which, in turn, is limited by the sense amplifier [12].
3.3 Ring Pointer FIFO:
Figure 3: Ring-Pointer FIFO Architecture
The block diagram of a Ring-Pointer based FIFO is shown in Fig 3.
The central block of this FIFO is the memory array; a dual port SRAM or
DRAM type of memory is used for the memory array. SRAM is commonly
used, as it is faster in operation and does not require the complex refresh
circuitry that the DRAM memory needs.
The choice of dual-port SRAM is commonly made for FIFOs since it allows
simultaneous read and write in the same clock cycle, as one of the two ports
can be used for reading and the other port can be used for writing, hence it is
faster. The memory cell of the dual port SRAM will be discussed later in
detail, including how reads and writes are performed. Now we look at the
basic functioning of the above ring-pointer FIFO architecture.
When data is to be buffered in the FIFO memory, a Write Enable signal is
generated. This signal goes to two blocks; the Flag Generator block and the
Write Pointer block; the Write Pointer points to the location where data is to
be written. Assuming we start at location 0, the Write Pointer will point to
that location when the first Write-Enable signal comes in. The data is stored
in the location pointed to by the write pointer. The write pointer circuit points
to the next location on the next Write-enable signal. The Read Pointer
performs exactly the same operation based on the Read-enable signal. It
points to the location where the data was stored first. Similarly, the Read
Pointer continues pointing to the next location on every Read-enable signal,
and thus the FIFO performs its Write and Read operations.
The Flag Generator units generate the ‘Empty’ Flag on the information from
the Write Pointer side and the ‘Full’ Flag on the information from the Read
Pointer side.
The ‘Empty’ and the ‘Full’ flags let the control unit know the status of the
FIFO and hence whether data can be written into or read from the FIFO. If the Flag
generator shows ‘Full’ then no more data can be written; similarly, if the Flag
generator shows ‘Empty’ then there cannot be any readouts from the FIFO.
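To make the interplay between the pointers and the flags concrete, here is a minimal behavioral model in Python. It is only an illustrative sketch, not the circuit described above; the 16-location depth, the class name and the method names are assumptions made for the example.

# Minimal behavioral sketch of a ring-pointer FIFO (illustration only; the
# hardware described above does this with pointer circuits and flag logic).
class RingPointerFIFO:
    def __init__(self, depth=16):            # depth is an assumed parameter
        self.mem = [None] * depth
        self.depth = depth
        self.write_ptr = 0                    # location the next write goes to
        self.read_ptr = 0                     # location the next read comes from
        self.count = 0                        # occupancy, used for the flags

    def full(self):                           # 'Full' flag: no more writes allowed
        return self.count == self.depth

    def empty(self):                          # 'Empty' flag: no more reads allowed
        return self.count == 0

    def write(self, data):
        if self.full():
            return False                      # write is ignored when Full is set
        self.mem[self.write_ptr] = data
        self.write_ptr = (self.write_ptr + 1) % self.depth   # pointer wraps around
        self.count += 1
        return True

    def read(self):
        if self.empty():
            return None                       # read is ignored when Empty is set
        data = self.mem[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.depth
        self.count -= 1
        return data

fifo = RingPointerFIFO()
fifo.write("Data1"); fifo.write("Data2")
assert fifo.read() == "Data1"                 # first in, first out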
The speed of the FIFO is an important factor; the entire system’s data transfer
relies on how fast the location in memory can be pointed to and how fast it
can be written/read from the memory. The memory and the ring pointer
circuits were looked at extensively, as they are in the critical path and could
solve some of the bottlenecks in high-speed data transfer and buffering.
These are the Critical blocks of the FIFO architecture and they influence the
speed; hence, we start by looking at those blocks first.
Chapter 4: Memory
Memory Array Blocks in FIFOs use static or dynamic types of memories.
The dual-port SRAM cell and the 3-port DRAM architectures have been used
in this thesis to implement the FIFO memory. A brief description and
working of these memory structures follows.
4.1 Single Port SRAM cell:
Figure 4: 6 Transistor SRAM Cell
Figure 4 shows the diagram of a 6T SRAM cell. The cell is made up of two
cross-coupled inverters. Two pass transistors M5 and M6 give access to the
cell when the word line (WL) is enabled. The word line and the 2-Bit lines
(BL and BL complement) control the write and the read operations of this
cell. During the write operation, if a 1 has to be written then BL is left at
Vdd = 2.5 V or the appropriate voltage and BL complement is pulled down to
ground and the word line WL is enabled; hence a 1 is written at Q. When
writing a 0, the BL line is grounded and the BL complement line is left at the
appropriate voltage.
The following diagram shows how the transistors M10 and M11 allow for
grounding one of the bit lines during the write operation depending on the
value of the data.
Figure 5: Write circuit for the SRAM cell
During the read operation, we pre-charge both the bit-lines to high and
enable the word line. Depending on the value stored in the cell, one of the bit-line
voltages dips, which is sensed by a sense amplifier that gives out
the value stored in the cell. For this 6-transistor cell to work correctly, some of
the transistors have to be sized appropriately. A 0 on the bit line must overpower M3
during write; hence M6 must be sized at least 2 times M3 in width. At the
same time, M4 must be able to pull the bit line down during read so M4 must
be sized at least 2 times M6 in width.
4.1.1 Dual Port SRAM Cell:
Though the Dual-Port SRAM cell is approximately twice the size of a normal
SRAM cell, it can be advantageous to use in FIFO memories, as one port can
be used to read and the other port can be used to write data into the cell, hence
providing faster access speeds.
The Dual Port SRAM cell has two word lines and two pairs of bit-lines. Since
we have two-port functionality in this cell, two more pass-transistors are
added to the 6-transistor cell explained previously. Figure 6 shows the circuit
diagram of a Dual-Port SRAM cell.
Figure 6: Dual Port SRAM Cell
Both bit lines on the dual-port SRAM access the same cell but each port has a
different set of controls. The read port is controlled by the Read Pointer and
the read clock, and similarly the write port is controlled by the Write Pointer
and the write clock. Both ports can thus work independently of each other, and
hence the Dual Port SRAM allows faster data transfer and higher bandwidth.
The read and write operations of a Dual Port SRAM cell are the same as for a
6-Transistor SRAM cell. The difference is that one word line is used with a
pair of bit-lines to form the read port and the other word line with the other
pair of bit-lines forms the Write Port. When data is to be written into the cell,
one of the word lines, say W1, is turned ON and the data is placed on the bit
line. If a 1 is to be written into the cell, bit-line b1 is pre-charged to the high
value and complement b1 is pulled down to 0 or ground; hence the data is
written into the cell. When data is to be read out from this cell, bit-lines b2 and
complement b2 are pre-charged to high and the word line is turned ON.
Depending on the data stored in the cell there is a dip in voltage on one of the
bit lines. The sense amplifier senses this dip in voltage on the bit-line; hence,
the data is read out. Since the operation of the read and write ports in a Dual
Port SRAM cell is independent of each other, the sizing of the transistors in
the cell is similar to that of the 6-Transistor SRAM cell. Although there are
8 transistors in the cell, only six are functional when one of the ports is in operation.
4.2 DRAM
4.2.1 1 Transistor DRAM:
Figure 7: 1 Transistor DRAM Cell
A simple 1-transistor DRAM cell is shown in Fig 7 [12]. We have a single
pass transistor connecting to a word line WL and a bit-line BL. The
operation of the 1T DRAM cell is simple. When we want to write data into
the cell, we place the data on the bit-line BL and raise or enable the word
line. The storage capacitor Cs is charged if a high value was on the bit-line
and is discharged if a low value was placed on the bit-line. During the read
operation, the Bit-line is pre-charged to a high value and the word line is
enabled. Charge redistribution takes place between the storage capacitor and
the bit-line capacitor. A transition in voltage takes place on the bit-line, the
direction of which determines the value stored in the cell. Since the bit-line
capacitance is much larger than the storage capacitance, every read operation
destroys the value stored in the cell. We must refresh the cell value after each
read and periodically too if the cell is not read since the data stored in the
storage capacitor is destroyed. The refresh operation consists of feeding the
output of the sense amplifier back to the bit-line during read-out. The typical
values of capacitors used in DRAM cells are 50 fF [10]. The value in a
DRAM cell is stored as capacitor charge, which leaks with the
passage of time; hence, to retain the value in the cell, the rows are refreshed
once every 1 µs.
There are different methods in a DRAM to refresh the cells. The commonly
used refresh methods are distributed and burst. In the distributed refresh
method, one row is refreshed at a time, whereas in burst-mode refresh a few
rows are refreshed simultaneously [20]. The distributed and burst refreshes
are controlled by the memory controller, which has to stop the read/write
operation and give time for the refresh to take place [20]. We have looked at
another method of refreshing the memory cell; this definitely consumes more
area, as we have 3-transistor cells with dedicated bit lines and word lines for
the write/read and refresh operations.
The 3-transistor dedicated write/read and refresh DRAM cell is shown in Fig
8. The 3-port DRAM cell is derived from the transparent refresh DRAM
using a dual port cell [14]. The transparent refresh DRAM cannot be used in
FIFO memories where we need simultaneous read and writes.
For faster operation, we need dedicated ports for reading and writing into the
memory cell. Hence, we¹ propose the 3-port DRAM with dedicated ports for
read, write and refresh. The 3-port DRAM has advantages when used in
FIFO buffers because of the dedicated refresh and dual read/write ports, but it
has an area overhead because of the additional pass transistor and also
because the storage capacitor has to be twice as large as that of a
conventional DRAM cell [14]. The storage capacitance has to be twice as
large as that of a conventional DRAM cell because at any time 2 word lines
in the form of read and write word lines or read and refresh word lines or
write and refresh word lines can be opened to the cell. The capacitor then
sees the capacitance of two bit-lines. In order to maintain the bit-line signal
voltage at the same level we have to make the storage capacitor twice as large
as compared to the conventional DRAM [14].
Figure 8: 3 Port 3 Transistor DRAM Cell.
¹ The proposed 3-port DRAM was a result of collaboration with Praveen Krishnanunni, a
graduate student at USC researching area comparisons in FIFO queues, and can be found
in his master’s thesis [11].
The refreshing technique in a 3-port 3-transistor DRAM cell is looked at as
an alternative to refreshing in a 1-transistor DRAM cell. The dedicated port
for refresh provides a simple way of refreshing the DRAM cell and hence is
considered. The two ports Read and Write are typically used for the FIFO
operation. The refresh control logic for the 3-port 3-transistor DRAM is
implemented as follows. The experimental FIFO used for simulations is 16
locations deep. Most related research papers on DRAM memories say that a
memory cell should be refreshed every 16 µs. If we do not refresh a memory
cell once every 16 µs [19], the DRAM cell storage capacitance would have
leaked enough charge that a logic 1 would not be recognized as a logic 1
anymore. We have a 16-location FIFO, so each row will have its value
discharged in 16 µs. Therefore, the refresh cycle should be equal to 1 µs so
that the refresh reaches the 16th location in 16 µs and hence the data value stored in the
cell is not destroyed. The write clock cycle is 5 ns and the refresh cycle is 1 µs,
so between two refresh cycles we can have 1 µs/5 ns = 200 write clocks. We
have to set the refresh clock so that it does one refresh before 200 write
clocks have elapsed. We have used an 11-bit counter for the refresh circuit. The 4 MSBs of
the 11-bit refresh counter circuit are used for address generation. Dropping one LSB
corresponds to a multiplication of the clock count by 2. If we omit 7 LSBs we get a
multiple of 2^7 = 128 clocks, and if we omit 8 LSBs a
multiple of 2^8 = 256 clocks; we need the refresh clock to perform one
refresh operation before 200 clocks, and so we use a 7 (LSBs) + 4 (MSBs for
comparison) = 11-bit counter. Hence, we set the refresh to take place once
after every 128 write clock cycles. In other words, after every 128 write clock
cycles we refresh location 1 of the memory irrespective of whether it has
been read or not. The refresh control logic is designed so that when an
‘Empty’ flag is generated, meaning the FIFO is empty, the refresh address
pointer is disabled. The inverted empty signal disables the
refresh counter when the FIFO is empty.
The refresh control logic also takes care of times when the FIFO could be full
and no more write operations are possible. When such a condition occurs, the
write pointer points to a particular location and the refresh pointer is pointing
to the same location. The refresh pointer would conclude that a write is
going to occur at this location and hence might not refresh the location. This
would result in skipping the refresh of one row, which is not what is
wanted. To counter this problem we feed the inverted full signal to the
comparator block of the refresh control logic. If the FIFO is full, the inverted
full signal is at logic 0 and hence the output of the comparator block is a logic 1,
which allows the refresh at that location.
Shown in Fig 9 is the refresh control logic.
Figure 9: Refresh Control Logic.
Figure 10: Logic inside the comparator block in the Refresh Control Logic Block (inputs: 4 bits from the Refresh Pointer and 4 bits from the Write Pointer)
An 11-bit counter is used, and its 4 MSBs are used because they transition
from 0000 to 0001, and from 0001 to 0010, only after every
128 write clocks. If the comparison of these 4 bits gives a 1, implying that
both the addresses are the same, then we do not refresh. If the comparator
output is a 0, implying that the write pointer and the refresh counter are not at the
same location, then the decoder output, which is the address of the refresh
word line, goes through and hence the cell is refreshed. For generating the
refresh address, we have to use a decoder; if we used a ring pointer circuit
to generate the refresh addresses, then at every clock a refresh word line would
be turned ON, which is not desirable. The decoder is fed the 4 MSBs of the
11-bit counter, and hence the address of the refresh word line is generated.
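As a sanity check of the counter arithmetic above, the following Python sketch models the refresh scheduling at a behavioral level: an 11-bit counter clocked by the write clock, whose 4 MSBs select the row to refresh, with the refresh skipped when the write pointer sits on the same row unless the FIFO is full. The constants are the assumptions stated above (16 rows, 5 ns write clock, 16 µs retention); this is only a sketch, not the gate-level logic of Figures 9 and 10.

# Behavioral sketch of the refresh scheduling described above (assumptions:
# 16-row memory, 5 ns write clock, 16 µs retention, 11-bit refresh counter).
WRITE_CLOCK_NS = 5
RETENTION_US = 16                       # a row must be refreshed within 16 µs
ROWS = 16

# One row per refresh => each row must be visited every RETENTION_US / ROWS = 1 µs,
# i.e. at most every 1 µs / 5 ns = 200 write clocks.
max_clocks_between_refreshes = int((RETENTION_US / ROWS) * 1000 / WRITE_CLOCK_NS)  # 200

counter_bits = 11
msb_bits = 4                            # row address = 4 MSBs of the counter
lsb_bits = counter_bits - msb_bits      # 7 LSBs => MSBs advance every 2**7 clocks
clocks_per_refresh = 2 ** lsb_bits      # 128, safely below the 200-clock limit
assert clocks_per_refresh <= max_clocks_between_refreshes

def refresh_row(counter_value, write_row, fifo_full):
    """Return the row to refresh this cycle, or None if the refresh is skipped."""
    row = (counter_value >> lsb_bits) & (ROWS - 1)   # 4 MSBs of the 11-bit counter
    if row == write_row and not fifo_full:
        return None     # a write may hit this row, so the refresh is skipped
    return row          # when the FIFO is full no write can occur, so refresh anyway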
The 3-port DRAM cell, though larger than a conventional 2-port DRAM [6]
cell by 50%, is much faster than the conventional 2-port DRAM. In addition,
the refresh circuitry used in the 3-port DRAM is much simpler than the
complex refresh-controller circuitry used in the 2-port DRAM. So overall the
3-port memory block and its refresh circuitry still turn out to be smaller in area
than a conventional 2-port DRAM memory block and its refresh controller,
as has been shown in Graph 8 in Chapter 7. The 3-port DRAM saves in
overall area and gains in speed over conventional DRAMs. Though it does
not reach the speeds of conventional 2-port SRAMs it can be of use where
faster memories are needed with a small area constraint. The comparisons for
area and read cycle time for the 3-port DRAM and the 2-port SRAM are
shown in Graph 9 in Chapter 7.
² The area numbers used for comparison have been referred to from the work done on area
comparisons in FIFO queues by Praveen Krishnanunni [11].
4.3 Sense Amplifier:
The sense amplifier senses the data value stored in the cell during a read
operation. The sense amplifier detects the difference between the Bit and the
Bit bar line and amplifies it to give the exact value of data stored in the
Memory Cell [10].
A Latch-based sense amplifier [2], in which the input and output nodes are
not isolated, is used for the SRAM. The open bit-line architecture or the
single-to-differential conversion type of sensing is used for the DRAM
memory cell.
Figure 11: SRAM Sense Amplifier Circuitry.
Every column for the SRAM has its own sense amplifier. In case the memory
were split into banks, to get a complete word read out from the memory we
would need to read a sub-word and then would need to multiplex the output
columns. We have avoided multiplexing of output columns. Multiplexing of
output columns would have resulted in an increase in the number of
transistors, as multiplexing requires additional circuitry. As column
multiplexers are not used, the increase in capacitance at the output node is
eliminated. Hence, delay is not adversely affected. The operation of the Sense
amplifier circuitry is as follows [2]: the circuitry for a latch-based sense
amplifier is shown in Figure 11. Initially, the read enable signal is low and both
the PMOS transistors (M5 and M6) connecting the bit and the bit bar lines
are ON. When data has to be read out then the read enable signal is
generated which turns on NMOS (M4) and the cross-coupled inverter pair
starts its operation. The read-enable signal generation takes place after either
of the bit lines has discharged depending on the value stored in the memory
cell. If a logic value of 1 was stored in the cell then bit bar would discharge
and vice-versa. During the read operation both the bit-lines are pre-charged
to Vdd and then the word-line is turned ON. Depending on the value stored in
the cell, one of the bit lines discharges through the pass transistors. Due to the
slight voltage difference that arises between the bit and bit bar lines, NMOS
M3 is turned ON more strongly than NMOS M2, and hence the voltage at the drain of
transistor M3 is pulled down much faster than the voltage at the drain of
transistor M2, which is the output of the sense amplifier circuit. This faster
pull-down on one side of the cross-coupled inverter circuitry causes
feedback, and the initial voltage difference is exponentially increased until one
of the NMOS transistors is turned off and the output port goes high, i.e.,
shows the value which was stored in the memory cell.
4.3.1 Open bit-line Sensing in DRAM [13]:
Open bit-line sensing is used in DRAMs because we have to sense the value
stored in the Memory cell based on one bit-line. Depending on the value
stored in the DRAM cell, the bit-line toggles and this variation in signal on
the bit-line has to be sensed [13]. The open bit-line architecture is as follows:
the memory array is divided into two halves, with the differential amplifier
placed in the middle. On each side, dummy cells are added. These dummy
cells are very similar to the DRAM cells; the only difference is that they are
not meant to store values. These dummy cells are just used for reference
whereas the dummy cell on one side of differential amplifier is turned on
when the memory cell on the other side of the differential amplifier is turned
on.
Figure 12: Open bit-line architecture with dummy cells
When the EQ signal is raised, bit lines BLL and BLR on either side of the
differential amplifier are pre-charged to Vdd/2. Enabling Word lines L and L’
at the same time ensures that the dummy cells are charged to Vdd/2. One of
the word lines is enabled during the read cycle. Assuming that a cell in the
left half of the array is selected by raising its word line, this causes a change
on the bit line on the side of this cell. Hence, the appropriate voltage
reference is created by simultaneously raising the word line for the dummy
cell on the other side.
Under the assumption that the left and the right memory halves are perfectly
matched, the resulting voltage on BLR resides between logic low and high
and causes the sense latch to toggle. Dividing the bit-line into two halves
effectively reduces the bit-line capacitance. This kind of arrangement allows
for a refresh during the read cycle itself. Once the bit line on either side
stabilizes or reaches a certain value and the word line is high, the refresh
action takes place automatically, as the value on the bit-line is written back
into the capacitor [13].
Chapter 5: Address Generation Block
Two kinds of address generation circuits were studied to find out which
one is better in terms of speed:
i) Counter and address decoder circuitry
ii) Ring-pointer circuit for address generation, since this is a non-random-access memory.
Praveen Krishnanunni [11] in his thesis on Area Comparisons of Network on
Chip FIFO Queues discusses the sizing of these address generation circuits.
The address generation circuits used in this thesis have been referenced from
that thesis [11].
Figure 13: Pre-decoder and decoder chain to drive the word lines for a 1024 location memory.
The addresses generated have to enable the word lines for each location and
hence a decoder circuitry is used. The address decoder circuitry is designed
and discussed in [11] and is shown in Figure 13.
In the FIFO, we need two ring-pointer circuits, one on the reader side and one
on the writer side to point to the locations from where data is to be read or
written. The basic diagram of the 2-bit ring pointer circuit used to test the
working of the FIFO is shown in Figure 14 below [16].
Figure 14: 2 bit Ring Pointer circuit.
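To contrast the two addressing schemes in software terms, here is a small behavioral sketch in Python: a binary counter whose value must be decoded into a one-hot word-line vector, versus a ring pointer that already holds the one-hot pattern and simply rotates it on every clock. The 16-location width and the function names are assumptions for illustration; this is not the circuitry of Figures 13 and 14.

# Illustrative contrast between counter+decoder and ring-pointer addressing
# (behavioral sketch; widths are assumptions).
N_LOCATIONS = 16

def counter_decoder_step(counter):
    """Advance a binary counter, then decode it into a one-hot word-line vector."""
    counter = (counter + 1) % N_LOCATIONS
    wordlines = [1 if i == counter else 0 for i in range(N_LOCATIONS)]  # decoder
    return counter, wordlines

def ring_pointer_step(wordlines):
    """A ring pointer already stores the one-hot pattern; it just rotates it."""
    return wordlines[-1:] + wordlines[:-1]

# The ring pointer needs no decode step, which is why it is attractive for a
# FIFO, where locations are always visited in order (non-random access).
state = [1] + [0] * (N_LOCATIONS - 1)
state = ring_pointer_step(state)        # now location 1 is selected
assert state[1] == 1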
Chapter 6: Flag Generation Block
Flag generation is a very important aspect of the FIFO; the flags generated by
this block let the FIFO know whether there is more storage space in the
memory array or whether the memory array is full. The flag generation scheme
with the logic has been shown in Figure 15 [11] and Figure 16 [11]. The flag
generator block generates two flags:
1) An empty flag indicating that no more read operations can be
performed.
2) A full flag indicating that no more write operations can be performed.
The flag generation block was developed in collaboration with Praveen
Krishnanunni [11]. The block and the logic are explained thoroughly in his
Master’s thesis, “Area Comparisons of Network on Chip FIFO Queues” [11].
Figure 15: Flag generator block
WS F/F: Write domain sampling flip-flop
RS F/F: Read domain sampling flip-flop
WSG: Write pointer sampled in Gray code
RSG: Read pointer sampled in Gray code
WP: Write Pointer
RP: Read Pointer
DWD: Difference in write domain
DRD: Difference in read domain
Figure 16: Flag Generation Logic
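At a behavioral level, the scheme of Figures 15 and 16 can be sketched in Python as follows: a pointer is converted to Gray code before it is sampled in the other clock domain (so that only one bit changes per increment), converted back, and the pointer difference drives the full and empty decisions. The pointer width and the full threshold are assumptions made only for this illustration; the actual logic is the one described in [11].

# Behavioral sketch of the flag generation scheme of Figures 15 and 16
# (pointer width and full threshold are assumed for illustration).
PTR_BITS = 4
DEPTH = 1 << PTR_BITS

def bin_to_gray(b):
    """Binary-to-Gray converter, as used before a pointer crosses clock domains."""
    return b ^ (b >> 1)

def gray_to_bin(g):
    """Gray-to-binary converter used after sampling in the other domain."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def flags(write_ptr, read_ptr_sampled_gray):
    """Full/empty decision in the write domain from the sampled read pointer."""
    read_ptr = gray_to_bin(read_ptr_sampled_gray)        # back to binary
    diff = (write_ptr - read_ptr) % DEPTH                # subtractor output
    full = diff == DEPTH - 1                             # assumed threshold
    empty = diff == 0
    return full, empty

wp, rp = 5, 5
assert flags(wp, bin_to_gray(rp)) == (False, True)       # equal pointers => empty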
Chapter 7: Analysis and Results
The generation of the control signals is done in the following way.
7.1 SRAM Read Cycle time:
The read clock is shown below:
Figure 17: SRAM Read Clock (intervals T1, T2 and T3)
During the read clock, the following things have to take place:
1) The counter and decoder, or the ring pointer, have to generate the address. T1 is the
time taken by either the Counter and Decoder circuit or the Ring Pointer
circuit to generate the address.
2) The word line must be ON until one of the bit lines discharges to Vdd-150mV.
T2 is the time taken by one of the bit lines to discharge to Vdd-150mV.
3) The sense amplifier has to sense the dip in the voltage levels and output the
correct value stored in the memory cell. T3 is the time taken by the sense
amplifier to display the output.
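Putting the three intervals together, the read cycle time is simply T1 + T2 + T3. The small Python sketch below does this bookkeeping using the 0.18 µ, 16-location values that appear in Table 1; it is only an arithmetic illustration of how the tabulated read cycle times are composed, not part of the simulation itself.

# Read cycle time as the sum of the three intervals described above.
# Example numbers are taken from Table 1 (0.18 µ, 16 locations).
def read_cycle_time(addr_delay_ns, sense_amp_ns, discharge_ns):
    """T1 (address generation) + T2 (bit-line discharge) + T3 (sense amplifier)."""
    return addr_delay_ns + sense_amp_ns + discharge_ns

t_read = read_cycle_time(addr_delay_ns=2.3,   # counter + decoder (T1)
                         sense_amp_ns=2.6,    # sense amplifier (T3)
                         discharge_ns=0.06)   # discharge to Vdd - 150 mV (T2)
print(f"read cycle time: {t_read:.2f} ns")    # 4.96 ns, matching Table 1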
We simulated the memories to determine how the read clock time varies with
respect to the change in addressing mode, process variation, number of
locations in the memory buffer, and variation in voltage. The simulation
results are tabulated below.
The results are for the SRAM memory core with the Counter and Decoder type
of addressing. The following tables show the read cycle times for 16-, 32-, 64-
and 128-location FIFOs for 0.18µ, 0.15µ, 0.13µ and 0.09µ technology.
Process (µ)   Locations   Bitline capacitance (fF)   Decoder+Counter delay (ns)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Read cycle time (ns)
0.18          16          43.2                       2.3                          2.6              0.06                          4.96
0.18          32          86.4                       2.3                          2.6              0.1                           5
0.18          64          172.8                      2.3                          2.6              0.17                          5.07
0.18          128         345.6                      2.3                          2.6              0.32                          5.22

Table 1: Read Cycle time for Counter and Decoder type of addressing for 0.18µ.
Process (µ)   Locations   Bitline capacitance (fF)   Decoder+Counter delay (ns)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Read cycle time (ns)
0.15          16          36                         2.3                          2.6              0.05                          4.95
0.15          32          72                         2.3                          2.6              0.09                          4.99
0.15          64          144                        2.3                          2.6              0.12                          5.02
0.15          128         288                        2.3                          2.6              0.3                           5.2

Table 2: Read Cycle time for Counter and Decoder type of addressing for 0.15µ.
Process (µ)   Locations   Bitline capacitance (fF)   Decoder+Counter delay (ns)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Read cycle time (ns)
0.13          16          31.2                       2.3                          2.6              0.04                          4.94
0.13          32          62.4                       2.3                          2.6              0.07                          4.97
0.13          64          124.8                      2.3                          2.6              0.11                          5.01
0.13          128         249.6                      2.3                          2.6              0.24                          5.14

Table 3: Read Cycle time for Counter and Decoder type of addressing for 0.13µ.

Process (µ)   Locations   Bitline capacitance (fF)   Decoder+Counter delay (ns)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Read cycle time (ns)
0.09          16          21.6                       2.3                          2.6              0.01                          4.91
0.09          32          43.2                       2.3                          2.6              0.06                          4.96
0.09          64          84.6                       2.3                          2.6              0.1                           5
0.09          128         172.8                      2.3                          2.6              0.17                          5.07

Table 4: Read Cycle time for Counter and Decoder type of addressing for 0.09µ.
From the numbers obtained from simulation and from the plot below, we can
observe that as we move towards smaller and smaller processes, the
delay is reduced. This is because the bit line capacitance decreases and it
takes less time for the bit line to discharge from Vdd to Vdd-150mV, which is
when the sense amplifier can detect the value.
Graph 1: Delay variations with process changes.
The following graph gives a plot of how the delay is affected by the number
of locations in the memory buffer. An increase in the number of locations
means longer bit lines, which increases the bit line capacitance; hence the delay
increases.
Graph 2: Variations in Read Cycle times for different number of locations with decoder and counter type of addressing method.
I then looked at the ring pointer method of addressing, which is a faster way
of addressing than the counter and decoder method because a ring pointer
shifts the bit along the chain and at every clock one of the output lines goes
high. This addressing method works since a FIFO is a non-random-access type
of memory.
Shown below are the simulated read cycle times for the ring-pointer type of
addressing method. The read cycle times are much faster, which can be
understood from the above explanation and can be verified from the
tabulations below.
The results are for the SRAM memory core with the ring pointer type
of addressing.
Table 5 shows the read cycle times for 16-, 32-, 64- and 128-location FIFOs for
0.18µ technology.
Process (µ)   Locations   Bitline capacitance (fF)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Ring pointer delay (ns)   Read cycle time (ns)
0.18          16          43.2                       2.6              0.06                          0.18                      2.84
0.18          32          86.4                       2.6              0.1                           0.18                      2.88
0.18          64          172.8                      2.6              0.17                          0.18                      2.95
0.18          128         345.6                      2.6              0.32                          0.18                      3.1

Table 5: Read Cycle time for ring pointer type of addressing for 0.18µ.
Table 6 shows the read cycle times with the ring pointer type of addressing
mode for 0.15µ technology.
Process (µ)   Locations   Bitline capacitance (fF)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Ring pointer delay (ns)   Read cycle time (ns)
0.15          16          36                         2.6              0.05                          0.18                      2.83
0.15          32          72                         2.6              0.09                          0.18                      2.87
0.15          64          144                        2.6              0.12                          0.18                      2.9
0.15          128         288                        2.6              0.3                           0.18                      3.08

Table 6: Read Cycle time for ring pointer type of addressing for 0.15µ.
Table 7 shows the read cycle times with the ring pointer type of addressing
mode for 0.13µ technology.
Process (µ)   Locations   Bitline capacitance (fF)   Sense amp (ns)   Discharge to Vdd-150mV (ns)   Ring pointer delay (ns)   Read cycle time (ns)
0.13          16          31.2                       2.6              0.04                          0.18                      2.82
0.13          32          62.4                       2.6              0.07                          0.18                      2.85
0.13          64          124.8                      2.6              0.11                          0.18                      2.89
0.13          128         249.6                      2.6              0.24                          0.18                      3.02

Table 7: Read Cycle time for ring pointer type of addressing for 0.13µ.
The following table shows the read cycle times with the ring pointer type of
addressing mode for 0.09µ technology.
(µ)    Locations   Bitline Capacitance (fF)   Sense Amp (ns)   Discharge to Vdd-150mV (ns)   Ring pointer delay (ns)   Read Cycle time (ns)
0.09   16          21.6                       2.6              0.01                          0.18                      2.79
       32          43.2                       2.6              0.06                          0.18                      2.84
       64          84.6                       2.6              0.1                           0.18                      2.88
       128         172.8                      2.6              0.17                          0.18                      2.95
Table 8: Read Cycle time for ring pointer type of addressing for 0.09µ.
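To make the trend in Tables 5 through 8 explicit, the sketch below models the read cycle time the way the tables decompose it: addressing delay, plus the time for the accessed cell to discharge its bit line by 150 mV, plus the sense amplifier delay. The per-location bit line capacitance and the effective cell read current are rough values back-calculated from the 0.18µ entries of Table 5; they are assumptions for a first-order linear estimate, not extracted device parameters.

```python
# Rough model of SRAM read cycle time versus FIFO depth (a sketch, not the
# simulation setup used in this thesis).  The per-cell bit line capacitance
# and the effective read current are back-calculated from Table 5 (0.18u)
# and are approximate.

C_PER_LOCATION_FF = 2.7      # fF of bit line capacitance added per location (0.18u, assumed)
READ_SWING_V      = 0.150    # sense amplifier fires once the bit line drops 150 mV
I_READ_UA         = 108.0    # effective cell read current in uA (assumed)
T_SENSE_NS        = 2.6      # sense amplifier delay from the tables (ns)
T_RING_NS         = 0.18     # ring pointer addressing delay (ns)

def read_cycle_ns(locations, t_addr_ns=T_RING_NS):
    """Addressing delay + bit line discharge to Vdd-150mV + sense amp delay."""
    c_bitline_ff = C_PER_LOCATION_FF * locations               # longer bit line -> more capacitance
    t_discharge_ns = c_bitline_ff * READ_SWING_V / I_READ_UA   # fF * V / uA = ns
    return t_addr_ns + t_discharge_ns + T_SENSE_NS

for n in (16, 32, 64, 128):
    # ~2.84 ns to ~3.3 ns; the linear estimate matches the 16-location entry of
    # Table 5 exactly and slightly overestimates the deeper FIFOs, whose
    # simulated discharge times grow a little more slowly than linearly.
    print(n, round(read_cycle_ns(n), 2))
```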
The ring pointer delay is small because shifting the bit along the chain is
fast. This small addressing delay contributes to a shorter read cycle time and
hence faster access. From the above results it can be concluded that, for
FIFO-type memories, ring pointer addressing leads to faster accesses and is
therefore recommended over the counter and decoder addressing mode. The read
cycle times with ring pointer addressing are much faster than those with
counter and decoder addressing for all FIFO sizes, as can be seen from Graph 3.
Graph 3: Delay comparison of the counter and decoder versus ring pointer addressing methods (read clock cycle times).
Next, we looked at the effect of the power supply on the cycle times. We
reduced Vdd from 2.5 V to 2.25 V and observed that for this 10% decrease in
supply voltage the delay increases by approximately 8%. The approximate power
consumed by the circuit was also noted for the two supply voltages.
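As a quick sanity check on that observation (the 8% factor is the measurement quoted above, and the 4.9 ns nominal-supply value is back-calculated rather than taken from a table reproduced here):

```python
# Back-of-the-envelope scaling of read cycle time with supply voltage, using
# the ~8% penalty observed when Vdd drops from 2.5 V to 2.25 V.
PENALTY = 1.08

def at_reduced_vdd(t_nominal_ns):
    return t_nominal_ns * PENALTY

print(round(at_reduced_vdd(4.90), 2))   # ~5.29 ns, the 16-location entry of Table 9
```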
Table 9 shows the read cycle times for a 0.18µ memory buffer with a 2.25 V
power supply. The delays are longer than with the 2.5 V supply because the
circuits take more time to produce their outputs at the reduced voltage.
(µ)    Locations   Bitline Capacitance (fF)   Decoder + Counter delay (ns)   Sense Amp (ns)   Discharge to Vdd-150mV (ns)   Read cycle time (ns)
0.18   16          43.2                       2.45                           2.8              0.04                          5.29
       32          86.4                       2.45                           2.8              0.08                          5.33
       64          172.8                      2.45                           2.8              0.2                           5.45
       128         345.6                      2.45                           2.8              0.3                           5.55
Table 9: Read Cycle time for counter and decoder type of addressing for 0.18µ with reduced power supply.
7.2 SRAM Write Cycle time:
The write clock is shown below:
Figure 18: SRAM Write Clock, with the cycle divided into phases T1 and T2.
The write clock must be asserted long enough that:
1) The addressing circuit generates and points to the location where the data
is to be written. T1 is the time taken by either the counter and decoder
circuit or the ring pointer circuit to generate the address, and for the bit
lines to be pre-charged to Vdd.
2) One of the bit lines is pulled from Vdd to 0, depending on what value is
being written into the cell. T2 is the time taken to pull the bit line down
from Vdd to 0.
A rough sketch of this two-phase budget is given below.
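This is a minimal sketch, assuming an effective pull-down current I_PULL_UA (a made-up parameter, loosely calibrated so the 16-location, 0.18µ case matches Table 10); the addressing delays are the 2.3 ns counter-and-decoder and 0.18 ns ring pointer values used in the tables that follow.

```python
# Sketch of the SRAM write cycle budget: T1 (addressing + precharge) plus
# T2 (full-swing pull-down of one bit line).  I_PULL_UA is an assumed effective
# pull-down current, chosen so the 0.18u, 16-location case lands near Table 10.

VDD_V        = 2.5
I_PULL_UA    = 225.0      # assumed effective pull-down current (uA); a wider
                          # pull-down transistor raises this (shorter T2) at the cost of area
C_PER_LOC_FF = 2.7        # bit line capacitance per location, 0.18u (assumed)

def write_cycle_ns(locations, t_addr_ns):
    """t_addr_ns is T1: 2.3 ns for counter+decoder, 0.18 ns for the ring pointer."""
    c_bl_ff = C_PER_LOC_FF * locations
    t2_ns = c_bl_ff * VDD_V / I_PULL_UA   # full Vdd swing, so T2 grows quickly with depth
    return t_addr_ns + t2_ns

for n in (16, 32, 64, 128):
    # First-order linear estimate; the simulated pull-down times in the tables
    # deviate somewhat from this straight line for the deeper FIFOs.
    print(n, round(write_cycle_ns(n, 2.3), 2), round(write_cycle_ns(n, 0.18), 2))
```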
The write cycle times for the counter and decoder addressing mode for 16-, 32-,
64- and 128-location FIFOs in 0.18µ, 0.15µ, 0.13µ and 0.09µ technology are
shown in Tables 10 through 13.
It can be noticed that the write cycle time increases faster as the number of
locations increases, because the bit line capacitance grows and charging and
discharging the larger bit line capacitance takes more time. The pull-down
circuitry can be made stronger to reduce this time, but that increases the
area.
(µ)    Locations   Bitline Capacitance (fF)   Decoder + Counter (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.18   16          43.2                       2.3                      0.48                        2.78
       32          86.4                       2.3                      1.14                        3.44
       64          172.8                      2.3                      2.28                        4.58
       128         345.6                      2.3                      3.18                        5.48
Table 10: Write Cycle time for Counter and Decoder type of addressing for 0.18µ.
The steady increase in write cycle time as the number of locations increases is
shown in Graph 4 below.
Graph 4: Variations in Write Cycle times for different numbers of locations with decoder and counter type of addressing.
(µ)    Locations   Bitline Capacitance (fF)   Decoder + Counter (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.15   16          36                         2.3                      0.36                        2.66
       32          72                         2.3                      0.7                         3
       64          144                        2.3                      1.35                        3.65
       128         288                        2.3                      2.66                        4.96
Table 11: Write Cycle time for Counter and Decoder type of addressing for 0.15µ.
(µ)    Locations   Bitline Capacitance (fF)   Decoder + Counter (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.13   16          31.2                       2.3                      0.22                        2.52
       32          62.4                       2.3                      0.5                         2.8
       64          124.8                      2.3                      1.17                        3.47
       128         249.6                      2.3                      2.31                        4.61
Table 12: Write Cycle time for Counter and Decoder type of addressing for 0.13µ.
(µ)    Locations   Bitline Capacitance (fF)   Decoder + Counter (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.09   16          21.6                       2.3                      0.2                         2.5
       32          43.2                       2.3                      0.48                        2.78
       64          84.6                       2.3                      1.14                        3.44
       128         172.8                      2.3                      2.28                        4.58
Table 13: Write Cycle time for Counter and Decoder type of addressing for 0.09µ.
The write cycle times for 16-, 32-, 64- and 128-location FIFOs in 0.18µ, 0.15µ,
0.13µ and 0.09µ technology for the ring pointer addressing mode are shown in
the following tables.
(µ)    Locations   Bitline Capacitance (fF)   Ring pointer delay (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.18   16          43.2                       0.18                      0.48                        0.66
       32          86.4                       0.18                      1.14                        1.32
       64          172.8                      0.18                      2.28                        2.46
       128         345.6                      0.18                      3.18                        3.36
Table 14: Write Cycle time for ring pointer type of addressing for 0.18µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring pointer delay (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.15   16          36                         0.18                      0.36                        0.54
       32          72                         0.18                      0.7                         0.88
       64          144                        0.18                      1.35                        1.53
       128         288                        0.18                      2.66                        2.84
Table 15: Write Cycle time for ring pointer type of addressing for 0.15µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring pointer delay (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.13   16          31.2                       0.18                      0.22                        0.4
       32          62.4                       0.18                      0.5                         0.68
       64          124.8                      0.18                      1.17                        1.35
       128         249.6                      0.18                      2.31                        2.49
Table 16: Write Cycle time for ring pointer type of addressing for 0.13µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring pointer delay (ns)   Vdd to 0 discharging (ns)   Write cycle time (ns)
0.09   16          21.6                       0.18                      0.2                         0.38
       32          43.2                       0.18                      0.48                        0.66
       64          84.6                       0.18                      1.14                        1.32
       128         172.8                      0.18                      2.28                        2.46
Table 17: Write Cycle time for ring pointer type of addressing for 0.09µ.
A graph of the variations in write cycle time with process changes is shown
in Graph 5.
Graph 5: Variations in Write Cycle times with process changes.
The variations in read and write cycle times with process change are shown in
Graph 6. The read cycle time varies only slightly with process, even though the
bit line capacitance changes significantly, because a read only requires the
bit line to discharge by 150 mV. The write cycle times change much more,
because a write involves a full-swing transition on the bit line (pre-charging
to Vdd and pulling one line down to 0), so the write time scales directly with
the bit line capacitance.
Graph 6: Variations in Read/Write Cycle times with process changes.
7.3 DRAM read cycle time:
The read cycle times for a DRAM with the ring pointer method of addressing are
given below. For the DRAM memory module the ring pointer addressing mode is
used because this type of memory is much slower than the SRAM, so the faster
addressing method is needed to meet the access speed requirements. The read
cycle times for 16-, 32-, 64- and 128-location FIFOs in 0.18µ, 0.15µ, 0.13µ and
0.09µ technology with ring pointer addressing are shown in the following
tables.
(µ)    Locations   Bitline Capacitance (fF)   Ring Pointer (ns)   Sense amp + Refresh (ns)   Read cycle time (ns)
0.18   16          43.2                       0.18                4.89                       5.07
       32          86.4                       0.18                4.94                       5.12
       64          172.8                      0.18                4.95                       5.13
       128         345.6                      0.18                5                          5.18
Table 18: Read Cycle time for ring pointer type of addressing for 0.18µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring Pointer (ns)   Sense amp + Refresh (ns)   Read cycle time (ns)
0.15   16          36                         0.18                4.88                       5.06
       32          72                         0.18                4.91                       5.09
       64          144                        0.18                4.93                       5.11
       128         288                        0.18                4.95                       5.13
Table 19: Read Cycle time for ring pointer type of addressing for 0.15µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring Pointer (ns)   Sense amp + Refresh (ns)   Read cycle time (ns)
0.13   16          31.2                       0.18                4.85                       5.03
       32          62.4                       0.18                4.88                       5.07
       64          124.8                      0.18                4.90                       5.10
       128         249.6                      0.18                4.93                       5.13
Table 20: Read Cycle time for ring pointer type of addressing for 0.13µ.
(µ)    Locations   Bitline Capacitance (fF)   Ring Pointer (ns)   Sense amp + Refresh (ns)   Read cycle time (ns)
0.09   16          21.6                       0.18                4.80                       4.98
       32          43.2                       0.18                4.89                       5.06
       64          84.6                       0.18                4.92                       5.08
       128         172.8                      0.18                4.95                       5.11
Table 21: Read Cycle time for ring pointer type of addressing for 0.09µ.
The variations in read cycle time for the SRAM and DRAM type of memory
modules are compared in Graph 7. The SRAM is obviously faster than the
DRAM and hence is preferred for use in FIFO memory buffers.
Graph 7: Variations in Read Cycle times for different types of memory (SRAM and DRAM).
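As a concrete example of where the gap in Graph 7 comes from, the snippet below adds up the read cycle components of a 128-location, 0.15µ FIFO using the values from Table 6 (SRAM, ring pointer addressing) and Table 19 (DRAM, ring pointer addressing).

```python
# Read cycle composition for a 128-location, 0.15u FIFO (values from Tables 6
# and 19).  The DRAM's sense-amplifier-plus-refresh phase dominates, which is
# why its read time is nearly flat with depth but always slower than the SRAM.

sram = {"ring pointer": 0.18, "sense amp": 2.6, "discharge 150mV": 0.30}
dram = {"ring pointer": 0.18, "sense amp + refresh": 4.95}

print("SRAM read cycle:", round(sum(sram.values()), 2), "ns")   # 3.08 ns
print("DRAM read cycle:", round(sum(dram.values()), 2), "ns")   # 5.13 ns
```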
Graph 8: Comparison of Read Cycle times and area for a 2-port DRAM and a 3-port DRAM (64 locations, 256-bit packet size).
Graph 9: Comparison of Read Cycle times and area for a 3-port DRAM and a 2-port SRAM (64 locations, 1024-bit packet size).
Area numbers in the above graphs are referenced from Praveen Krishnanunni’s thesis [11].
7.4 DRAM Write Cycle time:
The write cycle times for a DRAM with the ring pointer method of addressing are
given below. For the DRAM memory module the ring pointer addressing mode is
used because this type of memory is much slower than the SRAM, so the faster
addressing method is needed to compensate. The write cycle time consists of the
addressing delay, the time taken to charge a bit line to Vdd and the time taken
to store the value in the storage capacitance.
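This is a minimal sketch of the three-part budget, using the 16-location, 0.18µ values from Table 22; the 2 ns storage time is the fixed per-write overhead that the SRAM cell does not pay.

```python
# DRAM write cycle = ring pointer delay + bit line charge to Vdd + time to
# store the value on the cell capacitor (16 locations, 0.18u, from Table 22).

t_ring_ns   = 0.18   # ring pointer addressing delay
t_charge_ns = 0.40   # charge the bit line from 0 to Vdd
t_store_ns  = 2.0    # write the value onto the storage capacitor -- DRAM-only overhead

print(round(t_ring_ns + t_charge_ns + t_store_ns, 2), "ns")   # 2.58 ns, as in Table 22
```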
(µ)    Locations   Bitline capacitance (fF)   Ring Pointer (ns)   0 to Vdd charging (ns)   Time to store value in DRAM cell (ns)   Write cycle time (ns)
0.18   16          31.68                      0.18                0.4                      2                                       2.58
       32          63.36                      0.18                0.7                      2                                       2.88
       64          126.72                     0.18                1.25                     2                                       3.43
       128         253.44                     0.18                2.8                      2                                       4.98
Table 22: Write Cycle time for ring pointer type of addressing for 0.18µ.
(µ)    Locations   Bitline capacitance (fF)   Ring Pointer (ns)   0 to Vdd charging (ns)   Time to store value in DRAM cell (ns)   Write cycle time (ns)
0.15   16          26.4                       0.18                0.35                     2                                       2.53
       32          52.8                       0.18                0.6                      2                                       2.78
       64          105.6                      0.18                1.1                      2                                       3.28
       128         211.2                      0.18                2.4                      2                                       4.58
Table 23: Write Cycle time for ring pointer type of addressing for 0.15µ.
(µ)    Locations   Bitline capacitance (fF)   Ring Pointer (ns)   0 to Vdd charging (ns)   Time to store value in DRAM cell (ns)   Write cycle time (ns)
0.13   16          22.9                       0.18                0.3                      2                                       2.48
       32          45.76                      0.18                0.5                      2                                       2.68
       64          91.52                      0.18                0.9                      2                                       3.08
       128         183.04                     0.18                2.2                      2                                       4.38
Table 24: Write Cycle time for ring pointer type of addressing for 0.13µ.
(µ)    Locations   Bitline capacitance (fF)   Ring Pointer (ns)   0 to Vdd charging (ns)   Time to store value in DRAM cell (ns)   Write cycle time (ns)
0.09   16          15.84                      0.18                0.2                      2                                       2.38
       32          31.68                      0.18                0.4                      2                                       2.58
       64          63.36                      0.18                0.7                      2                                       2.88
       128         126.72                     0.18                1.25                     2                                       3.43
Table 25: Write Cycle time for ring pointer type of addressing for 0.09µ.
Graph 10: Variations in Write Cycle times for different types of memory (SRAM and DRAM).
The write cycle times of the DRAM are much longer than those of the SRAM
because of the time taken to store the value in the storage capacitance; this
is the overhead relative to the SRAM, where the value is latched through the
cross-coupled inverter action.
Chapter 8: Conclusion and Future Work
In this thesis we have examined the performance issues of the different memory
cores used in a FIFO queue. This study and the results obtained are part of the
research for the Netchip project. We have looked at the dual port SRAM memory
and proposed a 3-port DRAM cell for faster refresh rates; the 3-port DRAM cell
has a dedicated port each for reading, writing and refreshing. Although the
3-port cell improves the performance of the DRAM and allows faster access
times, it still lags behind the SRAM memory in terms of performance, as shown
by Graph 9. The SRAM memory core is around two times faster than the DRAM
memory core in terms of access times. This suggests that SRAM cores will always
be used for high-performance applications, but cost and the growing concern
over soft errors in SRAM cores are supporting a switch to DRAM memory cores.
The industry is heading towards smaller processes, which imply shrinking
geometries, and the smaller SRAM cells are more susceptible to soft errors.
Soft errors occur when charged particles penetrate a memory cell and cross a
junction, creating an aberrant charge that changes the state of the bit. The
most common sources of soft errors are alpha particles emitted by contaminants
in memory chip packages and cosmic rays penetrating the earth's atmosphere [3].
For a soft error to occur, the collected charge at a node must exceed the
critical charge, that is, the charge required to turn on the device at that
node. As CMOS device sizes shrink with smaller processes, the charge stored at
each node decreases. The smaller processes also lead to a smaller node area and
hence a lower chance of any one node being hit by a particle; per bit, the
error rate decreases, but because far more bits are now packed onto each chip,
the overall system is more prone to a particle strike causing an error. The
particles that cause these soft errors are alpha particles emitted by decaying
radioactive impurities in packaging and interconnect materials, and
cosmic-ray-induced neutrons, which cause errors through the charge induced by
silicon recoil and through neutron-induced 10B fission that releases an alpha
particle and a lithium ion [5]. These soft errors can flip data values in the
memory, leading to wrong data being routed in the network. Hence, how these
soft errors can be reduced is an area of current research.
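Stated as an approximate criterion (standard notation, not defined in this thesis: Q_coll is the charge collected at the struck node, and C_node and V_node are that node's capacitance and voltage), an upset occurs when

$$ Q_{\text{coll}} = \int i_{\text{particle}}(t)\,dt \;>\; Q_{\text{crit}} \approx C_{\text{node}} \cdot V_{\text{node}}. $$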
For applications where the device geometries are not of much importance, SRAM
memory cores are still in use; where the device geometries matter, designers
are looking at a switch. MoSys Inc., for example, is working with a 1T-SRAM
cell built from multiple banks of DRAM cells that achieves speeds comparable to
SRAM [3].
This thesis examines the performance numbers of the dual port SRAM core and
3-port DRAM core memories for a 16 x 1024-bit experimental FIFO queue with a
target access time of 5 ns. The report covers the basic circuits used in memory
queue designs and should be a good reference for the study and design of memory
circuits for FIFO buffers.
Bibliography
[1] Bharadwaj S. Amrutur and Mark A. Horowitz, "Speed and Power Scaling of SRAM's," IEEE Transactions on Solid State Circuits, Vol. 35, No. 2, 2000.
[2] Bharadwaj S. Amrutur, "Design and Analysis of Fast Low Power SRAMs," Thesis, Department of Electrical Engineering, Stanford University, Aug. 1999.
[3] Anthony Cataldo, "SRAM soft errors cause hard network problems," EE Times, August 17, 2001.
[4] Clifford E. Cummings, "Simulation and Synthesis Techniques for Asynchronous FIFO Design," Sunburst Design, Inc.
[5] Vijay Degalahal, "Soft-errors: Problem & Solutions," CSE 477, Spring '04, adapted from CSE 598C, Fall '03, p. 2.
[6] Virantha Ekanayake and Rajit Manohar, "Asynchronous DRAM Design and Synthesis," Ninth International Symposium on Asynchronous Circuits and Systems, 12-15 May 2003, pp. 174-183.
[7] Tegze P. Haraszti, "CMOS Memory Circuits," Kluwer Academic Publishers.
[8] Masashi Hashimoto, Masayoshi Nomura, Kenji Sasaki et al., "A 20-ns 256K x 4 FIFO Memory," IEEE Journal of Solid-State Circuits, Vol. 23, No. 2, pp. 490-499, 1988.
[9] Sambuddhi Hettiaratchi, Peter Y. K. Cheung and Thomas J. W. Clarke, "Performance-Area Trade-off of Address Generators for Address Decoder-Decoupled Memory."
[10] Kang and Leblebici, "CMOS Digital Integrated Circuits," McGraw-Hill, International Edition, 1999.
[11] Praveen Krishnanunni, "Area Comparisons in Network on Chip FIFO Queues," Master's Thesis, Department of Electrical Engineering, University of Southern California, May 2004.
[12] Amit Mehrotra, "Hybrid Josephson CMOS FIFO," Master's Thesis, University of California, Berkeley.
[13] Jan M. Rabaey, "Digital Integrated Circuits: A Design Perspective," Prentice Hall, Upper Saddle River, New Jersey, p. 350, 1996.
[14] Takayasu Sakurai, Kazutaka Nogami, Kazuhiro Sawada and Tetsuya Iizuka, "Transparent Refresh DRAM (TReD) Using Dual Port DRAM Cell," Proceedings of the IEEE 1988 Custom Integrated Circuits Conference (Cat. No. 88CH2584-1).
[15] Sutherland, Sproull and Harris, "Logical Effort," Morgan Kaufmann Publishers, 1999.
[16] Haibo Wang and Sarma B. K. Vrudhula, "A Low Voltage, Low Power Ring Pointer for FIFO Memory Design," IEEE International Symposium on Circuits and Systems, 1998.
[17] Shumao Xie, Vijaykrishnan Narayanan and M. J. Irwin, "A High Performance Low Power FIFO Memory," Project Report, Department of Computer Science and Engineering, Penn State University.
[18] J. T. Yantchev, C. G. Huang, M. B. Josephs and I. M. Nedelchev, "Low-latency asynchronous FIFO buffers," Proceedings of the Second Working Conference on Asynchronous Design Methodologies, 30-31 May 1995, pp. 24-31.
[19] Micron Technologies Technical Notes, "Various Methods of DRAM Refresh."
[20] Oempcworld.com, "2K_vs_4KReffesh.htm."