Non Volatile Memories For Removable Media

This document presents the basic features of NAND Flash memory and the basic architecture of Flash cards. It offers an outlook on the opportunities and challenges of future Flash systems. Proceedings of the IEEE | Vol. 97, No. 1, January 2009


INVITED PAPER

Non-Volatile Memories for Removable Media
NAND flash devices with fast reprogramming, erasing and reading
capabilities offer high capacity portable data storage for computers,
and for applications in portable equipment.
By Rino Micheloni, Senior Member IEEE, Massimiliano Picca, Stefano Amato,
Helmut Schwalm, Michael Scheppler, and Stefano Commodaro

ABSTRACT | NAND Flash memory has become the preferred nonvolatile choice for portable consumer electronic devices. Features such as high density, low cost, and fast write times make NAND perfectly suited for media applications where large files of sequential data need to be loaded into the memory quickly and repeatedly. When compared to a hard disk drive, a limitation of the Flash memory is the finite number of erase/write cycles: most commercially available NAND products are guaranteed to withstand 10^5 programming cycles at most. As a consequence, special care (remapping, bad block management algorithms, etc.) has to be taken when hard-drive based, read/write intensive applications, such as operating systems, are migrated to Flash-memory based devices. One of the basic requirements of the consumer market for data storage is the portability of stored data from one device to the other. Flash cards are the current solution. A Flash card is a nonvolatile "system in package" in which a NAND Flash memory is embedded with a dedicated controller. This paper presents the basic features of the NAND Flash memory and the basic architecture of Flash cards. We provide an outlook on opportunities and challenges of future Flash systems.

KEYWORDS | Flash cards; NAND Flash memory; system in package

Manuscript received March 27, 2008; revised May 22, 2008. Current version published February 27, 2009. This work was supported in part by the European Commission (Project No. FP7-21503-ELITE).
R. Micheloni, M. Picca, S. Amato, and S. Commodaro are with Qimonda Italy Srl, 20059 Vimercate, Italy (e-mail: rino.micheloni@ieee.org).
H. Schwalm and M. Scheppler are with Qimonda Flash GmbH, 82008 Unterhaching, Germany.
Digital Object Identifier: 10.1109/JPROC.2008.2007477

I. INTRODUCTION

Flash memory is a type of nonvolatile memory that can be erased in large blocks (erase blocks or sectors) and reprogrammed in either byte/word or pages. Different Flash implementations exist: NOR and NAND are the most common, but AND and NROM exist as well [1]. This kind of device is used in memory cards and USB Flash drives for storage and data transfer between computers and digital systems. Other applications include personal digital assistants, notebooks, MP3 players, digital cameras, and cellular phones. In recent years it has also gained some popularity in the game console market. In the future, it is expected that Flash-based systems such as solid-state disks (SSDs) will increasingly replace conventional hard disk drives (HDDs).

All types of nonvolatile Flash memory have strong limitations in performing random access for both writing and reading. The memory system, e.g., a Flash card like an SD-card, is a small "system in package" built around the Flash memory and combined with a microcontroller capable of overcoming these limitations, being able to perform random access in both read and write, featuring a technology-independent interface.

1) Features: The popularity of the Flash memory for applications such as storage on portable devices is mainly based on its distinctive characteristic of being nonvolatile (i.e., stored information is retained even when not powered, unlike other kinds of memory such as DRAM). Flash memory also offers fast access times in read (although not as fast as DRAM) and better mechanical shock resistance compared to hard disks. The high storage density of a Flash-based system is typically achieved by means of advanced packaging techniques. Furthermore, the smart memory management carried out by the microcontroller allows a high endurance and an appealing reliability of the resulting system.

2) Outlook and Challenges: As manufacturers increase the density of data storage in Flash devices, the size of individual memory cells becomes smaller and the number of electrons stored in the cell decreases. Moreover, coupling between

148 Proceedings of the IEEE | Vol. 97, No. 1, January 2009 0018-9219/$25.00 © 2009 IEEE
Authorized licensed use limited to: Pontificia Universidad Javeriana. Downloaded on March 6, 2009 at 12:59 from IEEE Xplore. Restrictions apply.
Micheloni et al.: Non-Volatile Memories for Removable Media

adjacent cells and quantum effects can change the write characteristics of cells, making it more difficult to design devices able to guarantee reasonable data integrity. Therefore, a countermeasure such as improved error correction code (ECC) is applied.

II. FLASH STORAGE MEDIA ARCHITECTURE

A. Flash Die
Flash memory contains an array of floating-gate transistors, each of which acts as a memory cell. In single-level cell (SLC) devices, each memory cell stores one bit of information; in multilevel cell (MLC) devices, more than one bit per cell can be stored [2]. Fig. 1 shows a schematic cross-section of a floating-gate memory cell. The floating gate is used to store electrons: changing the number of electrons results in a different threshold voltage of the transistor and, therefore, in a different current sunk by the cell under fixed biasing conditions. The MLC and SLC concepts will be discussed in Section IV-B.

Fig. 1. Schematic representation of a floating-gate memory cell.

There are two dominant kinds of Flash memories: NAND and NOR. The name is related to the topology used for the array of cells. NAND Flash memory is the core of the removable USB interface storage devices known as USB drives, as well as most memory card formats available today on the market.

NAND Flash uses Fowler–Nordheim (FN) tunnel injection for writing and Fowler–Nordheim tunnel release for erasing.

Due to their construction principles, NAND Flash memories cannot provide "execute in place." These memories are in fact accessed like hard disks, and therefore are very suitable for use in mass-storage devices such as memory cards.

While programming is performed on a page basis, erase can only be performed on a block basis (i.e., a group of pages). Pages are typically 2048 or 4096 bytes in size. Associated with each page there are a few bytes that can be used for storage of error detection and correction checksums as well as for administration as requested by the external microcontroller. Fig. 2 shows a schematic representation of the memory organization of a NAND Flash memory.

Fig. 2. Schematic representation of the memory organization of a NAND Flash memory.

Another limitation is the finite number of write-erase cycles (manufacturers usually guarantee 100 000 write-erase cycles for SLC NAND). Furthermore, in order to increase manufacturing yield and, therefore, reduce NAND memory cost, NAND devices are shipped from


the factory with some bad blocks (i.e., blocks where some locations do not guarantee the standard level of reliability when used), which are identified and marked according to a specified bad-block marking strategy.

The first physical block of a NAND memory (block 0) is always guaranteed to be readable and free from errors when the device is shipped to a customer. Hence, code, all vital pointers for partitioning, and bad block management for the device can be located inside this block (typically a pointer to the bad block tables).

On the other hand, NOR Flash memories are capable of a very fast read access time (less than 100 ns) and offer a programming time comparable to that of NAND (although the number of bits programmed per operation is considerably smaller), but they feature an erase time that is some orders of magnitude longer than that of NAND. For these reasons, and due to the capability to "execute in place," NOR Flash memories are suitable for code storage and will not be considered in the following sections.

Table 1 shows the most important parameters of the two types of memory.

Table 1 NOR and NAND Performances

B. Multidie Storage System Architecture
A typical memory system is composed of several NAND memories. Typically, an 8-bit bus, usually called a channel, is used to connect different memories to the controller. It is important to underline that multiple Flash memories in a system are a means for increasing both storage density and read/write performance [3].

Operations on a channel can be interleaved, which means that another memory access can be launched on an idle memory while the first one is still busy (e.g., writing or erasing). For instance, a sequence of multiple write accesses can be directed to a channel, each addressing a different die, as shown in Fig. 3: in this way, the channel utilization is maximized by pipelining data load while the program operation takes place without requiring channel occupation. A system typically has two to eight channels operating in parallel. This is a further means for increased memory performance.

Fig. 3. Interleave operation.

Moreover, it is clear that the data load phase is not negligible compared to the program operation (the same comment is valid for data output); therefore increasing I/O interface speed is another smart way to improve general performance. High-speed interfaces, like DDR, have already been reported [4].

As shown in Fig. 4, using multiple memory components is an efficient way to improve data throughput while having the same page programming time.

Fig. 4. Page program throughput improvement due to interleave operation.

The memory controller is responsible for scheduling the distributed accesses at the memory channels. The controller uses dedicated engines for the low level communication protocol with the Flash. By means of firmware on the central CPU, it executes the address translation from a host request to the physical addresses. Fig. 5 shows the block diagram of a typical memory card/SSD.

With respect to the host side, the memory controller includes a dedicated host interface, which usually complies with several interface standards like SD, MMC, or CF. Hence, from an external point of view, a memory system is vendor independent and characterized only by parameters like read/write performance and power consumption. Especially for embedded applications, this interface


standardization strongly facilitates maintenance of host system firmware when using different Flash sources.

Fig. 5. Block diagram of a typical memory card/SSD.

There are also hybrid architectures that combine different types of memory. The most common is the use of DRAM as a memory cache. During write access, the cache is used for storing data before transfer to the Flash. The benefit is that data updating, e.g., in tables, is faster and does not wear out the Flash. At read access, repeated access to a data structure is supported. For instance, execution of a software application is accelerated.

Another architecture uses a companion NOR Flash for the purpose of "in-place execution" of software without prefetch latency. For hybrid solutions, a multiple-die approach, where different memories are packaged in the same chip, is a possibility to reduce both area and power consumption.

III. MEMORY CARD ARCHITECTURE

Several types of memory cards are available on the market,¹ each of which has a different user interface and a specific form factor, depending on the needs of the target application: e.g., mobile phones need very small-sized removable media (SD, reduced MMC). On the other hand, digital cameras can accept a larger size while needing more memory density (CF, SD, MMC). Fig. 6 shows different form factors.

Fig. 6. Different card form factors: SD, mini-SD, SD, MMC, CF, and a miniaturized version of a USB device.

The interfaces of the memory cards (including USB sticks) support several protocols: parallel or serial, synchronous or asynchronous. Both the form factor and the size of memory cards vary, depending on the target application. Moreover, the memory cards support hot insertion and hot extraction procedures, which requires the ability to manage sudden loss of power supply while guaranteeing the validity of stored data.

¹ http://www.mmca.org; http://www.compactflash.org; http://www.sdcard.com.

For the larger form factors, the card is a complete, small system where every component is soldered on a printed circuit board (PCB) and is independently packaged, as shown in Fig. 7. For instance, the Flash memories are tested devices in thin small outline packages. It is possible to add some additional components; for instance, an external step-down converter can be added in order to derive an internal power supply (CompactFlash cards can work at either 5 or 3.3 V) or a quartz crystal can be used in order to have an accurate clock signal. Also, suitable blocking capacitors are inserted for stabilizing and buffering the power supply.

Fig. 7. CompactFlash (CF) card.

For small form factors like SD, the size of the card is comparable to that of the NAND die. Therefore, the memory chip is mounted as bare die on a small substrate. Moreover, the die thickness has to be reduced in order to comply with the thickness of SD, considering that several dies are stacked, i.e., mounted one on top of each other. All these issues cause a severe limitation to the maximum density of the card, and external components, like voltage regulators and quartz crystals, cannot be used. In other words, the memory controller of the card has to implement all the required functions.


The assembly stress for small form factors is quite high; therefore, system testing is performed at the end of production. Hence, production cost is higher.

Fig. 8 shows a schematic representation of a memory card. Two types of components can be identified: the memory controller and the Flash memory components. Actual implementation may vary, but for the sake of clarity the block diagram is divided into layers whose functions are described in detail.

Fig. 8. Schematic representation of a memory card.

A. Memory Controller
The aim of the memory controller is twofold: 1) to provide the most suitable interface and protocol towards both the host and the Flash memories and 2) to efficiently handle data, maximizing transfer speed, data integrity, and information retention. In order to carry out such tasks, an application-specific device is designed, embedding a standard processor (usually 8/16 bit) together with dedicated hardware to handle timing-critical tasks.

For the sake of discussion, the memory controller can be divided into four parts, which are implemented either in hardware or in firmware. Proceeding from the host to the Flash, the first part is the host interface, which implements the required industry-standard protocol (MMC, SD, CF, etc.), thus ensuring both logical and electrical interoperability between the card and the host. This block is a mix of hardware (buffers, drivers, etc.) and firmware (command decoding performed by the embedded processor), which decodes the command sequence invoked by the host and handles the data flow to/from the Flash memories. The second part is the Flash file system (FFS) [5]: that is, the file system that enables the use of Flash cards and USB drives like magnetic disks. For instance, sequential memory access on a multitude of subsectors that constitute a file is organized by linked lists (stored on the Flash card itself), which are used by the host to build the file allocation table. Just as with an HDD, defragmentation can be invoked by the host in order to optimize access speed and data organization.

The FFS is implemented in firmware and manages (at NAND Flash level) all data accesses to/from the host with a minimum granularity of 512 Byte (one subsector). This block is of utmost importance during data transfer operations. As already outlined in the previous section, Flash memories have intrinsic limitations, some of which can be overcome by performing erase operations, while some others lead to unrecoverable situations and require specific management.

The FFS is usually implemented in the form of firmware inside the controller, with each sublayer performing a specific function. The main functions are: wear leveling management, garbage collection, and bad block management. For all these functions, tables are widely used in order to map sectors and pages from logical to physical (Flash translation layer), as shown in Fig. 9 [6], [7]: the upper row is the logical view of the memory, while the lower row is the physical one. From the host perspective, data are transparently written and overwritten inside a given logical block. Due to Flash limitations, overwriting the same page is not possible; therefore a new page must be allocated in the physical block and the previous one is marked as invalid. It is clear that at some point, the current physical block becomes full and therefore a second one (buffer) is assigned to the same logical block.

Fig. 9. Logical to physical block management.
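The out-of-place update scheme of Fig. 9 can be illustrated with a minimal sketch. This is not the firmware described by the authors: the class name `TinyFTL`, the value of `PAGES_PER_BLOCK`, and the in-memory dictionaries are hypothetical choices made only for illustration. The sketch shows a single point, namely that a rewrite of the same logical page is redirected to a fresh physical page while the previous copy is marked invalid.

```python
# Illustrative sketch only: names and sizes are assumptions, not from the paper.
PAGES_PER_BLOCK = 4  # real NAND blocks contain many more pages


class TinyFTL:
    """Toy logical-to-physical page map with out-of-place writes."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.mapping = {}     # logical page -> (physical block, page in block)
        self.invalid = set()  # physical pages that hold stale data
        self.data = {}        # (physical block, page) -> payload
        self.next_free = 0    # linear index of the next unprogrammed page

    def write(self, logical_page, payload):
        """Program a fresh physical page; a programmed page is never overwritten."""
        if self.next_free >= self.num_blocks * PAGES_PER_BLOCK:
            raise RuntimeError("no free pages left: garbage collection needed")
        phys = divmod(self.next_free, PAGES_PER_BLOCK)
        self.next_free += 1
        old = self.mapping.get(logical_page)
        if old is not None:
            self.invalid.add(old)  # the previous copy becomes invalid
        self.mapping[logical_page] = phys
        self.data[phys] = payload
        return phys

    def read(self, logical_page):
        """Return the latest payload written to a logical page."""
        return self.data[self.mapping[logical_page]]


ftl = TinyFTL(num_blocks=2)
first = ftl.write(7, b"v1")
second = ftl.write(7, b"v2")  # same logical page, different physical page
```

When the first physical block fills up, this toy allocator simply continues into the next block, which loosely mirrors the "buffer" block assigned to the same logical block in the text; reclaiming the invalid pages is deliberately left out.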


The required translation tables are always stored on the memory card itself, thus reducing the overall card capacity. The organization of this information is very important because it has an impact on data access speed and, therefore, on card performance.

1) Wear Leveling Management: Usually, not all the information stored within the same memory location changes with the same frequency: some data are often updated while others remain the same for a very long time (in the worst case, for the whole life of the device). It is clear that the blocks containing frequently updated information are stressed with a large number of write/erase cycles, while the blocks containing information updated very rarely are much less stressed.

In order to mitigate disturbs, it is important to keep the aging of each page/block as low and as uniform as possible: that is, the number of both read and program cycles applied to each page must be monitored. Furthermore, the maximum number of allowed program/erase cycles for a block (i.e., its endurance) should be considered: when SLC NAND memories are used, this number is on the order of 100 000 cycles, which is reduced to 10 000 when MLC NAND memories are used. Wear leveling techniques rely on the concept of logical to physical translation for each sector: that is, each time the host application requires updates to the same (logical) sector, the memory controller dynamically maps the sector onto a different (physical) sector, keeping track of the mapping either in a specific table or with pointers. The out-of-date copy of the sector is tagged as both invalid and eligible for erase. In this way, all the physical sectors are evenly used, thus keeping the aging under a reasonable value. Two kinds of approaches are possible: dynamic wear leveling is normally used to follow up a user's request of update for a sector; static wear leveling can also be implemented, where every sector, even the least modified, is eligible for remapping as soon as its aging deviates from the average value.

2) Garbage Collection: Both wear leveling techniques rely on the availability of free sectors that can be filled up with the updates: as soon as the number of free sectors falls below a given threshold, sectors are "compacted" and multiple, obsolete copies are deleted. This operation is performed by the Garbage Collection module, which selects the blocks containing the invalid sectors, copies the latest valid copy into free sectors, and erases such blocks (see Fig. 10).

Fig. 10. Garbage collection.

In order to minimize the impact on performance, garbage collection can be performed in the background. In a multichannel system, wear leveling is applied on die level, channel level, and system level depending on the age of the system. The equilibrium generated by the wear leveling distributes wearout stress over the array rather than on single hot spots. Hence, the bigger the memory density, the lower the wearout per cell.

3) Bad Block Management: No matter how smart the wear leveling algorithm is, an intrinsic limitation of NAND Flash memories is represented by the presence of so-called bad blocks, i.e., blocks that contain one or more locations whose reliability is not guaranteed. The Bad Block management module creates and maintains a map of bad blocks, as shown in Fig. 11: the map is created during factory initialization of the memory card, thus containing the list of the bad blocks already present during the factory testing of the NAND Flash memory modules. Then it is updated during device lifetime whenever a block becomes bad.

Fig. 11. Bad block management.

4) Error Correction: This task is typically executed by specific hardware inside the memory controller. Examples of memories with embedded ECC are also reported [8], [10]. The most popular ECC codes, correcting more than one error, are Reed–Solomon and Bose–Chaudhuri–Hocquenghem


(BCH) [11]. While the encoding takes few controller cycles of latency, the decoding phase can take a large number of cycles and visibly reduce read performance as well as the memory response time at random access.

There are different reasons for disturbed analog levels at read access that can result in falsely classified bits at the sensing circuitry with a certain probability:
• noise (e.g., at the power rails);
• level disturbances (read/write at neighbor cells);
• long-term level shift (retention problems).

The allowed probability of failed reads after correction is dependent on the use case of the application. Price-sensitive consumer applications with a relatively low number of read accesses during product lifetime can tolerate a higher probability of read failures as compared to high-end applications with a high number of memory accesses. Most demanding are applications which use Flash devices as cache modules for processors.

The reliability that a memory can offer is its intrinsic error probability. This probability may not be the one that the user requires. Through ECC, it is possible to close the gap between the desired error probability and the error probability offered by the memory; the latter probability can be written as

$$p = \frac{\text{Number of bit errors}}{\text{Total number of bits}} \qquad (1)$$

while the chip error probability (CEP) is defined as

$$\mathrm{CEP}(p) = \frac{\text{Number of chip errors}(p)}{\text{Total number of chips}}. \qquad (2)$$

Fig. 12 shows a typical system composed of a memory array and an ECC block. CEP is usually calculated before (CEP_in) and after (CEP_out) the ECC block.

Fig. 12. Simplified block diagram of a memory system.

Fig. 13 shows the graphs of CEP_in (indicated in the legend as "no ECC") and CEP_out as a function of p for a system with 512 Mbit memory, 512 Byte block, and ECC able to correct one to four errors. The page size of 512 Byte is the typical sector size of host operating systems.

If, for example, the memory is able to guarantee a probability p of 10^-8, the use of an ECC correcting two errors allows a CEP of 0.001 ppm (defective parts per million), compared with the 100 ppm guaranteed by an ECC correcting only one error. On the other hand, if a CEP of 100 ppm is required, the use of an ECC correcting a single error requires a probability p of 10^-8, while an ECC correcting four errors tolerates the higher probability of 10^-5.

The object of the theory of error correction codes is the addition of redundant terms to the message, such that, on reading, it is possible to detect the errors and to recover the message that has most probably been written. Methods of error correction are applied for the purpose of data restoration at read access. Block code error correction is applied on subsectors of data. Depending on the error correcting scheme used, different amounts of redundant bits, called parity bits, are needed.

Between the length n of the code words, the number k of information bits, and the number t of correctable errors, a relationship known as the Hamming inequality exists, from which it is possible to compute the minimum number of parity bits

$$\sum_{i=0}^{t} \binom{n}{i} \le 2^{n-k}. \qquad (3)$$

It is not always possible to reach this minimum number, in which case the number of parity bits for a good code must be as near as possible to this number. On the other hand, the bigger the size of the subsector, the lower the relative amount of spare area in the Flash die needed for parity bits. Hence, there is an impact on Flash die size.

Fig. 13. Chip error probability as a function of the error probability of the cell.
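The orders of magnitude quoted for Fig. 13 can be checked with a short numerical sketch. The model below is an assumption, not the authors' derivation: bit errors are taken as independent with probability p, each 512-byte block is protected separately, the parity bits are ignored in the block length, and a chip counts as failed if any block holds more than t errors. The function names `block_fail_prob` and `chip_error_prob` are hypothetical, and the code is intended for small p only.

```python
import math


def block_fail_prob(p, n_bits, t):
    """P(more than t errors among n_bits independent bits), small p assumed.

    The upper binomial tail is summed directly to avoid the cancellation
    that 1 - P(at most t errors) would suffer for tiny probabilities.
    """
    total = 0.0
    for i in range(t + 1, n_bits + 1):
        term = math.comb(n_bits, i) * p**i * (1.0 - p) ** (n_bits - i)
        total += term
        if term <= total * 1e-18:  # remaining terms are negligible
            break
    return total


def chip_error_prob(p, t, memory_bits=512 * 2**20, block_bytes=512):
    """CEP after an ECC correcting t errors per block (t = 0 means no ECC)."""
    n_bits = block_bytes * 8
    n_blocks = memory_bits // n_bits
    p_block = block_fail_prob(p, n_bits, t)
    # 1 - (1 - p_block)**n_blocks, evaluated in a numerically stable way
    return -math.expm1(n_blocks * math.log1p(-p_block))


if __name__ == "__main__":
    for p, t in [(1e-8, 1), (1e-8, 2), (1e-5, 4)]:
        print(f"p={p:g}, t={t}: CEP = {chip_error_prob(p, t) * 1e6:.4g} ppm")
```

Under these assumptions the three cases land on the order of 100 ppm, 0.001 ppm, and 100 ppm respectively, consistent with the figures given in the text; the exact curves of Fig. 13 may rest on a somewhat different model.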

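The Hamming inequality (3) can also be turned into a small search for the minimum number of parity bits. This is a sketch under stated assumptions: `hamming_bound_holds` and `min_parity_bits` are hypothetical helper names, and the result is only the theoretical lower bound, not the parity count of any particular BCH or Reed–Solomon code.

```python
import math


def hamming_bound_holds(n, k, t):
    """Check inequality (3): sum_{i=0..t} C(n, i) <= 2**(n - k)."""
    return sum(math.comb(n, i) for i in range(t + 1)) <= 2 ** (n - k)


def min_parity_bits(k, t):
    """Smallest r = n - k for which the Hamming inequality can be satisfied."""
    r = 0
    while not hamming_bound_holds(k + r, k, t):
        r += 1
    return r


if __name__ == "__main__":
    k = 512 * 8  # one 512-byte subsector of data
    for t in (1, 2):
        print(f"t={t}: at least {min_parity_bits(k, t)} parity bits")
```

For a 512-byte (4096-bit) subsector this bound gives 13 parity bits for t = 1 and 24 for t = 2. Exact integer arithmetic is used so that the comparison with 2^(n-k) is not affected by floating-point rounding.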

BCH and Reed–Solomon codes have a very similar structure, but BCH codes require fewer parity bits. This is one reason why they were preferred for an ECC embedded in a memory [10].

Table 2 shows the number of parity bits required by BCH and RS algorithms compared to the minimum number (Min) calculated with the Hamming inequality. Different data sizes and numbers of correctable errors are considered.

Table 2 Comparison of the Number of Parity Bits Required by the Different ECC Algorithms Versus Data Size

IV. MEMORY CARDS AND 3-D INTEGRATION

Reduced form factor has been one of the main drivers for the success of the memory cards; on the other hand, density requirements have grown dramatically, to the extent that standard packaging (and design) techniques are no longer able to sustain the pace. In order to solve this issue, two approaches are possible: 1) implement advanced integration solutions (both at package and at design level) and 2) provide increased information content.

A. Advanced Integration Solutions
The classic way to increase density is to implement a multichip solution, where several dies are stacked together. The advantage of this approach is that it can be applied to existing bare dies, as shown in Fig. 14: dies are separated by means of a so-called interposer, so that there is enough space for the bonding wires to be connected to the pads and bonded as required. On the other hand, the use of an interposer has the immediate drawback of increasing the height of the multichip, and height is one of the most relevant limiting factors for memory cards.

Fig. 14. Classic die stacking.

One way to overcome this issue is to exploit both sides of the PCB, as shown in Fig. 15: in this way, the PCB acts as an interposer, and components are evenly flipped on the two sides of the PCB. Height is reduced, but there is an additional constraint on design: since the lower die is flipped, its pads no longer match those of the upper die. The only way to have corresponding pads facing one another is to design the pad section in such a way that pad to signal correspondence can be scrambled. That is, when a die is used as the bottom one, it is configured with mirrored pads. Such a solution is achievable, but chip design is more complex (signals must be multiplexed in order to perform the scramble) and chip area is increased, since it might be necessary to have additional pads to ensure symmetry when flipping.

Fig. 15. Flipped die stacking.

The real breakthrough is achieved by completely removing the interposer, thus using all the available height for silicon (apart from a minimum overhead due to die-to-die glue). Fig. 16 shows an implementation where a staircase arrangement of the dies is used: any overhead is reduced to the minimum, bonding does not pose any particular issue, and chip mechanical reliability is maintained (the offset between dies is small compared to the die length, and therefore the overall stability is not compromised, since the uppermost die does not go beyond the overall center of mass).

Fig. 16. Staircase die stacking.

The drawback is that such a solution has a heavy impact on chip design, since all the pads must be located on the same side of the die. In a traditional memory component, pads are arranged along two sides of the device: circuitry is then evenly located next to the two pad rows and the array occupies the majority of the central area. Fig. 17 shows the floorplan of a memory device whose pads lie on two opposite sides.

If all pads lie on one side, as shown in Fig. 18, the chip floorplan is heavily impacted [12]: most of the circuits are


moved next to the pads in order to minimize the length of the connection and to optimize circuit placement. But some of the circuits still reside on the opposite side of the die (for instance, part of the decoding logic of the array and part of the page buffers, i.e., the latches where data are stored, either to be written to the memory or when they are read from the memory to be provided to the external world). Of course, such circuits must be connected to the rest of the chip, both from a functional and from a power supply point of view. Since all pads are on the opposite side, including power supply ones, it is necessary to redesign the power rail distribution inside the chip, making sure that the size and geometry of the rails are designed properly, in order to avoid IR drop issues (i.e., the voltage at the end of the rail is reduced due to the resistive nature of the metal line). Table 3 summarizes the main features of the suggested solutions.

Fig. 17. Memory device with pads along opposite sides.

Fig. 18. Memory device with pads along one side.

Table 3 Die Stacking Options: Pros and Cons

Fig. 19. Three-dimensional horizontal memory array.

The solutions presented so far exploit advances in stacking techniques, possibly requiring changes in the chip design floorplan. Quite recently, advanced design and manufacturing solutions have been presented, where the three-dimensional (3-D) integration is performed directly at chip level. The concept is simple: instead of stacking several dies, each of which is a fully functional memory component, it is possible to embed more than one memory array in the same silicon die. In this way, all the control logic, analog circuitry, and pads can be shared by the different memory arrays. In order to keep the area at the minimum, the memory arrays are grown one on top of the other, exploiting the most recent breakthroughs in silicon manufacturing technology. Two different solutions have been recently presented for NAND Flash memories: in one case [13], [14], the topology of the memory array is the usual one and another array is diffused on top of it, as shown in Fig. 19, so that two layers exist. Therefore the NAND strings (i.e., the series of Flash memory cells, which is the basic building block of the array) are diffused on the X-Y plane. Around the arrays, all the peripheral circuitry is placed in the first (i.e., lower) layer. The only exception is the wordline (WL) decoder. To avoid FN erasing of the unselected layer, all WLs in that layer must


be floating, just like the WLs of unselected blocks in the selected layer. This function is performed by the layer-dedicated WL decoder.

The second approach [15] is shown in Fig. 20: in this case, the NAND strings are orthogonal to the die (along the Z-direction). Each NAND string is built on plugs located vertically in holes punched through the whole stack of gate plates. Each plate acts as a control gate except the lowest plate, which takes the role of the lower select gate.

Fig. 20. Three-dimensional vertical memory array. (a) Top-down view of 3-D vertical memory array. (b) Equivalent circuit of the vertical NAND string.

Furthermore, such "on-die" 3-D approaches can be mixed with the previous techniques, and considerable densities can be achieved.

B. Increased Information Content

The density of the memory can also be increased by acting at cell level: in its simplest form, a nonvolatile memory cell stores one bit of information: '1' when the cell is erased and '0' when it is programmed. Assuming a current sensing technique, the amount of electrical charge stored in the floating gate is sensed against a single reference current, and this kind of solution is referred to as SLC. It is shown in Fig. 21.

The concept can be extended by adding multiple reference voltage values while decreasing the quantum of charge that can be stored and sensed, thus leading to the so-called MLC approach [16]–[20].

In the case of 2 bits per cell (see Fig. 22), each cell conveys 2 bits of information, and therefore four different combinations are possible (00, 01, 10, 11) and three reference thresholds are needed in order to discriminate among them.

Fig. 22. MLC approach in NOR architecture. The memory cell is read at constant VGS, exploiting the three different reference cells.

Several devices implementing the 2-bit-per-cell technology are commercially available, and indeed MLC has become a synonym for 2 bits per cell. The concept has been extended recently to 3 bits per cell, where eight different combinations are stored inside the same cell.

The circuitry required to read multiple bits out of a cell is of course more complex than in the single-bit case, but the savings in terms of area (and the increase in density) are worth the complexity. The real disadvantage lies in reduced endurance and reliability. In terms of endurance, as already mentioned, an SLC solution can withstand up to 100 000 program/erase cycles for each block, while an MLC solution is usually limited to 10 000. For this reason, wear leveling algorithms must be used, as already outlined in a previous section. In terms of reliability, it is clear that the more levels are used, the more read disturbs can happen, and therefore the ECC capability must be strengthened.
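The relation between bits per cell and reference thresholds can be illustrated with a minimal Python sketch. It is not taken from the paper: the threshold voltages, the level-to-bit mapping, and the function names `read_cell` and `references_needed` are hypothetical choices made only for illustration, and real devices use their own reference values and bit coding.

```python
from bisect import bisect_right

# Hypothetical reference thresholds in volts (illustrative values only).
REFERENCES_V = (1.0, 2.0, 3.0)

# One possible mapping from level index (lowest threshold voltage first)
# to a 2-bit pattern. The erased (lowest) level is mapped to "11" here;
# this coding is an assumption, not the scheme of any specific device.
LEVEL_TO_BITS = ("11", "10", "01", "00")


def references_needed(bits_per_cell):
    """Number of reference thresholds needed to tell 2**n levels apart."""
    return 2 ** bits_per_cell - 1


def read_cell(vt, references=REFERENCES_V, level_to_bits=LEVEL_TO_BITS):
    """Classify a cell threshold voltage against the sorted references."""
    if len(level_to_bits) != len(references) + 1:
        raise ValueError("need exactly one more level than references")
    # Count how many references lie at or below the cell threshold voltage.
    level = bisect_right(references, vt)
    return level_to_bits[level]
```

With these assumed values, a cell whose threshold lies between the first and second reference falls in the second level. The same helper shows why 3 bits per cell is harder to sense: seven references must fit in the same voltage window that holds one reference for SLC.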

Fig. 21. Voltage–current characteristics of SLC memory cells.

C. Integration and Packaging

As mentioned before, systems like CompactFlash can mount a quartz crystal on board, but when a complete system is embedded in a single package, the card clock generator must be integrated on-chip. On-chip oscillators have a larger frequency variance than a discrete quartz crystal or resonator, and oscillator variations directly influence system performance. This means that optimizing the clock frequency to compensate for process skew, or configuring the oscillator to track temperature, is an important point in system design.

Fig. 23. Frequency tuning circuit.
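As a hedged illustration of compensating process skew, the Python sketch below models a counter-based calibration against an external reference clock, in the spirit of the procedure this section describes for synchronous cards (Fig. 24). The oscillator model, the trim-code range, the counter width, and all function names are assumptions introduced for the example; the paper does not give these details.

```python
def first_to_carry(f_internal_hz, f_external_hz, counter_bits=8):
    """Tell which of two free-running counters produces its carry first.

    Both counters start from zero and count cycles of their own clock;
    the faster clock reaches the terminal count (carry) sooner.
    """
    terminal_count = 1 << counter_bits
    t_internal = terminal_count / f_internal_hz
    t_external = terminal_count / f_external_hz
    return "internal" if t_internal <= t_external else "external"


def calibrate(oscillator_hz, f_external_hz, trim_codes):
    """Step the trim code upward until the internal clock catches up.

    `oscillator_hz` maps a trim code to a frequency (a stand-in for the
    real oscillator). Returns the first code whose counter carries no
    later than the reference counter, or None if no code is fast enough.
    """
    for code in trim_codes:
        if first_to_carry(oscillator_hz(code), f_external_hz) == "internal":
            return code  # value that would be stored in a nonvolatile register
    return None


def example_oscillator(code):
    """Hypothetical oscillator: 40 MHz base plus 1 MHz per trim step."""
    return 40e6 + code * 1e6
```

For instance, `calibrate(example_oscillator, 48e6, range(16))` walks the trim codes from the slowest setting upward and stops at the first one that matches or exceeds the reference. Only the external clock is needed as an input, which is the property the calibration procedure relies on.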
Target performances in fact require the working frequency to be higher than a given limit, to guarantee the generation of control signal pulses with the right minimum timings. On the other hand, integrated oscillators have an intrinsic error on the output frequency. Designers are then forced to use lower frequencies than the ideal ones, to be sure that deviations cannot drive synchronism timings under acceptable values.

In order to overcome this issue, it is possible to use special circuitry that corrects the frequency of the oscillator while it is working, sensing its output every period. In this way, the oscillator configuration follows a reference voltage value in order to compensate both temperature and supply voltage variations (see Fig. 23).

This circuitry should be accurate enough to change the frequency of the oscillator while generating neither instability nor noise in the system environment: fast ripple on the supply should be ignored.

A different approach is calibrating the system before shipping. Calibration compensates process deviations, leaving the design to deal with a smaller residual uncertainty.

Synchronous Flash cards have an external clock for the I/O interface. This external clock can easily be compared with the internal clock by monitoring the carry of two counters, as shown in Fig. 24.

Fig. 24. Frequency tuning in synchronous systems.

The proposed algorithm increases the frequency of the internal oscillator until it reaches the external one, used in this phase as a reference. The optimum configuration found can be stored inside a nonvolatile register and loaded at every successive boot. A particularly useful feature of this calibration procedure is that it uses only the external clock and no other input. This means that other tests can be executed simultaneously on other circuits of the I/O interface.

Power networks in Flash cards are not robustly stable but tend to bounce. This introduces noise and reduces voltage margins. Voltage drops caused by parasitic inductances further reduce voltage margins.

Preferably, an integrated voltage regulator should be implemented in order to reduce power consumption. Implementing low-voltage technologies is a good choice even if a voltage regulator is required to transform the external supply to the lower internal supply, because low-voltage transistors switching at a lower voltage consume less than standard ones.

Thermal dissipation is another critical topic for highly integrated cards, like SD cards. Both NAND Flash memories and the memory controller are specified by the manufacturers for standalone use inside their specific package. When several components are stacked inside the card, one package has to dissipate all the thermal energy. In other words, the system operating temperature can change in relation to the number of memories working in parallel. In extreme cases, data retention can be jeopardized; data retention of a nonvolatile memory cell describes its capability of holding the charge in the floating gate.

Retention is a measure of the time that a nonvolatile memory cell can retain the charge whether it is powered or unpowered. In floating-gate memories, the stored charge can leak away from the floating gate either through the gate oxide or through the interpoly dielectric. Different charge loss mechanisms are reported in the literature [21]–[24], and they result in a threshold voltage (VT) shift. Assuming a mechanism with activation energy EA, the time tS to achieve a certain VT shift is described by the Arrhenius plot as shown in Fig. 25. As a result, if not properly designed in terms of power management, the card itself can induce charge loss from the floating gates and, therefore, data loss.

Fig. 25. Example of Arrhenius plot to achieve a certain VT shift.
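The temperature sensitivity behind the Arrhenius plot can be made concrete with a short Python sketch. It assumes the usual Arrhenius form, tS proportional to exp(EA / (k T)) with k the Boltzmann constant and T the absolute temperature, and compares the time to reach the same VT shift at two temperatures. The function name, the activation energy, and the temperatures used in the usage note are illustrative assumptions, not data from the paper.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def retention_time_ratio(ea_ev, t_cool_c, t_hot_c):
    """Ratio tS(cool) / tS(hot) for a single thermally activated mechanism.

    Assumes tS is proportional to exp(EA / (k * T)), so the unknown
    prefactor cancels and only EA and the two temperatures are needed.
    Temperatures are given in degrees Celsius.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)
```

Calling `retention_time_ratio(1.0, 55.0, 85.0)` with an assumed activation energy of 1.0 eV gives a ratio well above 1: under this model the same VT shift is reached much sooner at the higher temperature. This is why self-heating in a densely stacked card matters for retention.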


Another unwanted consumption is due to the fact that each Flash device has its own internal charge pumps, which individually contribute to the overall standby current of the whole system as well as to voltage drops caused by di/dt. Such consumption can be drastically reduced if one device (a master, or the controller) has one or more voltage regulators on board, or simply a switch by which it can control the supply of the other devices (slaves, or memories), in order to provide better performance in standby condition. When standby is required, the main device can gate the power supply to the others, so that there is only one contributor to the standby current of the whole system.

A critical aspect of portable systems is related to their characteristic of being removable, which adds two critical conditions during system use: insertion and extraction.

Insertion is critical whenever it is a hot insertion, i.e., the system is connected to a host that is already supplied. In fact, a system that presents a low capacitance on its supply line, when physically connected to a host that has a bigger (and supplied) capacitance on the same low-ohmic lines (because supply lines are very conductive), experiences a very fast ramp-up of VDD. Typical timing is faster than 5 V/µs. With this hard power-on, many internal nets can be coupled to VDD and drive the system to unwanted conditions.

The other critical condition is related to extraction. Users can extract the card from the host at any time. This misuse is clearly out of specification, but critical conditions can be identified at design level, and special strategies should be used to drive the system to a safer condition whenever possible. In particular, a write access that is still busy should not lose data. The payload data stored in the SRAM buffers of the controller must be safely transferred to the Flash memory before system shutdown. Exception handling is a major functionality of the memory controller firmware.

V. CONCLUSION

NAND Flash devices are best suited for applications requiring high-capacity data storage. This type of Flash architecture combines higher storage space with faster program, erase, and read capabilities over the "execute in place" advantage of the NOR architecture.

Density requirements have pushed traditional integration solutions to the limit, and advanced techniques have been developed, both at component and at system level.

Acknowledgment

The authors wish to thank A. Marelli [11] for the fruitful discussions on ECC.

REFERENCES

[1] P. Cappelletti et al., Flash Memories. Norwood, MA: Kluwer Academic, 1999.
[2] G. Campardo, R. Micheloni, and D. Novosel, VLSI-Design of Non-Volatile Memories. Berlin, Germany: Springer-Verlag, 2005.
[3] C. Park et al., "A high performance controller for NAND Flash-based Solid State Disk (NSSD)," in Proc. IEEE Non-Volatile Semiconduct. Memory Workshop (NVSMW), Feb. 2006, pp. 17–20.
[4] D. Nobunaga et al., "A 50 nm 8 Gb NAND flash memory with 100 MB/s program throughput and 200 MB/s DDR interface," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 426–427.
[5] A. Kawaguchi, S. Nishioka, and H. Motoda, "A flash-memory based file system," in Proc. USENIX Winter Technical Conf., 1995, pp. 155–164.
[6] J. Kim, J. M. Kim, S. Noh, S. L. Min, and Y. Cho, "A space-efficient flash translation layer for compactflash systems," IEEE Trans. Consum. Electron., vol. 48, pp. 366–375, May 2002.
[7] S.-W. Lee, D.-J. Park, T.-S. Chung, D.-H. Lee, S.-W. Park, and H.-J. Songe, "FAST: A log-buffer based FTL scheme with fully associative sector translation," in Proc. 2005 US-Korea Conf. Sci., Technol., Entrepreneur., Aug. 2005.
[8] T. Tanzawa, T. Tanaka, K. Takekuchi, R. Shirota, S. Aritome, H. Watanabe, G. Hemink, K. Shimizu, S. Sato, Y. Takekuchi, and K. Ohuchi, "A compact on-chip ECC for low cost Flash memories," IEEE J. Solid-State Circuits, vol. 32, pp. 662–669, May 1997.
[9] G. Campardo, R. Micheloni et al., "40-mm² 3-V-only 50-MHz 64-Mb 2-b/cell CHE NOR Flash memory," IEEE J. Solid-State Circuits, vol. 35, pp. 1655–1667, Nov. 2000.
[10] R. Micheloni et al., "A 4 Gb 2b/cell NAND flash memory with embedded 5b BCH ECC for 36 MB/s system read throughput," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2006, pp. 142–143.
[11] R. Micheloni, A. Marelli, and R. Ravasio, Error Correction Codes for Non-Volatile Memories. Berlin, Germany: Springer-Verlag, 2008.
[12] K. Kanda et al., "A 120 mm² 16 Gb 4-MLC NAND flash memory with 43 nm CMOS technology," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 430–431.
[13] S.-M. Jung, J. Jang, W. Cho et al., "Three dimensionally stacked NAND Flash memory technology using stacking single crystal Si layers on ILD and TANOS structure for beyond 30 nm node," in IEDM Dig. Tech. Papers, Dec. 2006, pp. 37–40.
[14] K. T. Park et al., "A 45 nm 4 Gb 3-dimensional double-stacked multi-level NAND flash memory with shared bitline structure," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 510–511.
[15] H. Tanaka, M. Kido, K. Yahashi et al., "Bit cost scalable technology with punch and plug process for ultra high flash memory," in Dig. Symp. VLSI Technol., Jun. 2006, pp. 14–15.
[16] M. Bauer et al., "A multilevel-cell, 32 Mb flash memory," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1995, pp. 132–135.
[17] H. A. Castro et al., "A 125 MHz burst mode 0.18 µm 128 Mbit 2 bits per cell flash memory," in Proc. VLSI Symp., Jul. 2002, pp. 304–307.
[18] D. Helmhurst et al., "A 1.8 V 128 Mb 125 MHz multi-level cell flash memory with flexible read while write," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2003, pp. 286–287.
[19] M. Taub et al., "A 90 nm 512 Mb 166 MHz multilevel cell flash memory with 1.5 MByte/s programming," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2005, pp. 54–55.
[20] C. Villa et al., "A 125 MHz burst-mode flexible read-while-write 256 Mbit 2 b/c 1.8 V NOR flash memory," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2005, pp. 52–53.
[21] A. Modelli, A. Visconti, and R. Bez, "Advanced flash memory reliability," in Proc. IEEE Int. Conf. Integr. Circuit Design Technol., 2004, pp. 211–218.
[22] N. Mielke, H. Belgal, I. Kalastirsky, P. Kalavade, A. Kurtz, Q. Meng, N. Righos, and J. Wu, "Flash EEPROM threshold instabilities due to charge trapping during program/erase cycling," IEEE Trans. Device Mater. Rel., vol. 4, no. 3, pp. 335–344, 2004.
[23] A. Chimenton, P. Pellati, and P. Olivo, "Analysis of erratic bits in flash memories," IEEE Trans. Device Mater. Rel., vol. 1, pp. 179–184, Dec. 2001.
[24] H. Kurata et al., "Random telegraph signal in flash memory: Its impact on scaling of multilevel flash memory beyond the 90-nm node," IEEE J. Solid-State Circuits, vol. 42, pp. 1362–1369, Jun. 2007.


ABOUT THE AUTHORS


Rino Micheloni (Senior Member, IEEE) was born in San Marino, Italy, in 1969. He received the Laurea degree (cum laude) in nuclear engineering from the Politecnico di Milano, Milan, Italy, in 1994.
In 1995, he joined the Memory Product Group, STMicroelectronics, Agrate Brianza, Italy. He was Project Leader of a 64-Mb four-level Flash memory; after that, he designed a 0.13-µm test chip exploring architectural solutions for Flash memories storing more than 2 bit/cell. Then he was Product Development Manager of the NOR Multilevel Flash products for code and data storage applications. From 2002 to 2006, he led the NAND Multilevel Flash activities and the Error Correction Code development team. In 2006, he joined Qimonda Flash GmbH, Unterhaching, Germany, as Senior Principal for Flash Design. Currently, he is with Qimonda Italy srl, Vimercate, Italy, leading the design center activities. He is the author/coauthor of more than 20 papers in international journals or conferences. He is coauthor of Chapter 6 of Floating Gate Devices: Operation and Compact Modeling (Norwood, MA: Kluwer Academic, 2004) and of Chapter 5 in Flash Memories (Norwood, MA: Kluwer Academic, 1999). He is a coauthor of VLSI-Design of Non-Volatile Memories (Berlin, Germany: Springer-Verlag, 2005); Memorie in Sistemi Wireless (Franco Angeli, 2005); and Error Correction Codes for Non-Volatile Memories (Berlin, Germany: Springer-Verlag, 2008). He is author/coauthor of more than 100 patents (80 granted in the United States).
Mr. Micheloni was Co-Guest Editor for the Special Issue on Flash Memory, PROCEEDINGS OF THE IEEE, in April 2003. In 2003 and 2004, he received the STMicroelectronics Exceptional Patent Awards for U.S. Patent 6 493 260, "Non-volatile memory device, having parts with different access time, reliability, and capacity," and U.S. Patent 6 532 171, "Nonvolatile Semiconductor Memory Capable of Selectively Erasing a Plurality of Elemental Memory Units." In 2007, he received the Qimonda Award for IP impact.

Massimiliano Picca was born in Monza, Italy, in 1971. He received the master's degree in electronics engineering from Politecnico of Milan, Italy.
In 1997, he joined STMicroelectronics, Agrate Brianza (MI), Italy, where he was a Designer of NOR Flash memories and worked on several projects on memory controllers, being responsible for the development of MMC/SD memory controllers. Since 2007, he has been with Qimonda as a Digital Design Team Leader.

Stefano Amato was born in Monza, Italy, in 1974. He received the Laurea degree in physics from the University of Milan, Italy.
He was a Reliability Engineer for RF systems with Andrew Corporation. In 2003, he joined STMicroelectronics, Agrate Brianza (MI), Italy, working on different projects related to ASIC controllers. He was responsible for analog circuitry of MMC/SD memory controller. Since 2007, he has been with Qimonda as an Analog Design Engineer.

Helmut Schwalm received the Dipl.-Ing. degree in electrical/communications engineering from the Technical University of Munich, Germany, in 1988.
He has experience in developing standard cell libraries, SRAMs/ROMs, and test chip design for characterization and verification. Recently, he was Project Leader for design development kit creation based on logic and embedded DRAM process technologies and then Concept Engineer for the central cell library development group of Infineon. Since 2005, he has been Head of R&D Flash System Integration, Qimonda Flash GmbH, Munich.

Michael Scheppler received the Dipl.-Ing. degree in electrical engineering from the Technical University of Munich, Germany, in 1987.
Thereafter, he was with the Institute of Integrated Circuits and the Fraunhofer Institut for Solid State Technology. He joined VLSI Technologies, one of the pioneers of ASIC technology in 1990. As a Chip Developer, he witnessed and contributed to the advent of videophones, GSM, fuzzy logic, DVB, and the ARM processor. In 1997, he joined Infineon, where he focused on cell library and IP development for chip card controllers. Since 2006, he has been with Qimonda Flash GmbH, Munich, working on controller architectures and error correction.

Stefano Commodaro was born in Genoa, Italy, in 1970. He received the Laurea degree (cum laude) in electronic engineering from the University of Genova, Italy, in 1993.
He joined the Flash Memory Division, STMicroelectronics, Milan, Italy, where he was involved in the design of both standard and multilevel NOR Flash memory devices and Flash cards. In 2001, he joined LSI Logic Corp. as Field Engineer for the ASIC division. In 2006, he returned to Flash memories: after a brief spell with Spansion Inc. as a Technical Marketing Engineer, he joined Qimonda in 2007 as the person responsible for concept engineering. He is a corecipient of several patents on Flash memories and coauthor of Flash Memories (Norwood, MA: Kluwer, 1999).
