Understanding the Impact of Power Loss on Flash Memory

Hung-Wei Tseng Laura M. Grupp Steven Swanson

The Department of Computer Science and Engineering
University of California, San Diego
{h1tseng,lgrupp,swanson}@cs.ucsd.edu

Abstract

Flash memory is quickly becoming a common component in computer systems ranging from music players to mission-critical server systems. As flash plays a more important role, data integrity in flash memories becomes a critical question. This paper examines one aspect of that data integrity by measuring the types of errors that occur when power fails during a flash memory operation. Our findings demonstrate that power failure can lead to several non-intuitive behaviors. We find that increasing the time before power failure does not always reduce error rates and that a power failure during a program operation can corrupt data that a previous, successful program operation wrote to the device. Our data also show that interrupted program operations leave data more susceptible to read disturb and increase the probability that the programmed data will decay over time. Finally, we show that incomplete erase operations make future program operations to the same block unreliable.

Categories and Subject Descriptors

B.3.4 [Memory Structures]: Reliability, Testing, and Fault-Tolerance

General Terms

Experimentation, Measurement, Performance, Reliability

Keywords

flash memory, power failure, power loss

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DAC 2011, June 5-10, 2011, San Diego, California, USA.
Copyright 2011 ACM ACM 978-1-4503-0636-2 ...$10.00.

1. INTRODUCTION

As flash-based solid-state drives (SSDs) make inroads into computer systems ranging from data centers to sensor networks, the integrity of flash memory as a storage technology becomes increasingly important. A key component of that integrity is what happens to the data on an SSD when power failure occurs unexpectedly.

Power loss in flash is potentially much more dangerous than it is for conventional hard drives. If power fails during a write to a hard drive, the data being written may be irretrievable, but the other data on the disk remains intact. However, SSDs use complex flash translation layers (FTLs) to manage the mapping between logical block addresses and physical flash memory locations. FTLs must store metadata about this mapping in the flash memory itself, and, as a result, corruption of the storage array can potentially render the entire drive inoperable: Not only will the in-progress write not succeed, but all the data on the drive may become inaccessible.

To ensure reliability, system designers must engineer the SSDs to withstand power failures and the resulting data corruption. To do this, they must understand what kinds of corruption power failure can cause.

This paper characterizes the effect of power failure on flash memory devices. We designed a testing platform to repeatedly cut power to a raw flash device during program and erase operations. Our data show that flash memory’s behavior under power failure is surprising in several ways. First, operations that come closer to completion do not necessarily exhibit fewer bit errors. Second, power failure not only results in failure of the operation in progress, it can also corrupt data already present in the flash device. Third, power failure can negatively impact the integrity of future data written to the device. Our results point out potential pitfalls in designing flash file systems and the importance of understanding failure modes in designing embedded storage systems.

The rest of this paper is organized as follows: Section 2 describes the aspects of flash memory pertinent to this study. Section 3 describes our experimental platform and methodology for characterizing flash memory’s behavior during power failure. Section 4 presents our results and describes the sources of data corruption due to power failure. Section 5 provides a summary of related work to put this project in context, and Section 6 concludes the paper.

2. FLASH MEMORY

Flash memory has several unique characteristics that make power failure particularly dangerous. The long latency of program and erase operations presents a large “window of vulnerability,” and the complex programming algorithms and data encoding schemes they employ can lead to non-intuitive failure modes. This section presents a summary of flash’s characteristics that are most pertinent to this work.

Flash memory stores data by trapping electrons using a floating gate transistor. The electrons affect the transistor’s threshold voltage, and the chip measures this change to read data from the cell. The flash chip organizes cells into pages (between 2 and 8 KB) and pages into blocks (between 32 and 256 pages). Erasing a block sets all the bits to ’1’. Finer grain erasure is not possible. Programs operate on pages and convert 1s to 0s. To hide the difference in granularity between programs and erases and increase reliability, SSDs use complex flash translation layers (FTLs) to perform out-of-place update and remapping operations. To support these functions, FTLs store metadata in the flash storage array along with the user data.

Program and erase operations are iterative. Program operations selectively inject electrons into floating gates to change the threshold voltage and then perform a read-verify operation to check if the
cells have reached the desired threshold voltage. If any of the cells in a page has not reached the target threshold voltage, the chip will repeat the program and read-verify process [15, 6]. For erase operations, the chip removes the electrons from cells within the block. The chip will continue to remove electrons until the voltages of cells reach the erased state.

There are two types of flash cells: single-level cell (SLC) and multi-level cell (MLC). SLC devices store one bit per cell, while MLC devices store two or more. SLC chips provide better and more consistent performance than MLC chips. According to empirical measurements in [3], it takes an SLC chip 20 µs to perform a read operation, 200 µs to perform a program operation, and 400 µs to 2 ms to perform an erase operation.

MLC chips achieve higher densities by using 2^n threshold voltage levels to represent n bits. MLC chips need 300 µs to 2 ms to perform a program operation, and 2 ms to 4 ms to perform an erase operation. In this paper, we focus on 2-bit MLC cells, since they are most prevalent in current systems.

For 2-bit MLC chips, cells store data of two different pages. Manufacturers require that pages within a block be programmed in order, so to differentiate between the two pages in a cell, we refer to them as “first page” and “second page.” Programming a second page is consistently slower than programming a first page, since programming the second page requires a more complex programming algorithm. Table 1 shows the mappings between threshold voltages and logic bits of a 2-bit MLC cell using Gray coding and 2’s complement coding.

Table 1: The mapping of voltage level and logic bits in 2-bit MLC chips using Gray coding and 2’s complement coding.

                 Gray coding               2’s complement coding
Voltage level    1st page bit  2nd page bit  1st page bit  2nd page bit
Lowest           1             1             1             1
                 1             0             1             0
                 0             0             0             1
Highest          0             1             0             0

3. METHODOLOGY

To study the effect of power failure on flash memory, we built a test platform that allows us to issue commands to raw flash chips and to cut off the power supply to the chip at precise moments. This section describes our test bed, testing methodology, and the flash chips we used in this study.

3.1 Experimental hardware

For this work, we built a test platform that consists of three components: the Xilinx XUP board, a custom flash testing board, and the power control circuit.

The FPGA on the Xilinx XUP board contains a PowerPC 405 core running Linux. A custom flash controller on the FPGA provides us direct access to the flash device via the flash testing board. The FPGA also controls the power to the flash chips by means of a pair of high-speed power transistors. Measurements with an oscilloscope show that the system can switch the chip’s power supply to 0 V within 3.7 µs.

3.2 Test procedure

To test the impact of power failure during program and erase operations, we cut power to flash at different points during the operation. We define the power cut off interval as the time between issuing the command to the flash chip and when we trigger the power cut off circuit. We start the cut off interval after sending the last byte of the command to the flash chip. High-resolution measurements of the chips’ power consumption show that the chip starts executing the command within a few microseconds.

For program tests, we use cut off intervals varying from 0.4 µs to 2.4 ms at increments of 0.4 µs. For erase, we use power cut off intervals varying from 2 µs to 4.8 ms at increments of 2 µs.

3.3 Flash devices

The behavior of flash memory chips from different manufacturers varies because of architectural differences within the devices and because of differences in manufacturing technologies. To understand the variation in power failure performance, we selected 11 chips that cover a variety of technologies and capacities.

Table 2 lists the flash memory chips that we studied in this work. They come from five different vendors. Their capacities range from 2 GBits to 32 GBits and their feature sizes range from 72 nm to 34 nm. Values that are not publicly available from the manufacturer are from [3].

Table 2: Parameters for the 11 flash devices we studied in this work. A dash marks a value that is not listed.

Abbrev.     Manufacturer  Cell Type  Cap. (GBit)  Tech. Node (nm)  Page Size (B)  Pgs/Blk
A-SLC2      A             SLC        2            -                2048           64
A-SLC4      A             SLC        4            -                2048           64
A-SLC8      A             SLC        8            60               2048           64
B-SLC2      B             SLC        2            50               2048           64
B-SLC4      B             SLC        4            72               2048           64
E-SLC8      E             SLC        8            -                2048           64
A-MLC16     A             MLC        16           -                4096           128
B-MLC32-2   B             MLC        32           34               4096           256
D-MLC32     D             MLC        32           -                4096           128
E-MLC8      E             MLC        8            -                4096           128
F-MLC16     F             MLC        16           41               4096           128

4. EXPERIMENTAL RESULTS

We found unexpected behavior for both program and erase operations in the presence of power failure. For both program and erase, the variation in bit error rate as the power cut off interval changes is non-monotonic, and our measurements show that power loss can lead to both immediate and long-term data integrity issues. We describe the results for each operation in turn.

4.1 Program and power failure

To understand the impact of power failure during programming, we begin by programming random data and cutting off power at different intervals. Then, we measure the resulting bit error rate. Figure 1 contains the results for SLC chips (a) and MLC chips (b).

Intuitively, the more time we give the flash chip to program a page before power failure, the fewer errors there should be. However, the graphs show that the bit error rate does not decrease monotonically. Instead, the bit error rate for each chip has multiple plateaus, where the error rate remains constant, and spikes, where the bit error rate increases briefly.

For example, the error rate for E-SLC8 jumps dramatically at 30 µs, then drops slowly until 75 µs, when it plummets to nearly zero. The other SLC chips exhibit much more predictable behavior.

MLC behavior is much more complex. For example, B-MLC32-2’s error rate remains constant at 50% until 100 µs and then drops sharply to 25% by 110 µs, where it remains until 200 µs. The error rate starts increasing at 200 µs and reaches 29% at 290 µs.
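
The bit error rates quoted in this section compare the random data written to a page with the data read back after the interrupted operation. The paper does not spell out the computation, so the following is only a minimal sketch under the assumption that the bit error rate is the number of differing bits divided by the total number of bits. The function name `bit_error_rate` and the 2048-byte example page are illustrative choices, not part of the authors' test platform.

```python
import random


def bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Fraction of bit positions where read_back differs from written."""
    if len(written) != len(read_back):
        raise ValueError("buffers must have the same length")
    if not written:
        raise ValueError("buffers must not be empty")
    # XOR leaves a 1 in every bit position that differs; count those bits.
    differing = sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))
    return differing / (8 * len(written))


# Hypothetical example: a 2048-byte page of random data compared with a
# page that still reads as erased (all bits '1').
rng = random.Random(0)
page = bytes(rng.getrandbits(8) for _ in range(2048))
erased = b"\xff" * len(page)
print(f"BER against an erased page: {bit_error_rate(page, erased):.3f}")
```

Random data is about half zeros, so under this definition a page that still reads as all ones scores close to 0.5 against the intended data.
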
The error rate of B-MLC32-2 decreases again after 360 µs and then stays at 25% until 500 µs. After 500 µs, the error rate decreases in steps until it reaches 0 at 1400 µs. The chip also shows numerous spikes in error rate, for example, at 540 µs.

[Figure 1 plots: x-axis Power Cut Off Interval (us), y-axis Bit Error Rate; (a) A-SLC2, A-SLC4, A-SLC8, B-SLC2, B-SLC4, E-SLC8; (b) B-MLC32-2, A-MLC16, D-MLC32, E-MLC8, F-MLC16]
Figure 1: The bit error rate of program operations with different power cut off intervals for (a) SLC chips and (b) MLC chips

These results are unexpected because programming flash chips should only be able to move bits from 1 to 0, yet the non-monotone error rates suggest that program operations are moving cells in both directions at different times. Below, we investigate this behavior in finer detail.

4.1.1 Per-page MLC error rates

To understand the cause of MLC chips’ non-monotonic error rates, we examine the behavior of pairs of first and second pages in more detail. A pair of corresponding bits from the two pages can be in four states: 01 (i.e., the first page bit is ’0’ and the second page bit is ’1’), 00, 11, and 10. We used program operations to move the pair between those states and interrupted the power to see which intermediate states the cell passes through. We consider four transitions: (1) 11→01: we program the first page bit to 0 from the erased state. (2) 01→00: we program a 0 to the second page bit after programming a 0 to the first page bit. (3) 01→01: we program a 1 to the second page bit after programming a 0 to the first page bit (intuitively, this should cause no change). (4) 11→10: we program a 0 to the second page bit from the erased state. Other transitions are not possible because we must program the first page first, and because programs can only convert 1s to 0s. For 01→00 and 01→01, we only cut off power while programming the second page.

Figure 2 shows the experimental results for B-MLC32-2. For each graph in Figure 2, the x-axis shows the power cut off interval, and the y-axis depicts the distribution of cells for four different states in a block. Figure 2(a) plots the distribution of cell states for the 11→01 transition. The graph shows the shift in state between 0 and 220 µs, but it is not a smooth transition: There are two clear steps with spikes that suggest that some cells temporarily move from state 11 to state 01 during programming. The chip reads the cells as state 01 because the second page bits are not programmed yet, but the voltage levels are actually at state 00 instead of state 01 at this point.

[Figure 2 plots: x-axis Power Cut Off Interval (us), y-axis Cell State Distribution; series State 11, State 10, State 00, State 01; panels (a) to (d)]
Figure 2: Cell state breakdown for B-MLC32-2 for (a) 11→01, (b) 01→00, (c) 01→01, (d) 11→10 transitions. The results show that even for seeming no-ops, cells may pass through multiple states

Figure 2(b) provides some additional insight into this behavior. It shows the graph for the 01→00 transition. The cell states remain at 01 until 300 µs, when they all instantly become 00. This instantaneous change of cell states indicates, we believe, that the chip switches reference voltages at this point to a new reference that allows the chip to distinguish state 00 from state 01. Since all the cells move to state 00 immediately after the chip applies a new
reference, it appears that the cells were already at a voltage level corresponding to a 00 state after programming the first page was complete.

Figure 2(c) shows the result when we try to perform a 01→01 transition, a seeming no-op. However, if power cuts off between 250 and 1000 µs, a large fraction of the cells will be in state 00 and some may be in state 10.

In Figure 2(d), we make a 11→10 transition. Though we only change the second page bits, the cells move through all possible states during the operation. The chip changes state from 11 to 10 (a shift of one voltage level) between 200 µs and 600 µs. Between 500 µs and 900 µs, it seems to adjust the reference voltage to differentiate between states 00 and 01, which results in the abrupt transitions from state 00 to state 10. Then it applies a new reference voltage at 900 µs. The chip can differentiate state 10 from state 00 with the new threshold voltage. This also causes the transition of cell states from 00 to 10.

4.1.2 Retroactive data corruption

The unpredictable effect of power loss during an MLC program operation demonstrated above makes it clear that SSDs must assume that data written during an interrupted program is corrupt. However, the data in Figure 2 also show something more dangerous: Power failure while programming a second page can corrupt data that the chip successfully programmed into a first page. We call this effect retroactive data corruption.

Figure 2(d) demonstrates the phenomenon. We expect the program operation to move the cell from 11 to 10, leaving the first page’s data untouched. However, we can find cells in any of the four states depending on when power failure occurred.

Figure 3 illustrates this effect in more detail. In this graph, we first program random data to first page bits in B-MLC32-2 and E-MLC8 without power failure. Then we cut off power when we program the corresponding second page bits with random data. The x-axis shows the power cut off intervals for second pages, and the y-axis shows the bit error rates for the first pages. For B-MLC32-2, the bit error rate of the first page reaches 25% with power cut off intervals between 200 µs and 900 µs even though the program operation of the first page completed successfully! For E-MLC8, the retroactive data corruption effect is more serious. The bit error rate can reach 50% if the power cut off interval for the second page is between 50 µs and 100 µs.

[Figure 3 plot: x-axis Power Cut Off Interval (us), y-axis Bit Error Rate; series B-MLC32-2 (1st page), E-MLC8 (1st page)]
Figure 3: A power failure while programming a second page can corrupt data programmed to the corresponding first page, even if the first page program completed without interruption.

Flash device datasheets make no mention of this phenomenon, so FTL designers may assume that once a program operation completes, the data will remain intact regardless of any future failures. This assumption is incorrect for MLC devices. Since retroactive data corruption can affect both user data and FTL metadata, it poses a serious threat to SSD reliability.

Besides backup batteries and capacitors, SSDs can take (at least) three steps to mitigate the effects of retroactive data corruption. First, the FTL could program corresponding pairs of first and second pages together and treat them as a logical unit. If programming the second page failed, the FTL would consider them to both have failed. This is not as easy as it sounds, however, since first and second pages are not logically consecutive within the flash block: In most cases the first page at address n is paired with the second page at address n+6. Flash device datasheets require programming pages in a block in order because of the flash array organizations. However, our experiment shows that for chips like E-MLC8, programming the first and second pages together does not increase the program error rate.

Second, since the retroactive data corruption never affects the second page, the FTL could protect its metadata by storing it solely in second pages. While it would leave user data exposed to retroactive data corruption, the SSD would, at least, remain operational.

Third, the FTL could adopt a specialized data encoding that would avoid the cell state transitions that can lead to retroactive corruption. For E-MLC8, corruption occurs only when making a 01→01 transition. Sacrificing some bit capacity and applying some simple data coding techniques could prevent that transition from occurring. However, for B-MLC32-2, this scheme does not work since the retroactive data corruption happens in all the cases where the first page bit is 0.

4.1.3 Read disturb sensitivity

Power failure can also affect the data integrity of programmed data by making it more susceptible to other flash failure modes. In this section, we examine the relationship of power failure and read disturb.

Read disturb arises because reading data from a flash array applies weak programming voltages to cells in pages not involved in the read operation. Measurements in [3] show that it typically takes several million read operations to cause significant errors due to read disturb.

Figure 4 compares the read disturb sensitivity of pages programmed to completion (i.e., no power cut off) and pages programmed with a power cut off interval of 1.35 ms using B-MLC32-2. For that interval, reading the page back reveals no errors.

For both sets of pages, the error rate starts at 0 after programming. For the completely programmed page, errors from read disturb appear after 2.8 million reads. For the partially programmed page, errors appear after just 1000 reads and the error rate rises quickly to 3.1 × 10^-3. It appears that the power failure prevents the program operation from completely tuning the voltage level on some of the cells, leaving them susceptible to read disturb.

[Figure 4 plot: x-axis Number of Reads, y-axis Bit Error Rate (log scale); series Program w/o power failure, Program with power cut off interval of 1.35 ms]
Figure 4: Incompletely programmed pages are more susceptible to read disturb than completely programmed pages.

This effect is potentially dangerous, especially given the very steep increase in error rate. A common approach to dealing with
read disturb is to copy data to a fresh page when errors begin to appear. However, if the error rate rises too quickly, the data may become uncorrectable before this can occur. The flash memory controller should copy the data programmed under power failure to a fresh page as soon as possible.

4.1.4 Data Retention

Programming data with power failure may also reduce the long-term stability of the data stored in the flash chip. We use a laboratory oven to accelerate the aging effect of flash memory chips up to 10 years. According to the JESD22-A113 standard [5], we bake the chips for 9 hours and 20 minutes at 125°C to achieve the aging of one year. For each chip, we programmed 5 blocks for each of the three conditions: (1) Barely programmed: the power cut off interval is as short as possible without resulting in increased bit error rates (for some MLC chips, the programmed error rate is never zero). (2) Fully programmed: the program operation completes without power failure. (3) Partially programmed: the power cut off interval is halfway between barely programmed and fully programmed.

Figure 5 shows the result for (a) B-MLC32-2, (b) E-MLC8, and (c) B-SLC4. The x-axis of the graph is accelerated age in years. The y-axis shows the average bit error rates in a block after aging for each accelerated year. We slightly shift the points horizontally for partially programmed and fully programmed results to make the error bars visible. B-MLC32-2 does not exhibit any relationship between power failure and data retention. However, for E-MLC8, the effect is clear. After 10 years, the error rate for barely programmed data is 1.91 × 10^-7 rather than 4.77 × 10^-8 for partially programmed or 0 for fully programmed. B-SLC4 shows the robustness common among SLC chips. The chip only shows errors for the barely programmed case.

[Figure 5 plots: x-axis Year(s), y-axis Bit Error Rate; series Barely programmed, Partially programmed, Fully programmed; panels (a) to (c)]
Figure 5: Baking chips to accelerate aging reveals that power failure during program operations reduces the long-term reliability of data stored in flash chips.

4.2 Erase and power failure

Erase operations are subject to a different set of reliability requirements than program operations. While it is important that an erase operation reliably write a ’1’ to every bit in a block, it is equally important that the erase prepare the block properly for the program operations that will follow it. We investigate both aspects of this reliability below.

4.2.1 Erasing bits

Figure 6 presents the bit error rates of erase operations for different power cut off intervals for (a) SLC chips and (b) MLC chips. For each test, we initially programmed the block with random data.

Behavior is similar among the SLC chips that we tested. The bit error rate stays at 50% (since the block contains random data) for between 50 and 240 µs, after which all the cells become erased in less than 10 µs. However, chips do not report that the command has completed until 400 µs to 2 ms.

For MLC chips, the bit error rate is, once again, non-monotone. For all chips except E-MLC8, it reaches as high as 75%. The timing of MLC chips also varies among different models: It takes between 50 µs and 475 µs for every cell to become 1. However, these chips report that the erase command completes only after 2 to 4 ms.

[Figure 6 plots: x-axis Power Cut Off Interval (us), y-axis Bit Error Rate; (a) A-SLC2, A-SLC4, A-SLC8, B-SLC2, B-SLC4, E-SLC8; (b) B-MLC32-2, A-MLC16, D-MLC32, E-MLC8, F-MLC16]
Figure 6: Bit error rates after an interrupted erase operation are well-behaved for SLC devices (a), but MLC behavior is much more complex (b).

4.2.2 Programming blocks after an erase failure

In previous experiments, we found that erases appear to be complete long before the chip reports that they are finished. Based on discussions with flash manufacturers, we suspect that the chip spends this time fine-tuning the voltage levels in the cells. Presumably, this step is important (or the manufacturers would save time and skip it), so it is important to understand what impact cutting it short will have.

To measure this effect, we cut power during erase operations, performed a complete program operation, and then read back the data to measure the bit error rate.

Figure 7 shows the results for (a) SLC and (b) MLC chips. The x-axis measures the power cut off interval of the previous erase operation and the y-axis depicts the bit error rates of the later programming operations. We start the experiment at 300 µs for SLC and 500 µs for MLC, since by this time, the erase appears to be complete. In both cases, interrupting an erase operation reduces reliability for future program operations to the block. For SLC and most MLC chips, the error rate rises from 0 to between 0.4% and 0.9%. For B-MLC32-2, the program error rate is never zero, and the bit error rate rises from 1.2 × 10^-7 to 0.2%. For E-MLC8, the
program error rate varies between 0.4% and 0% for power cut off intervals between 1747 and 2062 µs. The program error rate of A-MLC16 also bounces for power cut off intervals between 2640 µs and 2814 µs. These frequent variations cause the two vertical bands in the graph.

[Figure 7 plots: x-axis Power Cut Off Interval (us), y-axis Bit Error Rate (log scale); (a) A-SLC2, A-SLC4, A-SLC8, B-SLC2, B-SLC4, E-SLC8; (b) B-MLC32-2, A-MLC16, D-MLC32, E-MLC8, F-MLC16]
Figure 7: Bit error rates after an interrupted erase operation are well-behaved for SLC devices (a), but MLC behavior is much more complex (b)

5. RELATED WORK

Flash manufacturers provide limited information about many aspects of their chips, including their behavior under power loss. Our work is similar in spirit to Grupp et al. [3] in that we empirically quantify flash behavior in order to better understand the opportunities and design challenges that it presents. In addition to chip level performance, Boboila et al. [1] also explored device-level characteristics including timing, endurance, and FTL designs. However, neither of the above works focuses on the power failure behavior of flash memory chips.

Many high-end SSDs have backup batteries or capacitors to ensure that operations complete even if power fails. Our results argue that these systems should provide power until the chip signals that the operation is finished rather than until the data appears to be correct. Low-end SSDs and embedded systems, however, often do not contain backup power sources due to cost or space constraints, and these systems must be extremely careful to prevent data loss and/or reduced reliability after a power failure.

Existing work on recovery from power failure aims to restore or repair flash file systems using logs and other techniques [16, 4, 12, 10] or page-level atomic writes [7, 2]. Numonyx [11] also provides guidelines to repeat interrupted operations after power failure. These designs may work for SLC chips, but the retroactive [...] at least one entire block for each write operation. We described several alternative solutions.

System software and embedded applications are critical in dealing with reliability issues like power loss. Kim et al. [8] designed a software framework to mimic faults, including power failure, in flash memory. This work (and the results in [3]) demonstrates that flash has many non-intuitive error modes, so fault-injection frameworks require input from real hardware measurements to ensure the faults they inject are representative of what can occur in real systems.

6. CONCLUSION

The flash memory devices we studied in this work demonstrated unexpected behavior when power failure occurs. The error rates do not always decrease as the operation proceeds, and power failure can corrupt the data from operations that completed successfully. We also found that relying on blocks that have been programmed or erased during a power failure is unreliable, even if the data appears to be intact.

7. REFERENCES

[1] S. Boboila and P. Desnoyers. Write endurance in flash drives: measurements and analysis. In FAST ’10: Proceedings of the 8th USENIX conference on File and storage technologies, pages 9–9, Berkeley, CA, USA, 2010. USENIX Association.
[2] T.-S. Chung, M. Lee, Y. Ryu, and K. Lee. Porce: An efficient power off recovery scheme for flash memory. Journal of Systems Architecture, 54(10):935–943, 2008.
[3] L. Grupp, A. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. Siegel, and J. Wolf. Characterizing flash memory: Anomalies, observations, and applications. In MICRO-42: 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 24–33, Dec. 2009.
[4] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In ASPLOS ’09: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, pages 229–240, 2009.
[5] JEDEC. Preconditioning of Plastic Surface Mount Devices Prior to Reliability Testing. http://www.jedec.org/sites/default/files/docs/22a113F.pdf.
[6] T.-S. Jung, Y.-J. Choi, K.-D. Suh, B.-H. Suh, J.-K. Kim, Y.-H. Lim, Y.-N. Koh, J.-W. Park, K.-J. Lee, J.-H. Park, K.-T. Park, J.-R. Kim, J.-H. Yi, and H.-K. Lim. A 117-mm2 3.3-v only 128-mb multilevel nand flash memory for mass storage applications. IEEE Journal of Solid-State Circuits, 31(11):1575–1583, Nov. 1996.
[7] J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space efficient flash translation layer for compactflash systems. IEEE Transactions on Consumer Electronics, 48:366–375, 2002.
[8] S.-K. Kim, J. Choi, D. Lee, S. H. Noh, and S. L. Min. Virtual framework for testing the reliability of system software on embedded systems. In SAC ’07: Proceedings of the 2007 ACM symposium on Applied computing, pages 1192–1196, New York, NY, USA, 2007. ACM.
[9] K. Y. Lee, H. Kim, K.-G. Woo, Y. D. Chung, and M. H. Kim. Design and implementation of mlc nand flash-based dbms for mobile devices. Journal of Systems and Software, 82(9):1447–1458, 2009.
[10] P. March. Power Loss Recovery (PLR) for cell phones using NAND Flash memory. http://www.numonyx.com/en-US/ResourceCenter/SoftwareArticles/Pages/PLRforNAND.aspx.
[11] Numonyx. How to operate Power Loss Recovery for the Numonyx 65nm Flash Memory Devices. www.numonyx.com/Documents/Applications_Operation.pdf.
[12] C. Park, P. Talawar, D. Won, M. Jung, J. Im, S. Kim, and Y. Choi. A High Performance Controller for NAND Flash-based Solid State Disk (NSSD). In NVSMW ’06: Non-Volatile Semiconductor Memory Workshop, 2006, pages 17–20, Feb. 2006.
[13] S. Park, J. H. Yu, and S. Y. Ohm. Atomic write FTL for robust flash file system. In ISCE ’05: Proceedings of the Ninth International Symposium on Consumer Electronics, 2005, pages 155–160, June 2005.
[14] F. M. Systems. Power failure prevention, recovery, and test. http://www.fortasa.com/Attachments/040_Power Failure Corruption
data corruption we observed for MLC chips suggests that they will Prevention.pdf.
[15] K. Takeuchi, T. Tanaka, and T. Tanzawa. A multipage cell architecture
be less effective there. for high-speed programming multilevel NAND flash memories. IEEE
Some commercial systems avoid retroactive corruption by treat- Journal of Solid-State Circuits, 33(8):1228 –1238, Aug. 1998.
ing a block as the basic unit of atomic writes [9, 13, 14]. This [16] D. Woodhouse. JFFS2: The Journalling Flash File System, version 2.
http://sources.redhat.com/jffs2/.
approach is inefficient for small writes since it requires re-writing
