DAC2011PowerCut
[Plots for Figures 1 and 2 omitted. Figure 1 panels: (a) E-SLC8, (b) B-MLC32-2. Figure 2 panels (a)-(d) show Cell State Distribution for State 11, State 10, State 00, and State 01. All x-axes: Power Cut Off Interval (us).]
Figure 1: The bit error rate of program operations with different power cut off intervals for (a) SLC chips and (b) MLC chips.

The error rate decreases again after 360 µs and then stays at 25% until 500 µs. After 500 µs, the error rate decreases in steps until it reaches 0 at 1400 µs. The chip also shows numerous spikes in error rate, for example, at 540 µs.

These results are unexpected because programming flash chips should only be able to move bits from 1 to 0, yet the non-monotone error rates suggest that program operations are moving cells in both directions at different times. Below, we investigate this behavior in finer detail.
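To make the notion of a non-monotone error rate concrete, the following is a minimal sketch of how read-back data could be summarized. It is not the tooling used for these experiments: the function names and the (interval, error rate) sample format are illustrative assumptions. The first function computes a bit error rate by comparing the intended data with what the chip returns; the second reports the cut-off intervals at which the error rate is higher than at the previous, shorter interval, which a strictly 1-to-0 programming model would not predict.

```python
def bit_error_rate(expected: bytes, observed: bytes) -> float:
    """Fraction of bit positions where observed differs from expected."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    if not expected:
        raise ValueError("buffers must not be empty")
    # XOR marks differing bits; count the set bits in each byte.
    wrong = sum(bin(e ^ o).count("1") for e, o in zip(expected, observed))
    return wrong / (8 * len(expected))


def rising_intervals(samples):
    """Given (interval_us, error_rate) pairs, return the intervals at which
    the error rate is higher than at the previous (shorter) interval."""
    ordered = sorted(samples)  # sort by interval
    return [t for (_, prev), (t, cur) in zip(ordered, ordered[1:]) if cur > prev]
```

An empty result from `rising_intervals` would mean the error rate never increases as the cut-off interval grows; any returned interval marks a point where it does.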
transitions: (1) 11→01: we program the first page bit to 0 from the erased state. (2) 01→00: we program a 0 to the second page bit after programming a 0 to the first page bit. (3) 01→01: we program a 1 to the second page bit after programming a 0 to the first page bit (intuitively, this should cause no change). (4) 11→10: we program a 0 to the second page bit from the erased state. Other transitions are not possible because we must program the first page first, and because programs can only convert 1s to 0s. For 01→00 and 01→01, we only cut off power while programming the second page.

Figure 2: Cell state breakdown for B-MLC32-2 for (a) 11→01, (b) 01→00, (c) 01→01, (d) 11→10 transitions. The results show that even for seeming no-ops, cells may pass through multiple states.

Figure 2 shows the experimental results for B-MLC32-2. For each graph in Figure 2, the x-axis shows the power cut off interval, and the y-axis depicts the distribution of cells for four different states in a block. Figure 2(a) plots the distribution of cell states for the 11→01 transition. The graph shows the shift in state between 0 and 220 µs, but it is not a smooth transition: there are two clear steps with spikes that suggest that some cells temporarily move from state 11 to state 01 during programming. The chip reads the cells as state 01 because the second page bits are not programmed yet, but the voltage levels are actually at state 00 instead of state 01 at this point.

Figure 2(b) provides some additional insight into this behavior. It shows the graph for the 01→00 transition. The cell states remain at 01 until 300 µs, when they all instantly become 00. This instantaneous change of cell states indicates, we believe, that the chip switches reference voltages at this point to a new reference that allows the chip to distinguish state 00 from state 01. Since all the cells move to state 00 immediately after the chip applies a new
reference, it appears that the cells were already at a voltage level corresponding to a 00 state after programming the first page was complete.

Figure 2(c) shows the result when we try to perform a 01→01 transition, a seeming no-op. However, if power cuts off between 250 and 1000 µs, a large fraction of the cells will be in state 00 and some may be in state 10.

In Figure 2(d), we make a 11→10 transition. Though we only change the second page bits, the cells move through all possible states during the operation. The chip changes state from 11 to 10 (a shift of one voltage level) between 200 µs and 600 µs. Between 500 µs and 900 µs, it seems to adjust the reference voltage to differentiate between states 00 and 01, which results in the abrupt transitions from state 00 to state 10. Then it applies a new reference voltage at 900 µs. The chip can differentiate state 10 from state 00 with the new threshold voltage. This also causes the transition of cell states from 00 to 10.

4.1.2 Retroactive data corruption

The unpredictable effect of power loss during an MLC program operation demonstrated above makes it clear that SSDs must assume that data written during an interrupted program is corrupt. However, the data in Figure 2 also show something more dangerous: Power failure while programming a second page can corrupt data that the chip successfully programmed into a first page. We call this effect retroactive data corruption.

Figure 2(d) demonstrates the phenomenon. We expect the program operation to move the cell from 11 to 10, leaving the first page’s data untouched. However, we can find cells in any of the four states depending on when power failure occurred.

Figure 3 illustrates this effect in more detail. In this graph, we first program random data to first page bits in B-MLC32-2 and E-MLC8 without power failure. Then we cut off power when we program the corresponding second page bits with random data. The x-axis shows the power cut off intervals for second pages, and the y-axis shows the bit error rates for the first pages. For B-MLC32-2, the bit error rate of the first page reaches 25% with a power cut off interval between 200 µs and 900 µs even though the program operation of the first page completed successfully! For E-MLC8, the retroactive data corruption effect is more serious. The bit error rate can reach 50% if the power cut off interval for the second page is between 50 µs and 100 µs.

[Figure 3 plot omitted. Series: B-MLC32-2 (1st page), E-MLC8 (1st page). x-axis: Power Cut Off Interval (us).]
Figure 3: A power failure while programming a second page can corrupt data programmed to the corresponding first page, even if the first page program completed without interruption.

Flash device datasheets make no mention of this phenomenon, so FTL designers may assume that once a program operation completes, the data will remain intact regardless of any future failures. This assumption is incorrect for MLC devices. Since retroactive data corruption can affect both user data and FTL metadata, it poses a serious threat to SSD reliability.

Besides backup batteries and capacitors, SSDs can take (at least) three steps to mitigate the effects of retroactive data corruption. First, the FTL could program corresponding pairs of first and second pages together and treat them as a logical unit. If programming the second page failed, the FTL would consider them to both have failed. This is not as easy as it sounds, however, since first and second pages are not logically consecutive within the flash block: In most cases the first page at address n is paired with the second page at address n+6. Flash device datasheets require programming pages in a block in order because of the flash array organizations. However, our experiment shows that for chips like E-MLC8, programming the first and second pages together does not increase the program error rate.

Second, since the retroactive data corruption never affects the second page, the FTL could protect its metadata by storing it solely in second pages. While it would leave user data exposed to retroactive data corruption, the SSD would, at least, remain operational.

Third, the FTL could adopt a specialized data encoding that would avoid the cell state transitions that can lead to retroactive corruption. For E-MLC8, corruption occurs only when making a 01→01 transition. Sacrificing some bit capacity and applying some simple data coding techniques could prevent that transition from occurring. However, for B-MLC32-2, this scheme does not work since the retroactive data corruption happens in all the cases where the first page bit is 0.

4.1.3 Read disturb sensitivity

Power failure can also affect the data integrity of programmed data by making it more susceptible to other flash failure modes. In this section, we examine the relationship of power failure and read disturb.

Read disturb arises because reading data from a flash array applies weak programming voltages to cells in pages not involved in the read operation. Measurements in [3] show that it typically takes several million read operations to cause significant errors due to read disturb.

Figure 4 compares the read disturb sensitivity of pages programmed to completion (i.e., no power cut off) and pages programmed with a power cut off interval of 1.35 ms using B-MLC32-2. For that interval, reading the page back reveals no errors.

[Figure 4 plot omitted. Series: Program w/o power failure, Program with power cut off interval of 1.35ms. x-axis: Number of Reads.]
Figure 4: Incompletely programmed pages are more susceptible to read disturb than completely programmed pages.

For both sets of pages, the error rate starts at 0 after programming. For the completely programmed page, errors from read disturb appear after 2.8 million reads. For the partially programmed page, errors appear after just 1000 reads and the error rate rises quickly to 3.1 × 10^-3. It appears that the power failure prevents the program operation from completely tuning the voltage level on some of the cells, leaving them susceptible to read disturb.

This effect is potentially dangerous, especially given the very steep increase in error rate. A common approach to dealing with
[Figure plots omitted; their captions are not present in this section. Recoverable labels: y-axis Bit Error Rate; x-axes Year(s) and Power Cut Off Interval (us); series Barely programmed, Partially programmed, Fully programmed; chips A-SLC2, A-SLC4, B-MLC32-2, A-MLC16, D-MLC32, E-MLC8, F-MLC16.]
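The first mitigation described in Section 4.1.2, treating a first page and its paired second page as one logical unit, can be illustrated with a minimal sketch. It assumes the n to n+6 pairing that the text says holds "in most cases" and applies it uniformly, which real chips may not do at block boundaries. It also assumes the FTL already knows, after a restart, which second-page programs were interrupted; how that is detected is outside the sketch. The names `PAIR_OFFSET`, `paired_first_page`, and `pages_to_invalidate` are hypothetical.

```python
PAIR_OFFSET = 6  # assumption: first page n pairs with second page n + 6


def paired_first_page(second_page: int) -> int:
    """Return the first page that shares cells with the given second page."""
    if second_page < PAIR_OFFSET:
        raise ValueError("no paired first page under the assumed offset")
    return second_page - PAIR_OFFSET


def pages_to_invalidate(interrupted_second_pages):
    """For each interrupted second-page program, mark BOTH pages of the
    pair as failed, since the first page may be retroactively corrupted."""
    failed = set()
    for second in interrupted_second_pages:
        failed.add(second)
        failed.add(paired_first_page(second))
    return sorted(failed)
```

For example, under these assumptions an interrupted program of page 8 would cause the FTL to discard pages 2 and 8 together, even though page 2 was programmed without interruption.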
[Figure 7 plots omitted. Panel (a): E-SLC8. Panel (b): B-MLC32-2, D-MLC32, F-MLC16, A-MLC16, E-MLC8. x-axis: Power Cut Off Interval (us); y-axis: Bit Error Rate.]
Figure 7: Bit error rates after an interrupted erase operation are well-behaved for SLC devices (a), but MLC behavior is much more complex (b).

program error rate varies between 0.4% and 0% for power cut off intervals between 1747 and 2062 µs. The program error rate of A-MLC16 also bounces for power cut off intervals between 2640 µs and 2814 µs. These frequent variations cause the two vertical bands in the graph.

5. RELATED WORK

Flash manufacturers provide limited information about many aspects of their chips, including their behavior under power loss. Our work is similar in spirit to Grupp et al. [3] in that we empirically quantify flash behavior in order to better understand the opportunities and design challenges that it presents. In addition to chip level performance, Boboila et al. [1] also explored device-level characteristics including timing, endurance, and FTL designs. However, neither of the above works focuses on the power failure behavior of flash memory chips.

Many high-end SSDs have backup batteries or capacitors to ensure that operations complete even if power fails. Our results argue that these systems should provide power until the chip signals that the operation is finished rather than until the data appears to be correct. Low-end SSDs and embedded systems, however, often do not contain backup power sources due to cost or space constraints, and these systems must be extremely careful to prevent data loss and/or reduced reliability after a power failure.

Existing work on recovery from power failure aims to restore or repair flash file systems using logs and other techniques [16, 4, 12, 10] or page-level atomic writes [7, 2]. Numonyx [11] also provides guidelines to repeat interrupted operations after power failure. These designs may work for SLC chips, but the retroactive data corruption we observed for MLC chips suggests that they will be less effective there.

Some commercial systems avoid retroactive corruption by treating a block as the basic unit of atomic writes [9, 13, 14]. This approach is inefficient for small writes since it requires re-writing

ing with reliability issues like power loss. Kim et al. [8] designed a software framework to mimic faults, including power failure, in flash memory. This work (and the results in [3]) demonstrates that flash has many non-intuitive error modes, so fault-injection frameworks require input from real hardware measurements to ensure the faults they inject are representative of what can occur in real systems.

6. CONCLUSION

The flash memory devices we studied in this work demonstrated unexpected behavior when power failure occurs. The error rates do not always decrease as the operation proceeds, and power failure can corrupt the data from operations that completed successfully.

7. REFERENCES
[1] S. Boboila and P. Desnoyers. Write endurance in flash drives: measurements and analysis. In FAST ’10: Proceedings of the 8th USENIX conference on File and storage technologies, pages 9–9, Berkeley, CA, USA, 2010. USENIX Association.
[2] T.-S. Chung, M. Lee, Y. Ryu, and K. Lee. Porce: An efficient power off recovery scheme for flash memory. Journal of Systems Architecture, 54(10):935–943, 2008.
[3] L. Grupp, A. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. Siegel, and J. Wolf. Characterizing flash memory: Anomalies, observations, and applications. In MICRO-42: 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 24–33, Dec. 2009.
[4] A. Gupta, Y. Kim, and B. Urgaonkar. DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. In ASPLOS ’09: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, pages 229–240, 2009.
[5] JEDEC. Preconditioning of Plastic Surface Mount Devices Prior to Reliability Testing. http://www.jedec.org/sites/default/files/docs/22a113F.pdf.
[6] T.-S. Jung, Y.-J. Choi, K.-D. Suh, B.-H. Suh, J.-K. Kim, Y.-H. Lim, Y.-N. Koh, J.-W. Park, K.-J. Lee, J.-H. Park, K.-T. Park, J.-R. Kim, J.-H. Yi, and H.-K. Lim. A 117-mm² 3.3-V only 128-Mb multilevel NAND flash memory for mass storage applications. IEEE Journal of Solid-State Circuits, 31(11):1575–1583, Nov. 1996.
[7] J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space efficient flash translation layer for compactflash systems. IEEE Transactions on Consumer Electronics, 48:366–375, 2002.
[8] S.-K. Kim, J. Choi, D. Lee, S. H. Noh, and S. L. Min. Virtual framework for testing the reliability of system software on embedded systems. In SAC ’07: Proceedings of the 2007 ACM symposium on Applied computing, pages 1192–1196, New York, NY, USA, 2007. ACM.
[9] K. Y. Lee, H. Kim, K.-G. Woo, Y. D. Chung, and M. H. Kim. Design and implementation of MLC NAND flash-based DBMS for mobile devices. Journal of Systems and Software, 82(9):1447–1458, 2009.
[10] P. March. Power Loss Recovery (PLR) for cell phones using NAND Flash memory. http://www.numonyx.com/en-US/ResourceCenter/SoftwareArticles/Pages/PLRforNAND.aspx.
[11] Numonyx. How to operate Power Loss Recovery for the Numonyx 65nm Flash Memory Devices. www.numonyx.com/Documents/Applications_Operation.pdf.
[12] C. Park, P. Talawar, D. Won, M. Jung, J. Im, S. Kim, and Y. Choi. A High Performance Controller for NAND Flash-based Solid State Disk (NSSD). In NVSMW ’06: Non-Volatile Semiconductor Memory Workshop, 2006, pages 17–20, Feb. 2006.
[13] S. Park, J. H. Yu, and S. Y. Ohm. Atomic write FTL for robust flash file system. In ISCE ’05: Proceedings of the Ninth International Symposium on Consumer Electronics, 2005, pages 155–160, June 2005.
[14] F. M. Systems. Power failure prevention, recovery, and test. http://www.fortasa.com/Attachments/040_Power Failure Corruption Prevention.pdf.
[15] K. Takeuchi, T. Tanaka, and T. Tanzawa. A multipage cell architecture for high-speed programming multilevel NAND flash memories. IEEE Journal of Solid-State Circuits, 33(8):1228–1238, Aug. 1998.
[16] D. Woodhouse. JFFS2: The Journalling Flash File System, version 2. http://sources.redhat.com/jffs2/.