Predicting Hard Disk Failures in Data Centers Using Temporal Convolutional Neural Networks

  • Conference paper
Euro-Par 2020: Parallel Processing Workshops (Euro-Par 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12480)

Abstract

In modern data centers, storage system failures are major contributors to downtime and maintenance costs. Predicting these failures by collecting measurements from disks and analyzing them with machine learning techniques can effectively reduce their impact by enabling timely maintenance. While there is a vast literature on this subject, most approaches attempt to predict hard disk failures using either classic machine learning solutions, such as Random Forests (RFs), or deep Recurrent Neural Networks (RNNs). In this work, we address hard disk failure prediction using Temporal Convolutional Networks (TCNs), a novel type of deep neural network for time series analysis. Using a real-world dataset, we show that TCNs outperform both RFs and RNNs. Specifically, we improve the Fault Detection Rate (FDR) by \(\approx \)7.5% (FDR = 89.1%) compared to the state of the art, while simultaneously reducing the False Alarm Rate (FAR = 0.052%). Moreover, we explore the network architecture design space, showing that TCNs are consistently superior to RNNs for a given model size and complexity, and that even relatively small TCNs can reach satisfactory performance. All the code needed to reproduce the results presented in this paper is available at https://github.com/ABurrello/tcn-hard-disk-failure-prediction.
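The abstract reports results as FDR and FAR without defining them. As a minimal sketch, assuming the definitions commonly used in the disk failure literature (FDR as the fraction of failing disks correctly flagged, FAR as the fraction of healthy disks incorrectly flagged), the two metrics could be computed as follows. The function name `fdr_far` and the toy labels are hypothetical and are not taken from the paper or its repository.

```python
def fdr_far(y_true, y_pred):
    """Return (FDR, FAR) as fractions for binary labels.

    Assumed convention (not stated in the abstract):
    1 = failing disk, 0 = healthy disk.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, not flagged

    # Guard against empty classes so the sketch never divides by zero.
    fdr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return fdr, far


# Toy example: 4 failing disks and 6 healthy disks.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
fdr, far = fdr_far(y_true, y_pred)
print(f"FDR = {fdr:.1%}, FAR = {far:.1%}")  # FDR = 75.0%, FAR = 16.7%
```

Because failures are rare compared with healthy disk samples, the two rates are reported separately: a very small FAR can still correspond to many false alarms in absolute terms when the healthy population is large.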

Notes

  1. For this input window size, the TCN achieves an FDR similar to the RF, but the following experiments show that the TCN FDR improves for longer inputs.

Acknowledgments

This work has been partially supported by the H2020 project IoTwins (g.a. 857191).

Author information

Corresponding author

Correspondence to Alessio Burrello.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Burrello, A., Pagliari, D.J., Bartolini, A., Benini, L., Macii, E., Poncino, M. (2021). Predicting Hard Disk Failures in Data Centers Using Temporal Convolutional Neural Networks. In: Balis, B., et al. Euro-Par 2020: Parallel Processing Workshops. Euro-Par 2020. Lecture Notes in Computer Science, vol 12480. Springer, Cham. https://doi.org/10.1007/978-3-030-71593-9_22

  • DOI: https://doi.org/10.1007/978-3-030-71593-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71592-2

  • Online ISBN: 978-3-030-71593-9

  • eBook Packages: Computer Science, Computer Science (R0)
