Abstract
Recent HPC systems employ parallel file systems such as GPFS and Lustre to cope with the huge I/O demands of data-intensive applications. Although most HPC systems provide performance tuning tools on compute nodes, there is little opportunity to tune I/O activities on parallel file systems, including the high-speed interconnects between compute nodes and file systems. We propose an I/O performance optimization framework that analyzes log data of parallel file systems and interconnects in a holistic way to improve the performance of HPC systems, including I/O nodes and parallel file systems. We demonstrate our framework on the K computer with two I/O benchmarks, using both the original and an enhanced MPI-IO implementation. The analysis reveals that the I/O performance improvements achieved by the enhanced MPI-IO implementation stem from more effective utilization of the parallel file systems and the interconnects among I/O nodes than with the original implementation.
Acknowledgment
This research used computational resources of the K computer provided by the RIKEN Center for Computational Science.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Tsujita, Y., Furutani, Y., Hida, H., Yamamoto, K., Uno, A. (2020). Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12321. Springer, Cham. https://doi.org/10.1007/978-3-030-59851-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59850-1
Online ISBN: 978-3-030-59851-8