Abstract
With the growth of big data, traditional file systems can no longer meet the demands of high-performance computing and big data workloads. Parallel file systems are therefore becoming increasingly popular in high-performance computing. As a typical parallel file system, PVFS has been widely used for big data computing in recent years. However, as the scale of computation increases, data nodes need to be added dynamically, which PVFS does not currently support. This paper proposes a dynamic data node extension method for PVFS, together with a subsequent data migration algorithm. The algorithm first adds a new data node automatically and transparently. It then identifies the most heavily loaded data node in the original file system using a new load evaluation method and transfers data from that node to the newly added node to mitigate the load imbalance. Experimental results show that the dynamic data node extension method improves the performance of PVFS and effectively reduces the probability of hot spots.
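The two steps described in the abstract (add a node, then relieve the most heavily loaded original node) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DataNode` class, the `extend_and_migrate` function, and in particular the load metric (fraction of capacity in use) are hypothetical and do not reproduce the paper's load evaluation method or any PVFS interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node; not a PVFS API."""
    name: str
    capacity: int                               # bytes the node can hold
    files: dict = field(default_factory=dict)   # file name -> size in bytes

    def used(self) -> int:
        return sum(self.files.values())

    def load(self) -> float:
        # Assumed load metric: fraction of capacity in use.
        # The paper's own load evaluation method is not given in the abstract.
        return self.used() / self.capacity


def extend_and_migrate(nodes, new_node):
    """Add new_node to nodes, then move files from the most loaded
    original node to new_node. Returns the names of the moved files."""
    # Pick the source before the (empty) new node joins the list.
    source = max(nodes, key=lambda n: n.load())
    nodes.append(new_node)

    moved = []
    # Consider the largest files first; skip any move that would leave
    # the new node more loaded than the source.
    for fname, size in sorted(source.files.items(), key=lambda kv: -kv[1]):
        src_after = (source.used() - size) / source.capacity
        dst_after = (new_node.used() + size) / new_node.capacity
        if dst_after > src_after:
            continue
        new_node.files[fname] = source.files.pop(fname)
        moved.append(fname)
    return moved
```

Skipping any move that would overshoot keeps the new node from becoming the next hot spot. A real system would also have to update metadata and keep files accessible during the transfer, which this sketch does not model.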
Acknowledgments
We would like to thank the anonymous reviewers for helping us refine this paper. Their constructive comments and suggestions were very helpful. This work was partly funded by the National Science and Technology Major Project of the Ministry of Science and Technology of China under grant 2011ZX05035-004-004HZ.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, X., Tang, J., Gao, H., Wu, G. (2015). A Dynamic Extension and Data Migration Method Based on PVFS. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_37
DOI: https://doi.org/10.1007/978-3-319-27122-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science (R0)