Abstract
Today’s data-intensive applications require access to multiple types of storage platforms, such as parallel file systems, distributed file systems, and in-memory data systems. In addition, many applications need to process data streams. The goal is to develop mechanisms that hide the diversity of data sources from applications behind a single integrated interface and that improve data access performance.
In this work, we propose a data container-based solution for data-intensive applications that provides a high-level programming interface to the different storage systems commonly used in both HPC and HPDA environments. Our approach, DICE, hides the complexity of dealing with data from multiple sources, so that end users and developers can access data items transparently and with less effort. The abstraction is based on a series of plugins that facilitate extension to other existing file systems.
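The plugin-based container idea can be illustrated with a minimal sketch. This is not DICE's actual interface, which is documented in the project repository; the class names (`StoragePlugin`, `MemoryPlugin`, `LocalFilePlugin`, `DataContainer`), the `put`/`get` methods, and the `mem://` and `local://` URI schemes are all hypothetical and chosen only for illustration. The sketch assumes that each storage system is wrapped by one plugin and that the container picks the plugin from the URI scheme.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StoragePlugin(ABC):
    """Hypothetical plugin interface: one subclass per storage system."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryPlugin(StoragePlugin):
    """Stand-in for an in-memory data system: items live in a dict."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._items[key]


class LocalFilePlugin(StoragePlugin):
    """Stand-in for a file system: each item is one file under `root`."""

    def __init__(self, root: str) -> None:
        self._root = root

    def _path(self, key: str) -> str:
        # Flatten the key so that no subdirectories are needed.
        return os.path.join(self._root, key.replace("/", "_"))

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


class DataContainer:
    """Single entry point that routes each URI to the registered plugin."""

    def __init__(self) -> None:
        self._plugins: dict[str, StoragePlugin] = {}

    def register(self, scheme: str, plugin: StoragePlugin) -> None:
        self._plugins[scheme] = plugin

    def _resolve(self, uri: str) -> tuple[StoragePlugin, str]:
        parsed = urlparse(uri)
        if parsed.scheme not in self._plugins:
            raise KeyError(f"no plugin registered for scheme {parsed.scheme!r}")
        key = (parsed.netloc + parsed.path).lstrip("/")
        return self._plugins[parsed.scheme], key

    def put(self, uri: str, data: bytes) -> None:
        plugin, key = self._resolve(uri)
        plugin.put(key, data)

    def get(self, uri: str) -> bytes:
        plugin, key = self._resolve(uri)
        return plugin.get(key)


# Application code is identical for both backends; only the URI changes.
container = DataContainer()
container.register("mem", MemoryPlugin())
container.register("local", LocalFilePlugin(tempfile.mkdtemp()))
container.put("mem://results/run1", b"in memory")
container.put("local://results/run1", b"on disk")
```

Under these assumptions, supporting another storage system means writing one more `StoragePlugin` subclass and registering it under a new scheme, with no change to the application code that calls `put` and `get`.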
This work was supported by the EU project “ASPIDE: Exascale Programming Models for Extreme Data Processing” under grant 801091. This research was partially supported by the Madrid Regional Government (Spain) under the grant “Convergencia Big Data-HPC: de los sensores a las Aplicaciones (CABAHLA-CM)”. Finally, this work was partially supported by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)”, Ref. PID2019-107858GB-I00.
Notes
1. Source code and examples are available at https://gitlab.arcos.inf.uc3m.es/pbrox/dice.git.
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Brox, P., Garcia-Blas, J., Singh, D.E., Carretero, J. (2022). DICE: Generic Data Abstraction for Enhancing the Convergence of HPC and Big Data. In: Gitler, I., Barrios Hernández, C.J., Meneses, E. (eds) High Performance Computing. CARLA 2021. Communications in Computer and Information Science, vol 1540. Springer, Cham. https://doi.org/10.1007/978-3-031-04209-6_8
DOI: https://doi.org/10.1007/978-3-031-04209-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04208-9
Online ISBN: 978-3-031-04209-6
eBook Packages: Computer Science (R0)