Abstract
Array-based storage systems have attracted renewed interest in a range of applications because they are easy to maintain for large volumes of data. However, conventional array storage schemes do not scale well for dynamic data, because they must reallocate the whole array whenever the data exceed the array's allocated size. Conventional array storage is therefore difficult to use when the data grow over time. To keep pace with such growing data, the array storage must be dynamic, expanding its size as the data grow. Moreover, the address space of an array-based storage system overflows quickly when the dimension lengths and the number of dimensions are large. Index array models provide dynamic storage, but their retrieval performance is poorer than that of conventional schemes. In this paper, we demonstrate an index array-based scalable array storage that maintains growing data at runtime. The key idea is to convert an n-dimensional array into a 2-dimensional one and to organize the array elements into ordered collections called segments. These segments split a large allocation into smaller ones, which delays address space overflow. The proposed scheme outperforms other existing array systems in retrieval performance. Because it converts an n-dimensional array into 2 dimensions, it needs only 2 indices to maintain scalability, which also reduces the index overhead. The scheme also shows better storage management performance than other approaches.
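The abstract does not spell out the segment layout, so the following is only a minimal Python sketch of the general index-array idea that extendible array schemes build on, assuming a 2-dimensional array that grows one row or one column at a time. The class name `SegmentedExtendibleArray2D`, the history counter, and the `(history, segment_id)` entries kept per row and per column are illustrative assumptions, not the authors' data structures, and the n-dimensional to 2-dimensional conversion is not modelled. Each extension allocates a new segment sized to the current extent of the other dimension, so existing elements are never copied.

```python
class SegmentedExtendibleArray2D:
    """Hypothetical 2-D extendible array: every extension allocates a new
    segment and records it in a per-row or per-column index array."""

    def __init__(self, rows, cols, fill=0):
        if rows < 1 or cols < 1:
            raise ValueError("initial shape must be at least 1 x 1")
        self.fill = fill
        self.base_cols = cols
        # Segment 0 holds the initial rows x cols block in row-major order.
        self.segments = [[fill] * (rows * cols)]
        self.clock = 0  # history counter; value 0 is reserved for the base block
        # Index arrays: one (history, segment_id) entry per row and per column.
        self.row_index = [(0, 0)] * rows
        self.col_index = [(0, 0)] * cols

    @property
    def shape(self):
        return len(self.row_index), len(self.col_index)

    def extend_rows(self):
        """Append one row as a new segment sized to the current column count."""
        self.clock += 1
        self.segments.append([self.fill] * len(self.col_index))
        self.row_index.append((self.clock, len(self.segments) - 1))

    def extend_cols(self):
        """Append one column as a new segment sized to the current row count."""
        self.clock += 1
        self.segments.append([self.fill] * len(self.row_index))
        self.col_index.append((self.clock, len(self.segments) - 1))

    def _locate(self, i, j):
        """Map a 2-D index to (segment_id, offset) using only the two index arrays."""
        rows, cols = self.shape
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError((i, j))
        hr, sr = self.row_index[i]
        hc, sc = self.col_index[j]
        if hr == 0 and hc == 0:
            return 0, i * self.base_cols + j  # element lives in the base block
        if hr > hc:
            return sr, j  # row i was added after column j already existed
        return sc, i  # column j was added after row i already existed

    def __getitem__(self, key):
        seg, off = self._locate(*key)
        return self.segments[seg][off]

    def __setitem__(self, key, value):
        seg, off = self._locate(*key)
        self.segments[seg][off] = value
```

For example, `a = SegmentedExtendibleArray2D(2, 3)` followed by `a.extend_rows()` and `a.extend_cols()` yields a 3 x 4 array held in three segments, and the base segment is the same list object before and after both extensions. In this sketch an element's offset is bounded by the length of its own segment rather than by the total element count, which is one way to see why splitting storage into segments can delay address space overflow; persistence, batched extensions, and the dimension conversion are left out.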

Cite this article
Omar, M.T., Azharul Hasan, K.M. & Tsuji, T. A scalable array storage for efficient maintenance of future data. J Supercomput 77, 6540–6565 (2021). https://doi.org/10.1007/s11227-020-03554-x
DOI: https://doi.org/10.1007/s11227-020-03554-x