Abstract
Big Data management has become a key basis for innovation, productivity growth, and competition. Exploiting data of this magnitude in a correlated way is essential for discovering valuable insights and supporting decision making in domains of major interest. Despite the complexity of Big Data environments, users typically want a unified, appropriate view of this large and heterogeneous data so that they can extract reliable and consistent knowledge. Big Data integration mechanisms are therefore needed to provide a uniform query interface, to mediate across large datasets, and to give data scientists a consistent integrated view suitable for analytical exploitation. This paper presents a semantic-based Big Data integration framework that relies on large-scale ontology matching and on probabilistic-logical assessment strategies. The framework applies optimization mechanisms and leverages parallel-computing paradigms (Hadoop and MapReduce) on commodity computational resources to address Big Data challenges efficiently. Several experiments were conducted, and their results indicate that the framework is efficient in terms of accuracy, performance, and scalability.

Cite this article
Mountasser, I., Ouhbi, B., Hdioud, F. et al. Semantic-based Big Data integration framework using scalable distributed ontology matching strategy. Distrib Parallel Databases 39, 891–937 (2021). https://doi.org/10.1007/s10619-021-07321-6
DOI: https://doi.org/10.1007/s10619-021-07321-6