Abstract
One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task that requires first the semantic and then the structural reconciliation of the information provided by the available sources. It is part of the Extract-Transform-Load (ETL) process, which is responsible for populating the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, whose choice and applicability are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present experimental findings that demonstrate the applicability of our approach.
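To make the idea of semantics-driven rule selection more concrete, here is a toy sketch in Python. It is not the authors' method. It skips the graph-rewriting machinery (subgraph matching and rewriting) and only shows how annotations from a domain ontology might decide which operation a rule emits. Everything in it is a hypothetical illustration: the ontology (`SUPERCONCEPT`), the `Node` annotation with a `unit` property, the rule list `RULES`, the function `derive_etl_flow`, and the operation labels `LOAD`, `CONVERT`, `UNION` and `UNMAPPED`.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each concept maps to its direct superconcept.
SUPERCONCEPT = {"Price": "Amount", "Amount": None, "Product": None, "Date": None}


def subsumed_by(concept, ancestor):
    """True if `concept` equals `ancestor` or is a transitive subconcept of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUPERCONCEPT.get(concept)
    return False


@dataclass(frozen=True)
class Node:
    store: str      # data store, e.g. "S1" (source) or "DW" (target)
    attr: str       # attribute name within that store
    concept: str    # ontology concept annotating the attribute (hypothetical)
    unit: str = ""  # optional semantic property, e.g. a currency

    @property
    def name(self):
        return f"{self.store}.{self.attr}"


# Ordered, hypothetical rules: (applicability condition, operation builder).
# The first rule whose condition holds for a (source, target) pair fires.
RULES = [
    (lambda s, t: bool(s.unit and t.unit and s.unit != t.unit),
     lambda s, t: f"CONVERT({s.unit}->{t.unit})"),
    (lambda s, t: True, lambda s, t: "LOAD"),
]


def derive_etl_flow(sources, targets):
    """Return (operation, inputs, target) triples for every target node."""
    flow = []
    for t in targets:
        # A source feeds a target if its concept is subsumed by the target's.
        matches = [s for s in sources if subsumed_by(s.concept, t.concept)]
        if not matches:
            flow.append(("UNMAPPED", (), t.name))
            continue
        for s in matches:
            for applies, build in RULES:
                if applies(s, t):
                    flow.append((build(s, t), (s.name,), t.name))
                    break
        if len(matches) > 1:
            # Several semantically compatible sources populate one target.
            flow.append(("UNION", tuple(s.name for s in matches), t.name))
    return flow


if __name__ == "__main__":
    sources = [Node("S1", "cost", "Price", "EUR"),
               Node("S2", "price", "Price", "USD"),
               Node("S1", "pid", "Product")]
    targets = [Node("DW", "amount", "Amount", "EUR"),
               Node("DW", "product", "Product"),
               Node("DW", "date", "Date")]
    for step in derive_etl_flow(sources, targets):
        print(step)
```

With the sample stores, `DW.amount` is fed by both price attributes because `Price` is subsumed by `Amount`. The USD-valued one gets a `CONVERT(USD->EUR)` step, and the two inputs are then combined by a `UNION`. `DW.date` has no semantically compatible source, so it is reported as `UNMAPPED`. Keeping the rules in an ordered list means new semantic conditions can be added without touching the matching loop.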
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Skoutas, D., Simitsis, A., Sellis, T. (2009). Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations. In: Spaccapietra, S., Zimányi, E., Song, IY. (eds) Journal on Data Semantics XIII. Lecture Notes in Computer Science, vol 5530. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03098-7_5
DOI: https://doi.org/10.1007/978-3-642-03098-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03097-0
Online ISBN: 978-3-642-03098-7
eBook Packages: Computer Science, Computer Science (R0)