Session Five - Data Integration
Data Warehouse – Advantages and Limitations
Advantages
• Integration at the lowest level, eliminating the need for integration queries.
• Runtime schematic cleaning is not needed – it is performed at the data staging environment.
• Independent of the original data source.
• Query optimization is possible.
Limitations
• The process would take a considerable amount of time and effort.
• Requires an understanding of the domain.
• More scalable when accompanied by a metadata repository – increased load.
• Tightly coupled architecture.
Data Warehousing (DW): Extract, Transform, Load (ETL)
• ETL is a process that extracts the data from different source
systems, then transforms the data (like applying
calculations, concatenations, etc.) and finally loads the data
into the data warehouse system. ETL provides the
foundation for data analytics and machine learning work
streams. ETL is often used by an organization to:
• Extract data from legacy systems
• Cleanse the data to improve data quality and establish
consistency
• Load data into a target database, usually a DW
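The three ETL stages described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the CSV content, the column names, the `sales` table, and the in-memory SQLite database standing in for the DW are all hypothetical and do not come from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; column names and values are invented for illustration.
LEGACY_CSV = """first_name,last_name,unit_price,quantity
 ada ,lovelace,10.50,2
Grace,HOPPER,3.00,4
,,1.00,1
"""

def extract(text):
    """Extract: read raw rows from the source system (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cleanse names, concatenate them, and calculate a total."""
    cleaned = []
    for row in rows:
        first = row["first_name"].strip().title()
        last = row["last_name"].strip().title()
        if not first or not last:
            continue  # cleansing: drop rows with missing names
        total = round(float(row["unit_price"]) * int(row["quantity"]), 2)
        cleaned.append((f"{first} {last}", total))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the DW (assumption)
load(transform(extract(LEGACY_CSV)), warehouse)
result = warehouse.execute(
    "SELECT customer, total FROM sales ORDER BY customer"
).fetchall()
print(result)
```

Keeping each stage as its own function mirrors the extract, transform, load split and lets each stage be tested or replaced independently.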
... manipulation and data structure is large. It is mainly used on the .NET platform and is always performed with C# or VB.NET. It is a much faster way of accessing the data than using Memory Stream.
Federated database | Data warehouse (DW)
Data is present on various servers | The entire DW is present on one server
Requires high-speed network connections | Requires no network connections
Easier to create than a DW | Not as easy to create as a federated database
Requires no creation of a new database | The DW must be created from scratch
Requires a network expert to set up the network connection | Requires database experts such as a data steward
Data Integration Technologies
The technologies that are used for data integration include:
• Data interchange – a structured transmission of organizational data between two or more organizations through electronic means. It is used for the transfer of electronic documents from one computer to another.
• Object brokering – an object request broker (ORB) is middleware software. It gives programmers the freedom to make calls from one computer to another via a computer network.
Modeling techniques
• Entity-relationship modeling – an entity–relationship model (ER model) describes the structure of a database with the help of a diagram, known as an Entity Relationship Diagram (ER Diagram).
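To make the entity-relationship model mentioned in this section more concrete, the sketch below expresses a tiny ER model as a relational schema. The entities (`customer`, `purchase`), their attributes, and the one-to-many relationship between them are hypothetical examples chosen for illustration. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two hypothetical entities and a one-to-many relationship:
# one customer places many purchases.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- entity key
    name        TEXT NOT NULL          -- attribute
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- relationship
    amount      REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 99.5)")

# Following the relationship from purchase back to customer.
rows = conn.execute(
    "SELECT c.name, p.amount FROM customer c "
    "JOIN purchase p ON p.customer_id = c.customer_id"
).fetchall()

# A purchase that points at a non-existent customer violates the relationship.
try:
    conn.execute("INSERT INTO purchase VALUES (11, 999, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In an ER diagram the two tables would appear as entity boxes joined by a "places" relationship line. Here the foreign key plays that role.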
• Standardize, correct and normalize data
• Verify and validate data accuracy
• Apply business rules
• Match, link and consolidate multiple data sources
• Gain access to the right data sources at the right time
• Deliver high-quality information
• Increase the quality of information
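A few of the steps listed above (standardize, validate, apply a business rule, match and consolidate) can be sketched as follows. Everything here is an assumption made for illustration: the two sources (`crm`, `billing`), their fields, the simplified email pattern, and the rule that records without a valid email are rejected.

```python
import re

# Hypothetical records from two sources describing overlapping customers.
crm = [
    {"name": "  ACME corp. ", "email": "Sales@Acme.com"},
    {"name": "Globex", "email": "not-an-email"},
]
billing = [
    {"name": "Acme Corp", "email": "sales@acme.com", "phone": "555-0100"},
]

# Deliberately simple pattern; real email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardize(record):
    """Standardize and normalize: collapse whitespace, fix case, trim punctuation."""
    out = dict(record)
    out["name"] = " ".join(record["name"].split()).rstrip(".").title()
    out["email"] = record["email"].strip().lower()
    return out

def is_valid(record):
    """Validate: hypothetical business rule that a record needs a plausible email."""
    return bool(EMAIL_RE.match(record["email"]))

def consolidate(*sources):
    """Match records on email and merge their fields into one record each."""
    merged = {}
    for source in sources:
        for record in source:
            record = standardize(record)
            if not is_valid(record):
                continue  # rejected by the business rule
            merged.setdefault(record["email"], {}).update(record)
    return list(merged.values())

golden = consolidate(crm, billing)
print(golden)
```

Matching on a normalized email is the simplest possible linking key. Real deployments often need fuzzier matching on names or addresses.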