
Sem3 Unit1 DW

The document provides an overview of data warehousing, including its definition, evolution, and the role of database management systems. It explains the differences between OLTP and OLAP, and outlines the ETL process for preparing data for storage and analysis in a data warehouse. Key components such as data extraction, transformation, and loading methods are also discussed.

UNIT-1

DATA MINING TECHNIQUES USING R


An Overview of the Data Warehouse:

• A data warehouse is a centralized storage system that allows for the storing, analyzing,
and interpreting of data in order to facilitate better decision-making.

• A data warehouse can be defined as a collection of organizational data and information
extracted from operational sources and external data sources.

• Data warehouses are primarily designed to facilitate searches and analyses and usually
contain large amounts of historical data.

• In a data warehouse, data from many different sources is brought to a single location
and then translated into a format the data warehouse can process and store.

Evolution of the Data Warehouse:

• Punch cards were the first medium used for storing computer-generated data. By the 1950s they
had become essential in government and business, and they carried the famous warning "Do not
fold, spindle, or mutilate." They remained widely used until the mid-1980s and are still used
for voting ballots and standardized tests.

• Magnetic storage began replacing punch cards in the 1960s, with disk storage (hard drives and
floppies) becoming popular in 1964, enabling direct data access and improving efficiency over
magnetic tapes.

• IBM pioneered disk storage: it invented both the floppy disk drive and the hard disk drive,
began manufacturing hard disk drives in 1956, improved the technology over time, and sold its
hard disk business to Hitachi in 2003.

Database Management Systems:

• Disk storage was quickly followed by software called a database management system (DBMS). In 1966, IBM
introduced its own DBMS, known at the time as the Information Management System (IMS).

• DBMS Functions:
    • Locate data efficiently.
    • Resolve conflicts (for example, when two units of data map to the same storage location).
    • Allow deletion and storage optimization.
    • Improve data retrieval speed.

• In the late 1960s and early 1970s, commercial online applications came into play, shortly after disk storage
and DBMS software became popular.

• As a result, a large number of commercial applications could take advantage of online
processing. Some examples included:
    • Claims processing.
    • Bank teller processing.
    • Automated teller processing (ATMs).
    • Airline reservation processing.
    • Retail point-of-sale processing.
    • Manufacturing control processing.

Data Warehouse Alternatives:

Understanding OLTP, OLAP, and Data Warehousing:

• OLTP (Online Transaction Processing) → Handles real-time transactional data (e.g., banking, retail
purchases).

• OLAP (Online Analytical Processing) → Enables complex queries and analysis for decision-making
(e.g., sales trends, customer segmentation).

• Data Warehouse → Serves as centralized storage for historical data, combining OLTP data and
external sources.

How OLTP Data Connects to a Data Warehouse:

• Extract, Transform, Load (ETL) Process
    ✔ Data from OLTP databases (MySQL, PostgreSQL) is extracted.
    ✔ It is cleaned, formatted, and optimized for analysis.
    ✔ It is loaded into a data warehouse (Snowflake, Redshift, BigQuery).
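
The three ETL steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the OLTP source and the warehouse, and the table names (orders, fact_orders), the sample rows, and the cleaning rules are all hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for an OLTP source
# such as MySQL or PostgreSQL. Table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice ", "19.99"), (2, "BOB", "5.00"), (3, None, "7.50")],
)

# Assumption: a second in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
)


def extract(conn):
    """Extract: read the raw rows from the transactional source."""
    return conn.execute("SELECT id, customer, amount FROM orders").fetchall()


def transform(rows):
    """Transform: drop rows with no customer, tidy names, cast amounts."""
    cleaned = []
    for order_id, customer, amount in rows:
        if customer is None:
            continue  # incomplete record, filtered out
        cleaned.append((order_id, customer.strip().title(), float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In a real pipeline each step would usually be a separate, scheduled job; keeping them as three small functions here just makes the extract, transform, and load boundaries visible.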

How OLAP Utilizes the Data Warehouse:

• Multidimensional Analysis
    ✔ OLAP systems query pre-aggregated data efficiently from the warehouse.
    ✔ They use cube structures to allow fast summaries across dimensions (e.g., time, location, product).
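
To make the idea of a pre-aggregated cube concrete, here is a minimal sketch in plain Python. The fact rows, the three dimension names, and the use of "*" to mean "all values of this dimension" are assumptions made for illustration; real OLAP engines store and index cubes far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, sales).
facts = [
    (2023, "North", "Laptop", 100),
    (2023, "South", "Laptop", 150),
    (2024, "North", "Phone", 200),
    (2024, "North", "Laptop", 50),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows):
    """Pre-aggregate total sales for every combination of dimensions."""
    cube = defaultdict(int)
    positions = range(len(DIMENSIONS))
    for *dims, sales in rows:
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(positions, size):
                # Dimensions that are not kept are rolled up into "*".
                key = tuple(dims[i] if i in kept else "*" for i in positions)
                cube[key] += sales
    return dict(cube)


cube = build_cube(facts)
print(cube[("*", "*", "*")])           # grand total: 500
print(cube[(2024, "*", "*")])          # all sales in 2024: 250
print(cube[("*", "North", "Laptop")])  # North laptops, all years: 150
```

Because every summary is computed once up front, answering a question such as "total sales in 2024" becomes a single dictionary lookup rather than a scan of the fact rows.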

OLTP (Online Transaction Processing)               | OLAP (Online Analytical Processing)
---------------------------------------------------|--------------------------------------------------------------
Works with current data                            | Works with historical data
Day-to-day transactional operations                | Data analysis and decision making
Normalized, structured data (e.g., tabular format) | Star schema and snowflake schema models for studying the data
Used by frontline employees                        | Used by data analysts and executives
Tools: Oracle, MySQL                               | Tools: Tableau, Power BI, SAP
Data is updated in real time                       | Data is periodically refreshed
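
The star schema mentioned in the comparison can be sketched as one central fact table joined to small dimension tables. The example below uses Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical, and SQLite is only a stand-in for a real warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each sale.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the measurements and points at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (10, 2023, 12), (11, 2024, 1);
    INSERT INTO fact_sales VALUES (1, 10, 900.0), (1, 11, 950.0), (2, 11, 200.0);
    """
)

# A typical analytical (OLAP-style) query: total sales per category and year.
sales_by_category_year = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(sales_by_category_year)
```

A snowflake schema would go one step further and split a dimension such as dim_product into further normalized tables (for example, a separate category table).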


ETL Process in Data Warehouse:
The ETL process, which stands for Extract, Transform, and Load, is a critical methodology used to prepare data for
storage, analysis, and reporting in a data warehouse.

It involves three distinct stages that help to streamline raw data from multiple sources into a clean, structured, and
usable form.

Here’s a detailed breakdown of each phase:

Extraction The Extract phase is the first step in the ETL process, where raw data is collected from various data
sources.
Types of data sources can include:
Structured: SQL databases, ERPs, CRMs.
Semi-structured: JSON, XML.
Unstructured: Emails, web pages, flat files.
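
As a rough illustration of extraction, the sketch below pulls customer records from one structured source (a SQL table) and two semi-structured sources (JSON and XML) into a single list of dictionaries. The sample data, the field names, and the helper functions are hypothetical; only Python standard-library modules are used.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Structured source: a hypothetical SQL table (in-memory SQLite for the demo).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Semi-structured sources: hypothetical JSON and XML payloads.
json_text = '[{"id": 3, "name": "Carol"}]'
xml_text = "<customers><customer id='4'><name>Dave</name></customer></customers>"


def extract_sql(conn):
    """Read rows from the SQL table as dictionaries."""
    query = "SELECT id, name FROM customers ORDER BY id"
    return [{"id": cid, "name": name} for cid, name in conn.execute(query)]


def extract_json(text):
    """Parse a JSON array of customer objects."""
    return [{"id": int(item["id"]), "name": item["name"]} for item in json.loads(text)]


def extract_xml(text):
    """Parse <customer id="..."><name>...</name></customer> elements."""
    root = ET.fromstring(text)
    return [
        {"id": int(node.get("id")), "name": node.findtext("name")}
        for node in root.iter("customer")
    ]


records = extract_sql(db) + extract_json(json_text) + extract_xml(xml_text)
print(records)
```

Bringing every source into one common record shape at this stage keeps the later transformation step independent of where each record came from.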

Transformation Data extracted in the previous phase is often raw and inconsistent. During transformation,
the data is cleaned, aggregated, and formatted according to business rules.

Common transformations include:
• Data Filtering: Removing irrelevant or incorrect data.
• Data Sorting: Organizing data into a required order for easier analysis.
• Data Aggregating: Summarizing data to provide meaningful insights (e.g., averaging sales data).
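
The three transformations above can be tried out on a small in-memory dataset. The records and the validity rule (a region must be present and the amount must not be negative) are assumptions chosen for illustration, not business rules from the source.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw sales records as they might arrive from extraction.
raw_sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": -5.0},  # incorrect: negative amount
    {"region": "North", "amount": 80.0},
    {"region": None, "amount": 40.0},     # incorrect: missing region
    {"region": "South", "amount": 60.0},
]

# Filtering: keep only records that pass the (assumed) validity rule.
valid = [r for r in raw_sales if r["region"] is not None and r["amount"] >= 0]

# Sorting: order by region, then by amount from largest to smallest.
ordered = sorted(valid, key=lambda r: (r["region"], -r["amount"]))

# Aggregating: average sales amount per region.
amounts_by_region = defaultdict(list)
for record in ordered:
    amounts_by_region[record["region"]].append(record["amount"])
average_sales = {region: mean(vals) for region, vals in amounts_by_region.items()}

print(ordered)
print(average_sales)
```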
Loading This phase involves transferring the transformed data into a data warehouse, data lake, or another
target system for storage.

Depending on the use case, there are two types of loading methods:
• Full Load: All data is loaded into the target system, often used during the initial population of the warehouse.
• Incremental Load: Only new or updated data is loaded, making this method more efficient for ongoing data
updates.
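
The difference between the two loading methods can be sketched as follows. This is one possible approach, not the only one: it assumes each source row carries an updated_at timestamp that can be compared against the time of the previous load, and it uses SQLite's INSERT OR REPLACE as a simple upsert. The table and function names are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)


def full_load(conn, rows):
    """Full load: replace the whole target table with the source snapshot."""
    conn.execute("DELETE FROM customers")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def incremental_load(conn, rows, last_loaded_at):
    """Incremental load: upsert only rows changed after the previous load."""
    # ISO-8601 date strings compare correctly as plain strings.
    changed = [row for row in rows if row[2] > last_loaded_at]
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", changed)
    conn.commit()
    return len(changed)


# Initial population of the warehouse.
snapshot = [(1, "Alice", "2024-01-01"), (2, "Bob", "2024-01-01")]
full_load(warehouse, snapshot)

# Later run: one row was updated and one row is new.
later = [
    (1, "Alice", "2024-01-01"),
    (2, "Robert", "2024-02-01"),
    (3, "Carol", "2024-02-01"),
]
changed_count = incremental_load(warehouse, later, last_loaded_at="2024-01-01")
final_rows = warehouse.execute("SELECT * FROM customers ORDER BY id").fetchall()
print(changed_count, final_rows)
```

The incremental run touches only two of the three source rows, which is where the efficiency gain over a repeated full load comes from.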
