Sem3 Unit1 DW
• A data warehouse is a centralized storage system that allows for the storing, analyzing,
and interpreting of data in order to facilitate better decision-making.
• Data warehouses are primarily designed to facilitate searches and analyses and usually
contain large amounts of historical data.
• In a data warehouse, data from many different sources is brought to a single location
and then translated into a format the data warehouse can process and store.
Evolution of the Data Warehouse:
• Punch cards, first used for storing computer-generated data, became essential in government
and business by the 1950s, carried the famous warning "Do not fold, spindle, or mutilate,"
remained widely used until the mid-1980s, and are still utilized for voting ballots and
standardized tests.
• Magnetic storage began replacing punch cards in the 1960s; disk storage (hard drives, with floppy
disks arriving later) became popular from 1964, enabling direct data access and improving efficiency
over magnetic tape.
• IBM pioneered disk storage, inventing both the hard disk drive (first manufactured in 1956) and the
floppy disk, continually improved the technology, and eventually sold its hard disk business to Hitachi
in 2003.
Database Management Systems:
• Disk storage was quickly followed by software called a database management system (DBMS). In 1966, IBM
introduced its own DBMS, known at the time as the Information Management System (IMS).
• DBMS Functions:
Locate data efficiently.
Resolve conflicts.
Allow deletion and storage optimization.
Improve data retrieval speed.
• In the late 1960s and early ‘70s, commercial online applications came into play, shortly after disk storage
and DBMS software became popular.
• As a result, a large number of commercial applications could be applied to online processing.
Some examples included:
Claims processing.
Bank teller processing.
Automated teller processing (ATMs).
Airline reservation processing.
Retail point-of-sale processing.
Manufacturing control processing.
Data Warehouse Alternatives:
Understanding OLTP, OLAP, and Data Warehousing:
• OLTP (Online Transaction Processing) → Handles real-time transactional data (e.g., banking, retail
purchases).
• OLAP (Online Analytical Processing) → Enables complex queries & analysis for decision-making
(e.g., sales trends, customer segmentation).
• Data Warehouse → Serves as a centralized storage for historical data, combining OLTP data and
external sources.
• Extract, Transform, Load (ETL) Process:
✔ Data from OLTP databases (MySQL, PostgreSQL) is extracted.
✔ It is cleaned, formatted, and optimized for analysis.
✔ It is loaded into a data warehouse (Snowflake, Redshift, BigQuery).
• Multidimensional Analysis (see the sketch after this list):
✔ OLAP systems query pre-aggregated data efficiently from the warehouse.
✔ Cube structures allow fast summaries across dimensions (e.g., time, location, product).
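As a rough illustration of cube-style summarization, the sketch below pivots a small, made-up sales dataset across two dimensions using pandas; the column names and figures are hypothetical and not taken from the notes above.

```python
# Minimal sketch of OLAP-style multidimensional summarization with pandas.
# The dataset, column names, and numbers are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "region":  ["North", "South", "North", "North", "South", "South"],
    "product": ["A", "A", "B", "A", "B", "B"],
    "revenue": [120.0, 80.0, 200.0, 150.0, 90.0, 210.0],
})

# Pivot across two dimensions (year x region), summing the revenue measure.
# Conceptually this is a slice of a cube rolled up along its dimensions.
cube = pd.pivot_table(
    sales,
    values="revenue",
    index="year",
    columns="region",
    aggfunc="sum",
    margins=True,  # row/column totals, i.e. higher-level roll-ups
)
print(cube)
```

The margins=True option adds row and column totals, which plays the role of rolling the cube up to coarser summary levels.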
OLTP vs. OLAP:
OLTP (Online Transaction Processing): normalized data structures in a structured format.
OLAP (Online Analytical Processing): star schema and snowflake schema models to analyze the data.
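To make the star schema idea concrete, here is a minimal, hypothetical sketch using Python's built-in sqlite3 module: a central fact table joined to two dimension tables and queried with an aggregate. The table names, columns, and data are illustrative assumptions, not part of the original material.

```python
# Hypothetical star schema sketch: a central fact table joined to dimension tables.
# Uses only the standard-library sqlite3 module; names and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales  (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(1, 2024, 1), (2, 2024, 2)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 999.0), (2, 2, 1, 250.0), (3, 1, 2, 1050.0)])

# Analytical query: roll revenue up by category and month across the star schema.
for row in cur.execute("""
    SELECT p.category, d.year, d.month, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d    ON f.date_id = d.date_id
    GROUP BY p.category, d.year, d.month
"""):
    print(row)

conn.close()
```

A snowflake schema would differ only in that the dimension tables themselves are further normalized into sub-dimension tables.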
The ETL Process:
The ETL process involves three distinct stages that streamline raw data from multiple sources into a clean,
structured, and usable form.
Extraction: The Extract phase is the first step in the ETL process, where raw data is collected from various
data sources (a short extraction sketch follows the list of source types below).
Types of data sources can include:
Structured: SQL databases, ERPs, CRMs.
Semi-structured: JSON, XML.
Unstructured: Emails, web pages, flat files.
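The sketch below illustrates the extract step for two of the source types listed above, a SQL database and a JSON export; the file names, table name, and fields are placeholder assumptions.

```python
# Hypothetical extraction sketch: pull raw records from a SQL database and a JSON file.
# The database file, table name, and JSON path are placeholders.
import json
import sqlite3

def extract_from_sql(db_path):
    """Read raw rows from a structured source (here, a SQLite database)."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT * FROM orders").fetchall()

def extract_from_json(file_path):
    """Read semi-structured records from a JSON export."""
    with open(file_path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    sql_rows = extract_from_sql("orders.db")      # placeholder database file
    json_rows = extract_from_json("orders.json")  # placeholder JSON export
    print(len(sql_rows), len(json_rows))
```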
Transformation: Data extracted in the previous phase is often raw and inconsistent. During transformation,
the data is cleaned, aggregated, and formatted according to business rules.
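As a rough example of the transformation step, the snippet below cleans and standardizes hypothetical raw records: trimming and casing names, normalizing dates, and dropping rows with unparseable amounts. The field names and business rules are made up for illustration.

```python
# Hypothetical transformation sketch: clean and standardize raw extracted records.
# Field names and business rules are illustrative only.
from datetime import datetime
from typing import Optional

raw_records = [
    {"customer": "  alice ", "order_date": "2024/01/05", "amount": "120.50"},
    {"customer": "Bob",      "order_date": "2024/01/06", "amount": "not-a-number"},
]

def transform(record) -> Optional[dict]:
    """Apply simple cleaning rules; return None for rows that fail validation."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # drop rows whose amount cannot be parsed
    return {
        "customer": record["customer"].strip().title(),
        "order_date": datetime.strptime(record["order_date"], "%Y/%m/%d").date().isoformat(),
        "amount": round(amount, 2),
    }

clean_records = [t for r in raw_records if (t := transform(r)) is not None]
print(clean_records)  # [{'customer': 'Alice', 'order_date': '2024-01-05', 'amount': 120.5}]
```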
Loading: The transformed data is written into the target system. Depending on the use case, there are two
types of loading methods:
Full Load: All data is loaded into the target system, often used during the initial population of the warehouse.
Incremental Load: Only new or updated data is loaded, making this method more efficient for ongoing data
updates.
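The sketch below contrasts the two loading methods in a minimal way, using an in-memory dictionary as a stand-in for the warehouse target table; in a real pipeline the target would be a warehouse bulk-load or merge operation.

```python
# Minimal sketch contrasting full load and incremental load.
# A dictionary keyed by record id stands in for the warehouse target table.

def full_load(target, source_rows):
    """Replace the entire target with the source data (initial population)."""
    target.clear()
    for row in source_rows:
        target[row["id"]] = row

def incremental_load(target, changed_rows):
    """Apply only new or updated rows (ongoing refreshes)."""
    for row in changed_rows:
        target[row["id"]] = row  # insert new ids, overwrite updated ones

warehouse = {}
full_load(warehouse, [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}])
incremental_load(warehouse, [{"id": 2, "amount": 250}, {"id": 3, "amount": 300}])
print(warehouse)  # ids 1-3, with id 2 holding the updated amount
```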