DWDM202

Data warehousing is the process of collecting, storing, and managing large volumes of data from various sources in a centralized repository. It is designed to facilitate efficient querying and analysis, enabling organizations to make informed decisions based on comprehensive data insights, and it gives businesses access to historical data for better decision-making.

Database System: Primarily designed for transactional processing (OLTP - Online Transaction Processing). It is optimized for fast, efficient real-time operations such as inserting, updating, deleting, and retrieving small sets of data. It uses normalized data structures to eliminate redundancy and ensure data integrity, which keeps transactional operations efficient. It typically stores operational data from a single application or system and processes real-time operational data that constantly changes. Used in applications like banking systems, e-commerce, healthcare records, and CRM, where real-time transactions are essential. Example technologies: MySQL, PostgreSQL, Oracle DB.
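As a rough illustration of this OLTP style of work, the sketch below uses Python's built-in sqlite3 module as a stand-in for a system like MySQL or PostgreSQL; the accounts table and values are invented for the example.

```python
import sqlite3

# In-memory database stands in for an OLTP system such as MySQL or PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Alice", 500.0))
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Bob", 300.0))

# A typical transactional operation: move money between two accounts atomically.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE owner = 'Alice'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE owner = 'Bob'")

print(conn.execute("SELECT owner, balance FROM accounts").fetchall())
```

The point of the sketch is the small, frequent, atomic updates against normalized operational data, which is exactly the workload an OLTP system is tuned for.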

Data Warehouse: Designed for analytical processing (OLAP - Online Analytical Processing). It is optimized for complex queries, reporting, and data analysis over large datasets. It uses denormalized data structures (star schema, snowflake schema) to optimize read-heavy operations and speed up complex queries, and it integrates data from multiple sources (databases, external files, APIs) into a unified repository for analysis. It processes historical data for long-term storage and analysis, often in batch mode. Used in business intelligence, reporting, trend analysis, and predictive analytics, where historical and large-scale data analysis is required. Example technologies: Amazon Redshift, Google BigQuery, Snowflake.
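A minimal sketch of the star-schema idea, assuming pandas is available (table names and values are invented): a central fact table of sales measures references small dimension tables, and analytical queries join and aggregate across them.

```python
import pandas as pd

# Dimension tables (descriptive attributes).
dim_product = pd.DataFrame({"product_id": [1, 2], "category": ["Bread", "Butter"]})
dim_date = pd.DataFrame({"date_id": [10, 11], "quarter": ["Q1", "Q2"]})

# Fact table (measures plus foreign keys to the dimensions).
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2],
    "date_id":    [10, 11, 10, 11],
    "revenue":    [120.0, 90.0, 60.0, 75.0],
})

# An OLAP-style query: total revenue per product category per quarter.
report = (fact_sales
          .merge(dim_product, on="product_id")
          .merge(dim_date, on="date_id")
          .groupby(["category", "quarter"])["revenue"].sum())
print(report)
```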

Components of data warehousing- 1) Data sources: Data sources provide raw data to the warehouse. These can include operational databases (OLTP systems), external data sources, spreadsheets, cloud services, and log files. The data may arrive in different formats, such as SQL databases, APIs, or flat files, and usually requires transformation before storage. 2) ETL (Extract, Transform, Load) process: The ETL process is responsible for extracting data from various sources, transforming it into a suitable format, and loading it into the data warehouse. The transformation step includes cleaning, aggregating, and normalizing data to ensure consistency and accuracy. 3) Data staging area: A temporary storage area where raw data is held before transformation. It allows for data validation, cleansing, and preprocessing so that erroneous or redundant data is not loaded into the warehouse (a small validation sketch follows this paragraph). 4) Business Intelligence (BI) tools: BI tools provide a user-friendly interface for querying, reporting, data visualization, and dashboard creation. They help business users generate insights from data warehouse queries without needing technical expertise.
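The staging-area step mentioned above can be pictured with a short sketch, assuming pandas; the field names and validation rules are hypothetical, chosen only to show the idea of checking raw extracts before loading.

```python
import pandas as pd

# Raw extract held in the staging area before transformation (invented records).
staging = pd.DataFrame({
    "customer_id": [101, 102, 102, None],
    "amount": [250.0, -40.0, -40.0, 75.0],
})

# Validation in the staging area: reject rows with missing keys,
# duplicate records, or implausible values before loading into the warehouse.
valid = (staging
         .dropna(subset=["customer_id"])      # missing key -> reject
         .drop_duplicates()                   # redundant row -> reject
         .query("amount >= 0"))               # erroneous value -> reject

print(f"{len(valid)} of {len(staging)} staged rows passed validation")
```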
NEED- 1) Business users: Business users require a data warehouse to view summarized data from the past. Since these users are typically non-technical, the data should be presented to them in a simple, summarized form. 2) Store historical data: A data warehouse is required to store time-variant data from the past, which is then used for many different analyses. 3) Make strategic decisions: Many strategic decisions depend on the data held in the warehouse, so the data warehouse contributes directly to strategic decision-making. 4) Data consistency and quality: By bringing data from different sources to a common place, an organization can enforce uniformity and consistency in its data. 5) Fast response: A data warehouse has to be ready for somewhat unexpected loads and types of queries, which demands a significant degree of flexibility and quick response times.

Data gathering- Interviews: Conduct one-on-one or group discussions with stakeholders; helps gain in-depth insight into user needs. Surveys and questionnaires: Used to collect structured responses from a large audience; effective for identifying common requirements. Workshops: Facilitate real-time collaboration among stakeholders; useful for brainstorming and resolving conflicting needs. Observation (job shadowing): Watching users perform tasks in their actual work environment; helps uncover unstated or implicit requirements. Prototyping: Developing mockups or wireframes to visualize requirements; helps stakeholders validate expectations early. Document analysis: Reviewing existing documents, reports, or system artifacts; useful for legacy system upgrades or process automation.
Characteristics- Subject-Oriented: Data warehouses are organized around key subjects or
business areas (e.g., sales, finance, customer data) rather than individual transactions.
Integrated: Data from different sources (such as databases, CRM systems, ERP systems, etc.)
is cleaned, transformed, and integrated into a consistent format, allowing for
comprehensive analysis. Time-Variant: Data warehouses store historical data, enabling
users to analyze trends over time. This is crucial for forecasting and decision-making. Non-
Volatile: Once data is entered into a data warehouse, it is not typically changed or deleted.
This stability allows for consistent reporting and analysis.

Data Mart: A subset of a data warehouse focused on a specific business unit, department,
or function (e.g., sales, marketing, finance). Usually derived from a single source or a subset
of the data warehouse. Uses a bottom-up approach—individual data marts are built first,
and they may later contribute to a data warehouse. Data marts focus on a specific subject
area, making it easier for users to access and analyze relevant data without having to
navigate the entire data warehouse. Processes more straightforward queries related to
departmental needs, e.g., a sales data mart storing only sales-related data for analysis. Features: faster access, subject-oriented, improved performance.
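As a tiny illustration of carving a subject-oriented mart out of the wider warehouse, the sketch below assumes pandas; the table and column names are invented.

```python
import pandas as pd

# A warehouse-wide fact table covering several departments (invented data).
warehouse = pd.DataFrame({
    "department": ["sales", "finance", "sales", "marketing"],
    "amount": [120.0, 300.0, 80.0, 45.0],
})

# The sales data mart keeps only the subject area that sales analysts need.
sales_mart = warehouse[warehouse["department"] == "sales"].copy()
print(sales_mart)
```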
Bottom Tier (Data Source Layer) – Data Extraction and Storage: The bottom tier of the architecture is the data warehouse database server, typically a relational database system. Back-end tools and utilities feed data into this tier and perform the extract, clean, load, and refresh functions. Examples: relational databases (e.g., Oracle, SQL Server), data warehouses (e.g., Amazon Redshift, Google BigQuery).
Middle Tier (Data Processing Layer) – OLAP and Business Logic: The middle tier holds the OLAP server, which can be implemented in either of two ways: Relational OLAP (ROLAP), an extended relational database management system that maps operations on multidimensional data to standard relational operations, or Multidimensional OLAP (MOLAP), which directly implements multidimensional data and operations. Examples: application servers, ETL tools (e.g., Apache NiFi, Talend).
Top Tier (Presentation Layer) – Reporting and Visualization: This tier is the front-end client layer. It holds the query, reporting, analysis, and data mining tools (e.g., Tableau, Power BI, Looker).
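To make the middle tier concrete, the sketch below (pandas assumed; data invented) shows the kind of multidimensional roll-up an OLAP server performs. In ROLAP this would be translated into relational GROUP BY queries; in MOLAP it would run against a pre-built cube.

```python
import pandas as pd

# Detail-level facts with two dimensions (region, quarter) and one measure.
facts = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales":   [100, 150, 80, 120],
})

# Roll-up across both dimensions, analogous to a small OLAP cube view.
cube = facts.pivot_table(index="region", columns="quarter",
                         values="sales", aggfunc="sum", margins=True)
print(cube)
```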

Metadata is data about the data, or documentation about the information, that is required by the users. In data warehousing, metadata is one of the essential aspects. Metadata includes the following: • the location and descriptions of warehouse systems and components; • names, definitions, structures, and content of data warehouse and end-user views; • identification of authoritative data sources; • integration and transformation rules used to populate the data. It is used for building, maintaining, managing, and using the data warehouse, and it allows users to understand the content and find the data they need. Necessary- First, it acts as the glue that links all parts of the data warehouse. Next, it provides information about the contents and structures to the developers. Finally, it opens the doors to the end-users and makes the contents recognizable in their terms. TYPES- • Operational metadata: contains all of the information about the operational data sources. • Extraction and transformation metadata: contains information about all the data transformations that take place in the data staging area. • End-user metadata: allows end-users to use their business terminology and look for information in the ways in which they usually think of the business.
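One way to picture transformation metadata is as a small structured record; the dictionary below is a hypothetical sketch, not a standard format, and every field name and value is invented for illustration.

```python
# Hypothetical extraction-and-transformation metadata for one warehouse column.
column_metadata = {
    "target_table": "fact_sales",
    "target_column": "revenue_usd",
    "source_system": "orders_oltp",            # authoritative data source
    "source_field": "order_total",
    "transformation_rule": "convert EUR to USD, round to 2 decimals",
    "refresh_schedule": "daily batch at 02:00",
    "business_definition": "Gross revenue per order line in US dollars",
}

for key, value in column_metadata.items():
    print(f"{key}: {value}")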

ETL stands for Extract, Transform, Load, and it is a crucial process in data warehousing and data integration. ETL is used to collect data from various sources, transform it into a suitable format, and load it into a target database or data warehouse, which is essential for organizations that need to consolidate data from multiple sources for analysis and reporting. Extract: The extraction phase involves retrieving data from various source systems; these sources can be databases, flat files, APIs, cloud services, or even web scraping. Transform: The transformation phase involves cleaning, enriching, and converting the extracted data into a format suitable for analysis (data cleaning, enrichment, formatting); this step is crucial for ensuring data quality and consistency. Load: The loading phase involves writing the transformed data into the target database or data warehouse, where the data becomes available for analysis and reporting.
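A compact end-to-end sketch of the three phases follows, assuming pandas is available and using a local SQLite file as a stand-in target warehouse; the file name, columns, and cleaning rules are illustrative.

```python
import sqlite3
import pandas as pd

# Extract: pull raw records from a source (a CSV, API, or OLTP database in practice).
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount":   ["100", "250", "250", None],
    "country":  ["in", "IN", "IN", "us"],
})

# Transform: deduplicate, drop incomplete rows, standardize types and codes.
clean = (raw
         .drop_duplicates()
         .dropna(subset=["amount"])
         .assign(amount=lambda d: d["amount"].astype(float),
                 country=lambda d: d["country"].str.upper()))

# Load: write the transformed data into the target warehouse table.
warehouse = sqlite3.connect("warehouse.db")   # illustrative target
clean.to_sql("fact_orders", warehouse, if_exists="replace", index=False)
print(pd.read_sql("SELECT * FROM fact_orders", warehouse))
```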
Data mining is the process of extracting meaningful patterns, trends, and insights from large
datasets using computational techniques. It transforms raw data into valuable insights by
leveraging algorithms, tools, and domain expertise. It works across structured,
unstructured, and semi-structured data to uncover patterns like associations, classifications,
and anomalies. Data mining automates the discovery of non-obvious patterns that are too
complex for manual detection. For example, retailers use data mining to identify purchasing
trends, while healthcare providers use it to predict disease outbreaks.

Techniques- Classification patterns involve predicting categorical labels based on input features. This process is used in supervised learning, where a model is trained on a labeled dataset. Common algorithms: Decision Trees, Naïve Bayes, Support Vector Machines (SVM). Clustering patterns group similar data points into clusters based on their characteristics. Unlike classification, clustering is an unsupervised learning technique, meaning it does not rely on labeled data. Algorithms: K-Means, Hierarchical Clustering. Regression patterns involve predicting continuous numerical values based on input features. This type of analysis is used to model relationships between variables and forecast future outcomes, for example, predicting house prices based on features like size and location. Algorithms: Linear Regression (Logistic Regression, despite its name, is used for classification). Anomaly Detection (Outlier Detection) identifies unusual patterns or deviations in data, e.g., fraud detection in banking. Algorithms: Isolation Forest, Local Outlier Factor (LOF).
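A brief sketch of two of these techniques with scikit-learn (assumed to be installed), using its bundled Iris dataset so the example runs as-is: a decision tree for supervised classification and K-Means for unsupervised clustering.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification (supervised): learn labels from a labeled training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Clustering (unsupervised): group the same points without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
```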

Association Rules: Association patterns identify relationships between variables in large datasets. They are commonly used in market basket analysis to find items that frequently
co-occur in transactions. For example, if customers who buy bread often also buy butter,
this relationship can be expressed as an association rule: "If bread is purchased, then butter
is likely to be purchased." The most famous algorithm for mining association rules is the
Apriori algorithm, which uses support and confidence metrics to evaluate the strength of
the rules. Concepts: Support: Measures how frequently an itemset appears in the dataset.
Confidence: Measures the likelihood that item B is bought when item A is bought. Lift:
Measures how much more likely A and B occur together compared to random chance.
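The three metrics can be computed directly from transaction counts; the small sketch below uses made-up transactions for the bread-and-butter example.

```python
# Made-up market-basket transactions for the bread/butter example.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
]
n = len(transactions)

count_bread = sum("bread" in t for t in transactions)
count_butter = sum("butter" in t for t in transactions)
count_both = sum({"bread", "butter"} <= t for t in transactions)

support = count_both / n                   # P(bread and butter)
confidence = count_both / count_bread      # P(butter | bread)
lift = confidence / (count_butter / n)     # confidence / P(butter)

print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means bread and butter appear together more often than chance would suggest, which is what makes the rule interesting.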
Data preprocessing is an important step in the data mining process. It refers to the cleaning,
transforming, and integrating of data in order to make it ready for analysis. The goal of data
preprocessing is to improve the quality of the data and to make it more suitable for the
specific data mining task. For example, a dataset with customer records might have missing ages, duplicate entries, or mismatched units (e.g., weight in pounds vs. kilograms); preprocessing resolves these issues to ensure accurate analysis. Data Cleaning: This involves identifying
and correcting errors or inconsistencies in the data, such as missing values, outliers, and
duplicates. Various techniques can be used for data cleaning, such as imputation, removal,
and transformation. Handling Missing Values – Removing or filling missing data using
methods like mean, median, or interpolation. Removing Duplicates – Eliminating redundant
records to ensure data integrity. Data Integration: This involves combining data from
multiple sources to create a unified dataset. Data integration can be challenging as it
requires handling data with different formats, structures, and semantics. Techniques such
as record linkage and data fusion can be used for data integration. Combining Data from
Multiple Sources – Merging data from different databases, files, or APIs. Handling Schema
Conflicts – Resolving differences in naming conventions, data types, or units.
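A short pandas sketch of the cleaning and integration steps just described; the column names, values, and the mean-imputation choice are illustrative.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "age": [34, None, None, 29],
})

# Data cleaning: fill missing ages with the mean and drop duplicate records.
customers["age"] = customers["age"].fillna(customers["age"].mean())
customers = customers.drop_duplicates()

# Data integration: merge with a second source, resolving a naming conflict.
purchases = pd.DataFrame({"cust_id": [1, 2, 3], "total_spent": [120.0, 80.0, 45.0]})
purchases = purchases.rename(columns={"cust_id": "customer_id"})  # schema conflict fix
combined = customers.merge(purchases, on="customer_id", how="left")
print(combined)
```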

Data Reduction: This involves reducing the size of the dataset while preserving the
important information. It can be achieved through techniques such as feature selection and
feature extraction. Feature Selection – Identifying and keeping only the most relevant
features. Sampling – Selecting a representative subset of data to reduce computational
complexity. Data Transformation: This involves converting the data into a suitable format
for analysis. Common techniques used in data transformation include normalization,
standardization, and discretization. Normalization – Scaling numerical data to a common
range (e.g., 0-1) to prevent bias. Standardization – Transforming data to have zero mean and
unit variance. Data discretization is the process of converting continuous data into discrete
categories or intervals. This technique is particularly useful in data mining and machine learning, as
many algorithms perform better with categorical data. Discretization helps simplify the analysis,
improve interpretability, and reduce the impact of noise in the data. Converting Continuous Data to
Categorical – Grouping numerical values into bins (e.g., age groups: 0–18, 19–35, 36–60).
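The transformation and discretization techniques above can be sketched in a few lines, assuming pandas and scikit-learn are available; the values are illustrative and the bins mirror the age groups mentioned in the text.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = pd.DataFrame({"age": [5, 17, 25, 40, 58, 72]})

# Normalization: rescale to the 0-1 range.
ages["age_norm"] = MinMaxScaler().fit_transform(ages[["age"]]).ravel()

# Standardization: zero mean and unit variance.
ages["age_std"] = StandardScaler().fit_transform(ages[["age"]]).ravel()

# Discretization: convert the continuous ages into categorical bins.
ages["age_group"] = pd.cut(ages["age"], bins=[0, 18, 35, 60, 120],
                           labels=["0-18", "19-35", "36-60", "60+"])
print(ages)
```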
Issues- Privacy and Ethics: Data mining often involves analyzing sensitive information, such
as personal health records, financial data, or user behavior. This raises significant privacy
and ethical concerns. Scalability: As the volume of data generated continues to grow
exponentially, data mining faces scalability challenges. Processing massive datasets, such as
those generated by social media platforms, IoT devices, or e-commerce transactions,
requires substantial computational resources. Scalability and performance concerns include handling large datasets (processing petabytes of data requires high computational power) and real-time data mining (extracting insights from streaming data efficiently). Overfitting: Overfitting occurs when a model learns the training data too well, capturing noise and outliers rather than the underlying patterns. As a result, while the model may perform exceptionally well on the training dataset, it fails to generalize to new, unseen data, leading to poor performance in real-world applications (see the sketch after this paragraph). Interpretability: As data mining models become more complex, particularly with the rise of deep learning, interpretability becomes a significant challenge.
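Overfitting is easy to demonstrate. The sketch below (scikit-learn assumed; the synthetic dataset parameters are arbitrary) compares an unconstrained decision tree with a depth-limited one on noisy data: the deep tree scores near-perfectly on the training set but worse on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, so memorizing the training set does not generalize.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("unconstrained tree", DecisionTreeClassifier(random_state=0)),
                    ("depth-limited tree", DecisionTreeClassifier(max_depth=3, random_state=0))]:
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
```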

Data Mining applications- Market Basket Analysis: Helps businesses understand which products are frequently bought together. Ex: A supermarket finds that customers who buy diapers also buy baby wipes. Disease Prediction and Diagnosis: Machine learning models analyze patient data to predict diseases like cancer or diabetes. Ex: AI-assisted radiology for detecting tumors in X-rays. Risk Management: Predicts the likelihood of loan defaults or stock market crashes. Ex: Banks use credit scoring models to assess loan applicants. Social Media and Internet (Sentiment Analysis): Analyzes user opinions from social media posts, reviews, and comments. Ex: Twitter sentiment analysis to gauge public opinion on political issues. Transportation: A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services, and a large consumer merchandise organization can apply data mining to improve its sales process to retailers.
