DWDM

A data warehouse is a centralized repository of integrated, structured, and historical
data that is specifically designed to support business intelligence (BI) and analytical
reporting.

It serves as a storage solution for collecting, organizing, and managing large volumes
of data from various sources, making it easier to analyze and gain insights for
decision-making purposes.

Data warehouses are used to facilitate efficient querying, analysis, and reporting,
providing a comprehensive view of an organization's data across different time
periods and business functions.

Key characteristics and features of a data warehouse include:

1. Integration: Data warehouses consolidate data from disparate sources, such
as operational databases, spreadsheets, external systems, and more. This
integration helps ensure that data from different departments and systems
can be analyzed together.
2. Subject-Oriented: Data warehouses are organized around specific subjects or
domains of interest to the business, such as sales, inventory, customer
behavior, finance, etc. This subject-oriented approach simplifies querying and
analysis for different business functions.
3. Time-Variant: Data warehouses store historical data over different time
periods, allowing for the analysis of trends, changes, and historical
performance. This time dimension is crucial for making informed decisions.
4. Non-Volatile: Once data is loaded into a data warehouse, it is rarely updated
or changed. Instead, new data is added over time, maintaining data integrity
and providing a historical record of changes.
5. Data Modeling: Data warehouses use specialized data modeling techniques,
such as the star schema or snowflake schema, to structure data for optimal
querying and reporting performance. These models involve fact tables
(holding measures) and dimension tables (holding attributes); a minimal
sketch of a star schema follows this list.
6. ETL Processes: Data Extraction, Transformation, and Loading (ETL) processes
are employed to extract data from source systems, transform it to fit the data
warehouse schema, and then load it into the warehouse.
7. Support for Decision-Making: Data warehouses provide decision-makers
with a unified and consistent source of data for strategic, tactical, and
operational decisions. They enable organizations to make informed choices
based on historical and current data.
8. Scalability: Data warehouses are designed to handle large volumes of data
and provide scalable performance as the data grows.
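
A minimal sketch of the star-schema idea, using pandas with hypothetical table and column names (a sales fact table plus product and date dimensions); it only illustrates how numeric measures in a fact table join to descriptive attributes in dimension tables for analysis.

```python
import pandas as pd

# Hypothetical dimension tables (descriptive attributes).
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Laptop", "Phone"],
    "category": ["Computers", "Mobiles"],
})
dim_date = pd.DataFrame({
    "date_id": [20240101, 20240102],
    "month": ["2024-01", "2024-01"],
})

# Hypothetical fact table (numeric measures plus foreign keys to the dimensions).
fact_sales = pd.DataFrame({
    "date_id": [20240101, 20240101, 20240102],
    "product_id": [1, 2, 1],
    "units_sold": [3, 5, 2],
    "revenue": [3000.0, 2500.0, 2000.0],
})

# A typical analytical query: join the fact table to its dimensions
# and aggregate revenue by product category and month.
report = (
    fact_sales
    .merge(dim_product, on="product_id")
    .merge(dim_date, on="date_id")
    .groupby(["category", "month"], as_index=False)[["units_sold", "revenue"]]
    .sum()
)
print(report)
```
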
Data warehousing is the process of collecting and managing data from multiple sources to
provide meaningful insights and support informed decision-making. It involves integrating
data from different sources and creating a centralized repository of data that can be
analyzed to extract insights.
By consolidating data into a single repository, businesses can gain insights into their
operations, identify trends, and make data-driven decisions.
Moreover, a data warehouse provides historical data that can be used to analyze trends and
patterns over time.
The ability to perform time-series analysis is critical for businesses to identify long-term
trends, plan for future needs, and forecast demand.
Data warehouses were used in large corporations and government agencies to store
historical data for reporting and decision-making purposes.

How Does a Data Warehouse Work?

A data warehouse serves as a central repository for information that arrives from one or more
data sources, including structured, semi-structured, and unstructured data.

The data is processed, transformed, and ingested into the data warehouse, where users can
access the processed data through various tools, such as Business Intelligence software, SQL
clients, and spreadsheets.

Data warehousing makes data mining possible by enabling organizations to identify patterns
and relationships within the data using advanced statistical and machine learning algorithms.

A data warehouse thus provides organizations with a centralized platform to store and manage
their data, making it easier to access and analyze large volumes of information for reporting,
analysis, and decision-making purposes.
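
A minimal ETL sketch, assuming a hypothetical CSV export from an operational system and an SQLite file standing in for the warehouse; real deployments use dedicated ETL tools and warehouse platforms, but the extract-transform-load flow is the same.

```python
import sqlite3
import pandas as pd

# Extract: read a hypothetical export from an operational source system.
orders = pd.read_csv("orders_export.csv")  # assumed columns: order_id, order_date, amount, region

# Transform: clean types and standardize values to fit the warehouse schema.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["region"] = orders["region"].str.strip().str.title()
orders = orders.dropna(subset=["order_id", "amount"])

# Load: append the transformed rows into the warehouse table.
warehouse = sqlite3.connect("warehouse.db")  # SQLite stands in for the warehouse here
orders.to_sql("fact_orders", warehouse, if_exists="append", index=False)
warehouse.close()
```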

Using Data Warehouse Information

Once data is collected and stored in a data warehouse, it can be used for various purposes.
Some of the most common uses of data warehouse information include -

• Business Intelligence - Data warehouses provide a centralized platform for collecting
and analyzing data, enabling organizations to gain insights into their business
operations. Business Intelligence (BI) tools can be used to query the data warehouse
and generate reports, dashboards, and visualizations that help businesses make
informed decisions (a small query sketch follows this list).
• Decision-Making - The insights gained from data warehouse analysis can help
organizations make better decisions. By identifying patterns and trends within the
data, decision-makers can develop strategies to improve business outcomes, such as
increasing revenue, reducing costs, or improving customer satisfaction.
• Marketing - Data warehouses can be used to collect and analyze customer data,
allowing organizations to understand their target audience better. Businesses can
develop more effective marketing strategies by analyzing customer behavior, such as
personalized marketing campaigns or targeted advertising.
• Performance Management - Data warehouse information can be used to track and
monitor key performance indicators (KPIs) and measure the success of business
initiatives. By tracking KPIs over time, organizations can identify areas where
improvements are needed and develop plans to address those areas.
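
A minimal sketch of querying the warehouse for reporting, reusing the hypothetical fact_orders table and SQLite stand-in from the ETL sketch above; the SQL is the kind of aggregate query a BI tool would issue to build a monthly revenue dashboard or a KPI trend chart.

```python
import sqlite3

warehouse = sqlite3.connect("warehouse.db")

# A BI-style aggregate query: monthly revenue per region.
query = """
    SELECT strftime('%Y-%m', order_date) AS month,
           region,
           SUM(amount) AS revenue,
           COUNT(*)    AS order_count
    FROM fact_orders
    GROUP BY month, region
    ORDER BY month, region
"""
for month, region, revenue, order_count in warehouse.execute(query):
    print(month, region, revenue, order_count)

warehouse.close()
```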

Types of Data Warehouse (DWH)

There are three types of Data Warehouses (DWH) -

• Enterprise Data Warehouse (EDW) - The enterprise data warehouse (EDW) is a
comprehensive data repository that stores all of an organization's historical and
current data from multiple sources, allowing for centralized data management. The
EDW consolidates data from across the organization, creating a single source of truth
that supports business intelligence, analytics, and decision-making. An EDW can store
massive amounts of data and can handle complex data modeling and data integration
tasks.
• Operational Data Store (ODS) - An operational data store (ODS) is a database that
supports operational and transactional systems by providing real-time data integration.
The ODS provides a platform for integrating, cleaning, and consolidating operational
data from various sources in real time. It is designed for operational reporting and
short-term analysis of business processes, such as order fulfillment, inventory control,
and customer service.
• Data Mart - A data mart is a smaller, more focused version of a data warehouse that
supports a specific business unit, department, or function within an organization. It is
used to serve the needs of a particular group of users and contains a subset of data
from the EDW or other data sources. Data marts can be designed for different
business areas, such as sales, marketing, finance, or HR, and can be either
independent or integrated with an EDW.

Functions of Data Warehouse Tools and Utilities

Data warehouse tools and utilities are designed to perform various functions that help manage
and analyze data stored in a data warehouse. Some of the key functions of data warehouse
tools and utilities are -

• Data Extraction - This involves extracting data from various sources, such as
transactional databases, operational systems, and external data sources. The data is
then cleaned, transformed, and loaded into the data warehouse.
• Data Cleaning - This involves identifying and correcting errors or inconsistencies in
the data. Data cleaning ensures that the data is accurate and reliable for analysis.
• Data Transformation - This involves converting the data into a format that is
suitable for analysis. Data transformation may involve merging data from multiple
sources, reformatting data, or creating new variables.
• Data Integration - This involves integrating data from multiple sources into a single
data warehouse. This allows for a more comprehensive view of an organization's data,
which can improve decision-making.
• Data Storage - Data warehouse tools and utilities provide various storage options,
such as relational databases, columnar databases, or cloud-based storage. The choice
of storage depends on the size and complexity of the data and the organization's
needs.
• Data Analysis - This involves using various tools and techniques to analyze data
stored in the data warehouse. Data analysis can help identify patterns, trends, and
insights that can inform business decisions.
• Data Refresh - Data refresh refers to updating the warehouse data with the latest
information from the source systems. This process is important because the data in the
warehouse needs to be as up-to-date as possible to support accurate and timely
decision-making. Data refresh can be performed periodically or in real time,
depending on the organization's needs and data availability (a small incremental-refresh
sketch follows this list).
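
A minimal sketch of a periodic incremental refresh, again using the hypothetical fact_orders table and SQLite stand-in from the earlier sketches; it pulls only rows newer than the latest date already in the warehouse, which is one common way to keep warehouse data current without reloading everything.

```python
import sqlite3
import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# Find the most recent data already loaded into the warehouse.
(last_loaded,) = warehouse.execute(
    "SELECT COALESCE(MAX(order_date), '1970-01-01') FROM fact_orders"
).fetchone()

# Extract only the new rows from the (hypothetical) source export.
source = pd.read_csv("orders_export.csv", parse_dates=["order_date"])
new_rows = source[source["order_date"] > pd.Timestamp(last_loaded)]

# Load the delta; existing history stays untouched (non-volatility).
new_rows.to_sql("fact_orders", warehouse, if_exists="append", index=False)
warehouse.close()
print(f"Refreshed warehouse with {len(new_rows)} new rows")
```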

Why We Need a Data Warehouse

A few of the most common reasons to explain why we need a data warehouse include the
following -

• Centralized data management - Data warehouses provide a centralized location for
storing and managing data from multiple sources. By consolidating data into a single
location, organizations can better manage their data and reduce the complexity of
querying multiple data sources.
• Improved data quality - Data warehouses allow organizations to clean, transform,
and integrate data from different sources. This process helps to improve data quality
by eliminating errors and inconsistencies and standardizing data formats.
• Support for decision-making - Data warehouses provide a foundation for decision-
making by providing a historical view of the organization's data. Organizations can
make more informed decisions about their business operations by analyzing trends
and patterns in the data.
• Faster access to data - Data warehouses are optimized for querying and analysis,
allowing faster access to data than traditional databases. This speed is critical for
decision-making and can help organizations stay ahead of their competition.
• Cost savings - By consolidating data into a single location, organizations can reduce
the need for expensive hardware and software licenses. Additionally, organizations
can reduce costs associated with errors and inefficiencies by improving data quality
and supporting better decision-making.

Advantages & Disadvantages of Data Warehouse

• It provides a central repository for critical data, making it easy for business users to
access information from various sources.
• By providing a consolidated view of data from different sources, data
warehouses enable organizations to make informed decisions based on accurate and
consistent data.
• Data warehouses provide historical data that can be used to identify trends and
patterns over time, leading to better decision-making and planning.
• Saves time by allowing users to access critical data from multiple sources in a single
place.
• Data warehouses can easily scale to meet the needs of growing organizations,
allowing them to store and analyze large volumes of data.

Data warehouses also have several disadvantages, as shown below -

• Implementing and maintaining a data warehouse can be expensive, including
hardware, software, and personnel costs.
• Not suitable for unstructured data.
• Not suitable for real-time or near-real-time data processing.
• Integrating data from multiple sources into a single data warehouse can be complex
and time-consuming.
• Data in the warehouse may become outdated quickly.

Conclusion

• Data warehousing is a process that involves collecting, storing, and managing data
from multiple sources to enable better decision-making.
• The advantage of a data warehouse is its ability to provide a centralized and optimized
repository for efficient querying, analysis, and reporting on integrated data from
multiple heterogeneous sources.
• There are different types of data warehouses, including enterprise data warehouses,
operational data stores, and data marts, each with its own benefits and use cases.

Data mining is the process of discovering patterns and extracting knowledge from large
amounts of data. It involves the use of various techniques such as classification, regression,
clustering, association rule mining, and anomaly detection, among others. These data
mining techniques are used in various fields, including business, finance, healthcare, and
science, to gain insights and make informed decisions.

Introduction

Data mining techniques are used to extract useful knowledge and insights from large datasets.
A good data mining technique should have the following characteristics -

• Scalability -
The technique should be able to handle large amounts of data efficiently.
• Robustness -
The technique should be able to handle noisy or incomplete data without
compromising the quality of the results.
• Accuracy -
The technique should produce accurate results with a low error rate.
• Interpretability -
The technique should produce results that domain experts can easily understand and
interpret.

Data mining techniques are important because they enable organizations to discover hidden
patterns, relationships, and insights in their data. These insights can be used to make
informed decisions, improve business processes, and identify new opportunities. Data mining
techniques are widely used in fields such as marketing, finance, healthcare, and scientific
research.

Knowledge Discovery From Data

The process of Knowledge Discovery From Data (KDD) involves several steps, which are
as follows (a compact end-to-end sketch follows the list) -

• Data cleaning -
The first step involves preprocessing and cleaning of data to remove noise, missing
values, or inconsistencies.
• Data integration -
Data from multiple sources are combined into a single dataset.
• Data selection -
The relevant data is selected from the dataset for further analysis.
• Data transformation -
The selected data is transformed into a format suitable for mining.
• Data mining -
Data mining techniques are applied to extract useful patterns or knowledge from the
data.
• Pattern evaluation -
The patterns discovered by data mining are evaluated based on criteria such as their
significance and usefulness.
• Knowledge representation -
The results are represented in a form that is easily understandable by users.
• Knowledge utilization -
The knowledge gained from the data is used for decision-making or to improve
business processes.
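
A compact sketch of the KDD steps on a toy dataset, using pandas and scikit-learn (both assumed to be available); cleaning, selection, transformation, mining, and evaluation are each reduced to one or two lines just to show where they sit in the flow.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data cleaning / integration: assume a single, already-integrated toy dataset.
data = pd.DataFrame({
    "income": [30, 45, 60, 25, 80, 52, 40, 75],
    "credit_score": [600, 700, 720, 550, 760, 690, 640, 740],
    "defaulted": [1, 0, 0, 1, 0, 0, 1, 0],
})
data = data.dropna()  # cleaning: drop records with missing values

# Data selection: keep only the attributes relevant to the mining task.
X = data[["income", "credit_score"]]
y = data["defaulted"]

# Data transformation: scale features into a comparable range.
X_scaled = StandardScaler().fit_transform(X)

# Data mining: fit a simple classification model.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pattern evaluation: measure how useful the discovered model is.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```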

Data Mining Techniques

Let’s explore the most commonly used data mining techniques below -

Classification

Classification is a supervised learning technique in data mining that assigns predefined
classes to objects or instances based on their attributes or features. It involves building a
model from a set of training data that consists of labeled examples, where the class label of
each example is known. The model is then used to classify new, unseen data based on their
attributes.

For example, consider a bank that wants to identify customers who are likely to default on
their loans. The bank can use classification to build a model that predicts the default risk of a
customer based on their credit score, income, and other relevant factors. The model can then
be used to classify new loan applicants as low or high-risk.

Classification algorithms used in data mining include decision trees, naive Bayes, support
vector machines (SVM), and logistic regression, among others. These algorithms differ in
their assumptions, strengths, and weaknesses and are chosen based on the characteristics of
the data and the problem being solved.
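
A minimal sketch of the loan-default example above using scikit-learn's logistic regression; the feature names, training records, and applicants are all made up for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled training data: [credit_score, annual_income_in_thousands]
X_train = [[580, 25], [620, 32], [700, 55], [750, 80], [640, 40], [560, 22]]
y_train = [1, 1, 0, 0, 0, 1]  # 1 = defaulted, 0 = repaid

model = LogisticRegression().fit(X_train, y_train)

# Classify new loan applicants as low or high risk.
applicants = [[710, 60], [590, 28]]
for features, label in zip(applicants, model.predict(applicants)):
    print(features, "high risk" if label == 1 else "low risk")
```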

Clustering

Clustering is an unsupervised learning technique in data mining that involves grouping
similar objects or instances together based on their attributes or features. Unlike
classification, clustering does not involve predefined classes but rather groups objects based
on their similarity. The objective of clustering is to discover inherent patterns and structures
in the data that may not be immediately apparent.

For example, consider a retailer that wants to segment its customers based on their shopping
behavior. The retailer can use clustering to group customers with similar purchasing patterns,
such as those who buy high-end products or shop frequently. This information can be used to
tailor marketing strategies and promotions to each segment.

Clustering algorithms used in data mining include k-means, hierarchical clustering, and
density-based clustering, among others. These algorithms differ in their assumptions and how
they define similarity or distance between objects.
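
A minimal sketch of the customer-segmentation example using scikit-learn's k-means; the two features (average basket value and visits per month) and the choice of three segments are illustrative assumptions.

```python
from sklearn.cluster import KMeans

# Hypothetical customers: [average_basket_value, visits_per_month]
customers = [[20, 2], [25, 3], [200, 1], [180, 2], [22, 8], [30, 10]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)

# Each customer is assigned to a segment; marketing can target each group differently.
for customer, segment in zip(customers, kmeans.labels_):
    print(customer, "-> segment", segment)
print("Segment centres:", kmeans.cluster_centers_)
```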

Regression

Regression is a supervised learning technique in data mining that involves building a model
to predict a continuous or numerical output variable based on one or more input variables or
predictors. Regression aims to establish a functional relationship between the input and
output variables.

For example, consider a real estate agency that wants to predict the price of a house based on
its features, such as size, location, and the number of bedrooms. The agency can use
regression to build a model that predicts the price of a house based on these features. The
model can then be used to estimate the price of new houses or to identify undervalued
properties.

Regression algorithms used in data mining include linear regression, polynomial regression,
and decision tree regression, among others. These algorithms differ in their assumptions and
how they model the relationship between the input and output variables.
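
A minimal sketch of the house-price example with scikit-learn's linear regression; the features (size in square metres, number of bedrooms) and prices are invented for illustration.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size_sqm, bedrooms] -> price
X_train = [[50, 1], [70, 2], [90, 3], [120, 3], [150, 4]]
y_train = [150_000, 210_000, 270_000, 340_000, 420_000]

model = LinearRegression().fit(X_train, y_train)

# Estimate the price of a new listing.
new_house = [[100, 3]]
print("Predicted price:", round(model.predict(new_house)[0]))
```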

Association Rules Mining

Association rule mining is an unsupervised learning technique in data mining that involves
discovering relationships or associations between variables in a dataset. It aims to find
patterns of co-occurrence or correlation among variables frequently occurring together in the
data.

For example, consider a retailer that wants to increase its sales by offering promotions or
discounts to customers who buy certain products. The retailer can use association rule mining
to identify which products are often bought together, such as bread and butter or shampoo
and conditioner. This information can be used to create targeted promotions and cross-selling
strategies.

Association rule mining algorithms used in data mining include Apriori, FP-Growth, and
Eclat, among others. These algorithms differ in their approach to identifying frequent
itemsets or sets of variables that occur together.
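
A minimal sketch of market-basket analysis using the Apriori implementation in the mlxtend library (an assumed extra dependency, installed with pip install mlxtend); the transactions and thresholds are toy values.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical point-of-sale transactions.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["shampoo", "conditioner"],
    ["bread", "milk"],
    ["shampoo", "conditioner", "soap"],
]

# One-hot encode the transactions into a boolean item matrix.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)

# Find itemsets bought together frequently, then derive rules such as
# {bread} -> {butter} with their support and confidence.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```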

Advantages and Disadvantages

Data mining techniques have several advantages, which are as follows -

• Identification of patterns and trends -
Data mining techniques help identify patterns and trends in large datasets, providing
valuable insights for decision-making.
• Automated processing -
Data mining techniques automate the process of analyzing data, reducing the time
and effort required for manual analysis.
• Prediction and forecasting -
Data mining techniques can be used to build predictive models that help forecast
future trends and events.
• Improved decision-making -
Data mining techniques provide valuable information and insights that help make
informed decisions.
• Increased efficiency -
Data mining techniques help identify areas of inefficiency or waste, allowing
organizations to streamline their operations and improve efficiency.
• Personalization -
Data mining techniques can help personalize customer recommendations and
experiences based on their preferences and behaviors.

Data mining techniques also have several disadvantages, as mentioned below -

• Privacy concerns -
Data mining techniques can be used to extract sensitive information about
individuals, which can raise privacy concerns.
• Reliance on data quality -
Data mining techniques rely on the quality and accuracy of the data, and inaccurate
or incomplete data can lead to incorrect conclusions.
• Complexity -
Data mining techniques can be complex and require specialized skills and knowledge,
making it difficult for non-experts to use them effectively.
• Cost -
Implementing data mining techniques can be expensive, requiring specialized
hardware, software, and personnel.
• Overfitting -
Data mining techniques can sometimes lead to overfitting, where the model is too
closely fitted to the training data and does not generalize well to new data (a small
hold-out check is sketched after this list).
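
A minimal sketch of the standard hold-out check for overfitting, using scikit-learn and a synthetic toy dataset; a large gap between training and test accuracy is the usual warning sign that a model is fitted too closely to its training data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset; max_depth=None lets the tree memorise the training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

deep_tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

train_acc = deep_tree.score(X_train, y_train)
test_acc = deep_tree.score(X_test, y_test)
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
# A perfect training score with a noticeably lower test score suggests overfitting.
```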

Conclusion

• Data mining techniques are a set of methods and tools used to extract knowledge
and insights from large and complex datasets.
• These techniques can help discover patterns, relationships, and trends that may not
be apparent otherwise and improve decision-making and efficiency.
• A few of the most commonly used data mining techniques include classification,
regression, association rules mining, clustering, outlier detection, etc.
• Data mining techniques are widely used in various fields, including finance,
healthcare, marketing, and more, and continue to evolve and improve with advances
in technology and data science.

OLTP vs OLAP

The two terms OLTP and OLAP look quite similar, but they relate to two different types of
systems.

Online transaction processing (OLTP) is a system that captures, stores, and processes data
in real time.

In online analytical processing (OLAP), complex queries are used to analyze historical data
collected from OLTP systems.

What is OLAP?

OLAP is an abbreviated form of Online Analytical Processing. OLAP consists of software or
tools that are used for analytics and getting insights from databases for making business
decisions.

OLAP software provides an environment for analyzing data from multiple databases at one
time. OLAP systems can make use of transactions from the databases of OLTP systems and
apply queries on that data for analytical purposes, data mining, or BI (business intelligence)
projects. The major factor that determines the performance of these systems is the response
time taken to analyze the database.

OLAP databases help decision-makers make decisions based on the analytical data provided
by OLAP systems.

Examples of OLAP systems :

• Data Warehouses
• Movie recommendation system
• Music recommendation systems
• Marketing trends analytical system

What is OLTP?

OLTP is an abbreviated form of Online Transaction Processing. OLTP systems provide
transaction-oriented applications. Transaction data is gathered and maintained in a database
by an OLTP system, with each transaction comprising individual database entries made up of
numerous fields or columns. OLTP systems are used by organizations for day-to-day
transactions.

Examples of OLTP systems :

• Banking software.
• Online ticket booking software.
• Messaging.
• Data Entry.
• E-Commerce purchasing and order management.
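
Before the side-by-side comparison, here is a minimal sketch contrasting the two workloads using Python's built-in sqlite3 module and a hypothetical orders table: the OLTP side is a short write transaction of the kind a booking or e-commerce system issues constantly, while the OLAP side is a read-heavy aggregate over accumulated history.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)")

# OLTP-style work: short, frequent write transactions from day-to-day operations.
with db:  # commits the transaction, or rolls it back on error
    db.execute("INSERT INTO orders (customer, amount, order_date) VALUES (?, ?, ?)",
               ("alice", 49.90, "2024-01-15"))

# OLAP-style work: a read-heavy analytical query over accumulated history.
monthly_revenue = db.execute("""
    SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
""").fetchall()
print(monthly_revenue)
```
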
Difference between OLAP and OLTP

• Definition - OLAP is well known as an online database query management system, while
OLTP is well known as an online database modifying system.
• Data source - OLAP consists of historical data from various databases; OLTP consists of
only current operational data.
• Method used - OLAP makes use of a data warehouse; OLTP makes use of a standard
database management system (DBMS).
• Application - OLAP is subject-oriented and is used for data mining, analytics,
decision-making, etc.; OLTP is application-oriented and is used for business tasks.
• Normalization - In an OLAP database, tables are not normalized; in an OLTP database,
tables are normalized (3NF).
• Usage of data - In OLAP, the data is used in planning, problem-solving, and
decision-making; in OLTP, the data is used to perform day-to-day fundamental operations.
• Task - OLAP provides a multi-dimensional view of different business tasks; OLTP reveals a
snapshot of present business tasks.
• Purpose - OLAP serves the purpose of extracting information for analysis and
decision-making; OLTP serves the purpose of inserting, updating, and deleting information
in the database.
• Volume of data - OLAP stores a large amount of data, typically in TB or PB; in OLTP the
size of the data is relatively small (MB or GB), as historical data is archived.
• Queries - OLAP queries are relatively slow because the amount of data involved is large,
and they may take hours; OLTP queries are very fast, as they operate on only a small
fraction (around 5%) of the data.
• Update - The OLAP database is not often updated, so data integrity is unaffected; in an
OLTP database, the data integrity constraint must be maintained.
• Processing time - In OLAP, the processing of complex queries can take a lengthy time;
OLTP is comparatively fast in processing because of its simple and straightforward queries.
• Operations - OLAP involves mostly read and only rarely write operations; OLTP involves
both read and write operations.
• Updates - In OLAP, data is refreshed on a regular basis through lengthy, scheduled batch
operations; in OLTP, the user initiates data updates, which are brief and quick.
• Database design - OLAP design focuses on the subject; OLTP design focuses on the
application.
• Productivity - OLAP improves the efficiency of business analysts; OLTP enhances the end
user's productivity.

Conclusion

• The two terms OLTP and OLAP look quite similar, but they relate to two different
types of systems.
• OLAP is an abbreviated form of Online Analytical Processing. OLAP consists of
software or tools that are used for analytics and getting insights from databases for
making business decisions.
• OLTP is an abbreviated form of Online Transaction Processing. OLTP systems
provide transaction-oriented applications.
• There are many differences between OLTP and OLAP systems based on the parameters
discussed in this article.
• Both OLTP and OLAP systems have benefits and drawbacks, as discussed above, which
help in deciding which system to use and when.
