
Unit-1 DM

Data mining

Uploaded by

tejabikkili
Copyright
© © All Rights Reserved

UNIT-1: Overview and Concepts of Data Warehousing and Business Intelligence

Why reporting and analysing data – Raw data to valuable information: lifecycle
of data – What is Business Intelligence – BI and DW in today’s perspective –
What is data warehousing – The building blocks: defining features – Data
warehouses and data marts – Overview of the components – Metadata
warehousing – Trends in data warehousing.

Why Reporting and Analysing Data:-

* Data reporting: Gathering data into one place and presenting it in visual
representations.

* Data analysis: Interpreting your data and giving it context.

6 Differences Between Data Analysis and Reporting

Respondents mentioned these six main differences between data analysis and
reporting:

1. Required Skills

2. Order of Operations

3. Time Needed to Implement

4. Ease of Automation

5. Impact on Strategy

6. Data Context

1. Required Skills

Many of the professionals we consulted consider data analysis a higher-skill task
than data reporting. This isn’t to say that reporting matters less because of the
difference in skill level; its importance comes up repeatedly in the points below.
But data analysis takes more advanced knowledge and practice to master than
reporting does.

2. Order of Operations

Since data reporting organizes and visualizes data, it must happen before you can
perform data analysis. Even if you decide to complete only your data analysis in-
house, you’ll need to have some form of data to work with, such as an external
report. Stevens explains how analysis follows reporting. “Reports don’t give
conclusions, but a proper analysis of the question raised in the reports will. Data
analysis provides answers to the why, and also gives the way forward.”

3. Time Needed to Implement

After you get a finished data report or analysis, someone will need to implement that
information. Depending on your business structure, it can take longer to implement
one than the other. So, when you consider the time you’ll need to take performing
and implementing data analysis and reporting, keep your industry, work style and
team structure in mind.

4. Ease of Automation

Since data analysis requires a human touch, it’s harder to automate than data
reporting. On the other hand, analysis’s need for human involvement lends it its
strengths. A “close reading” of data, meaning a careful and interpretive one,
transforms cold information into warm insight and actionable intelligence.

5. Impact on Strategy

Most businesses collect data to inform some kind of strategy, whether for customer
retention, finances, lead generation, or another aspect of the business. Most of the
survey respondents who contributed their opinions agree that analysis has a much
bigger role to play in building strategies than reporting. Data reports give you a look
into your organization’s current performance. Meanwhile, analysis turns that data
into actionable insights to guide your future actions.

6. Data Context

While reporting provides data without context so you can draw your own
conclusions, analysis makes those conclusions for you to deliver context.
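
As a rough illustration of the distinction above, the following Python sketch
uses made-up monthly sales figures. The reporting step only gathers and presents
the numbers, while the analysis step adds context by computing the change from
month to month. The data and the function names (build_report, analyse) are
hypothetical and chosen only for this example.

```python
# Hypothetical monthly sales figures (illustrative data only).
sales = {"Jan": 120, "Feb": 135, "Mar": 90}

def build_report(data):
    """Reporting: gather the figures and present them, with no interpretation."""
    return [f"{month}: {amount}" for month, amount in data.items()]

def analyse(data):
    """Analysis: add context by computing the month-over-month change in percent."""
    months = list(data)
    changes = {}
    for prev, curr in zip(months, months[1:]):
        changes[curr] = round((data[curr] - data[prev]) / data[prev] * 100, 1)
    return changes

report = build_report(sales)
changes = analyse(sales)
```

The report lists what happened; the percentage changes are the starting point
for asking why it happened.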

Raw data to valuable information Lifecycle of Data


Introduction
Raw data, often referred to as source or primary data, is data that has not yet been
processed, coded, formatted, or analyzed. While raw data is a valuable resource,
it can be challenging to work with or understand as it’s visually cluttered and can
lack cohesion. Organizations can collect and use raw data to learn more about
customers, sales, the success of marketing campaigns, and other useful targets,
but first they need to structure and organize the data into a form that’s easier to
read and visualize.

How Is Raw Data Used?

Raw data is data collected from one or multiple sources that remains in its
unaltered initial state. At this point, the data might contain human, machine, or
instrumental errors, depending on the collection method. Collecting raw data is
the first step toward gaining a more thorough understanding of a demographic,
system, concept, or environment.

Raw Data Collection Steps

Defining Goals

Define the information you want to extract to lay the groundwork for your raw
data-gathering goals. For example, if the desired data is user-base and customer
information, online and in-person surveys focused on a specific age and
geographical demographic can be used to gather it.

Other types of raw data may require advance planning. For instance, collecting
data from log records would require having a monitoring system in place for
anywhere from a few weeks to a year to collect data before being able to pull it.

Choosing A Collection Method

Choosing the appropriate raw data collection method can reduce the percentage
of human or machine errors you’d have to scrub out when cleaning a raw
database. Generally, electronic collecting methods tend to result in lower error
rates—manual collection can introduce variables that leave room for
interpretation, such as illegible handwriting or hard-to-understand accents in
audio or video recordings.

Collecting Data

Raw data tends to be large in volume and highly complex. During the collection
process, the overall volume of data is only an estimate—once you process the
data by cleaning it of errors and invalid data points, you’ll have a more accurate
sense of scope.
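
The point about scope can be sketched in a few lines of Python. The example
below assumes a hypothetical survey in which ages were typed in by hand; the
rows, the clean function, and the 0 to 120 age range are all assumptions made
for illustration, not part of any particular tool.

```python
# Hypothetical raw survey rows: (respondent_id, age exactly as entered).
raw_rows = [("r1", "34"), ("r2", "abc"), ("r3", ""), ("r4", "230"), ("r5", "41")]

def clean(rows, min_age=0, max_age=120):
    """Keep rows whose age parses as an integer within a plausible range."""
    valid = []
    for respondent_id, age_text in rows:
        try:
            age = int(age_text)
        except ValueError:
            continue  # unparseable entry, e.g. a typing error or a blank
        if min_age <= age <= max_age:
            valid.append((respondent_id, age))
    return valid

cleaned = clean(raw_rows)
print(len(raw_rows), "raw rows ->", len(cleaned), "valid rows")
```

Only after this step is the usable volume of data known; before it, the row
count is just an estimate.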

9 Types of Data Processing

Choosing the best approach to handle raw data is critical for effective data
management. It will depend upon the type and volume of data and the pace of
collection, among other concerns.
Here are the nine most common types of data processing:

• Batch Processing—A set or batch of data is processed in bulk at regular
intervals. This method is suitable for operations that do not require
instant replies, such as payroll processing, when efficiency trumps real-
time engagement.
• Real-Time Processing—Data is processed as soon as it is created or
received, ensuring quick reactions. Commonly used in instances when
rapid decision-making is essential, such as financial transactions or
monitoring systems.
• Online Processing—Similar to real-time processing in that it handles
data as it is entered or requested, as seen in interactive systems such as
online databases. Allows for quick data retrieval and updating in
response to changing user requirements.
• Distributed Processing—Spreads processing work across
interconnected computers to improve overall system efficiency and
performance. Frequently used in large-scale data processing
applications where centralized processing is unfeasible.
• Parallel Processing—Processes many jobs or programs at the same
time using multiple processors, enhancing processing speed for difficult
calculations. Ideal for jobs that can be broken down into parallelizable
subtasks.
• Multi-Processing—Performs many processes or applications at the
same time on a computer with multiple processors. By processing
several jobs concurrently, improves overall system speed and
throughput.
• Transaction Processing—Individual transactions or company
processes are processed in real-time. In systems such as online banking,
where the rapid and correct processing of financial transactions is
critical for ensuring data integrity, this is essential.
• Manual Data Processing—Involves human interaction in data
processing in the absence of automated technologies. This might include
procedures like manually inputting data into a system, which is
inefficient yet may be required for particular jobs or data kinds.
• EDP (Electronic Data Processing)—Data processing and analysis
using electronic devices and computers. Encompasses a wide range of
automated operations, ranging from simple data input to complicated
computations, that are routinely employed in current data processing
applications across a variety of sectors.
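
To make the contrast between the first two types in the list concrete, here is
a minimal Python sketch. It assumes a hypothetical list of transaction amounts;
batch_total and realtime_totals are illustrative names, and a real system would
read from files or message queues rather than an in-memory list.

```python
# Hypothetical transaction amounts, in arrival order.
transactions = [25.0, 40.0, 10.0, 5.0]

def batch_total(records):
    """Batch processing: wait for the whole set, then process it in one pass."""
    return sum(records)

def realtime_totals(records):
    """Real-time processing: update the result as each record arrives."""
    running = 0.0
    for amount in records:
        running += amount
        yield running

batch_result = batch_total(transactions)
realtime_results = list(realtime_totals(transactions))
```

Both approaches end at the same total; the difference is that the real-time
version has an up-to-date answer after every single record.
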

Business Intelligence:-

What is Business Intelligence?

Business Intelligence (BI) is a set of ideas, methodologies, processes,
architectures, and technologies that transform raw data into meaningful and
useful information for business purposes. Business Intelligence can handle large
amounts of data to help identify and develop new opportunities for the business.

Purpose of Business Intelligence Systems:

‘The purpose of business intelligence systems is to utilise all your underlying
business data to help executives make better business decisions,’ said Tony
Banham, technology and solutions director for Oracle Greater China.

Business Intelligence process consists of 3 distinct tasks:

1. The first task of BI is to gather the necessary data about the business.
The key to this is automating the process. Gathering data used to be very
time-consuming and expensive, but today, with modern computers, it’s much
easier to collect data from various sources.

2. The second task is to analyse the collected data and then further extract
information from it. The extracted information is then transformed into
knowledge.

3. The final task is to use the newly gathered knowledge to improve the business.
There are many business intelligence tools to complete the process of gathering
knowledge.
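
The three tasks above can be sketched as a tiny pipeline. This is only an
illustration under stated assumptions: the two order lists stand in for
separate source systems, and gather, analyse, and recommend are hypothetical
function names, not part of any BI product.

```python
# Hypothetical data from two source systems: (region, amount).
online_orders = [("north", 200), ("south", 50)]
store_orders = [("north", 100), ("south", 30)]

def gather(*sources):
    """Task 1: collect records from every source into one list."""
    return [row for source in sources for row in source]

def analyse(rows):
    """Task 2: turn records into information (revenue per region)."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def recommend(totals, target):
    """Task 3: use the knowledge, here by flagging regions under target."""
    return sorted(region for region, total in totals.items() if total < target)

totals = analyse(gather(online_orders, store_orders))
needs_attention = recommend(totals, target=100)
```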

What is data warehousing:-


The concept of the data warehouse (Figure 1.1) is a single system that is the
repository of all of the organization’s data (or simply data) in a form that can
be effectively analysed so that meaningful reports can be prepared for
management and other information workers.

Need for Data Warehouse
An ordinary operational database typically holds megabytes to gigabytes of data
for one specific purpose. As data volumes grew to terabytes, storage for analysis
shifted to the data warehouse. Besides this, a transactional database does not
lend itself to analytics. To perform analytics effectively, an organization keeps
a central data warehouse to closely study its business by organizing,
understanding, and using its historical data for making strategic decisions and
analyzing trends.

Benefits of Data Warehouse
• Better business analytics: A data warehouse stores all of the company’s
past data and records in one place, which makes them available for
analysis and improves the company’s understanding of its data.
• Faster queries: A data warehouse is designed to handle large analytical
queries, so it generally runs them faster than a transactional database.
• Improved data quality: The warehouse stores and analyzes the data
gathered from different sources without altering it or adding data of
its own, so data quality is maintained. Any data quality issue that does
arise is handled by the data warehouse team.
• Historical Insight: The warehouse stores all your historical data
which contains details about the business so that one can analyze it at
any time and extract insights from it.

Applications of Data Warehousing


Data Warehousing can be applied anywhere where we have a huge amount of
data and we want to see statistical results that help in decision making.
• Social Media Websites: Social networking websites like Facebook,
Twitter, LinkedIn, etc. are based on analyzing large data sets. These
sites gather data related to members, groups, locations, etc., and store
it in a single central repository. Because the volume of data is so
large, a data warehouse is needed to implement this.
• Banking: Most of the banks these days use warehouses to see the
spending patterns of account/cardholders. They use this to provide
them with special offers, deals, etc.
• Government: Governments use data warehouses to store and analyze
tax payments in order to detect tax evasion.

Advantages of Data Warehousing


• Intelligent Decision-Making: With centralized data in warehouses,
decisions may be made more quickly and intelligently.
• Business Intelligence: Provides strong operational insights through
business intelligence.
• Historical Analysis: Predictions and trend analysis are made easier
by storing past data.
• Data Quality: Guarantees data quality and consistency for
trustworthy reporting.
• Scalability: Capable of managing massive data volumes and
expanding to meet changing requirements.
• Effective Queries: Fast and effective data retrieval is made possible
by an optimized structure.
• Cost reductions: Data warehousing can result in cost savings over
time by reducing data management procedures and increasing overall
efficiency, even when there are setup costs initially.
• Data security: Data warehouses employ security protocols to
safeguard confidential information, guaranteeing that only authorized
personnel are granted access to certain data.

Disadvantages of Data Warehousing


• Cost: Building a data warehouse can be expensive, requiring
significant investments in hardware, software, and personnel.
• Complexity: Data warehousing can be complex, and businesses may
need to hire specialized personnel to manage the system.
• Time-consuming: Building a data warehouse can take a significant
amount of time, requiring businesses to be patient and committed to
the process.
• Data integration challenges: Data from different sources can be
challenging to integrate, requiring significant effort to ensure
consistency and accuracy.
• Data security: Data warehousing can pose data security risks, and
businesses must take measures to protect sensitive data from
unauthorized access or breaches.

Data Marts
As corporate-wide data warehouses came into use, it was discovered that in
many situations a full-blown data warehouse was overkill for the application.
Data marts evolved to solve this problem. A data mart is a special type of data
warehouse. It is focused on a single subject (or functional area), such as Sales,
Finance, or Marketing. Whereas data warehouses have an enterprise-wide
scope, the information in data marts pertains to a single department. The primary
use for a data mart is Business Intelligence (BI) applications. Implementing a
data mart can be less expensive than implementing a data warehouse, which
makes it more practical for a small business.
Types of Data Mart
There are three common types of data marts:
• Independent Data Mart
• Dependent Data Mart
• Hybrid Data Mart
1. Independent Data Mart
An independent data mart is created and maintained separately from the data
warehouse. It is created to satisfy the particular needs of a specific business unit
or department. Independent data marts are typically smaller in size and more
rapidly and readily set up. They offer flexibility and agility since they are not
constrained by the challenges of the centralized data warehouse. Nevertheless,
data redundancy and inconsistency may result if the same data is replicated
across several different data marts.
2. Dependent Data Mart
A dependent data mart is built directly from a data warehouse. It takes some
of the data from the data warehouse and arranges it to meet the needs of a
specific business unit or department. Dependent data marts benefit from the
data integration, data quality, and consistency provided by the data warehouse,
which keeps all data centralized in a single source of truth. They are
often developed to serve particular reporting and analytical needs, and they are
frequently updated from the data warehouse. Dependent data marts offer data
consistency and prevent data duplication because they rely on the data
warehouse as their main source of data.
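
A dependent data mart can be sketched with Python's built-in sqlite3 module.
The example assumes a hypothetical warehouse table named warehouse_facts and
derives a sales-only mart from it as a database view; the table, column, and
view names are invented for illustration, and real warehouses would normally
use a dedicated platform rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering several departments.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [("sales", "widget", 120.0), ("finance", "audit", 300.0), ("sales", "gadget", 80.0)],
)

# The dependent mart is derived entirely from the warehouse, here as a view
# restricted to the sales subject area.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT item, amount FROM warehouse_facts WHERE department = 'sales'"
)

mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
```

Because the mart is only a window onto the warehouse, it cannot drift out of
step with it, which is the consistency benefit described above.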
3. Hybrid Data Mart
Both independent and dependent data mart components can be found in a hybrid
data mart. As well as combining additional data sources particular to a given
business unit or department, it makes use of the centralized data warehouse for
the integration and consistency of the core data. By offering flexibility and
agility for department-specific needs while keeping the integrity and
consistency of shared data from the data warehouse, hybrid data marts offer the
benefits of both strategies. This strategy creates a balance between localized
data management and centralized control.

Difference between Data Warehouse and Data Mart


Data Warehouse                                            | Data Mart
----------------------------------------------------------+---------------------------------------------------
Centralised system.                                       | Decentralised system.
Light denormalization takes place.                        | High denormalization takes place.
Top-down model.                                           | Bottom-up model.
Difficult to build.                                       | Easy to build.
Fact constellation schema is used.                        | Star schema and snowflake schema are used.
Flexible.                                                 | Not flexible.
Data-oriented in nature.                                  | Project-oriented in nature.
Long life.                                                | Shorter life than a warehouse.
Data are contained in detailed form.                      | Data are contained in summarized form.
Vast in size.                                             | Smaller than a warehouse.
Might be somewhere between 100 GB and 1 TB+ in size.      | Less than 100 GB in size.
Uses a lot of data; has comprehensive operational data.   | Operational data are not present.
Collects data from various data sources.                  | Generally stores data from a data warehouse.
Long processing time because of the large data volume.    | Less processing time because only a small amount of data is handled.
Complicated design process for schemas and views.         | Easy design process for schemas and views.

Metadata warehousing:-
Metadata is simply defined as data about data: data that is used to represent
or describe other data. We can think of metadata as the road map to a data
warehouse, since metadata in a data warehouse defines the warehouse objects.
Metadata describes and contextualizes other data.
It provides information about the content, format, structure, and other
characteristics of data, and can be used to improve the organization,
discoverability, and accessibility of data. Metadata can be stored in various
forms, such as text, XML, or RDF, and can be organized using metadata
standards and schemas. Examples include file metadata, image metadata, music
metadata, video metadata, web metadata, and document metadata.

Types of Metadata:

There are many types of metadata that can be used to describe different aspects
of data, such as its content, format, structure, and provenance. Some common
types of metadata include:
1. Descriptive metadata: This type of metadata provides information
about the content, structure, and format of data, and may include
elements such as title, author, subject, and keywords. Descriptive
metadata helps to identify and describe the content of data and can be
used to improve the discoverability of data through search engines and
other tools.
2. Administrative metadata: This type of metadata provides
information about the management and technical characteristics of
data, and may include elements such as file format, size, and creation
date. Administrative metadata helps to manage and maintain data over
time and can be used to support data governance and preservation.
3. Structural metadata: This type of metadata provides information
about the relationships and organization of data, and may include
elements such as links, tables of contents, and indices. Structural
metadata helps to organize and connect data and can be used to
facilitate the navigation and discovery of data.
4. Provenance metadata: This type of metadata provides information
about the history and origin of data, and may include elements such as
the creator, date of creation, and sources of data. Provenance metadata
helps to provide context and credibility to data and can be used to
support data governance and preservation.
5. Rights metadata: This type of metadata provides information about
the ownership, licensing, and access controls of data, and may include
elements such as copyright, permissions, and terms of use. Rights
metadata helps to manage and protect the intellectual property rights
of data and can be used to support data governance and compliance.
6. Educational metadata: This type of metadata provides information
about the educational value and learning objectives of data, and may
include elements such as learning outcomes, educational levels, and
competencies. Educational metadata can be used to support the
discovery and use of educational resources, and to support the design
and evaluation of learning environments.
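
To show how several of these metadata types might sit together in one record,
here is a small Python sketch. The DatasetMetadata class, its field names, and
the two catalogue entries are all hypothetical; they do not follow any
particular metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    # Descriptive metadata
    title: str
    keywords: list = field(default_factory=list)
    # Administrative metadata
    file_format: str = "csv"
    size_bytes: int = 0
    # Provenance metadata
    source: str = "unknown"
    # Rights metadata
    licence: str = "unspecified"

def find_by_keyword(catalogue, keyword):
    """Use descriptive metadata to discover datasets without opening them."""
    return [m.title for m in catalogue if keyword in m.keywords]

catalogue = [
    DatasetMetadata("Q1 sales", ["sales", "quarterly"], size_bytes=2048, source="POS export"),
    DatasetMetadata("Staff list", ["hr"], source="HR system"),
]
matches = find_by_keyword(catalogue, "sales")
```

The search touches only the metadata, never the underlying data, which is how
metadata improves discoverability.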

Benefits of Metadata :-

A metadata repository is a centralized database or system that is used to store
and manage metadata. Some of the benefits of using a metadata repository
include:
1. Improved data quality: A metadata repository can help ensure that
metadata is consistently structured and accurate, which can improve
the overall quality of the data.
2. Increased data accessibility: A metadata repository can make it
easier for users to access and understand the data, by providing context
and information about the data.
3. Enhanced data integration: A metadata repository can facilitate
data integration by providing a common place to store and manage
metadata from multiple sources.
4. Improved data governance: A metadata repository can help enforce
metadata standards and policies, making it easier to ensure that data is
being used and managed appropriately.
5. Enhanced data security: A metadata repository can help protect the
privacy and security of metadata, by providing controls to restrict
access to sensitive or confidential information.
Trends in data warehousing:-
Data warehousing for agencies has become extremely important in the past few
years. Data warehousing trends have been evolving thanks to advances in data
analytics and cloud-based tools like BigQuery.
Nowadays, data warehousing is trending in the world of marketing agencies.

11 Data Warehousing Trends:

#1. Single Data Warehouse

In the last decade, companies have tended to use a multitude of SaaS apps (an
average of 110 for enterprises), which leaves data scattered across different
platforms. This practice compounds maintenance and subscription costs and can
increase the need for specialist IT staff.

Consolidating your data storage sources into a single service as much as possible
can reduce such costs. Following this data warehousing trend can help
optimise operations in your marketing agency, or any other large agency.

#2. Green Data Warehousing


Cloud data warehousing has increasingly become a part of energy reduction plans
for businesses. It’s great to see a green data warehousing trend making its way
onto the scene while we are tackling climate change.

Cloud data centres operate with energy efficiencies well above industry averages.
In 2021, companies in the EU reported an 80% energy consumption
reduction following the migration of their enterprise data to SaaS storage
providers.

#3. Outsourcing Data Management

Outsourcing data management operations is becoming a trend amongst marketing
agencies looking to elevate their operations. Outsourcing data operations enables
businesses to reduce their in-house IT staffing costs. In exchange, businesses also
gain access to the services of experienced specialists, whose exclusive
compensation can otherwise exceed their operating budgets.

Data management operations suitable for outsourcing include:

• Data warehousing and automation,
• Encoding,
• Compilation,
• Auditing,
• System architecture design, and more.

#4. Data Warehousing AI Solutions

As the volume of data being processed is forecast to spike, businesses
increasingly need to offload data operations to faster machine
learning-enabled AI systems.

With trend and pattern analysis capabilities improving by leaps and bounds,
businesses that integrate AI into their data warehousing solutions can reduce their
operational costs and perform data operations more efficiently.

#5. Virtual Data Warehousing


Virtual data warehouses are sets of separate databases that can be queried
simultaneously by means of middleware. Virtual data warehousing is trending
because it’s cost-effective and can be deployed faster than physical solutions. By
forgoing physical data replication, virtualization can improve operating speeds
and reduce operating costs by as much as four times.
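
The idea of querying separate databases together can be sketched with Python's
built-in sqlite3 module, using SQLite's ATTACH DATABASE statement. In this
sketch the single connection plays the role of the middleware, and two
in-memory databases (named crm and billing here purely for illustration) stand
in for independent source systems. Real virtual warehouses use dedicated
federation tools; this is only a small-scale analogy.

```python
import sqlite3

# One connection acts as the middleware; two separate in-memory databases
# stand in for independent source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)", [(1, 50.0), (1, 25.0), (2, 10.0)]
)

# A single query spans both databases; no data is copied into a third store.
rows = conn.execute(
    "SELECT c.name, SUM(i.amount) FROM crm.customers AS c "
    "JOIN billing.invoices AS i ON i.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```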

#6. In-Memory Computing


In-memory computing uses clusters of servers to pool total available RAM and
CPU power. This design distributes data handling and processing tasks across the
cluster for radically enhanced speed and scalability.

Although first popularised by the financial services industry, in-memory
computing experienced rapid growth during the recent shift to remote work and
has also become a trend in data warehousing.

Presently, this data warehousing trend appears to have become a permanent
practice across all industries owing to the continued work-from-home practices
many companies have adopted.

#7. In-Database Analytics

In-database analytics refers to a method of analysis that processes data within
its storage site (a database or data warehouse). In-database analytics are built
into the storage architecture and replace the use of separate applications that
analyse the data after it has been transferred out.

Performing analytical processes inside the database minimises data movement,
reduces bandwidth requirements, and avoids the security risks of
distributing sensitive data across multiple sites and devices.
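
A small sketch with Python's built-in sqlite3 module shows the difference in
data movement. The visits table and its contents are hypothetical. In the first
query the aggregation runs inside the database engine and only the summary
leaves it; the second query is included for contrast and transfers every row
to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (campaign TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("email", 10), ("email", 30), ("social", 5), ("social", 15)],
)

# In-database: aggregation runs inside the engine, only summary rows leave it.
in_db = conn.execute(
    "SELECT campaign, AVG(clicks) FROM visits GROUP BY campaign ORDER BY campaign"
).fetchall()

# For contrast: every row is transferred to the application before analysis.
all_rows = conn.execute("SELECT campaign, clicks FROM visits").fetchall()
```

With four rows the difference is trivial, but the same pattern applied to
millions of rows is where the bandwidth saving comes from.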

For these reasons, in-database analytics have become one of the latest data
warehousing trends, especially amongst marketing agencies.

#8. Data Compression


As marketing agencies accumulate more data over time, they have seen a need
to compress their data and save on storage space. As their stored data continues
to balloon, companies increasingly rely on compression tools to mitigate rising
storage costs.

Data compression reduces the number of bits (binary digits) necessary to store
data. It works by creating reference libraries for the 1s and 0s of binary data and
then replacing longer strings with shorter reference tags.
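
The effect is easy to observe with Python's standard zlib module, which
implements the DEFLATE scheme (one lossless method among many; a given
warehouse product may well use a different one). The sample data below is
made up and deliberately repetitive, since repeated strings are what this kind
of compression replaces with short references.

```python
import zlib

# Repetitive data compresses well because repeated runs are replaced
# by short back-references.
original = b"region=north;status=ok;" * 200
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes")
```

Decompression returns the original bytes exactly, so nothing is lost by
storing the compressed form.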

Compressing your data frees up storage capacity, accelerates data transfers, and
reduces overall storage costs. That’s why it’s been one of the most popular data
warehousing trends for a while, and it’s likely to stick around.
#9. Analytics on Demand
In a SaaS-heavy work environment, users may be extracting data from
warehouses through dozens of different applications. Analytics on demand has
thus become a trend in marketing agencies.

On-demand analytics refers to IT architectures that allow users to access data in
sandboxes (a virtual machine host) using a wide variety of software
platforms. This approach is helping many companies meet the growing need for
better and faster analytics processes.

#10. Hadoop Integration


The open-source application Hadoop uses a distributed file system and a parallel
processing tool called MapReduce to process large data sets at high speeds.
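
The MapReduce idea can be sketched in plain Python. This is not the Hadoop
API: it runs on a single machine and the function names (map_phase, shuffle,
reduce_phase) are illustrative only. In Hadoop, the map and reduce steps would
run in parallel across many nodes, with the input split between them.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["data warehouse data", "data mart"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle(pairs))
```

Each line can be mapped independently of the others, which is what makes the
approach suitable for parallel processing of large data sets.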

#11. Simplified Data Warehousing for Marketing Agencies

Custom data warehouses have become a trend amongst marketing agencies.

Most data warehousing applications cater to developers. Nevertheless, marketing
and sales people increasingly need access to the same data management tools to
take better advantage of their data.

For users with a non-technical background who may be unfamiliar with
programming and writing queries, no-code data warehousing solutions built
with a user-friendly interface open up this powerful technology for broader use.
