
B.K. BIRLA INSTITUTE OF ENGINEERING & TECHNOLOGY, Pilani


5CS5-16/5IT6-16 – DATA MINING & WAREHOUSING, Classroom Notes Unit – I
What is Data Mining?
 Discovery of useful summaries of data - Ullman
 Extracting or “mining” knowledge from large amounts of data
 The efficient discovery of previously unknown patterns in large databases
 Technology that predicts future trends based on historical data
 It helps businesses make proactive and knowledge-driven decisions

Many definitions:
 Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful)
information or patterns from data in large databases
 Look for hidden patterns & trends that are not immediately apparent from summarizing the data.
 E.g. correlation between grades in two subjects.

1| Prepared by: Manoj Kumar Saini


Phases / Steps in data mining
Data Mining: A KDD (Knowledge Discovery from Data) Process

Stages of Data Mining Process


1. Data gathering, e.g., operational sources, www.
2. Data cleansing: eliminate errors and/or bogus data, e.g., patient fever = 125.
3. Feature extraction/ Selection & Transformation: obtaining only the interesting attributes of the data,
e.g., “date acquired” is probably not useful for clustering celestial objects, as in Skycat.
4. Pattern extraction and discovery. This is the stage that is often thought of as “data mining” and is
where we shall concentrate our effort.
5. Visualization of the data.
6. Evaluation of results; not every discovered fact is useful, or even true! Judgment is necessary before
following your software's conclusions.

Data Mining Functionalities — What Kinds of Patterns Can Be Mined?
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. In
general, data mining tasks can be classified into two categories:
 Predictive Mining
 Use some variables to predict unknown or future values of other variables.
 Descriptive Mining
 Find human-interpretable patterns that describe the data.

Data Mining Functionalities:

1) Concept/Class Description: Characterization and Discrimination


Concept description is a form of data generalization. A concept typically refers to a collection of
data, such as frequent_buyers or graduate_students. A description of a concept in summarized,
concise, and yet precise terms is known as a concept description.
These descriptions can be derived via:

i) Characterization: provides a concise and succinct summarization of the given collection of data.
ii) Comparison/Discrimination: provides descriptions comparing two or more collections of data.

For example, customers who purchase computer products frequently → 80% of such customers
are between 20 and 40 years old and have a university degree.
Whereas customers who do not purchase computer products frequently → 60% of such customers
are either senior citizens or youths without a university degree.

2) Mining Frequent Patterns, Associations, and Correlations


Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns,
including itemsets (sets of items), subsequences, and substructures.
Associations and Item-sets
An association is a rule of the form “if X then Y”, denoted by X → Y.
Ex: If India wins in cricket, sales of sweets go up.
If a customer buys a computer, he also buys an antivirus.
For any rule, if both X → Y and Y → X hold,
then X and Y are called “interesting item-sets”.
Ex: People buying school uniforms in June also buy school bags
(people buying school bags in June also buy school uniforms).
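The rule-checking idea above can be sketched in a few lines of Python. The transaction data and the `rule_holds` helper below are made-up illustrations, not from the notes: a rule X → Y is checked by counting how often Y appears among transactions containing X, and an item-set is “interesting” in the notes' sense when the rule holds in both directions.

```python
# Sketch: check an association rule "if X then Y" by co-occurrence counting.
# Transactions and item names are assumed for illustration.

def rule_holds(transactions, x, y):
    """Return the fraction of transactions containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

transactions = [
    {"uniform", "bag"},
    {"uniform", "bag", "shoes"},
    {"uniform"},
    {"bag"},
]

# 2 of the 3 uniform-buyers also bought bags, and 2 of the 3 bag-buyers
# also bought uniforms, so the rule holds (to the same degree) both ways:
# {uniform, bag} would be an "interesting item-set" in the notes' sense.
print(rule_holds(transactions, "uniform", "bag"))
print(rule_holds(transactions, "bag", "uniform"))
```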

3) Classification and Prediction


Classification and prediction are two forms of data analysis that can be used to extract models
describing important data classes and to predict future data trends. Whereas classification predicts
categorical (discrete, unordered) labels, prediction models continuous-valued functions.


4) Clustering
 Given points in some space, often a high-dimensional space, group the points into a small number
of clusters
 Each cluster consists of points that are “near” one another in some sense
 Points in the same cluster are “similar” to each other and “dissimilar” to points in other clusters

5) Anomaly Detection/Outliers
 Objects whose characteristics are significantly different from the rest of the data
 Such observations are known as ANOMALIES or OUTLIERS
 False alarms to be avoided
 Applications
 Fraud detection
 Network intrusions
 Unusual patterns of disease
 Ecosystem disturbances
Examples of Discovered Patterns
 Association rules
o 98% of people who purchase diapers also buy beer
 Classification
o People with age less than 25 and salary > 40k drive sports cars
 Similar time sequences
o Stocks of companies A and B perform similarly
 Outlier Detection
o Residential customers for telecom company with businesses at home

Data Mining Applications
Some examples of “successes":
1. Decision trees constructed from bank-loan histories to produce algorithms to decide whether to grant a
loan.
2. Patterns of traveler behavior mined to manage the sale of discounted seats on planes, rooms in hotels,
etc.
3. “Diapers and beer" Observation that customers who buy diapers are more likely to buy beer than
average allowed supermarkets to place beer and diapers nearby, knowing many customers would walk
between them. Placing potato chips between increased sales of all three items.
4. Skycat and the Sloan Digital Sky Survey: clustering sky objects by their radiation levels in different
bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of
celestial objects.
(168 million records and some 500 attributes)
5. Comparison of the genotypes of people with and without a condition allowed the discovery of a set of
genes that together account for many cases of diabetes. This sort of mining has become much more
important as the human genome has been fully decoded.

Examples
 BANK AGENT:
◦ Must I grant a mortgage to this customer?
 SUPERMARKET MANAGER:
◦ When customers buy eggs, do they also buy oil?
 PERSONNEL MANAGER:
◦ What kind of employees do I have?
 AGRICULTURAL SCIENTIST:
◦ What would be the wheat yield this year?
 NETWORK ADMINISTRATOR:
◦ Which website visitor is a hacker?
◦ Which incoming mail is spam?
 TRADER in a RETAIL COMPANY:
◦ How many flat TVs do we expect to sell next month?

Data Mining Issues

 Mining Methodology and User Interaction Issues

− Mining different kinds of knowledge in databases − Different users may be interested in different
kinds of knowledge. Therefore, it is necessary for data mining to cover a broad range of knowledge
discovery tasks.
− Interactive mining of knowledge at multiple levels of abstraction − The data mining process
needs to be interactive so that users can focus the search for patterns, providing and refining
data mining requests based on the returned results.
− Incorporation of background knowledge − Background knowledge can be used to guide the
discovery process and to express the discovered patterns, not only in concise terms but at
multiple levels of abstraction.
− Data mining query languages and ad hoc data mining − A data mining query language that allows
the user to describe ad hoc mining tasks should be integrated with a data warehouse query language
and optimized for efficient and flexible data mining.
− Presentation and visualization of data mining results − Once patterns are discovered, they need
to be expressed in high-level languages and visual representations that are easily understandable.

− Handling noisy or incomplete data − Data cleaning methods are required to handle noise and
incomplete objects while mining data regularities. Without such methods, the accuracy of the
discovered patterns will be poor.
− Pattern evaluation − The patterns discovered may be uninteresting because they represent
common knowledge or lack novelty; evaluating which patterns are truly interesting is therefore
an important problem.

 Performance Issues
− Efficiency and scalability of data mining algorithms − In order to effectively extract
information from the huge amounts of data in databases, data mining algorithms must be efficient
and scalable.
− Parallel, distributed, and incremental mining algorithms − Factors such as the huge size of
databases, the wide distribution of data, and the complexity of data mining methods motivate the
development of parallel and distributed data mining algorithms. These algorithms divide the data
into partitions that are processed in parallel, and the results from the partitions are then
merged. Incremental algorithms update existing mining results when the database changes, rather
than mining all the data again from scratch.

 Diverse Data Types Issues


− Handling of relational and complex types of data − A database may contain complex data
objects, multimedia objects, spatial data, temporal data, etc. It is not possible for one system to
mine all these kinds of data.
− Mining information from heterogeneous databases and global information systems − Data
is available from different data sources on a LAN or WAN. These data sources may be structured,
semi-structured, or unstructured. Mining knowledge from them therefore adds challenges to data
mining.

DATA PREPROCESSING
Why Preprocess Data?

 Data in the real world is dirty


 Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only
aggregate data. e.g., occupation=“ ”
 Noisy: containing errors or outliers. e.g., Salary=“-10”
 Inconsistent: containing discrepancies in codes or names
 e.g., Age=“30” on 10/10/2013 and Birthday=“22/04/1984”
 e.g., Was rating “1,2,3”, now rating “A, B, C”
 e.g., discrepancy between duplicate records
 No quality data, no quality mining results!
 Quality decisions must be based on quality data. e.g., duplicate or missing data may cause
incorrect or even misleading statistics.

Sources of Dirty Data

 Incomplete data may come from


 “Not applicable” data value when collected
 Different considerations between the time when the data was collected and when it is
analyzed.
 Human/hardware/software problems
 Noisy data (incorrect values) may come from
 Faulty data collection instruments
 Human or computer error at data entry
 Errors in data transmission
 Inconsistent data may come from
 Different data sources
 Functional dependency violation (e.g., modify some linked data)
Duplicate records also need data cleaning

Forms of data preprocessing

Major Tasks in Data Preprocessing:

 Data cleaning
 Fill in missing values, smooth noisy data, identify or remove outliers, and resolve
inconsistencies
 Data integration
 Integration of multiple databases, data cubes, or files
 Data transformation
 Normalization and aggregation
 Data reduction (sampling)
 Obtains reduced representation in volume but produces the same or similar analytical results
 Data discretization
 Part of data reduction but with particular importance, especially for numerical data

Data Cleaning
Real-world data tend to be incomplete, noisy, and inconsistent. Data cleaning (or data cleansing) routines
attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in
the data.
1) Missing Data
Data is not always available - E.g., many tuples have no recorded value for several attributes, such as
customer income in sales data
Missing data may be due to:
a. equipment malfunction
b. data deleted because it was inconsistent with other recorded data
c. data not entered due to misunderstanding
d. certain data not considered important at the time of entry
e. history or changes of the data not being registered
Missing data may need to be inferred.
Ways to Handle Missing Data-
 Ignore the tuple: usually done when the class label is missing (assuming the task is
classification). This is not effective when the percentage of missing values per attribute
varies considerably.
 Fill in the missing value manually: this approach is time-consuming and may not be
feasible for a large data set with many missing values.
 Use a global constant to fill in the missing value: replace all missing attribute values by
the same constant, such as a label like “Unknown” or −∞. The problem here is that the
mining program may mistakenly interpret “Unknown” as an interesting new class.
 Use the attribute mean to fill in the missing value: for example, suppose that the average
income of the customers is Rs. 56,000. Use this value to replace any missing value for
income.
 Use the attribute mean for all samples belonging to the same class to fill in the missing
value: a smarter choice.
 Use the most probable value to fill in the missing value: This may be determined with
regression, inference-based tools using a Bayesian formalism, or decision tree induction.
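Two of the fill strategies listed above, the overall attribute mean and the class-wise attribute mean, can be sketched in Python. The data values and class labels below are assumed for illustration:

```python
# Sketch of two missing-value fill strategies; None marks a missing value.

def fill_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

# Mean of the known incomes is (40000 + 72000 + 56000) / 3 = 56000.0
incomes = [40000, 72000, None, 56000]
print(fill_with_mean(incomes))

def fill_with_class_mean(rows):
    """rows: (class_label, value) pairs; fill None with the mean of its class."""
    by_class = {}
    for label, v in rows:
        if v is not None:
            by_class.setdefault(label, []).append(v)
    means = {c: sum(vs) / len(vs) for c, vs in by_class.items()}
    return [(c, means[c] if v is None else v) for c, v in rows]

# The missing "gold" income is filled with the mean of the gold class only.
rows = [("gold", 80000), ("gold", None), ("basic", 30000), ("basic", 34000)]
print(fill_with_class_mean(rows))
```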
2) Noisy Data - Noise is a random error or variance in a measured variable
Incorrect attribute values may be due to
a. faulty data collection instruments
b. data entry problems
c. data transmission problems
d. technology limitation
e. inconsistency in naming convention
Other data problems that require data cleaning
a) duplicate records
b) incomplete data
c) inconsistent data
Smooth out the data to remove noise
Smoothing Techniques
 Binning – Binning methods smooth a sorted data value by consulting its “neighborhood”, that is,
the values around it. The sorted values are distributed into a number of “buckets,” or bins.
Binning method –
 First sort the data and partition it into (equi-depth or equi-width) bins.
 Smooth by bin means, bin medians, or bin boundaries.
 Equal-width (distance) partitioning:
 It divides the range into N intervals of equal size (a uniform grid)
 If A and B are the lowest and highest values of the attribute, the width of the intervals
is W = (B - A)/N
 Most straightforward
 But outliers may dominate the presentation
 Skewed data is not handled well
 Equi-depth (frequency) partitioning:
 It divides the range into N intervals, each containing approximately the same number
of samples
 Good data scaling
 Managing categorical attributes can be tricky
Example: Sorted data for price: 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
 Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
 Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
 Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
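The worked example above can be reproduced with a short Python sketch. The tie-breaking rule in boundary smoothing (ties go to the lower boundary) is an assumption, since the notes do not specify one:

```python
# Reproduce the binning example: 12 sorted prices, 3 equi-depth bins,
# smoothing by bin means and by bin boundaries.

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]

# Equal-width bin width for comparison: W = (B - A)/N = (34 - 4)/3 = 10.0
width = (max(prices) - min(prices)) / 3

def equi_depth_bins(sorted_values, n_bins):
    """Partition already-sorted values into n_bins bins of equal depth."""
    depth = len(sorted_values) // n_bins
    return [sorted_values[i:i + depth]
            for i in range(0, len(sorted_values), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the (rounded) bin mean."""
    return [[round(sum(b) / len(b)) for _ in b] for b in bins]

def smooth_by_boundaries(bins):
    """Replace each value by the nearer of the bin's min and max
    (ties go to the lower boundary - an assumed convention)."""
    out = []
    for b in bins:
        lo, hi = b[0], b[-1]
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out

bins = equi_depth_bins(prices, 3)
print(bins)                       # the three bins from the notes
print(smooth_by_means(bins))      # 9s, 23s, 29s
print(smooth_by_boundaries(bins)) # matches the boundary-smoothed bins
```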
3) Clustering: Outliers may be detected by clustering, where similar values are organized into groups,
or “clusters.” Intuitively, values that fall outside of the set of clusters may be considered outliers
Combined computer and human inspection: detect suspicious values automatically and have a human verify them.
4) Regression: Data can be smoothed by fitting the data to a function, such as with regression. Linear
regression involves finding the “best” line to fit two attributes (or variables), so that one attribute
can be used to predict the other. Multiple linear regression is an extension of linear regression,
where more than two attributes are involved and the data are fit to a multidimensional surface.

Data transformation
In data transformation, the data are transformed or consolidated into forms appropriate for mining.
Data transformation can involve the following:
 Smoothing: remove noise from data. (done in data cleaning)
 Aggregation: summarization, data cube construction where aggregation operations are applied to the
data in the construction of a data cube.
 Generalization: concept hierarchy climbing
 Normalization: scaled to fall within a small, specified range
i) Min-max normalization - performs a linear transformation on the original data.
It maps a value v of an attribute A from the original range [minA, maxA] to a value v' in the
new range [new_minA, new_maxA] by computing

v' = ((v - minA) / (maxA - minA)) × (new_maxA - new_minA) + new_minA

Ex. Let income range from $12,000 to $98,000, normalized to [0.0, 1.0]. Then $73,600 is mapped to
((73,600 - 12,000) / (98,000 - 12,000)) × (1.0 - 0) + 0 = 0.716

ii) Z-score normalization (zero-mean normalization) - the values of an attribute A are normalized
based on the mean μA and standard deviation σA of A as

v' = (v - μA) / σA

This method of normalization is useful when the actual minimum and maximum of attribute A
are unknown, or when there are outliers that dominate a min-max normalization.
Ex. Let μA = 54,000 and σA = 16,000. Then 73,600 becomes (73,600 - 54,000) / 16,000 = 1.225
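The same example as a Python sketch:

```python
# Z-score normalization: shift by the mean, scale by the standard deviation.

def z_score(v, mean, std):
    return (v - mean) / std

# The example from the notes: 73,600 with mean 54,000 and std 16,000.
print(z_score(73600, 54000, 16000))  # 1.225
```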

iii) Normalization by decimal scaling - normalizes by moving the decimal point of values of
attribute A. The number of decimal points moved depends on the maximum absolute value of A:

v' = v / 10^j, where j is the smallest integer such that Max(|v'|) < 1

Ex. Suppose that the recorded values of A range from -986 to 917.
The maximum absolute value of A is 986. To normalize by decimal scaling, we therefore divide
each value by 1,000 (i.e., j = 3), so that -986 normalizes to -0.986 and 917 normalizes to 0.917.
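A sketch of decimal scaling in Python, reproducing the example above (the loop for finding j is one possible implementation):

```python
# Decimal scaling: divide by 10^j, where j is the smallest integer
# making every scaled absolute value less than 1.

def decimal_scale(values):
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values], j

# The example from the notes: values in [-986, 917] give j = 3.
scaled, j = decimal_scale([-986, 917])
print(j)       # 3
print(scaled)  # approximately [-0.986, 0.917]
```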

 Attribute/feature construction
 New attributes constructed from the given ones
Ex. We may wish to add the attribute “area” based on the attributes “height” and “width”.

Data Discretization and Concept Hierarchy Generation

Data discretization techniques can be used to reduce the number of values for a given continuous
attribute by dividing the range of the attribute into intervals.
 Interval labels are used to replace actual data values, which reduces and simplifies the original data.
 This leads to a concise, easy-to-use, knowledge-level representation of mining results.
Categorization based on the use of class information:
 Supervised discretization - this type of discretization process uses class information.
 Unsupervised discretization - it does not use class information.
Categorization based on the direction in which it proceeds:
 Top-down discretization or splitting - If the process starts by first finding one or a few
points (called split points or cut points) to split the entire attribute range, and then repeats
this recursively on the resulting intervals, it is called top-down discretization or splitting.
 Bottom-up discretization or merging - it starts by considering all of the continuous values
as potential split-points, removes some by merging neighborhood values to form intervals,
and then recursively applies this process to the resulting intervals.

Concept hierarchies can be used to reduce the data by collecting and replacing low-level concepts
(such as numerical values for the attribute age) with higher-level concepts (such as youth, middle-
aged, or senior).
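A minimal sketch of such a replacement in Python. The interval boundaries (30 and 60) are assumed for illustration, not taken from the notes:

```python
# Replace numerical ages with higher-level concepts, as a concept
# hierarchy for the attribute "age" would. Boundaries are assumed.

def age_concept(age):
    if age < 30:
        return "youth"
    if age < 60:
        return "middle-aged"
    return "senior"

print([age_concept(a) for a in [22, 45, 71]])
```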

Benefits of Data Discretization and Concept Hierarchy OR


Why are discretization techniques and concept hierarchies typically applied before data mining as
a preprocessing step, rather than during mining?
 The generalized data may be more meaningful and easier to interpret, and it contributes to a
consistent representation of data mining results among multiple mining tasks.
 In addition, mining on a reduced data set requires fewer input/output operations and is more
efficient than mining on a larger, ungeneralized data set.

Concept hierarchies for numerical attributes can be constructed automatically based on data
discretization by: binning, histogram analysis, entropy-based discretization, χ²-merging, cluster
analysis, and discretization by intuitive partitioning.

======== *****=======
