Chapter 1 Data Mining (Cont.)

data mining fundamentals

NGUYEN MAI DUNG

DATA MINING
7080809 Data Mining
WHY DATA MINING
The world is data rich but information poor.
WHAT IS DATA MINING
Data mining—searching for knowledge (interesting patterns) in data.
Data mining is looking for hidden, valid, and potentially useful patterns in huge
data sets.
Data Mining is all about discovering unsuspected/ previously unknown
relationships amongst the data.
It is a multi-disciplinary skill that uses Machine learning, Statistics, AI and
Database technology.
Data mining is also called knowledge discovery, knowledge extraction, data/pattern
analysis, or information harvesting. A common umbrella term is knowledge discovery
from data (KDD).
WHAT IS/IS NOT DATA MINING?
- Not data mining: look up a phone number in a phone directory.

- Data mining: discover that certain names are more prevalent in certain US locations
(e.g., O’Brien, O’Rourke, O’Reilly in the Boston area).

- Not data mining: query a Web search engine for information about “Amazon”.

- Data mining: group together similar documents returned by a search engine according to
their context (e.g., Amazon rainforest, Amazon.com).
THE KNOWLEDGE DISCOVERY PROCESS
- Data cleaning (to remove noise and inconsistent data)

- Data integration (where multiple data sources may be combined)

- Data selection (where data relevant to the analysis task are retrieved from the database)

- Data transformation (where data are transformed and consolidated into forms appropriate for
mining by performing summary or aggregation operations)

- Data mining (an essential process where intelligent methods are applied to extract data patterns)

- Pattern evaluation (to identify the truly interesting patterns representing knowledge based on
interestingness measures)

- Knowledge presentation (where visualization and knowledge representation techniques are used
to present mined knowledge to users)
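
The seven steps above can be sketched end to end on toy data. This is only a minimal illustration under stated assumptions: the records, the field names, and the choice of a simple frequency count as the "mining" step are all made up for brevity and are not taken from the slides.

```python
from collections import Counter

# Hypothetical raw records from two sources (all values are invented).
store_a = [
    {"customer": "c1", "item": "laptop", "amount": 900.0},
    {"customer": "c2", "item": "camera", "amount": None},  # noisy: missing amount
    {"customer": "c3", "item": "laptop", "amount": 950.0},
]
store_b = [
    {"customer": "c4", "item": "laptop", "amount": 1000.0},
    {"customer": "c5", "item": "phone", "amount": 400.0},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only records relevant to the task."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: aggregate records into per-item counts."""
    return Counter(r["item"] for r in records)

def mine(item_counts, min_count):
    """Data mining: here, simply keep items that occur often enough."""
    return {item: n for item, n in item_counts.items() if n >= min_count}

def evaluate(patterns, total):
    """Pattern evaluation: score each pattern by its relative frequency."""
    return {item: n / total for item, n in patterns.items()}

records = select(integrate(clean(store_a), clean(store_b)), min_amount=500.0)
patterns = evaluate(mine(transform(records), min_count=2), total=len(records))

# Knowledge presentation: print a readable summary.
for item, share in patterns.items():
    print(f"{item}: {share:.0%} of selected purchases")
```

In a real system each step is far richer (and the steps are often iterated), but the order of the function calls mirrors the order of the list.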
WHAT KINDS OF DATA CAN BE MINED?
- Data mining can be applied to any kind of data as long as the data are meaningful for a
target application.

- The most basic forms of data for mining applications are database data, data
warehouse data, and transactional data.

- Data mining can also be applied to other forms of data (e.g., data streams, ordered/
sequence data, graph or networked data, spatial data, text data, multimedia data, and
the WWW).
DATABASE DATA
A database system, also called a database management system (DBMS),
consists of a collection of interrelated data, known as a database, and a set of
software programs to manage and access the data.
The software programs provide mechanisms for defining database structures
and data storage; for specifying and managing concurrent, shared, or
distributed data access; and for ensuring consistency and security of the
information stored despite system crashes or attempts at unauthorized
access.
[Figure: A relational database for AllElectronics.]
- Relational data can be accessed by database queries written in a relational query
language (e.g., SQL) or with the assistance of graphical user interfaces.
  - Show me a list of all items that were sold in the last quarter.
  - Show me the total sales of the last month, grouped by branch.
  - How many sales transactions occurred in the month of December?
  - Which salesperson had the highest sales?
- When mining relational databases, we can go further by searching for trends or data
patterns.
  - Analyze customer data to predict the credit risk of new customers based on their
  income, age, and previous credit information.
  - Detect deviations, that is, items with sales that are far from those expected in
  comparison with the previous year.
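
Two of the example queries can be tried with Python's built-in `sqlite3` module. The `sales` table, its columns, and its rows below are hypothetical; the actual AllElectronics schema is not given in these slides.

```python
import sqlite3

# In-memory database with a hypothetical, simplified sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (trans_id INTEGER, branch TEXT, month INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "north", 12, 120.0),
        (2, "north", 12, 80.0),
        (3, "south", 12, 50.0),
        (4, "south", 11, 300.0),
    ],
)

# "Show me the total sales of the last month, grouped by branch."
# (Month 12 is assumed to be the last month in this toy data.)
totals_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE month = 12 GROUP BY branch ORDER BY branch"
).fetchall()

# "How many sales transactions occurred in the month of December?"
december_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE month = 12"
).fetchone()[0]

print(totals_by_branch, december_count)
conn.close()
```

Queries like these retrieve or aggregate what is already stored; the mining tasks in the second bullet go further by looking for patterns that no single query spells out.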
DATA WAREHOUSES
A data warehouse is a repository of information collected from multiple
sources, stored under a unified schema, and usually residing at a single
site.
Data warehouses are constructed via a process of data cleaning, data
integration, data transformation, data loading, and periodic data
refreshing.
[Figure: A data cube for AllElectronics.]
DATA MINING FUNCTIONALITIES
- Data mining functionalities include
  - Characterization and discrimination
  - The mining of frequent patterns, associations, and correlations
  - Classification and regression
  - Clustering analysis
  - Outlier analysis
CHARACTERIZATION & DISCRIMINATION

- Data can be associated with classes or concepts; summarized descriptions of these are
called class/concept descriptions.
- These descriptions can be derived using
  - Data characterization, by summarizing the data of the class under study (often
  called the target class),
  - Data discrimination, by comparison of the target class with one or a set of
  comparative classes (often called the contrasting classes), or
  - Both data characterization and discrimination.
DATA CHARACTERIZATION
- Summarization of the general characteristics or features of a target class of data.

- The data corresponding to the user-specified class are typically collected by a query.

- The output of data characterization can be presented in various forms.

- Examples include pie charts, bar charts, curves, multidimensional data cubes, and
multidimensional tables, including crosstabs.

- The resulting descriptions can also be presented as generalized relations or in rule form
(called characteristic rules).
DATA CHARACTERIZATION
- A customer relationship manager at AllElectronics may order the following data mining
task: Summarize the characteristics of customers who spend more than $5000 a year
at AllElectronics.

- The result is a general profile of these customers, such as that they are 40 to 50 years
old, employed, and have excellent credit ratings.

- The data mining system should allow the customer relationship manager to drill down
on any dimension, such as on occupation to view these customers according to their
type of employment.
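
A characterization task of this kind can be sketched as a filter followed by a summary. The customer records and field names below are invented for illustration; only the $5000 threshold and the drill-down on occupation come from the example above.

```python
from collections import Counter

# Hypothetical customer records (values are made up).
customers = [
    {"age": 45, "occupation": "engineer", "credit": "excellent", "spend": 6200},
    {"age": 48, "occupation": "teacher", "credit": "excellent", "spend": 5400},
    {"age": 42, "occupation": "engineer", "credit": "good", "spend": 7100},
    {"age": 23, "occupation": "student", "credit": "fair", "spend": 800},
]

# Target class: customers who spend more than $5000 a year.
target = [c for c in customers if c["spend"] > 5000]

# General profile of the target class.
profile = {
    "count": len(target),
    "age_range": (min(c["age"] for c in target), max(c["age"] for c in target)),
    "most_common_credit": Counter(c["credit"] for c in target).most_common(1)[0][0],
}

# Drill down on the occupation dimension.
by_occupation = Counter(c["occupation"] for c in target)

print(profile)
print(dict(by_occupation))
```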
DATA DISCRIMINATION
- A comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.

- The target and contrasting classes can be specified by a user, and the corresponding
data objects can be retrieved through database queries.

- “How are discrimination descriptions output?” The forms of output presentation are
similar to those for characteristic descriptions.

- Discrimination descriptions expressed in the form of rules are referred to as
discriminant rules.
DATA DISCRIMINATION
- A customer relationship manager at AllElectronics may want to compare two groups of
customers—those who shop for computer products regularly (e.g., more than twice a
month) and those who rarely shop for such products (e.g., less than three times a year).

- The resulting description provides a general comparative profile of these customers,
such as that 80% of the customers who frequently purchase computer products are
between 20 and 40 years old and have a university education, whereas 60% of the
customers who infrequently buy such products are either seniors or youths, and have
no university degree.

- Drilling down on a dimension like occupation, or adding a new dimension like income
level, may help to find even more discriminative features between the two classes.
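
One way to picture discrimination is to compute the same feature share for the target class and for the contrasting class and compare the two numbers. The sketch below assumes invented shopper records; the class definitions (more than twice a month, fewer than three times a year) follow the example above, and everything else is hypothetical.

```python
def share(records, predicate):
    """Fraction of records that satisfy predicate (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)

# Hypothetical shoppers; visits_per_year counts computer-product purchases.
shoppers = [
    {"age": 25, "degree": True, "visits_per_year": 30},
    {"age": 35, "degree": True, "visits_per_year": 26},
    {"age": 30, "degree": True, "visits_per_year": 40},
    {"age": 52, "degree": False, "visits_per_year": 25},
    {"age": 70, "degree": False, "visits_per_year": 1},
    {"age": 17, "degree": False, "visits_per_year": 2},
    {"age": 45, "degree": True, "visits_per_year": 1},
]

# Target class: more than twice a month. Contrasting class: fewer than 3 a year.
frequent = [s for s in shoppers if s["visits_per_year"] > 24]
rare = [s for s in shoppers if s["visits_per_year"] < 3]

def young_and_educated(s):
    """The feature being compared across the two classes."""
    return 20 <= s["age"] <= 40 and s["degree"]

frequent_share = share(frequent, young_and_educated)
rare_share = share(rare, young_and_educated)
print(f"frequent: {frequent_share:.0%}, rare: {rare_share:.0%}")
```

A large gap between the two shares suggests the feature discriminates well between the classes.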
MINING FREQUENT PATTERNS, ASSOCIATIONS, & CORRELATIONS
- Frequent patterns are patterns that occur frequently in data.
- A frequent itemset typically refers to a set of items that often appear together in a
transactional data set.
  - Example: milk and bread, which are frequently bought together in grocery stores by
  many customers.
- A frequently occurring subsequence, such as the pattern that customers tend to
purchase first a laptop, followed by a digital camera, and then a memory card, is a
(frequent) sequential pattern.
- A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that
may be combined with itemsets or subsequences.
- If a substructure occurs frequently, it is called a (frequent) structured pattern.
- Mining frequent patterns leads to the discovery of interesting associations and
correlations within data.
ASSOCIATION ANALYSIS
- As a marketing manager, you want to know which items are frequently purchased
together (i.e., within the same transaction).

- buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]

- X is a variable representing a customer

- A confidence, or certainty, of 50% means that if a customer buys a computer, there is a
50% chance that she will buy software as well.
- A 1% support means that 1% of all the transactions under analysis show that computer
and software are purchased together.
- This association rule involves a single attribute or predicate (i.e., buys) that repeats.
Association rules that contain a single predicate are referred to as single-dimensional
association rules.
- Dropping the predicate notation, the rule can be written simply as “computer ⇒
software [1%, 50%].”
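
Support and confidence can be computed directly from transaction counts: support is the fraction of all transactions that contain both sides of the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The five transactions below are hypothetical, so the resulting percentages differ from the 1% and 50% quoted above.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"software"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent (0.0 if the antecedent never occurs)."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    return count(antecedent | consequent, transactions) / n_antecedent

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"computer => software [support = {rule_support:.0%}, "
      f"confidence = {rule_confidence:.0%}]")
```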
ASSOCIATION ANALYSIS
- Suppose, instead, that we are given the AllElectronics relational database related to
purchases. A data mining system may find association rules like

- age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%,
confidence = 60%].

- The rule indicates that of the AllElectronics customers under study, 2% are 20 to 29 years
old with an income of $40,000 to $49,000 and have purchased a laptop (computer) at
AllElectronics.
- There is a 60% probability that a customer in this age and income group will purchase a
laptop.
- This is an association involving more than one attribute or predicate (i.e., age, income, and
buys).
- Adopting the terminology used in multidimensional databases, where each attribute is
referred to as a dimension, the above rule can be referred to as a multidimensional
association rule.
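
A multidimensional rule can be checked against relational rows by treating each predicate as a test on one attribute. The customer rows below are hypothetical, and the income bracket “40K..49K” is assumed here to mean 40,000 to 49,999.

```python
# Hypothetical customer rows (one row per customer).
customers = [
    {"age": 25, "income": 45_000, "bought_laptop": True},
    {"age": 28, "income": 42_000, "bought_laptop": True},
    {"age": 22, "income": 48_000, "bought_laptop": False},
    {"age": 35, "income": 45_000, "bought_laptop": True},
    {"age": 50, "income": 90_000, "bought_laptop": False},
]

def antecedent(c):
    """age(X, "20..29") AND income(X, "40K..49K")."""
    return 20 <= c["age"] <= 29 and 40_000 <= c["income"] <= 49_999

def consequent(c):
    """buys(X, "laptop")."""
    return c["bought_laptop"]

matches_antecedent = [c for c in customers if antecedent(c)]
matches_both = [c for c in matches_antecedent if consequent(c)]

# Support: share of all customers satisfying both sides of the rule.
rule_support = len(matches_both) / len(customers)
# Confidence: share of antecedent-matching customers who bought a laptop.
rule_confidence = (
    len(matches_both) / len(matches_antecedent) if matches_antecedent else 0.0
)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```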
CLASSIFICATION AND REGRESSION FOR PREDICTIVE ANALYSIS
- Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts.

- The model is derived based on the analysis of a set of training data (i.e., data objects
for which the class labels are known).

- The model is used to predict the class label of objects for which the class label is
unknown.

- The derived model may be represented in various forms, such as classification rules
(i.e., IF-THEN rules), decision trees, mathematical formulae, or neural networks.
CLASSIFICATION AND REGRESSION FOR PREDICTIVE ANALYSIS
- A decision tree is a flowchart-like tree structure, where each node denotes a test on an
attribute value, each branch represents an outcome of the test, and tree leaves
represent classes or class distributions.

- Decision trees can easily be converted to classification rules.

- A neural network, when used for classification, is typically a collection of neuron-like
processing units with weighted connections between the units.

- There are many other methods for constructing classification models, such as naïve
Bayesian classification, support vector machines, and k-nearest-neighbor classification.
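
The conversion from a decision tree to classification rules can be sketched as a walk over every root-to-leaf path. The tree below is a hand-written, hypothetical example (it is not learned from data); the attribute names and class labels are assumptions chosen for illustration.

```python
# A hypothetical decision tree as nested dicts: internal nodes test one
# attribute, branches are attribute values, and leaves are class labels.
tree = {
    "attribute": "price",
    "branches": {
        "low": "good response",
        "high": {
            "attribute": "brand",
            "branches": {"known": "mild response", "unknown": "no response"},
        },
    },
}

def classify(node, item):
    """Follow the branch matching the item's attribute value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][item[node["attribute"]]]
    return node

def to_rules(node, conditions=()):
    """Turn each root-to-leaf path into one IF-THEN rule."""
    if not isinstance(node, dict):
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules += to_rules(child, conditions + ((node["attribute"], value),))
    return rules

print(classify(tree, {"price": "high", "brand": "known"}))
for rule in to_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so the rule set and the tree classify every item identically.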
CLASSIFICATION AND REGRESSION FOR PREDICTIVE ANALYSIS
- Whereas classification predicts categorical (discrete, unordered) labels, regression
models continuous-valued functions.
- Regression is used to predict missing or unavailable numerical data values rather than
(discrete) class labels.
- Regression analysis is a statistical methodology that is most often used for numeric
prediction. Regression also encompasses the identification of distribution trends based
on the available data.
- Classification and regression may need to be preceded by relevance analysis, which
attempts to identify attributes that are significantly relevant to the classification and
regression process.
- Such attributes will be selected for the classification and regression process. Other
attributes, which are irrelevant, can then be excluded from consideration.
CLASSIFICATION AND REGRESSION
- Suppose you want to classify a large set of items in the store, based on three kinds of
responses to a sales campaign: good response, mild response, and no response.

- You want to derive a model for each of these three classes based on the descriptive
features of the items, such as price, brand, place made, type, and category.

- The resulting classification should maximally distinguish each class from the others,
presenting an organized picture of the data set.
CLASSIFICATION AND REGRESSION
- The resulting classification is expressed as a decision tree.
- The decision tree, for instance, may identify price as being the single factor that best
distinguishes the three classes.
- The tree may reveal that, in addition to price, other features that help to further
distinguish objects of each class from one another include brand and place made.
- Such a decision tree may help you understand the impact of the given sales campaign
and design a more effective campaign in the future.
CLASSIFICATION AND REGRESSION
- Suppose instead that you want to predict the amount of revenue that each item will
generate during an upcoming sale, based on the previous sales data.

- This is an example of regression analysis because the regression model constructed
will predict a continuous function (or ordered value).
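
A minimal sketch of numeric prediction is a straight line fitted by ordinary least squares. The slides do not say which features or which regression method would be used; the single predictor here (discount percentage) and the past revenue figures are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical previous sales: discount offered (%) and revenue per item.
past_discount = [0, 10, 20, 30]
past_revenue = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(past_discount, past_revenue)

# Predict a continuous value (revenue) for an upcoming sale at 40% discount.
predicted_revenue = slope * 40 + intercept
print(f"revenue = {slope:.1f} * discount + {intercept:.1f}")
print(f"predicted revenue at 40% discount: {predicted_revenue:.1f}")
```

The output is a number on a continuous scale rather than one of a fixed set of class labels, which is what separates this task from classification.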
CLUSTER ANALYSIS
- Unlike classification and regression, which analyze class-labeled (training) data sets,
clustering analyzes data objects without consulting class labels.

- In many cases, class-labeled data may simply not exist at the beginning. Clustering can
be used to generate class labels for a group of data.

- The objects are clustered or grouped based on the principle of maximizing the
intraclass similarity and minimizing the interclass similarity.

- That is, clusters of objects are formed so that objects within a cluster have high similarity
in comparison to one another, but are rather dissimilar to objects in other clusters.

- Each cluster so formed can be viewed as a class of objects, from which rules can be
derived.
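
The slides do not name a clustering algorithm. As one common choice, the sketch below runs k-means (Lloyd's algorithm) on one-dimensional data; the spending values and the initial centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical yearly spend per customer; no class labels are supplied.
spend = [100, 120, 110, 900, 950, 1000]
centers, clusters = kmeans_1d(spend, centers=[100, 1000])
print(centers)
print(clusters)
```

Values inside each resulting group are close to one another and far from the other group, and each group could then be given a label such as "low spender" or "high spender".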
CLUSTER ANALYSIS
- Cluster analysis can be performed on customer data to identify homogeneous
subpopulations of customers. These clusters may represent individual target groups for
marketing.
OUTLIER ANALYSIS
- A data set may contain objects that do not comply with the general behavior or model
of the data. These data objects are outliers.

- Many data mining methods discard outliers as noise or exceptions. However, in some
applications (e.g., fraud detection) the rare events can be more interesting than the
more regularly occurring ones. The analysis of outlier data is referred to as outlier
analysis or anomaly mining.
OUTLIER ANALYSIS
- Outliers may be detected using statistical tests that assume a distribution or probability
model for the data, or using distance measures where objects that are remote from any
other cluster are considered outliers.

- Rather than using statistical or distance measures, density-based methods may identify
outliers in a local region, although they look normal from a global statistical distribution
view.
OUTLIER ANALYSIS
- Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases
of unusually large amounts for a given account number in comparison to regular
charges incurred by the same account.

- Outlier values may also be detected with respect to the locations and types of
purchase, or the purchase frequency.
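
As a rough sketch of the statistical approach, new charges on an account can be compared with that account's own history using a z-score. The charge amounts and the threshold of 3 standard deviations are assumptions for illustration; real fraud detection uses much richer models.

```python
from statistics import mean, stdev

def flag_outliers(history, new_charges, threshold=3.0):
    """Return the new charges lying more than `threshold` sample standard
    deviations from the mean of the account's past charges.
    `history` needs at least two values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past charges identical: anything different is unusual.
        return [c for c in new_charges if c != mu]
    return [c for c in new_charges if abs(c - mu) / sigma > threshold]

# Hypothetical regular charges for one account, then two new charges.
history = [20.0, 25.0, 30.0, 22.0, 28.0]
suspicious = flag_outliers(history, [27.0, 500.0])
print(suspicious)
```

The same idea extends to other attributes mentioned above, such as purchase location or frequency, by computing the deviation on those values instead of the amount.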
ARE ALL PATTERNS INTERESTING?
- What makes a pattern interesting?

- Can a data mining system generate all of the interesting patterns?

- Can the system generate only the interesting ones?

ARE ALL PATTERNS INTERESTING?
- What makes a pattern interesting?
- A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with
some degree of certainty, (3) potentially useful, and (4) novel.

- A pattern is also interesting if it validates a hypothesis that the user sought to confirm.

- An interesting pattern represents knowledge.

- Objective measures of pattern interestingness are based on the structure of discovered patterns
and the statistics underlying them, for example accuracy and coverage for classification (IF-THEN) rules.

- Subjective interestingness measures are based on user beliefs in the data. These measures find
patterns interesting if the patterns are unexpected (contradicting a user’s belief) or offer strategic
information on which the user can act.
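
For an IF-THEN rule, coverage is commonly defined as the fraction of all records whose attributes satisfy the rule's condition, and accuracy as the fraction of those covered records that the rule labels correctly. The sketch below applies these definitions to a hypothetical rule and a made-up labeled data set.

```python
def rule_quality(condition, predicted_class, data):
    """Return (coverage, accuracy) of the rule IF condition THEN predicted_class."""
    covered = [row for row in data if condition(row)]
    correct = [row for row in covered if row["label"] == predicted_class]
    coverage = len(covered) / len(data)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical labeled items.
data = [
    {"price": "low", "label": "good response"},
    {"price": "low", "label": "good response"},
    {"price": "low", "label": "no response"},
    {"price": "high", "label": "no response"},
]

# Rule under evaluation: IF price = low THEN class = good response.
coverage, accuracy = rule_quality(
    lambda row: row["price"] == "low", "good response", data
)
print(f"coverage = {coverage:.0%}, accuracy = {accuracy:.0%}")
```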
ARE ALL PATTERNS INTERESTING?
- Can a data mining system generate all of the interesting patterns?
- This question refers to the completeness of a data mining algorithm. It is often
unrealistic and inefficient for data mining systems to generate all possible patterns.

- Instead, user-provided constraints and interestingness measures should be used to
focus the search.

- Association rule mining is an example where the use of constraints and interestingness
measures can ensure the completeness of mining.
ARE ALL PATTERNS INTERESTING?
- Can the system generate only the interesting ones?
- This is an optimization problem in data mining.

- It is highly desirable for data mining systems to generate only interesting patterns.

- This would be efficient for users and data mining systems because neither would have
to search through the patterns generated to identify the truly interesting ones.
ARE ALL PATTERNS INTERESTING?
- Measures of pattern interestingness are essential for the efficient discovery of patterns
by target users.

- Such measures can be used after the data mining step to rank the discovered patterns
according to their interestingness, filtering out the uninteresting ones.

- More important, such measures can be used to guide and constrain the discovery
process, improving the search efficiency by pruning away subsets of the pattern space
that do not satisfy prespecified interestingness constraints.
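
One well-known instance of constraint-guided pruning is the level-wise (Apriori-style) search for frequent itemsets: an itemset is only counted if all of its smaller subsets already met the minimum support count. The slides do not name this algorithm, so treat the sketch below as an illustrative choice; the transactions and the threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Level-wise search that only extends itemsets meeting min_count."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # Count each candidate and keep those that satisfy the constraint.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        # Next level: merge surviving itemsets into candidates one item larger.
        size = len(next(iter(survivors))) + 1 if survivors else 0
        candidates = {
            a | b for a, b in combinations(survivors, 2) if len(a | b) == size
        }
        # Pruning: drop any candidate that has a subset which did not survive.
        current = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        ]
    return frequent

# Hypothetical grocery transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "eggs"},
]
result = frequent_itemsets(transactions, min_count=2)
for itemset, n in result.items():
    print(sorted(itemset), n)
```

Here the three-item candidate is never counted, because one of its two-item subsets already failed the constraint; that skipped work is the efficiency gain the last bullet describes.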
WHICH TECHNOLOGIES ARE USED?