BA 2023 - 2024 T03 Descriptive Data Mining

The document discusses descriptive data mining techniques used to identify relationships between observations without an outcome variable. It focuses on cluster analysis, which segments observations into similar groups based on measured variables. Two common clustering methods are hierarchical clustering, which sequentially merges the most similar clusters, and k-means clustering, which assigns observations to clusters to maximize similarity within clusters. The document also discusses different ways to measure similarity between observations, including Euclidean distance, Manhattan distance, and similarity coefficients for categorical variables.

Business Analytics | 2023-2024

Descriptive data mining

João Lourenço
joao.lourenco@tecnico.ulisboa.pt

Source: Camm, J. D., Cochran, J. J., Fry, M. J., & Ohlmann, J. W. (2021). Business Analytics (4th ed.). Boston, MA: Cengage.
Introduction

The increase in the use of data-mining techniques in
business has been caused largely by three events:
• The explosion in the amount of data being produced and electronically
tracked.
• The ability to electronically warehouse these data.
• The affordability of computer power to analyze the data.

Academic Year 2023/2024 Business Analytics 2

Introduction

• Observation: Set of recorded values of variables
associated with a single entity.
• Unsupervised learning: A descriptive data-mining
technique used to identify relationships between
observations.
• Thought of as high-dimensional descriptive analytics.
• There is no outcome variable to predict; instead, qualitative
assessments are used to assess and compare the results.


Cluster Analysis
Measuring Similarity Between Observations
Hierarchical Clustering
k-Means Clustering
Hierarchical Clustering versus k-Means Clustering


Cluster Analysis
• Goal of clustering is to segment observations into similar
groups based on observed variables.
• Can be employed during the data-preparation step to
identify variables or observations that can be aggregated or
removed from consideration.
• Commonly used in marketing to divide customers into
different homogeneous groups; known as market
segmentation.
• Used to identify outliers.


Cluster Analysis
• Clustering methods:
• Bottom-up hierarchical clustering starts with each observation
belonging to its own cluster and then sequentially merges the most
similar clusters to create a series of nested clusters.
• k-means clustering assigns each observation to one of k clusters in
a manner such that the observations assigned to the same cluster
are as similar as possible.
• Both methods depend on how similar two observations
are; hence, we have to measure similarity between
observations.
Measuring Similarity Between Observations:
When observations include numeric variables, Euclidean distance is
the most common method to measure dissimilarity between
observations.
Let observations $u = (u_1, u_2, \ldots, u_q)$ and $v = (v_1, v_2, \ldots, v_q)$ each comprise
measurements of q variables.
The Euclidean distance between observations u and v is:

$$d_{uv} = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_q - v_q)^2}$$

Cluster Analysis
Measuring Similarity Between Observations:
Illustration:
• KTC is a financial advising company that provides personalized
financial advice to its clients.
• KTC would like to segment its customers into several groups (or
clusters) so that the customers within a group are similar with
respect to key characteristics and dissimilar to customers outside the group.
• For each customer, KTC has an observation of seven variables: Age,
Female, Income, Married, Children, Car Loan, Mortgage.
Example: The observation u = (61, 0, 57881, 1, 2, 0, 0) corresponds
to a 61-year-old male with an annual income of $57,881, married
with two children, but no car loan and no mortgage.


Cluster Analysis

Figure 1: Euclidean Distance

Euclidean distance becomes smaller
as a pair of observations become
more similar with respect to their
variable values.


Cluster Analysis
Let observation u = (23, $20,375) correspond to a 23-year-old
customer with an annual income of $20,375 and observation v =
(36, $19,475) correspond to a 36-year-old with an annual income
of $19,475. As measured by Euclidean distance, the dissimilarity
between these two observations is

$$d_{uv} = \sqrt{(23 - 36)^2 + (20{,}375 - 19{,}475)^2} \approx 900.09$$

We see that when using the raw variable
values, the amount of dissimilarity
between observations is dominated by the
Income variable because of the difference
in the magnitude of the measurements.
For the data we are using, the standardized (or normalized) values of observations u and v are
(−1.76, −0.56) and (−0.76, −0.62), respectively. The dissimilarity between these two observations
based on standardized values is

$$d_{uv} = \sqrt{(-1.76 - (-0.76))^2 + (-0.56 - (-0.62))^2} \approx 1.00$$

Observe that observations u and v are
actually much more different in age than
in income.
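
The raw and standardized comparisons on this slide can be reproduced with a short Python sketch. The function name `euclidean_distance` is illustrative and not taken from the source; the inputs are the (Age, Income) values quoted above.

```python
from math import sqrt


def euclidean_distance(u, v):
    """Return the Euclidean distance between two equal-length observations."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Raw (Age, Income) values: the income gap dominates the result.
raw_distance = euclidean_distance((23, 20375), (36, 19475))

# Standardized values quoted on the slide: age now drives the distance.
standardized_distance = euclidean_distance((-1.76, -0.56), (-0.76, -0.62))

print(round(raw_distance, 2))           # 900.09
print(round(standardized_distance, 2))  # 1.0
```

Almost all of the raw distance comes from the $900 income gap, while almost all of the standardized distance comes from the age gap.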


Cluster Analysis
• Euclidean distance is highly influenced by the scale on which
variables are measured:
• Common to standardize the units of each variable j of each
observation u.
Example: $u_j$, the value of variable j in observation u, is replaced with
its z-score $z_j$.
• The conversion to z-scores also makes it easier to identify outlier
measurements, which can distort the Euclidean distance between
observations.
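
A minimal sketch of the z-score conversion, assuming the sample standard deviation is used (the slides do not say which form is intended). The ages below are made-up illustrative values, not the KTC data, and `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    center = mean(values)
    spread = stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(x - center) / spread for x in values]


ages = [23, 36, 61, 48, 52]  # hypothetical ages, for illustration only
standardized_ages = z_scores(ages)
print([round(z, 2) for z in standardized_ages])
```

After the conversion every variable has mean 0 and standard deviation 1, so no single variable dominates a distance calculation just because of its units.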


Measuring Similarity Between Observations (cont.):
Manhattan distance is a dissimilarity measure that is more robust to
outliers than Euclidean distance. The Manhattan distance between
observations u and v is

$$d_{uv}^{L1} = |u_1 - v_1| + |u_2 - v_2| + \cdots + |u_q - v_q|$$


Cluster Analysis
Figure 2: Manhattan distance

From Figure 2, we observe that the
Manhattan distance between two
observations is the sum of the lengths of
the perpendicular line segments
connecting observations u and v.
In contrast to Euclidean distance, which
corresponds to the straight-line “as the
crow flies” segment between two
observations, Manhattan distance
corresponds to the distance as if
travelled along rectangular city blocks.


Cluster Analysis

The Manhattan distance between the standardized
observations u = (–1.77, –0.56) and v = (0.15, –0.62)
is

$$|-1.77 - 0.15| + |-0.56 - (-0.62)| = 1.92 + 0.06 = 1.98$$

Note: After conversion to z-scores, unequal weighting of variables can also be considered
by multiplying the variables of each observation by a selected set of weights.
For instance, after standardizing the units on customer observations so that income and
age are expressed as their respective z-scores (instead of expressed in dollars and years),
we can multiply the income z-scores by 2 if we wish to treat income with twice the
importance of age. Therefore, standardizing removes bias due to the difference in
measurement units, and variable weighting allows the analyst to introduce any desired bias
based on the business context.
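
The weighting idea in the note can be sketched in Python. The helper `manhattan_distance` and its `weights` parameter are illustrative names, not from the source. For Manhattan distance, weighting a variable's term by 2 gives the same result as multiplying that variable's z-scores by 2 before computing the distance.

```python
def manhattan_distance(u, v, weights=None):
    """Sum of absolute differences, optionally weighted per variable."""
    if weights is None:
        weights = [1.0] * len(u)
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


# Standardized (Age, Income) observations quoted on the slide.
u = (-1.77, -0.56)
v = (0.15, -0.62)

unweighted = manhattan_distance(u, v)
# Treat income as twice as important as age.
income_doubled = manhattan_distance(u, v, weights=(1.0, 2.0))

print(round(unweighted, 2))      # 1.98
print(round(income_doubled, 2))  # 2.04
```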


Cluster Analysis
• When clustering observations solely on the basis of categorical variables
encoded as 0–1, a better measure of similarity between two observations
can be achieved by counting the number of variables with matching values.
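• The simplest overlap measure is called the matching coefficient and is
computed as:

$$\text{matching coefficient} = \frac{\text{number of variables with matching value for observations } u \text{ and } v}{\text{total number of variables}}$$
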

Cluster Analysis
• A weakness of the matching coefficient is that if two observations
both have a 0 entry for a categorical variable, this is counted as a
sign of similarity between the two observations.
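• To avoid misstating similarity due to the absence of a feature, a
similarity measure called Jaccard’s coefficient does not count
matching zero entries and is computed as:

$$\text{Jaccard's coefficient} = \frac{\text{number of variables with matching nonzero value for observations } u \text{ and } v}{\text{total number of variables} - \text{number of variables with matching zero values for observations } u \text{ and } v}$$
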

Cluster Analysis

Table 1: Comparison of Similarity Matrixes for Observations with Binary Variables
Observation  Female  Married  Loan  Mortgage
1            1       0        0     0
2            0       1        1     1
3            1       1        1     0
4            1       1        0     0
5            1       1        0     0


Cluster Analysis

Table 1: Comparison of Similarity Matrixes for Observations with Binary Variables (cont.)
• Similarity Matrix Based on Matching Coefficient:
Observation  1     2     3     4  5
1            1
2            0     1
3            0.5   0.5   1
4            0.75  0.25  0.75  1
5            0.75  0.25  0.75  1  1


Cluster Analysis

Table 1: Comparison of Similarity Matrixes for Observations with Binary Variables (cont.)
• Similarity Matrix Based on Jaccard’s Coefficient:
Observation  1      2     3      4  5
1            1
2            0      1
3            0.333  0.5   1
4            0.5    0.25  0.667  1
5            0.5    0.25  0.667  1  1
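
Both similarity matrices can be recomputed from the five binary observations in Table 1. The sketch below is illustrative: the function names are not from the source, and the handling of two all-zero observations is an assumed convention (that case does not occur in Table 1).

```python
def matching_coefficient(u, v):
    """Share of variables on which two binary observations agree."""
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


def jaccard_coefficient(u, v):
    """Share of agreeing 1s, ignoring variables where both entries are 0."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    remaining = len(u) - both_zero
    # Assumption: two all-zero observations are given similarity 0.
    return both_one / remaining if remaining else 0.0


# (Female, Married, Loan, Mortgage) rows from Table 1.
observations = {
    1: (1, 0, 0, 0),
    2: (0, 1, 1, 1),
    3: (1, 1, 1, 0),
    4: (1, 1, 0, 0),
    5: (1, 1, 0, 0),
}

# Print the lower triangle of the Jaccard similarity matrix.
for i in observations:
    row = [round(jaccard_coefficient(observations[i], observations[j]), 3)
           for j in observations if j <= i]
    print(i, row)
```

Observations 1 and 4 illustrate the difference: they share two matching zeros (Loan, Mortgage), which raise the matching coefficient to 0.75 but are ignored by Jaccard's coefficient, giving 0.5.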


Cluster Analysis
Matching distance:
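Subtracting the matching coefficient from 1 results in a distance measure
for binary variables. The matching distance between observations u and v
(consisting entirely of binary variables) is

$$d_{uv}^{M} = 1 - \text{matching coefficient} = \frac{\text{number of variables with mismatching values for observations } u \text{ and } v}{\text{total number of variables}}$$
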

Cluster Analysis
Jaccard distance:
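Subtracting Jaccard’s coefficient from 1 results in the Jaccard distance
measure for binary variables. That is, the Jaccard distance between
observations u and v (consisting entirely of binary variables) is

$$d_{uv}^{J} = 1 - \text{Jaccard's coefficient} = \frac{\text{number of variables with mismatching values for observations } u \text{ and } v}{\text{total number of variables} - \text{number of variables with matching zero values for observations } u \text{ and } v}$$
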

Cluster Analysis
Hierarchical Clustering:
• Determines the similarity of two clusters by considering the similarity
between the observations composing either cluster.
• Starts with each observation in its own cluster and then iteratively combines
the two clusters that are the most similar into a single cluster.
• Given a way to measure similarity between observations, there are several
clustering method alternatives for comparing observations in two clusters to
obtain a cluster similarity measure:
• Single linkage.
• Complete linkage.
• Group average linkage.
• Median linkage.
• Centroid linkage.
Cluster Analysis
• Single linkage: The similarity between two clusters is defined by the similarity
of the pair of observations (one from each cluster) that are the most similar.
• Complete linkage: This clustering method defines the similarity between two
clusters as the similarity of the pair of observations (one from each cluster)
that are the most different.
• Group Average linkage: Defines the similarity between two clusters to be the
average similarity computed over all pairs of observations between the two
clusters.
• Median linkage: Analogous to group average linkage except that it uses the
median of the similarities computed between all pairs of observations
between the two clusters.
• Centroid linkage uses the averaging concept of cluster centroids to define
between-cluster similarity.
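
The linkage definitions above can be compared on a toy example. This sketch works with Euclidean distances rather than similarities, so the "most similar" pair is the one with the smallest distance. The two clusters are made-up points and all function names are illustrative.

```python
from itertools import product
from math import dist
from statistics import mean, median


def pairwise_distances(cluster_a, cluster_b):
    """Distances for every pair with one observation from each cluster."""
    return [dist(u, v) for u, v in product(cluster_a, cluster_b)]


def centroid(points):
    """Component-wise average of the observations in a cluster."""
    return tuple(mean(coords) for coords in zip(*points))


def single_linkage(a, b):
    return min(pairwise_distances(a, b))   # closest pair


def complete_linkage(a, b):
    return max(pairwise_distances(a, b))   # most different pair


def group_average_linkage(a, b):
    return mean(pairwise_distances(a, b))  # average over all pairs


def median_linkage(a, b):
    return median(pairwise_distances(a, b))  # median over all pairs


def centroid_linkage(a, b):
    return dist(centroid(a), centroid(b))  # distance between centroids


cluster_a = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical observations
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))         # 2.0
print(complete_linkage(cluster_a, cluster_b))       # 5.0
print(group_average_linkage(cluster_a, cluster_b))  # 3.5
print(centroid_linkage(cluster_a, cluster_b))       # 3.5
```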
Cluster Analysis

Figure 3: Measuring Similarity Between Clusters


Cluster Analysis
• Ward’s method merges two clusters such that the
dissimilarity of the observations within the resulting single
cluster increases as little as possible.
• When McQuitty’s method considers merging two clusters A
and B, the dissimilarity of the resulting cluster AB to any other
cluster C is calculated as: ((dissimilarity between A and C) +
(dissimilarity between B and C)) / 2.
• A dendrogram is a chart that depicts the set of nested
clusters resulting at each step of aggregation.
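
A dendrogram records the sequence of merges and the dissimilarity at which each merge happens. The sketch below builds that sequence bottom-up for the five binary observations of Table 1, using matching distance and single linkage. That pairing is an assumption made to keep the example short; the KTC dendrogram in the slides uses group average linkage, and all names here are illustrative.

```python
from itertools import combinations


def matching_distance(u, v):
    """Share of variables on which two binary observations disagree."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


def single_linkage(cluster_a, cluster_b, data):
    """Smallest distance between one member of each cluster."""
    return min(matching_distance(data[i], data[j])
               for i in cluster_a for j in cluster_b)


def agglomerate(data):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [frozenset([key]) for key in data]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], data))
        height = single_linkage(a, b, data)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((set(a), set(b), height))
    return merges


# (Female, Married, Loan, Mortgage) rows from Table 1.
observations = {
    1: (1, 0, 0, 0),
    2: (0, 1, 1, 1),
    3: (1, 1, 1, 0),
    4: (1, 1, 0, 0),
    5: (1, 1, 0, 0),
}

merges = agglomerate(observations)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Observations 4 and 5 are identical, so they merge first at distance 0; observation 2 joins last because it is the least similar to everything else.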


Cluster Analysis
Figure 4: Dendrogram for KTC Using Matching Coefficients and Group Average Linkage


Cluster Analysis
Three clusters

Composition of these three clusters

Cluster 1: {4, 5, 6, 11, 19, 28, 1, 7, 21, 22, 23, 30, 13, 17, 18, 15, 27}
Mix of males and females, 15 out of 17 married, no car loans, 5 out of 17 with
mortgages

Cluster 2: {2, 26, 8, 10, 20, 25}
All males with car loans, 5 out of 6 married, 2 out of 6 with mortgages

Cluster 3: {3, 9, 14, 16, 12, 24, 29}
All females with car loans, 4 out of 7 married, 5 out of 7 with mortgages
Cluster Analysis

k-Means Clustering:
• Given a value of k, the k-means algorithm randomly assigns each
observation to one of the k clusters.
• After all observations have been assigned to a cluster, the resulting
cluster centroids are calculated.
• Using the updated cluster centroids, all observations are reassigned to
the cluster with the closest centroid.
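
The three steps above can be sketched in pure Python. This is a minimal illustration, not a production implementation: the names `k_means` and `centroid`, the `seed` and `max_iter` parameters, the rule for re-seeding an empty cluster, and the stopping rule (repeat until assignments stop changing) are all assumptions, and the six points are made up.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(observations, k, seed=0, max_iter=100):
    """Return one cluster label (0 .. k-1) per observation."""
    rng = random.Random(seed)
    # Step 1: randomly assign each observation to one of the k clusters.
    labels = [rng.randrange(k) for _ in observations]
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(observations, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random observation.
            centroids.append(centroid(members) if members
                             else rng.choice(observations))
        # Step 3: reassign every observation to the closest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in observations]
        if new_labels == labels:  # stop once assignments no longer change
            break
        labels = new_labels
    return labels


# Two well-separated hypothetical groups.
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0),
          (10.0, 10.0), (10.5, 10.5), (11.0, 11.0)]
labels = k_means(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.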


Cluster Analysis

Figure 5: Clustering Observations by Age and Income Using k-Means Clustering with k = 3


Cluster Analysis
Three clusters

Cluster 1 is characterized by relatively younger, lower-income customers (Cluster 1’s
centroid is at [33 years, $20,364]).

Cluster 2 is characterized by relatively older, higher-income customers (Cluster 2’s
centroid is at [58 years, $47,729]).

Cluster 3 is characterized by relatively older, lower-income customers (Cluster 3’s
centroid is at [53 years, $21,416]).


Cluster Analysis
Table 2: Average Distances Within Clusters
           No. of Observations  Average Distance Between Observations in Cluster
Cluster 1  12                   0.622
Cluster 2  8                    0.739
Cluster 3  10                   0.520

Table 3: Distances Between Cluster Centroids

Cluster 1 Cluster 2 Cluster 3
Cluster 1 0 2.784 1.529
Cluster 2 2.784 0 1.964
Cluster 3 1.529 1.964 0


Cluster Analysis

Hierarchical Clustering versus k-Means Clustering

Hierarchical Clustering:
• Suitable when we have a small data set (e.g., fewer than 500 observations) and want to easily
examine solutions with increasing numbers of clusters.
• Convenient method if you want to observe how clusters are nested.
k-Means Clustering:
• Suitable when you know how many clusters you want and you have a larger data set (e.g., more
than 500 observations).
• Partitions the observations, which is appropriate if trying to summarize the data with k “average”
observations that describe the data with the minimum amount of error.


Association Rules
Evaluating Association Rules


Association Rules
• Association rules: If-then statements which convey the likelihood of
certain items being purchased together.
• Although association rules are an important tool in market basket
analysis, they are also applicable to other disciplines.
• Antecedent: The collection of items (or item set) corresponding to the if
portion of the rule.
• Consequent: The item set corresponding to the then portion of the rule.
• Support count of an item set: Number of transactions in the data that
include that item set.


Association Rules
Table 4: Shopping-Cart Transactions
Transaction Shopping Cart
1 bread, peanut butter, milk, fruit, jelly
2 bread, jelly, soda, potato chips, milk, fruit, vegetables, peanut butter
3 whipped cream, fruit, chocolate sauce, beer
4 steak, jelly, soda, potato chips, bread, fruit
5 jelly, soda, peanut butter, milk, fruit
6 jelly, soda, potato chips, milk, bread, fruit
7 fruit, soda, potato chips, milk
8 fruit, soda, peanut butter, milk
9 fruit, cheese, yogurt
10 yogurt, vegetables, beer
• Investigating the rule “if {bread, jelly}, then {peanut butter}” we see the support count of
{bread, jelly, peanut butter} is 2.
Association Rules
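• Confidence: Helps identify reliable association rules:

$$\text{confidence} = \frac{\text{support of \{antecedent and consequent\}}}{\text{support of antecedent}}$$

• Lift ratio: Measure to evaluate the efficiency of a rule:

$$\text{lift ratio} = \frac{\text{confidence}}{\text{support of consequent} \,/\, \text{total number of transactions}}$$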

• For the data in Table 4, the rule “if {bread, jelly}, then {peanut butter}”
has confidence = 2/4 = 0.5 and a lift ratio = 0.5/(4/10) = 1.25.
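
The support count, confidence, and lift ratio for this rule can be checked directly against the ten shopping carts in Table 4. The function names below are illustrative, not from the source.

```python
# Shopping carts from Table 4, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "peanut butter", "milk", "fruit", "jelly"},
    {"bread", "jelly", "soda", "potato chips", "milk", "fruit",
     "vegetables", "peanut butter"},
    {"whipped cream", "fruit", "chocolate sauce", "beer"},
    {"steak", "jelly", "soda", "potato chips", "bread", "fruit"},
    {"jelly", "soda", "peanut butter", "milk", "fruit"},
    {"jelly", "soda", "potato chips", "milk", "bread", "fruit"},
    {"fruit", "soda", "potato chips", "milk"},
    {"fruit", "soda", "peanut butter", "milk"},
    {"fruit", "cheese", "yogurt"},
    {"yogurt", "vegetables", "beer"},
]


def support_count(item_set):
    """Number of transactions that contain every item in item_set."""
    return sum(1 for cart in TRANSACTIONS if item_set <= cart)


def confidence(antecedent, consequent):
    """Support of antecedent and consequent together over support of antecedent."""
    return support_count(antecedent | consequent) / support_count(antecedent)


def lift_ratio(antecedent, consequent):
    """Confidence divided by the consequent's share of all transactions."""
    baseline = support_count(consequent) / len(TRANSACTIONS)
    return confidence(antecedent, consequent) / baseline


rule_if = {"bread", "jelly"}
rule_then = {"peanut butter"}
print(support_count(rule_if | rule_then))         # 2
print(confidence(rule_if, rule_then))             # 0.5
print(round(lift_ratio(rule_if, rule_then), 2))   # 1.25
```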


Association Rules

• This measure of confidence can be viewed as the conditional probability
that the consequent item set occurs given that the antecedent item set
occurs.
• A high value of confidence suggests a rule in which the consequent is
frequently true when the antecedent is true, but a high value of
confidence can be misleading.


Association Rules

• A lift ratio greater than 1 suggests that there is some usefulness to the rule
and that it is better at identifying cases when the consequent occurs than
no rule at all.
• For the data in Table 4, the rule “if {bread, jelly}, then {peanut butter}” has
confidence = 2/4 = 0.5 and a lift ratio = 0.5/(4/10) = 1.25.
• A lift ratio of 1.25 means the rule is 25% better at identifying transactions that contain
peanut butter than guessing at random.


Association Rules
Table 5: Association Rules for Hy-Vee
Antecedent (A)       Consequent (C)  Support for A  Support for C  Support for A & C  Confidence (%)  Lift Ratio
Bread                Fruit, Jelly    4              5              4                  100.0           2.00
Bread                Jelly           4              5              4                  100.0           2.00
Bread, Fruit         Jelly           4              5              4                  100.0           2.00
Fruit, Jelly         Bread           5              4              4                  80.0            2.00
Jelly                Bread           5              4              4                  80.0            2.00
Jelly                Bread, Fruit    5              4              4                  80.0            2.00
Fruit, Potato Chips  Soda            4              6              4                  100.0           1.67
Peanut Butter        Milk            4              6              4                  100.0           1.67
Peanut Butter        Milk, Fruit     4              6              4                  100.0           1.67


Association Rules
Table 5: Association Rules for Hy-Vee (continued)
Antecedent (A)        Consequent (C)        Support for A  Support for C  Support for A & C  Confidence (%)  Lift Ratio
Peanut Butter, Fruit  Milk                  4              6              4                  100.0           1.67
Potato Chips          Fruit, Soda           4              6              4                  100.0           1.67
Potato Chips          Soda                  4              6              4                  100.0           1.67
Fruit, Soda           Potato Chips          6              4              4                  66.7            1.67
Milk                  Peanut Butter         6              4              4                  66.7            1.67
Milk                  Peanut Butter, Fruit  6              4              4                  66.7            1.67
Milk, Fruit           Peanut Butter         6              4              4                  66.7            1.67
Soda                  Fruit, Potato Chips   6              4              4                  66.7            1.67


Association Rules
Table 5: Association Rules for Hy-Vee (continued)
Antecedent (A)  Consequent (C)  Support for A  Support for C  Support for A & C  Confidence (%)  Lift Ratio
Soda            Potato Chips    6              4              4                  66.7            1.67
Fruit, Soda     Milk            6              6              5                  83.3            1.39
Milk            Fruit, Soda     6              6              5                  83.3            1.39
Milk            Soda            6              6              5                  83.3            1.39
Milk, Fruit     Soda            6              6              5                  83.3            1.39
Soda            Milk            6              6              5                  83.3            1.39
Soda            Milk, Fruit     6              6              5                  83.3            1.39


Association Rules
Evaluating Association Rules:
• An association rule is ultimately judged on how actionable it is and how
well it explains the relationship between item sets.
• For example, Walmart mined its transactional data to uncover strong
evidence of the association rule, “If a customer purchases a Barbie doll,
then a customer also purchases a candy bar.”
• An association rule is useful if it is well supported and explains an
important previously unknown relationship.


Text Mining
Voice of the Customer at Triad Airline
Preprocessing Text Data for Analysis
Movie Reviews


Text Mining
• Text, like numerical data, may contain information that can help solve
problems and lead to better decisions.
• Text mining is the process of extracting useful information from text
data.
• Text data is often referred to as unstructured data because in its raw
form, it cannot be stored in a traditional structured database (rows and
columns).
• Audio and video data are also examples of unstructured data.
• Data mining with text data is more challenging than data mining with
traditional numerical data, because it requires more preprocessing to
convert the text to a format amenable for analysis.


Text Mining
Voice of the Customer at Triad Airline:
• Triad solicits feedback from its customers through a follow-up e-mail
the day after the customer has completed a flight.
• Survey asks the customer to rate various aspects of the flight and asks
the respondent to type comments into a dialog box in the e-mail;
includes:
• Quantitative feedback from the ratings.
• Comments entered by the respondents which need to be analyzed.
• A collection of text documents to be analyzed is called a corpus.


Text Mining
Table 6: Ten Respondents’ Concerns for Triad Airlines
Concerns
The wi-fi service was horrible. It was slow and cut off several times.
My seat was uncomfortable.
My flight was delayed 2 hours for no apparent reason.
My seat would not recline.
The man at the ticket counter was rude. Service was horrible.
The flight attendant was rude. Service was bad.
My flight was delayed with no explanation.
My drink spilled when the guy in front of me reclined his seat.
My flight was canceled.
The arm rest of my seat was nasty.


Text Mining
Voice of the Customer at Triad Airline:
• To be analyzed, text data needs to be converted to structured data (rows
and columns of numerical data) so that the tools of descriptive statistics,
data visualization and data mining can be applied.
• Think of converting a group of documents into a matrix of rows and
columns where the rows correspond to a document and the columns
correspond to a particular word.
• A presence/absence or binary term-document matrix is a matrix with the
rows representing documents and the columns representing words.
• Entries in the columns indicate either the presence or the absence of a particular
word in a particular document.


Text Mining
Voice of the Customer at Triad Airline (continued):
• Creating the list of terms to use in the presence/absence matrix can be a
complicated matter:
• Too many terms result in a matrix with many columns, which may be difficult to
manage and could yield meaningless results.
• Too few terms may miss important relationships.
• Term frequency along with the problem context are often used as a guide.
• In Triad’s case, management used word frequency and the context of
having a goal of satisfied customers to come up with the following list of
terms they feel are relevant for categorizing the respondent’s comments:
delayed, flight, horrible, recline, rude, seat, and service.


Text Mining
Table 7: The Presence/Absence Term-Document Matrix for Triad Airlines
Term
Document Delayed Flight Horrible Recline Rude Seat Service
1 0 0 1 0 0 0 1
2 0 0 0 0 0 1 0
3 1 1 0 0 0 0 0
4 0 0 0 1 0 1 0
5 0 0 1 0 1 0 1
6 0 1 0 0 1 0 1
7 1 1 0 0 0 0 0
8 0 0 0 1 0 1 0
9 0 1 0 0 0 0 0
10 0 0 0 0 0 1 0
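
Table 7 can be rebuilt from the ten comments in Table 6 and management's seven terms. The sketch below is illustrative: `presence_row` is a hypothetical helper, and prefix matching is used as a crude stand-in for stemming so that "reclined" counts as "recline".

```python
import re

TERMS = ["delayed", "flight", "horrible", "recline", "rude", "seat", "service"]

COMMENTS = [
    "The wi-fi service was horrible. It was slow and cut off several times.",
    "My seat was uncomfortable.",
    "My flight was delayed 2 hours for no apparent reason.",
    "My seat would not recline.",
    "The man at the ticket counter was rude. Service was horrible.",
    "The flight attendant was rude. Service was bad.",
    "My flight was delayed with no explanation.",
    "My drink spilled when the guy in front of me reclined his seat.",
    "My flight was canceled.",
    "The arm rest of my seat was nasty.",
]


def presence_row(comment, terms=TERMS):
    """Return 1/0 flags marking which terms occur in the comment."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    # Assumption: prefix matching stands in for stemming.
    return [int(any(tok.startswith(term) for tok in tokens)) for term in terms]


matrix = [presence_row(comment) for comment in COMMENTS]
for doc_id, row in enumerate(matrix, start=1):
    print(doc_id, row)
```

Each printed row corresponds to one document in Table 7, with the columns in the order Delayed, Flight, Horrible, Recline, Rude, Seat, Service.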


Text Mining
Preprocessing Text Data for Analysis:
• The text-mining process converts unstructured text into numerical data
and applies quantitative techniques.
• Which terms become the headers of the columns of the term-document
matrix can greatly impact the analysis.
• Tokenization is the process of dividing text into separate terms, referred to
as tokens:
• Symbols and punctuation must be removed from the document, and all
letters should be converted to lowercase.
• Different forms of the same word, such as “stacking,” “stacked,” and
“stack” probably should not be considered as distinct terms.
• Stemming is the process of converting a word to its stem or root word.
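
Tokenization and stemming can be illustrated with a few lines of Python. The suffix-stripping rule below is a deliberately crude toy, not an established stemming algorithm, and both function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Stacking, stacked & STACK!")
stems = [crude_stem(tok) for tok in tokens]
print(tokens)  # ['stacking', 'stacked', 'stack']
print(stems)   # ['stack', 'stack', 'stack']
```

After stemming, the three word forms collapse into a single term, so they would share one column in a term-document matrix.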


Text Mining
Preprocessing Text Data for Analysis (continued):
• The goal of preprocessing is to generate a list of most-relevant terms that is
sufficiently small so as to lend itself to analysis:
• Frequency can be used to eliminate words from consideration as tokens.
• Low-frequency words probably will not be very useful as tokens.
• Consolidating words that are synonyms can reduce the set of tokens.
• Most text-mining software gives the user the ability to manually specify
terms to include or exclude as tokens.
• The use of slang, humor, and sarcasm can cause interpretation problems and
might require more sophisticated data cleansing and subjective intervention
on the part of the analyst to avoid misinterpretation.
• Data preprocessing parses the original text data down to the set of tokens
deemed relevant for the topic being studied.
Text Mining
Preprocessing Text Data for Analysis (continued):
• When the documents in a corpus contain many words and when the
frequency of word occurrence is important to the context of the
business problem, preprocessing can be used to develop a frequency
term-document matrix.
• A frequency term-document matrix is a matrix whose rows represent
documents and columns represent tokens, and the entries in the
matrix are the frequency of occurrence of each token in each
document.
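
A frequency term-document matrix differs from the presence/absence version only in that it stores counts. A minimal sketch, assuming the token list has already been chosen; the two review snippets are invented for illustration and `frequency_matrix` is a hypothetical name.

```python
import re
from collections import Counter


def frequency_matrix(documents, tokens):
    """One row per document, one column per token, entries are counts."""
    rows = []
    for text in documents:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        rows.append([counts[token] for token in tokens])
    return rows


# Hypothetical review snippets, not taken from the source.
reviews = [
    "Great cast, great action, great fun.",
    "Terrible plot and terrible pacing, but great effects.",
]
tokens = ["great", "terrible"]

print(frequency_matrix(reviews, tokens))  # [[3, 0], [1, 2]]
```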


Text Mining
Movie Reviews:
• A new action film has been released, and we now have a sample of 10
reviews from movie critics.
• Using preprocessing techniques, we have reduced the number of
tokens to only two: “great” and “terrible.”
• Table 8 displays the corresponding frequency term-document matrix.
• To demonstrate the analysis of a frequency term-document matrix with
descriptive data mining, we apply k-means clustering with k = 2 to the
frequency term-document matrix to obtain the two clusters in Figure 6.


Text Mining
Table 8: The Frequency Term-Document Matrix for Movie Reviews
Term
Document Great Terrible
1 5 0
2 5 1
3 5 1
4 3 3
5 5 1
6 0 5
7 4 1
8 5 3
9 1 3
10 1 2


Text Mining
Figure 6: Two Clusters Using k-Means Clustering on Movie Reviews


Questions?

Academic Year 2023/2024 Business Analytics 57

Pattern Recognition - Clustering - Classification
No ratings yet
Pattern Recognition - Clustering - Classification
177 pages
Dissertation Topics in Finance For Mba in India
100% (2)
Dissertation Topics in Finance For Mba in India
6 pages
Download
No ratings yet
Download
44 pages
How Telkomsel Transformed To Reach Digital-First Consumers - McKinsey
No ratings yet
How Telkomsel Transformed To Reach Digital-First Consumers - McKinsey
8 pages
Data Cloud Certification 3 of 3
No ratings yet
Data Cloud Certification 3 of 3
9 pages
ML-Module 5-P1
No ratings yet
ML-Module 5-P1
45 pages
Fashion Week Volunteer Cover Letter Sample
100% (2)
Fashion Week Volunteer Cover Letter Sample
7 pages
Clustering and Association Rule
No ratings yet
Clustering and Association Rule
69 pages
Lecture 8 BA
No ratings yet
Lecture 8 BA
19 pages
DM - Topic Four - Part III (Autosaved)
No ratings yet
DM - Topic Four - Part III (Autosaved)
67 pages
Unit - 4 DMA
No ratings yet
Unit - 4 DMA
145 pages
M4 - Clustering
No ratings yet
M4 - Clustering
43 pages
Lecture 23
No ratings yet
Lecture 23
29 pages
SIP Report
No ratings yet
SIP Report
35 pages
Unit 2
No ratings yet
Unit 2
89 pages
Automated Regulatory Compliance
No ratings yet
Automated Regulatory Compliance
17 pages
Unit 2
No ratings yet
Unit 2
19 pages
DM Clustering
No ratings yet
DM Clustering
51 pages
Analysis of Digitalization Transformation in AirAsia
No ratings yet
Analysis of Digitalization Transformation in AirAsia
7 pages
L18 19 Clustering
No ratings yet
L18 19 Clustering
48 pages
BUSINESS ANALYTICS Ashok
No ratings yet
BUSINESS ANALYTICS Ashok
15 pages
Lec 5
No ratings yet
Lec 5
10 pages
Introduction To Clustering: Alka Arora Sr. Scientist
No ratings yet
Introduction To Clustering: Alka Arora Sr. Scientist
57 pages
SEMINAR
No ratings yet
SEMINAR
19 pages
DWM Unit-Vi
No ratings yet
DWM Unit-Vi
30 pages
Cengage EBA 2e Chapter04
No ratings yet
Cengage EBA 2e Chapter04
35 pages
Clustering 1
No ratings yet
Clustering 1
75 pages
Cluster Analysis
No ratings yet
Cluster Analysis
60 pages
Chp-10 (Topic Not in Book) Types of Data in Cluster Analysis.
No ratings yet
Chp-10 (Topic Not in Book) Types of Data in Cluster Analysis.
13 pages
Market Research
No ratings yet
Market Research
88 pages
Data Analysis - Groups - INCOMPLETE
No ratings yet
Data Analysis - Groups - INCOMPLETE
24 pages
Markup 01 Statistika Lanjut - Cluster Analysis 1
No ratings yet
Markup 01 Statistika Lanjut - Cluster Analysis 1
60 pages
TME 6301 - Group Project
No ratings yet
TME 6301 - Group Project
7 pages
Installation of Kali Linux & Learn Tools Like
No ratings yet
Installation of Kali Linux & Learn Tools Like
11 pages
Descriptive Data Mining
No ratings yet
Descriptive Data Mining
8 pages
BCA Semester VI Data Mining Module 4 (Presentation Kind of N
No ratings yet
BCA Semester VI Data Mining Module 4 (Presentation Kind of N
56 pages
Cluster Analysis
No ratings yet
Cluster Analysis
46 pages
Innovations
No ratings yet
Innovations
7 pages
Mitps: PGP 23 Term 4 End Term Exam
No ratings yet
Mitps: PGP 23 Term 4 End Term Exam
12 pages
Unsupervised Learning: Uses of Cluster Analysis
No ratings yet
Unsupervised Learning: Uses of Cluster Analysis
2 pages
How Are Hadoop and Big Data Related?
No ratings yet
How Are Hadoop and Big Data Related?
18 pages
Fleetrun: The Solution For Fleet Maintenance Control
No ratings yet
Fleetrun: The Solution For Fleet Maintenance Control
13 pages
Ten Mistakes To Avoid: in Dimensional Modeling
No ratings yet
Ten Mistakes To Avoid: in Dimensional Modeling
16 pages
Clustering Today
No ratings yet
Clustering Today
52 pages
Introduction To DM
No ratings yet
Introduction To DM
27 pages
Cluster Analysis: Introduction - I: Dr. A. Ramesh
No ratings yet
Business Analytics | 2023-2024

Descriptive data mining

$\sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_q - v_q)^2}$
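
A minimal Python sketch of the distance expression above, assuming it is the Euclidean distance between two observations u and v measured on q variables (the square root of the summed squared differences). The function name and the sample values are illustrative and do not come from the slides.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two observations with the same q variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    # Sum the squared differences variable by variable, then take the square root.
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


# Hypothetical observations on q = 2 variables (illustrative values only).
u = (3.0, 4.0)
v = (0.0, 0.0)
print(euclidean_distance(u, v))  # 5.0
```

Because each difference is squared, a variable measured on a large scale can dominate the result, which is why distances are often computed on standardized values.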

Figure 1: Euclidean Distance

Euclidean distance becomes smaller

From Figure 2, we observe that the

The Manhattan distance between the standardized

Table 1: Comparison of Similarity Matrixes for Observations

Composition of these three clusters

Cluster 2: {2, 26, 8, 10, 20, 25}

Cluster 3: {3, 9, 14, 16, 12, 24, 29}

Cluster 1 is characterized by relatively younger, lower-income customers (Cluster 1’s

Cluster 2 is characterized by relatively older, higher-income customers (Cluster 2’s

Cluster 3 is characterized by relatively older, lower-income customers (Cluster 3’s

Table 3: Distances Between Cluster Centroids

Hierarchical Clustering versus k-Means Clustering

• Lift ratio: Measure to evaluate the efficiency of a rule:

• This measure of confidence can be viewed as the conditional probability

Antecedent (A)   Consequent (C)   Support for A   Support for C   Support for A & C   Confidence (%)   Lift ratio
Bread            Fruit, Jelly     4               5               4                   100.0            2.00
Potato Chips     Soda             4               6               4                   100.0            1.67
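
The two rule rows above appear to list, in order, the antecedent, the consequent, the support counts of the antecedent, of the consequent, and of both together, followed by the confidence (%) and the lift ratio. The Python sketch below recomputes the last two columns under that reading. The column interpretation and the total of 10 transactions are assumptions, chosen because they reproduce the listed values. The function names are illustrative.

```python
def confidence(support_a_and_c, support_a):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support_a_and_c / support_a


def lift_ratio(support_a_and_c, support_a, support_c, n_transactions):
    """Confidence divided by the consequent's overall share of transactions."""
    return confidence(support_a_and_c, support_a) / (support_c / n_transactions)


N_TRANSACTIONS = 10  # assumption: not stated in the rows themselves

# (antecedent, consequent, support A, support C, support A & C)
rules = [
    ("Bread", "Fruit, Jelly", 4, 5, 4),
    ("Potato Chips", "Soda", 4, 6, 4),
]

for antecedent, consequent, sup_a, sup_c, sup_ac in rules:
    conf = confidence(sup_ac, sup_a)
    lift = lift_ratio(sup_ac, sup_a, sup_c, N_TRANSACTIONS)
    print(f"{antecedent} -> {consequent}: confidence={conf:.1%}, lift={lift:.2f}")
```

A lift ratio above 1 indicates that the consequent occurs more often alongside the antecedent than it does across all transactions.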
