Chapter 2

Practical Data Analysis: An Example

Before talking about the full-fledged data analysis process and diving into the details
of individual methods, this chapter demonstrates some typical pitfalls one encoun-
ters when analyzing real-world data. We start our journey through the data analysis
process by looking over the shoulders of two (pseudo) data analysts, Stan and Laura,
working on some hypothetical data analysis problems in a sales environment. Being
differently skilled, they show how things should and should not be done. Through-
out the chapter, a number of typical problems that data analysts meet in real work
situations are demonstrated as well. We will skip algorithmic and other details here
and only briefly mention the intention behind applying some of the processes and
methods. They will be discussed in depth in subsequent chapters.

2.1 The Setup

Disclaimer The data and the application scenario used in this chapter are fictional.
However, the underlying problems are motivated by actual problems which are
encountered in real-world data analysis scenarios. Explaining particular application
setups would have been entirely out of the scope of this book, since in order to
understand the actual issue, a bit of domain knowledge is often helpful if not required.
Please keep this in mind when reading the following. The goal of this chapter is
to show (and sometimes slightly exaggerate) pitfalls encountered in real-world data
analysis setups and not the reality in a supermarket chain. We are painfully aware
that people familiar with this domain will find some of the encountered problems
strange, to say the least. Have fun.

The Data For the following examples, we will use an artificial set of data sources
from a hypothetical supermarket chain. The data set consists of a few tables, which
have already been extracted from an in-house database:¹

• Customers: Data about customers, stemming mostly from information collected
when these customers signed up for frequent shopper cards.
• Products: A list of products with their categories and prices.
• Purchases: A list of products together with the date they were purchased and the
customer card ID used during checkout.

¹ Often just getting the data is a problem of its own. Data analysis assumes that you
have access to the data you need, an assumption which is, unfortunately, frequently
not true.

M.R. Berthold et al., Guide to Intelligent Data Analysis,
Texts in Computer Science 42,
DOI 10.1007/978-1-84882-260-3_2, © Springer-Verlag London Limited 2010

The Analysts Stan and Laura are responsible for the analytics of the southern and
northern parts, respectively, of a large supermarket chain. They were recently hired
to help better understand customer groups and behavior and try to increase revenue
in the local stores. As is unfortunately all too common, over the years the stores
have already begun all sorts of data acquisition operations, but in recent years quite
a lot of this data has been merged—however, still without a clear picture in mind.
Many other stores had started to issue frequent shopping cards, so the directors of
marketing of the southern and northern markets decided to launch a similar program.
Lots of data have been recorded, and Stan and Laura now face the challenge of fitting
existing data to the questions posed. Together with their managers, they have sat
down and defined three data analysis questions to be addressed in the following
year:
• differentiate the different customer groups and their behavior to better understand
their impact on the overall revenue,
• identify connections between products to allow for cross selling campaigns, and
• help design a marketing campaign to attract core customers to increase their pur-
chases.
Stan is a representative of the typical self-taught data analysis newbie with little
experience on the job and some more applied knowledge about the different tech-
niques, whereas Laura has some training in statistics, data processing, and data anal-
ysis process planning.

2.2 Data Understanding and Pattern Finding

The first analysis task is a standard data analysis setup: customer segmentation—
find out which types of customers exist in your database and try to link them to
the revenue they create. This can be used later to care for clientele that are re-
sponsible for the largest revenue source or foster groups of customers who are
under-represented. Grouping (or clustering) records in a database is the predomi-
nant method to find such customer segments: the data is partitioned into smaller
subsets, each forming a more coherent group than the overall database contains. We
will go into much more detail on this type of data analysis method in Chap. 7. For
now it suffices to know that some of the most prominent clustering methods return
one typical example for each cluster. This essentially allows us to reduce a large
data set to a small number of representative examples for the subgroups contained
in the database.

Table 2.1 Stan’s clustering result

Cluster-id   Age    Customer revenue
1            46.5   € 1,922.07
2            39.4   € 11,162.20
3            39.1   € 7,279.59
4            46.3   € 419.23
5            39.0   € 4,459.30

The Naive Approach Stan quickly jumps onto the challenge, creates a dump of
the database containing customer purchases and their birth date, and computes the
age of the customers based on their birth date and the current day. He realizes that
he is interested in customer clusters and therefore needs to somehow aggregate the
individual purchases to their respective “owner.” He uses an aggregating operator in
his database to compute the total price of the shopping baskets for each customer.
Stan then applies a well-known clustering algorithm which results in five prototyp-
ical examples, as shown in Table 2.1.
Stan is puzzled: he was expecting the clustering algorithm to return reasonably
meaningful groups, but this result looks as if all shoppers are around 40–50 years
old but spend vastly different amounts of money on products. He looks into some of
the customers’ data in some of these clusters but cannot seem to find any interesting
relations or any reason why some seem to buy substantially more than others. He
changes some of the algorithm’s settings, such as the number of clusters created, but
the results are similarly uninteresting.

The Sound Approach Laura takes a different approach. Routinely she first tries
to understand the available data and validates that some basic assumptions are in fact
true. She uses a basic data summarization tool to report the different values for the
string attributes. The distribution of first names seems to match the frequencies she
would expect. Names such as “Michael” and “Maria” are most frequent, and
“Rosemarie” and “Anneliese” appear a lot less often. The frequencies of the occupations
also roughly match her expectations: the majority of the customers are employees,
while the second and third groups are students and freelancers, respectively.
She proceeds to check the attributes holding numbers. In order to check the age
of the customers, she also computes the customers’ ages from their birth date and
checks minimum and maximum. She spots a number of customers who obviously
reported a wrong birthday, because they are unbelievably young. As a consequence,
she decides to filter the data to only include people between the ages of 18 and 100.
In order to explore the data more quickly, she reduces the overall customer data set
to 5,000 records by random sampling and then plots a so-called histogram, which
shows different ranges of the attribute age and how many customers fall into that
range. Figure 2.1 shows the result of this analysis.
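
The checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the birth dates, the reference day, and the helper names `age_on` and `histogram` are hypothetical and not taken from the chapter, and the bins are counted in plain Python rather than plotted.

```python
import random
from collections import Counter
from datetime import date

def age_on(birth: date, today: date) -> int:
    """Age in completed years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

def histogram(ages, bin_width, lo=18):
    """Count how many ages fall into each bin [start, start + bin_width)."""
    counts = Counter(lo + ((a - lo) // bin_width) * bin_width for a in ages)
    return dict(sorted(counts.items()))

# Hypothetical birth dates; a real run would read them from the customer table.
today = date(2010, 6, 1)
births = [date(1970, 1, 1), date(1962, 3, 14), date(2009, 12, 24),
          date(1985, 7, 2), date(1948, 11, 30), date(1970, 1, 1)]

ages = [age_on(b, today) for b in births]
print("min/max age:", min(ages), max(ages))  # extremes expose implausible entries

# Keep only plausible ages, then draw a random sample of at most 5,000 records.
plausible = [a for a in ages if 18 <= a <= 100]
sample = random.Random(0).sample(plausible, k=min(5000, len(plausible)))
print(histogram(sample, bin_width=10))
```

Calling `histogram` again with a smaller `bin_width` corresponds to the finer binning that makes narrow peaks visible.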
This view confirms Laura’s assumptions: the majority of shoppers are middle-aged,
and the number of shoppers continuously declines toward higher age groups.
Fig. 2.1 A histogram for the distribution of the value of attribute age using 8 bins

Fig. 2.2 A histogram for the distribution of the value of attribute age using 40 bins

She creates a second histogram to better inspect the subtle but strange cliff at around
age 48 using a finer setting for the bins. Figure 2.2 shows the result of this analysis.
Surprised, she notices the huge peak in the bin of ages 38–40. She discusses this
observation with colleagues and the administrator of the shopping card database.
They have no explanation for this odd concentration of 40-year-old people ei-
ther. After a few other investigations, a colleague of the person who—before his
retirement—designed the data entry forms suspects that this may have to do with
the coding of missing birth dates. And, as it turns out, this is in fact the case: forms
where people entered no or obviously nonsensical birth dates were entered into the
form as zero values. For technical reasons, these zeros were then converted into the
Java 0-date which turns out to be January 1, 1970. So these people all turn up with
the same birth date in the customer database and in turn have the same age after the
conversion Laura performed initially. Laura marks those entries in her database as
“missing” in order to be able to distinguish them in future analyses.

Table 2.2 Laura’s clustering result

Cluster   Age    Avg. cart price   Avg. purchases/month
1         75.3   € 19.-            5.6
2         42.1   € 78.-            7.8
3         38.1   € 112.-           9.3
4         30.6   € 16.-            4.8
5         44.7   € 45.-            3.7
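
A minimal sketch of how such a placeholder date might be spotted and recoded is given below. The data and the function names `most_common_share` and `mark_missing` are hypothetical. Note the assumption that every January 1, 1970 entry is treated as missing, which also discards the few customers genuinely born on that day.

```python
from collections import Counter
from datetime import date

EPOCH = date(1970, 1, 1)  # the "zero" date described in the text

def most_common_share(birth_dates):
    """Return the most frequent known birth date and its share of known records."""
    known = [b for b in birth_dates if b is not None]
    value, count = Counter(known).most_common(1)[0]
    return value, count / len(known)

def mark_missing(birth_dates, sentinel=EPOCH):
    """Replace the sentinel date by None so later steps can tell it apart."""
    return [None if b == sentinel else b for b in birth_dates]

# Hypothetical extract of the birth date column.
births = [date(1970, 1, 1), date(1962, 3, 14), date(1970, 1, 1),
          date(1985, 7, 2), date(1970, 1, 1), date(1948, 11, 30)]

value, share = most_common_share(births)  # a single date with a large share is suspicious
cleaned = mark_missing(births)
```

In real data a single birth date should account for only a tiny share of the records, so an unusually large share is a cheap signal that a default value has leaked into the column.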
Similarly, she inspects the shopping basket and product database and cleans up a
number of other outliers and oddities. She then proceeds with the customer segmentation
task. As in her previous data analysis projects, Laura first writes down her
domain knowledge in the form of a cognitive map, indicating relationships and dependencies
between the attributes of her database. Having thus recalled the interactions
between the variables of interest, she is well aware that the length of a customer’s
history and the number of overall shopping trips affect the overall basket price, and
so she settles on the average basket price as a better estimator for the value of a
particular customer. She considers also distinguishing the different product cate-
gories, realizing that those, of course, also potentially affect the average price. For
the first step, she adds the average number of purchases per month, another indicator
for the revenue a customer brings in. Data aggregation is now a bit more complex,
but the modern data analysis tool she is using allows her to do the required join-
ing and pivoting operations effortlessly. Laura knows that clustering algorithms are
very sensitive to attributes with very different magnitudes, so she normalizes the
three attributes to make sure they all three contribute equally to the clustering result.
Running the same clustering algorithm that Stan was using, with the same setting
for the number of clusters to be found, she gets the result shown in Table 2.2.
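
The aggregation and normalization steps might look as follows. This is a sketch, not the tool Laura uses: the basket data are invented, and z-score scaling is an assumption, since the text only says that the attributes are normalized.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical baskets: (customer_id, month, basket_total_in_euro)
baskets = [("c1", 1, 20.0), ("c1", 1, 18.0), ("c1", 2, 19.0),
           ("c2", 1, 110.0), ("c2", 2, 114.0),
           ("c3", 1, 45.0)]
ages = {"c1": 75, "c2": 38, "c3": 45}
n_months = 2  # length of the observation window (assumed)

totals = defaultdict(list)
for customer, _month, total in baskets:
    totals[customer].append(total)

# One row per customer: age, average basket price, average purchases per month
features = {c: (ages[c], mean(v), len(v) / n_months) for c, v in totals.items()}

def zscore_columns(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats))
            for row in rows]

normalized = zscore_columns(list(features.values()))
```

Without this step, an attribute measured in thousands of euros (as in Table 2.1) would dominate any distance computed together with an age measured in tens of years.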
Obviously, there is a cluster (#1) of older customers who have a relatively small
average basket price. There is also another group of customers (#4) which seems
to correlate to younger shoppers, also purchasing smaller baskets. The middle-aged
group varies wildly in price, however. Laura realizes that this matches her assump-
tion about family status—people with families will likely buy more products and
hence combine more products into more expensive baskets, which seems to explain
the difference between clusters #2/#3 and cluster #5. The latter also seem to shop
significantly less often. She goes back and validates some of these assumptions by
looking at shopping frequency and average basket size as well and also determines
the overall impact on store revenues for these different groups. She finally discusses
these results with her marketing and campaign specialists to develop strategies to
foster the customer groups which bring in the largest chunk of revenue and develop
the ones which seem to be under-represented.

2.3 Explanation Finding

The second analysis goal is another standard shopping basket analysis problem: find
product dependencies in order to better plan campaigns.

The Naive Approach Stan recently read in a book on practical data analysis how
association rules can find arbitrary such connections in market basket data. He runs
the association rule mining algorithm in his favorite data analysis tool with the de-
fault settings and inspects the results. Among the top-ranked generated rules, sorted
by their confidence, Stan finds the following output:
    'foie gras' (p1231) <- 'champagne Don Huberto' (p2149),
        'truffle oil de Rossini' (p578) [s=1E-5, c=75%]
    'Tortellini De Cecco 500g' (p3456)
        <- 'De Cecco Sugo Siciliana' (p8764) [s=1E-5, c=60%]
He quickly infers that this representation must mean that foie gras is bought when-
ever champagne and truffle oil are bought together and similarly for the other rule.
Stan knows that the confidence measure c is important, as it indicates the strength
of the dependency (the first rule holds in 3 out of 4 cases). He considers the sec-
ond measure of frequency s to be less important and deliberately ignores its fairly
small value. The two rules shown above are followed by a set of other, similarly lux-
ury/culinary product-oriented rules. Stan concludes that luxury products are clearly
the most important products on the shelf and recommends to his marketing man-
ager to launch a campaign to advertise some of the products on the right side of
these rules (champagne, truffle oil) to increase the sales of the left side (foie gras).
In parallel, he increases orders for these products, expecting a recognizable increase
in sales. He proudly sends the results of his analysis to Laura.

The Sound Approach Laura is puzzled by those nonintuitive results. She reruns
the analysis and notices the support values of the rules extracted by Stan—some
of the rules Stan extracted have indeed a remarkably high confidence, and some
do almost forecast shopping behavior. However, they have very low support values,
meaning that only a small number of shopping baskets containing the products were
ever observed. The rules that Stan found are not representative at all for his customer
base. To confirm this, she runs a quick query on her database and sees that, indeed,
there is essentially no influence on the overall revenue.
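
To make the two measures concrete, the sketch below computes them for a rule on a toy data set. It assumes the common definitions: support is the fraction of all baskets containing every item of the rule, and confidence is the fraction of baskets containing the antecedent that also contain the consequent. The baskets are invented for illustration.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def support(baskets, antecedent, consequent):
    """Fraction of all baskets that contain the whole rule."""
    return count(baskets, set(antecedent) | set(consequent)) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Fraction of antecedent baskets that also contain the consequent."""
    both = count(baskets, set(antecedent) | set(consequent))
    return both / count(baskets, antecedent)

# Hypothetical data: 4 luxury baskets among 1,000 baskets in total.
luxury = {"champagne", "truffle oil", "foie gras"}
baskets = [luxury] * 3 + [{"champagne", "truffle oil"}] + [{"bread", "milk"}] * 996

rule = (["champagne", "truffle oil"], ["foie gras"])
rule_support = support(baskets, *rule)        # 3 of 1,000 baskets
rule_confidence = confidence(baskets, *rule)  # 3 of 4 antecedent baskets
```

The rule holds in 3 out of 4 cases, yet it describes only 3 of 1,000 baskets, which is why a high confidence alone says little about the customer base.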
She notices that the problem of low support is caused by the fact that Stan ran
the analysis on product IDs, so in effect he was forcing the rules to differentiate
between brands of champagne and truffle oil. She reruns the analysis based on the
product categories instead, ranks them by a mix of support and confidence, and finds
a number of association rules with substantially higher support:
tomatoes <- capers, pasta [s=0.007, c=32%]
tomatoes <- apples [s=0.013, c=22%]
Laura focuses on rules with a much higher support measure s than before and also
realizes that the confidence measure c is significantly higher than one would expect
by chance. The first rule seems to be triggered by a recent fashion of Italian cooking,
whereas the apple/tomato-rule is a known aspect.
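
One way to quantify "higher than one would expect by chance" is to compare a rule's confidence with the overall share of baskets that contain the consequent. The ratio of the two is commonly called lift; the chapter does not name this measure, so its use here is an assumption, and the data are invented.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def confidence_vs_chance(baskets, antecedent, consequent):
    """Return (confidence, chance baseline, ratio of the two)."""
    n_ante = count(baskets, antecedent)
    n_both = count(baskets, set(antecedent) | set(consequent))
    confidence = n_both / n_ante
    baseline = count(baskets, consequent) / len(baskets)
    return confidence, baseline, confidence / baseline

# Hypothetical category-level baskets (100 in total).
baskets = ([{"apples", "tomatoes"}] * 20 + [{"apples"}] * 20
           + [{"tomatoes"}] * 10 + [{"bread"}] * 50)

conf, base, ratio = confidence_vs_chance(baskets, ["apples"], ["tomatoes"])
```

A ratio clearly above 1 indicates that the antecedent makes the consequent more likely than it is in an arbitrary basket; a ratio near 1 means the rule carries no information.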
However, she is still puzzled by one of the rules discovered by Stan, which has
a higher than expected confidence despite a relatively low support. Are there some
gourmets among the customers who prefer a very specific set of products? Rerun-
ning this analysis on the shopping card owners yields almost the same results, so
the (potential) gourmets appear among their regular customers. Just to be sure, she
inspects how many different customers (resp. shopping cards) occur for baskets that
support this rule. As she had conjectured, there is a very limited number of cus-
tomers that seem to have a strong affection for these products. Those few customers
have bought this combination frequently, thus inflating the overall support measure
(which refers to shopping baskets). This means that the support in terms of the num-
ber of customers is even smaller than the support in terms of number of shopping
baskets. The response to any kind of special promotion would fall even shorter than
expected from Stan’s rule.
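
The difference between counting baskets and counting customers can be illustrated as follows. The card IDs and baskets are hypothetical, and the sketch assumes that every basket carries a card ID.

```python
def basket_and_customer_support(baskets, items):
    """Support of an item set measured per basket and per distinct customer."""
    items = set(items)
    supporting_cards = [card for card, basket in baskets if items <= basket]
    all_cards = {card for card, _ in baskets}
    basket_support = len(supporting_cards) / len(baskets)
    customer_support = len(set(supporting_cards)) / len(all_cards)
    return basket_support, customer_support

# Hypothetical data: one gourmet buys the same combination five times,
# while 95 other card holders each contribute a single ordinary basket.
combo = {"champagne", "truffle oil", "foie gras"}
baskets = [("g1", combo)] * 5 + [(f"c{i}", {"bread"}) for i in range(95)]

basket_s, customer_s = basket_and_customer_support(baskets, combo)
```

Here 5% of the baskets support the item set, but only 1 of 96 customers does, so a promotion aimed at this combination would reach far fewer people than the basket-level figure suggests.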
Apparently the time period in which the analyzed data has been collected influences
the results. Thinking about it, she develops an idea of how to learn about changes
in the customers’ shopping behavior: she identifies a few rules, some rather promising,
others well-known facts, and decides to monitor those combinations on a regular
basis (say quarterly). She has learned that a chain of liquor stores will soon open
a number of shops close to the chain’s own markets, so she picks some rules with
beverages in their conclusion part to see if the opening has any impact on the established
shopping patterns of its customers. As she fears a loss of potential sales, she
plans a comparison of rules obtained not only over time but also among markets in
the vicinity of such stores versus the other markets. She wonders whether promoting
the products in the rule’s antecedent may help to bring back the customers and
decides to discuss this with the marketing and sales team to determine if and where
appropriate campaigns should be launched, once she has the results of her analysis.

2.4 Predicting the Future

The third and final analysis goal we consider in this brief overview is a forecasting
or prediction problem. The idea is to find some relationship in our existing data that
can help us to predict if and how customers will react to coupon mailings and how
this will affect our future revenue.

The Naive Approach Stan believes that no detailed analysis is required for this
problem and notices that it is fairly straightforward to monitor success. He has seen
at a competitor how discount coupons attract customers to purchase additional prod-
ucts. So he suggests launching a coupon campaign that gives customers a discount of
10% if they purchase products for more than €50. This coupon is mailed to all cus-
tomers on record. Throughout the course of the next month, he carefully monitors
his database and is positively surprised when he sees that his campaign is obviously
working: the average price of shopping baskets is going up in comparison with pre-
vious months. However, at the end of the quarter he is shocked to see that overall
revenues for the past quarter actually fell. His management is finally fed up with the
lack of performance and fires Stan.

The Sound Approach Laura, who is promoted to head of analytics for the northern
and southern supermarket chain, first cancels Stan’s campaign and looks into the
underlying data. She quickly realizes that even though quite a number of customers
did in fact use the coupons and increased their shopping baskets, their average num-
ber of baskets per month actually went down—so quite a number of people seem
to have simply combined smaller shopping trips to be able to benefit from the dis-
count offer. However, for some shoppers, the combined monthly shopping basket
value did go up markedly, so there might be value here. Laura wonders how she can
discriminate between those customers who simply use the coupons to discount their
existing purchases and those who are actually enticed to purchase additional items.
She notices that one of the earlier generated customer segments correlates better
than others with the group of customers whose revenue went up—this fraction of
customers is significantly higher than in the other groups. She considers using this
very simple, manually designed predictor for a future campaign but wants to first
make sure that she cannot do better with some smarter techniques. She decides that
in the end it is not so important if she can actually understand the extracted model
but only how well it performs.
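
The effect Laura observes can be reproduced with a small worked example: the average basket grows while the monthly total shrinks. All amounts are invented and given in euro cents to keep the arithmetic exact; the function names are hypothetical, while the 10% discount above €50 follows the coupon described in the text.

```python
def apply_coupon(total_cents, threshold_cents=5000, discount_percent=10):
    """Price actually paid for one basket under the coupon rule from the text."""
    if total_cents > threshold_cents:
        return total_cents * (100 - discount_percent) // 100
    return total_cents

def month_summary(basket_totals_cents):
    """Number of baskets, total spend, and average basket value for one month."""
    n = len(basket_totals_cents)
    total = sum(basket_totals_cents)
    return {"baskets": n, "total": total, "avg_basket": total / n}

def coupon_effect(before, after):
    """Label a customer by whether monthly spend rose after the coupon."""
    delta = after["total"] - before["total"]
    return "additional purchases" if delta > 0 else "existing purchases discounted"

# Customer A merges four small trips into two large ones to reach the threshold.
a_before = month_summary([3000, 2500, 2800, 2700])
a_after = month_summary([apply_coupon(t) for t in [5500, 5500]])

# Customer B actually buys more.
b_before = month_summary([4000, 4000])
b_after = month_summary([apply_coupon(t) for t in [7000, 6000]])

labels = (coupon_effect(a_before, a_after), coupon_effect(b_before, b_after))
```

Customer A's average basket rises from €27.50 to €49.50 although the monthly total drops from €110 to €99, which is the pattern that made Stan's campaign look successful at first.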
To provide good starting points for the modeling technique, she decides to generate
a few potentially informative attributes first. Models that rely on thousands
of details typically perform poorly, so providing how often every product has been
bought by the customer in the last month is not an option for her. To get robust models,
she wants to aggregate the tiny bits of information, but what kind of aggregation
could be helpful? She returns to her cognitive map to review the dependencies. One
aspect is the availability of competitors: She reckons that customers may have alter-
native (possibly specialized) markets nearby but have been attracted by the coupon
this time, keeping them away from the competitors. She decides to aggregate the
money spent by the customer per month for a number of product types (such as bev-
erages, thinking of the chain of liquor stores again). She conjectures that customers
that perform well on average, but underperform in a specific segment only, may
be enticed by the coupon to buy products for the underperforming segment also.
Providing the segment performance before and after Stan’s campaign should help a
predictor to detect such dependencies if they exist.
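
A possible shape for these aggregated attributes is sketched below: one value per customer, product type, and period instead of one value per product. The record layout, the period labels, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def spend_by_category(purchases):
    """Sum the money spent per (customer, category, period)."""
    totals = defaultdict(float)
    for customer, category, period, amount in purchases:
        totals[(customer, category, period)] += amount
    return totals

def category_features(purchases, categories, periods=("before", "after")):
    """Build one attribute per category and period for every customer."""
    totals = spend_by_category(purchases)
    customers = sorted({p[0] for p in purchases})
    return {c: {f"{cat}_{period}": totals.get((c, cat, period), 0.0)
                for cat in categories for period in periods}
            for c in customers}

# Hypothetical purchases: (customer, product type, period, amount in euro)
purchases = [("c1", "groceries", "before", 80.0), ("c1", "groceries", "after", 82.0),
             ("c1", "beverages", "after", 12.0), ("c1", "beverages", "after", 12.0),
             ("c2", "groceries", "before", 40.0), ("c2", "groceries", "after", 36.0)]

features = category_features(purchases, ["groceries", "beverages"])
```

In this toy data, customer c1 spent nothing on beverages before the campaign and €24 afterwards, the kind of shift in an underperforming segment that the predictor is meant to pick up.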
The cognitive map brings another idea to her mind: people who appreciate the
full assortment but live somewhat further away from the chain’s stores may see the
coupon as a kind of travel compensation. So she adds a variable expressing a coarse
estimate of the distance between the customer’s home and the nearest available
market (which is only possible for the shopping card owners). She continues to use
her cognitive map to address many different aspects and creates attributes that may
help to verify her hypotheses. She then investigates the generated attributes visually
and also technically by means of feature selection methods.
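
The text does not say how the distance is estimated. One simple option, assumed here purely for illustration, is the great-circle distance between approximate coordinates (for example postal-code centroids) of the customer's home and each market; all coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def distance_to_nearest_store(home, stores):
    """Coarse distance attribute: distance from the home to the closest store."""
    return min(haversine_km(home, store) for store in stores)

# Hypothetical coordinates of two markets and one customer's home.
stores = [(48.10, 11.50), (48.40, 11.90)]
home = (48.15, 11.55)
nearest_km = distance_to_nearest_store(home, stores)
```

Straight-line distance ignores roads and travel time, which is acceptable for a coarse attribute but should be kept in mind when interpreting the model.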
After selecting the most promising attributes, she trains a classifier to distin-
guish the groups. She uses part of the data to simulate an independent test scenario
and thereby evaluates the expected impact of a campaign—are the costs created
by sending coupons to customers who do not purchase additional products offset
by customers buying additional items? After some additional model fine-tuning,
she reaches satisfactory performance. She discusses the results with the marketing
and sales team and deploys the prediction system to control the coupon mailings
for the next quarter. She keeps monitoring the performance of these coupon cam-
paigns over future quarters and updates her model sporadically.
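
The evaluation idea can be sketched as a hold-out split plus a simple cost-benefit sum. The records, the gain and cost figures, and the segment-based stand-in predictor are all hypothetical; the chapter does not specify which classifier Laura trains.

```python
import random

def split_holdout(records, test_fraction=0.3, seed=0):
    """Shuffle the records and set a fraction aside as an independent test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def campaign_net_gain(records, predict, gain_if_enticed, cost_if_not):
    """Net effect of mailing coupons to every customer the predictor selects."""
    net = 0.0
    for features, enticed in records:
        if predict(features):
            net += gain_if_enticed if enticed else -cost_if_not
    return net

# Hypothetical records: (customer segment, bought additional items after a coupon)
records = [(3, True), (3, True), (3, False), (1, False), (2, True),
           (1, False), (2, False), (3, True), (1, False), (2, False)]

def mail_segment_3(segment):
    """Stand-in for a trained classifier: mail only customers in segment 3."""
    return segment == 3

train, test = split_holdout(records)
net_all = campaign_net_gain(records, mail_segment_3, gain_if_enticed=8.0, cost_if_not=5.0)
net_test = campaign_net_gain(test, mail_segment_3, gain_if_enticed=8.0, cost_if_not=5.0)
```

A real classifier would be fitted on `train` only and judged on `test`, so that the estimated net gain is not inflated by customers the model has already seen.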

2.5 Concluding Remarks

In this chapter we have, very briefly and informally, touched upon a number of issues
data analysts may encounter while making sense of real-world data. Many other
problems can arise, and many more methods for data analysis exist in the academic
literature and in real-world data analysis tools. We will attempt to cover the most
prominent and most often used examples in the following chapters.
Note that one of the biggest problems data analysts very often have is that the data
they get is not suited to answer the questions they are asked. For instance, if we were
supposed to use the data in our customer database to find out how to differentiate
Asian shopping behavior from European, we would have a very hard time. This
data can only be used to distinguish between different types of European shoppers
because it contains data from European markets only. Note also that we are (for
whatever reason) assuming that we used a nice, representative sample of all different
types of European shoppers to generate the data. Very often this is not the case, and
the data itself is already biased and will bias our analysis results. In this example we
could be heavily biased by the type of supermarket chain we used to record the data in
the first place. An upscale delicatessen supermarket will have dramatically different
shopping patterns than a low-end discounter. We will be discussing these points
later in more depth as well.
http://www.springer.com/978-1-84882-259-7
