IE5005 Lecture 02

IE5005 Data Analytics for Industrial Engineers

Lecture 02. Descriptive Analytics and Data Visualization Tools

Dr. Wang Zhiguo

zhiguo.w@nus.edu.sg

Semester 1 AY2024/25
Course Outline

01 Descriptive analytics
• Categories of Descriptive Analytics Methods
• Fundamentals for descriptive analytics
• Hands-on demo: descriptive analytics with Python

02 Data visualization tools
• Introduction to data visualization tools
• Hands-on demo with Tableau Public (basics)

03 Hands-on demo with Tableau Public (advanced)
• Dashboard
• Stories
Acknowledgement

The following materials have helped a lot in my preparation for Lecture 02 & 03 materials.

Wilke, C. O. (2019). Fundamentals of data visualization: a primer on making informative and compelling figures. O'Reilly Media.

Belorkar, A., Guntuku, S. C., Hora, S., & Kumar, A. (2020). Interactive Data Visualization with Python: Present your data as an effective and compelling story. Packt Publishing Ltd.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons.

Introduction to Tableau. DataCamp.

Share Data Through the Art of Visualization, Google Data Analytics Course.
01
Descriptive Analytics
• Categories of Descriptive Analytics Methods
• Fundamentals for descriptive analytics
• Descriptive analytics with Python
Categorization of analytical methods and models
Data analytics is generally thought to comprise 3 or 4* broad categories of techniques:

• Descriptive analytics encompasses the set of techniques that describes what has
happened in the past. Examples are data queries, reports, descriptive statistics, data
visualization including data dashboards, some data-mining techniques, and basic
what-if spreadsheet models.

• Diagnostic analytics is the process of using data to determine the causes of trends
and correlations between variables. It can be viewed as a logical next step after using
descriptive analytics to identify trends. Examples are hypothesis testing, diagnostic
regression analysis, and correlation/causation analysis.

*Some literature uses only 3 categories and treats diagnostic analytics as part of descriptive analytics.
Categorization of analytical methods and models

• Predictive analytics consists of techniques that use models constructed from past data
to predict the future or ascertain the impact of one variable on another. For example,
past data on product sales may be used to construct a mathematical model to predict
future sales. Linear regression, time series analysis, some data-mining techniques,
and simulation, often referred to as risk analysis, all fall under the banner of predictive
analytics.

• Prescriptive analytics indicates a course of action to take; that is, the output of a
prescriptive model is a decision. Predictive models provide a forecast or prediction,
but do not provide a decision. However, a forecast or prediction, when combined with
a rule, becomes a prescriptive model.

Reading

• Descriptive analytics answers the question, “What happened?”
• Diagnostic analytics answers the question, “Why did this happen?”
• Predictive analytics answers the question, “What might happen in the future?”
• Prescriptive analytics answers the question, “What should we do next?”

Check out the article “4 Examples of Business Analytics in Action” from Harvard
Business School. The article shows how corporations use data insights to optimize their
decision-making process.
What is data?
❑ Data are facts and figures collected, analyzed, and summarized for presentation and
interpretation, including numbers, text, images, audio, video, and so on.

❑ Population vs. sample

Name      Weight (kg)  Height (cm)  Gender  Year of Birth  Performance
Andrew    77           175          M       1998           good
Bernhard  110          195          M       2003           average
Carolina  70           172          F       1999           average
Dennis    85           180          M       1998           poor
Eve       65           168          F       2002           good
…         …            …            …       …              …

❑ Rows: observation, instance, or object.

❑ Columns: variable, attribute, or feature.
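As a minimal sketch of the rows-and-columns view, the sample table can be held in plain Python as a list of records. The names `rows` and `column` and the key spellings are illustrative assumptions; they are not taken from the lecture notebook.

```python
# Each dict is one row (observation); each key is one column (variable).
rows = [
    {"Name": "Andrew", "Weight": 77, "Height": 175, "Gender": "M",
     "YearOfBirth": 1998, "Performance": "good"},
    {"Name": "Bernhard", "Weight": 110, "Height": 195, "Gender": "M",
     "YearOfBirth": 2003, "Performance": "average"},
    {"Name": "Carolina", "Weight": 70, "Height": 172, "Gender": "F",
     "YearOfBirth": 1999, "Performance": "average"},
    {"Name": "Dennis", "Weight": 85, "Height": 180, "Gender": "M",
     "YearOfBirth": 1998, "Performance": "poor"},
    {"Name": "Eve", "Weight": 65, "Height": 168, "Gender": "F",
     "YearOfBirth": 2002, "Performance": "good"},
]


def column(records, name):
    """Return the values of one variable (column) across all observations."""
    return [record[name] for record in records]


heights = column(rows, "Height")
n_observations = len(rows)      # number of rows
n_variables = len(rows[0])      # number of columns
print(n_observations, "observations,", n_variables, "variables")
print("Height column:", heights)
```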
Data types
❑ Cross-sectional data are collected from multiple entities at the same point in time or
within the same time interval.

[Figure: wellness of patients #1 to #5 at a single point in time]
Data types
❑ Time series data are repeated measurements of a single entity collected over multiple
points in time or a time period.

[Figure: wellness of patient #1 from week 1 to week 5]
Data types

❑ Panel data (or time-series cross-section) are repeated measurements of multiple entities
collected over multiple points in time.

[Figure: wellness of patients #1, #2, and #3 from week 1 to week 5]
Data types
❑ Longitudinal data are repeated observations of a certain measure collected from multiple
entities over some extended time period.

Trend study: the repeated observations are sampled from random patients at each time point. The same patients do not necessarily participate in the survey more than once.

Cohort study: the repeated observations are sampled from a cohort of patients in a certain category. The same patients do not necessarily participate from year to year, but all participants must meet the categorical criteria of that cohort.

[Figure: wellness from Week 1 to Week 5]
Can you identify relationships between different data types?

Data types shown: cross-sectional, time series, longitudinal, trend study, cohort study, panel study
Types of measurement scales

❑ Qualitative variables are variables that can be placed into distinct categories in a
nominal or ordinal way.
• Nominal scale classifies data into mutually exclusive and exhaustive categories in
which no order or ranking can be imposed on the data. [Gender (M, F)]
• Ordinal scale classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. [Rating (good, normal, poor)]
Types of measurement scales

❑ Quantitative variables are numerical and can be ordered or ranked.

• Interval (relative) scale ranks data and precise differences between units of
measure do exist. However, there is no meaningful/absolute zero. [Temperature]
• Ratio (absolute) scale possesses all the characteristics of interval scale, and there
exists a true/absolute zero. As a result, true ratios exist. [Height]
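
The difference between the interval and ratio scales can be checked numerically. The sketch below is illustrative only (the helper names and the sample values of 10 °C, 20 °C, 90 cm, and 180 cm are assumptions): a ratio of two temperatures changes with the unit, while a ratio of two heights does not.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def cm_to_inch(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale: zero is arbitrary, so ratios depend on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero is absolute, so ratios are the same in any unit.
ratio_cm = 180 / 90
ratio_inch = cm_to_inch(180) / cm_to_inch(90)

print("Temperature ratio:", ratio_celsius, "vs", ratio_fahrenheit)
print("Height ratio:", ratio_cm, "vs", ratio_inch)
```

Differences, by contrast, remain meaningful on both scales, which is why "20 °C is 10 degrees warmer than 10 °C" is a valid statement while "20 °C is twice as hot as 10 °C" is not.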

Types of measurement scales

Examples of different measurement scales

Nominal
• Gender
• Zip code
• Color (red, green, blue, …)
• Religion (Christianity, Buddhism, …)
• Nationality
• Major (maths, computing, …)
• Marital status (single, married, divorced, …)

Ordinal
• Rating (good, normal, poor)
• Grade (A, B, C, D)
• Judging (1st place, 2nd place, …)
• Ranking of tennis players
• Size (small, medium, large)
• Satisfaction (unsatisfied, meet the expectation, satisfied)

Interval
• Temperature
• IQ
• Calendar year

Ratio
• Height
• Weight
• Age
• Speed
Quiz 1
Name Weight (kg) Height (cm) Gender Year of Birth Performance
Andrew 77 175 M 1998 good
Bernhard 110 195 M 2003 average
Carolina 70 172 F 1999 average
Dennis 85 180 M 1998 poor
Eve 65 168 F 2002 good
… … … … … …

1. What type of data is this?

A. Cross-sectional
B. Time series
C. Panel
D. Longitudinal

Quiz 2
Name Weight (kg) Height (cm) Gender Year of Birth Performance
Andrew 77 175 M 1998 good
Bernhard 110 195 M 2003 average
Carolina 70 172 F 1999 average
Dennis 85 180 M 1998 poor
Eve 65 168 F 2002 good
… … … … … …

2. What are the respective scale types for the variables (Name, Height, Year of Birth,
Performance)?
A. Nominal, Ordinal, Ordinal, Ordinal
B. Nominal, Ratio, Ordinal, Ordinal
C. Nominal, Ratio, Ratio, Interval
D. Nominal, Ratio, Interval, Ordinal
(Answer)
Name Weight (kg) Height (cm) Gender Year of Birth Performance
Andrew 77 175 M 1998 good
Bernhard 110 195 M 2003 average
Carolina 70 172 F 1999 average
Dennis 85 180 M 1998 poor
Eve 65 168 F 2002 good
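Scale type: nominal (Name), ratio (Weight), ratio (Height), nominal (Gender), interval (Year of Birth), ordinal (Performance)

1. What type of data is this?

A. Cross-sectional
B. Time series
C. Panel
D. Longitudinal

Answer: A. The table records several people at a single point in time, so the data are cross-sectional.

2. What are the respective scale types for each variable?

Answer: as listed under the table, i.e. nominal, ratio, interval, ordinal for Name, Height, Year of Birth, and Performance.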
Case Study: 50 wealthiest people in the world
Suppose the ages of the 50 wealthiest people in the world are listed in Forbes Magazine.
Here are the data in their original form (what we call raw data):

49 57 38 73 81
74 59 76 65 69
54 56 69 68 78
65 85 49 69 61
48 81 68 37 43
78 82 43 64 67
52 56 81 77 79
85 40 85 59 80
60 71 57 61 69
61 83 90 87 74

Little insight can be obtained from looking at raw data that have not been properly organized.
Descriptive univariate analytics

To describe a situation, draw conclusions, or make inferences about events, the data analyst
must organize the data in some meaningful way. The most convenient methods of
organizing data include the construction of frequency distributions and statistical
measures.

After organizing the data, the analyst must present them to stakeholders. The most
intuitive way is to draw charts or plots.

So, we are going to look at

• frequency distribution
• statistical measures
• data visualization plots
Frequency table
A frequency table of ‘age’ can be constructed as

Age group Absolute frequency Relative frequency

35—41 3 6%
42—48 3 6%
49—55 4 8%
56—62 10 20%
63—69 10 20%
70—76 5 10%
77—83 10 20%
84—90 5 10%
Total 50 100%

After organizing the data into a frequency table, the peaks (the classes with the most data values
compared to other classes) and outliers (extremely large or small values relative to other data) can
be analyzed.
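
The frequency table above can be reproduced from the raw ages with a few lines of Python. This is a minimal sketch using only the standard library; the function name `frequency_table` and the text-bar output are illustrative assumptions, and the lecture notebook may build the table differently.

```python
# Ages of the 50 wealthiest people (raw data from the case study).
ages = [49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
        54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
        48, 81, 68, 37, 43, 78, 82, 43, 64, 67,
        52, 56, 81, 77, 79, 85, 40, 85, 59, 80,
        60, 71, 57, 61, 69, 61, 83, 90, 87, 74]

# Age groups of width 7: 35-41, 42-48, ..., 84-90.
groups = [(low, low + 6) for low in range(35, 85, 7)]


def frequency_table(values, bins):
    """Return (label, absolute frequency, relative frequency) for each bin."""
    table = []
    for low, high in bins:
        count = sum(low <= value <= high for value in values)
        table.append((f"{low}-{high}", count, count / len(values)))
    return table


table = frequency_table(ages, groups)
for label, count, share in table:
    # The row of '#' marks gives a rough text version of the histogram.
    print(f"{label:>6} {count:>3} {share:>5.0%} {'#' * count}")
```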
Frequency distribution

[Figure: Frequency Distribution for Ages of Top 50 Wealthiest People; x-axis: Age Group (35-41 to 84-90), y-axis: Frequency]
Frequency table
A frequency table can also be constructed for a nominal or ordinal variable, for example gender:

Gender Absolute frequency Relative frequency

Male 45 90%
Female 5 10%
Total 50 100%

A cumulative frequency table, which shows the number of data values less than or equal to a specific
value, can also be constructed:

Age (upper bound)  Cumulative frequency  Cumulative relative frequency

≤41 3 6%
≤48 6 12%
≤55 10 20%
≤62 20 40%
≤69 30 60%
≤76 35 70%
≤83 45 90%
≤90 50 100%
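
A cumulative frequency table is a running total of the absolute frequencies. The sketch below, which assumes the class counts from the age frequency table, computes it with `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class bounds and absolute frequencies from the age frequency table.
upper_bounds = [41, 48, 55, 62, 69, 76, 83, 90]
counts = [3, 3, 4, 10, 10, 5, 10, 5]

# Running totals give the cumulative absolute frequencies.
cumulative = list(accumulate(counts))
total = cumulative[-1]
cumulative_share = [value / total for value in cumulative]

for bound, value, share in zip(upper_bounds, cumulative, cumulative_share):
    print(f"<={bound:<3} {value:>3} {share:>5.0%}")
```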
Frequency distribution
So far, the frequency table (distribution) has been constructed from sample data. For a
population, the corresponding distribution is called a

• probability mass function (p.m.f.) for a discrete attribute; and

• probability density function (p.d.f.) for a continuous attribute.

Q. How can we construct the probability distribution of a population? Do we need
access to all instances of that population?

Many real-life situations follow a known, well-defined distribution function. So in many
cases, we do not need to access all instances of a given population.
Shape of frequency distribution

Right (Positively) Skewed Zero skew (symmetric) Left (Negatively) Skewed

Source of figure: https://www.scribbr.com/statistics/skewness/

Univariate data visualization

Plot       Qualitative  Quantitative
Pie        yes          no
Bar        yes          Most of the time, no (yes only when there is a small, limited number of values)
Line       no           yes
Histogram  no           yes

[Figures: pie chart and bar chart of gender (male 45, female 5); histogram of ages of the top wealthiest people (6%, 6%, 8%, 20%, 20%, 10%, 20%, 10%)]
We can also summarize data using summary statistics.

A statistic is a descriptor that numerically describes a characteristic
of a sample of the population.

We are going to talk about 3 groups of statistical measures.

1. Measures of central tendency
2. Measures of location
3. Measures of dispersion or variation
Measures of central tendency

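Central tendency statistics identify the central position within the dataset.

• (Arithmetic) Mean $= \frac{x_1 + x_2 + \cdots + x_n}{n}$

• Mode: the most frequent value, e.g. 3 in (1, 2, 3, 3, 3, 4, 5)

• Median: the middle value of the sorted data, e.g. 3 in (1, 2, 3, 3, 3, 4, 5)

• Midrange $= \frac{x_{max} + x_{min}}{2}$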
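
The four measures of central tendency can be checked on the small data set (1, 2, 3, 3, 3, 4, 5) with Python's standard `statistics` module. This is a minimal sketch; the midrange has no built-in function, so it is computed directly from its definition.

```python
import statistics

data = [1, 2, 3, 3, 3, 4, 5]

mean = statistics.mean(data)              # sum of values divided by n
median = statistics.median(data)          # middle value of the sorted data
mode = statistics.mode(data)              # most frequent value
midrange = (max(data) + min(data)) / 2    # average of the two extremes

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("midrange:", midrange)
```

For this symmetric data set all four measures coincide at 3; on skewed data they generally differ.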
Measures of location
Location statistics identify a value in a certain position and tell us its relative position in
comparison with other data values. Some commonly used location univariate statistics
include:

• 1st quartile (25%)

• 2nd quartile (Median)

• 3rd quartile (75%)

• Decile (10%, 20%, 30%,…)

• Percentile (5th percentile, 95th percentile, 99th percentile,…)

Example
Data for bottled water sales at Queensland Amusement Park for a sample of 14 summer
days are available in “BottledWater.csv”.

Convince yourself that the summary statistics for the variable ‘High Temperature (degrees F)’ are correct.

Day  High Temperature (degrees F)  Bottled Water Sales (cases)
1    78                            23
2    79                            22
3    80                            24
4    80                            22
5    82                            24
6    83                            26
7    85                            27
8    86                            25
9    87                            28
10   87                            26
11   88                            29
12   88                            30
13   90                            31
14   92                            31
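
One way to check central tendency and location statistics for ‘High Temperature (degrees F)’ without the notebook is the standard `statistics` module. This is a hedged sketch: the values are typed in from the table rather than read from “BottledWater.csv”, and the quartiles use the "inclusive" interpolation method, which is an assumption. Other quartile conventions give slightly different values.

```python
import statistics

# High Temperature (degrees F) for the 14 sampled days.
high_temp = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]

mean_temp = statistics.mean(high_temp)
median_temp = statistics.median(high_temp)
modes = statistics.multimode(high_temp)    # every value tied for most frequent

# Quartiles: cut points that split the sorted data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(high_temp, n=4, method="inclusive")

# Deciles: 9 cut points that split the sorted data into 10 equal parts.
deciles = statistics.quantiles(high_temp, n=10, method="inclusive")

print(f"mean={mean_temp:.2f} median={median_temp} modes={modes}")
print(f"Q1={q1} Q2={q2} Q3={q3}")
print("deciles:", deciles)
```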
Descriptive analytics with Python

Dataset: BottledWater.csv

Python Notebook: IE5005 Lecture 02.ipynb

Download the above 2 files and save them in the same folder. Then launch Jupyter Notebook
and open ‘IE5005 Lecture 02.ipynb’.
Boxplot

Boxplot presents a 5-number summary of the data: min, 1st quartile, median, 3rd quartile, and max.

Boxplot can also be used to describe how symmetric/skewed the distribution of a variable is.
For example, in the plot on the left, the values are concentrated in the high part. So, it is left skewed.

[Figure: boxplot labelled max, 3rd quartile, median, 1st quartile, min]

Python Notebook: IE5005 Lecture 02.ipynb
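
The 5-number summary behind a boxplot can be computed without any plotting library. The sketch below applies it to the bottled water sales column; the function name `five_number_summary` and the "inclusive" quartile method are assumptions, and plotting libraries may place the quartiles and whiskers slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (min, 1st quartile, median, 3rd quartile, max) of the values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Bottled Water Sales (cases) for the 14 sampled days.
sales = [23, 22, 24, 22, 24, 26, 27, 25, 28, 26, 29, 30, 31, 31]

summary = five_number_summary(sales)
print("min, Q1, median, Q3, max =", summary)
```

Comparing the distance from the median to each quartile gives a rough, numerical hint of the skewness that the boxplot shows visually.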
What does ‘average’ or ‘mean’ refer to?
When we talk about ‘average’ or ‘mean’ in our daily life, we normally refer to the classic
arithmetic mean $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$. However, there are other types of means which are useful on
special occasions.

❑ Weighted mean $= \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$.

The arithmetic mean is a special case of the weighted mean in which every observation
has equal weight.
A student received an A in Mathematics (3 credits), a C in Psychology (3 credits), a B in Biology (4
credits), and a D in History (2 credits). Assume that grades A, B, C, D, F correspond to 4, 3, 2, 1, 0 grade
points respectively. What is the student’s grade point average?

Suggested answer

Course       Credits ($w_i$)  Grade ($x_i$)
Mathematics  3                A (4 points)
Psychology   3                C (2 points)
Biology      4                B (3 points)
History      2                D (1 point)

$\bar{x} = \frac{3 \cdot 4 + 3 \cdot 2 + 4 \cdot 3 + 2 \cdot 1}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.67$
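
The grade point average calculation can be sketched in Python as follows. The helper `weighted_mean` is a hypothetical name written for this example; it is not a standard library function.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Mathematics, Psychology, Biology, History
grade_points = [4, 2, 3, 1]     # x_i: A, C, B, D
credit_hours = [3, 3, 4, 2]     # w_i: course credits

gpa = weighted_mean(grade_points, credit_hours)

# With equal weights the weighted mean reduces to the arithmetic mean.
unweighted = weighted_mean(grade_points, [1, 1, 1, 1])

print(f"GPA (weighted by credits): {gpa:.2f}")
print(f"Arithmetic mean of grade points: {unweighted:.2f}")
```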
Are you using ‘average’ or ‘mean’ correctly?

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2.

The stock price then increased by 50% on day 3.

Since the stock price first decreased by 50% and then increased by 50%, the average
growth rate of the stock price is $\frac{-50\% + 50\%}{2} = 0\%$.

So, I’m neither earning nor losing money. Correct?
Are you using ‘average’ or ‘mean’ correctly?

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2. That is,

$100 \times (1 - 50\%) = 50$
So the stock price became $50 on day 2.

The stock price then increased by 50% on day 3. That is,

$50 \times (1 + 50\%) = 75$
The price became $75 on day 3.
Clearly, I’m losing money, as the share price has dropped from $100 to $75.

So the (arithmetic mean) growth rate computed as $\frac{-50\% + 50\%}{2} = 0\%$ can be misleading in
this context.
Geometric mean

Therefore, the statement is not right. If the average growth rate were 0%, we
should get back a stock price of $100 on day 3, shouldn't we? This is where the arithmetic
mean (AM) may not work well.
We should adopt the geometric mean (GM)

$GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$

If we denote the average growth rate as $R$, we can compute it from
$1 + R = \sqrt{(1 - 50\%)(1 + 50\%)} = \sqrt{0.75} \approx 0.8660$, so $R \approx -13.40\%$.
An average rate should be constant across all periods. Indeed,
$100 \cdot (1 - 13.40\%) \cdot (1 - 13.40\%) \approx 75$, which is the price of $75 on day 3.
This is not a coincidence.
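
The stock example can be verified with `statistics.geometric_mean`, applied to the growth factors (1 − 50%) and (1 + 50%) rather than to the percentage changes themselves. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

start_price = 100
growth_factors = [1 - 0.50, 1 + 0.50]    # day 2: -50%, day 3: +50%

# Geometric mean of the growth factors gives the constant per-period factor.
average_factor = statistics.geometric_mean(growth_factors)
average_rate = average_factor - 1

# Applying the same average factor in every period recovers the final price.
final_price = start_price * average_factor ** len(growth_factors)

print(f"average growth rate: {average_rate:.2%}")
print(f"price on day 3: {final_price:.2f}")
```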
Are you using ‘average’ or ‘mean’ correctly?

Suppose the distance between your home and school is d. You drive from home to
school at a speed x = 60 km/h and return from school to home at a speed y = 20 km/h.
Is your average driving speed then (60 + 20)/2 = 40 km/h? Correct or wrong?

[Figure: home to school at 60 km/h, school to home at 20 km/h]
Harmonic mean

We know the average driving speed can be computed as $\frac{\text{total distance}}{\text{total time}}$.

For instance, let the distance be d = 120 km. It takes you $\frac{120}{60} = 2$ hours from home to school
and $\frac{120}{20} = 6$ hours from school to home.

The average driving speed is $\frac{\text{total distance}}{\text{total time}} = \frac{240}{8} = 30$ km/h.

This is where we can use the harmonic mean (HM)

$HM = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}$

Average speed $= \frac{2}{\frac{1}{60} + \frac{1}{20}} = 30$ km/h.
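
The driving example can be checked two ways: directly as total distance over total time, and with `statistics.harmonic_mean`. The sketch below assumes the same one-way distance of 120 km; any other distance gives the same average speed.

```python
import statistics

speeds = [60, 20]      # km/h: home to school, school to home
distance = 120         # km, one way

# Direct computation: total distance divided by total time.
total_time = sum(distance / speed for speed in speeds)       # 2 h + 6 h
average_direct = len(speeds) * distance / total_time

# Harmonic mean of the two speeds.
average_harmonic = statistics.harmonic_mean(speeds)

print("total distance / total time:", average_direct)
print("harmonic mean:", average_harmonic)
```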
Summary

Arithmetic mean
Use when dealing with additive relationships (e.g., heights, weights).
Example: average height of students in a class.

Geometric mean
Use when dealing with data that represent growth rates, ratios, or percentages over time. It is particularly useful for averaging numbers with multiplicative relationships (e.g., growth rates, returns).
Examples:
• Investment returns: if you have annual returns of 10%, 20%, and 30%, you would use the geometric mean to calculate the average rate of return over the three years, because the returns are multiplicative.
• Population growth: population growth rates for several years.

Harmonic mean
Use when dealing with rates or ratios where the denominator is significant. It is often used in situations involving averages of rates or ratios (e.g., speeds, resistances).
Examples:
• Average speed: if you travel a certain distance at 30 km/h and then the same distance at 60 km/h, the average speed is not the arithmetic mean of 30 and 60 (45 km/h), but rather the harmonic mean, which is 40 km/h.
• Electrical resistance: when resistors are connected in parallel, the total resistance equals the harmonic mean of the individual resistances divided by the number of resistors.
Measure of dispersion (variation)
A dispersion statistic measures how spread out the values are. Some commonly used
measures include:

• Range $= x_{max} - x_{min}$

• Interquartile range (IQR) = 3rd quartile − 1st quartile

• Variance (population / sample)
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ / $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

• Standard deviation (population / sample)
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ / $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

• Coefficient of variation [compares dispersion when the units are different]
$CVar = \frac{\sigma}{\mu} \cdot 100\%$

• Mean absolute deviation
$MAD = \frac{\sum_{i=1}^{N} |X_i - \mu|}{N}$
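
Each dispersion measure can be computed for ‘High Temperature (degrees F)’ with the standard library. This is a hedged sketch: the data are typed in from the BottledWater table, the quartiles use the "inclusive" method (an assumption), and the coefficient of variation and mean absolute deviation are written out from their definitions because `statistics` has no dedicated functions for them.

```python
import statistics

high_temp = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]
n = len(high_temp)
mu = statistics.fmean(high_temp)

data_range = max(high_temp) - min(high_temp)

q1, _, q3 = statistics.quantiles(high_temp, n=4, method="inclusive")
iqr = q3 - q1

pop_var = statistics.pvariance(high_temp)     # divides by N
sample_var = statistics.variance(high_temp)   # divides by n - 1
pop_sd = statistics.pstdev(high_temp)
sample_sd = statistics.stdev(high_temp)

cv = pop_sd / mu * 100                                  # in percent
mad = sum(abs(x - mu) for x in high_temp) / n           # mean absolute deviation

print(f"range={data_range} IQR={iqr}")
print(f"variance: population={pop_var:.3f} sample={sample_var:.3f}")
print(f"std dev:  population={pop_sd:.3f} sample={sample_sd:.3f}")
print(f"CV={cv:.2f}% MAD={mad:.3f}")
```

The sample variance is always slightly larger than the population variance computed from the same values, because it divides by n − 1 instead of N.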
Chebyshev’s theorem

The proportion of values from a data set that will fall within $k$ standard deviations of the
mean will be at least $1 - \frac{1}{k^2}$, where $k$ is a number greater than 1.

Remark: $k$ is not necessarily an integer.

For example
• We can estimate that at least 75% of the data values will fall within 2 standard
deviations of the mean of any data set ($k = 2$).
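
Chebyshev's bound can be compared with what is actually observed in a data set. The sketch below uses the High Temperature values from the BottledWater example and k = 1.5 (an arbitrary, non-integer choice); the function names are hypothetical and the population standard deviation is assumed.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


high_temp = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]
k = 1.5
bound = chebyshev_lower_bound(k)
observed = share_within(high_temp, k)

print(f"Chebyshev bound for k={k}: at least {bound:.1%}")
print(f"Observed share within {k} standard deviations: {observed:.1%}")
```

The observed share is usually well above the bound, since the theorem holds for any distribution and is therefore conservative.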
Exercise 1

Suppose the mean housing price in a certain district is $50,000, and the standard
deviation is estimated to be $10,000. Estimate the price range for which at least 75% of
the houses will sell.

Exercise 1 (Answer)

Suppose the mean housing price in a certain district is $50,000, and the standard
deviation is estimated to be $10,000. Find the price range for which at least 75% of the
houses will sell.

Based on Chebyshev’s theorem, at least 75% of data values will fall within k = 2 standard
deviations of the mean, since $1 - \frac{1}{2^2} = 0.75$.
$[50000 - 2 \cdot 10000,\ 50000 + 2 \cdot 10000] = [30000, 70000]$
So at least 75% of all houses sold are estimated to be in the range from $30,000 to $70,000.
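
Exercise 1 runs the theorem in reverse: given a target proportion p, solve $1 - \frac{1}{k^2} = p$ for $k = \frac{1}{\sqrt{1 - p}}$ and build the interval around the mean. A minimal sketch, with the hypothetical helper name `k_for_share`:

```python
import math


def k_for_share(p):
    """Return k such that the Chebyshev bound 1 - 1/k**2 equals p (0 < p < 1)."""
    return 1 / math.sqrt(1 - p)


mean_price = 50_000
sd_price = 10_000

k = k_for_share(0.75)
low = mean_price - k * sd_price
high = mean_price + k * sd_price

print(f"k = {k}")
print(f"At least 75% of prices lie in [{low:,.0f}, {high:,.0f}]")
```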
Exercise 2

A survey of local companies found that the mean amount of travel allowance for
executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s
theorem, find the minimum percentage of the data values that will fall between $0.20
and $0.30.

Exercise 2 (Answer)

Subtract the mean from the upper bound:

$0.30 − $0.25 = $0.05
Divide the difference by the standard deviation to get k:
$k = \frac{0.05}{0.02} = 2.5$
Use Chebyshev’s theorem to find the percentage:
$1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 0.84 = 84\%$
Hence, at least 84% of the data will fall between $0.20 and $0.30.
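
The steps of Exercise 2 can be sketched as below. Taking the smaller of the two distances from the mean is an added assumption that keeps the bound valid if the interval were not symmetric; here both sides give k = 2.5.

```python
mean_allowance = 0.25    # dollars per mile
sd_allowance = 0.02
lower, upper = 0.20, 0.30

# Distance from the mean to each end of the interval, in standard deviations.
k_upper = (upper - mean_allowance) / sd_allowance
k_lower = (mean_allowance - lower) / sd_allowance

# Use the smaller distance so the symmetric Chebyshev interval fits inside.
k = min(k_lower, k_upper)
min_share = 1 - 1 / k ** 2

print(f"k = {k:.2f}")
print(f"At least {min_share:.0%} of values fall between {lower} and {upper}")
```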
Choice of Visualization
• Some commonly used plots
• Which plot should I use for visualization?

How to choose visualization (different chart types).pdf

Some commonly used plots
Line chart
A line chart is used to track changes over short and long periods of time. When smaller
changes exist, line charts are better to use than bar graphs. Line charts can also be used
to compare changes over the same period of time for more than one group.

Let’s say you want to present the graduation frequency for a particular high school
between the years 2008-2012.

Some commonly used plots
Bar chart
Bar charts use size to contrast and compare two or more values, using height or lengths
to represent the specific values.

The below is example data concerning sales of vehicles over the course of 5 months:

Some commonly used plots
Heatmap
Like bar charts, heatmaps compare categories in a data set, but they use color rather than
size. They are mainly used to show relationships between two variables and use a system of
color-coding to represent different values.

The following heatmap plots temperature changes for each city during the hottest and
coldest months of the year.
Some commonly used plots
Pie chart
The pie chart is a circular graph divided into segments, each representing the proportion
of the quantity it stands for. It is especially suited to showing parts of a
whole. For example, let’s say you are determining favorite movie categories among avid
movie watchers.
Some commonly used plots
Scatter plot
Scatterplots show relationships between different variables. Scatterplots are typically
used for two variables for a set of data, although additional variables can be displayed.

For example, you might want to show data of the relationship between temperature
changes and ice cream sales. It would resemble something like this:

After-class Reading
The data visualization catalogue: This catalogue features a range of different diagrams,
charts, and graphs to help you find the best fit for your project. As you navigate each
category, you will get a detailed description of each visualization as well as some related
code or software.
Which plot should I use?
With so many visualization options out there for you to choose from, how do you decide
what is the best way to represent your data?

This depends on what story you want to tell.

Here is a simple decision tree to help you choose a suitable type of plot:

[Figure: A decision tree leading to the best chart]
More resources to help you decide which plot to use

From Data to Viz

Selecting the best chart I

Selecting the best chart II

02
Data Visualization Tools

• Introduction to data visualization tools

• Hands-on demo with Tableau Public basics
Introduction to data visualization tools

• Tableau Public
• Looker Studio
• Excel
• Google Analytics 4
• Power BI
• ggplot2
• Matplotlib
• Seaborn
Excel/Google Sheets

Find out more about data visualization in spreadsheets here:

• Types of charts and graphs in Google Sheets: a Google Help Center page with a list of
chart examples you can download.
• Excel Charts: a tutorial outlining all of the different chart types in Excel, including
some subcategories.

Tableau Public

Tableau public (need to register with an email account)

https://public.tableau.com/app/discover

It provides two modes:

1. Web authoring
2. Download the Tableau Public edition to desktop with the registered account
https://www.tableau.com/products/public/download
Tableau Versions

Tableau Public                       Tableau Desktop

Free                                 Paid
All visualizations                   All visualizations
Connect to Excel and CSV file only   All listed data sources
15 million rows of data              Unlimited rows of data
Save online only (publish)           Save locally
Public reports
Power BI

Microsoft Power BI is a tool that helps organize and

visualize data from different sources. It lets you
connect to data, clean and structure it, create
visualizations, and easily share your findings with
others.

Power BI

According to Gartner, over 97% of Fortune 500 companies use Power BI. With so many
companies around the world using Power BI, learning and mastering this BI tool can help
you progress in your data-related career.

The Gartner Magic Quadrant groups providers into four categories:

• Leaders: Competitive providers that are known to execute well against their current vision and are often innovative giants in their industry
• Visionaries: Full of providers that understand where the future market is going or have a strong vision for where it will end up
• Niche Players: Highly focused on a small segment
• Challengers: Often dominate a large segment
Power BI (register a Power BI account)

https://www.microsoft.com/en-us/power-platform/products/power-bi/getting-started-with-power-bi

Power BI (installation)
Windows Users

https://www.microsoft.com/en/power-platform/products/power-bi/desktop?market=af

Mac Users

https://app.powerbi.com/

Feel free to share your feedback with me via this
link/QR code throughout the whole semester.

https://app.sli.do/event/hUgiGrg7Ln8KeEFVyCT9o3

Thank You!

Final UNIT II-DESCRIPTIVE ANALYTICS
No ratings yet
Final UNIT II-DESCRIPTIVE ANALYTICS
128 pages
Lecture 7 - Analyze The Data Using Statistics
No ratings yet
Lecture 7 - Analyze The Data Using Statistics
104 pages
Final (RK) - III 11 - 12 - La 4 - Learning Activity 5 - Analyzing Research Data
No ratings yet
Final (RK) - III 11 - 12 - La 4 - Learning Activity 5 - Analyzing Research Data
15 pages
Descriptive Analytics - Uni and Bi
No ratings yet
Descriptive Analytics - Uni and Bi
36 pages
Descriptive Analytics
No ratings yet
Descriptive Analytics
18 pages
BA1 Introduction 2025
No ratings yet
BA1 Introduction 2025
55 pages
Descriptive Analytics - Univariate and Bivariate
No ratings yet
Descriptive Analytics - Univariate and Bivariate
41 pages
Dealing With Different Type of Data
No ratings yet
Dealing With Different Type of Data
32 pages
Statistics For Decision-Making 2024
No ratings yet
Statistics For Decision-Making 2024
375 pages
Dataanalysiswithspssppt 221110071954 6ebd3b41
No ratings yet
Dataanalysiswithspssppt 221110071954 6ebd3b41
189 pages
Data Science Lecture No 03
No ratings yet
Data Science Lecture No 03
23 pages
DS1 Section D
No ratings yet
DS1 Section D
14 pages
Research Report
No ratings yet
Research Report
47 pages
1 Introduction
No ratings yet
1 Introduction
15 pages
22UCS303 DS-Unit III-N
No ratings yet
22UCS303 DS-Unit III-N
85 pages
Chapter 1 Classification and Graphical Presentation (Becon 2025)
No ratings yet
Chapter 1 Classification and Graphical Presentation (Becon 2025)
67 pages
Ids Unit-2
No ratings yet
Ids Unit-2
26 pages
Dashboard Creation in Motionboard - Day 1 - HO
No ratings yet
Dashboard Creation in Motionboard - Day 1 - HO
126 pages
CH 01
No ratings yet
CH 01
36 pages
Pa 1 2024
No ratings yet
Pa 1 2024
88 pages
Data Analytics With Python Lecture 1
No ratings yet
Data Analytics With Python Lecture 1
23 pages
Chapter 1
No ratings yet
Chapter 1
16 pages
Bridging Blaze Lbolytc Finals Reviewer
No ratings yet
Bridging Blaze Lbolytc Finals Reviewer
33 pages
QT Summary Document 1
No ratings yet
QT Summary Document 1
45 pages
CE2B - BARUELO, CHRISTIAN - Check-Up Activity On Introduction To Stats and DA
No ratings yet
CE2B - BARUELO, CHRISTIAN - Check-Up Activity On Introduction To Stats and DA
2 pages
File tổng hợp kiến thức SB
No ratings yet
File tổng hợp kiến thức SB
148 pages
Quantitative Methods 3
No ratings yet
Quantitative Methods 3
174 pages
EECM3724 Unit 1 Ch1 Slides 2022
No ratings yet
EECM3724 Unit 1 Ch1 Slides 2022
24 pages
LBYACST (Lecture Notes)
No ratings yet
LBYACST (Lecture Notes)
9 pages
CH 8 Data Analysis
No ratings yet
CH 8 Data Analysis
34 pages
Introduction To Data Science
No ratings yet
Introduction To Data Science
62 pages
Biostatistics and Research Methodology
From Everand
Biostatistics and Research Methodology
Dr. G. Nageswara Rao
5/5 (5)
Topic 1 Introduction To Statistics
No ratings yet
Topic 1 Introduction To Statistics
35 pages
Statistical Foundations - Intro 64zlf
100% (2)
Statistical Foundations - Intro 64zlf
86 pages
Lecture 1,2&3
No ratings yet
Lecture 1,2&3
80 pages
Lesson 2 Notes
No ratings yet
Lesson 2 Notes
11 pages
QM 1
No ratings yet
QM 1
58 pages
02.introduction, Importance, Fundamental Aspects
No ratings yet
02.introduction, Importance, Fundamental Aspects
30 pages
3rd Sem Bcom BBA Professional Business Skills Full Notes Krishtalkz
0% (1)
3rd Sem Bcom BBA Professional Business Skills Full Notes Krishtalkz
63 pages
BoS - Session 1
100% (1)
BoS - Session 1
37 pages
Sbe10 01
No ratings yet
Sbe10 01
7 pages
01 Introduction
No ratings yet
01 Introduction
50 pages
Intro To Business Analytics
No ratings yet
Intro To Business Analytics
27 pages
Slides Week2 DataCollection
No ratings yet
Slides Week2 DataCollection
26 pages
Ba Lecture 2
No ratings yet
Ba Lecture 2
54 pages
MGT 1103
No ratings yet
MGT 1103
4 pages
Lecture 2-Introduction To Satistics
No ratings yet
Lecture 2-Introduction To Satistics
43 pages
Week 01, PT 1
No ratings yet
Week 01, PT 1
16 pages
• Descriptive, answers the question, “What happened?”

Name Weight (kg) Height (cm) Gender Year of Birth Performance

❑ Rows: observation, instance, or object.

week 1 week 2 week 3 week 4 week 5

random patients at 6.8 category. The same

❑ Quantitative variables are numerical and can be ordered or ranked.

Examples of different measurement scales

1. What type of data is this?

2. What is the scale type of each variable?

So, we are going to look at

Age group Absolute frequency Relative frequency

Frequency Distribution for Ages of Top 50 Wealthiest people

Gender Absolute frequency Relative frequency

Age (upper bound) Absolute frequency Relative frequency
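
The absolute and relative frequency columns in the tables above can be computed in a few lines of Python. The following is a minimal sketch only: the ages are invented for illustration (they are not the lecture's dataset), and the helper `age_group` and its 10-year bin width are assumptions.

```python
from collections import Counter

# Hypothetical ages, invented for illustration (not the lecture's dataset).
ages = [34, 45, 52, 58, 61, 63, 67, 70, 72, 85]

def age_group(age, width=10):
    """Return a label such as '60-69' for the bin that contains `age`."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Absolute frequency: number of observations in each age group.
absolute = Counter(age_group(a) for a in ages)

# Relative frequency: each group's share of all observations.
total = sum(absolute.values())
relative = {group: count / total for group, count in absolute.items()}

for group in sorted(absolute):
    print(f"{group:>7}  {absolute[group]:>3}  {relative[group]:.2f}")
```

The relative frequencies always sum to 1, which is a quick sanity check on any frequency table.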

• probability mass function (p.m.f.) for the discrete attribute; and

• probability density function (p.d.f.) for the continuous attribute.

Q. How can we construct the probability distribution of a population? Do we need to

Right (Positively) Skewed Zero skew (symmetric) Left (Negatively) Skewed

Source of figure: https://www.scribbr.com/statistics/skewness/

ages of top wealth

A statistic is a descriptor which describes numerically a characteristic

We are going to talk about 3 groups of statistical measures.

• 1st quartile (25%)

• 2nd quartile (Median)

• 3rd quartile (75%)

• Decile (10%, 20%, 30%,…)

• Percentile (5th percentile, 95th percentile, 99th percentile,…)
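
Quartiles, deciles, and percentiles are all cut points of the sorted data, so one function can produce each of them. The sketch below uses the standard-library `statistics.quantiles`; the data values are hypothetical, and the `"inclusive"` interpolation method is an assumption (other conventions give slightly different cut points on small samples).

```python
import statistics

# Hypothetical sample, invented for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]

# n=4 returns the three quartile cut points (25%, 50%, 75%).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=10 returns the nine decile cut points (10%, 20%, ..., 90%).
deciles = statistics.quantiles(data, n=10, method="inclusive")

# n=100 returns the 99 percentile cut points; index 94 is the 95th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p95 = percentiles[94]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("Deciles:", deciles)
print("95th percentile:", p95)
```

The 2nd quartile coincides with the median, which is why the list above names it that way.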

High Temperature (degrees F) Bottled Water Sales (cases)

Python Notebook: IE5005 Lecture 02.ipynb

Download the above 2 files and save them in a

A boxplot can also be used to describe how

A student received an A in Mathematics (3 credits), a C in Psychology (3 credits), a B in Biology (4
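
A grade point average is a weighted mean in which each course's credits act as the weight. The sketch below uses only the three courses visible in the text above (the original example is cut off and may list more), and it assumes a 4.0 grade-point scale with A = 4, B = 3, C = 2, which the source does not state.

```python
# Assumed grade-point scale (not given in the source).
grade_points = {"A": 4.0, "B": 3.0, "C": 2.0}

# Only the courses visible in the text; the original list may be longer.
courses = [
    ("Mathematics", "A", 3),
    ("Psychology", "C", 3),
    ("Biology", "B", 4),
]

# Weighted mean = sum(weight * value) / sum(weight), with credits as weights.
total_credits = sum(credits for _, _, credits in courses)
weighted_sum = sum(grade_points[grade] * credits for _, grade, credits in courses)
weighted_mean = weighted_sum / total_credits

print(f"Weighted mean grade point: {weighted_mean:.2f}")
```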

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2.

The stock price then increased by 50% on day 3.

So, I’m neither earning nor losing money. Correct?

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2. That is,

The stock price then increased by 50% on day 3. That is,

√((1 − 50%)(1 + 50%)) − 1 ≈ −13.40%. On average, that means it should be something
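
The stock example can be checked numerically. The sketch below follows the price from day 1 to day 3 and compares the arithmetic mean of the two daily returns with their geometric mean; the variable names are illustrative only.

```python
import math

start_price = 100.0            # day 1 price from the example
daily_returns = [-0.50, 0.50]  # day 2: -50%, day 3: +50%

# Returns compound multiplicatively, so work with growth factors (1 + r).
growth_factors = [1 + r for r in daily_returns]
final_price = start_price * math.prod(growth_factors)

# The arithmetic mean suggests "no change", which is misleading here.
arithmetic_mean_return = sum(daily_returns) / len(daily_returns)

# The geometric mean is the constant daily return that gives the same final price.
geometric_mean_return = math.prod(growth_factors) ** (1 / len(growth_factors)) - 1

print(f"Final price: {final_price:.2f}")
print(f"Arithmetic mean return: {arithmetic_mean_return:.2%}")
print(f"Geometric mean return: {geometric_mean_return:.2%}")
```

Applying the geometric mean return twice to the starting price reproduces the day 3 price, whereas the arithmetic mean of 0% would wrongly leave the price at $100.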

respectively, from-home-to-school and from-school-to-home.

This is where we can use Harmonic mean (HM)
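
For a round trip over the same distance at two different speeds, the average speed is the harmonic mean of the two speeds, not their arithmetic mean. The sketch below illustrates this with hypothetical numbers: the speeds (30 and 60 km/h) and the one-way distance (12 km) are assumptions, not values from the lecture.

```python
import statistics

# Hypothetical values, invented for illustration.
speed_to_school = 30.0   # km/h, from home to school
speed_to_home = 60.0     # km/h, from school to home
one_way_distance = 12.0  # km

# Direct calculation: total distance divided by total time.
total_time = one_way_distance / speed_to_school + one_way_distance / speed_to_home
average_speed = 2 * one_way_distance / total_time

# The harmonic mean gives the same answer without needing the distance.
harmonic_mean_speed = statistics.harmonic_mean([speed_to_school, speed_to_home])

# The arithmetic mean overstates the average speed.
arithmetic_mean_speed = statistics.mean([speed_to_school, speed_to_home])

print(f"Average speed (distance / time): {average_speed:.2f} km/h")
print(f"Harmonic mean: {harmonic_mean_speed:.2f} km/h")
print(f"Arithmetic mean: {arithmetic_mean_speed:.2f} km/h")
```

More time is spent on the slower leg, which is why the true average falls below the arithmetic mean.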

The geometric mean is used when Example:

• Range = x_max − x_min

Remark: 𝑘 is not necessarily an integer.

Subtract the mean from the upper bound:

How to choose visualization (different chart types).pdf

This depends on what story you want to tell.

From Data to Viz

Selecting the best chart I

Selecting the best chart II

• Introduction to data visualization tools

Find out more about data visualization in spreadsheet here:

Tableau Public (you need to register with an email account)

It provides two modes:

Tableau Public Tableau Desktop

Microsoft Power BI is a tool that helps organize and

According to Gartner, over 97% of Fortune 500

• Leaders: Competitive providers that are known to execute well
