IE5005 Lecture 02

Uploaded by

Braewyn Hsu
Copyright © All Rights Reserved
IE5005 Data Analytics for Industrial Engineers

Lecture 02. Descriptive Analytics and Data Visualization Tools

Dr. Wang Zhiguo


zhiguo.w@nus.edu.sg

Semester 1 AY2024/25
Course Outline

01 Descriptive analytics
• Categories of Descriptive Analytics Methods
• Fundamentals for descriptive analytics
• Hands-on demo: descriptive analytics with Python

02 Data visualization tools
• Introduction to data visualization tools
• Hands-on demo with Tableau Public (basics)

03 Hands-on demo with Tableau Public (advanced)
• Dashboard
• Stories
Acknowledgement

The following materials have helped a lot in my preparation of the Lecture 02 & 03
materials.

Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative
and compelling figures. O'Reilly Media.

Belorkar, A., Guntuku, S. C., Hora, S., & Kumar, A. (2020). Interactive Data Visualization
with Python: Present your data as an effective and compelling story. Packt Publishing Ltd.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business
professionals. John Wiley & Sons.

Introduction to Tableau. DataCamp.

Share Data Through the Art of Visualization. Google Data Analytics Course.
01
Descriptive Analytics
• Categories of Descriptive Analytics Methods
• Fundamentals for descriptive analytics
• Descriptive analytics with Python
Categorization of analytical methods and models
Data analytics is generally thought to comprise 3 or 4* broad categories of techniques:

• Descriptive analytics encompasses the set of techniques that describes what has
happened in the past. Examples are data queries, reports, descriptive statistics, data
visualization including data dashboards, some data-mining techniques, and basic
what-if spreadsheet models.

• Diagnostic analytics is the process of using data to determine the causes of trends
and correlations between variables. It can be viewed as a logical next step after using
descriptive analytics to identify trends. Examples include hypothesis testing, diagnostic
regression analysis, and correlation/causation analysis.

*Some literature divides analytics into 3 categories, treating diagnostic analytics as part of descriptive analytics.
Categorization of analytical methods and models

• Predictive analytics consists of techniques that use models constructed from past data
to predict the future or ascertain the impact of one variable on another. For example,
past data on product sales may be used to construct a mathematical model to predict
future sales. Linear regression, time series analysis, some data-mining techniques,
and simulation, often referred to as risk analysis, all fall under the banner of predictive
analytics.

• Prescriptive analytics indicates a course of action to take; that is, the output of a
prescriptive model is a decision. Predictive models provide a forecast or prediction,
but do not provide a decision. However, a forecast or prediction, when combined with
a rule, becomes a prescriptive model.

Reading

• Descriptive: answers the question, “What happened?”
• Diagnostic: answers the question, “Why did this happen?”
• Predictive: answers the question, “What might happen in the future?”
• Prescriptive: answers the question, “What should we do next?”

Check out the article “4 Examples of Business Analytics in Action” from Harvard
Business School. The article shows how corporations use data insights to optimize their
decision-making processes.
What is data?
❑ Data are facts and figures collected, analyzed, and summarized for presentation and
interpretation, including numbers, text, images, audio, video, and so on.

❑ Population vs. sample

Name | Weight (kg) | Height (cm) | Gender | Year of Birth | Performance
Andrew | 77 | 175 | M | 1998 | good
Bernhard | 110 | 195 | M | 2003 | average
Carolina | 70 | 172 | F | 1999 | average
Dennis | 85 | 180 | M | 1998 | poor
Eve | 65 | 168 | F | 2002 | good
… | … | … | … | … | …

❑ Rows: observation, instance, or object.


❑ Columns: variable, attribute, or feature.
Data types
❑ Cross-sectional data are collected from multiple entities at the same point in time or
within the same time interval.

[Figure: wellness scores of patients #1 to #5 measured at one point in time]
Data types
❑ Time series data are repeated measurements of a single entity collected over multiple
points in time or a time period.

[Figure: wellness of patient #1 over weeks 1 to 5]
Data types

❑ Panel data (or time-series cross-section) are repeated measurements of multiple entities
collected over multiple points in time.

[Figure: wellness of patients #1, #2, and #3 over weeks 1 to 5]
Data types
❑ Longitudinal data are repeated observations of a certain measure collected from multiple
entities over some extended time period.

Trend study: the repeated observations are sampled from random patients at each time point.
The same patient does not necessarily participate in the survey more than once.

Cohort study: the repeated observations are sampled from a cohort of patients in a certain
category. The same patients do not necessarily participate from year to year, but all
participants must meet the categorical criteria that define the cohort.

[Figure: wellness over weeks 1 to 5 under a trend study and a cohort study]
Can you identify relationships between different data types?

[Diagram: relationships among cross-sectional, time series, panel, longitudinal, trend study, and cohort study data]
Types of measurement scales

❑ Qualitative variables are variables that can be placed into distinct categories in a
nominal or ordinal way.
• Nominal scale classifies data into mutually exclusive and exhaustive categories in
which no order or ranking can be imposed on the data. [Gender (M, F)]
• Ordinal scale classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. [Rating (good, normal, poor)]
Types of measurement scales

❑ Quantitative variables are numerical and can be ordered or ranked.


• Interval (relative) scale ranks data, and precise differences between units of
measure do exist. However, there is no meaningful/absolute zero. [Temperature]
• Ratio (absolute) scale possesses all the characteristics of the interval scale, and there
exists a true/absolute zero. As a result, true ratios exist. [Height]
Types of measurement scales

Examples of different measurement scales

• Nominal: Gender; Zip code; Color (red, green, blue, …); Religion (Christianity, Buddhism, …); Nationality; Major (maths, computing, …); Marital status (single, married, divorced, …)
• Ordinal: Rating (good, normal, poor); Grade (A, B, C, D); Judging (1st place, 2nd place, …); Ranking of tennis players; Size (small, medium, large); Satisfaction (unsatisfied, meet the expectation, satisfied)
• Interval: Temperature; IQ; Calendar year
• Ratio: Height; Weight; Age; Speed
Quiz 1
Name | Weight (kg) | Height (cm) | Gender | Year of Birth | Performance
Andrew | 77 | 175 | M | 1998 | good
Bernhard | 110 | 195 | M | 2003 | average
Carolina | 70 | 172 | F | 1999 | average
Dennis | 85 | 180 | M | 1998 | poor
Eve | 65 | 168 | F | 2002 | good
… | … | … | … | … | …

1. What type of data is this?


A. Cross-sectional
B. Time series
C. Panel
D. Longitudinal

Quiz 2
Name | Weight (kg) | Height (cm) | Gender | Year of Birth | Performance
Andrew | 77 | 175 | M | 1998 | good
Bernhard | 110 | 195 | M | 2003 | average
Carolina | 70 | 172 | F | 1999 | average
Dennis | 85 | 180 | M | 1998 | poor
Eve | 65 | 168 | F | 2002 | good
… | … | … | … | … | …

2. What are the respective scale types of the variables Name, Height, Year of Birth, and
Performance?
A. Nominal, Ordinal, Ordinal, Ordinal
B. Nominal, Ratio, Ordinal, Ordinal
C. Nominal, Ratio, Ratio, Interval
D. Nominal, Ratio, Interval, Ordinal
(Answer)
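Name | Weight (kg) | Height (cm) | Gender | Year of Birth | Performance
Andrew | 77 | 175 | M | 1998 | good
Bernhard | 110 | 195 | M | 2003 | average
Carolina | 70 | 172 | F | 1999 | average
Dennis | 85 | 180 | M | 1998 | poor
Eve | 65 | 168 | F | 2002 | good
Scale types: Name (nominal), Weight (ratio), Height (ratio), Gender (nominal), Year of Birth (interval), Performance (ordinal)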

1. What type of data is this?


A. Cross-sectional
B. Time series
C. Panel
D. Longitudinal

2. What are the respective scale types of the variables?
Case Study: 50 wealthiest people in the world
Suppose the ages of the 50 wealthiest people in the world are listed in Forbes Magazine.
Here are the data in their original form (what we call raw data):

49 57 38 73 81
74 59 76 65 69
54 56 69 68 78
65 85 49 69 61
48 81 68 37 43
78 82 43 64 67
52 56 81 77 79
85 40 85 59 80
60 71 57 61 69
61 83 90 87 74

Little insight can be gained from this raw data without first organizing it properly.
Descriptive univariate analytics

To describe a situation, draw conclusions, or make inferences about events, the data analyst
must organize the data in some meaningful way. The most convenient methods of
organizing data are constructing frequency distributions and computing statistical
measures.

After organizing the data, the analyst must present them to stakeholders. The most
intuitive way is to draw charts or plots.

So, we are going to look at
• frequency distribution
• statistical measures
• data visualization plots
Frequency table
A frequency table of ‘age’ can be constructed as follows:

Age group | Absolute frequency | Relative frequency
35–41 | 3 | 6%
42–48 | 3 | 6%
49–55 | 4 | 8%
56–62 | 10 | 20%
63–69 | 10 | 20%
70–76 | 5 | 10%
77–83 | 10 | 20%
84–90 | 5 | 10%
Total | 50 | 100%

After organizing the data into a frequency table, we can analyze the peaks (classes that
contain more data values than the other classes) and outliers (extremely large or small
values relative to the other data).
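The frequency table above can be reproduced in a few lines of Python. This is a minimal sketch, not the course notebook's code: it hard-codes the 50 raw ages from the case study and assumes inclusive integer class limits of width 7 (35–41, …, 84–90), as on the slide. The helper name `frequency_table` is our own.

```python
# Sketch: absolute and relative frequency table for the ages of the
# 50 wealthiest people, using the same 8 classes of width 7 as the slide.

# Raw ages from the case study, read row by row
ages = [
    49, 57, 38, 73, 81,
    74, 59, 76, 65, 69,
    54, 56, 69, 68, 78,
    65, 85, 49, 69, 61,
    48, 81, 68, 37, 43,
    78, 82, 43, 64, 67,
    52, 56, 81, 77, 79,
    85, 40, 85, 59, 80,
    60, 71, 57, 61, 69,
    61, 83, 90, 87, 74,
]

# Inclusive class limits 35-41, 42-48, ..., 84-90 (lower limits step by 7)
classes = [(lo, lo + 6) for lo in range(35, 91, 7)]


def frequency_table(data, classes):
    """Return (label, absolute frequency, relative frequency) for each class."""
    n = len(data)
    rows = []
    for lo, hi in classes:
        count = sum(1 for x in data if lo <= x <= hi)
        rows.append((f"{lo}-{hi}", count, count / n))
    return rows


table = frequency_table(ages, classes)
for label, count, rel in table:
    print(f"{label:>6}  {count:>3}  {rel:>4.0%}")
print(f" Total  {sum(c for _, c, _ in table):>3}  100%")
```

Because the ages are whole numbers, inclusive integer limits leave no gaps between classes; for continuous data one would typically use class boundaries such as 41.5 instead.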
Frequency distribution

[Figure: Frequency Distribution for Ages of Top 50 Wealthiest People; x-axis: Age Group (35–41 to 84–90); y-axis: Frequency]
Frequency table
A frequency table can also be constructed for a nominal or ordinal variable, for example, gender:

Gender | Absolute frequency | Relative frequency
Male | 45 | 90%
Female | 5 | 10%
Total | 50 | 100%

A cumulative frequency table, which shows the number of data values less than or equal to a
specific value, can also be constructed:

Age (upper bound) | Cumulative absolute frequency | Cumulative relative frequency
≤41 | 3 | 6%
≤48 | 6 | 12%
≤55 | 10 | 20%
≤62 | 20 | 40%
≤69 | 30 | 60%
≤76 | 35 | 70%
≤83 | 45 | 90%
≤90 | 50 | 100%
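A cumulative table is easy to get wrong by hand, so a quick computational check helps. The sketch below (our own helper, `cumulative_table`) counts the values at or below each upper bound directly instead of summing class counts; both approaches should agree, e.g. the ≤83 row is 35 + 10 = 45, i.e. 90%. The ages are again typed in from the case study.

```python
# Sketch: cumulative frequency table for the ages of the 50 wealthiest people.
ages = [
    49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
    54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
    48, 81, 68, 37, 43, 78, 82, 43, 64, 67,
    52, 56, 81, 77, 79, 85, 40, 85, 59, 80,
    60, 71, 57, 61, 69, 61, 83, 90, 87, 74,
]

# Upper class limits used in the cumulative table
upper_bounds = [41, 48, 55, 62, 69, 76, 83, 90]


def cumulative_table(data, upper_bounds):
    """Return (upper bound, cumulative count, cumulative share) for each bound."""
    n = len(data)
    rows = []
    for ub in upper_bounds:
        count = sum(1 for x in data if x <= ub)  # values less than or equal to ub
        rows.append((ub, count, count / n))
    return rows


for ub, count, share in cumulative_table(ages, upper_bounds):
    print(f"<={ub}  {count:>3}  {share:>4.0%}")
```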
Frequency distribution
So far, the frequency table (distribution) has been constructed from sample data. For a
population, the corresponding distribution is called the

• probability mass function (p.m.f.) for a discrete attribute; and

• probability density function (p.d.f.) for a continuous attribute.

Q. How can we construct the probability distribution of a population? Do we need
access to all instances of that population?

Many real-life situations follow an already known, well-defined distribution function,
so in many cases we do not need to access all instances of a given population.
Shape of frequency distribution

[Figure: right (positively) skewed, zero skew (symmetric), and left (negatively) skewed distributions]

Source of figure: https://www.scribbr.com/statistics/skewness/
Univariate data visualization

Plot | Qualitative | Quantitative
Pie | yes | no
Bar | yes | Most of the time, no (yes only when there is a small, limited number of values)
Line | no | yes
Histogram | no | yes

[Figures: pie and bar charts of gender (male 45, female 5); histogram of the ages of the top 50 wealthiest people]
We can also summarize data using summary statistics.

A statistic is a numerical descriptor of a characteristic of a sample drawn from the population.

We are going to talk about 3 groups of statistical measures.


1. Measures of central tendency
2. Measures of location
3. Measures of dispersion or variation
Measures of central tendency

Central tendency statistics identify the central position within the dataset.

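• (Arithmetic) Mean: \(\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}\)

• Mode: the most frequent value, e.g., 3 in (1, 2, 3, 3, 3, 4, 5)

• Median: the middle value of the sorted data, e.g., 3 in (1, 2, 3, 3, 3, 4, 5)

• Midrange \(= \frac{x_{max} + x_{min}}{2}\)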
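The four measures can be checked on the slide's example values with Python's standard `statistics` module. This is a small sketch; the course notebook may compute them differently (e.g. with pandas).

```python
import statistics

data = [1, 2, 3, 3, 3, 4, 5]  # example values from the slide

mean = sum(data) / len(data)              # arithmetic mean
mode = statistics.mode(data)              # most frequent value
median = statistics.median(data)          # middle value of the sorted data
midrange = (max(data) + min(data)) / 2    # average of the two extremes

print(mean, mode, median, midrange)  # 3.0 3 3 3.0
```

All four coincide here because the example is symmetric around 3; on skewed data they generally differ.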
Measures of location
Location statistics identify the value at a certain position and tell us its position relative
to the other data values. Some commonly used univariate location statistics
include:

• 1st quartile (25%)

• 2nd quartile (Median)

• 3rd quartile (75%)

• Decile (10%, 20%, 30%,…)

• Percentile (5th percentile, 95th percentile, 99th percentile,…)
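Quartiles, deciles, and percentiles can be computed with `statistics.quantiles` (Python 3.8+). This sketch uses the 14 high temperatures from the bottled water example that follows, and it assumes the `method="inclusive"` convention (linear interpolation between sorted values, which should match NumPy's default `np.percentile`). Other conventions give slightly different answers on small samples, so results may not match Excel or the notebook exactly.

```python
import statistics

# High temperatures (degrees F) for the 14 days in the bottled water example
temps = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]

# Quartiles: the 3 cut points that split the data into 4 equal-sized groups.
# method="inclusive" treats min/max as the 0th/100th percentiles and
# interpolates linearly between sorted values.
q1, q2, q3 = statistics.quantiles(temps, n=4, method="inclusive")

# Deciles: the 9 cut points at the 10th, 20th, ..., 90th percentiles
deciles = statistics.quantiles(temps, n=10, method="inclusive")

print("Q1, median (Q2), Q3:", q1, q2, q3)
print("90th percentile (9th decile):", deciles[-1])
```

Note that the module's default is `method="exclusive"`, which gives different quartiles for this data set; stating the convention explicitly avoids surprises.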

Example
Data for bottled water sales at Queensland Amusement Park for a sample of 14 summer
days are available in “BottledWater.csv”.

Day | High Temperature (degrees F) | Bottled Water Sales (cases)
1 | 78 | 23
2 | 79 | 22
3 | 80 | 24
4 | 80 | 22
5 | 82 | 24
6 | 83 | 26
7 | 85 | 27
8 | 86 | 25
9 | 87 | 28
10 | 87 | 26
11 | 88 | 29
12 | 88 | 30
13 | 90 | 31
14 | 92 | 31

Convince yourself that the following statistics are correct for the variable
‘High Temperature (degrees F)’:
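One way to "convince yourself" is to compute the central tendency measures directly. In the course notebook the values are presumably read from BottledWater.csv (e.g. with `pandas.read_csv`); here they are typed in from the table so the sketch is self-contained, and the helper `summarize` is our own. `statistics.multimode` (Python 3.8+) is used because this data set has several modes.

```python
import statistics

# Bottled water example: 14 summer days, typed in from the table above
temps = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]
sales = [23, 22, 24, 22, 24, 26, 27, 25, 28, 26, 29, 30, 31, 31]


def summarize(values):
    """Central tendency measures from this lecture for one variable."""
    return {
        "n": len(values),
        "mean": sum(values) / len(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # all most-frequent values
        "midrange": (max(values) + min(values)) / 2,
        "min": min(values),
        "max": max(values),
    }


for name, column in [("High Temperature (degrees F)", temps),
                     ("Bottled Water Sales (cases)", sales)]:
    print(name, summarize(column))
```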
Descriptive analytics with Python

Dataset: BottledWater.csv

Python Notebook: IE5005 Lecture 02.ipynb

Download the above 2 files and save them in the same folder. Then launch Jupyter Notebook
and open ‘IE5005 Lecture 02.ipynb’.
Boxplot

Boxplot presents a 5-number summary of the data: min, 1st quartile, median, 3rd quartile,
and max.

A boxplot can also be used to describe how symmetric or skewed the distribution of a
variable is. For example, in the plot on the left, the values are concentrated in the high
part, so the distribution is left skewed.

[Figure: boxplot annotated with max, 3rd quartile, median, 1st quartile, and min]

Python Notebook: IE5005 Lecture 02.ipynb
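The numbers behind a boxplot can be computed without plotting. This sketch returns the five-number summary for the bottled water temperatures and adds a rough shape reading that compares the two halves of the box; `box_shape` is our own heuristic and ignores the whiskers, so treat it as a hint rather than a test of skewness.

```python
import statistics

temps = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles use linear interpolation."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def box_shape(q1, median, q3):
    """Rough reading of the box: compare the lengths of its two halves."""
    lower, upper = median - q1, q3 - median
    if lower > upper:
        return "left-skewed"    # longer lower half: values bunch in the high part
    if upper > lower:
        return "right-skewed"   # longer upper half: values bunch in the low part
    return "roughly symmetric"


lo, q1, med, q3, hi = five_number_summary(temps)
print(lo, q1, med, q3, hi)
print(box_shape(q1, med, q3))
```

To draw the plot itself, `matplotlib.pyplot.boxplot(temps)` is one option; note that its whiskers default to 1.5 × IQR beyond the box (`whis=1.5`), so they reach the min and max only when no points are flagged as outliers.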
What does ‘average’ or ‘mean’ refer to?
When we talk about ‘average’ or ‘mean’ in daily life, we normally refer to the classic
arithmetic mean \(\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}\). However, there are other types of
means that are useful on special occasions.

❑ Weighted mean \(= \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}\).

The arithmetic mean is a special case of the weighted mean that assigns equal weight to
each observation.
A student received an A in Mathematics (3 credits), a C in Psychology (3 credits), a B in
Biology (4 credits), and a D in History (2 credits). Assuming grades A, B, C, D, and F
correspond to 4, 3, 2, 1, and 0 grade points respectively, what is the student’s grade
point average?

Suggested answer
Course | Credits (\(w_i\)) | Grade (\(x_i\))
Mathematics | 3 | A (4 points)
Psychology | 3 | C (2 points)
Biology | 4 | B (3 points)
History | 2 | D (1 point)

\[ \bar{x} = \frac{3 \cdot 4 + 3 \cdot 2 + 4 \cdot 3 + 2 \cdot 1}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.67 \]
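The GPA calculation is a direct application of the weighted-mean formula, with credits as weights \(w_i\) and grade points as values \(x_i\). The sketch below encodes the slide's grade mapping; `weighted_mean` is our own helper, not a library function. It also shows the special case mentioned above: with equal weights the weighted mean is just the arithmetic mean.

```python
# Grade-point mapping stated on the slide: A=4, B=3, C=2, D=1, F=0
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, credits, letter grade) from the example
courses = [
    ("Mathematics", 3, "A"),
    ("Psychology", 3, "C"),
    ("Biology", 4, "B"),
    ("History", 2, "D"),
]


def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i); weights should be non-negative with a positive total."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


points = [GRADE_POINTS[grade] for _, _, grade in courses]
credits = [credit for _, credit, _ in courses]

gpa = weighted_mean(points, credits)
print(f"GPA = {gpa:.2f}")  # GPA = 2.67 (that is, 32 / 12)

# With equal weights, the weighted mean reduces to the arithmetic mean
print(weighted_mean(points, [1] * len(points)))  # 2.5 (that is, 10 / 4)
```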
Are you using ‘average’ or ‘mean’ correctly?

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2.

The stock price then increased by 50% on day 3.

Since the stock price first decreased by 50% and then increased by 50%, the average
growth rate of the stock price is \(\frac{-50\% + 50\%}{2} = 0\%\).

So, I’m neither earning nor losing money. Correct?
Are you using ‘average’ or ‘mean’ correctly?

The stock price on day 1 was $100 per share.

The stock price decreased by 50% on day 2. That is,
\(100 \times (1 - 50\%) = 50\)
So the stock price became $50 on day 2.

The stock price then increased by 50% on day 3. That is,
\(50 \times (1 + 50\%) = 75\)
The price became $75 on day 3.
Apparently, I’m losing money as the share price has dropped from $100 to $75.

So the (arithmetic mean) growth rate computed as \(\frac{-50\% + 50\%}{2} = 0\%\) can be misleading in
this context.
Geometric mean

Therefore, this statement is not right: if the average growth rate were 0%, the stock price
would be back at $100 on day 3. This is where the arithmetic mean (AM) may not work well.
We should adopt the geometric mean (GM):

\[ GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n} \]

If we denote the average growth rate as \(R\), we can compute it from the growth factors as
\(1 + R = \sqrt{(1 - 50\%)(1 + 50\%)} = \sqrt{0.75} \approx 0.8660\), so \(R \approx -13.40\%\).
An average rate should be a constant rate that, applied in every period, reproduces the
overall change. Indeed, \(100 \cdot (1 - 13.40\%) \cdot (1 - 13.40\%) \approx 75\), which gives
the price of $75 on day 3. This is not a coincidence.
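The stock example can be replayed numerically. The sketch applies the geometric mean to the growth factors \(1 + r\) (not to the raw rates, which can be negative) and compares what each "average rate" implies for the day-3 price. `math.prod` requires Python 3.8+; `statistics.geometric_mean(factors)` would be an equivalent alternative.

```python
import math

price_day1 = 100.0
changes = [-0.50, 0.50]                  # day-2 and day-3 growth rates
factors = [1 + r for r in changes]       # growth factors 0.5 and 1.5

# Arithmetic mean of the rates: misleading here
arithmetic_rate = sum(changes) / len(changes)

# Geometric mean of the growth factors, minus 1, gives the average rate R
gm_factor = math.prod(factors) ** (1 / len(factors))
geometric_rate = gm_factor - 1

actual_final = price_day1 * math.prod(factors)
final_at_gm_rate = price_day1 * (1 + geometric_rate) ** len(changes)
final_at_am_rate = price_day1 * (1 + arithmetic_rate) ** len(changes)

print(f"arithmetic mean rate: {arithmetic_rate:.2%} -> {final_at_am_rate:.2f}")
print(f"geometric mean rate:  {geometric_rate:.2%} -> {final_at_gm_rate:.2f}")
print(f"actual day-3 price:   {actual_final:.2f}")
```

Only the geometric rate, compounded over both periods, reproduces the actual day-3 price.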
Are you using ‘average’ or ‘mean’ correctly?

Suppose the distance between your home and school is d. You drive from home to
school at a speed x = 60 km/h and return from school to home at a speed y = 20 km/h,
so your average driving speed is (60 + 20)/2 = 40 km/h. Correct or wrong?

[Figure: home to school at 60 km/h; school to home at 20 km/h]
Harmonic mean

We know the average driving speed can be computed as \(\frac{\text{total distance}}{\text{total time}}\).

For instance, let the distance d = 120 km. It takes you \(\frac{120}{60} = 2\) hours from home to
school and \(\frac{120}{20} = 6\) hours from school to home.

The average driving speed is \(\frac{\text{total distance}}{\text{total time}} = \frac{240}{8} = 30\) km/h.

This is where we can use the harmonic mean (HM):

\[ HM = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}} \]

\[ \text{Average speed} = \frac{2}{\frac{1}{60} + \frac{1}{20}} = 30 \text{ km/h}. \]
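The equivalence between "total distance / total time" and the harmonic mean can be checked numerically. This sketch uses the slide's speeds and d = 120 km, then confirms that any other positive one-way distance gives the same answer, because d cancels out. `statistics.harmonic_mean` is the standard-library implementation of the HM formula.

```python
import statistics

speeds = [60, 20]     # km/h: home -> school, school -> home
d = 120               # one-way distance in km (value used on the slide)

total_distance = d * len(speeds)
total_time = sum(d / v for v in speeds)      # 2 h + 6 h
avg_speed = total_distance / total_time      # total distance / total time

hm_by_formula = len(speeds) / sum(1 / v for v in speeds)
hm_stdlib = statistics.harmonic_mean(speeds)

print(avg_speed, hm_by_formula, hm_stdlib)

# d cancels out: any positive one-way distance gives the same average speed
for other_d in (1, 7.5, 500):
    t = sum(other_d / v for v in speeds)
    assert abs(other_d * len(speeds) / t - hm_by_formula) < 1e-9
```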
Summary
• Arithmetic mean: used when dealing with additive relationships (e.g., heights, weights).
Example: the average height of students in a class.

• Geometric mean: used when dealing with data that represent growth rates, ratios, or
percentages over time. It is particularly useful for averaging numbers with multiplicative
relationships (e.g., growth rates, returns).
Examples: Investment returns: if you have annual returns of 10%, 20%, and 30%, you would
use the geometric mean (of the growth factors 1.10, 1.20, and 1.30) to calculate the average
rate of return over the three years, because the returns compound multiplicatively.
Population growth: averaging population growth rates over several years.

• Harmonic mean: used when dealing with rates or ratios where the denominator is significant,
often in situations involving averages of rates or ratios (e.g., speeds, resistances).
Examples: Average speed: if you travel a certain distance at 30 km/h and then the same
distance at 60 km/h, the average speed is not the arithmetic mean of 30 and 60 (45 km/h)
but the harmonic mean, which is 40 km/h.
Electrical resistance: when resistors are connected in parallel, the total resistance equals
the harmonic mean of the individual resistances divided by the number of resistors.
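The three summary examples can each be checked in a line or two. This sketch follows the examples above; the resistor values (2 Ω and 4 Ω) are hypothetical, chosen only to illustrate that the parallel total is the harmonic mean divided by the number of resistors.

```python
import math
import statistics

# Geometric mean: annual returns of 10%, 20%, 30% (example from the summary)
returns = [0.10, 0.20, 0.30]
growth_factors = [1 + r for r in returns]
avg_factor = math.prod(growth_factors) ** (1 / len(growth_factors))
avg_return = avg_factor - 1  # a bit below the arithmetic mean of 20%

# Harmonic mean: equal distances driven at 30 km/h and then 60 km/h
avg_speed = statistics.harmonic_mean([30, 60])

# Parallel resistors: hypothetical values in ohms (not from the slides)
resistances = [2.0, 4.0]
r_parallel = 1 / sum(1 / r for r in resistances)
# The parallel total equals the harmonic mean divided by the number of resistors
assert math.isclose(r_parallel, statistics.harmonic_mean(resistances) / len(resistances))

print(f"average annual return: {avg_return:.2%}")
print(f"average speed: {avg_speed:.1f} km/h")
print(f"parallel resistance: {r_parallel:.3f} ohms")
```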
Measure of dispersion (variation)
A dispersion statistic measures how spread out the data values are. Some commonly used
measures include:

• Range \(= x_{max} - x_{min}\)
• Interquartile range (IQR) = 3rd quartile − 1st quartile
• Variance (population / sample)
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad / \quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
• Standard deviation (population / sample)
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad / \quad s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]
• Coefficient of variation [compares dispersion when the units are different]
\[ CVar = \frac{\sigma}{\mu} \cdot 100\% \]
• Mean absolute deviation
\[ MAD = \frac{\sum_{i=1}^{N} |X_i - \mu|}{N} \]
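Each dispersion measure maps onto a standard-library call or a one-line formula. This sketch applies them to the bottled water temperatures (typed in from the table), with quartiles under the same `method="inclusive"` assumption used earlier. It follows the slide in using the population \(\sigma\) for the CV and \(\mu\) for the MAD; since these data are a sample, using \(s\) and \(\bar{X}\) instead would be an equally defensible choice.

```python
import statistics

temps = [78, 79, 80, 80, 82, 83, 85, 86, 87, 87, 88, 88, 90, 92]

data_range = max(temps) - min(temps)

q1, _, q3 = statistics.quantiles(temps, n=4, method="inclusive")
iqr = q3 - q1

pop_var = statistics.pvariance(temps)    # divides by N
sample_var = statistics.variance(temps)  # divides by n - 1
pop_sd = statistics.pstdev(temps)
sample_sd = statistics.stdev(temps)

mu = statistics.fmean(temps)
cv_percent = pop_sd / mu * 100           # sigma / mu * 100%
mad = sum(abs(x - mu) for x in temps) / len(temps)

print(f"range={data_range}, IQR={iqr}")
print(f"variance: population={pop_var:.4f}, sample={sample_var:.4f}")
print(f"std dev:  population={pop_sd:.4f}, sample={sample_sd:.4f}")
print(f"CV={cv_percent:.2f}%, MAD={mad:.4f}")
```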
Chebyshev’s theorem

The proportion of values from a data set that will fall within \(k\) standard deviations of the
mean will be at least \(1 - \frac{1}{k^2}\), where \(k\) is a number greater than 1.

Remark: \(k\) is not necessarily an integer.

For example
• We can estimate that at least 75% of the data values will fall within 2 standard
deviations of the mean of any data set (\(k = 2\)).
Exercise 1

Suppose the mean housing price in a certain district is $50,000, and the standard
deviation is estimated to be $10,000. Estimate the price range for which at least 75% of
the houses will sell.

Exercise 1 (Answer)

Suppose the mean housing price in a certain district is $50,000, and the standard
deviation is estimated to be $10,000. Find the price range for which at least 75% of the
houses will sell.

Based on Chebyshev’s theorem, at least 75% of data values will fall within k = 2 standard
deviations of the mean, since \(1 - \frac{1}{2^2} = 75\%\).
\([50000 - 2 \times 10000,\ 50000 + 2 \times 10000] = [30000, 70000]\)
So, at least 75% of the houses are estimated to sell for between $30,000 and $70,000.
Exercise 2

A survey of local companies found that the mean amount of travel allowance for
executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s
theorem, find the minimum percentage of the data values that will fall between $0.20
and $0.30.

Exercise 2 (Answer)

A survey of local companies found that the mean amount of travel allowance for
executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s
theorem, find the minimum percentage of the data values that will fall between $0.20
and $0.30.

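Subtract the mean from the upper bound (the interval is symmetric about the mean, since
$0.25 − $0.20 = $0.05 as well):
$0.30 − $0.25 = $0.05
Divide the difference by the standard deviation to get k:
\[ k = \frac{0.05}{0.02} = 2.5 \]
Use Chebyshev’s theorem to find the percentage:
\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 1 - 0.16 = 84\% \]
Hence, at least 84% of the data values will fall between $0.20 and $0.30.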
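Both exercises use the bound \(1 - 1/k^2\) in opposite directions: Exercise 1 starts from a target proportion and solves for \(k = 1/\sqrt{1 - p}\), while Exercise 2 starts from an interval and computes \(k\). The helper names below are our own; this is a sketch of the arithmetic, not a library API.

```python
import math


def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the share of values within k standard deviations."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k**2


def k_for_proportion(p):
    """Value of k whose Chebyshev bound equals p (0 < p < 1)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 / math.sqrt(1 - p)


def chebyshev_interval(mean, sd, k):
    """Interval mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


# Exercise 1: mean $50,000, sd $10,000, at least 75% of houses
k1 = k_for_proportion(0.75)                        # 2.0
price_range = chebyshev_interval(50_000, 10_000, k1)

# Exercise 2: mean $0.25, sd $0.02, interval [$0.20, $0.30]
mean2, sd2 = 0.25, 0.02
k2 = (0.30 - mean2) / sd2                          # about 2.5 (floating point)
p2 = chebyshev_min_proportion(k2)                  # about 0.84

print(f"Exercise 1: k = {k1}, range = {price_range}")
print(f"Exercise 2: k = {k2:.2f}, at least {p2:.0%}")
```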
Choice of Visualization
• Some commonly used plots
• Which plot should I use for visualization?

How to choose visualization (different chart types).pdf


Some commonly used plots
Line chart
A line chart is used to track changes over short and long periods of time. When smaller
changes exist, line charts are better to use than bar graphs. Line charts can also be used
to compare changes over the same period of time for more than one group.

Let’s say you want to present the graduation frequency for a particular high school
between the years 2008-2012.

Source of Figure
Some commonly used plots
Bar chart
Bar charts use size to contrast and compare two or more values, using height or length
to represent the specific values.

Below is an example with data on vehicle sales over the course of 5 months:
Source of Figure
Some commonly used plots
Heatmap
Like bar charts, heatmaps compare categories in a data set, but they use color rather than
size to do so. They are mainly used to show relationships between two variables and use a
system of color-coding to represent different values.

The following heatmap plots temperature changes for each city during the hottest and
coldest months of the year.
Source of Figure
Some commonly used plots
Pie chart
A pie chart is a circular graph divided into segments whose sizes are proportional to the
quantities they represent; it is especially useful when dealing with parts of a whole. For
example, let’s say you are determining favorite movie categories among avid
movie watchers.
Source of Figure
Some commonly used plots
Scatter plot
Scatterplots show relationships between different variables. Scatterplots are typically
used for two variables for a set of data, although additional variables can be displayed.

For example, you might want to show data of the relationship between temperature
changes and ice cream sales. It would resemble something like this:

Source of Figure
After-class Reading
The data visualization catalogue: This catalogue features a range of different diagrams,
charts, and graphs to help you find the best fit for your project. As you navigate each
category, you will get a detailed description of each visualization as well as related
code or software.
Which plot should I use?
With so many visualization options out there for you to choose from, how do you decide
what is the best way to represent your data?
This depends on what story you want to tell.

Here is a simple decision tree to help you choose a suitable type of plot:

[Figure: A decision tree leading to the best chart]
More resources to help you decide which plot to use

From Data to Viz

Selecting the best chart I

Selecting the best chart II


02
Data Visualization Tools

• Introduction to data visualization tools


• Hands-on demo with Tableau Public (basics)
Introduction to data visualization tools

• Tableau Public
• Looker Studio
• Excel
• Google Analytics 4
• Power BI
• ggplot2
• Matplotlib
• Seaborn
Excel/Google Sheets

Find out more about data visualization in spreadsheets here:

• Types of charts and graphs in Google Sheets: a Google Help Center page with a list of
chart examples you can download.
• Excel Charts: a tutorial outlining all of the different chart types in Excel, including
some subcategories.
Tableau Public

Tableau Public (requires registering with an email account)

https://public.tableau.com/app/discover

It provides two modes:

1. Web authoring
2. Desktop: download the Tableau Public edition to your desktop and sign in with the registered account
https://www.tableau.com/products/public/download
Tableau Versions

Tableau Public | Tableau Desktop
Free | Paid
All visualizations | All visualizations
Connect to Excel and CSV files only | All listed data sources
15 million rows of data | Unlimited rows of data
Save online only (publish) | Save locally
Public reports |
Power BI

Microsoft Power BI is a tool that helps organize and


visualize data from different sources. It lets you
connect to data, clean and structure it, create
visualizations, and easily share your findings with
others.

Power BI

According to Gartner, over 97% of Fortune 500 companies use Power BI. With so many
companies around the world using Power BI, learning and mastering this BI tool can help
you progress in your data-related career.

• Leaders: competitive providers that are known to execute well against their current
vision and are often innovative giants in their industry
• Visionaries: providers that understand where the market is going or have a strong
vision for where it will end up
• Niche Players: highly focused on a small segment
• Challengers: often dominate a large segment
Power BI (register a Power BI account)

https://www.microsoft.com/en-us/power-platform/products/power-bi/getting-started-with-power-bi

Power BI (installation)
Windows Users

https://www.microsoft.com/en/power-platform/products/power-bi/desktop?market=af

Mac Users

https://app.powerbi.com/

Feel free to share your feedback with me via this
link/QR code throughout the whole semester.

https://app.sli.do/event/hUgiGrg7Ln8KeEFVyCT9o3

Thank You!
