
Tableau by Mughammd Asrol Smallsize

The document provides a comprehensive guide on data visualization using Tableau, covering topics such as choosing chart types, color management, and connecting data sources. It includes step-by-step instructions for creating various visualizations, including bar charts, scatter plots, and dashboards, while emphasizing the importance of maintaining color identities and providing context. Additionally, it outlines project requirements and encourages the use of datasets from sources like Kaggle for practical application.

Uploaded by

rajudurai109
Copyright
© All Rights Reserved

Data Visualization using Tableau

Industrial Engineering Department


mie@binus.edu
Data Visualization
Choosing a Chart Type
Comparison
Composition
Distribution
Relationship
Why we need data visualization
Color management
Hue, Saturation, Tone, Shade, Tint
Using Hue: Qualitative / Categorical
Using Saturation: Relationship / Sequence; Quantitative / Numerical


Maintain Color Identities

Once a color is assigned to an attribute, be consistent with the usage of that color.

This eliminates confusion and establishes identity throughout a workbook.
Use Hot Colors Sparingly
What are we supposed to be looking at?

'HOT' colors, like RED, are attention grabbers.

They carry heavy visual weight and will draw the eye of the reader.

Save them for elements truly deserving of your reader's attention.
Be Careful Not To Become The Next Vincent van Gogh
Which title is more impactful?
Do your titles capture attention?
Context is not just in the title
Adding additional context
Tableau
Tableau Software Variety

Tableau Public: a free platform to explore, create, and publicly share data visualizations online.
Tableau Desktop: if you only want to create reports but do not have a need for circulation or wider consumption.
Tableau Server: if you have a specific, restricted audience and would like to control the manner in which they interact with your work.
Tableau Prep Builder: a modern approach to data preparation, making it easier and faster to combine, shape, and clean data for analysis.
Installation

How to install?
• Go to https://www.tableau.com/products/public/download
• Download the app
• Install it on your computer
Tableau: Basic
Tableau Work-Flow

CONNECT DATA SOURCE → ANALYZE WITH VISUALIZATION → SHARING WITH DASHBOARD
[Figure: input data or file loaded into Tableau]
Tableau Public User Interface
Connecting data source to tableau

The dataset file may be of any type: csv, xls, JSON, PDF, etc.

You can obtain a dataset from any source.

For this tutorial, we will start with an Excel or CSV dataset file.

Save the dataset file on your local computer; it can then be imported into Tableau easily, as follows.
Dataset from an Excel file on the local computer

Local computer file


After connecting the data to Tableau

1. Select the first table

2. Go to the worksheet
View your data
Please try again with another dataset, using a CSV file as the source
Our dataset for this section
Data elements content

Categorical / Dimensions: independent variables

Numerical / Measures: dependent variables
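The split between dimensions and measures can be sketched outside Tableau as well. The Python snippet below is only an illustration under stated assumptions: the column names (`Region`, `Rep`, `Units`, `Unit Cost`) and the sample rows are hypothetical, and the rule "numeric means measure, everything else means dimension" is a simplification of how a tool might assign default roles.

```python
def classify_fields(rows):
    """Label each column as a 'dimension' (categorical) or a 'measure' (numerical)."""
    roles = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        # A column counts as a measure only if every value is a number.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        roles[column] = "measure" if is_numeric else "dimension"
    return roles


# Hypothetical sample rows, not taken from the course dataset.
sample_rows = [
    {"Region": "East", "Rep": "Jones", "Units": 95, "Unit Cost": 1.99},
    {"Region": "Central", "Rep": "Kivell", "Units": 50, "Unit Cost": 19.99},
]

print(classify_fields(sample_rows))
```

In a chart, the dimensions would typically define the categories (for example one bar per region) and the measures would supply the numbers that are aggregated.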
Data field
Data types
Combine dimensions and measures

Make your first graph
You can choose any type of chart
Make your first graph

Pie chart, treemap chart, bubble chart

Re-arrange your chart
This setting arranges your chart by region
Calculated fields

Let's go back to our data preview

There is no field for total unit sales

We need to create a new field to make our graph reliable
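The idea of a calculated field, a new column derived from existing ones, can be sketched in Python. This is a minimal sketch under assumptions: the slides do not state the formula, so the column names `Units` and `Unit Cost` and the rule "total sales = units × unit cost" are hypothetical.

```python
def add_total_sales(rows, units_key="Units", price_key="Unit Cost", new_key="Total Sales"):
    """Return copies of the rows with a derived total column (assumed: units * unit cost)."""
    return [
        {**row, new_key: round(row[units_key] * row[price_key], 2)}
        for row in rows
    ]


# Hypothetical order rows for illustration only.
orders = [
    {"Rep": "Jones", "Units": 10, "Unit Cost": 2.5},
    {"Rep": "Kivell", "Units": 4, "Unit Cost": 20.0},
]

for row in add_total_sales(orders):
    print(row["Rep"], row["Total Sales"])
```

The original rows are left untouched, which mirrors how a calculated field adds a new field to the data pane without changing the source file.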
Calculated field
Calculated fields
Adding colors
Color by rep
Hold ctrl + drag and drop
Color by region
Hold ctrl + drag and drop
Color by total sales
with continuous color setting
What is the problem with your current graph?
Add Labels
Edit labels
Edit the labels format
Final graph
Project

• Find one dataset that you find interesting on Kaggle.com or github.com.
• For today's project, create a bar chart like the one in today's practice session, using your own dataset.
• Each person must have a different dataset.
• The dataset used must be clean and have no missing values.
• The dataset will be used until the end of this course, up to the point where you build a dashboard.
Project

No  Name                                          Dataset               Source
1   ATHAR DANISH                                  Super Store           Kaggle.com
2   GERALD HASIHOLAN MIKHAIL GORANSON NAINGGOLAN  Mental Health         Kaggle.com
3   OLIVIER JOSEPHINE                             MLBB Hero             Kaggle.com
4   REVALDO RICHIE                                Games dataset         Kaggle.com
5   WINSTON KHOGRES                               Billionare Statistic  Kaggle.com
Time series dataset
Timeseries

• We will use a time-series dataset and visualize it using Tableau
• The dataset view is as follows:
• You only need to visualize it using a graph in Tableau
• You can choose a measure or a dimension by double-clicking on it.
Aggregate timeseries dataset
visualization
• See columns and rows
Month-based timeseries dataset
visualization
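Month-based aggregation of a time series can be sketched in Python as follows. This is an illustration only: the dates and values are hypothetical, and summing is assumed as the aggregation (a sum is the usual default for a measure, but an average or count would follow the same pattern).

```python
from collections import defaultdict
from datetime import date


def aggregate_by_month(records):
    """Sum the values of (date, value) pairs per (year, month)."""
    totals = defaultdict(float)
    for day, value in records:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# Hypothetical daily records for illustration only.
records = [
    (date(2024, 1, 5), 10.0),
    (date(2024, 1, 20), 5.0),
    (date(2024, 2, 1), 7.0),
]

print(aggregate_by_month(records))
```

Grouping on `(year, month)` rather than on the month alone keeps January of different years apart, which matters once the dataset spans more than one year.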

Right-click on the column. Green means the field is continuous; blue means it is discrete.
Analysis without aggregate
Time-series dataset with detail
Time-series dataset with detail
Change row settings
Area chart for easy investigation
Filter
Quick filter

Right-click and choose Show Filter
Please design two sheets with your own dataset, in particular a histogram and a line/area chart, with a filter
Project

No  Name                                          Dataset               Source
1   ATHAR DANISH                                  Super Store           Kaggle.com
2   GERALD HASIHOLAN MIKHAIL GORANSON NAINGGOLAN  Mental Health         Kaggle.com
3   OLIVIER JOSEPHINE                             MLBB Hero             Kaggle.com
4   REVALDO RICHIE                                Soccer                Kaggle.com
5   WINSTON KHOGRES                               Billionare Statistic  Kaggle.com
Relationship and connect data sources
Connect dataset

Double-click on the dataset and add another table
Connect dataset

You can choose how to join the data
Please check your current dataset and compare it with the previous dataset
Making hierarchy for maps dataset

Drag and drop city onto country

Rename the dimension to ‘hierarchy’

Drag and drop state into the hierarchy, placing it between city and country
Making hierarchy for maps dataset

Drag and drop the hierarchy onto the sheet

Drill down into the country

See what happens

Filter the data by year
Visualize ‘sales’ by year using ‘size’
Visualize using a calculated measure
Scatter Plot
Scatter Plot

A scatterplot shows the relationship between two quantitative variables measured for the same individuals
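A scatter plot shows the relationship visually; one common way to put a number on a linear relationship is the Pearson correlation coefficient. The slides do not cover it, so the sketch below is a companion illustration only, and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements (e.g. sales and profit per customer).
sales = [1, 2, 3, 4]
profit = [2, 4, 6, 8]

print(pearson_r(sales, profit))
```

A value near 1 corresponds to points rising along a straight line, a value near -1 to points falling along one, and a value near 0 to no clear linear pattern.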
Scatter Plot
Setting a filter for all sheets
in a Tableau project
• Go to the first sheet
• Select the filter
• Select Apply to Worksheets
• Select All Using This Data Source
• Check another sheet
Setting a filter for all sheets
in a Tableau project
The data is carried over from the previous sheet
Style your scatter plot
Add profit margin to the color
Set the color
Set the size
Our first dashboard
Make your dashboard design interesting
• Filter the scatter plot by location
• You can also add a filter manually through an ‘action’
Add an ‘action’ to your dashboard

See? What is the difference?

Make a ‘highlight’ in your dashboard
• If you go back to the data, each individual buys products from many countries or cities
• In this dashboard we want to highlight the city data within all the data
• First, go back to your customer sheet and add ‘city’ to increase the data granularity
• After that, in the dashboard, add a ‘highlight’
Make ‘highlight’
add your customer data granularity
Set ‘highlight’ in dashboard
JOIN DATA IN DATABASE
Join

Inner join

Left join

Right outer join

Full outer join
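The four join types listed above can be sketched in Python. This is a simplified illustration, assuming each key value appears at most once per table; the table contents and the field names (`id`, `customer`, `target`) are hypothetical. Fields missing on one side are simply absent from the merged row, where a database would show NULL.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` (assumes unique key values per table)."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    if how == "inner":    # keys present in both tables
        keys = [k for k in left_by_key if k in right_by_key]
    elif how == "left":   # every key of the left table
        keys = list(left_by_key)
    elif how == "right":  # every key of the right table
        keys = list(right_by_key)
    elif how == "full":   # every key of either table
        keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]
    else:
        raise ValueError(f"unknown join type: {how}")
    return [{**left_by_key.get(k, {}), **right_by_key.get(k, {})} for k in keys]


# Hypothetical tables for illustration only.
orders = [{"id": 1, "customer": "A"}, {"id": 2, "customer": "B"}]
targets = [{"id": 2, "target": 100}, {"id": 3, "target": 50}]

for how in ("inner", "left", "right", "full"):
    print(how, [row["id"] for row in join(orders, targets, "id", how)])
```

With these tables only `id` 2 exists on both sides, so the inner join returns one row while the full outer join returns three.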


Example
Join dataset

We will use the P1-AmazingMartEU2 dataset.

We will join all of its sheets: list of orders, order breakdown, and target sales.

Joining list of orders with order breakdown runs smoothly, but joining target sales is difficult because it has a different level of granularity.

If you run into this problem, you can use ‘blending’ as an alternative.
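Why blending helps with mismatched granularity can be sketched in Python. Roughly, a blend aggregates each source separately at the level of the linking fields and only then combines the results, keeping every group of the primary source. The snippet is a minimal sketch under assumptions: the field names (`Category`, `Sales`, `Target`) and the rows are hypothetical, and summing is assumed as the aggregation.

```python
from collections import defaultdict


def blend(primary_rows, secondary_rows, link_fields, measure, secondary_measure):
    """Aggregate both sources per linking-field combination, then attach the
    secondary total to each primary group (None when there is no match)."""
    primary_totals = defaultdict(float)
    for row in primary_rows:
        primary_totals[tuple(row[f] for f in link_fields)] += row[measure]

    secondary_totals = defaultdict(float)
    for row in secondary_rows:
        secondary_totals[tuple(row[f] for f in link_fields)] += row[secondary_measure]

    return {
        key: (total, secondary_totals.get(key))
        for key, total in primary_totals.items()
    }


# Hypothetical data: sales per order line, targets per category.
order_lines = [
    {"Category": "Furniture", "Sales": 100.0},
    {"Category": "Furniture", "Sales": 50.0},
    {"Category": "Technology", "Sales": 70.0},
]
category_targets = [{"Category": "Furniture", "Target": 120.0}]

print(blend(order_lines, category_targets, ["Category"], "Sales", "Target"))
```

A row-level join would instead repeat the single Furniture target on both Furniture order lines, so summing it afterwards would double the target; aggregating first avoids that.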


Blending data
Joining and blending dataset

• In this section we will use the Airline dataset.
• You have to identify your dataset first; after that you can go to Tableau.
• First, add the dataset and drag and drop the sheet airline1.
• After that you can open your sheet in Tableau.
• In the sheet, add a new data source and add the Airline dataset once more.
• In this new data source, drag and drop Airline2.
Joining dataset

• Start your visualization with airline 1, like this.
Joining dataset

• Now, add ‘revenue’ from the second dataset
• You will find a ‘sign’ that means the data is joined.
Joining dataset
Joining multiple field

• Now go to Data > Edit blend relationship
• Add a new relationship, region to period
Joining multiple field

• Rename ‘year’ to ‘period’
• Your data will be updated.
Joining multiple field

• What happens if you remove ‘period’?


Primary and Secondary dataset

Primary

Secondary
What if you start joining and blending data with Airline2?

Please try it!


Dual Axis Chart
Dataset

• In this section we will use the P1-AmazingMart dataset.
• Please check your dataset, which has 3 sheets.
• You have to blend sheets 1 and 2, and after that add sheet 3.
Dual axis
Dual axis – change your chart to a bar chart and add category
Dual axis

• Data by category
Data by category
Data and its target
Data and its target

• The target data is not realistic, since it is not at the correct level of granularity
• We have to edit the blend relationship to correct it
• Add a relationship on the month and year of the order date
Data and its target – adjusted
Data and its target – Area chart
Making dual axis

• First: right-click on the target and choose dual axis
• Second: right-click on the target and choose synchronize
• Set the target as the background
Dual axis – final
Advanced Calculated Field
Advanced calculated field

• In this section we will make a calculated field that combines fields from different data sources.
• We will enrich our previous visualization with a deeper analysis by adding a bar chart that shows the difference between the actual data and the target.
• First, we have to combine our 3 visualizations into a single visualization using the filter function
• We use our previous data and visualization
Filtered dataset visualization
Advanced calculated field

• We will calculate the difference between the sum of sales and the sum of target
• The calculated field is created from the ‘sales’ field
• Drag and drop ‘target’ into the calculated field
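The calculation described above, sum of sales minus sum of target, can be sketched in Python. This is an illustration under assumptions: the grouping field `Category`, the field names `Sales` and `Target`, and the sample rows are hypothetical, and groups without a target are left out of the result.

```python
def difference_to_target(sales_rows, target_rows, key):
    """Per group: sum of sales minus sum of target (groups without a target are skipped)."""
    sum_sales = {}
    for row in sales_rows:
        sum_sales[row[key]] = sum_sales.get(row[key], 0.0) + row["Sales"]

    sum_target = {}
    for row in target_rows:
        sum_target[row[key]] = sum_target.get(row[key], 0.0) + row["Target"]

    return {k: sum_sales[k] - sum_target[k] for k in sum_sales if k in sum_target}


# Hypothetical rows from two separate data sources.
sales_source = [
    {"Category": "Furniture", "Sales": 100.0},
    {"Category": "Furniture", "Sales": 50.0},
    {"Category": "Technology", "Sales": 70.0},
]
target_source = [
    {"Category": "Furniture", "Target": 120.0},
    {"Category": "Technology", "Target": 90.0},
]

print(difference_to_target(sales_source, target_source, "Category"))
```

A positive result means the group exceeded its target and a negative one means it fell short, which is what the additional bar chart is meant to show.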
Advanced calculated field
Advanced Dashboard and Story Telling
Dataset

P1-UK-Bank-Customers
Geographical Dataset

• We have to assign a geographic role to the dimension.
• Drag and drop region into the worksheet
• You will find 4 unknown locations
• You have to fix them
Fix unknown state

• Edit the locations and set the country to UK, as your data shows
• After that, set your visualization to a map
Make your visualization like this!
Sheet 2
Please design your worksheet like this
• Number of customers by gender
Quick table calculation:
Pie chart with percent
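The percent-of-total calculation behind a pie chart with percentages can be sketched in Python. The gender values below are hypothetical and only illustrate the arithmetic: count per category divided by the overall count.

```python
from collections import Counter


def percent_of_total(values):
    """Share of each category as a percentage of all records."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


# Hypothetical customer genders for illustration only.
genders = ["Male", "Female", "Male", "Male"]

print(percent_of_total(genders))
```

The shares always add up to 100, which is why this calculation suits a pie chart.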
Sheet 3
Age: common visualization

• Set age as a ‘dimension’
• We can visualize the age distribution as a line or bar chart
• What if we want to view age in intervals?
Using Bin

• Age (bin) will be set as a dimension

Age distribution using bin

Quick table calculation


Sheet 3
Sheet 4
Sheet 4: Distribution of balance
Set bins interval automatically

• We will use parameters in this section
• First, you have to create a parameter
• Rename your parameter
• Show your parameter like a filter field.
Set bins automatically

• To apply automatic bins to your worksheet, you have to link ‘balance (bin)’ to your bin interval setting.
• Right-click on your balance (bin) dimension and set your automatic bin interval (the parameter) as the source of the bin size
• Your visualization's interval/range will automatically follow the bin size you set.
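Binning with an adjustable bin size, which is what the parameter controls, can be sketched in Python. This is an illustration only: the ages are hypothetical, and each bin is labelled by its lower bound.

```python
from collections import Counter


def bin_counts(values, bin_size):
    """Count values per bin; each bin is labelled by its lower bound."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter((value // bin_size) * bin_size for value in values)
    return dict(sorted(counts.items()))


# Hypothetical ages for illustration only.
ages = [18, 22, 25, 31, 39, 40]

print(bin_counts(ages, 10))  # bin size 10
print(bin_counts(ages, 20))  # same data, wider bins
```

Changing only the `bin_size` argument regroups the same data, which is the effect of moving the parameter control in the worksheet.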
Automatic bins for balance
Sheet 5: treemap of job classification
Treemap – job classification
Developing dashboard
All your sheets will be connected
Drag and drop your sheets onto the canvas!
• Adjust the size of the sheets on the canvas
This is your dashboard!
Design an interactive dashboard
Using filters
Using filters
you can use it for all sheets
Remove tooltip

• Go to each sheet
• Go to Worksheet > Tooltip and uncheck the tooltip option
Project Preparation
Data pre-processing

• Please choose one dataset from kaggle.com or another relevant source.
• Prepare a PPT to clarify the project you will work on as your final assignment.
• The following is the PPT content you need to prepare today.
• Provide the project title, your name, and your NIM (student ID number). The title needs to fit the context of the data you will use.
• Provide the outline of what you will explain in the PPT.
• Provide a summary of the data table.
Data pre-processing

• Give a complete explanation of what the table is about
• Choose the columns you will use as the reference for building the dashboard. A minimum of 5 columns.
• Explanation of the plan for data visualization 1
• Explanation of the plan for data visualization 2
• Explanation of the plan for data visualization 3
• Explanation of the plan for data visualization 4
• Explanation of the plan for data visualization 5
• General explanation of the aspects that will be covered in the dashboard and the ‘value’ aspects that will be drawn from the dashboard
Story telling
Start with story!
First story with dashboard
Provide some key points
Add annotations
Completed storyline
Descriptive Statistics
Statistical Description of Data

• Motivation
To better understand the data: central tendency, variation, and spread

• Central tendency characteristics
Mean, median, and mode

• Data dispersion characteristics
Quantiles, interquartile range (IQR), and variance

Dispersion

Central tendency
Measuring Central Tendency

Mean (algebraic measure):


It provides a one-number summary of the location or central tendency for
the distribution of data.

Example
PT X is a car manufacturing company. Below are 8 samples of car production time:
90 88 84 91
84 85 80 89

Please find the arithmetic mean of car production time!

$$\text{sample mean} = \frac{\text{sum of all values in the sample}}{\text{number of values in the sample}} = \frac{90 + 88 + \dots + 89}{8} = 86.375$$
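The arithmetic for the eight production times can be checked with Python's standard `statistics` module. The median and mode are included as well, since they are the other two central tendency measures named in this section.

```python
from statistics import mean, median, mode

# The eight car production times from the example.
production_times = [90, 88, 84, 91, 84, 85, 80, 89]

sample_mean = mean(production_times)      # sum of values / number of values
sample_median = median(production_times)  # average of the two middle values (n is even)
sample_mode = mode(production_times)      # most frequent value

print(sample_mean, sample_median, sample_mode)  # 86.375 86.5 84
```

The values sum to 691, so the mean is 691 / 8 = 86.375; sorted, the two middle values are 85 and 88, giving a median of 86.5, and 84 is the only value that occurs twice.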


Central Tendency

Mean:
an Example
Measuring Central Tendency
• Median:
• The midpoint of the values after they have been ordered from the
smallest to the largest, or the largest to the smallest.
• Middle value if odd number of values,
or average of the middle two values otherwise
• For grouped data, estimated by interpolation:

$$\text{median} = L_1 + \left(\frac{n/2 - (\sum \text{freq})_l}{\text{freq}_{\text{median}}}\right) \times \text{width}$$

• $n$ is the number of values in the entire data set: 3194
• $n/2 = 1597$, therefore the median interval is 21-50
• $L_1$ is the lower boundary of the median interval: 21
• $(\sum \text{freq})_l$ is the sum of the frequencies of the intervals that are lower than the median interval: 950
• $\text{freq}_{\text{median}}$ is the frequency of the median interval: 1500
• width is the width of the median interval: 29
Therefore, median = 21 + (1597 - 950) / 1500 × 29 ≈ 33.5
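The interpolation can be written as a small Python function to check the worked numbers. The function and parameter names are illustrative choices, not part of the slides.

```python
def grouped_median(lower_bound, n, cum_freq_below, median_freq, width):
    """Median of grouped data, estimated by interpolation inside the median interval."""
    return lower_bound + (n / 2 - cum_freq_below) / median_freq * width


# Values from the worked example: L1 = 21, n = 3194,
# cumulative frequency below the median interval = 950,
# frequency of the median interval = 1500, width = 29.
estimate = grouped_median(lower_bound=21, n=3194, cum_freq_below=950,
                          median_freq=1500, width=29)

print(round(estimate, 1))  # 33.5
```

The fraction (1597 - 950) / 1500 says how far into the median interval the middle observation lies; multiplying by the width converts that share into units of the variable.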
Measuring Central Tendency
• Mode:
• Value that occurs the most frequently in the data
• Empirical formula for unimodal (asymmetrical) data:
$\text{mean} - \text{mode} \approx 3 \times (\text{mean} - \text{median})$

[Figure: The relative positions of the mean, median, and mode for symmetric, positively skewed, and negatively skewed distributions]
Measuring Dispersion of Data

Quartiles: Q1 (25th percentile), Q3 (75th percentile)
IQR: inter-quartile range = Q3 – Q1
Box plot: visualizes the data distribution between min and max
Outlier: a value more than 1.5 × IQR above Q3 or below Q1
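The quartiles, the IQR, and the 1.5 × IQR outlier rule can be sketched with Python's standard `statistics` module. Two assumptions apply: the data are the eight production times from the mean example plus one hypothetical extreme value (130) added to trigger the rule, and `statistics.quantiles` is used with its default method, so other tools may report slightly different quartiles for the same data.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, iqr, outliers


# Production times from the mean example plus one hypothetical extreme value.
times = [90, 88, 84, 91, 84, 85, 80, 89, 130]

q1, q3, iqr, outliers = iqr_outliers(times)
print(q1, q3, iqr, outliers)
```

In a box plot, the box spans Q1 to Q3 and any points beyond the two fences are drawn individually as outliers.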
Data Dispersion
Variance and Standard Deviation

VARIANCE The arithmetic mean of the squared deviations from the mean.

STANDARD DEVIATION The square root of the variance.

• The variance and standard deviations are nonnegative and are zero only if all
observations are the same.
• For populations whose values are near the mean, the variance and standard
deviation will be small.
• For populations whose values are dispersed from the mean, the population
variance and standard deviation will be large.
• The variance overcomes the weakness of the range by using all the values in the
population
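The two definitions can be followed step by step in Python. The data are the eight production times from the mean example, treated here as a whole population, so the sum of squared deviations is divided by the number of values.

```python
import math


def population_variance(values):
    """Arithmetic mean of the squared deviations from the mean."""
    mean_value = sum(values) / len(values)
    return sum((v - mean_value) ** 2 for v in values) / len(values)


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


production_times = [90, 88, 84, 91, 84, 85, 80, 89]

print(population_variance(production_times))      # 12.234375
print(round(population_std(production_times), 2))  # 3.5
print(population_variance([5, 5, 5]))              # 0.0 when all observations are the same
```

The last line illustrates the first bullet above: the variance is zero only when every observation is identical.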
Measuring Dispersion
of Data
Standard Deviation and Distribution
Symmetric vs. Skewed Data
• Median, mean and mode of symmetric, positively and negatively skewed data

[Figure panels: symmetric, positively skewed, negatively skewed]

August 15, 2024 Data Mining: Concepts and Techniques
Content of your slides
• Title, name, nim, data sources
• Slides outline
• Dataset and sources + analysis
• Data pre-processing
• Statistical description of the dataset
• Visualization plan for the dataset, mentioning the attributes to be used.
• Data visualization 1 + analysis
• Data visualization 2 + analysis
• Data visualization 3 + analysis
• Data visualization 4 + analysis
• Data visualization 5 + analysis
• Data visualization …n + Analysis
• Dashboard + analysis and settings
• Data storytelling + analysis
• Key points of the dataset
• Link of tableau project
• Link of video recording of data story telling
• Conclusion
