Tableau Self Notes PDF
Explanatory Analysis:
• It aims to communicate insights and findings to others.
• It involves explaining some interesting observations.
• Focused on communicating insights to others
• Characteristics:
→ Emphasis on visualization and storytelling
→ Simplification of complex concepts
→ Hypothesis testing and confirmation
• Example: Creating a report or presentation for stakeholders to explain the
factors influencing sales and performance over the past quarter.
Exploratory Analysis:
• It focuses on uncovering patterns, trends, and insights within the data.
• Creating visuals to explore the data, rather than to present finished findings.
• Focused on understanding the data itself.
• Characteristics:
→ Emphasis on data exploration
→ Iterative and open-ended
→ Discovery-oriented
• Example: Analyzing customer survey data to identify potential factors
influencing customer satisfaction, without necessarily aiming to
communicate the findings to others immediately.
Classification Plots:
• Scatter plot:
→ Uses different colours or shapes for each class to visualize the
distribution of the data points.
→ Identify separability between classes.
• Box plot:
→ Visualize the distribution of a continuous variable
→ Useful for understanding the spread and central tendency.
• Histogram with Density Curve:
→ Compare the distribution of features between classes.
→ Helps identify differences in the distribution of features between
classes.
• Confusion Matrix Heatmap:
→ Visualize the performance of a classification model by displaying the
matrix as a heatmap
→ Helps derive the overall accuracy, and the precision, recall, and F1-score
for each class.
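As a rough illustration of what the confusion matrix point means, the sketch below derives accuracy and the per-class metrics from a small matrix. The class labels and counts are made up for the example, and the convention of rows as actual classes and columns as predicted classes is an assumption.

```python
# Hypothetical 3-class confusion matrix (illustrative counts only).
# Assumed convention: rows = actual class, columns = predicted class.
labels = ["cat", "dog", "bird"]
matrix = [
    [8, 1, 1],  # actual cat
    [2, 7, 1],  # actual dog
    [0, 2, 8],  # actual bird
]


def accuracy(matrix):
    """Share of all observations that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


overall = accuracy(matrix)
by_class = per_class_metrics(matrix, labels)
for label, (p, r, f1) in by_class.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In a heatmap of this matrix, a dark diagonal with pale off-diagonal cells would correspond to high values for all of these metrics.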
Regression Plots:
• Scatter Plot with Regression Line:
→ Relationship between the independent and dependent variable.
→ Overlay a regression line or curve to visualize trend
• Residual Plot:
→ Difference between observed and predicted values.
→ Helps to diagnose the assumptions of regression models.
• Predicted vs. Actual Plot:
→ Plot the predicted values against the actual values.
→ Helps assess the performance, identify patterns and outliers.
• Line Plot with Confidence Intervals:
→ Plot the mean or median response variable against the independent
variable(s) with confidence intervals.
→ Visual representation of the uncertainty associated with the
regression estimates.
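The numbers behind a regression line, a residual plot, and a predicted vs. actual plot can be sketched in a few lines. This is a minimal example assuming simple least-squares regression with one independent variable; the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)

# Points for the regression line and the predicted vs. actual plot.
predicted = [slope * x + intercept for x in xs]

# Points for the residual plot: observed minus predicted.
residuals = [y - p for y, p in zip(ys, predicted)]

for x, y, p, r in zip(xs, ys, predicted, residuals):
    print(f"x={x} actual={y:.2f} predicted={p:.2f} residual={r:+.2f}")
```

Residuals that scatter around zero with no visible pattern are what the usual regression assumptions would lead you to expect; a curve or funnel shape in the residual plot suggests the model is missing something.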
Tableau:
• Used for data visualization and dashboarding.
• Allows the users to create interactive and visually appealing charts, graphs
and dashboards.
• Advantages:
→ Drag-and-Drop interface for creating visualizations without coding.
→ Wide range of visualization options and customization features.
→ Can handle large datasets and real-time data sources.
• Disadvantages:
→ Limited statistical analysis capabilities compared to specialized
statistical software like SPSS or R.
→ Steeper learning curve for advanced features and complex
visualizations.
Excel:
• Widely used spreadsheet software with basic data analysis and
visualization capabilities.
• Advantages:
→ Familiar and widely accessible interface for data entry, manipulation,
and analysis.
→ Basic statistical functions and charting options for simple analyses and
visualization.
→ Integration with MS Office apps and third-party add-ins.
• Disadvantages:
→ Limited scalability for handling large datasets.
→ Lack of advanced statistical procedures and modelling capabilities.
→ Prone to errors and version control issues.
R:
• It is a programming language and environment for statistical computing
and graphics.
• It is widely used for data analysis, statistical modelling, ML and data
visualization.
• Advantages:
→ Extensive collection of packages
→ Highly customizable and flexible for complex analyses.
→ Strong community support and active development.
• Disadvantages:
→ Steeper learning curve, especially for users with less programming
knowledge.
→ Requires writing of code for analysis and visualization.
→ Handling large datasets may require optimization.
Row Shelf:
• Located at the top of the Tableau workspace, directly below the column shelf.
• The row shelf organizes data along the vertical axis: fields placed on it
create the rows of the view.
• Fields placed on the row shelf are typically used to break down data into
categories or groups, such as regions, product names, or time periods.
• Multiple fields can be added to the row shelf to create a hierarchical structure.
Column Shelf:
• Located at the top of the workspace, above the row shelf.
• The column shelf organizes data along the horizontal axis.
• Multiple fields can be added to create more complex visualizations.
Dimensions vs Measures
Dimensions:
• Represent categorical or qualitative data attributes.
• Describe the characteristics of the data and provide context or grouping
criteria.
• Examples: product categories, customer segments, regions, etc.
• Used to create discrete headers, labels or groupings.
• Displayed along the row or column headers and can be used to split or
partition the data.
• Usually shown as blue pills, because blue marks a discrete field and
dimensions are discrete by default.
Measures:
• Represent numerical or quantitative data attributes.
• Represent the measurable aspects of the data, such as quantities, amounts,
or metrics.
• Examples: sales revenue, profit margins, quantities sold, average
temperatures, etc.
• Often used for calculations, aggregation, and statistical analysis.
• Displayed as continuous axes, bar lengths, or data points.
• Usually shown as green pills, because green marks a continuous field and
measures are continuous by default.
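One way to picture the split is that a dimension partitions the rows and a measure is aggregated within each partition. The sketch below mimics that with plain Python; the field names `region` and `sales`, the data, and the choice of SUM as the aggregation are assumptions for illustration, not Tableau code.

```python
from collections import defaultdict

# Illustrative rows: "region" plays the role of a dimension,
# "sales" plays the role of a measure.
rows = [
    {"region": "East", "sales": 120.0},
    {"region": "West", "sales": 80.0},
    {"region": "East", "sales": 60.0},
    {"region": "West", "sales": 40.0},
    {"region": "North", "sales": 50.0},
]


def aggregate(rows, dimension, measure):
    """Group rows by the dimension and sum the measure within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


sales_by_region = aggregate(rows, "region", "sales")
for region, total in sales_by_region.items():
    print(f"{region}: {total}")
```

Each key of the result corresponds to a discrete header in the view, and each value to the length of a bar or the position of a mark on a continuous axis.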
Pivoting: Conversion of rows into columns and vice versa
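A small sketch of pivoting in both directions, written in plain Python rather than Tableau. The table, the field names (`product`, `quarter`, `sales`), and the helper function names are hypothetical.

```python
# Wide layout: one column per quarter (illustrative data).
wide = [
    {"product": "A", "Q1": 10, "Q2": 15},
    {"product": "B", "Q1": 7, "Q2": 9},
]


def pivot_columns_to_rows(rows, id_field, value_fields, name_field, value_field):
    """Turn each listed column into its own row (wide to long)."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            long_rows.append(
                {id_field: row[id_field], name_field: field, value_field: row[field]}
            )
    return long_rows


def pivot_rows_to_columns(rows, id_field, name_field, value_field):
    """Turn each distinct name into its own column (long to wide)."""
    wide_rows = {}
    for row in rows:
        record = wide_rows.setdefault(row[id_field], {id_field: row[id_field]})
        record[row[name_field]] = row[value_field]
    return list(wide_rows.values())


long_table = pivot_columns_to_rows(wide, "product", ["Q1", "Q2"], "quarter", "sales")
round_trip = pivot_rows_to_columns(long_table, "product", "quarter", "sales")

for row in long_table:
    print(row)
```

The long layout is often easier to chart, since the quarter becomes a single dimension and sales a single measure.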
Basic plot constructions and their advantages:
Scatter Plot:
• It is used for investigating relationships between quantitative variables,
such as age and income.
• Allows you to compare the values of 2 quantitative variables that you have
plotted.
• Can detect outliers and clusters.
• Provides a visual representation of the distribution of data points.
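The relationship a scatter plot shows visually can also be summarized with a correlation coefficient. The sketch below computes Pearson's r by hand; the age and income figures are invented, and Pearson's r is only one possible summary (it captures linear relationships only).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative data: age in years, income in thousands.
ages = [25, 32, 41, 48, 55]
incomes = [30, 42, 50, 61, 70]

r = pearson(ages, incomes)
print(f"correlation between age and income: {r:.3f}")
```

A value near 1 or -1 matches a scatter plot whose points fall close to a straight line; a value near 0 matches a cloud with no linear trend.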
Line Plot:
• It connects a series of data points using a line.
• Used to show changes in data over time.
• Presents sequential values to help you identify trends.
• Helps identify patterns and fluctuations over time.
• Provides a clear representation of trends, with continuous lines connecting
data points.
Bar Chart:
• Compares numerical values such as integers and percentages.
• Shows variation across categories or subcategories by scaling the width or
height of each bar.
• Can highlight the largest or smallest number in a set of data or show the
relationship between values.
Histogram:
• Plots quantitative data with ranges of values grouped into bins.
• Shows the distribution of values.
• Tracks the different values found in one set of data as a series of connected
bars.
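Grouping values into bins can be sketched as follows. The bin width, the starting value, and the ages are assumptions chosen for the example; the height of each bar in the histogram is the count for its bin.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin.

    Bins are [start, start + bin_width), [start + bin_width, ...), and so on.
    The result maps each bin's lower edge to its count.
    """
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        lower = start + index * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Illustrative data.
ages = [21, 25, 29, 33, 34, 38, 41, 45, 47, 52]
counts = histogram_counts(ages, bin_width=10, start=20)

for lower, count in counts.items():
    print(f"{lower}-{lower + 9}: {'#' * count}")
```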
Box Plot:
• Enables you to study the distributional characteristics and overall patterns of a
variable.
• Provides a useful way to visualize the range and other characteristics of a
variable, such as the median, quartiles, and outliers.
• Can be created for numerical data only.
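The values a box plot draws can be computed directly. This sketch assumes the "inclusive" quartile method from Python's `statistics` module and the common 1.5 × IQR rule for flagging outliers; other tools, Tableau included, may use a different quartile definition, so the exact numbers can differ. The data is invented.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value that is not an outlier
        "whisker_high": max(inside),  # largest value that is not an outlier
        "outliers": outliers,
    }


# Illustrative data with one unusually large value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
for name, value in summary.items():
    print(f"{name}: {value}")
```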
Pie Chart:
• Useful for summarizing a categorical variable or dimension.
• Each slice represents one category of the dimension, and its size represents
the count of that category.
Heatmaps:
• Use colour and size to help visualize data.
• Consist of one or more dimensions and one or two measures.
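A heatmap with two dimensions and one measure can be thought of as a grid of aggregated cells, each mapped to a colour intensity. The sketch below assumes SUM as the aggregation and a simple min-max scaling to the range 0 to 1; the dimension values and sales figures are made up.

```python
# Illustrative records: (row dimension, column dimension, measure).
records = [
    ("East", "Furniture", 100),
    ("East", "Technology", 300),
    ("West", "Furniture", 200),
    ("West", "Technology", 400),
    ("East", "Furniture", 50),
]


def heatmap_cells(records):
    """Sum the measure for each (row, column) combination."""
    cells = {}
    for row_dim, col_dim, value in records:
        key = (row_dim, col_dim)
        cells[key] = cells.get(key, 0) + value
    return cells


def colour_intensity(cells):
    """Scale each cell to 0..1, where 1 would be drawn in the darkest colour."""
    low, high = min(cells.values()), max(cells.values())
    span = high - low
    return {key: (value - low) / span if span else 0.0 for key, value in cells.items()}


cells = heatmap_cells(records)
intensity = colour_intensity(cells)
for key, value in cells.items():
    print(f"{key}: total={value} intensity={intensity[key]:.2f}")
```

A second measure, when present, could be mapped to the size of each cell's mark in the same way.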