Unit 5
Data Visualization: the representation of data using visual elements/context like graphs, charts, and maps.
It converts large and small data sets into visuals, which are easy for humans to understand and process.
Data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
In the world of Big Data, data visualization tools and technologies are required to analyze vast amounts of information.
Data visualizations are common in everyday life, where they most often appear in the form of graphs and charts.
A combination of multiple visualizations and bits of information is referred to as an infographic.
Data visualizations are used to discover unknown facts and trends. For example, a line chart displays change over time.

Visual Elements:
Charts and Graphs: Bar charts, line graphs, scatter plots, and pie charts are common forms of visual data representation.
Tables: Display raw data in a structured manner; often used when precision is more important than visual appeal.
Maps: Geographic data can be visualized through heat maps, choropleth maps, or point-based maps to show patterns across locations.

Effective Data Visualization in 5 Steps:
1. Identify the Audience: The first step in creating a data visualization is to understand your audience, including their knowledge, technical skills, attention span, and interest in the topic.
2. Remove Unnecessary Complexity: Keep your data visualization simple by eliminating any elements that could confuse or distract the audience.
3. Use Relevant Charts: To create effective data visualizations, use the right chart for the data. Examples: Pie Chart; Bar Chart; Tape Diagram; Pictograph; Scatter Plot; Time Series; Area Chart; Bubble Graph; Line Chart; Radar Chart; Venn Diagram; Heat Map.
4. Tell a Story: A data visualization should tell a clear story, not just present numbers. Provide relevant background information, highlight key points, and guide the audience towards the conclusion.
5. Test Your Data Visualization: The final step is to test the data visualization to ensure it is ready for presentation. Check that the key points are clear, the data is accurate, and the charts are easy to understand. Ask a colleague to review it for errors and provide feedback on the design and content.

Tools:
Tableau: A popular tool that helps create interactive and shareable visualizations like charts and graphs. It is easy to use and works well with large datasets.
Power BI: A Microsoft tool that allows you to create interactive reports and dashboards.
Google Data Studio (now Looker Studio): A free tool by Google that helps create customizable reports and dashboards using data from various sources like Google Analytics and Google Sheets.
Excel: A well-known spreadsheet tool that includes basic charting and graphing features, making it easy to create simple visualizations from data.
Plotly: A tool for creating interactive plots and graphs, often used for web-based data visualizations.
Qlik: A tool that helps create interactive visualizations and dashboards, similar to Tableau and Power BI, often used for data discovery and analytics.
D3.js: A JavaScript library used for creating custom and complex data visualizations on websites.
R (ggplot2): A programming language (R) and its plotting library (ggplot2) for creating detailed and customized statistical visualizations, used especially by data scientists and analysts.
Infogram: A simple tool for creating charts, infographics, and reports.
Looker: A business intelligence tool used to create data reports and dashboards. It integrates well with databases and helps in real-time data analysis.

Uses: Understanding Trends; Identifying Patterns and Relationships; Simplifying Complex Data; Making Decisions; Communicating Insights.
Applications: Business Intelligence and Reporting; Financial Analysis; Healthcare; Marketing and Sales; Human Resources.

Pixel-Oriented Visualization Technique:
Pixel-oriented techniques display very large datasets by using pixels as the visual representation of data values. Each data point is shown as a tiny dot (a pixel) on the screen, and its color represents the value of the data.
If the data has many dimensions, a separate section (window) of the screen is created for each dimension. The data is arranged in the same order across all these windows, so you can compare patterns easily.
This method makes it possible to show massive amounts of data at once while still being able to spot trends, patterns, or unusual values. It uses every pixel on the screen efficiently to give a clear and compact view of the data.
(Diagram)
For a data set of m dimensions, create m windows on the screen, one for each dimension.
The m dimension values of a record are mapped to m pixels at the corresponding positions in the windows.
The colors of the pixels reflect the corresponding values.
(Diagram)
To save space and show the connections among multiple dimensions, space filling is often done in a circle segment.
Advantages: High Data Density; Efficient Pattern Recognition; Scalability; Dimensional Flexibility.
Drawbacks: Hard to Understand; Color Confusion; Limited Detail; Screen Limitations.
Applications: Financial Market Analysis; Medical and Genomic Research; Network Security; Climate Science.
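
The m-window mapping described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pixel_windows, the width parameter, the row-by-row fill order, and the use of gray levels (0 to 255) in place of a color scale are all hypothetical choices, not part of the notes.

```python
def pixel_windows(records, width):
    """Lay out records as one grid of gray levels per dimension.

    records: list of equal-length numeric tuples (m values per record).
    width:   assumed window width in pixels; rows are filled left to right.
    Returns a list of m windows; windows[d][row][col] is a gray level 0..255.
    """
    m = len(records[0])
    windows = []
    for d in range(m):
        column = [rec[d] for rec in records]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1  # avoid dividing by zero when a dimension is constant
        # Scale each value of this dimension to a gray level.
        levels = [round(255 * (v - lo) / span) for v in column]
        # Records keep the same order in every window, so position k is record k.
        rows = [levels[i:i + width] for i in range(0, len(levels), width)]
        windows.append(rows)
    return windows


# Four records with m = 2 dimensions give two 2x2 windows.
records = [(10, 1.0), (20, 0.5), (30, 0.0), (40, 0.5)]
windows = pixel_windows(records, width=2)
```

Because every window uses the same record order, the pixel at a given row and column refers to the same record in each window, which is what makes patterns comparable across dimensions.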

Geometric Projection Visualization Techniques: These simplify multi-dimensional data by projecting it into a lower-dimensional space (2D or 3D) using mathematical transformations, while preserving the original structure and relationships as far as possible.

Line Plot:
This plot appears in nearly every kind of analysis between 2 variables.
A line plot is a series of data points whose values are connected with straight lines.
The plot may seem very simple, but it has many applications, not only in machine learning but in many other areas.
It is used, for example, to analyze the performance of a model using the ROC-AUC curve.
(Diagram)
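
As an illustration of the ROC-AUC use mentioned above, the following sketch computes the points of an ROC curve and the area under the straight lines connecting them. It is a minimal example under assumptions: the function names roc_points and auc and the sample labels and scores are made up for illustration, labels are assumed to be 0 or 1, and both classes are assumed to be present.

```python
def roc_points(labels, scores):
    """Return the (false positive rate, true positive rate) points of an ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score: lowering the threshold admits one item at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the line plot through the points (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]               # made-up ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # made-up model scores
points = roc_points(labels, scores)
area = auc(points)
```

Plotting the points in order and joining them with straight lines gives the ROC curve; the trapezoidal sum is exactly the area under that line plot.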

Bar Plot:
This is one of the most widely used plots. We see it not just in data analysis but wherever trend analysis is done, in many fields.
It presents the data in a visually appealing way and conveys the details straightforwardly to others.
The plot is simple and clear, but it is not used very frequently in data science applications.
(Diagram)

Stacked Bar Graph:
Unlike a Multi-set Bar Graph, which displays its bars side by side, a Stacked Bar Graph segments its bars. Stacked Bar Graphs show how a larger category is divided into smaller categories and how each part relates to the total amount. There are two types of Stacked Bar Graphs:
(Diagram)
Simple Stacked Bar Graphs place each value for the segment after the previous one. The total value of the bar is all the segment values added together. Ideal for comparing the total amounts across each group/segmented bar.
100% Stacked Bar Graphs show the percentage-of-the-whole of each group and are plotted by the percentage of each value to the total amount in each group. This makes it easier to see the relative differences between quantities in each group.
One major flaw of Stacked Bar Graphs is that they become harder to read the more segments each bar has. Comparing segments with each other is also difficult, as they are not aligned on a common baseline.

Scatter Plot:
(Diagram)
It is one of the most commonly used plots for visualizing simple data in machine learning and data science.
The plot represents each point in the dataset with respect to any 2 or 3 features (columns).
Scatter plots are available in both 2-D and 3-D. The 2-D scatter plot is the common one, where we primarily try to find the patterns, clusters, and separability of the data.
Colors are assigned to the data points based on the target column, i.e., we can color the data points according to the class label given in the dataset.

Box and Whisker Plot: This plot can be used to obtain more statistical details
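
The notes do not list which statistics are meant. A box-and-whisker plot is conventionally drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum), and the sketch below computes it. The function name five_number_summary is hypothetical, whiskers are assumed to run to the minimum and maximum, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return the statistics a box-and-whisker plot is conventionally drawn from."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
```

The box spans q1 to q3 with a line at the median, and under the assumption above the whiskers extend to min and max.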

Visualizing Complex Data and Relations:
For a large data set of high dimensionality, it would be difficult to visualize all dimensions at the same time.
Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces). The subspaces are visualized in a hierarchical manner.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization method.
Example: to visualize a 6-D data set with dimensions F, X1, X2, X3, X4, X5, suppose we want to observe how F changes with respect to the other dimensions. We can fix the X3, X4, X5 dimensions to selected values and visualize the changes in F with respect to X1 and X2.
Most visualization techniques were designed mainly for numeric data.

Word Cloud:
(Diagram)
A visualisation method that displays how frequently words appear in a given body of text, by making the size of each word proportional to its frequency. All the words are then arranged in a cluster or cloud of words. Alternatively, the words can be arranged in any format: horizontal lines, columns, or within a shape.
Word Clouds can also be used to display words that have meta-data assigned to them. For example, in a Word Cloud of the names of all the world's countries, the population could be assigned to each name to determine its size.
Colour used on Word Clouds is usually meaningless and primarily aesthetic, but it can be used to categorise words or to display another data variable.
Typically, Word Clouds are used on websites or blogs to depict keyword or tag usage.
Word Clouds can also be used to compare two different bodies of text.
Although simple and easy to understand, Word Clouds have some major flaws:
Long words are emphasised over short words.
Words whose letters contain many ascenders and descenders may receive more attention.
They are not great for analytical accuracy, so they are used more for aesthetic reasons.
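
The sizing rule for Word Clouds (word size proportional to frequency) can be sketched as follows. This is a minimal illustration, not a layout algorithm: the function name word_sizes, the max_size parameter, the simple tokenizing pattern, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter


def word_sizes(text, max_size=40):
    """Map each word to a font size proportional to its frequency.

    max_size is an assumed font size given to the most frequent word.
    """
    # Lowercase the text and split it into simple alphabetic tokens.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * count / top for word, count in counts.items()}


sizes = word_sizes("data data data chart chart map")
```

Here "data" (3 occurrences) gets the full size, "chart" (2) gets two thirds of it, and "map" (1) gets one third. To size words by meta-data instead, such as country population, the counts would simply be replaced by those values.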