Dashboard Design by Michael Burch and Marco Schmid
Series Editors:
K.C. Chen, National Taiwan University, Taipei, Taiwan; University of South Florida, USA
Sandeep Shukla, Virginia Tech, USA; Indian Institute of Technology Kanpur, India
The “River Publishers Series in Computing and Information Science and Technology” covers
research which ushers the 21st Century into an Internet and multimedia era. Networking
suggests transportation of such multimedia contents among nodes in communication and/or
computer networks, to facilitate the ultimate Internet.
Theory, technologies, protocols and standards, applications/services, practice and imple-
mentation of wired/wireless networking are all within the scope of this series. Based on
network and communication science, we further extend the scope for 21st Century life through
the knowledge in machine learning, embedded systems, cognitive science, pattern recognition,
quantum/biological/molecular computation and information processing, user behaviors and
interface, and applications across healthcare and society.
Books published in the series include research monographs, edited volumes, handbooks
and textbooks. The books provide professionals, researchers, educators, and advanced students
in the field with an invaluable insight into the latest research and developments.
Topics included in the series are as follows:
• Artificial intelligence
• Cognitive Science and Brain Science
• Communication/Computer Networking Technologies and Applications
• Computation and Information Processing
• Computer Architectures
• Computer networks
• Computer Science
• Embedded Systems
• Evolutionary computation
• Information Modelling
• Information Theory
• Machine Intelligence
• Neural computing and machine learning
• Parallel and Distributed Systems
• Programming Languages
• Reconfigurable Computing
• Research Informatics
• Soft computing techniques
• Software Development
• Software Engineering
• Software Maintenance
Michael Burch
Marco Schmid
Published 2023 by River Publishers
River Publishers
Alsbjergvej 10, 9260 Gistrup, Denmark
www.riverpublishers.com
© 2023 River Publishers. All rights reserved. No part of this publication may
be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, mechanical, photocopying, recording or otherwise, without prior
written permission of the publishers.
Contents

Preface
List of Figures

1 Introduction
1.1 A Visualization Pipeline
1.2 Human Users and Tasks
1.3 Programming Directions and Solutions

7 Conclusion
References
Index
Preface
learned during the course, in order to solve a given realistic data science
problem. Such problems can be manifold, stemming from various application
domains, oftentimes with a link to a real-world data problem that students
are aware of or even more actively involved in, for example, in the context
of a company or industrial partner for which the students currently work
or for which they worked a while ago. Such a link can build a bridge
between a theoretical course in visualization and a more realistic real-world
data example, creating some kind of synergy effect.
The book is organized in a way to explain aspects from all of the involved
fields, in a structured way, while at the same time giving plenty of real-world
visual examples as well as runnable code snippets among discussions. More-
over, each section is concluded with exercises worth solving and thinking
about, in order to learn the rules of designing and implementing dashboards
for interactive data visualizations. As a benefit we also provide one possible
solution, among many imaginable ones, to support the learning effect. We
primarily focus on interactive dashboards, although many other solutions
for a visualization problem exist and might solve the problem in a more
efficient way. On the other hand, dashboards are easy to understand and to
teach and quickly lead to a desired visualization tool solution equipped with
various interaction techniques. This is also due to the limited amount of time
planned for typical courses on visualization in universities that build a time
frame in which a maximum must be learned with a minimum of effort. In
many of the courses, it was an amazing experience to see the students' tools
running in the end, either locally on their own computers or via a URL on
the web, making them accessible to everybody with an internet connection.
This milestone was also reached by the weaker, less-experienced students, even
when they started with nearly no background knowledge about one or all of
the involved fields, again showing us that the content was successfully
explained.
The remainder of the book is organized as follows: Chapter 1 starts by
introducing the general problem and tries to show the bridges between all of
the related fields. Chapter 2 describes the data-to-visualization mapping
with respect to dashboards, while also taking into account perceptual and
cognitive issues related to visual and interface design. In Chapter 3, we
explain the major programming ingredients for creating dashboards for
interactive visualization, while Chapter 4 builds the basis from an
implementation perspective, focusing on the programming language Python.
Applications are provided in Chapter 5, coming with code examples as well
as their visual outputs in the form of dashboards. The book is completed with
List of Figures
Figure 2.5 Pie charts are based on the visual variable "angle" or "area", while bar charts are based on "position" or "length" on a common scale [73], which seems to be better for solving comparison tasks for quantities.
Figure 2.6 Visual objects can be observed while trying to solve a search task: (a) Only blue circles. (b) Blue circles with one red circle, which is called the target object, whereas the blue ones are the distractors.
Figure 2.7 Change blindness when comparing two images: From the original image in (a) there are several differences compared to the image shown in (b).
Figure 2.8 Some of the popular Gestalt principles: (a) Reification: An incomplete visual object can be completed. (b) Invariance: A deformation of a visual pattern still allows recognizing the original object. (c) Multistability: A visual object might be interpreted in various ways (at least two ways). (d) Emergence: A visual object or a person can be detected from a noisy background.
Figure 2.9 Visual objects can be grouped in several ways: (a) Symmetry. (b) Closure. (c) Similarity. (d) Proximity. There are even further laws like the law of good form, common fate, or continuity.
Figure 2.10 The Hermann grid illusion demonstrates how "visual objects" in the form of gray dots can pop out although there are no such gray dots included in the image.
Figure 2.11 A pie chart is one way to visualize quantities, but a bar chart makes it easier to compare the values due to the fact that it encodes the quantities in the bar lengths instead of the circle sector angles [73].
Figure 2.12 Two different ways to visually encode relational data where the edges of the graph have directions and weights, also known as a network [106]: (a) A node-link diagram. (b) An adjacency matrix.
List of Abbreviations/Acronyms
1D one-dimensional
2D two-dimensional
3D three-dimensional
AOI area of interest
ASCII American standard code for information interchange
AWT abstract window toolkit
C programming language
C++ programming language
CSS cascading style sheets
CSV comma-separated values
D3 data-driven documents
DNA deoxyribonucleic acid
EEG electroencephalography
GUI graphical user interface
HTML hypertext markup language
IDE integrated development environment
NCBI National Center for Biotechnology Information
NP nondeterministic polynomial time
PCP parallel coordinates plot
PX pixels
R programming language
SPLOM scatter plot matrix
UI user interface
URL uniform resource locator
XAI explainable artificial intelligence
1 Introduction
Figure 1.1 A visualization pipeline: Starting with raw data, processing it, transforming it,
building visual structures, and finally visualizing it in a dashboard.
1.1 A Visualization Pipeline
• Manipulations:
– Reading/parsing: Some steps have to be taken into account to
bring the raw data into a processed data form, for example, during
the reading and parsing process, the data can already be partially
cleaned or annotated with extra information. However, most of the
advanced enrichments can only be done after a more thorough data
transformation process.
– Transformation: The processed and partially cleaned data can fur-
ther be analyzed for common patterns, correlations, outliers and
anomalies, as well as certain sequential behavior in case the data
has a sequential or time-dynamic property.
– Visualization: The visual depiction of the data is of importance;
however, the rendering process has various options to use visual
variables, different display technologies, as well as interface
styles and designs, which is particularly important for dashboards.
• Users-in-the-Loop:
– Feedback: The user group is able to interact with all of the
aforementioned stages (however, in Figure 1.1, we only show the
interaction with the last stage). While interacting and trying to
visually explore a dataset, users typically form some kind of con-
fidence level that describes how well the interactive visualization
supports them in solving tasks at hand. This is typically evaluated
in a complex user study, giving concrete tasks based on formerly
stated hypotheses, by measuring dependent variables like error
rates and completion times. Modern evaluations even incorporate
eye tracking and further physiological measures to get even more
hints about visual attention behavior, visual task solution strate-
gies [46], or body-related issues such as blood pressure, EEG, or
stress levels. However, the analysis of such data is typically very
challenging and demands further advanced visual analytics systems,
meaning statistical evaluation alone is not enough to find insights.
Furthermore, if users interact with the visualizations, those get transformed
as well into different perspectives, layouts, filtered views, and so on.
Hence, those operations are another kind of modification, but they typically
work on the visual level, not on the data level. However, most interactions
require further algorithms to be applied, which run in the background and
which might cause longer waiting times depending on the algorithms'
performance and/or the dataset sizes in use.
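To make the pipeline stages more concrete, the following snippet sketches them in plain Python on a small hypothetical dataset; the column names and values are invented for illustration, and a real dashboard would replace the textual output of the last stage with rendered charts.

```python
import csv
import io
from statistics import mean

# Hypothetical raw input; in practice this would come from a file or URL.
RAW = """city,temperature
Berlin,21
Zurich,18
Vienna,bad_value
Graz,17
"""

def read_and_parse(text):
    """Reading/parsing: bring the raw data into a processed form,
    partially cleaning it (here: dropping unparseable rows)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({"city": row["city"],
                         "temperature": float(row["temperature"])})
        except ValueError:
            continue  # clean the data already during parsing
    return rows

def transform(rows):
    """Transformation: derive a simple pattern, here the deviation
    of each value from the overall mean."""
    avg = mean(r["temperature"] for r in rows)
    return [{**r, "deviation": r["temperature"] - avg} for r in rows]

def visualize(rows):
    """Visualization: a textual stand-in for the rendering stage."""
    return [f"{r['city']:<8} {'#' * int(r['temperature'])}" for r in rows]

processed = read_and_parse(RAW)
transformed = transform(processed)
for line in visualize(transformed):
    print(line)
```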
Exercises
• Exercise 1.1.1: Imagine you have found a certain dataset on the web and
are interested in the patterns, correlations, and anomalies hidden in it.
Describe the ingredients you need to solve this problem by taking into
account the stages of the visualization pipeline illustrated in Figure 1.1.
• Exercise 1.1.2: How can the users be integrated into the design and
implementation process of a visualization tool or dashboard? What are
typical challenges when asking real users to apply a visualization tool to
a given dataset?
Figure 1.2 Searching for a visual object in a visual scene is denoted by the term search task,
which typically requires focused visual attention and a visiting and checking strategy to identify
the visual object-of-interest. The observer might search for a set of neighboring dots visually
encoded in a certain color pattern.
1.2 Human Users and Tasks
Well-known and often occurring tasks from a much longer list are for
example:
• Search task: One of the most time-consuming tasks comes in the form
of a visual search, for which the entire display has to be visually
inspected in the worst case, to identify the visual object-of-interest (see
Figure 1.2). The search can be more efficient if a certain visual feature of
the object-of-interest is known beforehand, for example, a certain color
or color pattern, hence leading to some kind of visual pre-filtering of the
display.
• Counting task: In cases in which only a few objects are visually
represented, we might start inspecting them one-by-one and count the
visited objects, for example, to get an idea about how much information
is presented visually. The number of objects to be counted should not be
too large; otherwise, a counting task would become a tedious procedure
that one would probably not like to solve.
• Estimation task: If too many visual objects are depicted, we typically
do not count them one after the other. In such a scenario, we would
switch into some kind of estimation task that gives an approximate
number of visible objects. In most cases, groups of visual objects are
estimated based on the number of objects and those are later compared,
for example, after having applied a clustering algorithm, for which a
visual output is displayed.
• Correlation task: If two or more variables are under exploration we
are typically interested in a certain correlation behavior, asking whether
those variables behave in a similar way or show some kind of contra-
dicting, opposite effect, for example, the values of one variable show an
increasing behavior while those of the other variables are decreasing in
the same time period.
• Pattern identification task: A very general task comes in the form
of pattern detection, which requires an understanding of what a pattern is.
This can actually be a problem for algorithmic solutions, which do not
know exactly which kind of pattern we are looking for. The pattern
identification task might be supported by visual outputs which make
use of the perceptual and cognitive strengths of the humans’ visual
systems [246].
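For the correlation task, for example, the degree of similar or opposite behavior between two variables can be quantified by the Pearson correlation coefficient, which a tool might compute in the background to support the visual impression. The following sketch uses only plain Python; the two example series are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series:
    +1 means similar behavior, -1 a contradicting, opposite effect."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical measurements of two variables over the same time period.
temperature = [14, 16, 19, 22, 25]
ice_cream_sales = [90, 110, 160, 200, 260]
r = pearson(temperature, ice_cream_sales)  # close to +1: similar behavior
```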
There are many more tasks that are imaginable, too many to mention all
of them here, but typically tasks are based on a sequence of much simpler
tasks. The general question in usability is how users solve tasks step-by-step.
This sequential visual attention process can give useful insights into whether
a user interface, dashboard, or visualization tool has been designed
and developed following the rules that make it a powerful tool for data
exploration and analysis. Eye tracking [44, 87, 123] is a modern technology
applied to interactive visualizations [8] with the goal to record visual attention
behavior but, on the challenging side, to also visually and algorithmically
explore the eye movement data.
Exercises
• Exercise 1.2.1: Imagine you have a dataset about a social network, for
example, people from a certain region who are related or not. Which
hypotheses or research questions might be interesting to ask, given the
fact that we have a social network dataset?
• Exercise 1.2.2: Which kind of tasks do we need to solve, to find solutions
to the formerly stated hypotheses about the social network dataset?
Table 1.1 Examples for programming languages, visualization libraries, the year of first
development, and additional special properties.
Programming language | Visualization library | Year | Special properties
Java | AWT | 1995 | Graphical user interfaces (GUIs)
Java | Swing | 1996 | GUI widget toolkit
Python | Matplotlib | 2003 | Interactive visualizations
R | ggplot2 | 2005 | Lattice graphics
JavaScript | D3 | 2011 | Web standards
R | Leaflet | 2011 | Spatial data visualization
R | Shiny | 2012 | Interactive web applications
Python | Bokeh | 2012 | Modern web browsers
Python | Seaborn | 2012 | Statistical graphics
JavaScript | Plotly | 2012 | Web-based
JavaScript | Chart.js | 2013 | Open-source library
Python | Geoplotlib | 2016 | Hardware-accelerated
Python | Chartify | 2017 | Open-source library
R | Esquisse | 2018 | Drag-and-drop interface
online via a URL that has to be typed into the URL field of a web browser.
Such a web-based tool is the easiest to start from the users' perspective
since it only requires writing or copying and pasting the correct URL into
the web browser; no extra installations are needed. Actually, a dashboard can
be built in exactly this way, keeping the burden for the users quite low, and
hence, with such web-based visualization tools, we can quickly distribute
it among a large community, for example, to disseminate some valuable
results based on visual analyses of data. One big issue can still occur: we
need a stable internet connection to access the implemented visualization tool
successfully; otherwise, a locally stored version of the tool would also be an
option, but on the negative side, the users have to understand how to get it
running on their computers.
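As a minimal sketch of such a web-based tool, the following snippet uses only Python's standard library to serve a tiny HTML page with an inline SVG bar chart; the values and the port number are invented for illustration, and a real dashboard framework would of course offer far more.

```python
import http.server

# Hypothetical quantities to show; a real tool would read them from data.
VALUES = {"A": 4, "B": 7, "C": 2}

def build_page(values):
    """Render a minimal HTML page with an inline SVG bar chart."""
    bars = []
    for i, (label, v) in enumerate(values.items()):
        bars.append(
            f'<rect x="{i * 40}" y="{100 - v * 10}" '
            f'width="30" height="{v * 10}" />'
            f'<text x="{i * 40}" y="115">{label}</text>'
        )
    return ("<html><body><h1>Tiny dashboard</h1>"
            f'<svg width="200" height="120">{"".join(bars)}</svg>'
            "</body></html>")

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = build_page(VALUES).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Anybody who can reach this machine can open http://<host>:8050/
    http.server.HTTPServer(("", 8050), Handler).serve_forever()
```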
If a dashboard is running, it can not only be used for data exploration but
even for services; for example, a company might need it to sell products or
request customer reactions and the like. There is an endless list of application
scenarios in which dashboards are worth designing and implementing. How-
ever, more and more dashboards are created to make data visually observable,
for example, by showing the relations in a social network, informing about
weather trends, showcasing the international flight behavior, or illustrating
earthquakes happening on earth every day, from a daily, monthly, or yearly
perspective, provided by multiple coordinated views [200] and the integration
of various interaction techniques [258]. Data scientists are much more
experienced in using algorithms and visual outputs since dealing with data of
different kinds belongs to their daily jobs; however, the biggest issue
here is to make the data understandable to the laymen, the nonexperts in data
science and data visualization. Hence, a dashboard that runs online can be of
great help, also for people who do not regularly work in the field of data
science. The goal of this book is to involve interested people in this domain,
that is, nonexperts, to make them aware of the technologies and processes to
build such tools by themselves one day. This has the positive benefit that they
are not dependent on the work of others anymore, but can create their own
independent solutions to their tasks at hand.
Before starting to create one's own dashboard, we have to understand the
aspects surrounding this whole process. Visualizations have to be understood,
including which purpose they serve for a certain data type. For example, prominent
visualization techniques like histograms, bar charts, pie charts, scatter plots,
star plots, scarf plots, dendrograms, or geographic maps with additionally
overlaid information (maybe population densities) are a first step, but also
the various interactions they support and how they can be linked for creating
a more flexible and complex visualization tool with much more functionality
can be of great interest. Even animated diagrams might be interesting; for
example, if some kind of dynamic story has to be told with data which
is oftentimes preferred in the industrial environment to show processes to
customers. It may be noted that even if all of the involved technologies
to build a dashboard are understood and can be applied, a big issue still
comes from the computer science side, which also deals with algorithms and
their runtime complexities [102]. If a dashboard does not only show data
visually, but the data have to be transformed in an earlier stage or even in
real time, the implementors might get confronted with several more hard-to-tackle
issues, also including the data handling and efficient access to the
data, for example, stored in a database or in a text file. In general, creating
powerful visualization tools, maybe in the form of a dashboard, includes many
hidden bottlenecks and drawbacks. However, we try our best to explain those
step-by-step in a tutorial-like book with many examples and exercises with
solutions.
Exercises
• Exercise 1.3.1: Search for programming languages and visualization
libraries and describe their benefits and drawbacks for the task of
creating visualization tools and dashboards.
• Exercise 1.3.2: What are the positive and negative aspects when using
web-based solutions for visualization tools?
2 Creating Powerful Dashboards
is not easily possible to be flexible in the sense of being able to decide which
functionality, which visualization, and which interaction to offer at what place
and at what time in a dashboard, that is, creating one’s own solution might
still be the better option. There is a lot of support for building dashboards
like Microsoft Power BI, QlikSense, Tableau, or Grafana, just to mention
a few. Those consist of a lot of functionality but, as in the case of a purely
programmed solution, they have to be learned to work with them efficiently.
Even once they are understood, dashboard creators may still miss functionality
and control needed to build dashboards designed for their tasks at hand and to
easily extend them with new functionality.
Programming a dashboard from scratch, on the other side, can be a longer-
duration solution, but these kinds of dashboards can be designed for nearly
any kind of task [255] that has to be solved in data analytics, flexibly equipped
with interaction techniques. However, a profound knowledge about Python,
for example, is required to equip the dashboard with all of the features that
are needed.
Not only the programming side is problematic; also questions about
data handling, visual and interface design, including HTML and CSS to guide
the layout, appearance, and aesthetic appeal of a dashboard, human–computer
interaction, as well as user evaluation might be worth studying, in order
to really get the most powerful solution we are waiting for, to dig deeper
into our own or other people's data and to explore it for patterns, correlations,
rules, outliers, and anomalies. Moreover, linking all views and perspectives
on the data, storing snapshots of the current state of the visual and algorithmic
output, uploading data, and sharing and disseminating the results in the form of
URLs, visualizations, or parts of a dataset that contain valuable information
are powerful features, which can only tap their full potential if most of the
techniques in this interdisciplinary field of designing and implementing
dashboards are understood. No matter which kind of dashboard is created, the
human users with their tasks at hand should definitely be consulted, maybe in a controlled
or uncontrolled user study, with the intention to get valuable feedback about
the design flaws in an interactive dashboard. Such design flaws could be based
on the visualization techniques in use or on the visual interface given by
the dashboard with its visual components like sliders, buttons, text fields,
and the like as well as their layout and interactive response. Moreover, from
an algorithmic perspective, it might be worth studying how the data gets
processed to understand the runtime complexities and bottlenecks in the form
of poorly running algorithms that finally also impact the interactivity of such
a dashboard. Nobody wants to wait for a long time until the next interaction
can be applied.
Actually, building a dashboard can be based on many programming
languages. For example, the language R with its visualization library Shiny
has proven to be a good solution, but for newcomers to programming and
dashboard design, we recommend the programming language Python with
one of its powerful frameworks like Bokeh or Dash by Plotly. There is some
tendency to use Dash since many users report that it is easy to learn, while
quite powerful simple dashboards and web apps can already be built with basic
programming skills. However, if more advanced dashboards have to be created,
much more profound knowledge about Dash, Plotly, and Python is needed. Dash
itself is JavaScript-based to some extent since it makes use of React, a
popular web framework based on JavaScript, and of Flask, a prominent web
framework based on the programming language Python. Dash does not only
support Python but programming languages like R or Julia as well. Deploying
the first dashboard results and testing them online might be done by using
Heroku or PythonAnywhere, but for larger projects, in the sense of big data
and more advanced functionality in the form of powerful algorithms, we
recommend one's own virtual machine, in order to let the dashboard run on a
server and make it accessible to anybody who owns a computer with a stable
internet connection.
In this chapter we explain which typical ingredients are needed to build
a dashboard, starting from the perspective on the data that can come in a
variety of forms (Section 2.1). We will also take a detailed look at aspects
related to visualization and algorithmic approaches (Section 2.2), also
including the human users with their tasks at hand to be solved. To include the
aforementioned visual aspects in a broader context, we will describe typical
visualization examples and applications (Section 2.3). The various rules for
visual and interface design with good practice and no-goes will also be
taken into account (Section 2.4). Finally, we look at interaction concepts,
modalities, and displays (Section 2.5).
numerical values, but it does not make sense to add bus line 5 to bus
line 8, for example, nor does it make sense to order the bus lines by their
numbers. The lines are just representatives of certain routes a bus
takes in a city.
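For such nominal data, counting the occurrences of each category is one of the few meaningful operations; a small plain-Python sketch (with invented bus line observations):

```python
from collections import Counter

# Bus lines are nominal data: the numbers are labels, not quantities.
observed_lines = ["5", "8", "5", "12", "8", "5"]

# Adding or ordering line numbers is meaningless; counting occurrences
# of each category is one of the few sensible operations.
frequencies = Counter(observed_lines)
```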
Apart from primitive data, we typically meet much more challenging data
types, challenging in a way that it is more difficult to apply algorithms and
visualizations to detect insights in them. Those complex data types could be
classified as relational, hierarchical, multivariate, textual, spatiotemporal, or
trajectorial data, just to mention a few from a much longer list.
• Relational data: Data objects can be related to some extent. These
relations are expressed in relational data which can consist of binary or
multiple relations between two objects. The data structure we are talking
about in such a case is a graph [248] which can be undirected, in case
the direction of a relation is irrelevant, or directed, in case the direction
is relevant. If weights of the relations are of particular interest and the
relations are directed, we call such a graph a network. For example, a
social network, as the name suggests, contains data of a relational data
type. The people are the data objects while the network itself, with all
its connections, is given by the (weighted) relations between all those
people.
• Hierarchical data: If data objects are superior to some others, causing
some kind of parent–child relationship, we consider this kind of data
structure a hierarchy [196]. It consists of a root node (the topmost
object), inner nodes (objects in-between), and leaf nodes (objects on the
lowest hierarchy level). An example of such a hierarchy data type might
be a file system on a computer, which starts with a directory that contains
other directories (subdirectories), which again contain other directories, and on
the lowest level there are the files. There are two types of hierarchy data
types which are containment hierarchies and subordination hierarchies.
The file system is a containment hierarchy, while a company, a family,
or a sports league hierarchy is based on the principle of subordination,
not containment.
• Multivariate data: Data that has the form of rows and columns with
numerical values is denoted by the term multivariate data [116, 117].
Each row, that is, case or observation, contains values for each column
under a certain condition, that is, a variable or an attribute. The value
can exist between the minimum and maximum of a given scale while
the scale can vary from column to column. An example for such a data
Exercises
• Exercise 2.1.1.1: Imagine you have an Excel table full of values. Which
kind of data type would this scenario refer to? Which kind of data type
do the individual entries refer to?
• Exercise 2.1.1.2: People in a social network know each other, are send-
ing messages to each other, but might even be related by other attributes,
for example, in a family hierarchy. Which kind of data types can you
find in such a scenario?
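To make the complex data types above more tangible, the following sketch represents a small invented network, containment hierarchy, and multivariate table with plain Python data structures:

```python
# Relational data: a weighted, directed graph (a "network") as an
# adjacency dictionary mapping a person to (acquaintance, weight) pairs.
network = {
    "alice": [("bob", 3), ("carol", 1)],
    "bob": [("carol", 2)],
    "carol": [],
}

# Hierarchical data: a containment hierarchy such as a file system,
# with directories as dicts and files as leaves (None).
file_system = {
    "home": {
        "docs": {"thesis.txt": None, "notes.txt": None},
        "img": {"logo.png": None},
    }
}

def count_leaves(node):
    """Count leaf nodes (files) on the lowest hierarchy levels."""
    if node is None:
        return 1
    return sum(count_leaves(child) for child in node.values())

# Multivariate data: rows (observations) by columns (variables).
multivariate = [
    {"height": 1.70, "weight": 62.0},
    {"height": 1.84, "weight": 80.5},
]
```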
Exercises
• Exercise 2.1.2.1: In many scenarios in the field of data science, we find
the data to be analyzed in several nonlinked data files. How would you
design a data reading and parsing functionality to get all the information
you need from all of the data sources?
• Exercise 2.1.2.2: How should a data parser be designed and implemented
to be able to react on different data formats or even on changes in a given
data format?
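One possible answer to the question of reacting to different data formats is to detect the delimiter before parsing; the following sketch uses the standard library's csv.Sniffer on invented example data:

```python
import csv
import io

def parse_table(text):
    """Parse tabular text whose delimiter is not known in advance,
    letting csv.Sniffer react to different data formats."""
    sample = text[:1024]  # a sample suffices to guess the dialect
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))

# Two hypothetical files with the same content but different formats.
comma_data = "name,age\nada,36\nalan,41\n"
semicolon_data = "name;age\nada;36\nalan;41\n"
```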
brings into play the five big V’s that are volume, variety, velocity, validity, and
value [206]. Volume stands for the sheer size of data, for example produced
by sensors, the internet, or the behavior of users who order articles, pay with
credit cards, or travel around the globe. The data can even be so big that
they cannot be stored anymore on traditional computers, hence they must be
moved to special servers or even be split into different data sources at the cost
of maybe reuniting them again for certain data analysis. Variety expresses that
data can come in many forms, which can be structured or unstructured, hence
creating an understandable data format from which the data can be accessed
quickly and effortlessly is a major challenge. Velocity describes how fast we
have to access the data, for example, if an algorithm must generate real-time
analyses we only have fractions of milliseconds to respond to requests and the
algorithms themselves have to operate very quickly. Validity focuses on the
quality of data, for example, freeing the data from noise or adding missing data
entities; such defects would otherwise hide certain patterns and might increase
the runtime of algorithms. The value of the data is important to express which impact the
results from the big data can have, that is, which value they produce for the
academic or industrial community.
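Regarding volume and velocity, one common mitigation is to process the data as a stream so that it never has to fit into memory at once; the following plain-Python sketch computes a running mean over an invented stand-in for a large data source:

```python
import io

# Hypothetical large data source; in practice a file of many gigabytes
# that cannot be stored on a traditional computer in one piece.
BIG_SOURCE = io.StringIO("\n".join(str(i % 100) for i in range(10_000)))

def running_mean(stream):
    """Process a stream line by line so the whole dataset never has to
    fit into memory at once (volume), while the aggregate stays
    available at any time (velocity)."""
    total, count = 0.0, 0
    for line in stream:
        total += float(line)
        count += 1
    return total / count
```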
Exercises
• Exercise 2.1.3.1: What is the biggest dataset that you can store on your
computer? How could you reduce the size and the complexity of the
original dataset so that it fits again on your computer for a locally
running tool?
• Exercise 2.1.3.2: If we are talking about big data, we come across the
five big V’s standing for volume, variety, velocity, validity, and value.
Discuss which of the V’s is problematic for the implementation of a
dashboard and which solutions exist to mitigate this situation?
not enough information in the data. In such cases, the goal of a preprocessing
step might be to close certain data gaps, for example, by interpolation, or to
mitigate measurement errors or uncertainty effects whenever this is possible. In many
situations, data preprocessing tries to improve or augment original data, but
as a negative consequence, relevant data elements might be removed
unintentionally. Consequently, the preprocessing step
must be taken with care so as not to lead to misinterpretations later on. The positive
side effect of a data preprocessing can be that the interactive responsiveness
of a data analysis and visualization tool gets much faster due to the fact
that irrelevant information is removed or in the opposite effect, relevant but
missing information is already added and reduces the runtime complexities
of algorithms in the data transformation step.
The data preprocessing typically happens before the algorithmic analyses
and visualizations, as the term pre already suggests. However, in some situ-
ations, the preprocessing cannot work properly without the intervention of
human users. For example, it might be a good idea to show the data in its
original form and let the users decide which algorithm to apply to remove
noise in the data. In some situations, it might actually be the noise or a gap
pattern that we are looking for, which is important for detecting insights in
the data or for confirming or rejecting hypotheses. Consequently, it would be
a bad idea to automatically remove those patterns in a preprocessing stage,
leading to the effect that we would never see what we are actually interested
in. A visualization tool should support both options, that is, showing the data
in its original or preprocessed form. Even a difference view between both forms
might be useful to explicitly point at data elements that are not needed or, on
the other hand, that are missing.
Exercises
• Exercise 2.1.4.1: What could be the reasons for erroneous data, missing
data, or uncertain data?
• Exercise 2.1.4.2: What are typical solutions to handle missing data
elements in a given dataset?
Figure 2.1 Algorithms transform input data to output data. How this is done exactly depends
on user tasks, that is, under which perspectives data will be explored.
for such problems are the optimal linear arrangement problem [74] (also
sometimes called MinLA problem for the minimum linear arrangement). It
is challenging and very time-consuming to compute the optimal arrangement
for a matrix of pairwise relations, but a local minimum might be sufficient to
detect clusters among the pairwise relations. Such NP-hard problems [102]
are known to create challenges for a visualization tool, in particular, if the
focus is on fast interactions.
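As a minimal, hypothetical illustration of such a heuristic (not the algorithms referenced above; all names are illustrative), a local search can swap the positions of two vertices whenever the swap reduces the total arrangement cost, that is, the sum of distances |pos(u) − pos(v)| over all pairwise relations. It terminates in a local minimum, which may already expose cluster structure:

```python
from itertools import combinations

def arrangement_cost(order, edges):
    """Sum of |pos(u) - pos(v)| over all edges for a linear vertex order."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def minla_local_search(vertices, edges):
    """Greedy pairwise-swap hill climbing: repeatedly apply any swap
    that lowers the cost; stops in a local (not global) minimum."""
    order = list(vertices)
    best = arrangement_cost(order, edges)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            cost = arrangement_cost(order, edges)
            if cost < best:
                best, improved = cost, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best

# A path a-c-d-b: the heuristic moves the path vertices next to each other.
order, cost = minla_local_search("abcd", [("a", "c"), ("c", "d"), ("d", "b")])
print(order, cost)  # cost drops from 5 (order a,b,c,d) to 3
```

The quadratic swap loop already hints at the runtime problem mentioned above: for large matrices, even such a heuristic can become too slow for fast interactions.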
Exercises
• Exercise 2.1.5.1: What are the benefits and drawbacks when transform-
ing data from its original form to a transformed one?
• Exercise 2.1.5.2: Aggregation can be a form of data transformation
that has the benefit of reducing the dataset size. How can a new data
element stemming from an aggregation of several original data elements
be computed?
Figure 2.2 The visualization pipeline [177] illustrates how raw data gets transformed step-
by-step into a visual output. The user group can interact in any of the intermediate stages and
can intervene to guide the whole algorithmic and visual exploration process.
Data alone in all its varieties and data types with all data preprocessing
steps and algorithmic transformations can only tell us half of the truth about
certain phenomena from the real world. We can store any kind of data with
incredible sizes and complexities but without seeing a visual output in the
form of graphics, we, the human users, are not really able to derive insights, to
see patterns, correlations, or anomalies [245]. Interactive visualizations [258]
Figure 2.3 Visualizations, interactions, algorithms, and the human users with their perceptual abilities form the major ingredients in the field of visual analytics.
computer being fast, accurate, but stupid while the human users with their
perceptual abilities being slow, inaccurate, but intelligent. Together they are
even more powerful." Such a synergy effect is reflected in the research
field of visual analytics [139, 141, 143] that includes the human users with
their tasks at hand, algorithmic concepts, visualizations, human–computer
interaction, perception, cognitive science, data science, and many more to
make it an interdisciplinary approach. The interdisciplinarity makes the field
applicable to many real-world examples, typically involving big data [205].
Figure 2.3 shows some of the most important fields that are included in the
interdisciplinary field of visual analytics [252, 251].
Figure 2.4 Visual variables [180] describe the core ingredients from which a visualization is
built: position, size, shape, orientation, hue, value, or texture are just the major ones from a
longer list.
(a) (b)
Figure 2.5 Pie charts are based on the visual variable "angle" or "area," while bar charts are
based on "position" or "length" in a common scale [73], which seems to be better for solving
comparison tasks for quantities.
much faster and more accurately than in pie charts. One reason for this effect
is that humans can judge quantities shown with the visual variable position in
a common scale, as used in bar charts, much better than quantities shown with
angles, as in pie charts. There are various experiments investigating which
visual variable is best for a certain situation, that is, for a task at hand
for a given dataset based on one or several data types.
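The difference can be made concrete by computing what each chart actually asks the viewer to decode (a minimal, hypothetical sketch; function names are illustrative): a pie chart maps each value to an angle of the full 360°, while a bar chart maps it to a length on a common scale:

```python
def pie_angles(values):
    """Angles (in degrees) of the circle sectors in a pie chart."""
    total = sum(values)
    return [360.0 * v / total for v in values]

def bar_lengths(values, max_length=100.0):
    """Bar lengths on a common scale; the largest value gets max_length."""
    peak = max(values)
    return [max_length * v / peak for v in values]

values = [30, 25, 25, 20]
print(pie_angles(values))                           # [108.0, 90.0, 90.0, 72.0]
print([round(b, 1) for b in bar_lengths(values)])   # [100.0, 83.3, 83.3, 66.7]
```

The viewer of the pie chart must compare a 90° and a 72° sector, possibly at arbitrary rotations, while the viewer of the bar chart compares aligned lengths, which is perceptually the easier task.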
Such user experiments are important to figure out if a designed and imple-
mented visualization or visualization tool, for example, a dashboard, can be
used reliably by a user group. For this reason, we can find many comparative
user evaluations taking into account the visual variables as independent vari-
ables and measuring the response time and accuracy as dependent variables
while confronting the users with typical tasks that have to be solved by
using the corresponding visualizations as stimuli in the user experiment. Even
interaction techniques [258] (see Section 2.5) integrated in the visualization
tool are typically evaluated although those user studies are much more com-
plicated due to the dynamic stimuli, that is, changing representations of the
visualizations and the typically more complicated and time-consuming tasks
to be solved. Also, eye tracking [44, 123] is a powerful technology to
find out where and when visual attention is paid to a visual stimulus, be it
static or dynamic. However, the dependent variables that come in the form
of spatiotemporal eye movement data and extra physiological measures are
challenging to analyze and to interpret.
Exercises
• Exercise 2.2.1.1: Histograms typically show the distribution of quantitative
values on a (numeric) y-axis, whereas the (numeric) x-axis
stands for the scale on which the data is measured. An example would
be the number of people with a certain income in dollars. Which visual
variables can you generally identify in histograms?
• Exercise 2.2.1.2: What are the benefits and drawbacks when using
either bar charts or pie charts for visually representing a dataset with
5/10/20/50/100 quantities?
(a) (b)
Figure 2.6 Visual objects can be observed while trying to solve a search task: (a) Only blue
circles. (b) Blue circles with one red circle which is called the target object, whereas the blue
ones are the distractors.
(a) (b)
Figure 2.7 Change blindness when comparing two images: From the original image in (a)
there are several differences compared to the image shown in (b).
power of the visual memory, such comparison tasks could not be solved
efficiently. Change blindness [181] is a concept that describes the challenge of
detecting visual elements in one scene that are not present in another one, also
illustrated by the famous error search images (see Figure 2.7). If the reason
for not seeing the change is the viewers’ missing attention, we
call this inattentional blindness [203]. For information visualization, this is a
crucial concept since we are comparing visual scenes all the time, be it for
(a) (b)
(c) (d)
Figure 2.9 Visual objects can be grouped in several ways: (a) Symmetry. (b) Closure. (c)
Similarity. (d) Proximity. There are even further laws like the law of good form, common fate,
or continuity.
In Gestalt theory, we follow the principle of "the whole is greater than the
sum of its parts" [114, 147]. This means that our brain is not trying to see
visual scenes as composed of many small pieces, but it more or less tries to
derive complete visual patterns immediately. This powerful strategy happens
effortlessly and helps us to rapidly derive patterns. Moreover, experience
also plays a central role in Gestalt theory since already-known patterns are
found much more easily and faster than patterns with which we do not have
much experience. Figure 2.8 illustrates several of the Gestalt principles like
emergence, reification, multistability, invariance, or grouping, which can be
categorized into the laws of proximity, similarity, closure, symmetry, common
fate, continuity, and good form (see Figure 2.9). Most of them are really
obvious, but the way they determine how we detect visual patterns to explore
data has a tremendous impact on the usefulness of the visualizations. This
again shows that experience plays a crucial role in the field of visualization.
Figure 2.10 The Hermann grid illusion demonstrates how "visual objects" in the form of
gray dots can pop out although there are no such gray dots included in the image.
situations, we are not aware that they might occur. They can cause misinter-
pretation issues when trying to derive visual patterns from a visualization.
There are various prominent optical illusions that can also accidentally
be incorporated into an information visualization tool, for example, in a
dashboard. Those can cause problems when judging visual elements for
parallelism, length, color coding, movement and speed, and many more [85].
Exercises
• Exercise 2.2.2.1: Find an example of error search images (Google image
search can help). Look at them and try to find the differences between
the original and the manipulated image. What is your search strategy,
that is, how do you strategically solve this task (e.g., based on your eye
movements)?
• Exercise 2.2.2.2: Draw a Hermann grid (Figure 2.10) and check the
impact of different colors. Can you observe any difference depending
on the color in use? Describe your findings!
The human users typically have different experience levels, which makes a
general claim about the usability of a visualization tool a difficult endeavor.
We might have experts or nonexperts, young or old people, visually disabled
people, and more groups of study participants with certain properties might
occur. All those factors must be checked beforehand to understand what
caused certain issues when trying to solve tasks and to make decisions, for
example, when interacting with a dashboard. In visual analytics, this situation
and the role of the human users get even more complicated since such systems
are most powerful if an interplay between humans and machines is guaranteed;
however, the decision-making is still partially on the humans’ side. The
big challenge when using visual analytics is to build, confirm, reject, or refine
hypotheses [139] that focus on answering one or several research questions
by means of visually and algorithmically exploring data of any data type,
homogeneous or heterogeneous data, structured or unstructured data, small
or big data, and the like. Such user studies might run over several weeks, as
some kind of longitudinal study, maybe splitting the visual analytics
tool into several components that can or must be researched separately. This
avoids blowing up the study design with an otherwise huge parameter
space that would demand many study setup variations and, hence, a really large
number of study participants to cover all possible setups. From a
study type perspective, there are various options, typically depending on the
research questions under investigation, for example, controlled versus uncon-
trolled studies, small population versus crowdsourcing studies, field versus
lab studies, standard versus eye-tracking studies, expert versus nonexpert
studies, and many more.
Exercises
• Exercise 2.2.3.1: What are typical challenges for the task of recruiting
people for a user study? How can we tackle those challenges to get as
many study participants as possible?
• Exercise 2.2.3.2: Describe the benefits and drawbacks of controlled
versus uncontrolled user studies.
the human’s visual system [115, 245] to detect visual patterns that can be
mapped to hidden data patterns [65]. But in many cases, the data cannot just
be visualized. It has to be transformed and processed by efficient algorithms
(see Section 2.1) to bring it into a format that can be graphically represented
to reflect those patterns. For example, if we are interested in temporally
aggregated data, we might first compute the daily values from the hourly
values, and then, as a second step, we visualize those aggregation results [4].
Without the aggregation step, it is difficult or impossible to visually solve
the task of identifying a daily evolution pattern in the data. Also, the task
of finding group structures in a dataset, for example, in a relational dataset
like a graph or a network [194], is typically not solvable by visualizing the
data in its raw form. A clever clustering algorithm [3] might compute such
group and cluster structures beforehand and then, as a second step, visualize
the clustering results [238]. However, no matter how powerful an algorithm
is, it mostly produces, from a given input dataset, another kind of dataset
that is too complex to understand without a visual depiction. For
a dashboard, it can become a problem if certain inefficient algorithms are
included in the data analysis and visualization process, since they can cause
some kind of delay in the data exploration. In some cases, the reason is just a
wrong implementation of such algorithms, but in many cases, it could also be
the case that the algorithm itself falls into a class of algorithms that has a high
runtime complexity per se. For such NP-hard problems, there are no known
algorithms that rapidly find an optimal solution to the data problem at hand.
We then need a heuristic that does not compute the optimum but
a local minimum or maximum instead. Examples of such NP-hard problems
are the subset sum problem [148] or the traveling salesman problem [204]
among many others.
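The temporal aggregation step mentioned above, computing daily values from hourly ones before visualizing them, can be sketched as follows (a minimal, hypothetical example in plain Python; a real tool would likely rely on a time-series library):

```python
from collections import defaultdict

def daily_means(hourly):
    """Aggregate (day, hour) -> value measurements into one mean per day,
    the kind of reduced dataset a daily-trend view would then visualize."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (day, hour), value in hourly.items():
        sums[day] += value
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}

hourly = {
    ("2024-01-01", 0): 10.0, ("2024-01-01", 1): 14.0,
    ("2024-01-02", 0): 20.0, ("2024-01-02", 1): 22.0,
}
print(daily_means(hourly))  # {'2024-01-01': 12.0, '2024-01-02': 21.0}
```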
Applying algorithms in visualization typically means waiting for the
results of an algorithm, starting with inputs and producing outputs that are
then visualized. Another challenge is to explore the algorithm during its
runtime [62], maybe to understand why it caused a wrong result or why it
is not performing well. The algorithm itself is then of interest as a dataset.
It is not treated as a black box anymore; we more or less open this black
box to look inside, to understand what is going on, step-by-step. This can
be as simple as understanding how a sorting algorithm works [35] or how
a shortest path is found in a network [62], for example, how Dijkstra's
algorithm walks from node to node, edge by edge, through a network.
The steps taken produce a complex dynamic dataset, typically focusing on
a basic dataset like a graph/network (Dijkstra algorithm) or a list of quantities
(sorting algorithms). If the basic dataset is even more complex, for example,
a neural network for which we are interested in how the weight function is
modified to find a suitable model, we run into a challenging visualization
problem due to the sheer number of modifiable parameters in such a network.
This example brings into play a relatively new field of research denoted as
explainable artificial intelligence (XAI) [164]. For a visual depiction of such
dynamic processes, there are two major concepts: animation and static
representations of the dynamic data [233].
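Opening such an algorithmic black box can be as simple as recording a snapshot of the data after every step. The following sketch (a hypothetical illustration; names are not from any referenced tool) instruments a bubble sort so that each intermediate state is kept and could later be shown, for example, as a sequence of bar charts:

```python
def bubble_sort_with_snapshots(values):
    """Bubble sort that records the list after every swap, so that the
    intermediate states can be visualized (animated or as small multiples)."""
    data = list(values)
    snapshots = [list(data)]  # the initial, unsorted state
    n = len(data)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                snapshots.append(list(data))  # one snapshot per swap
    return snapshots

for state in bubble_sort_with_snapshots([4, 2, 5, 1, 3]):
    print(state)
```

The list of snapshots is exactly the "complex dynamic dataset" mentioned above: the algorithm's behavior itself becomes time-varying data to be visualized.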
Exercises
• Exercise 2.2.4.1: Imagine you have 5 (not sorted) natural numbers. Find
a visual representation of those numbers and present the intermediate
steps of a sorting algorithm applied to those 5 numbers.
• Exercise 2.2.4.2: What is better for visualizing a running algorithm:
animation or a static representation of the intermediate steps? Discuss
the benefits and drawbacks of each concept.
option to avoid link crossings that cause visual clutter [202] if too many of
them occur. This example shows that there is a multitude of combinations
of visual variables, all focusing on providing a visual encoding of the given
dataset that is powerful to support tasks at hand.
Not only quantitative or relational data provide a basis for visualization
candidates. Also hierarchical, multivariate, textual, and many more data types
exist, even in combination, making the choice of suitable visualization tech-
niques limited, but also offering the opportunity to combine and link various
visual variables to the visual output that someone desires. An even more
challenging aspect of data visualization comes from the fact that nearly any
part of a dataset might have an inherent temporal behavior [4]. This means
that the data is not stable or static, but it is dynamically changing over time.
These dynamics in the data bring into play comparison tasks, that is, data
analysts are typically interested in exploring if there is some kind of trend in
the data like a growing or decreasing behavior. In many cases, it is not a good
idea to just take the visualization candidate for the static data and put several
instances next to each other, one for each time step, to show the dynamics in the data. Such
a small multiples representation [64] is easy to implement but suffers from
visual scalability issues, and even more, the visual comparisons can become
tricky because the visual observer has to move from one snapshot to the next
one to spot the differences over time. However, still, many time steps can
be seen in one view, which is much different from an animation of the time-
dependent data [233]. In many scenarios, the visual metaphor for the dynamic
case is completely different from the one used for the static case of the same
data type.
In this section, we are going to explain various visualization techniques,
each falling into a certain category that is given by the data to be visu-
alized. Simple data types are discussed in Section 2.3.1 while graphs and
networks are the topic of Section 2.3.2, followed by a section on hierarchies
(Section 2.3.3). Visualizations for data that exist in a tabular form, that is,
multivariate or hypervariate data, are described in Section 2.3.4. Trajectories
and possible visualizations for them are explained in Section 2.3.5, while
textual data and its visual encodings are described in Section 2.3.6.
the right visual metaphor with the right visual variables. For example, if a
dataset consists of five quantitative values and we want to compare those
visually, we might choose a so-called pie chart that encodes each value
proportional to an angle that spans a certain circle sector with an inscribed
area. This visual variable is actually the problem here with this radial kind
of visual metaphor [83]. Another visual variable, for example, the length or
the position, is much better for visualizing quantitative values if the task is to
visually compare those values [73]. This aspect has been known for a long
time already, but still we can find pie charts in newspapers and magazines
for illustrating the results of an election, for example. In most cases, the
designer of such a pie chart typically starts adding the percentage values
as textual labels to each of the circle sectors. This additional information
should mitigate the challenging situation of judging the values by the areas of
circle sectors, but why is a visualization required at all if we start reading the
labels instead of looking at the visual variables that should help us rapidly
find patterns? Actually, the only task that pie charts might support is the
so-called part-to-whole relationship, that is, showing how much each value
adds to 100%. In scenarios in which we have, let’s say, more than five
quantities, we might also run into problems when judging the small values;
even more, if the pie chart is rotated, we might get problems judging how
much a value adds to the 100%, even if only a few values exist in
a dataset.
Looking at the example visualization in Figure 2.11, we can see that four
quantities are visually represented as circle sectors with different areas (and
angles). Additionally, the textual labels representing the percentage values
help to solve a comparison task, but what would happen if we left out those
labels? For the light red sectors, it might be easy to judge and compare their
sizes reliably, but the darker red sectors differ in size only a little bit (just
a 1.8% difference), making it perceptually hard to explore them for their size
difference with a pie chart. On the other hand, visualizing the same dataset
as a bar chart makes a big difference in the response time and accuracy for
the task of comparing the values for their sizes, as a comparative user study
would reveal. The difference comes from the visual variables in
use. In the pie chart, the visual variables angle and circle sector area are
used, which make it perceptually more difficult to solve this comparison
task than the visual variables used in the bar chart which are bar length or
even just the position of the tallest point of each bar. The phenomenon of
having various options for visualizing data can be found in nearly any visual
encoding of a dataset. Finding out which visualization is best for the task at
Figure 2.11 A pie chart is one way to visualize quantities, but a bar chart makes it easier to
compare the values due to the fact that it encodes the quantities in the bar lengths instead of
the circle sector angles [73].
hand can be done by conducting a user study, varying the visual variables as
independent variables and measuring the response time, accuracy, or even eye
movements [44] as dependent variables. However, this generates another kind
of dataset, in the case of eye movements, a spatiotemporal dataset for which
advanced visualization techniques and algorithmic concepts are required to
identify patterns [8].
Apart from quantitative data, we can also look at ordinal data for which
an order of the individual data elements is required. Such an order is typi-
cally visually encoded by the position on the display, for example, showing
the bigger ones on top and the smaller ones at the bottom of the display.
Categorical data describes the fact that data elements might belong to one
or several categories or classes. In a visual encoding, such categories are
typically shown by a grouping effect using the Gestalt law of proximity or
similarity, sometimes even visually drawing borders around visual elements,
for example, when applying a clustering algorithm that does not create clear
group structures but produces some group overlaps. Hence, another visual
variable has to be used to indicate the groups and subgroups.
Exercises
• Exercise 2.3.1.1: Imagine you have counted the number of cars and their
brands crossing a certain measurement station at a motorway. Design a
bar chart that shows the number of cars per brand.
• Exercise 2.3.1.2: If you have additional time information, for example,
hourly, daily, or weekly measurements, how would you design
a diagram that lets you compare trends of such numbers over
time?
(a) (b)
Figure 2.12 Two different ways to visually encode relational data while the edges of the
graph have directions and weights, also known as a network [106]: (a) A node-link diagram.
(b) An adjacency matrix.
a few [125]. Adjacency matrices, on the other hand, represent each vertex
twice, once in a row and once in a column of a matrix, while a weighted edge
is visually encoded as a color-coded cell at the intersection of the corresponding
matrix row and column (see Figure 2.12(b)). The benefit of such matrices is
that they scale to millions of vertices and edges since they do not produce link
crossings and can even be drawn in pixel size; however, reading paths from
such a representation is challenging, even impossible [106]. Identifying
clusters, however, can be done easily, provided a matrix reordering algorithm
has brought the matrix into a good structure beforehand [20], typically requiring advanced
algorithms with high runtime complexities. Node-link diagrams are good at
showing paths in a network but, on the other hand, they suffer from visual
clutter [202] if too many links are crossing each other. There are even further
visualization techniques for graphs, for example, adjacency lists [121], but
also combinations of node-link diagrams and adjacency matrices are imaginable
and might have their benefits for certain user tasks. Famous examples
of such hybrids are MatLink [118] or NodeTrix [119].
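How a weighted, directed edge list translates into the matrix cells of Figure 2.12(b) can be sketched in a few lines (a minimal, hypothetical example; the weight would then be mapped to a cell color in the visual encoding):

```python
def adjacency_matrix(vertices, weighted_edges):
    """Build a |V| x |V| matrix: cell [i][j] holds the weight of the
    directed edge i -> j, or 0 if that edge is absent."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for source, target, weight in weighted_edges:
        matrix[index[source]][index[target]] = weight
    return matrix

vertices = ["a", "b", "c"]
edges = [("a", "b", 2), ("b", "c", 5), ("c", "a", 1)]
for row in adjacency_matrix(vertices, edges):
    print(row)
# [0, 2, 0]
# [0, 0, 5]
# [1, 0, 0]
```

Reordering the rows and columns of this matrix, as the matrix reordering algorithms mentioned above do, changes only the vertex order, not the relations themselves.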
Exercises
• Exercise 2.3.2.1: Think about your own social network, for example,
Facebook, Twitter, or LinkedIn. How can you visually represent with
whom you and the others from your network are connected/connected
most?
Figure 2.13 A hierarchy can be visualized as a node-link tree with a root node, parent nodes,
child nodes, and the nodes on the deepest level being the leaf nodes.
Hierarchies [211, 212] can come in two general forms, either as some
kind of containment hierarchy or based on the principle of subordination. A
containment hierarchy consists, as the name expresses, of elements containing
other elements. The most popular example of this is probably a file system in
which files are contained in subdirectories that are again contained in other
subdirectories; eventually, everything is contained in a root directory. Also, geographic regions
might be considered as some kind of containment hierarchy. Regions are
contained in countries, countries in continents, and all continents belong to
the earth. On the other hand, if we look at family hierarchies [63] composed
of grandparents, parents, and children, we are confronted by the principle of
subordination which also exists in a company structure or a league system,
for example, the football leagues, which consist of several levels depending
on the professionalism and strength of the teams.
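A containment hierarchy such as a file system can be modeled as nested dictionaries, and a typical preprocessing step for hierarchy visualizations (for example, nested representations that size nodes by content) is to accumulate the leaf sizes bottom-up. A minimal, hypothetical sketch:

```python
def total_size(node):
    """Recursively sum the leaf sizes in a containment hierarchy:
    a leaf (file) is an int, an inner node (directory) is a dict."""
    if isinstance(node, int):
        return node
    return sum(total_size(child) for child in node.values())

file_system = {
    "docs": {"a.txt": 4, "b.txt": 6},
    "src": {"main.py": 10, "util": {"helper.py": 5}},
}
print(total_size(file_system))         # 25
print(total_size(file_system["src"]))  # 15
```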
(a) (b)
(c) (d)
Figure 2.14 At least four major visual metaphors for hierarchical data exist, coming in the
form of: (a) A node-link diagram. (b) An indented plot. (c) A stacking approach. (d) A nested
representation (in which only the leaf nodes are shown).
Exercises
• Exercise 2.3.3.1: Create a family tree of all the people from your own
family like father, mother, grandfathers, grandmothers, sisters, brothers,
and so on.
• Exercise 2.3.3.2: Design a good visualization for the hierarchical file
system on your computer. How do you show the file sizes and the file
type in the hierarchy at the same time?
(a) (b)
Figure 2.15 Visualizations for multivariate data: (a) A scatter plot matrix (SPLOM). (b) A
parallel coordinate plot (PCP).
Visualizing such data is challenging, but there are some prominent visual
encodings like histograms [185] (for univariate data), scatter plots [157]
(for bivariate data), scatter plot matrices (SPLOMs) [78], parallel coordinate
plots [130], or glyph-based representations [138] (for tri- and multivariate
data) (see Figure 2.15). Scatter plot matrices are based on simple scatter
plots and allow comparisons between all pairs of attributes as long as there
is enough display space to show all the individual scatter plots of the scatter
plot matrix. Parallel coordinate plots use parallel vertical axes to show the
attribute values and polylines in-between. Those plots only show subsequent
axis comparisons and typically suffer from visual clutter [202] caused by
line crossings. Finally, glyph-based representations only show one glyph per
case and make comparisons impossible; hence, correlation tasks are more
difficult to solve than in scatter plot matrices or parallel coordinate plots in
which the individual lines are integrated into the same diagram, and this is
not the case in classical glyph-based visualizations like Chernoff faces [72],
leaf glyphs [99], or software feathers [17].
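Before a parallel coordinate plot can draw its polylines, every attribute must be normalized onto the same vertical axis range. A minimal, hypothetical sketch of this per-axis min-max normalization (function names are illustrative):

```python
def normalize_axes(rows):
    """Min-max-normalize each column of a multivariate table to [0, 1],
    giving the vertical polyline positions on the parallel axes.
    Constant columns are mapped to the axis midpoint 0.5."""
    columns = list(zip(*rows))
    ranges = [(min(c), max(c)) for c in columns]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.5
            for v, (lo, hi) in zip(row, ranges)
        ]
        for row in rows
    ]

rows = [[1.0, 200.0], [2.0, 100.0], [3.0, 150.0]]
print(normalize_axes(rows))  # [[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]]
```

Each normalized row is one polyline; the first row, for example, starts at the bottom of the first axis and ends at the top of the second.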
Exercises
• Exercise 2.3.4.1: Compare typical visualizations for multivariate data
like parallel coordinates, scatter plot matrices, and glyph-based repre-
sentations like Chernoff faces, software feathers, or star plots.
• Exercise 2.3.4.2: Imagine you have an Excel table with multivariate data
that is changing from day to day. Develop a visualization technique with
which we can visually explore changes and trends in the correlation
patterns.
Figure 2.16 A static stimulus overplotted with a scanpath, that is, a sequence of fixation
points. The sizes of the circles typically visually encode the fixation duration, that is, how
long the eye fixated on a certain point in the visual stimulus.
One prominent application field for trajectory data comes from the
research in eye tracking [87, 123]. An eye tracker [256] is a device that
records fixations of people’s visual attention and saccades between two con-
secutive fixations [56], that is, rapid eye movements. Each fixation can have
a certain fixation duration while the saccades in-between more or less rapidly
move between those fixations without acquiring any meaningful information
from the visual stimulus (see Figure 2.16). Also, bird or general animal
movements [146] are of particular interest for trajectory visualizations since
birds might travel far distances from one continent to another one due to
changing seasons, weather conditions, and the modifications in food offered
by mother nature. Biologists are interested in the birds’ traveling strategy to
understand how they generally behave, for example, whether they are exposed
to anomalies due to changes in their natural environments. The bird behavior
might give insights into effects that are hardly recognizable without such
trajectory data. There are various application examples in which trajectories
play a crucial role; however, visually exploring such spatiotemporal data
over space, time, and the objects, people, or animals involved is a really
challenging task.
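The visual mapping behind Figure 2.16 can be sketched in a few lines (a hypothetical example; names are illustrative): each fixation becomes a circle whose area, rather than its radius, is proportional to the fixation duration, a common choice because area is what viewers perceptually compare:

```python
import math

def scanpath_circles(fixations, scale=1.0):
    """Map fixations (x, y, duration_ms) to circles (x, y, radius) whose
    *area* is proportional to the fixation duration; consecutive circle
    centers would then be connected by straight saccade lines."""
    return [
        (x, y, scale * math.sqrt(duration / math.pi))
        for x, y, duration in fixations
    ]

fixations = [(10, 20, 200), (40, 25, 800), (70, 60, 200)]
circles = scanpath_circles(fixations)
# The second fixation lasted 4x as long, so its radius is 2x larger:
print(circles[1][2] / circles[0][2])  # ratio of 2.0 (area scales with duration)
```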
Exercises
• Exercise 2.3.5.1: Take into account your own moving strategy over one
day from starting the day until going to bed in the evening. Design a
trajectory visualization of such a dataset and add your own data to a
geographic map.
• Exercise 2.3.5.2: Why is it difficult to visually compare thousands of
trajectories over space and time? Can you imagine algorithmic solutions
for this problem that support the visualization?
text representations, with a word or tag cloud [51, 133] as one way to show
frequent words in a text corpus (see Figure 2.17). More complex ones use
pixel-based representations [142] to show the distribution of special text
fragments in a larger text corpus. In the source code of a larger software
project, for example, one might be interested in the occurrences of special
programming-language-specific keywords, indicated, for instance, by color
coding as in the SeeSoft tool [91] or in a triangular shape for code
similarities [58].
Figure 2.17 A text corpus can also be split into words and their occurrence frequencies,
while common prefixes can be used to reduce the display space needed for showing a word
cloud, known as a prefix word cloud [51].
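The word frequencies underlying such a word cloud can be computed in a few lines (a minimal, hypothetical sketch); the counts would then be mapped to font sizes:

```python
import re
from collections import Counter

def word_frequencies(text, top=5):
    """Count case-insensitive word occurrences in a text corpus; the most
    frequent words would get the largest fonts in a word cloud."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(words).most_common(top)

corpus = "the quick fox jumps over the lazy dog, and the dog sleeps"
print(word_frequencies(corpus, top=3))  # [('the', 3), ('dog', 2), ...]
```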
in two or more texts. However, the pure algorithmic solution is only half
as valuable if it is not supported by extra visual encodings applied to the
alignments. For this reason, color coding can be of great support to quickly
recognize similar subsequences and also anomalies, that is, subsequences that
are not fitting in the text structure.
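A simple algorithmic basis for finding such similar subsequences is sequence matching, here sketched with Python's standard-library difflib (a hypothetical example; the matching blocks it reports are what the color coding would then highlight in both text views):

```python
from difflib import SequenceMatcher

def shared_passages(text_a, text_b, min_words=2):
    """Find word sequences that occur in both texts; each reported match
    could be color-coded identically in both text views."""
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        " ".join(a[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]

t1 = "the quick brown fox jumps over the lazy dog"
t2 = "a quick brown cat jumps over the fence"
print(shared_passages(t1, t2))  # ['quick brown', 'jumps over the']
```

Words outside every matching block are the anomalies mentioned above, subsequences that do not fit the shared text structure.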
In general, all of the data we described in this section can have a time-
varying, that is, dynamic nature. Visualizing the dynamics in the data is
of particular interest for researchers, for example, to set the current state
in context to the past, but also to learn from the past to predict the future.
The last idea could involve deep learning and neural networks to solve such
classification or prediction tasks reliably and efficiently. However, visual
depictions of the time-varying data [4] are still important, even if algorithms
have to process the data to generate structures and insights in it, maybe on
several temporal granularities.
Exercises
• Exercise 2.3.6.1: Imagine you have two different texts of a certain
length. How could you design a visualization that shows similar text
passages?
• Exercise 2.3.6.2: Could a matrix visualization be useful to compare two
or more text fragments? How can the matrix be extended to compare
more than two text fragments?
including the user interface with buttons, sliders, text fields, and so on, but
also the visual design that is required to create powerful and perceptually
useful visualizations for the tasks at hand. Those user interface components
as well as the visualizations have to be arranged in a user-friendly and well-
designed layout, in the best case a dynamic one, allowing the users to adapt
the layout to their demands. Moreover, interaction techniques have to be
taken into account that connect and link the individual components. This
means that not only must the interface components be connected among
themselves in a meaningful way, but also the interface components with the
visualization techniques, and the visualizations with each other in case they
are shown as multiple coordinated views [200] to provide insights on the
visually encoded data from several perspectives.
Figure 2.18 A hand-drawn graphical user interface composed of several views and perspec-
tives on a dataset (permission to use this figure given by Sarah Clavadetscher).
Creating such a user interface with all of its ingredients at the right
places requires understanding some rules about prototyping, meaning
either drawing the dashboard by hand, as some kind of sketch or mockup,
or, if the designer is familiar with external tools, designing the dashboard
in a computer-supported style. However, drawing it by hand typically
means more flexibility for the designer than using a computer program
2.4 Design and Prototyping 53
hence interpret the data in its raw form and try to interpret the data gaps,
asking questions about why the data elements are missing and, in the best
case, even allowing the gaps caused by missing values to be filled. A picture
of the data is of great support for solving those tasks since a picture can say
more than a thousand words [88], in case it is designed in a proper and
accurate way. Hence, it is good advice to use graphics whenever possible,
be it for data
exploration or for presenting and disseminating the obtained results. Pictures
can even visually encode many aspects about data, like numerical values from
several attributes, in a very small display region, making it a visually scalable
approach.
Figure 2.19 A diagram that includes axis labels, scales, guiding lines, and a legend.
Even if we have created a good diagram to show the data, there are very
important ingredients that one should never forget. For example, adding
labels to the axes, if there are any, is important to set the data into a general
context. Such labels can express metadata like physical units, or numerical
values for the scale in use, or even several scales in use. It makes a difference
whether we inspect the diagram in meters or in kilometers. Moreover, the
scale should include guiding lines
that do not occlude or clutter [202] the rest of the diagram. Such guiding lines
help the eye to solve comparison tasks, for example, when reading several
values in a diagram (see Figure 2.19 for a simple diagram following the
visual design rules). In general, diagrams need words to make them even
more understandable, but if too many words are used, this might again be
counterproductive. Such words or labels should be distinguishable and they
should be readable, meaning a good size and font style should be chosen. If
too many visual variables are integrated into a diagram, so that their meaning
is unclear when just looking at it, legends should be placed next to the diagram
to explain the data-to-visualization mapping, for example, which values and
categories are in use and which sizes, lengths, or colors encode the data
values. Color is one of the most frequently applied visual variables in a
visualization, but picking the wrong color scale can lead to misinterpretations
when trying to read the data from a visual depiction; for example, the often
cited rainbow colormap [32] can cause problems, as can colors that are
problematic for people who have color vision deficiencies or who are color
blind [174]. Color perception [260] is a research field on its own. Similar
rules hold for scale granularities, meaning values for minimum and maximum
should be derivable from the legend. All in all, storytelling is one of the most
important issues when designing a good visualization. A diagram should be
readable just like a good book, following a clear narrative thread, chapter by
chapter, with a final aha effect.
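As a minimal sketch of these ingredients, the following Python fragment (assuming the matplotlib library and using made-up example values) produces a diagram with axis labels including physical units, unobtrusive guiding lines, and a legend explaining the data-to-visualization mapping.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in an interactive session
import matplotlib.pyplot as plt

distance_km = [0, 1, 2, 3, 4, 5]
runner_a = [0, 4, 9, 13, 18, 23]   # elapsed minutes (illustrative values)
runner_b = [0, 5, 10, 16, 21, 27]

fig, ax = plt.subplots()
ax.plot(distance_km, runner_a, marker="o", label="Runner A")
ax.plot(distance_km, runner_b, marker="s", label="Runner B")
ax.set_xlabel("Distance (km)")           # axis labels with physical units
ax.set_ylabel("Elapsed time (min)")
ax.grid(True, linewidth=0.5, alpha=0.5)  # unobtrusive guiding lines
ax.legend(title="Legend")                # explains the data-to-visualization mapping
fig.savefig("diagram.png")
```

Swapping the unit in the axis label (say, meters instead of kilometers) without rescaling the data would immediately mislead the reader, which is exactly why labels and scales belong together.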
Exercises
• Exercise 2.4.1.1: Create a scatterplot for bivariate data with labels, axis
descriptions, and scales, together with guiding lines for the scales.
• Exercise 2.4.1.2: Can you create a diagram that includes more than one
scale on one axis but that is still usable and readable?
Figure 2.20 Three of the major design problems come in the form of chart junk, the lie
factor, and visual clutter.
understand the data or if the data values do not allow to encode them in a
visual variable in a proportional data-to-visualization manner, maybe due to
the lack of display space.
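The lie factor can even be quantified: Tufte defines it as the size of the effect shown in the graphic divided by the size of the effect in the data, so a truthful diagram has a lie factor of about 1. The following sketch, with invented numbers, computes this ratio for a zero-baseline bar chart and for one whose value axis is truncated at 95.

```python
def lie_factor(values, bar_heights):
    """Tufte's lie factor: relative change shown in the graphic divided by
    the relative change present in the data (1.0 means truthful)."""
    data_effect = (values[-1] - values[0]) / values[0]
    graphic_effect = (bar_heights[-1] - bar_heights[0]) / bar_heights[0]
    return graphic_effect / data_effect

values = [100, 110]        # the data changes by 10%

# Truthful bars: heights proportional to the values (zero baseline).
honest = [100, 110]
# Distorted bars: the y-axis starts at 95, so drawn heights are value - 95.
distorted = [5, 15]

print(lie_factor(values, honest))     # 1.0
print(lie_factor(values, distorted))  # 20.0
```

The truncated axis turns a modest 10% change in the data into a 200% change in bar height, a lie factor of 20.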
Exercises
• Exercise 2.4.2.1: Find a visualization or a diagram on the web that
suffers from visual clutter, chart junk, and/or the lie factor. Describe in
which form the visual design problems can be found in the visualization
or diagram.
• Exercise 2.4.2.2: Given a sequence of natural numbers v1, ..., vn.
Design a diagram that visually encodes such values with and without
an explicit lie factor.
Figure 2.22 A hand-drawn mockup of a graphical user interface (permission to use this
figure given by Sarah Last).
help preserving the mental map when finding user interface components
easily and rapidly, to reduce the cognitive efforts in case we have to start
searching for the important things all the time. The eight golden rules of
user interface design [217] describe a good way to follow the most important
principles, just like a checklist, when thinking about a user interface, even
as early as in the phase of creating a sketch or a mockup. The challenging
issue with a user interface for visualization comes from the fact that it is
not only dependent on those eight rules, but even more on the design and
interactive functionality of the integrated visualizations. It may be noted that
there are various design rules when it comes to designing user interfaces,
in particular, graphical user interfaces (GUIs) but many of those rules are
also dependent on the application scenario. For example, designing a user
interface for medical applications [75, 76] is much different from one for
fraud or malware detection [67], but there is some kind of common design
rule set that holds for any kind of user interface. Finally, a user evaluation with
or without eye tracking [8, 152, 153] is required to find hints about usefulness,
user-friendliness, and performances in terms of response times, error rates, or
even eye movements [26] with additional physiological measures [25]. Also,
the qualitative feedback from the users in form of verbal interviews, think-
aloud, or talk-aloud, even gestures, can be of great support for the designers,
in particular for visual analytics tools [27].
One of the first criteria is to keep the design consistent. This holds for
the color coding, for the shapes, for the presentation speed when using
animations [233], actually for any kind of visual component that includes
some kind of visual feature. Also certain actions and interactions should be
consistent for similar scenarios, for example selecting a visual element should
always work in the same way, no matter which kind of visual component
the selection interaction actually is applied on. Consistency is important
since it reduces the cognitive efforts [231], that is, users do not have to
rethink again and again for the same or similar processes, meaning the mental
map is somehow preserved [9, 10] during a visual exploration strategy. An
identical terminology should also be chosen for labels or textual output that
is produced in the interface, but not too much text should be shown to
avoid an information overload. Menus, descriptions, error messages, and the
like should follow the same rules, for example font sizes, font faces, font
types, and the like. Also the layout in the components and subcomponents
should be the same, even the borders or distances of the components to
each other, exploiting the Gestalt laws of proximity and similarity [147]. The
universal usability is also of interest, for example, a user interface can be used
60 Creating Powerful Dashboards
Exercises
• Exercise 2.4.3.1: In a user interface we oftentimes find so-called
progress bars to show us how long we must wait until a process is
Exercises
• Exercise 2.4.4.1: Describe and discuss the benefits and drawbacks of
designing a user interface by hand, including visualization techniques.
• Exercise 2.4.4.2: Draw a user interface for visually depicting social
networks consisting of people who are related to some extent.
2.5 Interaction
Without interaction, a visualization would just be a static picture, which can
be powerful in its static form as well, but awakening it to life by allowing
interactions [258] provides many more opportunities to dig deeper into
the visualized data, to navigate in it, to modify views, to filter the data,
and to inspect the data from several, even linked, perspectives [200, 222].
Interactions depend not only on the visualization techniques in use but also
on the displays, on the experience levels of the users, and on whether the
users suffer from visual, perceptual, or physical disabilities. For
example, interacting on a small-scale display when using a smartphone is
much different than interacting on a medium-scale computer monitor while a
large-scale powerwall display [210] even allows walking around during inter-
actions. There is no best display for interactions, each of them has its benefits
and drawbacks and requires suitable technologies to make the implemented
interaction techniques run smoothly, for the task at hand. Not all interaction
modalities like gaze, touch, mouse, keyboard, gestures, and the like, can be
applied to any kind of display, for example, on a small-scale smartphone
display it is more likely to interact by touch than by using a computer mouse.
Moreover, on a large-scale powerwall display, it is more beneficial to allow
gesture, gaze, or body motion interactions than to rely on touching the powerwall
with one’s fingers. Touch means standing very close to the display which, on
the other hand, would mean walking around a lot in front of the powerwall,
with a high chance to miss important details due to a lack of overview.
Even more advanced technologies like virtual, augmented, or mixed
reality can bring new challenges for interaction techniques. In particular, the
field of immersive analytics [173] demands a combination of interaction
modalities that must also be applicable on many linked displays, maybe
with various users in front of those displays [80], with the goal to visually and
algorithmically allow explorations and analyses of data from a multitude of
application domains. Not only are the ingredients directly related to interaction,
like the displays, modalities, the users, or the linking of the user interface
components, crucial; the data processing and transformations running in the
background also build a huge and crucial part of a visualization
tool. If the data structures and algorithms are not properly chosen and imple-
mented, interactions cannot run smoothly and quickly, hence the interactive
responsiveness would suffer from the badly designed algorithmic approaches
when it comes to data handling like storing, accessing, and manipulating it,
either offline or online as a real-time data visualization. We argue that creating
a visualization tool or a dashboard for data exploration and data analysis
is some kind of interdisciplinary field that requires expert knowledge in
many related disciplines like visualization, interaction, user interface design,
perception, but also in data structures and algorithms, programming, software
engineering, and many more. Making design mistakes in any of such related
disciplines can cause performance issues that might make a visualization tool
unusable.
In this section we describe major interaction categories that we can find
in nearly any data visualization tool (Section 2.5.1). These interactions can be
combined in various ways, typically depending on the user tasks. Moreover,
Section 2.5.2 illustrates which kind of modalities exist and in which scenario
they might be the best options to integrate with a visualization tool. The
most important ones might be given by gaze, touch, mouse, keyboard, or
gesture. Also the display on which a visualization tool should be shown
plays a crucial role (Section 2.5.3), not only for the visualizations alone
but also for the interaction techniques and interaction modalities. Finally,
data can be explored best if it is shown from several perspectives, bringing
multiple coordinated views into play. Those are described in more detail in
Section 2.5.4.
Figure 2.23 On a visualization depicting value changes over time, we can select a certain
point, for example, to get detail information or to further use the selected data point in the
exploration process.
around, that is, to change the view, for example, when scrolling or panning in
a visualization that does not fit on the display. This interaction category helps
when an overview cannot be provided in one view. Reconfiguring describes
the effect of changing a visualization to make it usable for a certain task
that could not be solved without the change. For example, adjusting visual
elements to a common scale to make them comparable would be a useful
feature, maybe using a baseline adjustment technique. Encoding allows
modifying the visual variables to get a different perspective on the
data, while abstracting means showing more or less detail, as in zooming
techniques. In cases in which only parts of the data are shown, maybe only
those parts that follow a certain user-defined condition, we talk about filtering.
Finally, connecting describes the way we link views in a visualization tool,
with the goal to inspect the data from several perspectives at the same time,
that is, simultaneously, for example, in a multiple coordinated view [200]
described in Section 2.5.4.
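Among these categories, filtering is perhaps the easiest to sketch in code: a user-defined condition (a predicate) decides which data elements remain visible. The dataset and threshold below are invented for illustration.

```python
# A hypothetical dataset of measurement points (year, value).
points = [(2019, 3.2), (2020, 7.8), (2021, 5.1), (2022, 9.4), (2023, 2.6)]

def filter_points(data, predicate):
    """Filtering: only data elements satisfying a user-defined condition
    are kept for display; the rest is hidden, not deleted."""
    return [p for p in data if predicate(p)]

# The user drags a slider to "value >= 5"; the views then re-render.
visible = filter_points(points, lambda p: p[1] >= 5)
print(visible)  # [(2020, 7.8), (2021, 5.1), (2022, 9.4)]
```

In an interactive tool, the predicate would be rebuilt each time the slider moves, and all connected views would redraw with the filtered subset.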
Exercises
• Exercise 2.5.1.1: How would you design an interaction technique for
selecting one point, several points, connected regions, or points in a
previously selected region?
• Exercise 2.5.1.2: How would you design an interaction technique to
select a pixel, a group of pixels, a line, or a group of (possibly
intersecting) lines?
Figure 2.24 Interacting by using a computer mouse is one of the standard interaction
modalities for visualization tools displayed on a classical computer monitor.
since it does not cover as much information as a human finger would do.
On the negative side, we can identify the problem of having a mouse not
directly connected to the visualization system; that is, we have to understand
the properties of a mouse on the desk first before we can apply it to the
computer monitor. Touching with the finger is much more natural, but it also
brings new challenges into play: apart from the covering effect, it creates
some indirect body-to-body touch effect between human users, for example,
in cases in which many people use the same service like a ticket machine
placed in a train station. This can cause negative issues, in particular during
the COVID-19 pandemic, when trying to avoid as many human-to-human
contacts as possible [240]. Surely, we cannot ask people to bring their own computer
mouse but we could integrate other interaction modalities like gaze-based
interaction or speech recognition, however speech might be a problem in a
noisy background like in the scenario of a ticket machine in a crowded train
station and gaze causes problems related to the Midas touch problem and
technological issues related to fixation accuracy.
Exercises
• Exercise 2.5.2.1: What are typical scenarios in a visualization tool that
might be good candidates for using speech recognition as an interaction
modality?
• Exercise 2.5.2.2: Imagine you have a visualization tool in which gaze-
based interaction is integrated. What could be a challenging problem
here? Hint: Midas Touch problem.
2.5.3 Displays
Each visualization tool must be presented somewhere, meaning a certain kind
of display [165] is required to let the users see, for example, where they can
apply an interaction and which impact such an interaction will have on the
diagrams, but also on the visual components of the user interface. There are
various ways to display the visualizations that have been created, typically
depending on the tasks to be solved and which visualizations are finally
integrated into a visualization tool. For example, if many users are required,
we should create a web-based visualization tool that runs on a mobile phone;
since most of us own a mobile phone, this makes it possible to recruit many
people, even allowing crowd-sourced user experiments [197]. However, the
display itself is much smaller than the one of a standard computer, hence
Figure 2.25 Showing a geographic map on a large-scale display while the observer is
equipped with an eye tracking device for either exploring where he is paying visual attention
or for using the eye tracker as gaze-assisted interaction (figure provided by Lars Lischke).
the visualization tool itself must be designed in a different way than the one
designed for the standard computer monitor. If many users have to explore
a dataset visually at the same time, it might be a good idea to use a large-
scale display [166], that is, a powerwall [210], allowing many people to
collaboratively work on similar data analysis and visualization problems.
A large display can also be useful for one observer, in cases in which an
overview has to be given with many small integrated details (see Figure 2.25
for a geographic map on a large-scale display). The biggest issue here might
be to merge the different findings of all the collaborators to find a common
result, maybe in the form of visual patterns that graphically model data patterns.
The display plays a crucial role during the design but also the implementation
phase. Large-, medium-, and small-scale displays can make a difference not
only for the visual and interface design but even more for the interaction
design. Not every interaction technique that is applicable on a computer
monitor can be applied in the same way on a mobile phone or on a powerwall.
Whether or not an interaction modality makes sense and is useful on a
certain type of display depends on several aspects, also on the environment
like noise in the background making speech more difficult to be applied,
but there are some scenarios in which it is clear that a certain setup is not
meaningful, for example using a computer mouse on a powerwall display
(see Table 2.1 for a general overview about meaningfulness).
Table 2.1 Displays for which standard interaction modalities make sense or not: (++) very
meaningful, (+) meaningful, (o) not clear, (-) not meaningful, and (–) not meaningful at all.
Display types and interaction modalities
              Small-scale      Medium-scale        Large-scale
              Mobile phone     Computer monitor    Powerwall
Mouse         –                ++                  –
Keyboard      +                ++                  –
Joystick      –                +                   -
Touch         ++               +                   +
Gesture       -                o                   ++
Speech        +                +                   ++
Gaze          +                +                   +
Exercises
• Exercise 2.5.3.1: Discuss the differences in usefulness when integrating
interaction modalities like touch, gaze, mouse, keyboard, joystick, or
gesture into different types of displays like small-scale displays (mobile
phones), medium-scale displays (computer monitors), and large-scale
displays (powerwalls).
• Exercise 2.5.3.2: Which kinds of displays are most useful for visualiza-
tion tools, that is, dashboards? Discuss benefits and drawbacks.
A big challenge for multiple coordinated views comes from the fact that
the data handling has to keep all of the views up-to-date, that is, a certain
control mechanism has to run in the background that updates all of the views
when just one gets changed. Such a model-view-controller architecture can
be quite useful in such situations. The controller keeps track of the changes
and sends updates to keep the visualization tool consistent in all of the
perspectives and views. This can also include user interface components, not
just the views in the visualization tool. For example, updating a visualization
could also modify or update the visual appearance of buttons or sliders: if
the range of values has changed based on a user interaction, it makes sense
to update the range sliders as well, so that users cannot select value ranges
that no longer exist. Views should not be changed abruptly since this would destroy the
users’ mental maps [150], hence in case changes have to be made to one or
several views, those should be done smoothly, for example including some
kind of smooth animation [242] to allow users to keep track of the changes;
a typical scenario is one in which rearrangements of visual elements have to
be made, maybe caused by a new ordering or alignment strategy.
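A minimal sketch of such a model-view-controller setup in Python might look as follows; the class and method names are our own invention, not a specific framework API. The controller broadcasts every model change to all registered views so that they stay consistent.

```python
class Controller:
    """Keeps all registered views consistent: any change to the shared
    model is broadcast so every view (and UI component) can update."""
    def __init__(self):
        self.model = {}
        self.views = []

    def register(self, view):
        self.views.append(view)

    def update_model(self, key, value):
        self.model[key] = value
        for view in self.views:          # notify every coordinated view
            view.refresh(self.model)

class View:
    def __init__(self, name):
        self.name = name
        self.state = None

    def refresh(self, model):
        self.state = dict(model)         # re-render from the shared model

controller = Controller()
scatter, histogram = View("scatterplot"), View("histogram")
controller.register(scatter)
controller.register(histogram)

# A selection in one view changes the model; both views stay in sync.
controller.update_model("selected_range", (0, 10))
print(scatter.state == histogram.state)  # True
```

Range sliders or buttons could be registered in exactly the same way as views, which addresses the consistency problem for user interface components described above.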
Exercises
• Exercise 2.5.4.1: Which role does the data handling running in the
background play when interacting in multiple coordinated views?
• Exercise 2.5.4.2: How many views can be integrated in a visualization
tool at the same time? Discuss.
3
Python, Dash, Plotly, and More
There are various ways to implement a visualization tool, even in the form of
an interactive dashboard [255]. The focus of this book is on the program-
ming language Python [160], combined with Dash and Plotly which will
be described in detail in the following sections. Python is a popular high-
level programming language with a specific focus on code readability by
making use of mandatory indentation rules. It supports a certain number of
programming paradigms, typically the functional and object-oriented ones.
One of the great benefits of designing and implementing visualization tools
as dashboards in Python is the fact that the created tool can be made publicly
available in an easy way by deploying it on a server, hence making it
accessible for a number of people all over the world [201]. This again requires
a user-friendly design solution that takes into account the various differences
in spoken and written languages, cultures, signage, and the like, which is
also reflected in the eight golden rules for designing user interfaces (see
Section 2.4.3) [217]. Moreover, other design rules, focusing on the visual
design (see Section 2.4.1) [232], are also crucial ingredients when building
such web-based solutions for data visualization tools.
Including the important aspects from the field of visualization, visual
analytics, interaction, algorithmics, and the many related disciplines of this
interdisciplinary topic [144], we are now prepared to learn about the concepts
required to actually start building a tool [172], once the design phase has
been completed. This does not mean that the design phase is really over.
In many scenarios, we still learn about the usefulness of a certain feature
when it is really applied in the running tool or even when we think about
it again in a discussion, and hence, there should always be an option to
redesign what we have created before (at least partially), until we and our end
users are confident with the results [214]. This actually brings into play user
evaluation [152], that is, the users can either be on board during the design
phase and even implementation phase or they can test the final product, that is,
after it has been completed, based on the design criteria and requirements that
we got so far. This again means that starting with an original sketch, mockup,
or prototype (Section 2.4.4), we are able to modify this prototype based on
user interventions until all involved parties are confident with the result. To
reach this goal of a running tool, we actually provide the major ingredients in
this chapter before we discuss code examples in the programming language
Python in its own chapter (see Chapter 4).
First of all, we introduce the necessary technologies, programming lan-
guages, and libraries (Section 3.1) like Python, Dash, and Plotly, as well as
further ingredients and concepts, before we move to important installations
and options to actually get started to efficiently and effectively develop and
implement what we have designed (Section 3.2). Here, we look into differ-
ent modes like the interactive one, including the Jupyter Notebook mode,
and the integrated development environment (IDE) mode. The interplay
between all of the formerly described implementation concepts is illustrated
in Section 3.3 with the subconcepts of data reading and parsing, data trans-
formation, Dash core components, Dash HTML components, cascading style
sheets (CSS), Plotly, and callbacks that more or less build the interface
between the visualization techniques and the user interface, that is, the
dashboard. The web-based solution is described in Section 3.4, with several
options to get it running online.
3.1.1 Python
The programming language Python has already existed for quite a while; it
was developed in the late 1980s by Guido van Rossum, and in 1991 it was
released as version 0.9.0 [236]. Python 2.0 and 3.0 followed in the years
2000 and 2008, respectively, including further improvements and extensions.
At the time of writing this book, Python 3.10.4 and 3.9.12 were available.
Python is considered a high-level programming language that can be applied
in various application domains, with data science [223] as one of the major
ones these days. Popular features of the language include the use of explicit
indentation to make the code more readable and maintainable. For example,
Python avoids many opening and closing braces due to its indented
code structure. The type system of Python is described as dynamic,
meaning data types do not have to be explicitly specified as in other
programming languages like Java or Pascal. Moreover, several programming
paradigms are integrated into Python, with the functional and object-oriented
styles being the most obvious ones. Also, the procedural, aspect-oriented,
or logic programming paradigms can be found here and there. Libraries
can be imported to extend the degree of functionality a program can under-
stand, ranging from classical algorithmic data processing libraries to graphics
libraries and many more. Given the fact that Python is frequently used
by data scientists, it is typically considered one of the most well-
known languages with a large programmer community, eager to help when
programming issues occur that cannot be solved by one’s own knowledge
and experience. However, Python code is typically quite clear and reduced
to a minimum (see Listing 3.1 for an example consisting of a few lines of
Python code).
1 colors = ["red", "blue", "green", "yellow"]
2 cars = ["BMW", "VW", "Mercedes"]
3
4 for x in colors:
5     for y in cars:
6         print(x, y)
Listing 3.1 A code example for a running Python program printing 12 pairs of colors and
car brands in code line 6.
Exercises
• Exercise 3.1.1.1: Find other programming languages in which a dash-
board or a visualization tool can be implemented. What are typical
libraries required to create a visualization tool?
• Exercise 3.1.1.2: What are typical negative issues when using Python for
creating a visualization tool?
3.1.2 Dash
Dash does not cost anything, is available as open source, and was created
by the company Plotly as a framework to build web applications, typically
with a focus on data analysis, visualization, and visual analytics tools. The
major programming language supported in Dash is Python, but other ones
like R or Julia can be used as well. Dash is actually built on React,
a well-known web framework in the programming language JavaScript.
Moreover, it is also based on Flask, a well-known web framework for
Python. Before working with Dash, we have to make some installations,
for example, when using conda (see Listings 3.2 and 3.3). In case Anaconda
is not already installed, we refer to Section 3.2.
pip install dash
Listing 3.2 One way to install Dash on your computer.
conda install dash
Listing 3.3 Another way to install Dash is by using conda. Make sure that conda is already
installed.
3.1 General Background Information 77
A very simple code example will generate our first application (line 4 in
Listing 3.4) after having imported Dash and the Dash HTML components
that are required in the layout of our webpage, given by app.layout in line 6.
At the moment, this just contains a headline in H6 HTML size saying "Hello
World in Dash", but in the future, this is the place in which we can integrate
many more HTML features, just like in the case when structuring a webpage
like our own homepage. Finally, in line 9 of Listing 3.4, we will start the
server.
1 import dash
2 from dash import html
3
4 app = dash.Dash(__name__)
5
6 app.layout = html.H6(children="Hello World in Dash")
7
8 if __name__ == "__main__":
9     app.run_server(debug=True)
Listing 3.4 An application showing how to create a simple layout and to start the server with
further required ingredients such as needed imports at the beginning.
Exercises
• Exercise 3.1.2.1: Modify the code example in Listing 3.4 to show a much
larger text saying, "Hello, now the headline is bigger."
• Exercise 3.1.2.2: Why is HTML alone not the best choice for creating a
user-friendly and aesthetically appealing dashboard?
import plotly.express as px

df = px.data.tips()

fig = px.bar(df, x="smoker", y="total_bill", color="tip")
fig.show()
Listing 3.5 A simple example of code for creating a bar chart in Plotly from the tips dataset
with extra categories like "smoker" versus "nonsmoker" and color coding based on tips.
Plotly Express comes with a lot of benefits, but on the negative side
it also has to deal with problematic issues. On the beneficial side, we can
mention that each plot can be built with just one or a few lines of code;
only parameters, attributes, and flags have to be adjusted to obtain the desired
functionality and the visual variables of interest, like a specific color coding,
certain shapes or sizes, and the like (see Section 2.2.1). Moreover, the
generated plots are already equipped with interaction techniques ranging
from selection and zooming to filtering and details-on-demand (see Section 2.5.1).
Even animated diagrams [84, 233] can be created for a certain variable in use,
for example, a time attribute or any other attribute that is given with different
values or value categories. The Plotly Express world would be wonderful
if it did not have some major flaws that might make someone think about using
other options for creating interactive graphics in Python or even in a totally
different programming language. One big negative issue comes from the fact
Figure 3.1 The result when executing the code in Listing 3.5: a color-coded bar chart
distinguishing between the two categories smokers and nonsmokers, as well as different tips
for total bills.
that Plotly Express does not support all possible features that one desires.
Although color coding works, for example, it can be a disaster if someone
wants to assign exactly the same colors to pre-defined categories each time
a plot is created again and again. Also for the zooming feature there is no
way to solve the focus-and-context or overview-and-detail problems [195] as
other advanced visualizations typically do. Plotly Express is mostly used by data scientists who need quick visual support for the data science problems at hand; hence, it is more focused on exploratory data analysis and misses many features that visualization or visual analytics experts would require for their data analyses.
Exercises
• Exercise 3.1.3.1: Modify the code in Listing 3.5 to show male versus
female instead of smokers versus nonsmokers. The attribute for this is
called "sex" instead of "smoker." Visually explore the created diagram.
• Exercise 3.1.3.2: Modify the code in Listing 3.5 to show the tips on the
y-axis and the total bill in the color coding of the diagram. Compare the
new plot with the one in Figure 3.1.
Figure 3.2 Several graphics libraries for creating diagrams in Python: (a) Plotly Express. (b) Matplotlib. (c) Seaborn. (d) Bokeh.
while Plotly Express came on the market in 2019. Matplotlib is one of the most used visualization libraries in Python, not only because it is quite old compared to the others, but also because it supports interactions in a multitude of simple diagrams like histograms, scatterplots, pie charts, and many more (see Section 2.3). Seaborn builds on the Python data-handling structures pandas and numpy. Moreover, it also supports simpler charts, for example for statistical approaches and results. ggplot is based on the R package ggplot2; it is also useful for simple plots while at the same time allowing the integration of visual variables like color, size, and shape; however, its interactions are quite limited.
The specific application domain of geography is supported by geoplotlib that
offers many ways to depict geographical data in maps. Bokeh is also popular,
but its charts and plots are rendered by making use of HTML and JavaScript,
making it a good choice when creating web-based visual solutions.
Exercises
• Exercise 3.1.4.1: Create a scatterplot with each of the visualization
libraries. Which one do you think is the most aesthetically appealing
one?
• Exercise 3.1.4.2: Which of the diagrams from above allow interactions
and which kinds of interaction categories [258] do they support?
Table 3.1 Finding Anaconda to get started in the desired operating system.
Anaconda in a specific operating system
Operating system How to find Anaconda?
Windows Start ⇒ Anaconda Navigator
Linux Terminal ⇒ anaconda-navigator
MacOS Launchpad ⇒ Anaconda Navigator
Exercises
• Exercise 3.2.1.1: Open a PowerShell or terminal and implement Python code to experiment with this option. Discuss the benefits and drawbacks of this option.
• Exercise 3.2.1.2: Try to modify your code from Exercise 3.2.1.1 several
times. What is the obvious problem here?
Figure 3.4 Programming in Python in a PowerShell is one way to create, compile, and run programs. Unfortunately, it comes with a list of negative issues.
Figure 3.5 The same code as in Figure 3.4 is illustrated here in a Jupyter Notebook.
Exercises
• Exercise 3.2.2.1: Start a Jupyter Notebook and extend the Python code
from above by changing the range of the for-loop to be between 5 and
25. Run the new code in the Jupyter Notebook.
• Exercise 3.2.2.2: Store the code in the Jupyter Notebook in a file and
find the file on your computer. Which file extension does it have?
Figure 3.6 Several important aspects around source code and source code quality.
Exercises
• Exercise 3.2.3.1: Install the integrated development environments mentioned above, experiment with them, and create a list with benefits and drawbacks for each IDE. Which one is the preferred one, and why is it suitable for developing dashboards?
3.2.4 GitHub
Exercises
• Exercise 3.2.4.1: Create a new dashboard project by using GitHub.
• Exercise 3.2.4.2: Invite some collaborators to your project who will help
you with coding the dashboard.
has to be read first since the data forms a core ingredient of each data visualization tool (Section 3.3.1). Transforming the data is important to bring it into the right data format but also into the right data structure to derive data patterns (Section 3.3.2). In Section 3.3.3, we describe the Dash
core components, while in Section 3.3.4, we will look into the corresponding
HTML components that are required to layout and decorate the dashboard.
The cascading style sheets (CSS) that allow for aesthetically pleasing and user-friendly dashboards are discussed in detail in Section 3.3.5. Popular visualization
techniques are introduced in Section 3.3.6 with explanations about how Plotly
can be integrated into a dashboard code. Finally, we will talk about the very
crucial callback mechanism (Section 3.3.7) to create some kind of dialogue
between the users and the user interface with all of its components and
visualizations by allowing inputs/outputs in the user interface, modifying the
Dash core components that can come in the form of menus, sliders, text fields, and date pickers, but also in the form of the typically more detailed diagrams, charts, and plots, which are also considered Dash core components but are based on Plotly code in this book.
Table 3.2 Some rows and columns with attribute values serve as an example dataset for the
following code.
An example tabular data with several attributes
Name Gender Age Smoker Hobbies
Lucas Male 45 No Football, Tennis, Jogging
Emma Female 38 Yes Cooking, Swimming
Bob Male 52 Yes Baseball, Walking
Martha Female 32 No Hiking, TV
Roy Male 40 Yes Theater, TV
import pandas as pd

df = pd.read_csv("hobbies.csv")
Listing 3.6 Reading a csv file containing people with some personal attributes.
As we can see in Listing 3.6, reading tabular data (like the example in Table 3.2) is not a big issue in Python if the table is given in CSV format, that is, in a comma-separated-values format (stored as a file with the name "hobbies.csv"). We can make use of a Pandas DataFrame to read the data file in exactly that format in just one line of code. It may be noted
that the correct file path must be specified in order to get positive reading
results. However, in this form, the data is still sleeping in a DataFrame and
has to be transformed and visualized, but fortunately, doing the plain vanilla
transformations and visualizations is also not very difficult, as we will see
later (again).
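A quick sketch of waking the data up: here we build a subset of the columns of Table 3.2 directly in code (so no "hobbies.csv" file is needed) and apply a first simple transformation:

```python
import pandas as pd

# The example table from Table 3.2, built in code so that no CSV file
# is needed for this sketch (the Hobbies column is omitted for brevity).
df = pd.DataFrame({
    "Name": ["Lucas", "Emma", "Bob", "Martha", "Roy"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Age": [45, 38, 52, 32, 40],
    "Smoker": ["No", "Yes", "Yes", "No", "Yes"],
})

# A first simple transformation: the average age per smoker group.
mean_age = df.groupby("Smoker")["Age"].mean()
```

Reading the same table from disk with pd.read_csv("hobbies.csv") would yield an equivalent DataFrame.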
Exercises
• Exercise 3.3.1.1: Create a new table with rows and columns similar to
the given data table from above and read it by using a Pandas DataFrame.
• Exercise 3.3.1.2: Read an arbitrary text file, for example, a page from
a book. Also, read a file that contains an image from one of your last
holidays. Is there a difference between reading text and image files?
Table 3.4 An ordered matrix of zeros and ones, based on the matrix in Table 3.3.
An ordered matrix of zeros and ones
F B G A H D C E
F 1 1 1 1 0 0 0 0
B 1 1 1 1 0 0 0 0
G 1 1 1 1 0 0 0 0
A 1 1 1 1 0 0 0 0
H 0 0 0 0 1 1 1 1
D 0 0 0 0 1 1 1 1
C 0 0 0 0 1 1 1 1
E 0 0 0 0 1 1 1 1
Exercises
• Exercise 3.3.2.1: If we had an Excel table full of numerical values what
would be meaningful data transformations that we can apply, from the
perspective of the rows and the columns?
• Exercise 3.3.2.2: Aggregating a list of numerical values, for example,
the temperature at a place every minute, into an hourly form reduces the
amount of data. Which ways can you find to compute aggregated values
in a time interval?
Figure 3.9 A slider (a) and a drop-down menu (b) created as Dash core components.
Exercises
• Exercise 3.3.3.1: Implement a drop-down menu as a Dash core component that has five labels for cities in the world, while the second and fourth city are already preselected by default.
• Exercise 3.3.3.2: Implement a slider with a range from zero to 100 while
the value 20 is preselected.
additional Dash core components. The cascading style sheets (CSS) (see Section 3.3.5), on the other hand, and further languages like JavaScript can be used to enhance the visual appearance of such browser content, for example by adapting border sizes, paddings, and margins, as well as colors and many more visual features. HTML is actually responsible for the interface
appearance, that is, since a dashboard can be regarded as some kind of web
page (or several of them in a linked manner), the HTML components and their
layouts, sizes, and additional features model the structure of a dashboard.
Consequently, we need a way to model a dashboard as some kind of HTML structure, which is supported by the so-called Dash HTML components; as with the Dash core components, they have to be imported first before we can work with them (see Listing 3.9 for the import that is required in this case).
<div>
  <h1>My first Dash HTML component</h1>
</div>
Listing 3.11 HTML code for the Dash HTML component in Listing 3.10.
Exercises
• Exercise 3.3.4.1: Use the Dash HTML components to implement three
different headlines of sizes H1, H3, and H5 placed below each other and
stating what their size is.
• Exercise 3.3.4.2: Include a headline of size H1 and below that a drop-
down menu by combining the corresponding Dash HTML and Dash core
components.
Exercises
• Exercise 3.3.5.1: Create a headline in HTML in size H3 with a green
text color. Use inline, internal, and external CSS commands.
• Exercise 3.3.5.2: What are the benefits of using external CSS? What are
the drawbacks?
Exercises
• Exercise 3.3.6.1: Read a tabular dataset, for example, by using a Pandas
DataFrame and include a Plotly diagram in the form of a scatterplot that
shows the correlation behavior of two attributes from the tabular dataset.
• Exercise 3.3.6.2: Add another Plotly diagram in the form of a histogram
below the scatterplot that shows the distribution of one attribute, that is,
we should see two Plotly diagrams at the same time now.
3.3.7 Callbacks
Callbacks describe some kind of linking between the inputs and outputs, that
is, each time one or several inputs are changed by the users, the corresponding outputs will be updated. This can be the value of a slider (which
is a Dash core component) that is updated, and hence, as a consequence,
one or several Plotly diagrams (which are also Dash core components) have
to be updated as well. By the callback mechanism, we create some kind of
interaction possibility, that is, a dialogue between users and the dashboard
or visualization tool. Without a callback, we would not have a chance to react to any user input.
import pandas as pd
import plotly.express as px

from dash import Dash, Input, Output, dcc, html

app = Dash(__name__)

df = pd.read_csv("K:\\Desktop\\Data\\quakes.csv")

app.layout = html.Div([
    html.H1("Quakes", style={'text-align': 'center'}),
    html.H4("Many Facts", style={'text-align': 'left'}),

    dcc.Dropdown(
        id='location',
        options=[{"label": "Asia", "value": 'AS'},
                 {"label": "Australia", "value": 'AU'},
                 {"label": "Europe", "value": 'EU'}],
        multi=False,
        value='AS',  # the default must be one of the option values
        style={"width": "40%"}
    ),

    dcc.Graph(id='plot1', figure={}),
    html.Br(),
    dcc.Graph(id='plot2', figure={})
])

@app.callback(
    [Output(
        component_id='plot1',
        component_property='figure'
    ),
     Output(
        component_id='plot2',
        component_property='figure'
    )],
    [Input(
        component_id='location',
        component_property='value'
    )]
)
def update_graph(option_slctd):
    dff = df.copy()
    dff = dff[dff["location"] == option_slctd]

    fig = px.scatter(
        dff,
        x='magnitude',
        y='depth',
        color="depth"
    )

    fig2 = px.scatter(
        dff,
        x="latitude",
        y="longitude",
        color="magnitude",
        size="depth"
    )

    return fig, fig2

if __name__ == '__main__':
    app.run_server(debug=False)
Listing 3.16 Python code showing the mechanism of callbacks
Exercises
• Exercise 3.3.7.1: Implement a simple dashboard with one range slider
whose values are used to update a corresponding scatterplot, that is, the
range slider is used here as a numerical interval filter. What are the inputs
and outputs of the callback function?
• Exercise 3.3.7.2: Implement another dashboard with two range sliders
allowing to filter two numerical attributes while the effect of the filters
is interactively shown in a scatterplot.
3.4 Deploying
Another important stage during the development of a dashboard is the deploy-
ment of it to make it available for everybody who has a web browser and
internet access. Technically, this is easily possible but it brings various other
challenges into play, also taking into account the visual and interface design.
In case a dashboard is accessible from any place in the world, the users
have a multitude of properties ranging from language differences, cultural
habits, signage, symbols, reading directions, and many more [93], typically
including the visual variables like colors, shapes, icons, all of them having
different meanings depending on the users. Hence, deploying does not only mean putting the dashboard online; it has to be done in a way that focuses on the users’ experiences and environments. Making a visualization tool available to anybody on earth can be a difficult task if it is to fulfill all of the users’ needs and requirements. Consequently, it is good advice to consider the possible users already in the design phase so as not to run into problems after the tool is finally deployed. Also, the application domain can require differences in the tool’s setup; for example, when analyzing car traffic data it makes a difference whether the traffic runs on the right or the left side of the street. In the medical sector, there might be different diseases and viruses that require different analysis and visualization techniques; creating one dashboard for every kind of application scenario is not possible. Moreover, domain experts [257] have to be recruited to create a dashboard for specific scenarios, a fact that can incur high costs.
In this section, we take a look at possibilities to deploy a dashboard,
that is, to make it publicly available online in a web browser. Section 3.4.1
describes one popular way to do that by making use of Heroku. The challenging issue, apart from the technical problems, is the users themselves, who are now international rather than national or local ones.
3.4.1 Heroku
Actually, we do not need to deploy a dashboard, that is, a Dash app. It
typically runs locally, on our own machine, on so-called localhost. The
URL for accessing the localhost is given after the compilation phase of the
dashboard’s Python code is finished. Typing in this URL in a web browser
or clicking on it will successfully show the created dashboard with all of its
functionality. However, to go one step further, it is of special interest to deploy
the Dash app to a certain kind of server, to share it with our worldwide users,
even by hiding it behind a login and a password. There are various ways to
share a dashboard on a server, but one specific way to do that is by making use
of a Heroku server [81]. This kind of server platform provides an easy way
to deploy so-called Flask-based applications, as we have talked about already
in Section 3.1.2. For more detailed instructions, we recommend reading the tutorial at https://dash.plotly.com/deployment. Actually, in summary, we only need four steps to get it running, which are, in a condensed form:
1. The creation of a project folder for the dashboard
2. The initialization of this project folder
3. The initialization of the project folder with an example application
4. The initialization of Heroku
In case we modify and extend the dashboard code, we have to proceed with a fifth step, the redeployment, which has to be repeated each time a modification or extension is made.
Exercises
• Exercise 3.4.1.1: Create a dashboard that reads a small tabular dataset
(an Excel table) with numerical values for the attributes. The dashboard
should show a scatterplot for two columns of the tabular dataset, and
there should be an option to filter values. Deploy this simple dashboard
to Heroku.
• Exercise 3.4.1.2: Let your dashboard run in different web browsers like
Google Chrome, Mozilla Firefox, Opera, or Microsoft Edge and try to
spot the differences.
the dashboard is, and this honor falls back to the designers and implementors
of the dashboard. We can record valuable feedback by asking the users in
some kind of crowdsourcing user experiment [2] which features they liked, which ones they did not, and what they consider improvable. This feedback can be collected in textual form by letting them type text into a feedback form in the dashboard, or by showing some kind of Likert scale [221] ranging from very good (5) to very bad (1) in the dashboard to get numerical instead of qualitative feedback. Numerical values are easier to evaluate than textual
feedback, but they are also some kind of aggregated measure. Moreover,
the mouse cursor can be tracked and stored over space and time as well as
mouse clicks. This gives a more detailed impression of the user behavior;
however, the mouse movements alone do not give us any feedback on the
cognitive processes that the users are confronted with. The biggest issue
here, no matter which kind of data is recorded from the international users,
comes from the fact that the data itself is not reliable since it is acquired in
some kind of uncontrolled user study in which we cannot control the users
and in which we do not know much about the users, apart from their IP
addresses. We might ask about personal details, but we can never be sure
if those details are true. The recorded data itself is also a problem: it is quite hard to analyze it for patterns, correlations, and anomalies. Actually, we are interested in the user behavior when the users are given a certain task, that is, we want to detect design flaws in our dashboard based on that behavior.
Exercises
• Exercise 3.4.2.1: Imagine your dashboard has to be created for an
international market with users from Europe, Asia, and South America.
Discuss important visual design and interface design features that have
to be taken into account to make it usable for all those users.
• Exercise 3.4.2.2: If we integrate user data into the design of our
dashboard, which kind of user data should be considered, and how
trustworthy and reliable is such user data (since we do not know who
the real users are)?
Exercises
• Exercise 3.4.3.1: Create a simple dashboard, deploy it, and include a
text field for recording qualitative feedback of online users. How do you
advertise your dashboard to get enough study participants?
• Ethics and privacy: We might draw wrong conclusions from our own data just by using a dashboard designed and implemented by someone else. This might be due to missing experience in working with a dashboard. Moreover, new questions arise about whether it is allowed to use the data uploaded by others.
• Environments: Deploying a dashboard has to take into account different displays, like small-, medium-, or large-scale ones. Also, the operating systems of the users can have an impact on the functionality, as can the different web browsers, including their different versions.
Exercises
• Exercise 3.4.4.1: Describe the benefits that you would have when
deploying a dashboard.
• Exercise 3.4.4.2: Are the drawbacks when deploying a dashboard also
depending on the application domain, that is, are there, for example,
differences between geographic, medical, or educational applications?
4
Coding in Python
4.1 Expressions
Evaluating mathematical constructs composed of arithmetic expressions by
adding, subtracting, multiplying, or dividing is one of the major ingredients
in nearly any computer program. Such expressions can get quite long with
a multitude of operators connecting the individual parts. It is important to
understand in which order such expressions are computed and, for example, how some subexpressions can be prioritized by using parentheses. Understanding the laws
of execution and evaluation is an important ingredient to avoid errors that
might be hard to locate later on in a computer program. Each operator has
some kind of precedence in Python, and in any other programming language,
on one value and might change the sign of a value, for example, from a
positive into a negative one. The good thing about expressions in Python is that they do not only work on raw values like integers and floating point numbers, but also on variables or even on function calls that produce a value as a result.
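A short illustration of why the evaluation order matters; the values are arbitrary:

```python
a = 2 + 3 * 4    # 14, because * binds tighter than +
b = (2 + 3) * 4  # 20, parentheses override the default order
c = -2 ** 2      # -4, because ** binds tighter than unary minus
d = (-2) ** 2    # 4
```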
Table 4.1 A list of arithmetic operators, some examples, their meanings, and mathematical
notations.
Operator   Example   Explanation                            Math formula
+          x + y     Add x and y (addition)                 x + y
-          x - y     Subtract y from x (subtraction)        x − y
*          x * y     Multiply x and y (multiplication)      x · y
/          x / y     Divide x by y (division, type float)   x / y
//         x // y    Divide x by y (division, type int)     ⌊x / y⌋
**         x ** y    x to the power of y (exponentiation)   x^y
%          x % y     Divide x by y (modulo division)        x mod y
-          -x        Negative of x (unary)                  −x
+          +x        Positive of x (unary)                  +x
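The operators from Table 4.1 can be tried out directly; the values of x and y below are arbitrary:

```python
x, y = 7, 3

results = {
    "x + y": x + y,    # 10 (addition)
    "x / y": x / y,    # 2.333... (float division)
    "x // y": x // y,  # 2 (integer division)
    "x % y": x % y,    # 1 (modulo)
    "x ** y": x ** y,  # 343 (exponentiation)
    "-x": -x,          # -7 (unary minus)
}
```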
Exercises
• Exercise 4.1.1.1: Evaluate the following arithmetic expression:
(4+3*7-(3+5)*6)/3 - 17%3
• Exercise 4.1.1.2: Evaluate the following arithmetic expression:
3-12+4**(3-1)*0.1 - 15//(4+3)
Table 4.2 A list of relational operators, some examples, their meanings, and mathematical
notations.
Operator   Example   Explanation                    Math formula
>          x > y     x greater than y               x > y
<          x < y     x smaller than y               x < y
>=         x >= y    x greater than or equal to y   x ≥ y
<=         x <= y    x smaller than or equal to y   x ≤ y
==         x == y    x equal to y                   x = y
!=         x != y    x not equal to y               x ≠ y
Although comparison operators are binary operators in the sense that they
are applied to two expressions, that is, the left and right side of the operator,
they can even be used in a sequence in the sense of a chained comparison.
This means we can write
w > x >= y > z
instead of
w > x and x >= y and y > z
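A quick check that the chained form and the expanded form agree, for arbitrary example values:

```python
w, x, y, z = 9, 7, 7, 2

chained = w > x >= y > z               # the chained comparison
expanded = w > x and x >= y and y > z  # the equivalent expanded form
# Both evaluate to True for these values.
```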
Exercises
• Exercise 4.1.2.1: Evaluate the following arithmetic-relational expression:
(4-12)/8+1 > (9**0)
Exercises
• Exercise 4.1.3.1: Evaluate the following arithmetic-relational-Boolean
expression:
Exercise
• Exercise 4.1.4.1: Evaluate the following bitwise expression:
5 & 13 & 3 | 14
• Exercise 4.1.4.2: Evaluate the following bitwise expression:
23 >> 2 & 23 | (~17)
Booleans, but even Strings, or more complex objects. Also, the operators
themselves can fall into the categories of arithmetic, relational, Boolean/logical, or bitwise operators. Such expressions are denoted in this book as mixed
expressions. In cases in which an expression has a mixed character, we must
understand the precedence of the individual operators which is given as an
overview in Table 4.7 from highest to lowest precedence. The precedence of
the operators describes in which order an expression is evaluated. Parentheses
can be used to change the order of evaluation, that is, subexpressions in
parentheses have the highest precedence. During the evaluation of an expression, the subexpressions are evaluated from highest to lowest precedence; if operators have equal precedence, a left-to-right evaluation order is used.
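A small illustrative mixed expression: arithmetic and shift operators bind tighter than the comparisons, and 'and' binds most loosely, so no parentheses are needed here:

```python
# Evaluates as ((2 + (3*4)) > 10) and ((1 << 3) == 8), i.e. True and True.
value = 2 + 3 * 4 > 10 and 1 << 3 == 8
```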
Exercises
• Exercise 4.1.5.1: Evaluate the following mixed expression:
4*(3-5**2)/4 > 7 and (3<<3)*7-6 == 9
whole numbers, positive ones as well as negative ones, also the zero value.
Those integers can be given to a certain base, indicated by a prefix that marks an integer value as binary, octal, or hexadecimal (see Listing 4.6). If no prefix is given, the value is interpreted as decimal, which is the default setting. The prefix options are 0b for binary, 0o for octal, and 0x for hexadecimal. It makes no difference if the prefix characters are given as
capitals or not. The length of the number, that is, the number of its digits, is unlimited in Python; only the computer’s memory sets a limit.
0b101 == 5    # binary, base 2
0o101 == 65   # octal, base 8
0x101 == 257  # hexadecimal, base 16
Listing 4.6 Integer values to several bases
Apart from integer values, we have to deal with real numbers, which are given in Python as so-called floating point numbers. These can be recognized by a decimal point that divides the number into an integer part and a fractional part. Additionally, we can use the exponent notation [19] to express the value of a floating point number: the letter e or E followed by a positive or negative integer gives the exponent to the base 10 (see Listing 4.7).
13.876 == 13.876
.81 == 0.81
12. == 12.0
.32e5 == 32000.0
3.2e-3 == 0.0032
Listing 4.7 Examples of floating point numbers in different notations
The complex numbers consist of a real part and an imaginary part; they are written in Python in the form r + ij, where r denotes the real part and i the imaginary part (see Listing 4.8 for examples).
1.89 + 2.1j
2.119 - 3.14j
-0.97 + 1.27j
Listing 4.8 Examples of complex numbers
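The real and imaginary parts of a complex value are available as attributes, as a short sketch with an arbitrary value shows:

```python
z = 1.89 + 2.1j

real_part = z.real   # 1.89
imag_part = z.imag   # 2.1
conj = z.conjugate() # 1.89 - 2.1j
```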
Apart from the numeric values, we can find textual values, typically called strings in Python. Each string has a finite length and consists of so-called characters in a well-defined order. Python
denotes string objects by the data type str, given in single or double quotes, which makes strings of digits distinguishable from real numeric values; that is, the string "33" is different from the integer number 33.
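This distinction can be checked directly:

```python
s, n = "33", 33

different = s != n       # True: a string of digits is not a number
converted = int(s) == n  # True only after an explicit conversion
```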
Finally, the Boolean is a data type giving support for true and for not
true values, that is, denoted by True and False in the programming language
Python. We have seen examples for this already in Section 4.1.3 when we
introduced Boolean expressions.
Exercises
• Exercise 4.2.1.1: What is the result of adding an integer number to a
floating point number?
• Exercise 4.2.1.2: Try the following expressions and describe the results:
4/0
4/0.0
list, we use brackets that enclose the contained elements (see Listing 4.9).
As you can see we start with one opening bracket, give the elements of
the list separated by commas, and indicate the end of the list by a closing
bracket. Lists in Python are zero-based, that is, the first element (the leftmost one) has the index 0 (and not 1 as we might start counting). Consequently,
accessing individual elements from a list happens by the corresponding index
on which the element can be found. This is done by putting the index into
brackets, like myList[3] if the elements are stored in a list called myList.
Typically, this is done by assigning the list to a variable (will be described
in Section 4.2.4). To modify a value in a list at a corresponding position
we can assign it a value at an index like myList[3] = 17.35. There are
various other ways to access elements from a list, for example more than
one at the same time. This can be done by myList[1:5], which gives back
the values at indices 1–4, as another sublist. We can also give back the rest
of a list starting from a corresponding index like myList[3:] which returns
the elements from the given index until the end of the list. It may be noted
that even two-dimensional, three-dimensional, or generally n-dimensional lists are possible, due to the fact that in Python we can put any kind of object into a list, consequently also lists themselves, making them lists of lists, lists of lists of lists, and so on (see the examples in Listing 4.10).
[3, 3.14, False, -23, 3+4j, "Hello"]
Listing 4.9 A list in Python with a few data elements
myList = [1, 2, 3, 4, 5, 6, 7]
myList[3] = 17.35
myList[1:5]  # = [2, 3, 17.35, 5]
myList[3:]   # = [17.35, 5, 6, 7]
my2DList = [[1, 2, 3, 4], [5, 2.11, 9], [0, "Hi"]]
Listing 4.10 Accessing and modifying elements in a list
Listing 4.11 illustrates how tuples are created and how we can work with
them. Here we also see that tuples are built by using parentheses () instead of
brackets [] as in lists.
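A minimal tuple sketch with arbitrary example values:

```python
myTuple = (1, 2.5, "three")

first = myTuple[0]  # zero-based access, just like lists
rest = myTuple[1:]  # slicing returns another tuple: (2.5, "three")

# Tuples are immutable: myTuple[0] = 7 would raise a TypeError.
```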
There is one more option to structure data elements apart from lists and
tuples. Sets are another way in Python to create a collection of data elements.
To indicate a set, we enclose the elements in braces {}, separated by commas.
One more difference to lists and tuples comes from the fact that the elements
in a set are unordered (see Listing 4.12 for examples using sets). This leads
to the consequence that we cannot access the set elements by using an index
since indices have no meaning at all if there is no explicit order given. As
in set theory in the field of mathematics we can work with several sets,
for example applying the well-known set operations like union, intersection,
symmetric difference, and many more (we will introduce functions and
methods in Sections 4.6 and 4.9.3).
mySetA = {1, 2, 3, 4, 5}
mySetB = {3, 4, 5, 6, 7}

mySetC = mySetA.intersection(mySetB)  # evaluates to {3, 4, 5}
Listing 4.12 Creating sets and applying operations
The problem with sets is that we cannot access the elements contained in them by just asking for a well-defined index, that is, a position in the set.
This is due to the fact that sets are unordered. However, there is one more data structure, called a dictionary, which also has no index; instead, access happens via so-called key-value pairs. This means that to access an element in a dictionary we just have to know the corresponding key, and we get the value for this key in return. Dictionary elements are also enclosed by
braces, just like sets; the key-value pairs are separated by commas, and each key is separated from its value by a colon (:). A dictionary is also unordered, but
compared to sets we can access the elements by using the keys. Listing 4.13
shows some examples for dictionaries and for accessing their values from
keys. Dictionaries can be modified, that is, key-value pairs can be removed,
new ones can be added, and they can be changed. Table 4.8 summarizes the
most important properties of lists, tuples, sets, and dictionaries.
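A small illustrative dictionary, showing access, change, addition, and removal of key-value pairs (the person data is arbitrary):

```python
person = {"name": "Emma", "age": 38, "smoker": True}

age = person["age"]        # access a value by its key
person["age"] = 39         # change a value
person["city"] = "Berlin"  # add a new key-value pair
del person["smoker"]       # remove a key-value pair
```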
Finally, classes can be implemented for creating further data structures beyond lists, tuples, sets, and dictionaries, but for classes, objects, and instances we refer to Section 4.9.
Exercises
• Exercise 4.2.2.1: Given a list of natural numbers myList = [3,1,8,9,2].
Can you find a way to transform this list into a set with the same
elements?
• Exercise 4.2.2.2: Given two lists of natural numbers myListA =
[3,1,2,4,3,8] and myListB = [4,5,1,3,7]. Write Python code to create
a new list that contains all elements that are contained in both
lists.
int(3.14)             # = 3
float(100)            # = 100.0
float('3.1415')       # = 3.1415
str(42.99)            # = '42.99'
set([3.14, 7, -3])    # = {3.14, 7, -3} (element order not guaranteed)
tuple({3.14, 7, -3})  # a tuple of the set elements, order not guaranteed
list('Bye Bye')       # = ['B', 'y', 'e', ' ', 'B', 'y', 'e']
dict([["A", 1], ["B", 2], ["C", 3]])  # = {"A": 1, "B": 2, "C": 3}
Listing 4.14 Some meaningful conversions from one data type to another one
Exercises
• Exercise 4.2.3.1: Convert the floating point number 2.6176 into a
corresponding integer.
• Exercise 4.2.3.2: Given a string ’3.8821’. Convert the string into a
floating point number and then into an int. Is it allowed to convert the
string directly into an int?
4.2.4 Variables
A variable in Python is something like a container in which we can store
values of a certain data type. When we define a variable, some place is
reserved in memory for the values contained in such a variable. Since each
value has a well-defined data type, the variable that stores this value also
carries this data type. A variable can be declared with a certain name and
initialized with a certain value (see Listing 4.15). This is done by writing
the name of the variable on the left-hand side of an equals sign and putting
its current value on the right-hand side. This order must be preserved. It may
be noted that variables in Python can be redeclared at any time and their
values can be modified, hence the name variable. Since Python is dynamically
typed, we can even change the data type of the same variable, for example,
from an int to a string (see Listing 4.16).
length = 3.89
Listing 4.15 Declaring a variable and initializing it with a value
length = 3.72
length = "Given in meters"
Listing 4.16 Variable redeclaration
Variables can even exist in two special forms characterized by the way in
which we can access them and modify them. This brings into play local and
global variables which are discussed in more detail in Sections 4.6 and 4.9.
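As a brief preview of this distinction (a minimal sketch; the details follow in Sections 4.6 and 4.9):

```python
counter = 0  # a global variable, visible everywhere in the module

def increment():
    global counter   # make the global variable writable inside the function
    local_step = 1   # a local variable, invisible outside this function
    counter += local_step

increment()
increment()
# counter now holds 2
```
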
Exercises
• Exercise 4.2.4.1: Declare three variables called height, width, and length,
initialize them with some floating point values, and compute the value
of the variable volume as the product of the three variable values.
• Exercise 4.2.4.2: Declare two variables a and b and initialize them with
floating point numbers. Compute a Pythagorean triple, that is, a value
for a variable c such that the equality a² + b² = c² holds.
4.2.5 Constants
Sometimes we would like to include values that never change during a
program execution. This could be done with a traditional variable, but there
is a chance that the value gets changed at some point, which is not desired.
Hence, we would like to give such a variable a special meaning, saying that
its content should stay untouched in any scenario. This is where constants
come in. Actually, in Python there is no special syntax for that. We just use
variables, but we mark them by a well-defined naming convention: using only
uppercase letters indicates that a variable is meant as a constant, although it
could technically still be changed. Since constants are just variables (whose
values never change), they can be based on any data type the standard
variables are also based on. The value of a constant should not be modified;
we only use it in one direction, that is, reading the
value it contains (see Listing 4.17 for creating constants).
PI = 3.141592
E = 2.7182
HIGHEST_SPEED = 240
Listing 4.17 Defining constants in Python
Exercises
• Exercise 4.2.5.1: Define a constant that contains the number of seconds
per day.
• Exercise 4.2.5.2: Define a constant that stores the free fall acceleration
on earth as a range interval.
problem we look at, there are various ways to get support from built-in Python
methods. How to create one’s own functions and methods will be explained in
Sections 4.6 and 4.9.3, respectively. Moreover, we will explain the difference
between functions and methods, although at the moment the distinction does
not make a difference for us. Apart from string methods, we can also apply
built-in methods or our own methods to the major building blocks of
such strings, namely characters and their internal organization in tables, for
example, in an ASCII table (Section 4.3.2).
If only one string is involved, we might be interested in the length of that
string, the number of lower- and uppercase letters it contains, the positions
of special characters or substrings in that string, or we might actively change
the string, for example, exchanging special characters or converting it into
uppercase letters only or just one uppercase letter at the beginning. There
are many options for applying functions and methods to strings; Listing 4.18
illustrates some examples.
originalString = 'hello how are you?'

numChars = len(originalString)
newString = originalString.capitalize()
newString = originalString.encode()
test = originalString.isascii()
Listing 4.18 String functions and methods if only one string is involved
If two or more strings are involved (see Listing 4.19), we can apply
different kinds of functions and methods.
originalString = 'hello how are you?'
text = 'ow ar'

originalString.find(text)
originalString.index(text)
originalString.replace('are you', 'am I')
Listing 4.19 String functions and methods applied to more than one string
Exercises
• Exercise 4.3.1.1: Given a string ’Good morning everybody’. Find a way
to reverse the string.
• Exercise 4.3.1.2: Given two strings ’hello’ and ’how are you’. Find a
way to concatenate both strings into one string.
Figure 4.1 Characters and symbols with their corresponding numeric identifiers represented
in the ASCII table.
character = 'p'

identifier = ord(character)
character = chr(identifier)
identifier = identifier + 5
character = chr(identifier)  # evaluates to 'u'
Listing 4.20 Converting between characters and corresponding numeric identifiers based on
the ASCII table
Exercises
• Exercise 4.3.2.1: What is the numeric value of the character ’M’ in the
ASCII table? Write code for that.
• Exercise 4.3.2.2: Given a list of characters myList = [’H’,’e’,’l’,’l’,’o’].
Convert this character list into a numeric ASCII value list. Do the same
with the letters occurring in your own name.
feedback = input('Please provide some feedback: ')
Listing 4.21 Allowing user input in textual form
• Length of the input: Users can enter quite long textual input, that is,
input consisting of many characters. If we wish to limit the number of
possible characters, we can validate that by asking for the length of the
string. The function len() has already been introduced earlier
(see Section 4.3.1).
• Content/data type of the input: An input in this form is typically given
as a value of the string data type. This means if we expect integers
or floating point numbers, we have to check first if the given string
is convertible into such a numeric value. This concept has also been
explained earlier (see Section 4.2.3).
• Specific pattern in the input: Finally, we might want to check if a string
follows a certain pattern or rule. This seems more complex than
the standard length and data type validations but actually, it is not really
difficult. The powerful idea that comes into play here is the so-called
regular expression [70, 225]. A regular expression can be understood
as a string itself, consisting of characters that have a meaning, that
is, those characters can be used to derive certain well-defined patterns
in a string. In Python there is a built-in package denoted by re. Such
regular expressions can be checked for several properties like meta
characters (Table 4.9), special sequence characters (Table 4.10), or a
set of characters (Table 4.11), without guaranteeing completeness of the
tables.
import re

inputString = 'I love programming'
matches = re.findall('o', inputString)
matches = re.search('o', inputString)

inputString2 = 'RE352'
validate = re.match('[A-Z]{1,2}[0-9]{3}', inputString2)
Listing 4.22 Examples of functions and methods for applying regular expressions to strings
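The length and data type validations from the list above can be sketched as follows (the input strings are made up for illustration; the conversion check uses the try/except mechanism explained later in Section 4.4.3):

```python
# A (hypothetical) user input; in a real program it would come from input().
feedback = 'This tool is great'

# Length validation: accept at most 50 characters.
MAX_LENGTH = 50
lengthOk = len(feedback) <= MAX_LENGTH

# Data type validation: check whether a string holds a number
# by attempting the conversion and catching a possible failure.
userNumber = '3.75'
try:
    number = float(userNumber)
    numberOk = True
except ValueError:
    number = None
    numberOk = False
```
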
Exercises
• Exercise 4.3.3.1: Write a regular expression for strings that contain
exactly one uppercase letter and end with three digits.
• Exercise 4.3.3.2: For a password validation check, we need a string of at
least eight characters, and that starts with an uppercase letter and at least
one digit. Write a regular expression for that.
4.3.4 Comments
The documentation in a program [227] is very important to let the developer
better understand the functionality of certain parts of the code. This is, in
particular, useful if we have to inspect the code many weeks later and want to
quickly get an impression of what is being implemented in a certain piece
of code. Due to this fact, it is good advice to keep the documentation in
the form of code comments short but still informative, explaining the effects
of the code and why it has been implemented in exactly this way. A text line
that starts with a # sign will be ignored by the compiler or interpreter, but
when reading the code, it is always there (see Listing 4.23 for an example of
a comment). Comments can be placed everywhere in the code; it may be noted
that if they are placed at the end of a code line, the rest of the line after the
# will be ignored.
# Writing comments is not difficult
print('This is a commented program.')

value = 25  # A comment after a code line
Listing 4.23 A comment in a Python code
Comments are not limited to one line only. They can span several lines
and many of them can be made at different code places (see Listing 4.24),
also with so-called triple quotes indicating a comment over several lines.
# Hello
# These comments are placed
# in several lines
print('This seems to work')

"""
Hello
These comments are placed
in several lines
"""
Listing 4.24 Several comments spread over several lines by using triple quotes
Exercises
• Exercise 4.3.4.1: Write a one-line comment in Python code.
• Exercise 4.3.4.2: Write a multi-line comment in Python code.
directions: either true or false (Sections 4.1.2 and 4.1.3). This leads to a
binary kind of control flow that can follow one of both ways, depending on
the outcome of a formerly evaluated conditional expression. In some cases,
we even have more than two options, which might be modeled by several
conditionals; in this case, however, we might better take the option of allowing
several cases, handling one case after the other until a matching one is found,
or, in the worst scenario, no case is found and a default option is executed.
In some situations, it is even a good idea to handle an exception, meaning
there is a strange, unwanted, or unexpected evaluation that would otherwise
let the program crash if not handled.
In this section, we will start by explaining the mighty concept of
conditionals allowing a branching in the control flow (Section 4.4.1). We
will also take a look at a so-called pattern matching option that allows
several cases to be handled, but just one or a default one can be executed
(Section 4.4.2). Finally, we describe exceptions and how they can be checked,
even be treated to avoid the crashing of the program (Section 4.4.3).
value = -0.05

if value > 0.0:
    print('The value is greater than 0.0.')
else:
    print('The value is smaller than or equal to 0.0.')
Listing 4.26 The else part of a conditional can be used as an alternative in case the
if-statement branch is not executed
In Python there is even another alternative: the elif option. This one
gives us a chance to test another condition in case the if-statement is
evaluated to False (see Listing 4.27).
value = 0.00

if value > 0.0:
    print('The value is greater than 0.0.')
elif value == 0.0:
    print('The value is equal to 0.0.')
else:
    print('The value is smaller than 0.0.')
Listing 4.27 The elif option can be used as an alternative in case the if-statement is not
followed
The keyword pass can even be used in cases in which there are no
statements in an if-branch. The pass keyword replaces the otherwise empty
code block; however, this is needed only in rare cases.
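A minimal sketch of such an otherwise empty branch (the value is made up for illustration):

```python
value = 7.5

if value > 0.0:
    pass        # nothing to do yet; pass stands in for the empty block
else:
    value = 0.0
```
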
Exercises
• Exercise 4.4.1.1: Write a program to test whether a natural number is
odd or even.
• Exercise 4.4.1.2: Given a variable containing a string. Test whether this
string contains uppercase letters and more than 10 characters.
value = 'Mercedes'

match value:
    case 'Audi':
        print('Your car is an Audi.')
    case 'Peugeot':
        print('Your car is a Peugeot.')
    case 'Mercedes':
        print('Your car is a Mercedes.')
    case _:
        print('The brand of your car is unknown.')
Listing 4.28 A multitude of options are possible by using a match case pattern
Exercises
• Exercise 4.4.2.1: Write code for a pattern matching that checks different
grades and outputs whether the grade is very good, good, medium, bad,
or very bad.
• Exercise 4.4.2.2: Write code for a pattern matching that checks different
sports activities and outputs the number of players required.
4.4.3 Exceptions
A syntax error [253] can occur if a piece of code is not properly defined
to make it understandable for the compiler or interpreter. This kind of error
happens before the actual program execution, that is, before runtime, already
in the program translation phase. A semantic error [182] is an error that is not
detected by the compiler but rather by the programmers themselves. Semantic
errors create unwanted effects, those that do not produce the functionality the
programmers desired. A third kind of error is an exception. A program might
be syntactically and semantically correct, but there might be some places
in which the code does not run properly, although only for a few 'exceptional'
instances of a problem, hence the name exceptions. Unlike syntactic
or semantic errors, exceptions can be handled (in case one knows them). If
they are not handled, they can result in errors and the program might crash
(see Listing 4.29 for an exception and Listing 4.30 for handling it). Apart
value = 0

try:
    divisionValue = 6 / value
except ZeroDivisionError:
    print('Division by zero is not allowed')
Listing 4.30 Handling a division by zero error
Exercises
• Exercise 4.4.3.1: After a user input we would like to proceed with the
user-defined number, but unfortunately, this number is a string. Write
code to handle such a conversion error.
• Exercise 4.4.3.2: In the example Listing 4.30 extend the code of the
except part to provide a value for the divisionValue variable even if it
generates a division-by-zero error.
4.5 Loops
To avoid implementing the same kind of functionality again and again, only
differing in the size of an argument, for example, we can make use of
so-called loops [159]. Those are simple constructs that repeat instructions
until a certain well-defined termination condition is met. There are two
types of loops: definite ones and indefinite ones. This means, for the first
type of loop we know how many iterations are made until the process
terminates; for the second type we have no idea how many iterations
have to be made until the process terminates. The termination is decided
during the runtime of the loop by some kind of dynamic termination
condition. For this reason (and maybe for some others as well) Python
supports for-loops and while-loops; both of them contain a termination
condition, however it is given in two different ways. Loops can even run
endlessly, in case the termination condition is never met. Moreover, loops
can be implemented inside loops and the loop types can even be mixed,
that is, for-loops can be contained in while-loops and vice versa.
In this section, we start by introducing the principle of definite iteration
and focus on the so-called for-loops (Section 4.5.1). Apart from definite
iterations we look into indefinite iterations, in this case we describe the
concept of while-loops and explain termination conditions (Section 4.5.2).
Finally, we illustrate how loops can be nested, meaning there is actually no
limit to the number of loops contained in each other, but it may be noted that
careless nesting can cause long runtimes (Section 4.5.3).
There are even break and continue statements to stop the iteration if a
certain element is found or has a certain property. Moreover, we do not have
to stop the iteration; we might skip an element instead and continue the
iteration after it, hence the corresponding element is omitted in the process
(see Listings 4.32 and 4.33).
names = ['Marco', 'Michael', 'Heiko', 'John']
numOfLetters = 0

for name in names:
    if name == 'Michael':
        break
    numOfLetters += len(name)

print(numOfLetters)
Listing 4.32 A for-loop with a break statement
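A continue statement can be sketched in a similar fashion: instead of stopping the whole iteration, only the matching element is skipped (a minimal variant of the break example above):

```python
names = ['Marco', 'Michael', 'Heiko', 'John']
numOfLetters = 0

for name in names:
    if name == 'Michael':
        continue   # skip this element and go on with the next one
    numOfLetters += len(name)

print(numOfLetters)  # prints 14 (Marco, Heiko, and John are counted)
```
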
the for-loop has finished. Moreover, a pass statement can be used in case the
body of a for-loop is empty for some reason.
n = 5

for i in range(10):
    n += i**2

for i in range(7, 17):
    n -= i

for i in range(10, 20, 4):
    n += i
Listing 4.34 A for-loop defined on the range()-function
Exercises
• Exercise 4.5.1.1: Implement a for-loop that sums up the natural numbers
from 1 to 100, that is, ∑_{i=1}^{100} i.
• Exercise 4.5.1.2: Implement a for-loop that computes the factorial of a
natural number n ∈ ℕ, given as n! := ∏_{i=1}^{n} i, for a value of n = 20.
test expression is only checked once before each iteration, independent of
whether or not the test expression might change while the statements in the
body of the while-loop are executed.
value = 100

while (value % 3 != 0):
    value = value / 2

print(value)
Listing 4.35 A while-loop iterates in an indefinite way
Exercises
• Exercise 4.5.2.1: Implement a while-loop that repeatedly updates
n := n + 1.0/n and runs as long as n is smaller than a given number.
• Exercise 4.5.2.2: Implement a while-loop that does the same as the
for-loop in Listing 4.31.
Exercises
• Exercise 4.5.3.1: Implement a for-loop that processes a list of strings,
element by element, and that processes each string character by
character to count the number of uppercase letters.
4.6 Functions
Functions are major building blocks of programming since they allow us
to encapsulate subroutines into code blocks. A subroutine is understood as
a small algorithm that works with input and output parameters, computing
something useful. The whole program is full of such subroutines, which are
responsible for the functionalities a software can have. Using functions makes
coding much easier, with less code, and even more maintainable. For example,
if the functionality of a subroutine has to be changed without using functions,
we have to find all locations in the code and adapt the subroutine everywhere.
This is a tedious, time-consuming, and error-prone task, with a high chance of
introducing inconsistencies into the code. For this reason, functions can be
used to put such subroutines in one place. Each time we have to adapt
something in the subroutine we only have to do it once, in the corresponding
function, which accelerates the adaptation process and reduces the chance of
introducing inconsistencies that could cause the program to crash.
In this section, we describe the concept of creating one’s own functions
(Section 4.6.1) with and without return parameters and with an arbitrary
number of such parameters, also with different data types. Section 4.6.2
illustrates how functions can be called, taking into account their parameter
lists and data types as well. Apart from a single function, several functions
can be combined in some kind of nested structure, a strategy that is illustrated
in Section 4.6.3. Moreover, the variables inside a function are typically used
in a local way, but for some reason, we could even define them as global
variables (Section 4.6.4).
function is to know the Python keyword for that which is given by def, telling
us that we are going to define a function. Listing 4.37 shows an example
for a simple function that is named sum and takes three input parameters x, y,
and z. Those are summed up and the result is returned as the only output
parameter. It may be noted that a function can have as many input and output
parameters as we like (separated by commas); no inputs and no outputs
are possible as well. Returning a result is done by the return statement
(typically in the last line of a function definition). This means the function is
completely processed, we return in the control flow to the place where the
function was called, and we process the next statements, but now we know
the result of a computation (a simple or complex one) and can use it in the
program.
def sum(x, y, z):   # note: this definition shadows Python's built-in sum()
    return x + y + z
Listing 4.37 A simple function definition in Python
Exercises
• Exercise 4.6.1.1: Define a function that computes the factorial of n, that
is, the product of all natural numbers from 1 to n.
• Exercise 4.6.1.2: Define a function that takes two lists with numeric
values as arguments and adds them element by element, returning a new
list containing the sums of the elements.
function happens by its name and the parameter list while the parameters are
replaced by real values in the call, to allow the function to be executed and
to compute a result. Listing 4.38 gives an example showing how to call a
function, in this case the function sum on three arguments from Listing 4.37.
The three values are given as variables and build the input parameters of the
sum function. Since the function is already properly defined, it is known what
to do with these values while the returned value from the function is finally
assigned to another variable called result which can be seen in the main code.
value1 = 2.3
value2 = 4.5
value3 = 1.3

result = sum(value1, value2, value3)
Listing 4.38 Calling a function happens by using its name and its parameter list
Exercises
• Exercise 4.6.2.1: Call the function in Listing 4.37 to compute the sum of
the values for each parameter by varying the value between 0 and 100.
Hint: Use a nested for-loop.
• Exercise 4.6.2.2: Extend the function from Listing 4.37 to allow an
unknown number of numeric arguments. Call the function by varying
the number of the arguments.
Exercises
• Exercise 4.6.3.1: Define a function for computing the average length of
strings in a given string list. Use the function len() to compute the length
of each string.
• Exercise 4.6.3.2: Define a function to compute the ratio between
maximum and minimum of a given list of floating point numbers. Use
the functions min and max.
nFact = 5

def nFactorial(n):
    global nFact

    for i in range(1, n+1):
        nFact = nFact * i
    return nFact
Listing 4.42 A global variable inside a function definition
Exercises
• Exercise 4.6.4.1: Define a function that computes the product of 3
natural numbers. Use a global variable to store the result of the
computation.
situation in which recursion might make sense. The idea behind this concept
is that the problem can be made smaller and smaller (similar to using loops)
until a basic case is reached for which we know the answer. Typically, this
process generates a chain or tree of executions that have to be handled either
during the recursion or after the recursion has terminated. In this section we
look into both perspectives, the first one sometimes bringing problems with
memory consumption since all of the executions might produce values that
have to be stored somewhere until we reach the stage of putting everything
together to obtain the result. Listing 4.43 illustrates an example of a traditional
recursive function for computing the factorial of a natural number n. Be careful
with the valid numbers that can be used as input values for this function.
If a negative number is given as input the result will always be 1, which
is mathematically not correct, that is, undefined. A similar aspect holds for
floating point numbers. Another big issue with recursion, apart from memory
consumption, can be the fact that the termination condition is never reached,
ending in a never-ending recursive call.
def nfactorial(n):
    if (n > 0):
        return n * nfactorial(n - 1)
    else:
        return 1
Listing 4.43 A recursive function for computing the factorial of a natural number n
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
Listing 4.44 The Fibonacci numbers can be computed in a recursive way generating some
kind of tree structure for the recursive calls
              ⎧ n + 1                        if m = 0
Ack(m, n) :=  ⎨ Ack(m − 1, 1)                if m > 0, n = 0        (4.2)
              ⎩ Ack(m − 1, Ack(m, n − 1))    if m > 0, n > 0
The Ackermann function, which got its name from Wilhelm Ackermann, is
said to be one of the simplest examples of a function that is total computable
but not primitive recursive [188]. From a programming perspective
we can implement the function as in Listing 4.45.
def acker(m, n):
    if m == 0:
        return n + 1
    elif m > 0 and n == 0:
        return acker(m - 1, 1)
    else:
        return acker(m - 1, acker(m, n - 1))
Listing 4.45 A recursive implementation in Python for the Ackermann function
Not only the memory consumption but also the runtime of such recursive
functions can be terribly high, which makes them unusable in their 'traditional'
implementation. This can be seen in the example of the Fibonacci numbers,
but it is even more visible if we run the Ackermann function example. One
problem with the recursion can be that the execution chain or tree gets really
large; another problem can be that many calls get repeatedly computed again
and again although the result is already known. These two problems can be
solved in some cases if we use an iterative version of the recursion, sometimes
also called tail recursion (or iterative recursion, repetitive recursion). The
idea behind tail recursion is that the result of intermediate subexpressions
is already computed before the next recursion step is done. This reduces
the required memory a lot, as well as the chance of recomputing the same
expressions all the time, that is, memory and time can be saved at the same
time with this simple idea. Listing 4.46 gives an example of a tail recursive
function for the Fibonacci numbers.
def fib(n, a, b):
    if n <= 1:
        return a + b
    else:
        return fib(n - 1, b + a, a)

def tailrecursivefib(n):
    return fib(n - 1, 1, 0)

print(tailrecursivefib(10))
Listing 4.46 A tail recursive function for the Fibonacci numbers
The variables a and b in the listing are responsible for the intermediate
computations, hence the next recursion step always needs these intermediate
results to proceed. This reduces the memory consumption.
Exercises
• Exercise 4.7.1.1: Evaluate the Ackermann function for (1,1), (2,2), and
(3,3). What are the results? Do you run into challenges when getting
those results?
• Exercise 4.7.1.2: Define a tail recursive function for reversing a list of
natural numbers.
Exercises
• Exercise 4.7.2.1: Define a higher-order function that creates a function
for adding 3 floating point numbers while one of the numbers is fixed in
the created function that is returned.
• Exercise 4.7.2.2: Define a higher-order function that returns two
functions, one for adding and one for multiplying the two given numbers
similar to the example in Listing 4.47.
y = lambda x: x * 4
print(y(3))

z = lambda x, y: x**2 + 2*x*y - 4*x + 3
print(z(3.4, 5.1))
Listing 4.48 Lambda expressions can be used as some kind of anonymous functions
The good thing about lambda expressions comes from the fact that they
can even be included in other functions, like the example we have shown in
Section 4.7.2 on higher-order functions. Here we could modify the example
to return a lambda expression instead of a named inner function.
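Such a combination of a higher-order function and a lambda expression can be sketched as follows (the function names here are made up for illustration):

```python
def makeAdder(fixed):
    # The returned anonymous function adds the fixed value to its argument.
    return lambda x: x + fixed

addFive = makeAdder(5)
result = addFive(3)   # evaluates to 8
```
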
Exercises
• Exercise 4.7.3.1: Define an anonymous function for multiplying 3
numbers by using a lambda expression.
• Exercise 4.7.3.2: Define an anonymous function for checking a given
input string on containing exactly 3 digits and 2 uppercase letters.
sampling is done, reading and processing the data only if this is necessary,
but this again brings into play other challenges.
In this section, we are going to first describe user input as a way to
communicate textual information to a system. This 'user data' can then
be used as a dialogue between the users and the system, or it might be
used to analyze user feedback about a system (Section 4.8.1). Directly
reading from a file, either textual or binary data, is important to get real
data into a visualization tool; however, the data can come in a multitude
of types and formats that have to be taken into account while reading
and parsing it (Section 4.8.2). In some scenarios we might wish to write
data to a data file, for example after a data exploration process when
storing the relevant information (Section 4.8.3). Sometimes the data to be
explored is stored in a local data source, but in many more situations
we access the data from a server, online, that is, as a web-based
data reading approach. This is also beneficial for real-time data that is
regularly updated on a server and allows an up-to-date state of the
visualization tool (Section 4.8.4).
One issue with the input function is the fact that Python always returns
the input as a value of the data type string. If we are asking for numeric
inputs like integers or floating point numbers we are not completely lost, but
we have to convert the string into the corresponding number first (see
Listing 4.51).
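The conversion can be sketched like this (the prompt text is made up; input() is replaced here by a fixed string so the snippet is self-contained):

```python
# In an interactive program this line would be:
#   userInput = input('Please enter a number: ')
userInput = '42'

number = int(userInput)   # convert the string into an integer
doubled = number * 2      # now numeric operations work as expected
```
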
Exercises
• Exercise 4.8.1.1: Write Python code to ask users about their hobbies and
store those hobbies in a list of strings.
• Exercise 4.8.1.2: Write a function that asks a user about an integer
number n and then computes the factorial of n. Can you also write this
function as an anonymous lambda expression?
With read(), we always read the whole content of the file, but in most
situations we only want to read a small piece of the file, maybe line by
line or even character by character. Listing 4.53 shows an example of this.
Reading line by line works by using some kind of line iterator that moves one
line further after each call of the function.
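Such line-by-line reading can be sketched as follows (the file name is hypothetical; the snippet first writes a small sample file so it can run on its own):

```python
# Create a small sample file so the example is self-contained.
with open('sample.txt', 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

lines = []
f = open('sample.txt', 'r')
for line in f:                    # the file object acts as a line iterator
    lines.append(line.rstrip('\n'))
f.close()
```
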
When we have finished reading content from a file we have to close it
to avoid unwanted side effects, such as content that has still not been read
due to internal issues that we cannot easily understand from a programming
perspective. Such negative effects are typically caused by buffering problems
or by several processes reading from or writing to the same file. This can
happen not only after reading, but also after writing content to files (see
Section 4.8.3).
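A common way to guarantee the closing step is the with statement, which closes the file automatically when the block is left, even if an error occurs inside it (the file name is illustrative):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:           # opened for writing
    f.write("some content\n")

# outside the with block the file is already closed
print(f.closed)  # True

with open(path) as g:
    content = g.read()
print(g.closed)  # True; no explicit close() call was needed
```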
If the data format is based on comma-separated values (CSV) we can also
use a so-called Pandas dataframe to read the content directly into an internal
data structure. This frees us from reading the text file line by line and from
carefully parsing it into the corresponding data structure elements like rows
and columns. The CSV format reflects tabular data, typically shown to a user
in a spreadsheet application such as Excel. Listing 4.54 illustrates how to read
tabular data from a file by making use of a Pandas dataframe.
import pandas as pd

df = pd.read_csv("C:\\Users\\Michael\\csvfile.csv")
Listing 4.54 Reading tabular data by using a Pandas dataframe
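Once read_csv has done its work, rows and columns are directly accessible. A small sketch with an in-memory CSV (io.StringIO stands in for a file path, and the data is purely illustrative):

```python
import io
import pandas as pd

csv_text = "name,rooms,smeters\nvilla,11,240\nflat,3,80\n"
df = pd.read_csv(io.StringIO(csv_text))  # StringIO stands in for a file path

print(df.shape)             # (2, 3): two rows, three columns
print(df["rooms"].sum())    # 14
print(df.iloc[0]["name"])   # villa
```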
import imageio as imio

# reading the image
image = imio.imread("C:\\Users\\Michael\\logo.png")

binfile = open("C:\\Users\\Michael\\logo.png", "rb")
test = binfile.read(10)
binfile.close()
Listing 4.55 Reading images or binary data cannot be done by the same procedure as for reading texts
Exercises
• Exercise 4.8.2.1: Write Python code to read a given text file line by line.
Then count the characters by using the len() function for each string and
sum up all numbers to get the size of the file.
• Exercise 4.8.2.2: Read a given text file character by character and reverse
each word in the text file.
When a file does not exist it will be created; otherwise, the file might
be overwritten accidentally. The safest option is the "x" mode, since then
an error is raised if the file already exists, which avoids losing content.
After the file has been created with the "x" option, its content can be
safely modified.
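The behavior of the "x" mode can be sketched as follows (a temporary directory keeps the example self-contained and repeatable):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fresh.txt")

with open(path, "x") as f:       # "x": create the file, fail if it exists
    f.write("created safely\n")

try:
    open(path, "x")              # second attempt: the file exists now
    clobbered = True
except FileExistsError:
    clobbered = False            # nothing was overwritten

print(clobbered)  # False
```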
Exercises
• Exercise 4.8.3.1: Create a new file called myNewFile.txt and put the
numbers from 1 to 1000 on the file.
• Exercise 4.8.3.2: Append the numbers from 2000 to 3000 to the file
myNewFile.txt and output the content by directly reading from the file.
from urllib.request import urlopen

page = urlopen("http://www.futbol24.com")

content = page.read()
htmltext = content.decode("utf-8")

print(htmltext)
Listing 4.57 Reading data from a web page
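The decode() step is needed because read() on the page object returns raw bytes rather than a string. In isolation, with a stand-in byte string so that no network access is required:

```python
# Stand-in for content = page.read(); urlopen delivers bytes, not str.
content = b"<html><body>Futbol24</body></html>"

htmltext = content.decode("utf-8")   # bytes -> str using the page encoding

print(type(content).__name__)   # bytes
print(type(htmltext).__name__)  # str
print("Futbol24" in htmltext)   # True
```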
Exercises
• Exercise 4.8.4.1: Write code to read the HTML content from the online
version of your favorite newspaper.
• Exercise 4.8.4.2: Write code to fill a list with numeric values by reading
the scores from a web page that provides football tables.
4.9 Object-Oriented Programming

4.9.1 Classes
A class can be regarded as a kind of blueprint: it defines the internal
structure on which every object created later from this specific class is
based. Classes can even be used inside other classes, just like a nested
structure. Each class follows some well-defined rules (see Listing 4.58). A
house could be modeled as a class with properties like the number of rooms,
the square meters, the floors, the address, and the like. Moreover, a house
could even have some behavior like being dirty, being empty, getting built,
and the like. A house could also have people living in it, that is, the people
themselves can be a property of the house; but the people have properties
as well, which can be modeled by another class, hence the house
class could include the people class, for example. A class does not contain
data or values; it is just a specification of which data should be integrated in
the corresponding objects later on. With each class we can create as many
instances/objects as we need in our program. As a coding convention, class
names are written with an uppercase letter at the beginning to make the
instantiated objects distinguishable later on from standard variables that use
lowercase letters.
class House:
    toRent = False  # class attribute

    def __init__(self, rooms, smeters):
        self.rooms = rooms      # instance attribute
        self.smeters = smeters  # instance attribute
Listing 4.58 Creating a class
In Listing 4.58, we see the definition of a House class with a so-called
__init__ function that is used to initialize the objects created later with
initial values for the given parameters, that is, it sets the state of the
corresponding object. It may be noted that __init__ can have any number of
parameters; however, the first one is always self, which refers to the object
being created and allows new attributes to be attached to it. This can be
seen in the two code lines inside the __init__ method, which actually set or
assign the values of the attributes passed in during the creation of the object.
They are called instance attributes. In contrast to instance attributes we find
class attributes, which carry the same value for all instances, while instance
attributes hold individual values for each instance of a class.
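The contrast between the two attribute kinds can be made visible in a few lines (a reduced House class, not the book's full listing):

```python
class House:
    toRent = False                # class attribute: one value shared by all

    def __init__(self, rooms):
        self.rooms = rooms        # instance attribute: individual per object

a = House(3)
b = House(5)

House.toRent = True               # a change on the class level...
print(a.toRent, b.toRent)         # ...is visible in all instances: True True
print(a.rooms, b.rooms)           # instance values stay individual: 3 5
```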
Exercises
• Exercise 4.9.1.1: Create a class Student that includes typical properties
and behaviors of students.
House(15, 337)  # stored at a memory address

myHouse = House(11, 240)   # assigned to a variable
myHouse2 = House(9, 189)   # assigned to another variable

myHouse == myHouse2  # results in False
Listing 4.59 Instantiating from a class to get an object
Exercises
• Exercise 4.9.2.1: Write a class Student and create some Student objects.
They should have a name, an age, a list of grades, and a gender.
• Exercise 4.9.2.2: Write a class Car and create some Car objects. Those
objects should be stored in a list.
4.9.3 Methods
Methods are a way to modify the instance attribute values from outside, that
is, after an object has been created we should not modify its state by directly
accessing and changing the values, but instead everything should happen
via methods, a special type of functions that actually belong to instances
of a class. For example, we could directly access the values of the rooms
and smeters instance attribute from the House object myHouse as shown in
Listing 4.60. For class attributes this works in the same style.
rooms = myHouse.rooms
squareMeters = myHouse.smeters

isRentable = myHouse.toRent
Listing 4.60 Accessing the values of some instance attributes without using methods
Apart from accessing those values, they can even be changed in a similar
way, just like assigning values to variables. However, this strategy does not
follow the encapsulation principle: the values should only be accessed and
modified by so-called instance methods. Those methods are also defined and
coded in the body of a class and can be used by each instance of that class
in the same way, just like the instance variables. They also take self as the
first parameter in the parameter list, which works in the same way as
for the instance variables (see Listing 4.61). The calling syntax, however, is a
bit different from that of standard Python functions. Methods are always
bound to an object, hence they are called by stating the name of the object
first, followed by a dot, followed by the corresponding method name (see
Listing 4.62).
class House:
    toRent = False  # class attribute

    def __init__(self, rooms, smeters):
        self.rooms = rooms      # instance attribute
        self.smeters = smeters  # instance attribute

    def getRoomNumber(self):
        return self.rooms

    def getSquarePerRoom(self):
        return self.smeters / self.rooms
Listing 4.61 Adding methods to a class
myHouse = House(8, 209)

rooms = myHouse.getRoomNumber()
roomAverage = myHouse.getSquarePerRoom()
Listing 4.62 Creating an object from a class and calling methods
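Getters are shown in Listing 4.61; setting values via a method works analogously. A sketch of a setter, where the validation rule is illustrative and not part of the book's House class:

```python
class House:
    def __init__(self, rooms, smeters):
        self.rooms = rooms
        self.smeters = smeters

    def setRooms(self, rooms):           # encapsulated write access
        if rooms <= 0:                   # illustrative sanity check
            raise ValueError("a house needs at least one room")
        self.rooms = rooms

myHouse = House(8, 209)
myHouse.setRooms(9)
print(myHouse.rooms)  # 9
```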
Exercises
• Exercise 4.9.3.1: Create a class Student, add class and instance
attributes, and complete the class with several methods for getting
and setting the values of the instance attributes.
• Exercise 4.9.3.2: Add another method to the class that returns the name
of the student in capital letters.
4.9.4 Inheritance
In some real-world scenarios we find objects or persons with similar
properties but they still differ by some other properties. But somehow the
core of each object or person is the same. In such a scenario we wish to
have a strategy that avoids reimplementing the core properties as well as the
functionality all the time. It seems as if the additional properties must be some
kind of new implemented code while the core, that is, the same properties
might be somehow reused from existing code. The principle behind this idea
is called inheritance since it can categorize classes into parent classes and
child classes that get all of the properties and functionality from the parent
classes but they can have more properties and functionality than the parent
classes. This concept forms some kind of hierarchy, however in Python we
can also merge classes, similar to the real-world situation between humans,
but in programming this inheritance concept is even more flexible. Although
the child classes inherit the attributes (properties) and methods (functionality)
from the parent classes they can even use the inherited aspects in a more
specific form while they can also extend their functionality. Listing 4.63
shows examples to create child classes from the parent class House from
162 Coding in Python
class TinyHouse(House):
    pass

class Hotel(House):
    pass

class TreeHouse(House):
    pass

hotelCalifornia = Hotel(250, 8346)
smallHouse = TinyHouse(1, 5)
natureHouse = TreeHouse(2, 8)
Listing 4.63 Parent and child classes for using the principle of inheritance
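Beyond the empty pass bodies in Listing 4.63, a child class can add attributes and override inherited methods. A sketch, where the stars attribute and the lobby correction are purely illustrative additions:

```python
class House:
    def __init__(self, rooms, smeters):
        self.rooms = rooms
        self.smeters = smeters

    def getSquarePerRoom(self):
        return self.smeters / self.rooms

class Hotel(House):
    def __init__(self, rooms, smeters, stars):
        super().__init__(rooms, smeters)    # reuse the parent initializer
        self.stars = stars                  # additional, hotel-specific attribute

    def getSquarePerRoom(self):             # override: exclude the lobby area
        return (self.smeters - 100) / self.rooms

hotelCalifornia = Hotel(250, 8346, 4)
print(hotelCalifornia.rooms)                # inherited attribute: 250
print(hotelCalifornia.getSquarePerRoom())   # overridden method: 32.984
```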
Exercises
• Exercise 4.9.4.1: Define another kind of house that inherits from the
House class.
• Exercise 4.9.4.2: Define another kind of hotel that inherits from the
Hotel class.
5
Dashboard Examples
plots can be controlled while also separate tabs are supported. Moreover, we
introduce a simple external CSS file to show how the interface parameters
can be controlled globally, as an additional mechanism apart from inline
CSS, which can become tedious when the user interface components appear
in numerous places and have to be visually enhanced one by one.
Plotly templates are also introduced in this example. In Section 5.4, we
show how to let an interactive diagram be an input option for another plot.
This concept allows us to react to user interactions in the visualizations by
interactively modifying other visualizations, which is a first step toward the
popular brushing and linking [243] feature from the research field of
information visualization [245]. The callback mechanism can be based on an
arbitrary number of input and output parameters, which is also shown in this
dashboard example. As an add-on to the Plotly diagrams we will use go
objects as an alternative to pure Plotly diagrams. An even more complex
dashboard example is explained in Section 5.5, integrating several plots in
the user interface while also supporting tabs to switch between two visual
alternatives. For example, in a scatter plot we can select point clouds that are
then shown in a density heatmap, and the categories of the point distribution
are also shown in a color-coded bar chart. Even more input features are
implemented as dash core components.
To run the dashboard codes successfully we recommend using the
package versions python 3.9, dash 2.11.1, numpy 1.25, pandas 2.0.3,
dash-bootstrap-components 1.4.1, and scikit-learn 1.3.0. The Python codes
can also be found in a GitHub repository at
https://github.com/BookDashboardDesign. In case readers have questions,
they can send them to BookDashboardDesign@gmail.com to get answers or
useful hints.
who is familiar with these kinds of diagrams. We first start with a
hand-drawn mockup to get a better visual idea of what is expected from
such a dashboard (Section 5.1.1). As a next step we illustrate the Python
code for getting this dashboard running (Section 5.1.2). As a last step, we
describe what we will see and which interactions are possible when running
the code, that is, we see our designed and implemented dashboard in
action (Section 5.1.3).
5.1.1 A simple dashboard with a histogram
Before implementing a dashboard, it is a good idea to think about its design,
that is, the design [13] of the user interface with all of its components
but also the design of the incorporated visualization techniques. Moreover,
the layout and the aesthetics [38], that is, visual decoration of all of the
components is of importance. To get an impression about all the components
and their locations in the display as well as possible interaction techniques
and how the components are linked to each other it might be good to draw
the dashboard, in the best case by hand since that allows the highest degree of
flexibility [250] (see Section 2.4). Figure 5.1 gives a visual impression about
the ingredients in the dashboard, however, the interaction techniques must be
described in textual form since it is difficult to illustrate them visually due to
the lack of animation in a book.
Figure 5.1 A hand-drawn mockup of a dashboard for interactively modifying the color of a
histogram (drawn by Sarah Clavadetscher).
Exercises
• Exercise 5.1.1.1: Design a dashboard that uses a box plot instead of a
histogram to show the data distribution.
• Exercise 5.1.1.2: Design a dashboard that integrates a value slider to
select options for colors like 0 for red, 1 for green, and 2 for blue.
The dashboard or app actually gets started in Line 7 with the creation of
a Dash object. Since each visualization needs some kind of data, we generate
our own artificial dataset [134], which gives us some flexibility in the dataset
size, structure, and complexity, and we are not restricted to a specific dataset.
In Line 12, the data generation process is illustrated by using a Pandas
DataFrame that consists of random normally distributed data; it is
actually univariate data, just mapping a number to each data object, while
each object can be represented on a numerical scale. As we already know,
one traditional and prominent diagram for this type of data is the so-called
histogram, which we will also use in the dashboard. Lines 17 to 24 illustrate
how the dashboard's layout can be built. Since our dashboard is similar to a
web page, we can make use of HTML and in particular the division element
(div) to hierarchically structure the page. We can see some components,
the first one given in Line 18 as a title of the dashboard in H1 font size. Lines
19 to 22 add a drop-down menu for the three color options as a dash core
component with some additional properties. In Line 23 we also add a graph
as a dash core component, which can actually be any diagram, but we already
decided to integrate a histogram for the univariate data.
The callback mechanism is coded in Lines 26 to 29. We see that it is
composed of inputs and outputs; in this simple dashboard we only allow one
input (a drop-down menu) and one output (a diagram, which is a histogram
in this special case). The function that is responsible for updating
the dashboard and corresponds to the callback mechanism is located
right below the callback (Lines 31 to 38) and must have the same signature as
the callback itself, otherwise we run into errors, or even semantic
errors in case the input and output types are the same but the values are mixed
up in some way. The update function can be named as the developer likes, but it
may be noted that in case we have many callbacks and many such update
functions, they should have different, that is, unique names. In the update
function in this example we see an input parameter that carries the color
value modifiable in the drop-down menu (Line 31) as well as one
return value, which is the updated figure, in this case a histogram (Line
38). The histogram itself gets the artificial dataset as a dataframe (Line 33),
an attribute named number (Line 34), and a color coding (Lines 35 and 36).
In Lines 40 and 41 the dashboard is started on a server, which in its current
implementation is the localhost.
Exercises
• Exercise 5.1.2.1: Implement a dashboard that shows a box plot for which
we can interactively manipulate the color by using a drop-down menu.
• Exercise 5.1.2.2: Implement a dashboard that uses a slider instead of a
drop-down menu to select the colors with options like 0 = red, 1 = green,
and 2 = blue.
Figure 5.2 After executing the dashboard code we get this graphical user interface (dash-
board) with a drop-down menu and a blue colored histogram.
spanning a few pixels in horizontal direction. This gives room for further
layout improvements and adjustments later on. We can find the same negative
issue in the histogram which is currently horizontally stretched. In the next
implementation iterations, we will incorporate more and more functionality,
but we also look into aesthetic improvements and visual decorations.
Exercises
• Exercise 5.1.3.1: Check the features provided in the dashboard given
in Figure 5.2. How would you add more options for colors in the
dashboard?
• Exercise 5.1.3.2: For the dashboard in Figure 5.2, we could also integrate
other diagram types apart from a histogram. Which ones do you consider
useful for the same dataset and how do you integrate them in the
dashboard?
allow more flexibility for the layout of the dashboard. The input options are
a drop-down menu and a slider with which a numeric value can be selected
that has an impact on one or several diagrams showing data in a visual way.
A histogram is useful for univariate data, that is, data which is measured
under just one attribute. A scatter plot, on the other hand, can be used to show
correlations between two chosen attributes, that is, bivariate data. Each data
element is measured under two, typically numeric, attributes, which allows
a spatial representation of each of the two-dimensional data points. The
distribution of the points in the two-dimensional plane can be visually
explored for patterns, for example positive or negative correlations.
However, a static scatter plot will only tell us half of the truth, hence it is
a good idea to allow interactions like filtering for a certain numeric value.
The section is organized as follows: In Section 5.2.1, we introduce our design
idea in the form of a hand-drawn mockup with descriptions of the individual
components and interaction techniques. In Section 5.2.2, we explain the
code to implement such a dashboard, and finally we show the result of the
running code as a screenshot (Section 5.2.3).
Figure 5.3 A mockup of a dashboard with a drop-down menu and a slider for manipulating
the color of a histogram and for filtering a scatter plot (drawn by Sarah Clavadetscher).
Exercises
• Exercise 5.2.1.1: Design a dashboard that integrates a range slider
instead of a regular slider for the scatter plot.
• Exercise 5.2.1.2: Design a dashboard that shows the scatter plot on the
left-hand side and the histogram on the right-hand side. Moreover, the
inputs, in the form of a drop-down menu and a slider, should be placed
below the diagrams and not above them.
• math: This module is needed for mathematical functions like ceil, floor,
factorial, comb, and many more.
• dash_bootstrap_components: This library consists of so-called Bootstrap
components with the purpose of styling dashboards and apps, that is,
with a focus on user interface layouts, for example.
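The listed math helpers behave as follows; in the dashboard, floor and ceil are used to round the slider bounds outward to whole numbers:

```python
import math

print(math.floor(-3.2), math.ceil(-3.2))  # -4 -3 (floor rounds down, ceil up)
print(math.floor(7.9), math.ceil(7.1))    # 7 8
print(math.factorial(5))                  # 120
print(math.comb(5, 2))                    # 10 ways to choose 2 out of 5
```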
 1  import math
 2  from dash import Dash, dcc, html, Input, Output
 3  import plotly.express as px
 4  import numpy as np
 5  import pandas as pd
 6  import dash_bootstrap_components as dbc
 7
 8  app = Dash(__name__,
 9             external_stylesheets=[dbc.themes.BOOTSTRAP])
10
11  # generate random normally distributed data for x and y
12  # and store it in a pandas DataFrame
13
14  df = pd.DataFrame({'y': np.random.normal(loc=0,
15                                           scale=10,
16                                           size=1000),
17                     'x': np.random.normal(loc=10,
18                                           scale=2,
19                                           size=1000)})
20
21  app.layout = html.Div([
22      html.H1("Dashboard 2"),
23
24      dbc.Row([
25          dbc.Col([dcc.Dropdown(options=['red',
26                                         'green',
27                                         'blue'],
28                                value='red',
29                                id='color',
30                                multi=False)], width=6),
31          dbc.Col([dcc.Slider(min=math.floor(df['y'].min()),
32                              max=math.ceil(df['y'].max()),
33                              id="min_value")
34                   ], width=6)
35      ]),
36
37      dbc.Row([
38          dbc.Col([
39              dcc.Graph(id="graph_1")
40          ], width=6),
41
42          dbc.Col([
43              dcc.Graph(id="graph_2")
44          ], width=6)
45      ])
46
47  ], className="m-4")
48
49
50  @app.callback(
51      Output("graph_1", "figure"),
52      Input("color", "value")
53  )
54  def update_graph_1(dropdown_value_color):
55
56      fig = px.histogram(df,
57                         x="y",
58                         color_discrete_sequence=
59                         [dropdown_value_color])
60      fig.update_layout()
61      return fig
62
63
64  @app.callback(
65      Output("graph_2", "figure"),
66      Input("min_value", "value")
67  )
68  def update_graph_2(min_value):
69      dff = df[df['y'] > min_value]
70      fig = px.scatter(dff, x='x', y='y')
71      fig.update_layout()
72      return fig
73
74  if __name__ == '__main__':
75      app.run_server(debug=True, port=8000)
Listing 5.2 Including a histogram and a scatter plot in a dashboard with additional bootstrap for the layout
After all imports have been made, the rest of the code describes the
functionality and features of the dashboard. Lines 8 and 9 initialize the
dashboard and include Bootstrap to improve the layout of the user
interface. The data is artificially generated in Lines 14 to 19 as a Pandas
DataFrame with two attributes called 'x' and 'y'. The data has the additional
property that it is normally distributed in both data dimensions.
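The data-generation step can also be run in isolation to see that the sample statistics land close to the requested loc and scale parameters (the thresholds in the checks are generous, since the data is random):

```python
import numpy as np
import pandas as pd

# same data-generation idea as in the listing, shown on its own
df = pd.DataFrame({'y': np.random.normal(loc=0, scale=10, size=1000),
                   'x': np.random.normal(loc=10, scale=2, size=1000)})

print(df.shape)                        # (1000, 2)
print(abs(df['x'].mean() - 10) < 1)    # sample mean is close to loc: True
print(abs(df['y'].std() - 10) < 2)     # sample std is close to scale: True
```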
Figure 5.4 A grid layout may consist of a number of rows and columns, like two of each
in this case.
Exercises
• Exercise 5.2.2.1: Modify the dashboard code in a way that the scatter
plot can be filtered with a range slider allowing an interval of numeric
values.
• Exercise 5.2.2.2: Change the input–output mechanism: The scatter plot
should allow its color to be modified by a drop-down menu, and the
histogram should be filtered for value intervals, on the x-axis but also on
the y-axis.
Figure 5.5 The extended dashboard shows a few more features than the one given in
Section 5.1.1. Now we can see a slider and a scatter plot as well. Moreover, we also have to
take care of a good layout of the components, although we just have four of them at the moment.
Exercises
• Exercise 5.2.3.1: Modify the scatter plot in the dashboard to let it also
visually encode data in the size and shape of the points.
• Exercise 5.2.3.2: Modify the scatter plot in the dashboard to let it
use intervals for the numeric values instead of discrete numbers. Each
interval should be visually encoded as a point size and/or a point shape.
active at the moment on users' demands, that is, the users' current workspace is
exactly there while the other features and functionalities are still reachable in
a quick way, just by clicking on one of the other provided tabs. Such a concept
will be illustrated by the dashboard in this section, while from a visualization
perspective we will again focus on simple visualization techniques like a
histogram and a scatter plot. Readers can now create their own visual
features and exchange the existing ones with their own. The histogram's
color can still be modified by using a drop-down menu with colors, while
the scatter plot values can be filtered by a slider. The functionality is clearly
separated, that is, the drop-down menu belongs to the histogram and the
slider belongs to the scatter plot. Figure 5.6 illustrates how such a dashboard
can be imagined before we implement its functionality in the form of user
interface components, diagrams, and interactions. We also integrate CSS
as a concept to globally guide the appearance and layout of the individual
components. The Plotly diagrams can now be based on a certain template
as well, for example to make all of them look consistent. This idea is
similar to CSS, although with CSS we actually guide and equip the user
interface components with additional features, not primarily the
visualizations and diagrams.
Figure 5.6 A hand-drawn mockup of a dashboard for displaying data in a histogram and
a scatter plot while both diagrams and their inputs can be given a specific focus by a tab
mechanism (drawn by Sarah Clavadetscher).
Exercises
• Exercise 5.3.1.1: Design a dashboard that contains four tabs with four
diagrams and corresponding interaction options like a drop-down menu,
a slider, a text field, and a date picker.
• Exercise 5.3.1.2: Which kinds of features might be important to
dynamically adapt in a dashboard, that is, on users’ demands?
Discuss!
set to 25 px. The margins are also set for the content (second part) in the same
manner, while the third part, which concerns the tab content, just sets the top
margin to 60 px. Actually, in the CSS file we can define nearly any kind of
additional property a certain user interface component should have, not only
the margins but also colors, font sizes, border sizes, backgrounds, and many
more.
.header {
    margin: 0px 25px 25px 25px;
    /* margin-top margin-right margin-bottom margin-left */
}

.content {
    margin: 0px 25px 25px 25px;
}

.tab_content {
    margin-top: 60px;
}
Listing 5.3 A CSS file for improving the layout and aesthetics of the user interface of the dashboard
Listing 5.4 illustrates the code for the dashboard shown in the hand-drawn
mockup in Figure 5.6. The imported modules are already familiar from the
previous dashboard examples, consequently we will directly jump into the
Python code.
  1  import math
  2
  3  from dash import Dash, dcc, html, Input, Output
  4  import plotly.express as px
  5  import numpy as np
  6  import pandas as pd
  7  import dash_bootstrap_components as dbc
  8
  9  # new: tabs for a better overview
 10
 11  # new: external CSS -> main.css
 12  # (nothing must be changed in the code
 13  # if the css file is in the folder 'assets')
 14
 15  # new: plotly template="plotly_white"
 16  # https://plotly.com/python/templates/
 17
 18  app = Dash(__name__,
 19             external_stylesheets=[dbc.themes.BOOTSTRAP])
 20
 21  # generate random normally distributed data
 22  # for x and y and store it in a pandas DataFrame
 23
 24  df = pd.DataFrame({'y': np.random.normal(loc=0,
 25                                           scale=10,
 26                                           size=1000),
 27                     'x': np.random.normal(loc=10,
 28                                           scale=2,
 29                                           size=1000)})
 30
 31  app.layout = html.Div([
 32      html.Div(
 33          [html.H1("Dashboard 3")],
 34          className="header"),
 35      html.Div([
 36          dcc.Tabs(id="tabs",
 37                   children=[
 38                       dcc.Tab(label='Tab One',
 39                               id="tab_1_graphs",
 40                               children=[
 41                                   html.Div([
 42                                       dbc.Row([
 43                                           dbc.Col([dcc.Dropdown(
 44                                               options=['red',
 45                                                        'green',
 46                                                        'blue'],
 47                                               value='red',
 48                                               id='color',
 49                                               multi=False)],
 50                                               width=6),
 51                                           dbc.Col([dcc.Slider(
 52                                               min=math.floor(
 53                                                   df['y'].min()),
 54                                               max=math.ceil(
 55                                                   df['y'].max()),
 56                                               id="min_value")],
 57                                               width=6)
 58                                       ]),
 59                                       dbc.Row([
 60                                           dbc.Col([
 61                                               dcc.Graph(id="graph_1")],
 62                                               width=6),
 63                                           dbc.Col([
 64                                               dcc.Graph(id="graph_2")],
 65                                               width=6)
 66                                       ])
 67                                   ], className="tab_content"),
 68                               ]),
 69                       dcc.Tab(label='Tab Two',
 70                               id="tab_2_graphs", children=[
 71                                   html.Div([],
 72                                            className="tab_content")
 73                               ]),
 74                   ])
 75      ], className="content")
 76  ])
 77
 78  @app.callback(
 79      Output("graph_1", "figure"),
 80      Input("color", "value")
 81  )
 82  def update_graph_1(dropdown_value_color):
 83      fig = px.histogram(df,
 84                         x="y",
 85                         color_discrete_sequence=[dropdown_value_color])
 86      fig.update_layout(template="plotly_white")
 87      return fig
 88
 89  @app.callback(
 90      Output("graph_2", "figure"),
 91      Input("min_value", "value")
 92  )
 93  def update_graph_2(min_value):
 94      if min_value:
 95          dff = df[df['y'] > min_value]
 96      else:
 97          dff = df
 98      fig = px.scatter(dff, x='x', y='y')
 99      fig.update_layout(template="plotly_white")
100      return fig
101
102  if __name__ == '__main__':
103      app.run_server(debug=True, port=8000)
Listing 5.4 A dashboard using tabs and CSS as well as a Plotly template
The code for this dashboard is a bit more complex than the code for the
two previous dashboards. This is due to the fact that we included more features
and concepts, with CSS, tabs, and Plotly templates among them. In Lines 18
and 19, we initialize the dashboard by including external stylesheets with the
bootstrap mechanism. Lines 24 to 29 are responsible for generating artificial
data based on a random normal distribution. In case we need other artificial
data or real-life data, this is the place in the code where any kind of data
is put into a Pandas dataframe.
With Line 31, we begin setting the layout of the dashboard by using the
HTML division element again. This div element is split into two subelements,
splitting the display area for our dashboard (typically the computer
monitor) into two equally sized subregions that we can fill with
components separately. The first subregion, in Lines 32 to 34, just creates
a kind of title for the dashboard, styled by the CSS coming
from the main.css file given in Listing 5.3 via the className variable
set to "header". The header information can be found in the
corresponding section of the CSS file. The next subregion is introduced in
Line 35 with the next div element. This time the subregion looks a bit more
complex, starting with the dash core component Tabs given the id "tabs".
This component can have as many children as we like, in our case just two,
representing the two tabs we are planning to integrate. Each tab itself can
be added as a dash core component (dcc), starting with tab one in Line 38,
giving it a label and again an id, to later reference and access it with our
callback mechanism. The tab itself can also be suborganized by again using
the HTML div component. Now Bootstrap comes into play, organizing the
dashboard's user interface into rows and columns, including the drop-down
menu and the slider in the first row and the two diagrams in the second row
(Lines 42 to 67). It may be noted that the drop-down (Lines 43 to 49) as well
as the slider (Lines 51 to 56) can be decorated on the designer's demand.
The CSS styles are based on the tab_content section from the main CSS file
in Listing 5.3. The second tab (Lines 69 to 74) is just shown for illustrative
purposes; at the moment there is not much functionality, but in the next
dashboard we will also fill this tab with more functionality. The entire style
of this dashboard component is based on the style given by the "content"
section of the CSS file (Line 75).
In this dashboard we can find two callbacks, one for the dialogue between
the histogram and the user via a drop-down menu and one for the user
dialogue with the scatter plot via a slider. The first callback starts in Line
78 and defines one input value for the color selection and one output value
for the corresponding figure, which is a histogram in this special case (Lines
78 to 81). The update function for this callback can be seen in the following
code lines (Lines 82 to 87). We see that a histogram is created with Plotly,
with the data as a Pandas dataframe and further parameters. In Line 86 we
additionally find the template information given as "plotly_white." Finally,
the created figure is returned. The second callback starts in Line 89 with
an input value for the filter and a corresponding figure (a scatter plot) as
return value. The corresponding update function describes how this filter
value has to be handled and which impact it has on the created scatter plot
(Lines 93 to 100).
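The filter step inside the slider callback can be tried outside a running app. In this standalone sketch, a tiny made-up dataframe and a fixed value stand in for the app's data and the slider's current position:

```python
import pandas as pd

df = pd.DataFrame({'y': [-5.0, 0.0, 3.0, 12.0]})
min_value = 1.0  # stands in for the slider's current value

# same pattern as in the update function:
# filter only if a slider value is set
dff = df[df['y'] > min_value] if min_value else df
print(list(dff['y']))  # [3.0, 12.0]
```

This is the whole interaction logic: the callback receives the slider value, filters the dataframe, and hands the filtered rows to the scatter plot.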
Exercises
• Exercise 5.3.2.1: Implement functionality and features for the second
tab in the dashboard application and test it.
• Exercise 5.3.2.2: Create a dashboard with three tabs instead of two.
Figure 5.7 A dashboard showing two tabs, with tab one currently in focus. Two
diagrams are integrated: a histogram (left) and a scatter plot (right).
5.4 Inputs from a Plot and Plotly Go
Exercises
• Exercise 5.3.3.1: Create a dashboard with four tabs in which each tab
should be used to switch to a new diagram. You can use the same data
generator as in the dashboards explained before.
• Exercise 5.3.3.2: Modify the external CSS file to also adapt the
background colors of the tabs. Moreover, use a different Plotly template
instead of "plotly_white" to get another visual appearance of the
diagrams.
Figure 5.8 A hand-drawn mockup for a user interface of a dashboard with a scatter plot,
allowing the user to select a point cloud whose point distribution is shown in a linked,
color coded bar chart (drawn by Sarah Clavadetscher).
Exercises
• Exercise 5.4.1.1: Integrate a fourth color in the drop-down menu. Extend
the dashboard by this new color.
• Exercise 5.4.1.2: Apart from the visual variable color, we could also use
the shape of the points in the scatter plot. Extend the dashboard to
support both colors and shapes, in the drop-down menu, the scatter plot,
and the bar chart.
14
15 app = Dash(__name__,
16            external_stylesheets=[dbc.themes.BOOTSTRAP])
17
18 # generate random normal distributed data
19 # for x and y and store it in a pandas DataFrame
20
21 df = pd.DataFrame({'y': np.random.normal(loc=0,
22                                          scale=10,
23                                          size=1000),
24                    'x': np.random.normal(loc=10,
25                                          scale=2,
26                                          size=1000)})
27
28 # define cluster colors
29
30 COLORS = {'0': "red",
31           '1': "blue",
32           '2': "grey"}
33
34 X, y = make_blobs(n_samples=100,
35                   centers=3,
36                   n_features=2,
37                   random_state=0)
38 cluster_df = pd.DataFrame(data=X,
39                           columns=["X", "Y"])
40 cluster_df['cluster'] = [str(i) for i in y]
41
42 app.layout = html.Div([
43     html.Div(
44         [html.H1("Dashboard 4")],
45         className="header"),
46     html.Div([
47         dcc.Tabs(id="tabs",
48                  children=[
49                      dcc.Tab(label='Tab One',
50                              id="tab_1_graphs", children=[
51                          html.Div([
52                              dbc.Row([
53                                  dbc.Col([
54                                      dcc.Dropdown(
55                                          options=['red',
56                                                   'green',
57                                                   'blue'],
58                                          value='red',
59                                          id='color',
60                                          multi=False)],
61                                      width=6),
62                                  dbc.Col([
63                                      dcc.Slider(min=
64                                          math.floor(
65                                              df['y'].min()),
66                                          max=math.ceil(
67                                              df['y'].max()),
68                                          id="min_value")
69                                  ], width=6)
70                              ]),
71                              dbc.Row([
72                                  dbc.Col([
73                                      dcc.Graph(id="graph_1")
74                                  ], width=6),
75                                  dbc.Col([
76                                      dcc.Graph(id="graph_2")
77                                  ], width=6)
78                              ])
79                          ], className="tab_content"),
80                      ]),
81                      dcc.Tab(label='Tab Two',
82                              id="tab_2_graphs",
83                              children=[
84                                  html.Div([
85                                      dbc.Row([
86                                          dbc.Col([
87                                              dcc.Graph(
88                                                  id="graph_3")
89                                          ], width=8),
90                                          dbc.Col([
91                                              dcc.Graph(
92                                                  id="graph_4")
93                                          ], width=4)
94                                      ])
95                                  ], className="tab_content")
96                              ]),
97                  ])
98     ], className="content")
99 ])
100
101 @app.callback(
102     Output("graph_1", "figure"),
103     Input("color", "value")
104 )
105 def update_graph_1(dropdown_value_color):
106     fig = px.histogram(df,
107                        x="y",
108                        color_discrete_sequence=
109                        [dropdown_value_color])
110     fig.update_layout(template="plotly_white")
111     return fig
112
113 @app.callback(
114     Output("graph_2", "figure"),
115     Input("min_value", "value")
116 )
117 def update_graph_2(min_value):
118     if min_value:
119         dff = df[df['y'] > min_value]
120     else:
121         dff = df
122
123     fig = px.scatter(dff, x='x', y='y')
124     fig.update_layout(template="plotly_white")
125     return fig
126
127 @app.callback(Output("graph_3", "figure"),
128               Output("graph_4", "figure"),
129               Input("graph_3", "relayoutData")
130 )
131 def update_graph_3_and_4(selected_data):
132     if (selected_data is None or
133             (isinstance(selected_data, dict) and
134              'xaxis.range[0]' not in selected_data)):
135         cluster_dff = cluster_df
136     else:
137         cluster_dff = \
138             cluster_df[(cluster_df['X'] >=
139                 selected_data.get('xaxis.range[0]')) &
140                 (cluster_df['X'] <=
141                 selected_data.get('xaxis.range[1]')) &
142                 (cluster_df['Y'] >=
143                 selected_data.get('yaxis.range[0]')) &
144                 (cluster_df['Y'] <=
145                 selected_data.get('yaxis.range[1]'))]
146
147     fig3 = px.scatter(cluster_dff,
148                       x="X",
149                       y="Y",
150                       color="cluster",
151                       color_discrete_map=COLORS,
152                       category_orders=
153                       {"cluster": ["0", "1", "2"]},
154                       height=750)
155
156     fig3.update_layout(template="plotly_white",
157                        coloraxis_showscale=False)
158     fig3.update_traces(marker=dict(size=8))
159
160     group_counts = (
161         cluster_dff[['cluster', 'X']]
162         .groupby('cluster').count())
163
164     fig4 = go.Figure(
165         data=[go.Bar(
166             x=group_counts.index,
167             y=group_counts['X'],
168             marker_color=
169             [COLORS.get(i) for i in group_counts.index]
170         )])
171
172     fig4.update_layout(height=750,
173                        template="plotly_white",
174                        title="<b>Counts per cluster</b>",
175                        xaxis_title="cluster",
176                        title_font_size=25
177                        )
178
179     return fig3, fig4
180
181 if __name__ == '__main__':
182     app.run_server(debug=True, port=8012)
Listing 5.5 A dashboard with more than one plot in a callback, additionally using the Plotly
go object
The major implementation concepts in the code after the imports can be
described as follows. In Line 15, the dashboard is initialized with the external
style sheets from the dash bootstrap components. Lines 21 to 26 generate
an artificial dataset based on a random normal distribution. A constant
named COLORS is defined in Lines 30 to 32, mapping numeric cluster labels
to colors, used later for the individual point clusters in
the visualization. The following code in Lines 34 to 40 creates clusters of
data.
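The cluster generation from Lines 34 to 40 can be run on its own to inspect the resulting dataframe:

```python
import pandas as pd
from sklearn.datasets import make_blobs

# three Gaussian blobs in 2D; the labels are converted to strings
# so they match the keys of the COLORS constant
X, y = make_blobs(n_samples=100, centers=3,
                  n_features=2, random_state=0)
cluster_df = pd.DataFrame(data=X, columns=["X", "Y"])
cluster_df['cluster'] = [str(i) for i in y]
print(cluster_df.shape)                    # (100, 3)
print(sorted(set(cluster_df['cluster'])))  # ['0', '1', '2']
```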
The layout of the dashboard is built starting in Line 42, again
based on splitting the display area with the HTML div element. This layout
strategy is quite similar to the one in the previous dashboard. There is a tab
one with two rows and two columns (see Figure 5.4). The first row contains
the dash core components for the user input, like a drop-down menu and a
slider, while row two integrates the two diagrams in the form of a scatter plot
and a bar chart. Tab two just contains one row with two columns for two
more diagrams. It may be noted that the widths are arranged differently in the
second tab, with an 8-to-4 ratio, while in tab one we had an equal 6-to-6 ratio.
There are three callback mechanisms in this dashboard code. The first
one starts in Line 101, having one input, a color value from a drop-down
menu, and one output, a figure, that is, a diagram, which is a histogram
in this special case, as we can see in the corresponding update function
(Lines 105 to 111). Moreover, the figure uses a special update of its layout
based on a template called "plotly_white," which was already described in the
previous dashboard example. The second callback (Lines 113 to 116) with its
corresponding update function (Lines 117 to 125) is responsible for reacting
to the slider input, that is, if the user interactively changes a value by using
the slider, this value is directly passed to the corresponding scatter plot,
with the desired filter function implemented. This filter works on a copy of the
Pandas dataframe (Lines 118 to 121). The filtered data is then given to the
scatter plot, while again the template is set to "plotly_white" (Line 124).
The third callback, starting in Line 127 with its update function starting in
Line 131, is the most complex one compared to the previous two callbacks,
including some new concepts and features. First of all, we see one input which
stems from a graph called "graph_3" and which is passed to two outputs, that
is, "graph_3" itself and a different graph called "graph_4." This is the idea
of allowing brushing and linking, meaning the selected data elements in one
diagram can be the input for a different diagram, which actually sends data
between diagrams and not just "pure" inputs from dash core components in
the form of sliders, menus, date pickers, and many more. Lines 132 to 145
evaluate the selected data and create a cluster variable "cluster_dff" based on
the original variable "cluster_df." The updated diagram (Lines 147 to 154) is
then based on this filtered data, that is, the selected data elements are
color coded by using the defined colors from the COLORS constant in Lines
30 to 32. In Lines 156 to 158, we set the layout of the diagram based on
the template again, and we update the traces. To create the corresponding bar
chart with which the scatter plot is linked, we first need to count the number
of selected points together with their category, that is, color. This is done in
Lines 160 to 162 and stored in a variable "group_counts." Lines 164 to 170
create the bar chart by a new concept which is based on the so-called graph
objects in Plotly. Those go objects have a different syntax than the pure Plotly
express diagrams, as we can see in the code lines. Finally, in Lines 172 to 177,
the layout of the bar chart is updated by setting the height, the template, the
title, the description on the x-axis, and additionally a font size of 25.
In Line 179, both diagrams (the scatter plot and the bar chart) are returned,
which is in line with the corresponding callback mechanism in Lines 127 to
130 (one input, two outputs).
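The relayoutData dictionary that drives this callback can be simulated without a running app. The key names below are the ones Plotly sends after a zoom; the dataframe is a tiny made-up example:

```python
import pandas as pd

cluster_df = pd.DataFrame({'X': [0.0, 1.0, 2.0, 3.0],
                           'Y': [0.0, 1.0, 2.0, 3.0],
                           'cluster': ['0', '0', '1', '2']})

# what graph_3's relayoutData looks like after zooming into a region
relayout_data = {'xaxis.range[0]': 0.5, 'xaxis.range[1]': 2.5,
                 'yaxis.range[0]': 0.5, 'yaxis.range[1]': 2.5}

# keep only the points inside the zoomed axis ranges,
# exactly as the callback does
cluster_dff = cluster_df[
    (cluster_df['X'] >= relayout_data.get('xaxis.range[0]')) &
    (cluster_df['X'] <= relayout_data.get('xaxis.range[1]')) &
    (cluster_df['Y'] >= relayout_data.get('yaxis.range[0]')) &
    (cluster_df['Y'] <= relayout_data.get('yaxis.range[1]'))]
print(len(cluster_dff))  # 2 points fall into the zoomed range
```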
Exercises
• Exercise 5.4.2.1: The selected data points in the scatter plot should also
be represented in a new scatter plot, only showing the selected points.
• Exercise 5.4.2.2: Reimplement the dashboard to let the selected data
points appear in a highlighted yellow color.
Figure 5.9 Tab two is active in this dashboard, showing a scatter plot with color coded data
points and a linked bar chart in which the selected point clouds are visually encoded and
categorized by their colors/categories.
Exercises
• Exercise 5.4.3.1: Add a third tab in which we can see the distribution of
the selected data elements from the scatter plot based on their occurrence
on the x- and y-axis.
• Exercise 5.4.3.2: Create a three-dimensional scatter plot and integrate a
point selection mechanism. Discuss the usefulness of three-dimensional
visualizations in information visualization.
5.5 Two Tabs, Three Plots in One Tab, and Several Inputs
In this next dashboard example, we would like to extend the previous ideas
by a separate tab that supports the interactive and visual exploration of
trivariate data by means of a color coded scatter plot. The points are
embedded in the two-dimensional plane, with an additional color coding
that visually encodes the third attribute, while the other two attributes of the
trivariate data are encoded on the x- and y-axes. The scatter plot allows brushing
and linking, and the selected data points are shown in a corresponding bar chart
reflecting the data distribution of the selected point clouds, separated into their
color categories. Moreover, we require a third diagram that can display the
density information of the selected point clouds in the scatter plot based
on the powerful concept of heatmaps [30]. As illustrated in this even more
complex dashboard example, the designer and implementor can build more
and more features and functions, linked to each other and structured into feature
and function groups by so-called tabs. However, it may be noted that we
should not create too many such tabs, to avoid information overload,
increased cognitive effort, and a steep learning curve for our users.
The section is organized as follows: In Section 5.5.1 we introduce a
hand-drawn mockup showing the major features of this dashboard. We mainly
focus on the visual components and the additional interaction techniques
compared to the previously described dashboards. In Section 5.5.2, we look
into the details of the corresponding Python code and describe the most
important code components to get the dashboard running in its desired form.
In the last part (Section 5.5.3), we show the visual outputs of the running
code to give the readers an impression of what the dashboard will
look like after executing the code. We recommend that readers test the code
themselves, modify it, and check the new results. Extending
the code step-by-step might help to understand the dashboard design and
implementation on an experimental basis.
Exercises
• Exercise 5.5.1.1: Design a dashboard with six different visualization
techniques showing the same dataset from six different perspectives.
• Exercise 5.5.1.2: Which interactions are important for such a dashboard
scenario and which of your diagrams should be linked and in
what way?
1 * {
2     padding: 0px;
3     margin: 0px;
4     box-sizing: border-box;
5 }
6
7 html {
8 }
9
10 body {
11     font-family: Lato, sans-serif;
12     font-weight: 400;
13     margin-left: 15px;
14     margin-right: 15px;
15 }
16
17 .container {
18     margin: 0 auto;
19     max-width: 2000px;
20 }
21
22 .header--title {
23     margin-top: 20px;
24     margin-bottom: 10px;
25 }
26
27 .tab1--main-container {
28     display: grid;
29     grid-template-columns: 1fr 1fr;
30 }
31
32 .graph {
33     max-width: 100%;
34     height: 700px;
35     margin-top: 15px;
36 }
37
38 .graph-control {
39     max-width: 50%;
40     margin: 0 auto;
41     margin-top: 50px;
42 }
43
44 .tab2--main-container {
45     display: grid;
46     grid-template-columns: 1fr 1fr 1fr;
47     column-gap: 20px;
48     row-gap: 10px;
49     margin-top: 50px;
50 }
51
52 .graph_3 {
53     grid-column: 1 / -2;
54 }
55
56 .graph_5 {
57     grid-column: 1 / -1;
58 }
59
60 .graph_5_separated label:last-child {
61     margin: 20px
62 }
Listing 5.6 The external CSS file for a dashboard
Listing 5.7 starts again with the import of the relevant modules (Lines 1
to 8). Most of them we have already used in the previous dashboard examples.
A new one in the code is:
• helpers: This module provides data-related functionality like the
generation of random data, the generation of cluster data, or the update
of selected data.
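The helpers module itself is not printed here. A plausible sketch of its update_selected_data function, reconstructed from the inline filter used in Listing 5.5 (so an assumption, not the book's actual module), could look as follows:

```python
import pandas as pd

def update_selected_data(cluster_df, selected_data):
    """Sketch of helpers.update_selected_data (the real module is
    not shown here): return all rows if nothing is selected,
    otherwise filter by the zoomed axis ranges."""
    if (selected_data is None or
            (isinstance(selected_data, dict) and
             'xaxis.range[0]' not in selected_data)):
        return cluster_df
    return cluster_df[
        (cluster_df['X'] >= selected_data.get('xaxis.range[0]')) &
        (cluster_df['X'] <= selected_data.get('xaxis.range[1]')) &
        (cluster_df['Y'] >= selected_data.get('yaxis.range[0]')) &
        (cluster_df['Y'] <= selected_data.get('yaxis.range[1]'))]

demo = pd.DataFrame({'X': [0.0, 1.0, 5.0], 'Y': [0.0, 1.0, 5.0]})
print(len(update_selected_data(demo, None)))  # 3: no selection yet
```

Factoring this filter out of the callback keeps the callback bodies short and lets several callbacks share the same selection logic.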
We define a color constant again in Lines 18 to 20 and use external
stylesheets in Lines 22 to 25, which are included in Lines 28 and 29 when
starting the app. The random data is generated in Line 31 with a seed value of
8, and in Line 32 we create additional clusters. The layout of the dashboard
is then defined starting in Line 34 with the HTML division element again,
which creates a header and another division element (Line 37). In this container
we start building tabs (Line 38) with several subtabs organized as children.
In tab 1 (Lines 39 to 69) we again find HTML div elements to organize
and lay out the dash core components, which are a drop-down menu (Lines 42
to 51) and a graph standing for a Plotly diagram (Lines 52 to 54). The
className variables are used to attach the external CSS file features to the
corresponding core components in the dashboard. A second subcomponent of
tab 1 is built by the slider (Lines 57 to 63) and by another graph representing
a Plotly diagram (Lines 64 to 66). As we can see in this example, there
are several className variables attached to the used components, each
defining additional layouts and visual properties.
Tab 2 is coded in Lines 70 to 127 and is much more complex than the
code for the functionality provided by tab 1. Again we split the features and
functions by defining children of the tab environment, and we use an HTML
div element at the highest level in the tab (Line 72). Moreover, several other
division elements are used to subcategorize and lay out the tab’s content. We
start with two graphs called graph_3 and graph_4 (Lines 73 to 76), followed
by another division element containing yet another division element with
a label (Lines 79 to 81) and a drop-down menu (Lines 82 to 89). This is
repeated again with further features (Lines 90 to 105) and again for a label
and a RadioItems component (Lines 108 to 119). The last part builds a graph
component for graph_5 (Lines 120 to 122).
The following lines describe the callbacks, starting with one that takes
a color value as input and outputs a figure identified by graph_1
(Lines 130 to 133). The corresponding update function is placed below this
callback (Lines 135 to 141). It is responsible for updating the histogram
with a certain user-selected color value. The next callback (Lines 143 to
146) takes a min_value as input and outputs graph_2. The corresponding
update function can be found in Lines 148 to 156. It is responsible for
filtering the scatter plot based on the minimum value that is selected by the
user with a slider. Then, a more complex callback can be found in Lines 158
to 160 that takes some graph-related input and outputs two other graphs.
The idea in this callback is to allow inputs from one diagram and produce
outputs in other diagrams. The update function is coded in Lines 162 to 202.
Another callback (Lines 204 to 210) takes four values as inputs, three numbers
and one graph-related property, while it just outputs one new figure. The
corresponding update function can be found below (Lines 212 to 228). The
code is completed with the already familiar commands in Lines 230 to 231.
152         dff = df
153
154     fig = px.scatter(dff, x='x', y='y')
155     fig.update_layout(template="plotly_white")
156     return fig
157
158 @app.callback(Output("graph_3", "figure"),
159               Output("graph_4", "figure"),
160               Input("graph_3", "relayoutData"))
161
162 def update_graph_3_and_4(selected_data):
163     PLOT_HEIGHT = 400
164
165     cluster_dff = update_selected_data(
166         cluster_df=cluster_df,
167         selected_data=selected_data)
168
169     fig3 = px.scatter(cluster_dff,
170                       x="X",
171                       y="Y",
172                       color="cluster",
173                       color_discrete_map=COLORS,
174                       category_orders={"cluster":
175                                        ["0", "1", "2"]})
176
177     fig3.update_layout(
178         height=PLOT_HEIGHT,
179         template="plotly_white",
180         coloraxis_showscale=False)
181     fig3.update_traces(marker=dict(size=8))
182
183     group_counts = (
184         cluster_dff
185         [['cluster', 'X']].groupby('cluster').count())
186
187     fig4 = go.Figure(
188         data=[go.Bar(
189             x=group_counts.index,
190             y=group_counts['X'],
191             marker_color=
192             [COLORS.get(i) for i in group_counts.index]
193         )])
194
195     fig4.update_layout(height=PLOT_HEIGHT,
196                        template="plotly_white",
197                        title="<b>Counts per cluster</b>",
Listing 5.8 shows a code example with the same features and functionality
as in Listing 5.7, but this time CSS is not used; instead, we make use of
bootstrap. The imports in Lines 1 to 9 are already familiar to the reader. In
Line 17, we can find the first major difference compared to the code before,
which is the integration of the dash bootstrap components. After the data
generation, color settings, and cluster definition (Lines 22 to 46), we code the
layout of the dashboard based on HTML div elements, but this time we use
inline style commands, for example for the margin (Line 50). The structure of
the code is similar to the code example before, but this time we make use of
rows and columns based on the dash bootstrap components (starting in Line
59 and ending in Line 149 with the last column). The rest of the code again
defines callbacks and update functions, similar to the example code before.
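One detail worth knowing when moving from a CSS file to inline styles: Dash expects camelCase property names in the style dictionary, while CSS files use kebab-case. A small helper of our own (not part of Dash) illustrates the mapping:

```python
def css_to_style_key(prop):
    """Convert a kebab-case CSS property name (as used in main.css)
    to the camelCase key Dash expects in an inline style dict."""
    head, *rest = prop.split('-')
    return head + ''.join(word.capitalize() for word in rest)

print(css_to_style_key('margin-top'))             # marginTop
print(css_to_style_key('grid-template-columns'))  # gridTemplateColumns
```

Properties without a hyphen, such as the margin shorthand used in the listing below, keep their name unchanged.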
1 import math
2
3 from dash import Dash, dcc, html, Input, Output
4 import plotly.express as px
5 import plotly.graph_objects as go
6 import numpy as np
7 import pandas as pd
8 import dash_bootstrap_components as dbc
9 from sklearn.datasets import make_blobs
10
11 # New: density heatmap (2 columns) as third plot on tab 2
12 # with color and resolution options
13
14 # New: everything with inline style and bootstrap (no CSS)
15
16 app = Dash(__name__,
17            external_stylesheets=[dbc.themes.BOOTSTRAP])
18
19 # generate random normal distributed data for x and y
20 # and store it in a Pandas DataFrame (for plot 1, 2, and 5)
21
22 np.random.seed(seed=8)
23
24 df = pd.DataFrame({'y': np.random.normal(loc=0,
25                                          scale=10,
26                                          size=1000),
27                    'x': np.random.normal(loc=10,
28                                          scale=2,
29                                          size=1000)})
30
31 # define cluster colors
32
33 COLORS = {'0': "red",
34           '1': "blue",
35           '2': "grey"}
36
37 # generic cluster data (for plot 3 and 4)
38
39 X, y = make_blobs(n_samples=7500,
40                   centers=3,
41                   n_features=2,
42                   random_state=0,
43                   cluster_std=0.75)
44
45 cluster_df = pd.DataFrame(data=X, columns=["X", "Y"])
46 cluster_df['cluster'] = [str(i) for i in y]
47
48 app.layout = html.Div([
49     html.Div([html.H1("Dashboard 6")],
50              style={'margin': '10px 25px 25px 25px'}),
51
52     html.Div([
53         dcc.Tabs(id="tabs",
54                  children=[
55                      dcc.Tab(
56                          label='Tab One',
57                          children=[
58                              html.Div([
59                                  dbc.Row([
60                                      dbc.Col([dcc.Dropdown(
61                                          options=['red',
62                                                   'green',
63                                                   'blue'],
64                                          value='red',
65                                          id='color',
66                                          multi=False)
67                                      ], width=6),
68                                      dbc.Col([
69                                          dcc.Slider(
70                                              min=math.floor(
71                                                  df['y'].min()),
72                                              max=math.ceil(
73                                                  df['y'].max()),
74                                              id="min_value")
75                                      ], width=6)
76                                  ]),
77                                  dbc.Row([
78                                      dbc.Col([
79                                          dcc.Graph(id="graph_1")
80                                      ], width=6),
81                                      dbc.Col([
82                                          dcc.Graph(id="graph_2")
83                                      ], width=6)
84                                  ])
85                              ], style={"margin":
86                                        "100px 25px 25px 25px"}),
87                          ]
88                      ),
89                      dcc.Tab(
90                          label='Tab Two',
91                          id="tab_2_graphs",
92                          children=[
93                              html.Div([
94                                  dbc.Row([
95                                      dbc.Col([
96                                          dcc.Graph(id="graph_3")
97                                      ], width=8),
98                                      dbc.Col([
99                                          dcc.Graph(id="graph_4")
100                                      ], width=4)
101                                  ]),
102                                  dbc.Row([
103                                      dbc.Col(html.Div([
104                                          dbc.Label(
105                                              "Number of bins:",
106                                              html_for=
107                                              "graph_5_nbins"),
108                                          dcc.Dropdown(options=
109                                              [str(i) for i in
110                                               range(5, 100, 5)],
111                                              value='40',
112                                              id='graph_5_nbins',
113                                              multi=False
114                                          )
115                                      ]), width={"size": 3},),
116                                      dbc.Col(html.Div([
117                                          dbc.Label("Color:",
118                                              html_for=
119                                              "graph_5_color"),
120                                          dcc.Dropdown(
121                                              options=["Viridis",
122                                                       "Magma",
123                                                       "Hot",
124                                                       "GnBu",
125                                                       "Greys"],
126                                              value='Viridis',
127                                              id='graph_5_color',
128                                              multi=False)
129                                      ]), width={"size": 3,
130                                                 "offset": 1},),
131                                      dbc.Col(html.Div([
132                                          dbc.Label(
133                                              "Separated
226     fig3.update_layout(
227         height=PLOT_HEIGHT,
228         template="plotly_white",
229         coloraxis_showscale=False)
230     fig3.update_traces(marker=dict(size=8))
231
232     group_counts = cluster_dff[
233         ['cluster', 'X']].groupby('cluster').count()
234
235     fig4 = go.Figure(
236         data=[go.Bar(
237             x=group_counts.index,
238             y=group_counts['X'],
239             marker_color=[
240                 COLORS.get(i) for i in group_counts.index]
241         )])
242
243     fig4.update_layout(height=PLOT_HEIGHT,
244                        template="plotly_white",
245                        title="<b>Counts per cluster</b>",
246                        xaxis_title="cluster",
247                        title_font_size=25
248                        )
249
250     return fig3, fig4
251
252 @app.callback(
253     Output("graph_5", "figure"),
254     Input("graph_5_nbins", "value"),
255     Input("graph_5_color", "value"),
256     Input("graph_5_separated", "value"),
257     Input("graph_3", "relayoutData"),
258 )
259 def update_graph_5(nbins, color, separated, selected_data):
260     cluster_dff = update_selected_data(
261         selected_data=selected_data)
262
263     fig = px.density_heatmap(
264         cluster_dff,
265         x="X",
266         y="Y",
267         nbinsx=int(nbins),
268         nbinsy=int(nbins),
269         color_continuous_scale=color,
270         facet_col=None if separated == "No" else "cluster",
271         category_orders={"cluster": ["0", "1", "2"]}
272     )
273     fig.update_layout(template="plotly_white")
274     return fig
275
276
277 if __name__ == '__main__':
278     app.run_server(debug=True, port=8014)
Listing 5.8 Example of a dashboard with more functionality like tabs and interactive
visualizations, using the inline style and bootstrap but no CSS
Exercises
• Exercise 5.5.2.1: Add one more row and one more column in the
dashboard code. Do this in both code variants with CSS and bootstrap.
• Exercise 5.5.2.2: Discuss which code variant is better. Take into account
criteria like code understanding, code maintenance, code extension, and
find some more criteria.
Figure 5.11 Executing the dashboard code and activating tab 2 to interactively explore the
trivariate data in a scatter plot linked to a bar chart and a density heatmap.
Exercises
• Exercise 5.5.3.1: Find your own dataset on the world wide web and
design and implement your own dashboard to visually explore this data.
• Exercise 5.5.3.2: For a user-defined (or selected) mathematical function
f : R −→ R we would like to see the plot of the function as well
as additional information like minima, maxima, gradient function, area
under the function in a certain interval, and many more. Design and
implement a dashboard to support a mathematician at these tasks.
6
Challenges and Limitations
Exercises
• Exercise 6.1.1.1: Imagine your dashboard should have 20 user interface
components. How do you decide which of them are the most important
ones and where do you place them in the layout?
• Exercise 6.1.1.2: Which general options do you have to support 20
visualization techniques in a dashboard?
the visual encoding, while the Gestalt laws work in the other direction, that is,
as a visual decoding. As a challenge, we have to find a visual encoding that
is powerful enough to support the visual decoding, that is, an interpretation of
the visual patterns to visually explore the encoded data.
Exercises
• Exercise 6.1.2.1: Find diagrams on the web that contain visual design
flaws and discuss how to get rid of them.
• Exercise 6.1.2.2: Discuss whether chart junk, the lie factor, or visual
clutter is bad for a designed diagram.
Figure 6.1 Readability and aesthetics cannot both be fully achieved in a diagram at the
same time. There is always some kind of trade-off.
on labeled data, while the labels again come from an original aesthetics
judgment by viewers, a fact that brings us back to the old problem that
users are required to judge the aesthetics of a visualization before a machine
can do it.
Exercises
• Exercise 6.1.3.1: Which diagrams do you think are nicer: two-dimensional
versus three-dimensional ones, Cartesian versus radial ones,
colored versus gray-scale ones, static versus animated ones?
• Exercise 6.1.3.2: What makes a diagram look aesthetically appealing?
What makes a graphical user interface look aesthetically appealing?
6.2 Implementation Challenges
Exercises
• Exercise 6.2.1.1: Integrate two Plotly diagrams into a dashboard and
connect them. This could be a scatter plot on which an axis interval is
selected while the distribution of the points in the selected interval is
shown as a histogram.
• Exercise 6.2.1.2: Integrate a drop-down menu in a dashboard that lets
you execute external software, for example, a statistics tool.
Exercises
• Exercise 6.2.2.1: Try several IDEs to implement a dashboard. Make a
table of desired features and briefly explain which IDE is best for your
purposes. Which one would you recommend to a newcomer, which one
to a professional Python developer?
• Exercise 6.2.2.2: Start a dashboard project in GitHub and get familiar
with the functions and features there.
Exercises
• Exercise 6.2.3.1: Discuss the programming languages you are familiar
with. What are the benefits and drawbacks of those programming
languages?
• Exercise 6.2.3.2: How would you start a collaboration with other
developers in order to create a successful tool while keeping the
development phase as effortless as possible?
Exercises
• Exercise 6.2.4.1: Do a literature search on the web to find the
positive and negative issues when comparing the Windows and Linux
operating systems with respect to dashboard design.
• Exercise 6.2.4.2: Compare a Windows and a Linux operating system
with respect to the visual appearance and interactive functionality of the
same dashboard code. Can you find any differences?
This is a suitable scenario, but we have to be aware that if one of
the servers is not running properly, the dashboard itself might suffer. If
the data server is not working, we might fall back to a local (not up-to-date)
dataset that is shown to the users instead of the real-time data until the data
server is back again. If the dashboard server is not working, this is the
bigger problem.
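The local-fallback idea described above can be sketched in plain Python; the URL and cache path below are placeholders, not from the book:

```python
# Sketch: try the (hypothetical) real-time data server first and fall back
# to a local, possibly outdated copy if the request fails.
import json
import urllib.error
import urllib.request
from pathlib import Path

def load_dataset(url, cache_path, timeout=5):
    """Return (data, is_live): live server data if possible, cached otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
        # keep the cache fresh for the next outage
        Path(cache_path).write_text(json.dumps(data))
        return data, True
    except (urllib.error.URLError, OSError):
        # data server down: serve the last cached (not up-to-date) dataset
        return json.loads(Path(cache_path).read_text()), False
```

The `is_live` flag lets the dashboard tell its users that they are currently looking at stale data.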
From an implementation and resources perspective, we definitely need
more knowledge about programming aspects, in particular web-based
programming, which requires an understanding of client-server architectures.
However, Dash, Python, and Plotly are powerful tools that take much of this
implementation burden away from us. The Heroku server (Section 3.4.1)
was a good alternative until November 28, 2022. After that date, the service
was no longer offered for free; a low-cost alternative replaced
the originally very user-friendly concept. Consequently, the costs for setting
up a server or deploying the dashboard on a server can become a serious
issue, in particular if the dashboard has to run over longer time periods or
if the data with which the dashboard works has to be provided on
the same server. There is definitely a limit in terms of dataset sizes as well
as algorithmic operations that can run on such a server. As a recommendation,
it can be good advice not to treat the server issue as a priority while designing
and implementing a dashboard, but to concentrate on the server
aspect later on. If the dashboard runs locally, getting it to run remotely
on the web is an option for which various solutions exist.
Exercises
• Exercise 6.2.5.1: What are the typical challenges when building
dashboards for a real-time dataset from an internet connection and server
perspective?
• Exercise 6.2.5.2: Search for possible server solutions when creating
web-based interactive visualization tools for real-time data.
the browsers even have a variety of built-in tools and functionality, which
can cause trouble when working with an interactive visualization
tool in the form of a dashboard. Sometimes the loading of the built-in tools
causes performance issues; the slow performance then does not come from the
dashboard but from the browser side, which is sometimes hard
to locate. These effects might also be caused by older browser versions,
so it is a good idea to run the latest version of a web
browser. This comes with its own problem: even if the dashboard ran fine
a few weeks ago, it might show up completely differently today because
of a newer browser version. It is good advice to keep up with
the browser versions and to check the dashboard on newer versions from
time to time to see whether the functionality and features are still the
same as before. If this is not the case, the dashboard developers might
have to adapt the code to get back the old visual appearance, interaction
techniques, and algorithmic functions. Popular web browsers are, by the
way, Google Chrome, Firefox, Microsoft Edge, Opera, and Safari, just to
mention a few.
An extension to the code brings modifications of the functionality
into play; as a consequence, it is good advice to test whether the
dashboard still runs in the most popular web browsers or whether the
extension has a bad impact on some of the features. Apart from the features,
such an extension can also have an impact on the performance; sometimes
the extension itself is the bottleneck, for example, when changing from
one library to another with similar functionality, or when
implementing a new algorithm that has not been tested before. But typically,
this issue is caused by the algorithm itself, not by the web browser. The
biggest challenge is mostly to locate the cause of the performance issues:
does it come from the code itself, from the web browser, or even from
a library that causes trouble when used together with a specific algorithmic
or visual feature? Good advice for reducing browser issues can be to clean
the cache, which might still contain some problematic data. Moreover,
cookies might bring additional challenges into play. Take a closer look at all
browser-related aspects in case the dashboard is not showing up properly.
Before digging too deep into one browser, start the dashboard in several
popular browsers to see whether it runs at all, or whether the code itself
might be the problem.
6.3 Challenges during runtime
Exercises
• Exercise 6.2.6.1: Check your own dashboard in the most popular web
browsers like Mozilla Firefox, Google Chrome, Microsoft Edge, Opera,
and Safari. Can you find any differences between the web browsers?
• Exercise 6.2.6.2: Inspect the diagrams in your dashboard and check
whether they are visually depicted differently in each of the
aforementioned web browsers.
are challenges related to human perception, for example, taking into account
how large our display can be or how many colors can be perceived and
distinguished (Section 6.3.4). All of those challenges play a big role during
the design and implementation processes of an interactive visualization tool,
also in the specific case of a dashboard.
any kind of data transformation and we do not know the behavior of most of
our users beforehand.
Exercises
• Exercise 6.3.1.1: What is the biggest dataset that your dashboard can
work with? If you do not have your own dashboard, check the dataset
size for the dashboard examples in Chapter 5.
• Exercise 6.3.1.2: Discuss the problems of analyzing and visualizing
real-time data.
cubic, or even exponential function. The question about the dataset size for
which the dashboard is still algorithmically scalable can then be answered
by looking at the y-axis at the acceptable runtime, following the line back
to the curve, and reading the dataset size from the x-axis. Still, a challenge
with this performance measure is that an algorithm may behave differently
each time it runs on the same dataset; hence, the only way to create such a
runtime plot is by averaging. But again, each individual run can differ a lot
from the average curve, so such a prediction might not be very reliable.
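The averaging procedure described above can be sketched as follows; the measured algorithm and the dataset sizes are examples of our own:

```python
# Sketch: run an algorithm several times per dataset size and record the
# mean runtime, yielding the points of a runtime curve.
import statistics
import time

def runtime_curve(algorithm, make_input, sizes, repetitions=5):
    """Return [(n, mean_seconds)] for each dataset size n."""
    curve = []
    for n in sizes:
        data = make_input(n)
        runs = []
        for _ in range(repetitions):
            start = time.perf_counter()
            algorithm(data)
            runs.append(time.perf_counter() - start)
        # averaging smooths run-to-run variation, but individual runs
        # can still differ a lot from the mean
        curve.append((n, statistics.mean(runs)))
    return curve

if __name__ == "__main__":
    points = runtime_curve(sorted, lambda n: list(range(n, 0, -1)),
                           sizes=[1_000, 10_000, 100_000])
    for n, t in points:
        print(f"{n:>7} elements: {t:.6f} s")
```

Plotting the returned pairs as a line chart gives exactly the runtime curve discussed above.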
Exercises
• Exercise 6.3.2.1: Read a dataset with your dashboard and measure the
time it takes until the data is read and parsed. Increase the dataset size
by copying it 2, 3, 4, 5, and 10 times and append the copies. Measure
the times for all those dataset sizes and create a line plot for showing the
performance of the reading and parsing algorithm.
• Exercise 6.3.2.2: Is there a difference in terms of performances for the
diagrams integrated into the dashboards in Chapter 5?
goal to preserve the structure in the data somehow. Moreover, clustering
approaches can help to derive patterns in the data that we would not detect
otherwise. Hence, clustering can also add some benefit to visual scalability,
just by restructuring, grouping, and ordering the data.
Visual clutter is the state in which too many data elements are shown
or their disorganization leads to performance issues when solving
certain tasks [202]. This effect occurs in most situations in which we
have visual scalability issues. Even if a visualization technique is powerful
for a small number of elements, it can be useless for a growing number
of data elements. Then we might consider another, more visually scalable
visualization technique for the same kind of data. A famous example can
be found for graph or network data, for which node-link diagrams exist,
but those only visually scale to around 20 vertices with a few edges.
Matrix-like visualizations are better in this case since they can be scaled
down to pixel size, even if they do not support path-related tasks
anymore [106]. Such a situation can be found in many application fields,
typically based on a certain data type, like the network data we mentioned
before. The idea is to provide a visualization technique from a repertoire of
many techniques for the same type of data, one that supports as many data
exploration tasks as possible; the task with the highest priority, however,
should be among the supported tasks in any case.
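Switching from a node-link diagram to a matrix view only re-encodes the same edge list; a minimal sketch with a made-up example graph:

```python
# Re-encode an edge list as an adjacency matrix, the data structure behind
# matrix-like graph visualizations. The example graph is our own.
def adjacency_matrix(vertices, edges):
    """Return a nested list: matrix[i][j] == 1 iff an edge exists."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected graph
    return matrix

vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
# Each cell can be drawn as little as a single pixel, which is why matrix
# views scale to far more vertices than node-link diagrams.
```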
Exercises
• Exercise 6.3.3.1: Imagine you had a network consisting of your friends
and the relations they have with each other. How would you visualize
such a dataset, and how visually scalable is your technique?
• Exercise 6.3.3.2: For histograms, we can include really many data
values, but at some point, they also reach a limit in terms of visual
scalability. What can we do with the shown data values to get a more
scalable approach?
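One possible answer to Exercise 6.3.3.2 is to aggregate the raw values into a fixed number of bins before drawing, so the histogram's visual footprint stays constant however many values arrive; a stdlib-only sketch (the function name and signature are ours):

```python
# Sketch: pre-aggregate values into a fixed number of bins so the histogram
# stays visually scalable for arbitrarily many raw data values.
def bin_counts(values, num_bins, lo=None, hi=None):
    """Count values per bin over [lo, hi]; returns a list of num_bins counts."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / num_bins or 1  # avoid zero width for constant data
    counts = [0] * num_bins
    for v in values:
        i = min(int((v - lo) / width), num_bins - 1)  # clamp the maximum
        counts[i] += 1
    return counts
```

Only `num_bins` bars ever need to be drawn, regardless of the number of input values.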
Exercises
• Exercise 6.3.4.1: Discuss the design of a dashboard with respect
to differently sized displays, that is, a small-scale smartphone, a
medium-scale computer monitor, and a large-scale powerwall display.
6.4 Testing Challenges
Exercises
• Exercise 6.4.1.1: Test the explained dashboards in Chapter 5 from
different locations, for example, from home and from your office at a
company.
• Exercise 6.4.1.2: Add a text field as a Dash core component in each of the
dashboards in Chapter 5 and request feedback from your online users.
How can you find insights in such qualitative user feedback?
Exercises
• Exercise 6.4.2.1: Implement different versions of a sorting algorithm and
integrate that into a dashboard. Measure the runtime performances under
different circumstances like operating system, web browser, or the fact
that the algorithm runs on the server or on the client side.
• Exercise 6.4.2.2: Which options do we have when the runtime of an
algorithm integrated into a dashboard is too high, that is, when it leads
to a tool that is no longer interactively responsive? Discuss!
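As a starting point for Exercise 6.4.2.1, the following sketch implements one simple sorting algorithm and a small timing harness; repeating the measurement under different operating systems, browsers, or server/client setups is left to the reader:

```python
# Sketch for Exercise 6.4.2.1: a quadratic insertion sort versus the
# built-in Timsort, with a minimal timing harness.
import random
import time

def insertion_sort(values):
    """A deliberately simple O(n^2) sort for comparison purposes."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

def timed(sort_fn, values):
    """Return the wall-clock seconds one sort call takes."""
    start = time.perf_counter()
    sort_fn(values)
    return time.perf_counter() - start

if __name__ == "__main__":
    data = [random.random() for _ in range(5_000)]
    print("insertion sort :", timed(insertion_sort, data))
    print("built-in sorted:", timed(sorted, data))  # much faster
```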
• Physiological measures
– Blood pressure: Blood values or properties can give insights into
how stressed a study participant is [136]. However, measuring such
values requires a medical assistant and makes the study setup much
more complicated and ethically problematic.
– Pupil dilation: Eye tracking devices can also measure pupil
dilations [14] that give insights into a variety of aspects, one of
which is how much attention is focused on a certain display area.
– Galvanic skin response: Another useful measure is the galvanic
skin response [131] that might provide insights into further
body-related aspects, for example, what the stress level or the sport
activity level is.
These are just a few important measurements of user behavior, but
there are many more. The biggest challenge here is the evaluation and
analysis of all the recorded user data, that is, finding insights in such study
data to improve the dashboard design, its implementation, and finally its
usefulness and user-friendliness.
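The evaluation step mentioned above can start very simply, for example by computing completion-time and error-rate summaries; the record format below is our own assumption, not a prescribed study protocol:

```python
# Sketch: summarize recorded user-study data (completion times and error
# counts) to get a first quantitative view of dashboard usability.
import statistics

def summarize(records):
    """records: list of dicts with 'time_s' (float) and 'errors' (int)."""
    times = [r["time_s"] for r in records]
    error_rate = sum(1 for r in records if r["errors"] > 0) / len(records)
    return {
        "mean_time_s": statistics.mean(times),
        "stdev_time_s": statistics.stdev(times) if len(times) > 1 else 0.0,
        "error_rate": error_rate,
    }
```

Comparing such summaries across two dashboard variants is already a basic quantitative user evaluation.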
Exercises
• Exercise 6.4.3.1: Ask 20 people to use one of your created dashboards.
Give them a concrete task and measure the time taken and the error
rate. Ask them for verbal feedback. Which insights can you find in
the recorded user study data to improve your dashboard? Are there any
design flaws?
• Exercise 6.4.3.2: What are the challenges before, during, or after a user
study? Discuss!
7
Conclusion
References
Weidong Huang, Quang Vinh Nguyen, and Yingcai Wu, editors,
Proceedings of the 7th International Symposium on Visual Information
Communication and Interaction, VINCI, page 29.
[65] Michael Burch and Daniel Weiskopf. Visualizing dynamic quantitative
data in hierarchies - TimeEdgeTrees: Attaching dynamic weights to
tree edges. In Gabriela Csurka, Martin Kraus, and José Braz, editors,
Proceedings of the International Conference on Imaging Theory and
Applications and International Conference on Information Visualization
Theory and Applications, pages 177–186. SciTePress, 2011.
[66] Wolfram Büttner and Helmut Simonis. Embedding boolean
expressions into logic programming. Journal of Symbolic
Computation, 4(2):191–205, 1987.
[67] Bram C. M. Cappers, Paulus N. Meessen, Sandro Etalle, and Jarke J.
van Wijk. Eventpad: Rapid malware analysis and reverse engineering
using visual analytics. In Diane Staheli, Celeste Lyn Paul, Jörn
Kohlhammer, Daniel M. Best, Stoney Trent, Nicolas Prigent, Robert
Gove, and Graig Sauer, editors, Proceedings of IEEE Symposium on
Visualization for Cyber Security, VizSec, pages 1–8. IEEE, 2018.
[68] Mónica A. Carreño-León, Jesús Andrés Sandoval-Bringas, Teresita
de Jesús Álvarez Robles, Rafael Cosio-Castro, Italia Estrada Cota,
and Alejandro Leyva Carrillo. Designing a tangible user interface for
braille teaching. In Constantine Stephanidis, Margherita Antona, Qin
Gao, and Jia Zhou, editors, Proceedings of the 22nd HCI International
Conference - Late Breaking Papers: Universal Access and Inclusive
Design, volume 12426 of Lecture Notes in Computer Science, pages
197–207. Springer, 2020.
[69] Marco A. Casanova. A theory of data dependencies over relational
expressions. International Journal of Parallel Programming,
12(3):151–191, 1983.
[70] Carl Chapman and Kathryn T. Stolee. Exploring regular expression
usage and context in Python. In Andreas Zeller and Abhik
Roychoudhury, editors, Proceedings of the 25th International
Symposium on Software Testing and Analysis, ISSTA, pages 282–293.
ACM, 2016.
[71] Colombe Chappey, A. Danckaert, Philippe Dessen, and Serge A.
Hazout. MASH: an interactive program for multiple alignment and
consensus sequence construction for biological sequences. Computer
Applications in the Biosciences, 7(2):195–202, 1991.
[99] Johannes Fuchs, Dominik Jäckle, Niklas Weiler, and Tobias Schreck.
Leaf glyph - visualizing multi-dimensional data with environmental
cues. In José Braz, Andreas Kerren, and Lars Linsen, editors,
Proceedings of the 6th International Conference on Information
Visualization Theory and Applications, IVAPP, pages 195–206.
SciTePress, 2015.
[100] Katarína Furmanová, Samuel Gratzl, Holger Stitz, Thomas Zichner,
Miroslava Jaresová, Alexander Lex, and Marc Streit. Taggle:
Combining overview and details in tabular data visualizations.
Information Visualization, 19(2), 2020.
[101] Daniel Fürstenau, Flavio Morelli, Kristina Meindl, Matthias
Schulte-Althoff, and Jochen Rabe. A social citizen dashboard
for participatory urban planning in Berlin: Prototype and evaluation. In
Proceedings of the 54th Hawaii International Conference on System
Sciences, HICSS, pages 1–10. ScholarSpace, 2021.
[102] Michael R. Garey and David S. Johnson. Computers and Intractability:
A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[103] Fengpei Ge and Yonghong Yan. Deep neural network based
wake-up-word speech recognition with two-stage detection. In
Proceedings of the IEEE International Conference on Acoustics, Speech
and Signal Processing, ICASSP, pages 2761–2765. IEEE, 2017.
[104] Narain H. Gehani. Data Types for Very High Level Programming
Languages. PhD thesis, Cornell University, USA, 1975.
[105] Helene Gelderblom and Leanne Menge. The invisible gorilla revisited:
using eye tracking to investigate inattentional blindness in interface
design. In Tiziana Catarci, Kent L. Norman, and Massimo Mecella,
editors, Proceedings of the International Conference on Advanced
Visual Interfaces, AVI, pages 39:1–39:9. ACM, 2018.
[106] Mohammad Ghoniem, Jean-Daniel Fekete, and Philippe Castagliola.
On the readability of graphs using node-link and matrix-based
representations: a controlled experiment and statistical analysis.
Information Visualization, 4(2):114–135, 2005.
[107] Tiago Gonçalves, Ana Paula Afonso, and Bruno Martins. Visualization
techniques of trajectory data: Challenges and limitations. In Stephan
Mäs, Lars Bernard, and Hardy Pundt, editors, Proceedings of the 2nd
AGILE PhD School, volume 1136 of CEUR Workshop Proceedings.
CEUR-WS.org, 2013.
[108] Saul Gorn. Code extension in ASCII. Communications of the ACM,
9(10):758–762, 1966.
[120] Gregor Herda and Robert McNabb. Python for smarter cities:
Comparison of python libraries for static and interactive visualisations
of large vector data. CoRR, abs/2202.13105, 2022.
[121] Marcel Hlawatsch, Michael Burch, and Daniel Weiskopf. Visual
adjacency lists for dynamic graphs. IEEE Transactions on Visualiza-
tion and Computer Graphics, 20(11):1590–1603, 2014.
[122] Heike Hofmann and Marie Vendettuoli. Common angle plots as
perception-true visualizations of categorical associations. IEEE Trans-
actions on Visualization and Computer Graphics, 19(12):2297–2305,
2013.
[123] Kenneth Holmqvist. Eye tracking: a comprehensive guide to methods
and measures. Oxford University Press, 2011.
[124] Danny Holten. Hierarchical edge bundles: Visualization of adjacency
relations in hierarchical data. IEEE Transactions on Visualization and
Computer Graphics, 12(5):741–748, 2006.
[125] Danny Holten, Petra Isenberg, Jarke J. van Wijk, and Jean-Daniel
Fekete. An extended evaluation of the readability of tapered, animated,
and textured directed-edge representations in node-link graphs. In
Giuseppe Di Battista, Jean-Daniel Fekete, and Huamin Qu, editors,
Proceedings of the IEEE Pacific Visualization Symposium, PacificVis,
pages 195–202. IEEE Computer Society, 2011.
[126] Danny Holten and Jarke J. van Wijk. Force-directed edge bundling for
graph visualization. Computer Graphics Forum, 28(3):983–990, 2009.
[127] Chen Hong. Design of human-computer interaction interface
considering user friendliness. International Journal of Reasoning-
based Intelligent Systems, 9(3/4):162–169, 2017.
[128] Tom Horak, Philip Berger, Heidrun Schumann, Raimund Dachselt,
and Christian Tominski. Responsive matrix cells: A focus+context
approach for exploring and editing multivariate graphs. IEEE
Transactions on Visualization and Computer Graphics, 27(2):1644–1654,
2021.
[129] Derek Hwang, Vardhan Agarwal, Yuzi Lyu, Divyam Rana,
Satya Ganesh Susarla, and Adalbert Gerald Soosai Raj. A qualitative
analysis of lecture videos and student feedback on static code examples
and live coding: A case study. In Claudia Szabo and Judy Sheard,
editors, Proceedings of the 23rd Australasian Computing Education
Conference, ACE, pages 147–157. ACM, 2021.
[130] Alfred Inselberg and Bernard Dimsdale. Parallel coordinates: A tool
for visualizing multi-dimensional geometry. In Arie E. Kaufman,
Anicia Peters, Med Salim Bouhlel, and Nobert Jere, editors,
Proceedings of the Second African Conference for Human Computer
Interaction: Thriving Communities, AfriCHI, pages 1:1–1:12. ACM,
2018.
[157] Fabrizio Lamberti, Federico Manuri, and Andrea Sanna. Multivariate
visualization using scatterplots. In Newton Lee, editor, Encyclopedia
of Computer Graphics and Games. Springer, 2019.
[158] Ricardo Langner, Ulrike Kister, and Raimund Dachselt. Multiple
coordinated views at large displays for multiple users: Empirical
findings on user behavior, movements, and distances. IEEE
Transactions on Visualization and Computer Graphics, 25(1):608–618,
2019.
[159] Janusz W. Laski. On readability of programs with loops. ACM
SIGPLAN Notices, 14(11):73–83, 1979.
[160] Daewon Lee. Nezzle: an interactive and programmable visualization
of biological networks in Python. Bioinformatics, 38(12):3310–3311,
2022.
[161] Meng-Tse Lee, Fong-Ci Lin, Szu-Ta Chen, Wan-Ting Hsu, Samuel
Lin, Tzer-Shyong Chen, Feipei Lai, and Chien-Chang Lee. Web-based
dashboard for the interactive visualization and analysis of national
risk-standardized mortality rates of sepsis in the US. Journal of
Medical Systems, 44(2):54, 2020.
[162] Grégoire Lefebvre, Emmanuelle Boyer, and Sophie Zijp-Rouzier.
Coupling gestures with tactile feedback: a comparative user study. In
Lone Malmborg and Thomas Pederson, editors, Proceedings of the
Nordic Conference on Human-Computer Interaction, NordiCHI, pages
380–387. ACM, 2012.
[163] Wenjun Li, Yang Ding, Yongjie Yang, R. Simon Sherratt, Jong Hyuk
Park, and Jin Wang. Parameterized algorithms of fundamental NP-hard
problems: a survey. Human-centric Computing and Information
Sciences, 10:29, 2020.
[164] Xiao-Hui Li, Caleb Chen Cao, Yuhan Shi, Wei Bai, Han Gao, Luyu
Qiu, Cong Wang, Yuanyuan Gao, Shenjia Zhang, Xun Xue, and Lei
Chen. A survey of data-driven and knowledge-aware explainable AI.
IEEE Transactions on Knowledge and Data Engineering, 34(1):29–49,
2022.
[165] Lars Lischke. Interacting with large high-resolution display
workplaces. PhD thesis, University of Stuttgart, Germany, 2018.
[183] Fatih Baha Omeroglu and Yueqing Li. Effects of background music
on visual short-term memory: A preliminary study. In Don Harris and
Wen-Chin Li, editors, Proceedings of the 19th International
Conference on Engineering Psychology and Cognitive Ergonomics, EPCE,
volume 13307 of Lecture Notes in Computer Science, pages 85–96.
Springer, 2022.
[184] Jorge Piazentin Ono, Juliana Freire, Cláudio T. Silva, João Comba, and
Kelly P. Gaither. Interactive data visualization in Jupyter notebooks.
Computing in Science and Engineering, 23(2):99–106, 2021.
[185] Elias Pampalk, Andreas Rauber, and Dieter Merkl. Using smoothed
data histograms for cluster visualization in self-organizing maps. In
José R. Dorronsoro, editor, Proceedings of International Conference
on Artificial Neural Networks, ICANN, volume 2415 of Lecture Notes
in Computer Science, pages 871–876. Springer, 2002.
[186] Deok Gun Park, Mohamed Suhail, Minsheng Zheng, Cody Dunne,
Eric D. Ragan, and Niklas Elmqvist. Storyfacets: A design study
on storytelling with visualizations for collaborative data analysis.
Information Visualization, 21(1):3–16, 2022.
[187] Hima Patel, Shanmukha C. Guttula, Ruhi Sharma Mittal, Naresh
Manwani, Laure Berti-Équille, and Abhijit Manatkar. Advances in
exploratory data analysis, visualisation and quality for data centric AI
systems. In Aidong Zhang and Huzefa Rangwala, editors, Proceedings
of the 28th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, pages 4814–4815. ACM, 2022.
[188] Lawrence C. Paulson. Ackermann’s function is not primitive recursive.
Archive of Formal Proofs, 2022, 2022.
[189] Catherine Plaisant and Ben Shneiderman. Scheduling home
control devices: Design issues and usability evaluation of four
touchscreen interfaces. International Journal of Man-Machine
Studies, 36(3):375–393, 1992.
[190] Helen C. Purchase. Effective information visualisation: a study of
graph drawing aesthetics and algorithms. Interacting with Computers,
13(2):147–162, 2000.
[191] Helen C. Purchase. Metrics for graph drawing aesthetics. Journal of
Visual Languages and Computing, 13(5):501–516, 2002.
[192] Helen C. Purchase, Robert F. Cohen, and Murray I. James. Validating
graph drawing aesthetics. In Proceedings of the Symposium on Graph
Drawing, pages 435–446, 1995.
[193] Aung Pyae and Paul Scifleet. Investigating the role of user’s English
language proficiency in using a voice user interface: A case of Google
Home smart speaker. In Regan L. Mandryk, Stephen A. Brewster, Mark
Hancock, Geraldine Fitzpatrick, Anna L. Cox, Vassilis Kostakos, and
Mark Perry, editors, Proceedings of the extended abstracts of the CHI
Conference on Human Factors in Computing Systems. ACM, 2019.
[194] Aaron J. Quigley and Peter Eades. FADE: graph drawing, clustering,
and visual abstraction. In Joe Marks, editor, Proceedings of 8th
International Symposium on Graph Drawing, GD, volume 1984 of Lecture
Notes in Computer Science, pages 197–210. Springer, 2000.
[195] Ramana Rao and Stuart K. Card. The table lens: merging graphical and
symbolic representations in an interactive focus+context visualization
for tabular information. In Catherine Plaisant, editor, Proceedings of
the Conference on Human Factors in Computing Systems, CHI, page
222. ACM, 1994.
[196] Edward M. Reingold and John S. Tilford. Tidier drawings of trees.
IEEE Transactions on Software Engineering, 7(2):223–228, 1981.
[197] Donghao Ren, Xin Zhang, Zhenhuang Wang, Jing Li, and Xiaoru
Yuan. WeiboEvents: A crowd sourcing Weibo visual analytic system.
In Issei Fujishiro, Ulrik Brandes, Hans Hagen, and Shigeo Takahashi,
editors, Proceedings of the IEEE Pacific Visualization Symposium,
PacificVis, pages 330–334. IEEE Computer Society, 2014.
[198] Long Ren and Yun Chen. Influence of color perception on consumer
behavior. In Fiona Fui-Hoon Nah and Bo Sophia Xiao, editors,
Proceedings of 5th International Conference on HCI in Business,
Government, and Organizations, HCIBGO, volume 10923 of Lecture Notes in
Computer Science, pages 413–421. Springer, 2018.
[199] Theresa-Marie Rhyne. Color matters for digital media & visualization.
In SIGGRAPH: Special Interest Group on Computer Graphics and
Interactive Techniques Conference, Courses, Virtual Event, pages
12:1–12:92. ACM, 2021.
[200] Jonathan C. Roberts. Guest editor’s introduction: special issue on
coordinated and multiple views in exploratory visualization.
Information Visualization, 2(4):199–200, 2003.
[201] Douglas Rolim, Jorge Silva, Thaís Batista, and Everton Cavalcante.
Web-based development and visualization dashboards for smart city
applications. In Mária Bieliková, Tommi Mikkonen, and Cesare
Pautasso, editors, Proceedings of the 20th International Conference on
[252] Pak Chung Wong and Jim Thomas. Visual analytics. IEEE Computer
Graphics and Applications, 24(5):20–21, 2004.
[253] Liwei Wu, Fei Li, Youhua Wu, and Tao Zheng. GGF: A graph-based
method for programming language syntax error correction. In
Proceedings of the 28th International Conference on Program Comprehension,
ICPC, pages 139–148. ACM, 2020.
[254] Yiqun Xie, Shashi Shekhar, and Yan Li. Statistically-robust clustering
techniques for mapping spatial hotspots: A survey. ACM Computing
Surveys, 55(2):36:1–36:38, 2023.
[255] Sophia Yang, Marc Skov Madsen, and James A. Bednar. Holoviz:
Visualization and interactive dashboards in python. In Aidong Zhang
and Huzefa Rangwala, editors, Proceedings of the 28th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, KDD, pages
4846–4847. ACM, 2022.
[256] Alfred L. Yarbus. Eye Movements and Vision. Springer, 1967.
[257] Yucong Chris Ye, Franz Sauer, Kwan-Liu Ma, Aditya Konduri,
and Jacqueline Chen. A user-centered design study in scientific
visualization targeting domain experts. IEEE Transactions on Visual-
ization and Computer Graphics, 26(6):2192–2203, 2020.
[258] Ji Soo Yi, Youn ah Kang, John T. Stasko, and Julie A. Jacko. Toward
a deeper understanding of the role of interaction in information
visualization. IEEE Transactions on Visualization and Computer
Graphics, 13(6):1224–1231, 2007.
[259] Qi Zhang. Medical data and mathematically modeled implicit surface
real-time visualization in web browsers. International Journal of
Image and Graphics, 22(4):2250027:1–2250027:29, 2022.
[260] Zuyao Zhang and Yuan Zhu. Research on users’ and designers’ product
color perception. In Yongchuan Tang and Jonathan Lawry, editors,
Proceedings of the Second International Symposium on Computational
Intelligence and Design, ISCID, pages 264–267. IEEE Computer
Society, 2009.
Index
F
factorial, 173
False, 113, 118, 120
family hierarchy, 44
feedback form, 104
Fibonacci function, 147
Figma, 62
file creation, 155
file object, 153
file operation, 155
file overwriting, 155
file path, 153
file system, 18, 21, 44
file system browser, 86
file writing, 155
filter, 7, 192
filter technique, 22
filtered view, 4
filtering, 64
Fireworks, 62
Flask, 15, 76, 102
Flask server, 77
Flask-based application, 102
flexibility, 52
floating point, 118
floating point number, 113
floor, 173
focus, 179
focus-and-context, 57, 79
follow-up experiment, 105
font, 9
font face, 59
font family, 196
font size, 59, 94, 96, 180
font style, 55
font type, 59
for loop, 137
formatting specification, 96
fraud detection, 59
function, 141
function call, 142
function definition, 142
function nesting, 144
functional paradigm, 75, 109
functional programming, 73

G
gaze, 63
gaze-based interaction, 62, 66
geographic map, 10, 19, 81
geographic region, 19, 44
geography, 81
geoplotlib, 80
Gestalt law, 34, 41, 96
Gestalt principle, 34
Gestalt theory, 34, 59
gesture, 59, 63
gesture interaction, 62
gesture-based system, 66
ggplot, 80
ggplot2, 81
GitHub, 83, 87
GitHub integration, 86
global variable, 125, 145
glyph, 47
glyph-based representation, 47
GNU Emacs, 86
go object, 164
Google Chrome, 103
Grafana, 14
granularity level, 57, 151
graph, 18, 37, 39, 42
graph data, 151
graph drawing, 42
graph object, 185
graph readability, 42
graph symmetry, 42
graph theory, 42
I
IDE, 109, 221
if statement, 133
image file, 151
imaginary part, 119
immersive analytics, 63
immutable, 121
implementation, 16
implementation challenges, 219
implementation perspective, 1
implementation phase, 5, 16, 61, 73
in-built data type, 120
inattentional blindness, 32
increment, 138
indefinite iteration, 137, 139
indefinite loop, 137
indentation, 75
indentation rule, 73
indentation support, 86
independent variable, 29, 35
independent-dependent correlation, 35
index, 120
industrial community, 23
inefficient algorithm, 37
infographic, 57
information communication, 27
information exchange, 42
information overload, 59
information overplotting, 10
information processing, 31
information visualization, 30
inheritance, 157, 161
inheritance principle, 157
init function, 158
ink, 57
Inkscape, 62
inline CSS, 96, 164, 177
inner node, 18, 45
input channel, 66
input dataset, 37
Input module, 166
input option, 93
input parameter, 6, 141, 142
input parameters, 27
input value, 27
input(), 152
input-output linking, 98
input-output mechanism, 22
insight, 1, 3, 13, 26, 31
installation, 82
instance, 118, 120, 157, 158
instance attribute, 158
instance attribute value, 160
instance method, 160
integer, 118
integrated development environment, 83, 86
integrated development environment (IDE), 74, 82
interaction, 58, 59, 63, 68, 166
interaction category, 64
interaction chain, 65
interaction hierarchy, 65
interaction history, 65
interaction modality, 63, 66
interaction process, 65
interaction sequence, 65
interaction technique, 1, 10, 14, 16, 29, 51, 61, 78, 163
interactions, 4
interactive dashboard, 73
interactive mockup, 61
interactive mode, 82, 83
interactive response, 14
interactive responsiveness, 24, 64, 70
interactive scene, 35
interactive visualization, 26, 30, 35
interactive visualization tool, 5
trivariate data, 16, 46, 90, 185, 194
True, 113, 118, 120
try block, 136
try-except statement, 136
Tufte principle, 57
tuple, 120, 121
two-dimensional, 19
two-dimensional list, 121
type system, 75

U
UI, 58
UI design, 60
unary operator, 111
uncertainty, 24
uncontrolled study, 105
uncontrolled user study, 14, 104
undirected graph, 18
undo interaction, 60, 65
Unicode, 128
unimodal, 66
union, 122
univariate data, 16, 46, 90, 164, 168, 171
universal usability, 59
unstructured data, 23, 36
update function, 169
update rate, 1
uppercase letter, 125
URL access, 151
usability, 8, 22
usefulness, 59
user, 29, 35
user behavior, 35, 104
user environment, 101
user evaluation, 2, 14, 29, 73, 102, 103, 236
user experience, 2, 101, 103
user experiment, 29
user feedback, 2, 4, 6, 14, 17, 35, 104, 152
user friendliness, 2, 59
user input, 88, 129, 152, 164
user interface, 1, 13, 58, 61, 88, 163
user interface component, 52, 58
user interface design, 59, 73
user interface layout, 173
user performance, 6, 236
user perspective, 103
user study, 4, 35
user task, 5, 13, 17, 27, 36, 51
user tasks, 5
user-friendly, 1
users’ feedback, 5
UTF8, 153
UXPin, 62

V
variable, 18, 46, 118, 124, 157
variable explorer, 86
variable tracker, 86
variance, 53
version control, 86, 87
vertical axis, 47
video, 33
Vim, 86
virtual machine, 15
virtual reality, 63
visual agreement, 61
visual ambiguity, 57
visual analysis, 10
visual analytics, 4, 6, 28, 36
visual analytics system, 4
visual analytics tool, 61
visual attention, 4, 8, 29, 35, 105
visual attention behavior, 4, 8
visual augmentation, 19
visual border, 41
word cloud, 50
word frequency, 49
working environment, 82
workspace, 87
wrapper, 78

X
Xara Designer Pro X, 62

Y
young user, 36, 60

Z
zero-based, 121
zoom, 66
zoom and filter, 57
About the Authors