August 2001/vol. 44, No. 8 COMMUNICATIONS OF THE ACM

This cluster visualization shows an intermediate-level view of a five-dimensional, 16,000-record remote-sensing data set. Lines indicate cluster centers and bands indicate the extent of the clusters in each dimension. The data represents five channels (spot, magnetics, and three radiometric channels focusing on potassium, thorium, and uranium) from the Grant's Pass region of Australia. Data courtesy of Peter Ketelaar, Commonwealth Scientific and Industrial Research Organization, Australia. Image generated using XmdvTool, a public-domain multivariate data visualization package; courtesy Matthew Ward, Worcester Polytechnic Institute, Worcester, MA.

Visual Exploration of Large Data Sets

In the rising tide of business transaction data, these tools help distinguish which are strategic assets and which are not worth collecting in the first place.

Daniel A. Keim

Computer systems today store vast amounts of data. Researchers, including those working on the “How Much Information?” project at the University of California, Berkeley, recently estimated that about 1 exabyte (1 million terabytes) of data is generated annually worldwide, 99.997% of it available only in digital form. This worldwide data deluge means that in the next three years, more data will be generated than during all previous human history.
Data is often recorded, captured, and stored automatically via sensors and monitoring systems. Many of the simple transactions now part of our everyday lives, such as paying for food and clothes by credit card or using the telephone, are typically recorded by computers for future reference. Many parameters of each transaction are routinely captured, resulting in high-dimensional data. The data is collected because companies, including those engaged in some kind of e-commerce, view it as a source of potentially valuable information that, as a strategic asset, could provide a competitive advantage. But actually finding this valuable information is difficult. Today's data management systems make it possible to view only small portions of it. If the data is presented in text form, only about 100 data items can be displayed at once, a drop in the ocean when dealing with data sets containing millions of data items. Lacking the ability to adequately explore the large amounts being collected, the data becomes useless despite its potential usefulness, and the databases become data dumps. Visual data exploration, which aims to provide insight by visualizing the data, and information visualization techniques (such as distorted overview displays and dense pixel displays) can help solve this problem.

Effective data mining depends on having a human in the data exploration process, combining this person's flexibility, creativity, and general knowledge with the enormous storage capacity and computational power of today's computers. Visual data exploration seeks to integrate humans in the data exploration process, applying their perceptual abilities to the large data sets now available. The basic idea is to present the data in some visual form, allowing data analysts to gain insight into it, draw conclusions, and interact with it. The visual representation of the data reduces the cognitive work needed to perform certain tasks.

Visual data mining techniques have proved their value in exploratory data analysis; they also have great
potential for exploring large databases. Visual data exploration is especially useful when little is known about the data and the exploration goals are vague. Because the user is directly involved in the exploration process, the exploration goals can be shifted and adjusted through the interactive interface of the visualization software.

The visual data exploration process can be viewed as a hypothesis-generation process, in which visualizations of the data allow users to gain insight into the data and come up with new hypotheses. Verification of the hypotheses can also be accomplished via visual data exploration, as well as through automatic techniques derived from statistics and machine learning. In addition to granting the user direct involvement, visual data exploration has several main advantages over the automatic data mining techniques in statistics and machine learning:

• Deals more easily with highly inhomogeneous and noisy data;
• Is intuitive; and
• Requires no understanding of complex mathematical or statistical algorithms or parameters.

As a result, visual data exploration usually allows faster data exploration, often delivering better results, especially in cases where automatic algorithms fail. In addition, the related techniques are essential for communicating complex data mining results to humans, even when machine learning or statistical techniques are employed. A visual representation provides a much higher degree of confidence in the findings of the exploration than a numerical or textual representation of the findings. This fact leads to strong demand for visual exploration techniques and makes them indispensable in conjunction with automatic exploration techniques.

Visual data exploration usually follows a three-step process, which Shneiderman calls the “information seeking mantra” [11]: overview, zoom and filter, and details-on-demand. In the overview step, the user identifies interesting patterns and focuses on one or more of them. To analyze the patterns, the user drills down to access details of the data. Visualization technology may be used for all three steps, presenting an overview of the data and allowing the user to identify interesting subsets. In analyzing the patterns, it is important to maintain the overview visualization while focusing on the subset using another visualization technique. An alternative is to distort the overview visualization in order to focus on the interesting subsets. Note that visualization technology not only provides the base visualization techniques for all three steps but also bridges the gaps between the steps.

Figure 1. Classification of visual data exploration techniques. The figure arranges the techniques along three axes. The first is the data to be visualized: one-dimensional, two-dimensional, multidimensional, text/Web, hierarchies/graphs, and algorithm/software. The second is the visualization technique: standard 2D/3D display, geometrically transformed display, iconic display, dense pixel display, and stacked display. The third is the interaction and distortion technique: standard, projection, filtering, zoom, distortion, and link & brush. Example thumbnails include ThemeRiver, StarIcon, PixelMap, Hemisphere, and LinkBrush. Left vertical images: Proceedings of IEEE Information Visualization 2000. Horizontal link & brush images: American Statistical Association, 1996.

Visualization Techniques
Information visualization focuses on data sets that lack inherent 2D or 3D semantics and therefore also lack a standard mapping of abstract data onto the physical space of the paper or screen. A number of well-known techniques, including x-y plots, line plots, and histograms, can partially visualize such data sets. These techniques are useful for data exploration but are limited to relatively small, low-dimensional data sets. A large number of novel information visualization techniques have been developed over the past decade, allowing visualizations of ever larger and more complex, or multidimensional, data sets [4].

These techniques can be classified using three criteria: the data to be visualized, the technique itself, and the interaction and distortion method (see Figure 1). For visualizing a specific data type, any of the visualization techniques can be used in conjunction with any of the interaction and distortion methods. Note that the classification does not assume disjoint categories, as multiple visualization techniques can be combined with multiple interaction techniques.

The classification begins with the data type to be visualized [11], including whether it is:

• One-dimensional data (such as temporal data, as in Figure 2);
• Two-dimensional data (such as geographical maps, as in Figure 3);
• Multidimensional data (such as relational tables, as in Figure 4);
• Text and hypertext (such as news articles and Web documents);
• Hierarchies and graphs (such as telephone calls and Web sites, as in Figure 5); and
• Algorithms and software (such as debugging operations).

Figure 2. The pixel-oriented circle segments technique [6], showing daily data over about 20 years (1974–1995) of 50 stocks in the Frankfurter Allgemeine Zeitung (Frankfurt Stock Index). Note the three bright outer rings corresponding to high-price periods and subsequent low-price periods. The technique maps each data value to a colored pixel; high values correspond to bright colors. The various stocks are also mapped to the segments of the circle; the pixels are arranged in a back-and-forth fashion adjacent to the segment-halving line.

The visualization technique fits into one or more of the following categories, as identified in Figure 1:

• Standard 2D/3D displays using standard 2D or 3D visualization techniques (such as x-y plots and landscapes) for visualizing the data.
• Geometrically transformed displays using geometric transformations and projections to produce useful visualizations. Included are parallel coordinates (see Figure 4), projection pursuit, and the various techniques for visualizing graphs [3].
• Icon-based displays that visualize each data item as an icon (such as stick figures) and the dimension values as features of the icons. Figure 1 shows a thumbnail of a star-map view [1]. Each star icon represents one state: the call volume between that state and every other state is mapped to the length of a star segment, and the segment's direction corresponds to the approximate direction of the other state.
• Dense pixel displays that visualize each dimension value as a colored pixel and group the pixels belonging to each dimension into an adjacent area [6]. By arranging and coloring the pixels in an appropriate way, the resulting visualization provides detailed information on local correlations, dependencies, and hot spots, as in Figure 2.
• Stacked displays that visualize the data partitioned hierarchically. In multidimensional data, the dimensions used to build the hierarchy have to be selected carefully. To obtain a useful visualization, the most important dimensions have to correspond to the first levels of the hierarchy.
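Parallel coordinates, one of the geometrically transformed displays listed above, reduce to a simple coordinate mapping. The Python below is a minimal sketch under stated assumptions, not the implementation from [5]. It assumes purely numeric records, min-max normalizes each dimension to [0, 1], and draws axis i as the vertical line x = i. The function name `parallel_coordinates` and the normalization choice are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

def parallel_coordinates(rows: Sequence[Sequence[float]]) -> List[List[Tuple[float, float]]]:
    """Map each multidimensional record to the vertices of a polyline.

    Axis i is the vertical line x = i. A record crosses it at a height equal
    to its value in dimension i, min-max normalized to [0, 1] (an assumption;
    other scalings are possible).
    """
    if not rows:
        return []
    dims = len(rows[0])
    # Per-dimension minimum and maximum, used for normalization.
    lows = [min(r[d] for r in rows) for d in range(dims)]
    highs = [max(r[d] for r in rows) for d in range(dims)]

    polylines = []
    for r in rows:
        vertices = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # A constant dimension is placed at mid-height to avoid dividing by zero.
            y = 0.5 if span == 0 else (r[d] - lows[d]) / span
            vertices.append((float(d), y))
        polylines.append(vertices)
    return polylines


if __name__ == "__main__":
    data = [(1.0, 10.0, 5.0), (3.0, 30.0, 5.0), (2.0, 20.0, 5.0)]
    for line in parallel_coordinates(data):
        print(line)
```

Drawing each vertex list as a connected line gives the layout described in the Figure 4 caption. Records that are similar in every dimension produce nearly overlapping polylines, so clusters show up as bundles of lines.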

The techniques associated with each of these categories differ in how they arrange the data on the screen (such as 2D display or semantic arrangement) and how they deal with multiple dimensions in the case of multidimensional data (such as multiple windows, icon features, and hierarchy).
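Screen arrangement can be made concrete with a pixel-oriented layout in the spirit of the dense pixel displays described above. The sketch below makes several assumptions. Each dimension gets its own rectangular window (the multiple-windows approach). Pixels are filled row by row in a back-and-forth order, and values map linearly to gray levels, with high values bright. The circle segments technique of Figure 2 arranges pixels differently, so this is not that algorithm. The names `serpentine_window` and `to_gray` are hypothetical.

```python
from typing import List, Optional, Sequence

def to_gray(value: float, lo: float, hi: float) -> int:
    """Map a value linearly to a gray level in 0..255 (high values are bright)."""
    if hi == lo:
        return 128  # constant data: use a neutral mid-gray
    return round(255 * (value - lo) / (hi - lo))

def serpentine_window(values: Sequence[float], width: int) -> List[List[Optional[int]]]:
    """Lay out one dimension's values as pixels in a window `width` pixels wide.

    Rows are filled in order. Even rows run left to right and odd rows run
    right to left, so consecutive data items (e.g., consecutive days) stay
    adjacent on screen. Unused cells at the end of the last row are None.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return []
    lo, hi = min(values), max(values)
    height = -(-len(values) // width)  # ceiling division
    grid: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for i, v in enumerate(values):
        row, offset = divmod(i, width)
        col = offset if row % 2 == 0 else width - 1 - offset
        grid[row][col] = to_gray(v, lo, hi)
    return grid


if __name__ == "__main__":
    series = [0, 51, 102, 153, 204, 255, 0]
    for row in serpentine_window(series, width=3):
        print(row)
```

Calling `serpentine_window` once per dimension and placing the windows side by side gives a multiple-windows arrangement. The back-and-forth order keeps neighboring items in the series spatially adjacent, which helps local patterns appear as contiguous bright or dark regions.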

Interaction and Distortion Techniques
In addition to these techniques, data exploration also depends on interaction and distortion techniques. Interaction techniques, which allow users to interact directly with a visualization, include filtering, zooming, and linking, thus allowing the data analyst to make dynamic changes
of a visualization according to the exploration objectives; they also make it possible to relate and combine multiple independent visualizations. Note that connecting multiple visualizations through interactive techniques provides more information than considering the component visualizations independently, as in the lower-right thumbnail in Figure 1.

Figure 3. The SWIFT-3D system [8], showing call volume data from the AT&T long-distance network. Developed at AT&T Research Labs, the system integrates relevant visualization techniques. These range from statistical displays (such as line graphs and histograms) for overview displays and interactive data selection, to pixel-oriented visualizations for a bird's-eye overview and navigation in 3D displays, to interactive 3D maps, to drag-and-drop query tools for interactive detailed viewing of data from a variety of perspectives.

Interactive distortion techniques support the data exploration process by preserving an overview of the data during drill-down operations. Basically, they show portions of the data with a high level of detail and other portions with a lower level of detail. Popular distortion techniques include hyperbolic and spherical distortion, as in Figure 5. Another is the TableLens approach, developed by R. Rao and S. Card at Xerox PARC for multiattribute tabular data, such as that derived from customer and shopping behavior [10].
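The core idea of a distortion technique, showing some data in high detail and the rest in low detail, can be illustrated in one dimension. The sketch below uses the graphical fisheye function described by Sarkar and Brown, g(x) = (d + 1)x / (dx + 1), as a stand-in. It is not the hyperbolic, spherical, or TableLens method named above. Positions are assumed to be normalized to [0, 1], and `focus` and the distortion factor `d` are illustrative parameters.

```python
from typing import List, Sequence

def fisheye(x: float, d: float) -> float:
    """Sarkar-Brown fisheye for a normalized distance x in [0, 1] from the focus.

    g(0) = 0 and g(1) = 1, but for d > 0 the slope near 0 is d + 1, so the
    region next to the focus is magnified and the periphery is compressed.
    d = 0 leaves positions unchanged.
    """
    return (d + 1) * x / (d * x + 1)

def distort(positions: Sequence[float], focus: float, d: float = 3.0) -> List[float]:
    """Apply the fisheye on both sides of `focus`, keeping 0, focus, and 1 fixed."""
    if not 0.0 <= focus <= 1.0:
        raise ValueError("focus must lie in [0, 1]")
    out = []
    for p in positions:
        if p >= focus:
            span = 1.0 - focus
            out.append(p if span == 0 else focus + span * fisheye((p - focus) / span, d))
        else:
            span = focus  # p < focus implies focus > 0, so span is never zero here
            out.append(focus - span * fisheye((focus - p) / span, d))
    return out


if __name__ == "__main__":
    evenly_spaced = [i / 10 for i in range(11)]
    print([round(q, 3) for q in distort(evenly_spaced, focus=0.5)])
```

With d = 3 and the focus at 0.5, evenly spaced positions spread apart near 0.5 and crowd together near 0 and 1. This is the "high detail here, low detail there" trade-off described above. A larger d magnifies the focus region more strongly.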

Evaluating Techniques and Systems for Suitability
Visualization techniques and visual data exploration systems can be evaluated and compared with respect to their suitability for certain data characteristics (such as data types, number of dimensions, number of data items, and category). Task characteristics include clustering, classification, associations, and multivariate hot spots; visualization characteristics include visual overlap and learning curve.

Figure 4. The parallel coordinates technique [5]. In conjunction with similarity-based coloring, it displays each multidimensional data item as a polygonal line intersecting the dimension axes at the position corresponding to the data value for the dimension. Image credit: IEEE, 2000.

Different visualization techniques are used for visualizing different data types. Some are specially designed to support one specific data type; others are more general, allowing the visualization of a range of data types. General visualization techniques are not equally suited for all data characteristics. For example, icon-based techniques can visualize only a limited number of dimensions, and pixel-based techniques are not suitable for categorical data.

As there is no universal technique, each one has to be evaluated for its suitability for the task at hand [7]. While some are specially designed for certain tasks (such as classification and clustering) [2], others are more general, useful for a range of tasks. Desirable visualization characteristics for any technique include limited visual overlap, fast learning, and good recall. Undesirable visualization characteristics include occlusions and line crossings that the viewer may perceive as artifacts, limiting the usefulness of the visualization technique.

Well-regarded visual data exploration and analysis research prototypes include XmdvTool, developed by M. Ward and his students at the Worcester Polytechnic Institute, Worcester, MA. Another is the VisDB system [7], which I developed with my students at the Universities of Munich, Halle, and Konstanz. Statistical data analysis packages include S Plus, developed by R. Becker, J. Chambers, and A. Wilks at AT&T Research Labs (commercially available from Insightful, www.insightful.com), and XGobi, developed by D. Swayne, D. Cook, and A. Buja at AT&T Research Labs. Commercial visual data exploration systems include Silicon Graphics' MineSet, the DecisionSite system from Spotfire (www.spotfire.com), and eBizinsight from Visual Insights (www.visualinsights.com).

Figure 5. The hemisphere hierarchy visualization technique [9]. The result maps the output of a 2D layout algorithm onto a hemisphere. This provides a good overview as well as good focus and context operations, even for very large graphs. Image credit: IEEE, 2000.

Conclusion
Addressing the important but challenging problem of how to explore large data sets promises business benefits for a range of organizations, including those involved in e-commerce. Visual data exploration has great potential for revealing interesting patterns in data (such as clusters, correlations, dependencies, and exceptions). Within the next two to five years, many applications, including fraud detection, marketing, and data mining, will incorporate information visualization technology to improve their data analysis functions.

The next step for data analysts will involve the tight integration of visualization tools with traditional techniques from such disciplines as statistics, machine learning, operations research, and simulation. Integrating visualization tools with these more established methods would combine fast automatic data mining algorithms with the intuitive power of the human mind, improving the quality and speed of the data exploration process. Visual exploration also needs to be tightly integrated with the systems used to manage the vast amounts of relational and semistructured information, including database management and data warehouse systems.

The ultimate goal, possibly within five years, is to bring the power of visualization technology to any desktop machine, providing a more intuitive, faster exploration of very large data resources. This power and convenience will be valuable not only in the economic sense; it will also be a delight to use, prompting users to think in new ways about their data.

References
1. Abello, J. and Korn, J. Visualizing massive multi-digraphs. In Proceedings of Information Visualization'00 (Salt Lake City, UT, Oct. 9–13). IEEE Computer Society Press, Los Alamitos, CA, 2000, 39–47.
2. Ankerst, M., Elsen, C., Ester, M., and Kriegel, H. Visual classification: An interactive approach to decision tree construction. In Proceedings of Knowledge Discovery in Databases'99 (San Diego, CA, Aug. 15–18). ACM Press, New York, 1999, 392–396.
3. Battista, G., Eades, P., Tamassia, R., and Tollis, I. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall, Englewood Cliffs, NJ, 1999.
4. Card, S., Mackinlay, J., and Shneiderman, B. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, San Francisco, 1999.
5. Inselberg, A. and Dimsdale, B. Parallel coordinates: A tool for visualizing multidimensional geometry. In Proceedings of Visualization'90 (San Francisco, Oct. 23–26). IEEE Press, Los Alamitos, CA, 1990, 361–370.
6. Keim, D. Designing pixel-oriented visualization techniques: Theory and applications. Transact. Vis. Comput. Graph. 6, 1 (Jan.–Mar. 2000), 59–78.
7. Keim, D. An introduction to information visualization techniques for exploring very large databases. Tutorial notes, Information Visualization'00 (Salt Lake City, UT, Oct. 9–13, 2000).
8. Koutsofios, E., North, S., and Keim, D. Visualizing large telecommunication data sets: Visualization Blackboard. IEEE Comput. Graph. Appl. 19, 3 (May/June 1999), 16–19.
9. Kreusler, M., Lopez, N., and Schumann, H. A scalable framework for information visualization. In Proceedings of Information Visualization'00 (Salt Lake City, UT, Oct. 9–13). IEEE Computer Society Press, Los Alamitos, CA, 2000, 27–35.
10. Rao, R. and Card, S. The TableLens: Merging graphical and symbolic representation in an interactive focus-context visualization for tabular information. In Proceedings of Human Factors in Computing Systems CHI'94 (Boston, Apr. 24–28). ACM Press, New York, 1994, 318–322.
11. Shneiderman, B. The eyes have it: A task by data-type taxonomy for information visualizations. In Proceedings of Visual Languages (Boulder, CO, Sept. 3–6). IEEE Computer Society Press, Los Alamitos, CA, 1996, 336–343.
12. Ware, C. Information Visualization: Perception for Design. Academic Press, San Diego, CA, 2000.

Daniel A. Keim (keim@informatik.uni-konstanz.de) is a professor in and head of the Database and Visualization Group, University of Konstanz, Germany, and a senior researcher in the Information Visualization Research Department of AT&T Shannon Labs, Florham Park, NJ.

Permission to make digital or hard copies of all or part of this work for personal or class-
room use is granted without fee provided that copies are not made or distributed for profit
or commercial advantage and that copies bear this notice and the full citation on the first
page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee.

© 2001 ACM 0002-0782/01/0800 $5.00
