Chapter 15
15.1 Introduction
The evaluation methods described so far in this book have involved interaction
with, or direct observation of, users. In this chapter we introduce methods that
are based on understanding users through knowledge codified in heuristics, or
data collected remotely, or models that predict users' performance. None of these
methods require users to be present during the evaluation. Inspection methods
typically involve an expert role-playing the users for whom the product is
designed, analyzing aspects of an interface, and identifying any potential
usability problems by using a set of guidelines. The most well known are
heuristic evaluation and walkthroughs. Analytics involves user interaction
logging, which is often done remotely. Predictive models involve analyzing the
various physical and mental operations that are needed to perform particular
tasks at the interface and operationalizing them as quantitative measures. Two of
the most commonly used predictive models are GOMS and Fitts' Law.
15.2 Inspections: Heuristic Evaluation and
Walkthroughs
Sometimes users are not easily accessible, or involving them is too expensive or
takes too long. In such circumstances other people, usually referred to as experts,
can provide feedback. These are people who are knowledgeable about both
interaction design and the needs and typical behavior of users. Various
inspection methods were developed as alternatives to usability testing in the
early 1990s, drawing on software engineering practice where code and other
types of inspections are commonly used. These inspection methods include
heuristic evaluations, and walkthroughs, in which experts examine the interface
of an interactive product, often role-playing typical users, and suggest problems
users would likely have when interacting with it. One of the attractions of these
methods is that they can be used at any stage of a design project. They can also be
used to complement user testing.
15.2.1 Heuristic Evaluation
Although many heuristics apply to most products (e.g. be consistent and provide
meaningful feedback), some of the core heuristics are too general for evaluating
products that have come onto the market since Nielsen and Molich first
developed the method, such as mobile devices, digital toys, online communities,
ambient devices, and new web services. Nielsen (2010) suggests developing
category-specific heuristics that apply to a specific class of product as a
supplement to the general heuristics. Evaluators and researchers have therefore
typically developed their own heuristics by tailoring Nielsen's heuristics with
other design guidelines, market research, and requirements documents. Exactly
which heuristics are appropriate and how many are needed for different
products is debatable and depends on the goals of the evaluation, but most sets of
heuristics have between five and ten items. This number provides a good range
of usability criteria by which to judge the various aspects of an interface. More
than ten becomes difficult for evaluators to remember; fewer than five tends not
to be sufficiently discriminating.
A key question that is frequently asked is: how many evaluators are needed to
carry out a thorough heuristic evaluation? While one evaluator can identify a
large number of problems, she may not catch all of them. She may also have a
tendency to concentrate more on one aspect at the expense of missing others. For
example, in a study of heuristic evaluation where 19 evaluators were asked to
find 16 usability problems in a voice response system allowing customers access
to their bank accounts, Nielsen (1992) found a substantial difference between the
number and type of usability problems found by the different evaluators. He also
notes that while some usability problems are very easy to find by all evaluators,
there are some problems that are found by very few experts. Therefore, he
argues that it is important to involve multiple evaluators in any heuristic
evaluation and recommends between three and five evaluators. His findings
suggest that they can typically identify around 75% of the total usability
problems, as shown in Figure 15.1 (Nielsen, 1994a).
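The diminishing return from adding evaluators can be illustrated with a short calculation. Problem discovery is often modeled by assuming that each evaluator independently finds a given problem with some probability λ, so that n evaluators find a proportion 1 − (1 − λ)^n of the problems. The sketch below uses λ = 0.31, a value in the range reported in Nielsen's studies; treat both the independence assumption and the exact value of λ as approximations rather than properties of any particular team of evaluators.

# Sketch: expected proportion of usability problems found by n evaluators,
# assuming each evaluator independently detects a given problem with
# probability lam. The default lam = 0.31 is an assumed, illustrative value.

def proportion_found(n_evaluators: int, lam: float = 0.31) -> float:
    """Expected fraction of problems found by n independent evaluators."""
    return 1 - (1 - lam) ** n_evaluators

for n in (1, 3, 5, 10):
    print(f"{n} evaluator(s): about {proportion_found(n):.0%} of problems")
# 1 -> ~31%, 3 -> ~67%, 5 -> ~84%, 10 -> ~98%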
However, employing multiple experts can be costly. Skillful experts can capture
many of the usability problems by themselves and some consultancies now use
this technique as the basis for critiquing interactive devices – a process that has
become known as an expert critique or expert crit in some countries. But using
only one or two experts to conduct a heuristic evaluation can be problematic
since research has challenged Nielsen's findings and questioned whether even
three to five evaluators is adequate. For example, Cockton and Woolrych (2001)
and Woolrych and Cockton (2001) point out that the number of experts needed to
find 75% of problems depends on the nature of the problems. Their analysis of
problem frequency and severity suggests that highly misleading findings can
result.
The conclusion from this is that more is better, but more is expensive. However,
because users and special facilities are not needed for heuristic evaluation and it
is comparatively inexpensive and quick, it is popular with developers and is often
known as discount evaluation. For a quick evaluation of an early design, one or
two experts can probably identify most potential usability problems but if a
thorough evaluation of a fully working prototype is needed then having a team of
experts conducting the evaluation and comparing their findings would be
advisable.
BOX 15.1
Extract from the heuristics developed by Budd (2007) that emphasize web design
issues
Clarity
Make the system as clear, concise, and meaningful as possible for the intended audience.
Make the system as simple as possible for users to accomplish their tasks.
Interfaces should provide users with a sense of context in time and space.
ACTIVITY 15.1
1. Select a website that you regularly visit and evaluate it using the heuristics
in Box 15.1. Do these heuristics help you to identify important usability and
user experience issues?
2. Does being aware of the heuristics influence how you interact with the website
in any way?
Comment
1. The heuristics focus on key usability criteria such as whether the interface
seemed unnecessarily complex and how color was used. Budd's heuristics also
encourage consideration of how the user feels about the experience of
interacting with the website.
2. Being aware of the heuristics leads to a stronger focus on the design and the
interaction, and raises awareness of what the user is trying to do and how the
website is responding.
Turning Design Guidelines into Heuristics
There is a strong relationship between design guidelines and the heuristics used
in heuristic evaluation. As a first step to developing new heuristics, evaluators
sometimes translate design guidelines into questions for use in heuristic
evaluation. This practice has become quite widespread for addressing usability
and user experience concerns for specific types of interactive product. For
example, Väänänen-Vainio-Mattila and Wäljas (2009) from the University of
Tampere in Finland took this approach when developing heuristics for web
service user experience. They tried to identify what they called ‘hedonic
heuristics’: a new kind of heuristic that directly addresses how users feel
about their interactions. These were based on design guidelines concerning
whether the user feels that the web service provides a lively place where it is
enjoyable to spend time, and whether it satisfies the user's curiosity by frequently
offering interesting content. When stated as questions these become: Is the
service a lively place where it is enjoyable to spend time? Does the service satisfy
users' curiosity by frequently offering interesting content?
ACTIVITY 15.2
Consider the following design guidelines for information design and for each one suggest
a question that could be used in heuristic evaluation:
Heuristic evaluation has been used for evaluating mobile technologies (Brewster
and Dunlop, 2004). An example is provided by Wright et al (2005) who evaluated
a mobile fax application, known as MoFax. MoFax users can send and receive
faxes to conventional fax machines or to other MoFax users. This application was
created to support groups working with construction industry representatives
who often send faxes of plans to each other. Using MoFax enables team members
to browse and send faxes on their cell phones while out in the field (see Figure
15.2). At the time of the usability evaluation, the developers knew there were
some significant problems with the interface, so they carried out a heuristic
evaluation using Nielsen's heuristics to learn more. Three expert evaluators
performed the evaluation and together they identified 56 problems. Based on
these results, the developers redesigned MoFax.
Heuristic evaluation has also been used to evaluate abstract aesthetic peripheral
displays that portray non-critical information at the periphery of the user's
attention (Mankoff et al, 2003). Since these devices are not designed for task
performance, the researchers had to develop a set of heuristics that took this into
account. They did this by developing two ambient displays: one indicated how
close a bus is to the bus-stop by showing its number move upwards on a screen;
the other indicated how light or dark it was outside by lightening or darkening a
light display (see Figure 15.3). Then they modified Nielsen's heuristics to address
the characteristics of ambient displays and asked groups of experts to evaluate
the displays using them.
Figure 15.3 Two ambient devices: (a) bus indicator, (b) lightness and darkness
indicator
The heuristics that they developed included some that were specifically geared
towards ambient systems such as:
1. The briefing session, in which the experts are told what to do. A
prepared script is useful as a guide and to ensure each person receives
the same briefing.
2. The evaluation period, in which each expert typically spends 1–2 hours
independently inspecting the product, using the heuristics for guidance.
The experts need to take at least two passes through the interface. The
first pass gives a feel for the flow of the interaction and the product's
scope. The second pass allows the evaluator to focus on specific
interface elements in the context of the whole product, and to identify
potential usability problems.
If the evaluation is for a functioning product, the evaluators need to
have some specific user tasks in mind so that exploration is focused.
Suggesting tasks may be helpful but many experts suggest their own
tasks. However, this approach is less easy if the evaluation is done early
in design when there are only screen mockups or a specification; the
approach needs to be adapted to the evaluation circumstances. While
working through the interface, specification, or mockups, a second
person may record the problems identified, or the evaluator may think
aloud. Alternatively, she may take notes herself. Evaluators should be
encouraged to be as specific as possible and to record each problem
clearly.
3. The debriefing session, in which the evaluators come together to discuss
their findings and to prioritize the problems they found and suggest
solutions.
The heuristics focus the evaluators' attention on particular issues, so selecting
appropriate heuristics is critically important. Even so, there is sometimes less
agreement among evaluators than is desirable, as discussed in the Dilemma
below.
There are fewer practical and ethical issues in heuristic evaluation than for other
methods because users are not involved. A week is often cited as the time needed
to train evaluators (Nielsen and Mack, 1994), but this depends on the person's
initial expertise. Typical users can be taught to do heuristic evaluation, although
there have been claims that this approach is not very successful (Nielsen, 1994a).
A variation of this method is to take a team approach that may involve users.
ACTIVITY 15.3
Look at the Nielsen (2010) heuristics and consider how you would use them to evaluate a
website for purchasing clothes (e.g. www.REI.com, which has a homepage similar to that
in Figure 15.4).
1. Do the heuristics help you focus on the web site more intently than if you were
not using them?
2. Might fewer heuristics be better? Which might be combined and what are the
trade-offs?
Comment
1. Most people find that using the heuristics encourages them to focus on the
design more than when they are not using them.
2. Some heuristics can be combined and given a more general description. For
example, ‘the system should speak the users' language’ and ‘always keep users
informed’ could be replaced with ‘help users to develop a good mental model,’
but this is a more abstract statement and some evaluators might not know
what is packed into it.
An argument for keeping the detail is that it reminds evaluators of the issues to
consider.
DILEMMA
You might have the impression that heuristic evaluation is a panacea for designers, and
that it can reveal all that is wrong with a design. However, it has problems. Shortly after
heuristic evaluation was developed, several independent studies compared heuristic
evaluation with other methods, particularly user testing. They found that the different
approaches often identify different problems and that sometimes heuristic evaluation
misses severe problems (Karat, 1994). This argues for using complementary methods.
Furthermore, heuristic evaluation should not be thought of as a replacement for user
testing.
Another problem concerns experts reporting problems that don't exist. In other words,
some of the experts' predictions are wrong (Bailey, 2001). Bailey cites analyses from
three published sources showing that only around 33% of the problems reported were
real usability problems, some of which were serious, others trivial. However, the
heuristic evaluators missed about 21% of users' problems. Furthermore, about 43% of
the problems identified by the experts were not problems at all; they were false alarms!
Bailey points out that this means only about half the problems identified are true
problems: “More specifically, for every true usability problem identified, there will be a
little over one false alarm (1.2) and about one half of one missed problem (0.6). If this
analysis is true, heuristic evaluators tend to identify more false alarms and miss more
problems than they have true hits.”
How can the number of false alarms or missed serious problems be reduced? Checking
that experts really have the expertise that they claim would help, but how can this be
done? One way to overcome these problems is to have several evaluators. This helps to
reduce the impact of one person's experience or poor performance. Using heuristic
evaluation along with user testing and other methods is also a good idea.
15.2.2 Walkthroughs
Cognitive Walkthroughs
Q: Will users understand from feedback whether the action was correct or not?
Answer: Yes, their action takes them to a form that they need to complete to
search for the book.
Q: Will users understand from the feedback whether the action was correct or
not?
Answer: Yes, they are taken to a picture of the book, a description, and purchase
details.
ACTIVITY 15.4
The cognitive walkthrough probably took longer than the heuristic evaluation for
evaluating the same part of the site because it examines each step of a task.
Consequently, you probably did not see as much of the website. It is also likely that the
cognitive walkthrough resulted in more detailed findings. Cognitive walkthrough is a
useful method for examining a small part of a system in detail, whereas heuristic
evaluation is useful for examining a whole system or large parts of systems. As the name
indicates, the cognitive walkthrough focuses on the cognitive aspects of interacting with
the system. It was developed before there was much emphasis on aesthetic design and
other user experience goals.
These adaptations made the method more usable, despite losing some of the
detail from the analysis. Perhaps most important of all, Spencer directed the
social interactions of the design team so that they achieved their goals.
Pluralistic Walkthroughs
15.3 Analytics
Analytics is a method for evaluating user traffic through a system. When used to
examine traffic on a website or part of a website as discussed in Chapter 7, it is
known as web analytics. Web analytics can be collected locally or remotely across
the Internet by logging user activity, counting and analyzing the data in order to
understand what parts of the website are being used and when. Although
analytics are a form of evaluation that is particularly useful for evaluating the
usability of a website, they are also valuable for business planning. Many
companies use the services of other companies, such as Google and VisiStat, that
specialize in providing analytics and the analysis necessary to understand the
data – e.g. graphs, tables, and other types of data visualizations. An example of
how web analytics can be used to analyze and help developers to improve
website performance is provided by VisiStat's analysis of Mountain Wines’
website (VisiStat, 2010).
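At its core, this kind of analysis is simple counting over a log of page views. The sketch below is purely illustrative: the log records and field layout are hypothetical, and real services such as VisiStat or Google Analytics collect and visualize far richer data than this.

# Sketch: aggregating a (hypothetical) page-view log into basic analytics.
from collections import Counter
from datetime import datetime

log = [  # (timestamp, page, visitor IP) - illustrative records only
    ("2010-05-08 09:12:44", "/home",   "198.51.100.7"),
    ("2010-05-08 09:15:02", "/wines",  "198.51.100.7"),
    ("2010-05-08 10:03:10", "/home",   "203.0.113.42"),
    ("2010-05-09 14:21:55", "/events", "192.0.2.19"),
]

views_per_day = Counter()    # page views per day (cf. Figure 15.5)
views_per_page = Counter()   # which parts of the site are being used
visitors = set()             # distinct visitor addresses

for timestamp, page, ip in log:
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
    views_per_day[day] += 1
    views_per_page[page] += 1
    visitors.add(ip)

print(views_per_day)
print(views_per_page)
print(len(visitors), "unique visitors")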
VisiStat provided Mountain Wines with data showing how their website was
being used by potential customers, e.g. data like that shown in Figures
15.5 to 15.7. Figure 15.5 provides an overview of the number of page views of the
website per day. Figure 15.6 provides additional details and shows the hour-by-
hour traffic for May 8. Clicking on the first icon for more detail shows where the
IP addresses of the traffic are located (Figure 15.7). VisiStat can also provide
information about such things as which visitors are new to the site, which are
returners, and which other pages visitors came from.
Using this data and other data provided by VisiStat, Mountain Wines could see
visitor totals, traffic averages, traffic sources, visitor activity, and more. They
discovered the importance of visibility for their top search words; they could
pinpoint where guests were going on their website; and they could see where
their guests were geographically located.
Figure 15.7 Clicking on the icon for the first hour in Figure 15.6 shows where the
IP addresses of the 13 visitors to the website are located
ACTIVITY 15.5
1. Users were not directly involved but their behavior on the website was
tracked.
2. Mountain Wines may have changed its keywords. By tracking the way visitors
traveled through the website, web navigation and content layout could be
improved to make searching and browsing more effective and pleasurable.
The company also may have added information to attract visitors from other
regions.
3. We are not told where the evaluation was carried out. VisiStat may have
installed its software at Mountain Wines (the most likely option) or they may
have collected and analyzed the data remotely.
More recently other types of specialist analytics have also been developed such as
visual analytics, in which thousands and sometimes millions of data points are
displayed and manipulated visually, such as Hansen et al's (2011) social network
analysis (see Figure 15.8).
15.4 Predictive Models
15.4.1 The GOMS Model
The GOMS model was developed in the early 1980s by Card, Moran, and Newell
and is described in a seminal paper (Card et al, 1983). It was an attempt to model
the knowledge and cognitive processes involved when users interact with
systems. The term GOMS is an acronym that stands for goals, operators, methods,
and selection rules:
Goals refer to a particular state the user wants to achieve (e.g. find a
website on interaction design).
Operators refer to the cognitive processes and physical actions that
need to be performed in order to attain those goals (e.g. decide on which
search engine to use, think up and then enter keywords into the search
engine). The difference between a goal and an operator is that a goal is
obtained and an operator is executed.
Methods are learned procedures for accomplishing the goals. They
consist of the exact sequence of steps required (e.g. type in keywords in
a Google search box and press the search button).
Selection rules are used to determine which method to select when
there is more than one available for a given stage of a task. For example,
once keywords have been entered into a search engine entry field,
many search engines allow users to press the return key on the
keyboard or click the go button using the mouse to progress the search.
A selection rule would determine which of these two methods to use in
the particular instance.
Below is a detailed example of a GOMS model for deleting a word in a sentence
using Microsoft Word.
Operators to use in the methods:
Click mouse
Drag cursor over text
Select menu
Move cursor to command
Press key
Selection rules to decide which method to use:
1. Delete text using the mouse and selecting from the menu if a large amount of text is to be deleted.
2. Delete text using the ‘delete’ key if a small number of letters is to be deleted.
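The relationship between the four GOMS components can also be expressed as a small sketch. The data structures, operator names, and the ten-character threshold in the selection rule below are illustrative assumptions, not part of the GOMS formulation itself.

# Sketch of the GOMS components for the 'delete a word' example above.
# Methods are sequences of operators; a selection rule chooses between them.

OPERATORS = {
    "click mouse", "drag cursor over text", "select menu",
    "move cursor to command", "press key",
}

METHODS = {
    "delete via menu": [
        "drag cursor over text",    # highlight the word
        "select menu",              # open the Edit menu
        "move cursor to command",   # point at Cut
        "click mouse",              # execute the command
    ],
    "delete via delete key": [
        "click mouse",              # place the cursor after the word
        "press key",                # press delete once per letter
    ],
}

# every step used by a method should be a known operator
assert all(step in OPERATORS for steps in METHODS.values() for step in steps)

def select_method(characters_to_delete: int) -> str:
    """Selection rule: use the menu for large selections, the delete key
    for a few letters. The threshold of 10 is an illustrative assumption."""
    return "delete via menu" if characters_to_delete > 10 else "delete via delete key"

goal = "delete a word in a sentence"
method = select_method(characters_to_delete=3)
print(goal, "->", method, "->", METHODS[method])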
15.4.2 The Keystroke Level Model (KLM)
The KLM differs from the GOMS model in that it provides numerical predictions
of user performance. Tasks can be compared in terms of the time it takes to
perform them when using different strategies. The main benefit of making this
kind of quantitative prediction is that different features of systems and
applications can be easily compared to see which might be the most effective for
performing specific kinds of task.
When developing the KLM, Card et al (1983) analyzed the findings of many
empirical studies of user performance in order to derive a standard set of
approximate times for the main kinds of operators used during a task. In so
doing, they were able to come up with the average time it takes to carry out
common physical actions (e.g. press a key, click a mouse button), together with
other aspects of user–computer interaction (e.g. the time it takes to decide what to
do and the system response rate). Below are the core times they proposed for
these (note how much variability there is in the time it takes to press a key for
users with different typing skills).
The predicted time it takes to execute a given task is then calculated by describing
the sequence of actions involved and then summing together the approximate
times that each one will take:
T_execute = T_K + T_P + T_H + T_D + T_M + T_R
For example, consider how long it would take to insert the word ‘not’ into the
following sentence, using a word-processing program like Microsoft Word:
So that it becomes:
First we need to decide what the user will do. We are assuming that she will have
read the sentences beforehand and so start our calculation at the point where she
is about to carry out the requested task. To begin she will need to think about
what method to select. So, we first note a mental event (M operator). Next she will
need to move the cursor into the appropriate point of the sentence. So, we note
an H operator (i.e. reach for the mouse). The remaining sequence of operators is
then: position the mouse before the word ‘normal’ (P), click the mouse button (P1),
move hand from mouse over the keyboard ready to type (H), think about which
letters to type (M), type the letters n, o, and t (3K), and finally press the spacebar
(K).
The times for each of these operators can then be worked out:
When there are many components to add up, it is often easier to put together all
the same kinds of operator. For example, the above can be rewritten as
2(M) + 2(H) + 1(P) + 1(P1) + 4(K) = 2.70 + 0.80 + 1.10 + 0.20 + 0.88 = 5.68 seconds.
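The same calculation can be written as a short sketch. The operator times are the approximate values used in the example above (M = 1.35 s, H = 0.40 s, P = 1.10 s, P1 = 0.20 s, K = 0.22 s for an average skilled typist); they should be treated as rough averages rather than exact constants.

# Sketch: keystroke level model estimate for inserting 'not' into the sentence.
OPERATOR_TIMES = {   # approximate times in seconds
    "M": 1.35,   # mentally prepare
    "H": 0.40,   # home hand(s) on the mouse or keyboard
    "P": 1.10,   # point with the mouse
    "P1": 0.20,  # click (press) the mouse button
    "K": 0.22,   # press a key (average skilled typist)
}

def klm_estimate(sequence):
    """Sum the approximate times for a sequence of KLM operators."""
    return sum(OPERATOR_TIMES[op] for op in sequence)

# M, H, P, P1, H, M, then K for each of n, o, t, and the spacebar
insert_not = ["M", "H", "P", "P1", "H", "M", "K", "K", "K", "K"]
print(f"Predicted time: {klm_estimate(insert_not):.2f} seconds")  # 5.68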
A duration of over 5 seconds seems a long time for inserting a word into a
sentence, especially for a good typist. Having made our calculation it is useful to
look back at the various decisions made. For example, we may want to think why
we included a mental operator before typing the letters n, o, and t, but not before
any of the other physical actions. Was this necessary? Perhaps we don't need to
include it. The decision when to include a time for mentally preparing for a
physical action is one of the main difficulties with using the keystroke level
model. Sometimes it is obvious when to include one, especially if the task
requires making a decision, but for other times it can seem quite arbitrary.
Another problem is that, just as typing skills vary between individuals, so too do
the mental preparation times people spend thinking about what to do. Mental
preparation can vary from under 0.5 of a second to well over a minute. Practice
at modeling similar kinds of task and comparing the results with actual times
taken can help overcome these problems. Ensuring that decisions are applied
consistently also helps, e.g. applying the same modeling decisions when
comparing two prototypes.
ACTIVITY 15.6
As described in the GOMS model above there are two main ways to delete words from a
sentence when using a word processor like Word. These are:
1. Deleting each letter of the word individually by using the delete key.
2. Highlighting the word using the mouse and then deleting the highlighted
section in one go.
Which of the two methods is quickest for deleting the word ’not’ from the following
sentence?
Comment
Usability consultant Bill Killam and his colleagues worked with the US Internal Revenue
Service (IRS) several years ago to evaluate and redesign the telephone response
information system (TRIS). The goal of TRIS was to provide the general public with
advice about filling out a tax return – and those of you who have to do this know only too
well how complex it is. Although this case study is situated in the USA, such phone-based
information systems are widespread across the world.
Typically, telephone answering systems can be frustrating to use. Have you been
annoyed by the long menus of options such systems provide when you are trying to buy
a train ticket or when making an appointment for a technician to fix your phone line?
What happens is that you work your way through several different menu systems,
selecting an option from the first list of, say, seven options, only to find that now you
must choose from another list of five alternatives. Then, having spent several minutes
doing this, you discover that you made the wrong choice back in the first menu, so you
have to start again. Does this sound familiar? Other problems are that often there are too
many options to remember, and none of them seems to be the right one for you.
The usability specialists used the GOMS keystroke level model to predict how well a
redesigned user interface compared with the original TRIS interface for supporting
users' tasks. In addition they also conducted usability testing.
15.4.3 Benefits and Limitations of GOMS
One of the main attractions of the GOMS approach is that it allows comparative
analyses to be performed for different interfaces, prototypes, or specifications
relatively easily. Since its inception, a number of researchers have used the
method, reporting on its success for comparing the efficacy of different computer-
based systems.
Since Card et al developed GOMS and KLM, many new and different types of
product have been developed. Researchers wanting to use the KLM to predict the
efficiency of key and button layout on devices have adapted it to meet the needs
of these new products. Typically, they considered whether the range of operators
was applicable and whether they needed additional ones. They also had to check
the times allotted to these operators to make sure that they were appropriate.
This involved carrying out laboratory tests with users.
Today, mobile device and phone developers are using the KLM to determine the
optimal design for keypads (e.g. see Luo and John, 2005). For example, in order to
do a keystroke model analysis to evaluate the design of advanced cell phone
interaction, Holleis et al (2007) had to create several new operators including a
Macro Attention Shift (S_Macro) to describe the time it takes users to shift their
attention from the screen of an advanced cell phone to a distant object such as a
poster or screen in the real world, or vice versa, as indicated in Figure 15.9.
Figure 15.9 Attention shift (S) between the cell phone and objects in the real
world
From their work these researchers concluded that the KLM could be adapted for
use with advanced cell phones and that it was very successful. Like other
researchers they also discovered that even expert users vary considerably in the
ways that they use these devices and that there is even more variation within the
whole user population.
While GOMS can be useful in helping make decisions about the effectiveness of
new products, it is not often used for evaluation purposes. Part of the problem is
its highly limited scope: it can only really model a small set of highly routine,
data-entry style computer-based tasks. Furthermore, it is
intended to be used only to predict expert performance, and does not allow for
errors to be modeled. This makes it much more difficult (and sometimes
impossible) to predict how average users will carry out their tasks when using a
range of systems, especially those that have been designed to be used in very
flexible ways. In most situations, it isn't possible to predict how users will
perform. Many unpredictable factors come into play including individual
differences among users, fatigue, mental workload, learning effects, and social
and organizational factors. For example, most people do not carry out their tasks
sequentially but will be constantly multitasking, dealing with interruptions and
talking to others.
A challenge with predictive models, therefore, is that they can only make
predictions about predictable behavior. Given that most people are unpredictable
in the way they behave, it makes it difficult to use them as a way of evaluating
how systems will be used in real-world contexts. They can, however, provide
useful estimates for comparing the efficiency of different methods of completing
tasks, particularly if the tasks are short and clearly defined.
15.4.4 Fitts' Law
Fitts' Law (Fitts, 1954) predicts the time it takes to reach a target using a pointing
device. It was originally used in human factors research to model the relationship
between speed and accuracy when moving towards a target on a display. In
interaction design, it has been used to describe the time it takes to point at a
target, based on the size of the object and the distance to the object. Specifically, it
is used to model the time it takes to use a mouse and other input devices to click
on objects on a screen. One of its main benefits is that it can help designers decide
where to locate buttons, what size they should be, and how close together they
should be on a screen display. The law states that:
T = k log2(D/S + 1.0)
where T is the time to move the pointing device to the target, D is the distance
from the pointing device to the target, S is the size of the target, and k is an
empirically determined constant.
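A minimal sketch of the calculation is shown below; the value of k is an assumption chosen only for illustration, since in practice it has to be fitted from measurements with a particular device and user population.

# Sketch: Fitts' Law prediction of the time to reach an on-screen target.
import math

def fitts_time(distance: float, size: float, k: float = 0.2) -> float:
    """Predicted movement time in seconds for a target of the given size
    at the given distance; k (seconds per bit) is an assumed constant."""
    return k * math.log2(distance / size + 1.0)

# A large nearby button is reached faster than a small distant one.
print(f"{fitts_time(distance=100, size=40):.2f} s")  # about 0.36 s
print(f"{fitts_time(distance=400, size=10):.2f} s")  # about 1.07 s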
In a nutshell, the bigger the target, the easier and quicker it is to reach it. This is
why interfaces that have big buttons are easier to use than interfaces that present
lots of tiny buttons crammed together. Fitts' Law also predicts that the most
quickly accessed targets on any computer display are the four corners of the
screen. This is because of their pinning action, i.e. the sides of the display
constrain the user from over-stepping the target. However, as pointed out by Tog
on the AskTog website, corners seem strangely to be avoided at all costs by
designers.
Fitts' Law can be useful for evaluating systems where the time to physically locate
an object is critical to the task at hand. In particular, it can help designers think
about where to locate objects on the screen in relation to each other. This is
especially useful for mobile devices, where there is limited space for placing
icons and buttons on the screen. For example, in a study carried out by Nokia,
Fitts' Law was used to predict expert text entry rates for several input methods
on a 12-key cell phone keypad (Silverberg et al, 2000). The study helped the
designers make decisions about the size of keys, their positioning, and the
sequences of presses to perform common tasks. Trade-offs between the size of a
device and accuracy of using it were made with the help of calculations from this
model. Fitts' Law has also been used to compare eye-tracking input with manual
input for visual targets (Vertegaal, 2008) and to compare different ways of
mapping Chinese characters to the keypad of cell phones (Liu and Räihä, 2010).
ACTIVITY 15.7
Microsoft toolbars provide the user with the option of displaying a label below each tool.
Give a reason why labeled tools may be accessed faster. (Assume that the user knows the
tool and does not need the label to identify it.)
Comment
The label becomes part of the target and hence the target gets bigger. As we mentioned
earlier, bigger targets can be accessed more quickly.
Furthermore, tool icons that don't have labels are likely to be placed closer together so
they are more crowded. Spreading the icons further apart creates buffer zones of space
around the icons so that if users accidentally go past the target they will be less likely to
select the wrong icon. When the icons are crowded together the user is at greater risk of
accidentally overshooting and selecting the wrong icon. The same is true of menus where
the items are closely bunched together.
Assignment
This assignment continues the work you did on the web-based ticketing system at the end
of Chapters 10, 11, and 14. The aim of this assignment is to evaluate the prototypes
produced in the assignment of Chapter 11 using heuristic evaluation.
Analytics, in which user interaction is logged, is often performed remotely and without
users being aware that their interactions are being tracked. Very large volumes of data
are collected, anonymized, and statistically analyzed using specially developed software
services. The analysis provides information about how a system is used, e.g. how
different versions of a website or prototype perform, or which parts of a website are
seldom used – possibly due to poor usability design or lack of appeal. Data are often
presented visually so that it is easier to see trends and interpret the results.
The GOMS and KLM models, and Fitts' Law, can be used to predict user performance.
These methods can be useful for determining whether a proposed interface, system, or
keypad layout will be optimal. Typically they are used to compare different designs for a
small sequence of tasks. These methods are labor-intensive and so do not scale well for
large systems.
Evaluators frequently find that they have to tailor these methods so that they can use
them with the wide range of products that have come onto the market since the methods
were originally developed.
Key points
Further Reading
CARD, S. K., MORAN, T. P. and NEWELL, A. (1983) The Psychology of Human-Computer
Interaction. Lawrence Erlbaum Associates. This seminal book describes
GOMS and the keystroke level model.
MANKOFF, J., DEY, A. K., HSIEH, G., KIENTZ, J., LEDERER, S. and AMES, M. (2003)
Heuristic evaluation of ambient displays. Proceedings of CHI 2003, ACM, 5(1), 169–
176. More recent papers are available on this topic but we recommend this paper
because it describes how to derive rigorous heuristics for new kinds of
applications. It illustrates how different heuristics are needed for different
applications.