Chapter 15

EVALUATION: INSPECTIONS, ANALYTICS, AND MODELS
 15.1 Introduction
 15.2 Inspections: Heuristic Evaluation and Walkthroughs
 15.3 Analytics
 15.4 Predictive Models
Objectives

The main aims of this chapter are to:

 Describe the key concepts associated with inspection methods.


 Explain how to do heuristic evaluation and walkthroughs.
 Explain the role of analytics in evaluation.
 Describe how to perform two types of predictive methods: GOMS and Fitts'
Law.

15.1 Introduction
The evaluation methods described so far in this book have involved interaction
with, or direct observation of, users. In this chapter we introduce methods that
are based on understanding users through knowledge codified in heuristics, or
data collected remotely, or models that predict users' performance. None of these
methods require users to be present during the evaluation. Inspection methods
typically involve an expert role-playing the users for whom the product is
designed, analyzing aspects of an interface, and identifying any potential
usability problems by using a set of guidelines. The most well known are
heuristic evaluation and walkthroughs. Analytics involves user interaction
logging, which is often done remotely. Predictive models involve analyzing the
various physical and mental operations that are needed to perform particular
tasks at the interface and operationalizing them as quantitative measures. Two of
the most commonly used predictive models are GOMS and Fitts' Law.
15.2 Inspections: Heuristic Evaluation and
Walkthroughs
Sometimes users are not easily accessible, or involving them is too expensive or
takes too long. In such circumstances other people, usually referred to as experts,
can provide feedback. These are people who are knowledgeable about both
interaction design and the needs and typical behavior of users. Various
inspection methods were developed as alternatives to usability testing in the
early 1990s, drawing on software engineering practice where code and other
types of inspections are commonly used. These inspection methods include
heuristic evaluations, and walkthroughs, in which experts examine the interface
of an interactive product, often role-playing typical users, and suggest problems
users would likely have when interacting with it. One of the attractions of these
methods is that they can be used at any stage of a design project. They can also be
used to complement user testing.

15.2.1 Heuristic Evaluation

Heuristic evaluation is a usability inspection method that was developed by


Nielsen and his colleagues (Nielsen and Molich, 1990; Nielsen, 1994a;
Hollingshead and Novick, 2007), in which experts, guided by a set of usability
principles known as heuristics, evaluate whether user-interface elements, such as
dialog boxes, menus, navigation structure, online help, and so on, conform to
tried and tested principles. These heuristics closely resemble high-level design
principles (e.g. making designs consistent, reducing memory load, and using
terms that users understand). The original set of heuristics identified by Nielsen
and his colleagues was derived empirically from an analysis of 249 usability
problems (Nielsen, 1994b); a revised version of these heuristics is listed below
(Nielsen, 2010: useit.com):

 Visibility of system status


The system should always keep users informed about what is going on,
through appropriate feedback within reasonable time.
 Match between system and the real world
The system should speak the users' language, with words, phrases, and
concepts familiar to the user, rather than system-oriented terms. Follow
real-world conventions, making information appear in a natural and
logical order.
 User control and freedom
Users often choose system functions by mistake and will need a clearly
marked emergency exit to leave the unwanted state without having to
go through an extended dialog. Support undo and redo.
 Consistency and standards
Users should not have to wonder whether different words, situations, or
actions mean the same thing. Follow platform conventions.
 Error prevention
Even better than good error messages is a careful design that prevents a
problem from occurring in the first place. Either eliminate error-prone
conditions or check for them and present users with a confirmation
option before they commit to the action.
 Recognition rather than recall
Minimize the user's memory load by making objects, actions, and
options visible. The user should not have to remember information
from one part of the dialog to another. Instructions for use of the system
should be visible or easily retrievable whenever appropriate.
 Flexibility and efficiency of use
Accelerators – unseen by the novice user – may often speed up the
interaction for the expert user such that the system can cater to both
inexperienced and experienced users. Allow users to tailor frequent
actions.
 Aesthetic and minimalist design
Dialogues should not contain information that is irrelevant or rarely
needed. Every extra unit of information in a dialog competes with the
relevant units of information and diminishes their relative visibility.
 Help users recognize, diagnose, and recover from errors
Error messages should be expressed in plain language (no codes),
precisely indicate the problem, and constructively suggest a solution.
 Help and documentation
Even though it is better if the system can be used without
documentation, it may be necessary to provide help and documentation.
Any such information should be easy to search, focused on the user's
task, list concrete steps to be carried out, and not be too large.
These heuristics are intended to be used by judging aspects of the
interface against them. For example, if a new social networking system is being evaluated, the
evaluator might consider how a user would find out how to add friends to her
network. The evaluator is meant to go through the interface several times
inspecting the various interaction elements and comparing them with the list of
usability principles, i.e. the heuristics. At each iteration, usability problems will
be identified or their diagnosis will be refined, until she is satisfied that the
majority of them are clear.

Although many heuristics apply to most products (e.g. be consistent and provide
meaningful feedback), some of the core heuristics are too general for evaluating
products that have come onto the market since Nielsen and Molich first
developed the method, such as mobile devices, digital toys, online communities,
ambient devices, and new web services. Nielsen (2010) suggests developing
category-specific heuristics that apply to a specific class of product as a
supplement to the general heuristics. Evaluators and researchers have therefore
typically developed their own heuristics by tailoring Nielsen's heuristics with
other design guidelines, market research, and requirements documents. Exactly
which heuristics are appropriate and how many are needed for different
products is debatable and depends on the goals of the evaluation, but most sets of
heuristics have between five and ten items. This number provides a good range
of usability criteria by which to judge the various aspects of an interface. More
than ten becomes difficult for evaluators to remember; fewer than five tends not
to be sufficiently discriminating.

A key question that is frequently asked is how many evaluators are needed to
carry out a thorough heuristic evaluation? While one evaluator can identify a
large number of problems, she may not catch all of them. She may also have a
tendency to concentrate more on one aspect at the expense of missing others. For
example, in a study of heuristic evaluation where 19 evaluators were asked to
find 16 usability problems in a voice response system allowing customers access
to their bank accounts, Nielsen (1992) found a substantial difference between the
number and type of usability problems found by the different evaluators. He also
notes that while some usability problems are very easy to find by all evaluators,
there are some problems that are found by very few experts. Therefore, he
argues that it is important to involve multiple evaluators in any heuristic
evaluation and recommends between three and five evaluators. His findings
suggest that they can typically identify around 75% of the total usability
problems, as shown in Figure 15.1 (Nielsen, 1994a).

However, employing multiple experts can be costly. Skillful experts can capture
many of the usability problems by themselves and some consultancies now use
this technique as the basis for critiquing interactive devices – a process that has
become known as an expert critique or expert crit in some countries. But using
only one or two experts to conduct a heuristic evaluation can be problematic
since research has challenged Nielsen's findings and questioned whether even
three to five evaluators is adequate. For example, Cockton and Woolrych (2001)
and Woolrych and Cockton (2001) point out that the number of experts needed to
find 75% of problems depends on the nature of the problems. Their analysis of
problem frequency and severity suggests that highly misleading findings can
result.

Figure 15.1 Curve showing the proportion of usability problems in an interface found by heuristic evaluation using various numbers of evaluators. The curve represents the average of six case studies of heuristic evaluation
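
The shape of this curve is often summarized by the problem-discovery formula attributed to Nielsen and Landauer: found(n) = N(1 - (1 - L)^n), where N is the total number of problems in the interface and L is the probability that a single evaluator finds any given problem. The short sketch below assumes L = 0.31, a typical value Nielsen reports elsewhere rather than a figure from this chapter, and shows how the proportion of problems found levels off as evaluators are added.

```python
# Sketch of the problem-discovery curve behind Figure 15.1, based on the
# Nielsen-Landauer model: found(n) = N * (1 - (1 - L)**n), where N is the
# total number of usability problems and L is the probability that a single
# evaluator finds any given problem. L = 0.31 is an assumed average value,
# not a figure taken from this chapter.

def proportion_found(n_evaluators: int, l: float = 0.31) -> float:
    """Expected proportion of all usability problems found by n evaluators."""
    return 1 - (1 - l) ** n_evaluators

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} evaluators -> {proportion_found(n):.0%} of problems")
# With L = 0.31, three to five evaluators find roughly 67-84% of problems,
# broadly consistent with the 75% figure cited above.
```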

The conclusion from this is that more is better, but more is expensive. However,
because users and special facilities are not needed for heuristic evaluation and it
is comparatively inexpensive and quick, it is popular with developers and is often
known as discount evaluation. For a quick evaluation of an early design, one or
two experts can probably identify most potential usability problems but if a
thorough evaluation of a fully working prototype is needed then having a team of
experts conducting the evaluation and comparing their findings would be
advisable.

Heuristic Evaluation for Websites


As more attention focuses on the web, heuristics for evaluating websites have
become increasingly important. Several slightly different sets of heuristics
exist. Box 15.1 contains an extract from a version compiled by web developer
Andy Budd that places a stronger emphasis on information content than Nielsen's
heuristics.

BOX 15.1

Extract from the heuristics developed by Budd (2007) that emphasize web design
issues

Clarity

Make the system as clear, concise, and meaningful as possible for the intended audience.

 Write clear, concise copy


 Only use technical language for a technical audience
 Write clear and meaningful labels
 Use meaningful icons.
Minimize unnecessary complexity and cognitive load

Make the system as simple as possible for users to accomplish their tasks.

 Remove unnecessary functionality, process steps, and visual clutter


 Use progressive disclosure to hide advanced features
 Break down complicated processes into multiple steps
 Prioritize using size, shape, color, alignment, and proximity.
Provide users with context

Interfaces should provide users with a sense of context in time and space.

 Provide a clear site name and purpose


 Highlight the current section in the navigation
 Provide a breadcrumb trail
 Use appropriate feedback messages
 Show number of steps in a process
 Reduce perception of latency by providing visual cues (e.g. progress indicator)
or by allowing users to complete other tasks while waiting.
Promote a pleasurable and positive user experience
The user should be treated with respect and the design should be aesthetically pleasing
and promote a pleasurable and rewarding experience.

 Create a pleasurable and attractive design


 Provide easily attainable goals
 Provide rewards for usage and progression.
ACTIVITY 15.1

1. Select a website that you regularly visit and evaluate it using the heuristics
in Box 15.1. Do these heuristics help you to identify important usability and
user experience issues?
2. Does being aware of the heuristics influence how you interact with the website
in any way?
Comment

1. The heuristics focus on key usability criteria such as whether the interface
seemed unnecessarily complex and how color was used. Budd's heuristics also
encourage consideration of how the user feels about the experience of
interacting with the website.
2. Being aware of the heuristics leads to a stronger focus on the design and the
interaction, and raises awareness of what the user is trying to do and how the
website is responding.
Turning Design Guidelines into Heuristics

There is a strong relationship between design guidelines and the heuristics used
in heuristic evaluation. As a first step to developing new heuristics, evaluators
sometimes translate design guidelines into questions for use in heuristic
evaluation. This practice has become quite widespread for addressing usability
and user experience concerns for specific types of interactive product. For
example, Väänänen-Vainio-Mattila and Wäljas (2009) from the University of
Tampere in Finland took this approach when developing heuristics for web
service user experience. They tried to identify what they called ‘hedonic
heuristics,’ which is a new kind of heuristic that directly addresses how users feel
about their interactions. These were based on design guidelines concerning
whether the user feels that the web service provides a lively place where it is
enjoyable to spend time, and whether it satisfies the user's curiosity by frequently
offering interesting content. When stated as questions these become: Is the
service a lively place where it is enjoyable to spend time? Does the service satisfy
users' curiosity by frequently offering interesting content?
ACTIVITY 15.2

Consider the following design guidelines for information design and for each one suggest
a question that could be used in heuristic evaluation:

1. Good graphical design is important. Reading long sentences, paragraphs, and


documents is difficult on screen, so break material into discrete, meaningful
chunks to give the website structure (Horton, 2005).
2. Avoid excessive use of color. Color is useful for indicating different kinds of
information, i.e. cueing (Koyani et al, 2004).
3. Avoid gratuitous use of graphics and animation. In addition to increasing
download time, graphics and animation soon become boring and annoying.
Comment

We suggest the following questions; you may have identified others:

1. Good graphical design is important. Is the page layout structured


meaningfully? Is there too much text on each page?
2. Avoid excessive use of color. How is color used? Is it used as a form of coding?
Is it used to make the site bright and cheerful? Is it excessive and garish? Does
it have an impact on the user's enjoyment (i.e., user's experience)?
3. Avoid gratuitous use of graphics and animation. Are there any flashing
banners? Are there complex introduction sequences? Can they be short-
circuited? Do the graphics add to the site and improve the user's experience?
Another important issue when designing and evaluating web pages and other
types of system is their accessibility to a broad range of users, as mentioned
in Chapter 1 and throughout this book. In the USA, a requirement known as
Section 508 of the Rehabilitation Act came into effect in 2001. The act requires
that all federally funded IT systems be accessible for people with disabilities. The
guidelines provided by this Act can be used as heuristics to check that systems
comply with it (see Case Study 14.1). Mankoff et al (2005) also used guidelines as
heuristics to evaluate specific kinds of usability. They discovered that developers
doing a heuristic evaluation using a screen reader found 50% of known usability
problems – which was more successful than user testing directly with blind users.
Figure 15.2 A screen showing MoFax on a cell phone

Heuristic evaluation has been used for evaluating mobile technologies (Brewster
and Dunlop, 2004). An example is provided by Wright et al (2005) who evaluated
a mobile fax application, known as MoFax. MoFax users can send and receive
faxes to conventional fax machines or to other MoFax users. This application was
created to support groups working with construction industry representatives
who often send faxes of plans to each other. Using MoFax enables team members
to browse and send faxes on their cell phones while out in the field (see Figure
15.2). At the time of the usability evaluation, the developers knew there were
some significant problems with the interface, so they carried out a heuristic
evaluation using Nielsen's heuristics to learn more. Three expert evaluators
performed the evaluation and together they identified 56 problems. Based on
these results, the developers redesigned MoFax.

Heuristic evaluation has also been used to evaluate abstract aesthetic peripheral
displays that portray non-critical information at the periphery of the user's
attention (Mankoff et al, 2003). Since these devices are not designed for task
performance, the researchers had to develop a set of heuristics that took this into
account. They did this by developing two ambient displays: one indicated how
close a bus is to the bus-stop by showing its number move upwards on a screen;
the other indicated how light or dark it was outside by lightening or darkening a
light display (see Figure 15.3). Then they modified Nielsen's heuristics to address
the characteristics of ambient displays and asked groups of experts to evaluate
the displays using them.

Figure 15.3 Two ambient devices: (a) bus indicator, (b) lightness and darkness
indicator

The heuristics that they developed included some that were specifically geared
towards ambient systems such as:

 Visibility of state: The state of the display should be clear when it is


placed in the intended setting.
 Peripherality of display: The display should be unobtrusive and remain
so unless it requires the user's attention. Users should be able to easily
monitor the display.
In this study the researchers found that three to five evaluators were able to
identify 40–60% of known usability issues. In a follow-up study, different
researchers used the same heuristics with different ambient applications
(Consolvo and Towle, 2005). They found 75% of known usability problems with
eight evaluators and 35–55% were found with three to five evaluators, suggesting
that the more evaluators you have, the more accurate the results will be – as
other researchers have also reported.
The drive to develop heuristics for other products continues and includes video
games (Pinelle et al, 2008), online communities (Preece and Shneiderman, 2009),
and information visualization (Forsell and Johansson, 2010).

Doing Heuristic Evaluation

Heuristic evaluation has three stages:

1. The briefing session, in which the experts are told what to do. A
prepared script is useful as a guide and to ensure each person receives
the same briefing.
2. The evaluation period, in which each expert typically spends 1–2 hours
independently inspecting the product, using the heuristics for guidance.
The experts need to take at least two passes through the interface. The
first pass gives a feel for the flow of the interaction and the product's
scope. The second pass allows the evaluator to focus on specific
interface elements in the context of the whole product, and to identify
potential usability problems.
If the evaluation is for a functioning product, the evaluators need to
have some specific user tasks in mind so that exploration is focused.
Suggesting tasks may be helpful but many experts suggest their own
tasks. However, this approach is less easy if the evaluation is done early
in design when there are only screen mockups or a specification; the
approach needs to be adapted to the evaluation circumstances. While
working through the interface, specification, or mockups, a second
person may record the problems identified, or the evaluator may think
aloud. Alternatively, she may take notes herself. Evaluators should be
encouraged to be as specific as possible and to record each problem
clearly.
3. The debriefing session, in which the evaluators come together to discuss
their findings and to prioritize the problems they found and suggest
solutions.
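As an illustration of what happens in the debriefing session, the sketch below pools the problems reported by several evaluators, merges duplicates, and ranks them by how many evaluators found them and how severe they are judged to be. The record fields, the example problems, and the 0-4 severity scale are illustrative assumptions rather than part of the method itself.

```python
# Illustrative sketch only: pooling problems reported by several evaluators
# during the debriefing session and ranking them by how many evaluators
# found each one and by rated severity. The fields and the 0-4 severity
# scale are assumptions made for illustration.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Problem:
    description: str      # what the evaluator observed
    heuristic: str        # which heuristic it violates
    severity: int         # 0 (not a problem) .. 4 (usability catastrophe)
    evaluator: str

reports = [
    Problem("No feedback after pressing 'Add friend'", "Visibility of system status", 3, "A"),
    Problem("No feedback after pressing 'Add friend'", "Visibility of system status", 4, "B"),
    Problem("Jargon in error message", "Match between system and the real world", 2, "B"),
]

merged = defaultdict(list)
for p in reports:
    merged[(p.description, p.heuristic)].append(p)

# Rank: problems found by more evaluators, and rated more severe, come first.
ranked = sorted(merged.items(),
                key=lambda kv: (len(kv[1]), max(p.severity for p in kv[1])),
                reverse=True)
for (desc, heuristic), found_by in ranked:
    print(f"{desc} [{heuristic}] - reported by {len(found_by)} evaluator(s)")
```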
The heuristics focus the evaluators' attention on particular issues, so selecting
appropriate heuristics is critically important. Even so, there is sometimes less
agreement among evaluators than is desirable, as discussed in the Dilemma
below.

There are fewer practical and ethical issues in heuristic evaluation than for other
methods because users are not involved. A week is often cited as the time needed
to train evaluators (Nielsen and Mack, 1994), but this depends on the person's
initial expertise. Typical users can be taught to do heuristic evaluation, although
there have been claims that this approach is not very successful (Nielsen, 1994a).
A variation of this method is to take a team approach that may involve users.

ACTIVITY 15.3

Look at the Nielsen (2010) heuristics and consider how you would use them to evaluate a
website for purchasing clothes (e.g. www.REI.com, which has a homepage similar to that
in Figure 15.4).

1. Do the heuristics help you focus on the web site more intently than if you were
not using them?
2. Might fewer heuristics be better? Which might be combined and what are the
trade-offs?

Figure 15.4 Homepage of REI.com

Comment

1. Most people find that using the heuristics encourages them to focus on the
design more than when they are not using them.
2. Some heuristics can be combined and given a more general description. For
example, ‘the system should speak the users' language’ and ‘always keep users
informed’ could be replaced with ‘help users to develop a good mental model,’
but this is a more abstract statement and some evaluators might not know
what is packed into it.
An argument for keeping the detail is that it reminds evaluators of the issues to
consider.

DILEMMA

Classic problems or false alarms?

You might have the impression that heuristic evaluation is a panacea for designers, and
that it can reveal all that is wrong with a design. However, it has problems. Shortly after
heuristic evaluation was developed, several independent studies compared heuristic
evaluation with other methods, particularly user testing. They found that the different
approaches often identify different problems and that sometimes heuristic evaluation
misses severe problems (Karat, 1994). This argues for using complementary methods.
Furthermore, heuristic evaluation should not be thought of as a replacement for user
testing.

Another problem concerns experts reporting problems that don't exist. In other words,
some of the experts' predictions are wrong (Bailey, 2001). Bailey cites analyses from
three published sources showing that only around 33% of the problems reported were
real usability problems, some of which were serious, others trivial. However, the
heuristic evaluators missed about 21% of users' problems. Furthermore, about 43% of
the problems identified by the experts were not problems at all; they were false alarms!
Bailey points out that this means only about half the problems identified are true
problems: “More specifically, for every true usability problem identified, there will be a
little over one false alarm (1.2) and about one half of one missed problem (0.6). If this
analysis is true, heuristic evaluators tend to identify more false alarms and miss more
problems than they have true hits.”

How can the number of false alarms or missed serious problems be reduced? Checking
that experts really have the expertise that they claim would help, but how can this be
done? One way to overcome these problems is to have several evaluators. This helps to
reduce the impact of one person's experience or poor performance. Using heuristic
evaluation along with user testing and other methods is also a good idea.

15.2.2 Walkthroughs

Walkthroughs are an alternative approach to heuristic evaluation for predicting


users' problems without doing user testing. As the name suggests, they involve
walking through a task with the product and noting problematic usability
features. Most walkthrough methods do not involve users. Others, such as
pluralistic walkthroughs, involve a team that includes users, developers, and
usability specialists.

In this section we consider cognitive and pluralistic walkthroughs. Both were


originally developed for desktop systems but, as with heuristic evaluation, they
can be adapted to web-based systems, handheld devices, and products such as
DVD players.

Cognitive Walkthroughs

“Cognitive walkthroughs involve simulating a user's problem-solving process at


each step in the human–computer dialog, checking to see if the user's goals and
memory for actions can be assumed to lead to the next correct action” (Nielsen
and Mack, 1994, p. 6). The defining feature is that they focus on evaluating
designs for ease of learning – a focus that is motivated by observations that users
learn by exploration (Wharton et al, 1994). The steps involved in cognitive
walkthroughs are:

1. The characteristics of typical users are identified and documented and


sample tasks are developed that focus on the aspects of the design to be
evaluated. A description or prototype of the interface to be developed is
also produced, along with a clear sequence of the actions needed for the
users to complete the task.
2. A designer and one or more expert evaluators come together to do the
analysis.
3. The evaluators walk through the action sequences for each task, placing
it within the context of a typical scenario, and as they do this they try to
answer the following questions:
1. Will the correct action be sufficiently evident to the user? (Will
the user know what to do to achieve the task?)
2. Will the user notice that the correct action is available? (Can
users see the button or menu item that they should use for the
next action? Is it apparent when it is needed?)
3. Will the user associate and interpret the response from the
action correctly? (Will users know from the feedback that they
have made a correct or incorrect choice of action?)
In other words: will users know what to do, see how to do it,
and understand from feedback whether the action was correct
or not?
4. As the walkthrough is being done, a record of critical information is
compiled in which:
1. The assumptions about what would cause problems and why
are identified.
2. Notes about side issues and design changes are made.
3. A summary of the results is compiled.
5. The design is then revised to fix the problems presented.
As with heuristic and other evaluation methods, developers and
researchers sometimes modify the method to meet their own needs
more closely. One example of this is provided by a company called
Userfocus (www.userfocus.com) that uses the following four questions,
rather than those listed in point 3 above, as they are more suitable for
evaluating physical devices such as TV remote controllers:
1. Will the customer realistically be trying to do this action? (This
question does not presume that users will actually carry out
certain actions.)
2. Is the control for the action visible?
3. Is there a strong link between the control and the action?
4. Is feedback appropriate?
When doing a cognitive walkthrough it is important to document the process,
keeping account of what works and what doesn't. A standardized feedback form
can be used in which answers are recorded to each question. Any negative
answers are carefully documented on a separate form, along with details of the
system, its version number, the date of the evaluation, and the evaluators' names.
It is also useful to document the severity of the problems: for example, how likely
a problem is to occur and how serious it will be for users. The form can also
record the process details outlined in points 1 to 4 as well as the date of the
evaluation.
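
A minimal sketch of such a feedback form is shown below. It assumes the three walkthrough questions from step 3 and records the negative answers that feed the problem report; the field names and example entries are illustrative assumptions, not a standard format.

```python
# A minimal sketch of a standardized walkthrough feedback form. The field
# names and example entries are illustrative assumptions; the three
# questions are those listed in step 3 above.
from dataclasses import dataclass, field

QUESTIONS = (
    "Will the user know what to do?",
    "Will the user see how to do it?",
    "Will the user understand from feedback whether the action was correct?",
)

@dataclass
class StepRecord:
    step: str
    answers: dict                 # question -> (answer "Yes"/"No", note)
    severity: str = ""            # e.g. how likely and how serious (assumed field)

@dataclass
class WalkthroughLog:
    system: str
    version: str
    date: str
    evaluators: list
    records: list = field(default_factory=list)

    def negative_answers(self):
        """Yield every (step, question, note) answered 'No' for the problem report."""
        for r in self.records:
            for q, (answer, note) in r.answers.items():
                if answer == "No":
                    yield r.step, q, note

log = WalkthroughLog("Online bookstore", "0.3", "2010-05-01", ["Evaluator A"])
log.records.append(StepRecord(
    "Complete the search form",
    {QUESTIONS[0]: ("No", "Users may not realize the form has defaults")},
    severity="Likely to occur; moderate impact",
))
print(list(log.negative_answers()))
```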

Compared with heuristic evaluation, this technique focuses more closely on


identifying specific user problems at a high level of detail. Hence, it has a narrow
focus that is useful for certain types of system but not others. In particular, it can
be useful for applications involving complex operations. However, it is very time-
consuming and laborious to do and evaluators need a good understanding of the
cognitive processes involved.

The following example shows a cognitive walkthrough of buying this book


at www.Amazon.com.
 Task: to buy a copy of this book from www.Amazon.com
 Typical users: students who use the web regularly
The steps to complete the task are given below. Note that the interface for
www.Amazon.com may have changed since we did our evaluation.

Step 1. Selecting the correct category of goods on the homepage


Q: Will users know what to do?
Answer: Yes, they know that they must find books.

Q: Will users see how to do it?


Answer: Yes, they have seen menus before and will know to select the
appropriate item and to click ‘go.’

Q: Will users understand from feedback whether the action was correct or not?
Answer: Yes, their action takes them to a form that they need to complete to
search for the book.

Step 2. Completing the form


Q: Will users know what to do?
Answer: Yes, the online form is like a paper form so they know they have to
complete it.
Answer: No, they may not realize that the form has defaults to prevent
inappropriate answers because this is different from a paper form.

Q: Will users see how to do it?


Answer: Yes, it is clear where the information goes and there is a button to tell
the system to search for the book.

Q: Will users understand from the feedback whether the action was correct or
not?
Answer: Yes, they are taken to a picture of the book, a description, and purchase
details.

ACTIVITY 15.4

Activity 15.3 asked you to do a heuristic evaluation of www.REI.com or a similar online


retail site. Now go back to that site and do a cognitive walkthrough to buy something, say
a pair of skis. When you have completed the evaluation, compare your findings from the
cognitive walkthrough with those from heuristic evaluation.
Comment

The cognitive walkthrough probably took longer than the heuristic evaluation for
evaluating the same part of the site because it examines each step of a task.
Consequently, you probably did not see as much of the website. It is also likely that the
cognitive walkthrough resulted in more detailed findings. Cognitive walkthrough is a
useful method for examining a small part of a system in detail, whereas heuristic
evaluation is useful for examining a whole system or large parts of systems. As the name
indicates, the cognitive walkthrough focuses on the cognitive aspects of interacting with
the system. It was developed before there was much emphasis on aesthetic design and
other user experience goals.

Another variation of cognitive walkthrough was developed by Rick Spencer of


Microsoft, to overcome some problems that he encountered when using the
original form of cognitive walkthrough (Spencer, 2000). The first problem was
that answering the three questions in step 3 and discussing the answers took too
long. Second, designers tended to be defensive, often invoking long explanations
of cognitive theory to justify their designs. This second problem was particularly
difficult because it undermined the efficacy of the method and the social
relationships of team members. In order to cope with these problems, Rick
Spencer adapted the method by reducing the number of questions and curtailing
discussion. This meant that the analysis was more coarse-grained but could be
completed in about 2.5 hours. He also identified a leader, the usability specialist,
and set strong ground rules for the session, including a ban on defending a
design, debating cognitive theory, or doing designs on the fly.

These adaptations made the method more usable, despite losing some of the
detail from the analysis. Perhaps most important of all, Spencer directed the
social interactions of the design team so that they achieved their goals.

Pluralistic Walkthroughs

“Pluralistic walkthroughs are another type of walkthrough in which users,


developers and usability experts work together to step through a [task] scenario,
discussing usability issues associated with dialog elements involved in the
scenario steps” (Nielsen and Mack, 1994, p. 5). In a pluralistic walkthrough, each
of the evaluators is asked to assume the role of a typical user. Scenarios of use,
consisting of a few prototype screens, are given to each evaluator who writes
down the sequence of actions they would take to move from one screen to
another, without conferring with fellow panelists. Then the panelists discuss the
actions they each suggested before moving on to the next round of screens. This
process continues until all the scenarios have been evaluated (Bias, 1994).

The benefits of pluralistic walkthroughs include a strong focus on users' tasks at a


detailed level, i.e. looking at the steps taken. This level of analysis can be
invaluable for certain kinds of systems, such as safety-critical ones, where a
usability problem identified for a single step could be critical to its safety or
efficiency. The approach lends itself well to participatory design practices by
involving a multidisciplinary team in which users play a key role. Furthermore,
the group brings a variety of expertise and opinions for interpreting each stage of
an interaction. Limitations include having to get all the experts together at once
and then proceed at the rate of the slowest. Furthermore, only a limited number
of scenarios, and hence paths through the interface, can usually be explored
because of time constraints.

15.3 Analytics
Analytics is a method for evaluating user traffic through a system. When used to
examine traffic on a website or part of a website as discussed in Chapter 7, it is
known as web analytics. Web analytics can be collected locally or remotely across
the Internet by logging user activity, counting and analyzing the data in order to
understand what parts of the website are being used and when. Although
analytics are a form of evaluation that is particularly useful for evaluating the
usability of a website, they are also valuable for business planning. Many
companies use the services of other companies, such as Google and VisiStat, that
specialize in providing analytics and the analysis necessary to understand the
data – e.g. graphs, tables, and other types of data visualizations. An example of
how web analytics can be used to analyze and help developers to improve
website performance is provided by VisiStat's analysis of Mountain Wines’
website (VisiStat, 2010).

Mountain Wines, located in Saratoga, California, aims to create memorable


experiences for guests who visit the vineyard. Following on the tradition started
by Paul Masson, world famous wine maker, Mountain Wines offers a beautiful
venue for a variety of events including weddings, corporate meetings, dinner
parties, birthdays, concerts, and vacations. Mountain Wines uses a variety of
advertising media to attract customers to its website and in 2010 invested about
$10 000 a month in advertising. However, the website has remained unchanged
for several years because the company didn't have a way of evaluating its
effectiveness and could not decide whether to increase or decrease investment in
it. Recently Mountain Wines decided to employ a company called VisiStat, which
offers a web analytics tool. Prior to enlisting this company, the only record that
Mountain Wines had of the effectiveness of its advertising came from its front-
desk employees who were instructed to ask visitors ‘How did you hear about
Mountain Wines?’.

VisiStat provided Mountain Wines with data showing how their website was
being used by potential customers, e.g. data like that shown in Figures
15.5 to 15.7. Figure 15.5 provides an overview of the number of page views of the
website per day. Figure 15.6 provides additional details and shows the hour-by-
hour traffic for May 8. Clicking on the first icon for more detail shows where the
IP addresses of the traffic are located (Figure 15.7). VisiStat can also provide
information about such things as which visitors are new to the site, which are
returners, and which other pages visitors came from.

Using this data and other data provided by VisiStat, Mountain Wines could see
visitor totals, traffic averages, traffic sources, visitor activity, and more. They
discovered the importance of visibility for their top search words; they could
pinpoint where guests were going on their website; and they could see where
their guests were geographically located.
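
To make the kind of aggregation behind these reports concrete, the sketch below counts page views per day, traffic per hour, and unique visitors per day from a small, made-up log of timestamped visits. The log format and figures are purely illustrative assumptions; in practice a service such as VisiStat collects and analyzes this data automatically.

```python
# Hypothetical illustration of the aggregation behind analytics reports such
# as Figures 15.5-15.7: counting page views per day and per hour, and unique
# visitors per day, from a log of (timestamp, page, visitor IP) records.
# The log format and entries are assumptions made for this example.
from collections import Counter
from datetime import datetime

log = [
    ("2010-05-08 00:12", "/events", "203.0.113.7"),
    ("2010-05-08 14:05", "/weddings", "198.51.100.23"),
    ("2010-05-08 14:40", "/weddings", "203.0.113.7"),
    ("2010-05-09 09:30", "/", "192.0.2.55"),
]

views_per_day = Counter()
views_per_hour = Counter()
visitors_per_day = {}

for stamp, page, ip in log:
    t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    day = t.date().isoformat()
    views_per_day[day] += 1
    views_per_hour[(day, t.hour)] += 1
    visitors_per_day.setdefault(day, set()).add(ip)

print("Page views per day:", dict(views_per_day))
print("Hourly traffic on 2010-05-08:",
      {h: n for (d, h), n in views_per_hour.items() if d == "2010-05-08"})
print("Unique visitors per day:",
      {d: len(ips) for d, ips in visitors_per_day.items()})
```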

Figure 15.5 A general view of the kind of data provided by VisiStat


Figure 15.6 Clicking on May 8 provides an hourly report from midnight until
10.00 p.m. (only midnight and 2.00 p.m.–7.00 p.m. shown)

Figure 15.7 Clicking on the icon for the first hour in Figure 15.6 shows where the
IP addresses of the 13 visitors to the website are located

ACTIVITY 15.5

1. How were users involved in the Mountain Wines website evaluation?


2. From the information described above how might Mountain Wines have used
the results of the analysis to improve its website?
3. Where was the evaluation carried out?
Comment

1. Users were not directly involved but their behavior on the website was
tracked.
2. Mountain Wines may have changed its keywords. By tracking the way visitors
traveled through the website, web navigation and content layout could be
improved to make searching and browsing more effective and pleasurable.
The company also may have added information to attract visitors from other
regions.
3. We are not told where the evaluation was carried out. VisiStat may have
installed its software at Mountain Wines (the most likely option) or they may
have collected and analyzed the data remotely.
More recently, other types of specialist analytics have been developed, such as
visual analytics, in which thousands and sometimes millions of data points are
displayed and manipulated visually; an example is Hansen et al's (2011) social
network analysis (see Figure 15.8).

Figure 15.8 Social network analysis

Lifelogging is another interesting variation that can be used for evaluation as


well as for sharing information with friends, family, and colleagues. Typically,
lifelogging involves recording GPS location data and personal interaction data on
cell phones. In an evaluation context this can raise privacy concerns. Even
though users tend to get used to being logged they generally want to remain in
control of the logging (Kärkkäinen et al, 2010).
DILEMMA

Analyzing workers' social networking behavior – an invasion of privacy?

Salesforce.com's ‘Chatter’ is analytics software that can be used by IT administrators to


track workers' behavior on social networking sites during working hours. The data
collected can be used to determine who is collaborating with whom, and to inform
developers about how much their applications are being used – a concept often referred
to as stickiness. While these reasons for tracking users appear to be bona fide, is this a
threat to personal privacy?

15.4 Predictive Models


Similar to inspection methods and analytics, predictive models evaluate a system
without users being present. Rather than involving expert evaluators role-playing
users as in inspections, or tracking their behavior as in analytics, predictive
models use formulas to derive various measures of user performance. Predictive
modeling provides estimates of the efficiency of different systems for various
kinds of task. For example, a cell phone designer might choose to use a predictive
method because it can enable her to determine accurately which is the optimal
layout of keys on a cell phone for allowing common operations to be performed.

A well-known predictive modeling technique is GOMS. This is a generic term used


to refer to a family of models that vary in their granularity concerning the
aspects of the user's performance they model and make predictions about. These
include the time it takes to perform tasks and the most effective strategies to use.
The models have been used mainly to predict user performance when comparing
different applications and devices. Below we describe two of the most well-
known members of the GOMS family: the GOMS model and its daughter, the
keystroke level model (KLM).

15.4.1 The GOMS Model

The GOMS model was developed in the early 1980s by Card, Moran, and Newell
and is described in a seminal paper (Card et al, 1983). It was an attempt to model
the knowledge and cognitive processes involved when users interact with
systems. The term GOMS is an acronym that stands for goals, operators, methods,
and selection rules:

 Goals refer to a particular state the user wants to achieve (e.g. find a
website on interaction design).
 Operators refer to the cognitive processes and physical actions that
need to be performed in order to attain those goals (e.g. decide on which
search engine to use, think up and then enter keywords into the search
engine). The difference between a goal and an operator is that a goal is
obtained and an operator is executed.
 Methods are learned procedures for accomplishing the goals. They
consist of the exact sequence of steps required (e.g. type in keywords in
a Google search box and press the search button).
 Selection rules are used to determine which method to select when
there is more than one available for a given stage of a task. For example,
once keywords have been entered into a search engine entry field,
many search engines allow users to press the return key on the
keyboard or click the go button using the mouse to progress the search.
A selection rule would determine which of these two methods to use in
the particular instance.
Below is a detailed example of a GOMS model for deleting a word in a sentence
using Microsoft Word.

Goal: delete a word in a sentence


Method for accomplishing goal of deleting a word using menu option:

Step 1. Recall that word to be deleted has to be highlighted


Step 2. Recall that command is ‘cut’
Step 3. Recall that command ‘cut’ is in edit menu
Step 4. Accomplish goal of selecting and executing the ‘cut’ command
Step 5. Return with goal accomplished

Method for accomplishing goal of deleting a word using delete key:

Step 1. Recall where to position cursor in relation to word to be deleted


Step 2. Recall which key is delete key
Step 3. Press delete key to delete each letter
Step 4. Return with goal accomplished
Operators to use in the above methods:

 Click mouse
 Drag cursor over text
 Select menu
 Move cursor to command
 Press key
Selection rules to decide which method to use:

1. Delete text using the mouse and selecting from the menu if a large amount of
text is to be deleted.
2. Delete text using the delete key if a small number of letters is to be deleted.
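
The example above can also be written down compactly as data plus a selection rule. The sketch below is one illustrative encoding: the dictionary representation and the ten-character threshold standing in for 'a large amount of text' are assumptions made for the example, not part of GOMS notation.

```python
# A sketch of the word-deletion GOMS example above as plain data structures.
# The representation (dictionaries plus a threshold-based selection rule) is
# an illustrative assumption, not a standard GOMS notation.
GOAL = "delete a word in a sentence"

METHODS = {
    "menu": [
        "recall that the word to be deleted has to be highlighted",
        "recall that the command is 'cut'",
        "recall that 'cut' is in the Edit menu",
        "accomplish subgoal: select and execute the 'cut' command",
    ],
    "delete-key": [
        "recall where to position the cursor relative to the word",
        "recall which key is the delete key",
        "press the delete key once per letter",
    ],
}

OPERATORS = ["click mouse", "drag cursor over text", "select menu",
             "move cursor to command", "press key"]

def select_method(selection_length: int) -> str:
    """Selection rule: use the menu for large selections, the delete key otherwise."""
    return "menu" if selection_length > 10 else "delete-key"

print(select_method(3))    # 'delete-key' for a short word
print(select_method(40))   # 'menu' for a large amount of text
```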
15.4.2 The Keystroke Level Model (KLM)

The KLM differs from the GOMS model in that it provides numerical predictions
of user performance. Tasks can be compared in terms of the time it takes to
perform them when using different strategies. The main benefit of making this
kind of quantitative prediction is that different features of systems and
applications can be easily compared to see which might be the most effective for
performing specific kinds of task.

When developing the KLM, Card et al (1983) analyzed the findings of many
empirical studies of user performance in order to derive a standard set of
approximate times for the main kinds of operators used during a task. In so
doing, they were able to come up with the average time it takes to carry out
common physical actions (e.g. press a key, click a mouse button), together with
other aspects of user–computer interaction (e.g. the time it takes to decide what to
do and the system response rate). The core times they proposed are used in the
worked example below (note how much variability there is in the time it takes to
press a key for users with different typing skills).
The predicted time it takes to execute a given task is then calculated by describing
the sequence of actions involved and then summing together the approximate
times that each one will take:

T_execute = T_K + T_P + T_H + T_D + T_M + T_R

For example, consider how long it would take to insert the word ‘not’ into the
following sentence, using a word-processing program like Microsoft Word:

Running through the streets naked is normal.

So that it becomes:

Running through the streets naked is not normal.

First we need to decide what the user will do. We are assuming that she will have
read the sentences beforehand and so start our calculation at the point where she
is about to carry out the requested task. To begin she will need to think about
what method to select. So, we first note a mental event (M operator). Next she will
need to move the cursor into the appropriate point of the sentence. So, we note
an H operator (i.e. reach for the mouse). The remaining sequence of operators are
then: position the mouse before the word ‘normal’ (P), click the mouse button (P1),
move hand from mouse over the keyboard ready to type (H), think about which
letters to type (M), type the letters n, o, and t (3K), and finally press the spacebar
(K).

The times (in seconds) for each of these operators can then be worked out:

Mentally prepare (M) 1.35
Reach for the mouse (H) 0.40
Position mouse before the word ‘normal’ (P) 1.10
Click mouse (P1) 0.20
Move hands to home position on keys (H) 0.40
Mentally prepare (M) 1.35
Type ‘n’ (good typist) (K) 0.22
Type ‘o’ (K) 0.22
Type ‘t’ (K) 0.22
Type ‘space’ (K) 0.22

Total predicted time: 5.68 seconds

When there are many components to add up, it is often easier to put together all
the same kinds of operator. For example, the above can be rewritten as

2(M) + 2(H) + 1(P) + 1(P1) + 4(K) = 2.70 + 0.80 + 1.10 + 0.2 + 0.88 = 5.68 seconds.
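
This arithmetic is easy to automate. The sketch below sums the operator times used in the worked example (K = 0.22 seconds assumes a good typist) and reproduces the 5.68-second prediction.

```python
# The KLM arithmetic above, automated. The operator times are the ones used
# in the worked example; K = 0.22 s assumes a good typist, and other skill
# levels would use different K values.
TIMES = {"K": 0.22, "P": 1.10, "P1": 0.20, "H": 0.40, "M": 1.35}

def klm_time(sequence: list) -> float:
    """Sum the approximate times (in seconds) for a sequence of KLM operators."""
    return sum(TIMES[op] for op in sequence)

# Insert 'not ': M, reach for mouse, point, click, hands to keys, M, then 4 key presses.
insert_not = ["M", "H", "P", "P1", "H", "M", "K", "K", "K", "K"]
print(f"{klm_time(insert_not):.2f} seconds")   # 5.68
```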

A duration of over 5 seconds seems a long time for inserting a word into a
sentence, especially for a good typist. Having made our calculation it is useful to
look back at the various decisions made. For example, we may want to think why
we included a mental operator before typing the letters n, o, and t, but not before
any of the other physical actions. Was this necessary? Perhaps we don't need to
include it. The decision when to include a time for mentally preparing for a
physical action is one of the main difficulties with using the keystroke level
model. Sometimes it is obvious when to include one, especially if the task
requires making a decision, but for other times it can seem quite arbitrary.
Another problem is that, just as typing skills vary between individuals, so too do
the mental preparation times people spend thinking about what to do. Mental
preparation can vary from under 0.5 of a second to well over a minute. Practice
at modeling similar kinds of task and comparing the results with actual times
taken can help overcome these problems. Ensuring that decisions are applied
consistently also helps, e.g. applying the same modeling decisions when
comparing two prototypes.

ACTIVITY 15.6

As described in the GOMS model above there are two main ways to delete words from a
sentence when using a word processor like Word. These are:

1. Deleting each letter of the word individually by using the delete key.
2. Highlighting the word using the mouse and then deleting the highlighted
section in one go.
Which of the two methods is quickest for deleting the word ’not’ from the following
sentence?

I do not like using the keystroke level model.

Comment

1. Our analysis for method 1 is:

2. Our analysis for method 2 is:


The result seems counter-intuitive. Why do you think this is? The amount of time
required to select the letters to be deleted is longer for the second method than pressing
the delete key three times in the first method. If the word had been any longer, for
example, ’keystroke,’ then the keystroke analysis would have predicted the opposite.
There are also other ways of deleting words, such as double clicking on the word to select
it and then either pressing the delete key or the combination of Ctrl+X keys. What do you
think the keystroke level model would predict for either of these two methods?
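
For comparison, one plausible keystroke-level breakdown of the two methods is sketched below. The operator sequences are assumptions (the drag is modeled as press, point, release, and no second mental operator is counted before the deletion), so other reasonable breakdowns give slightly different totals, although the ordering for a three-letter word stays the same.

```python
# One possible keystroke-level breakdown of the two deletion methods, using
# the same operator times as above (values in seconds). The sequences are
# assumptions: the drag is modeled as press, point, release, and no second
# mental operator is counted before the deletion itself.
TIMES = {"K": 0.22, "P": 1.10, "P1": 0.20, "H": 0.40, "M": 1.35}

def klm(seq):
    return sum(TIMES[op] for op in seq)

method_1 = ["M", "H", "P", "P1", "H", "K", "K", "K"]    # point after 'not', delete 3 letters
method_2 = ["M", "H", "P", "P1", "P", "P1", "H", "K"]   # drag over 'not', press delete once
print(f"Method 1 (delete key): {klm(method_1):.2f} s")  # about 4.1 s
print(f"Method 2 (highlight):  {klm(method_2):.2f} s")  # about 5.0 s
# For the three-letter word 'not' the delete-key method comes out faster; for a
# longer word such as 'keystroke' the extra key presses reverse the ordering.
```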

Case Study 15.1

Using GOMS in the redesign of a phone-based response system

Usability consultant Bill Killam and his colleagues worked with the US Internal Revenue
Service (IRS) several years ago to evaluate and redesign the telephone response
information system (TRIS). The goal of TRIS was to provide the general public with
advice about filling out a tax return – and those of you who have to do this know only too
well how complex it is. Although this case study is situated in the USA, such phone-based
information systems are widespread across the world.

Typically, telephone answering systems can be frustrating to use. Have you been
annoyed by the long menus of options such systems provide when you are trying to buy
a train ticket or when making an appointment for a technician to fix your phone line?
What happens is that you work your way through several different menu systems,
selecting an option from the first list of, say, seven options, only to find that now you
must choose from another list of five alternatives. Then, having spent several minutes
doing this, you discover that you made the wrong choice back in the first menu, so you
have to start again. Does this sound familiar? Other problems are that often there are too
many options to remember, and none of them seems to be the right one for you.

The usability specialists used the GOMS keystroke level model to predict how well a
redesigned user interface compared with the original TRIS interface for supporting
users' tasks. In addition they also conducted usability testing.
15.4.3 Benefits and Limitations of GOMS

One of the main attractions of the GOMS approach is that it allows comparative
analyses to be performed for different interfaces, prototypes, or specifications
relatively easily. Since its inception, a number of researchers have used the
method, reporting on its success for comparing the efficacy of different computer-
based systems.

Since Card et al developed GOMS and KLM, many new and different types of
product have been developed. Researchers wanting to use the KLM to predict the
efficiency of key and button layout on devices have adapted it to meet the needs
of these new products. Typically, they considered whether the range of operators
was applicable and whether they needed additional ones. They also had to check
the times allotted to these operators to make sure that they were appropriate.
This involved carrying out laboratory tests with users.

Today, mobile device and phone developers are using the KLM to determine the
optimal design for keypads (e.g. see Luo and John, 2005). For example, in order to
do a keystroke model analysis to evaluate the design of advanced cell phone
interaction, Holleis et al (2007) had to create several new operators including a
Macro Attention Shift (S_Macro) to describe the time it takes users to shift their
attention from the screen of an advanced cell phone to a distant object such as a
poster or screen in the real world, or vice versa, as indicated in Figure 15.9.

Figure 15.9 Attention shift (S) between the cell phone and objects in the real
world

From their work these researchers concluded that the KLM could be adapted for
use with advanced cell phones and that it was very successful. Like other
researchers they also discovered that even expert users vary considerably in the
ways that they use these devices and that there is even more variation within the
whole user population.

While GOMS can be useful in helping make decisions about the effectiveness of
new products, it is not often used for evaluation purposes. Part of the problem is
its highly limited scope: it can only really model computer-based tasks that
involve a small set of highly routine data-entry type tasks. Furthermore, it is
intended to be used only to predict expert performance, and does not allow for
errors to be modeled. This makes it much more difficult (and sometimes
impossible) to predict how average users will carry out their tasks when using a
range of systems, especially those that have been designed to be used in very
flexible ways. In most situations, it isn't possible to predict how users will
perform. Many unpredictable factors come into play including individual
differences among users, fatigue, mental workload, learning effects, and social
and organizational factors. For example, most people do not carry out their tasks
sequentially but will be constantly multitasking, dealing with interruptions and
talking to others.

A challenge with predictive models, therefore, is that they can only make
predictions about predictable behavior. Given that most people are unpredictable
in the way they behave, it makes it difficult to use them as a way of evaluating
how systems will be used in real-world contexts. They can, however, provide
useful estimates for comparing the efficiency of different methods of completing
tasks, particularly if the tasks are short and clearly defined.

15.4.4 Fitts' Law

Fitts' Law (Fitts, 1954) predicts the time it takes to reach a target using a pointing
device. It was originally used in human factors research to model the relationship
between speed and accuracy when moving towards a target on a display. In
interaction design, it has been used to describe the time it takes to point at a
target, based on the size of the object and the distance to the object. Specifically, it
is used to model the time it takes to use a mouse and other input devices to click
on objects on a screen. One of its main benefits is that it can help designers decide
where to locate buttons, what size they should be, and how close together they
should be on a screen display. The law states that:

T = k log₂(D/S + 1.0)
where

T = time to move the pointer to a target

D = distance between the pointer and the target

S = size of the target

k is a constant of approximately 200 ms/bit

In a nutshell, the bigger the target, the easier and quicker it is to reach it. This is
why interfaces that have big buttons are easier to use than interfaces that present
lots of tiny buttons crammed together. Fitts' Law also predicts that the most
quickly accessed targets on any computer display are the four corners of the
screen. This is because of their pinning action, i.e. the sides of the display
constrain the user from over-stepping the target. However, as pointed out by Tog
on the AskTog website, corners seem strangely to be avoided at all costs by
designers.
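
Applying the formula is straightforward. The sketch below computes predicted pointing times for a few made-up target sizes at a fixed distance; the pixel values are illustrative assumptions chosen only to show how strongly target size affects the prediction, which is also why a label under a toolbar icon (a larger effective target, as discussed in Activity 15.7 below) can be hit more quickly.

```python
# Fitts' Law as given above, with k of approximately 200 ms/bit. The button
# sizes and distance are made-up examples chosen to show the effect of
# target size on predicted pointing time.
import math

def fitts_time(distance: float, size: float, k: float = 0.2) -> float:
    """Predicted pointing time in seconds (k = 0.2 s/bit; distance and size in the same units)."""
    return k * math.log2(distance / size + 1.0)

# Same 300-pixel reach, three target sizes:
for size in (10, 30, 60):
    print(f"target {size:2d} px -> {fitts_time(300, size):.2f} s")
# 10 px -> ~0.99 s, 30 px -> ~0.69 s, 60 px -> ~0.52 s: bigger targets are
# reached more quickly.
```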

Fitts' Law can be useful for evaluating systems where the time to physically locate
an object is critical to the task at hand. In particular, it can help designers think
about where to locate objects on the screen in relation to each other. This is
especially useful for mobile devices, where there is limited space for placing
icons and buttons on the screen. For example, in a study carried out by Nokia,
Fitts' Law was used to predict expert text entry rates for several input methods
on a 12-key cell phone keypad (Silverberg et al, 2000). The study helped the
designers make decisions about the size of keys, their positioning, and the
sequences of presses to perform common tasks. Trade-offs between the size of a
device and accuracy of using it were made with the help of calculations from this
model. Fitts' Law has also been used to compare eye-tracking input with manual
input for visual targets (Vertegaal, 2008) and to compare different ways of
mapping Chinese characters to the keypad of cell phones (Liu and Räihä, 2010).

ACTIVITY 15.7

Microsoft toolbars provide the user with the option of displaying a label below each tool.
Give a reason why labeled tools may be accessed faster. (Assume that the user knows the
tool and does not need the label to identify it.)
Comment

The label becomes part of the target and hence the target gets bigger. As we mentioned
earlier, bigger targets can be accessed more quickly.

Furthermore, tool icons that don't have labels are likely to be placed closer together so
they are more crowded. Spreading the icons further apart creates buffer zones of space
around the icons so that if users accidentally go past the target they will be less likely to
select the wrong icon. When the icons are crowded together the user is at greater risk of
accidentally overshooting and selecting the wrong icon. The same is true of menus where
the items are closely bunched together.

Assignment

This assignment continues the work you did on the web-based ticketing system at the end
of Chapters 10, 11, and 14. The aim of this assignment is to evaluate the prototypes
produced in the assignment of Chapter 11 using heuristic evaluation.

1. Decide on an appropriate set of heuristics and perform a heuristic evaluation of


one of the prototypes you designed in Chapter 11.
2. Based on this evaluation, redesign the prototype to overcome the problems
you encountered.
3. Compare the findings from this evaluation with those from the usability
testing in the previous chapter. What differences do you observe? Which
evaluation approach do you prefer and why?
Summary

This chapter presented inspection evaluation methods, focusing on heuristic evaluation


and walkthroughs which are usually done by specialists (usually referred to as experts),
who role-play users' interactions with designs, prototypes, and specifications and then
offer their opinions. Heuristic evaluation and walkthroughs offer the evaluator a
structure to guide the evaluation process.

Analytics, in which user interaction is logged, is often performed remotely and without
users being aware that their interactions are being tracked. Very large volumes of data
are collected, anonymized, and statistically analyzed using specially developed software
services. The analysis provides information about how a system is used, e.g. how
different versions of a website or prototype perform, or which parts of a website are
seldom used – possibly due to poor usability design or lack of appeal. Data are often
presented visually so that it is easier to see trends and interpret the results.
The GOMS and KLM models, and Fitts' Law, can be used to predict user performance.
These methods can be useful for determining whether a proposed interface, system, or
keypad layout will be optimal. Typically they are used to compare different designs for a
small sequence of tasks. These methods are labor-intensive and so do not scale well for
large systems.

Evaluators frequently find that they have to tailor these methods so that they can use
them with the wide range of products that have come onto the market since the methods
were originally developed.

Key points

 Inspections can be used for evaluating a range of representations including


requirements, mockups, functional prototypes, or systems.
 User testing and heuristic evaluation often reveal different usability problems.
 Other types of inspections used in interaction design include pluralistic and
cognitive walkthroughs.
 Walkthroughs are very focused and so are suitable for evaluating small parts
of a product.
 Analytics involves collecting data about the interactions of users in order to
identify which parts of a website or prototype are underused.
 When applied to websites, analytics are often referred to as ‘web analytics.’
 The GOMS and KLM models and Fitts' Law can be used to predict expert, error-
free performance for certain kinds of tasks.
 Predictive models require neither users nor usability experts to be present, but
the evaluators must be skilled in applying the models.
 Predictive models are used to evaluate systems with limited, clearly defined,
functionality such as data entry applications, and key-press sequences for cell
phones and other handheld devices.

Further Reading
CARD, S. K., MORAN, T. P. and NEWELL, A. (1983) The Psychology of Human
Computer Interaction. Lawrence Erlbaum Associates. This seminal book describes
GOMS and the keystroke level model.

KOYANI, S. J., BAILEY, R. W. and NALL, J. R. (2004) Research-Based Web Design


and Usability Heuristics. GSA. This book contains a thorough review of usability
guidelines derived from empirical research. The collection is impressive but each
guideline needs to be evaluated and used thoughtfully.
MACKENZIE, I. S. (1992) Fitts' Law as a research and design tool in human–
computer interaction. Human–Computer Interaction, 7, 91–139. This early paper
by Scott Mackenzie, an expert in the use of Fitts' Law, provides a detailed
discussion of how it can be used in HCI.

MACKENZIE, I. S. and SOUKOREFF, R. W. (2002) Text entry for mobile


computing: models and methods, theory and practice. Human–Computer
Interaction, 17, 147–198. This paper provides a useful survey of mobile text-entry
techniques and discusses how Fitts' Law can inform their design.

MANKOFF, J., DEY, A. K., HSIEH, G., KIENTZ, J., LEDERER, S. and AMES, M. (2003)
Heuristic evaluation of ambient displays. Proceedings of CHI 2003, ACM, 5(1), 169–
176. More recent papers are available on this topic but we recommend this paper
because it describes how to derive rigorous heuristics for new kinds of
applications. It illustrates how different heuristics are needed for different
applications.
