Essentials of Biostatistics

The document is a workbook titled 'Essentials of Biostatistics Workbook: Statistical Computing Using Excel® 2003' authored by Lisa M. Sullivan, designed for educational purposes in biostatistics. It includes various chapters covering topics such as data entry, statistical functions, probability, hypothesis testing, and regression analysis, with practice problems for each section. The workbook is published by Jones & Bartlett Learning and is not for sale or distribution.

Essentials of Biostatistics Workbook
Second Edition

Statistical Computing Using Excel® 2003

Lisa M. Sullivan, PhD
Professor and Chair, Department of Biostatistics
Associate Dean for Education
Boston University School of Public Health
Boston, Massachusetts

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


World Headquarters
Jones & Bartlett Learning
40 Tall Pine Drive
Sudbury, MA 01776
978-443-5000
info@jblearning.com
www.jblearning.com

Jones & Bartlett Learning Canada
6339 Ormindale Way
Mississauga, Ontario L5V 1J2
Canada

Jones & Bartlett Learning International
Barb House, Barb Mews
London W6 7PA
United Kingdom

Jones & Bartlett Learning books and products are available through most bookstores and online booksellers. To contact Jones & Bartlett Learning directly, call 800-832-0034, fax 978-443-8000, or visit our website www.jblearning.com.

Substantial discounts on bulk quantities of Jones & Bartlett Learning publications are available to corporations, professional associations, and other qualified organizations. For details and specific discount information, contact the special sales department at Jones & Bartlett Learning via the above contact information or send an email to specialsales@jblearning.com.

Copyright © 2012 by Jones & Bartlett Learning, LLC


ISBN-13: 978-0-7637-9531-3
ISBN-10: 0-7637-9531-3

To order this product, use ISBN: 978-1-4496-2394-4

All rights reserved. No part of the material protected by this copyright may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.

This publication is designed to provide accurate and authoritative information in regard to the Subject Matter covered. It is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional service. If legal advice or other expert assistance is required, the service of a competent professional person should be sought.

Essentials of Biostatistics Workbook: Statistical Computing Using Excel® 2003 is an independent publication and is not affiliated with, nor
has it been authorized, sponsored, or otherwise approved by Microsoft Corporation.

Microsoft and Excel are trademarks of the Microsoft group of companies.

Production Credits
Publisher: Michael Brown
Associate Editor: Maro Gartside
Editorial Assistant: Teresa Reilly
Production Assistant: Rebekah Linga
Senior Marketing Manager: Sophie Fleck
Composition: Publishers’ Design and Production Services, Inc.
Cover Design: Kate Ternullo
Cover Image: © Kheng Guan Toh/ShutterStock, Inc.

3277

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1


Contents

Chapter 1 Basics
1.1 Workbooks and Worksheets
1.2 Cell Addresses
1.3 Entering and Editing Data
1.4 Saving Files
1.5 Practice Problems

Chapter 2 Formulas, Functions, and the Data Analysis ToolPak
2.1 Basic Mathematical Operations
2.2 Relative and Absolute Cell References
2.3 Creating Formulas and Functions
2.4 The Data Analysis ToolPak
2.5 Practice Problems

Chapter 3 Creating Tables and Graphs
3.1 Creating and Formatting Tables
3.2 Frequency Distribution Tables
3.3 Histograms and Bar Charts
3.4 Scatter Diagrams
3.5 Practice Problems

Chapter 4 Summarizing Continuous Variables in a Sample
4.1 The Descriptive Statistics Analysis Tool
4.2 Descriptive Statistics Using Excel Functions
4.3 Practice Problems


Chapter 5 Working with Probability Functions
5.1 Computing Probabilities with the Binomial Distribution
5.2 Computing Probabilities with the Normal Distribution
5.3 Finding Percentiles of the Normal Distribution
5.4 Practice Problems

Chapter 6 Confidence Interval Estimates
6.1 Confidence Intervals for One Sample, Continuous Outcome
6.2 Confidence Intervals for One Sample, Dichotomous Outcome
6.3 Confidence Intervals for Two Independent Samples, Continuous Outcome
6.4 Confidence Intervals for Matched Samples, Continuous Outcome
6.5 Confidence Intervals for Two Independent Samples, Dichotomous Outcome
6.6 Practice Problems

Chapter 7 Hypothesis Testing Procedures
7.1 Tests with One Sample, Continuous Outcome
7.2 Tests with One Sample, Dichotomous Outcome
7.3 Tests with One Sample, Categorical and Ordinal Outcomes: The Chi-Square Goodness-of-Fit Test
7.4 Tests with Two Independent Samples, Continuous Outcome
7.5 Tests with Matched Samples, Continuous Outcome
7.6 Tests with Two Independent Samples, Dichotomous Outcome
7.7 Tests with More Than Two Independent Samples, Continuous Outcome: Analysis of Variance
7.8 Tests for Two or More Independent Samples, Categorical and Ordinal Outcomes: The Chi-Square Test of Independence
7.9 Practice Problems

Chapter 8 Power and Sample Size Determination
8.1 Sample Size Estimates for Confidence Intervals with a Continuous Outcome in One Sample
8.2 Sample Size Estimates for Confidence Intervals with a Dichotomous Outcome in One Sample
8.3 Sample Size Estimates for Confidence Intervals with a Continuous Outcome in Two Independent Samples
8.4 Sample Size Estimates for Confidence Intervals with a Continuous Outcome in Matched Samples
8.5 Sample Size Estimates for Confidence Intervals with a Dichotomous Outcome in Two Independent Samples
8.6 Issues in Estimating Sample Size for Hypothesis Testing


8.7 Sample Size Estimates for Tests of Means in One Sample
8.8 Sample Size Estimates for Tests of Proportions in One Sample
8.9 Sample Size Estimates for Tests of Differences in Means in Two Independent Samples
8.10 Sample Size Estimates for Tests of Mean Differences in Matched Samples
8.11 Sample Size Estimates for Tests of Proportions in Two Independent Samples
8.12 Practice Problems

Chapter 9 Regression Analysis
9.1 Simple Linear Regression Analysis
9.2 Multiple Linear Regression Analysis
9.3 Practice Problems

Chapter 10 Nonparametric Procedures
10.1 Ranking Data
10.2 Tests with Two Independent Samples
10.3 Tests with Matched Samples
10.4 Tests with More Than Two Independent Samples
10.5 Practice Problems

Chapter 11 Survival Analysis
11.1 Estimating the Survival Function
11.2 Plotting a Survival Function
11.3 Comparing Survival Curves
11.4 Comparing Two Survival Curves Graphically
11.5 Practice Problems


CHAPTER 1

Basics

In this workbook, we describe how Microsoft Office® Excel® can be used to perform the statistical computations and analyses described in the textbook. Excel is a popular program, often used for organizing and summarizing numerical or financial information. It has substantial graphing capabilities and a statistical analysis module that is designed to perform a number of statistical analyses. One of the primary reasons we use Excel is its accessibility. Whereas other statistical packages (e.g., SAS®, SPSS®, and S-PLUS) offer more advanced analytic techniques and procedures, Excel is suitable for the introductory procedures we present here [1–3]. In fact, Excel offers many more applications than those we present in this workbook. We focus on the concepts and procedures discussed in the textbook. Readers interested in broader applications of Excel should see Dretzke [4]. Before we proceed with specific analyses, we first present some basic terminology and general procedures to get started.

1.1 WORKBOOKS AND WORKSHEETS
Excel files are also called workbooks. A workbook is a set of worksheets, where each worksheet can be thought of as a table or grid of rows and columns. When we open the Excel program, a workbook with three blank worksheets is presented (this is the default, or preset starting point). Excel calls the new workbook Book1. The name can be changed when the workbook is saved (see Section 1.4 for details). The three worksheets are called Sheet1, Sheet2, and Sheet3. The names of the worksheets can also be changed. When Excel is opened, Sheet1 is shown on the screen and looks like an empty grid of rows and columns; a sample is shown in Figure 1–1. The columns of the worksheet are labeled with letters (A, B, C, and so on) and the rows of the worksheet are numbered. The workbook name appears in the top-left corner (Book1), and the tabs along the bottom of the screen show the worksheet names.

It is useful to rename the worksheets to reflect the information stored in each. For example, we rename Sheet1 as Data. This is done using the Format option along the top menu bar. Under the Format option, we choose the Sheet and then the Rename options (see Figure 1–2). Once we choose the Rename option, Excel places the cursor on the worksheet name at the bottom of the screen (Sheet1 in this case), where we can enter the new name.

1.2 CELL ADDRESSES
A worksheet can be thought of as a set of cells. Each cell is defined by a specific column and row. When we first open Excel, the cursor appears in the top-left cell, making that the current or active cell. Notice in Figure 1–1 and Figure 1–2 that the top-left cell is outlined in a bold black line. The column and row make up the cell’s address. The top-left cell’s address is A1. As we move the cursor around the worksheet into different cells, the address of the current or active cell is shown just below the menu bars in the top-left portion of the screen.

1.3 ENTERING AND EDITING DATA
For statistical analysis, we enter data into the cells of an Excel worksheet. Once the data are entered, we can manipulate the values and perform statistical analyses. Example 1.1 contains data from a small study that we use to illustrate entering and manipulating data.


FIGURE 1–1 New Worksheet
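
The addressing scheme from Section 1.2, where a column letter joined to a row number gives an address such as A1, can be sketched outside Excel. The following is a minimal Python illustration, not part of the workbook. The function names `column_letters` and `cell_address` are hypothetical, and the continuation past column Z (AA, AB, and so on) reflects standard Excel behavior rather than anything stated in the text.

```python
def column_letters(index: int) -> str:
    """Convert a 1-based column index to letters: 1 -> A, 26 -> Z, 27 -> AA."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing so that 26 maps to Z rather than A0.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_address(column: int, row: int) -> str:
    """Join the column letters and the row number into an address such as A1."""
    if row < 1:
        raise ValueError("row number must be 1 or greater")
    return f"{column_letters(column)}{row}"


print(cell_address(1, 1))  # A1, the top-left cell
print(cell_address(5, 6))  # E6, fifth column and sixth row
```
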

FIGURE 1–2 Renaming the Worksheet

TABLE 1–1 Data from Study of 5 Participants

Subject Identification Number   Age   Sex   Weight (lb)   Height (in)
1                               24    F     125           63
2                               21    F     140           68
3                               32    M     165           68
4                               27    M     170           72
5                               25    M     195           71

Example 1.1. Suppose we have a sample of n = 5 participants and on each participant we measure age, sex, weight, and height. We also assign each participant a unique identification number (shown in the first column of Table 1–1). The identification numbers are not used in statistical analysis but instead are used to keep track of data measured for each participant. The data are shown in Table 1–1.

In Excel, we use the columns to hold the different variables that are measured (e.g., identification number, age, sex, weight, and height) and the rows to hold the observations measured in different participants. We use the first row for the variable names; this is important because the variable names show on the output, making for easier interpretation. The data shown in Table 1–1 are entered into Excel by moving the cursor around the worksheet. Figure 1–3 shows the data entered into the worksheet we named Data. Notice that the variable names are contained in row 1 and the data measured on the five participants are shown in row 2 through row 6.

FIGURE 1–3 Study Data Entered into Excel Worksheet
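
The layout used for Example 1.1 (variables in columns, participants in rows, variable names in row 1) can also be written down as a small grid. The sketch below is illustrative Python, not an Excel feature. The shortened column names and the helper functions `column` and `cell` are assumptions made for this example, and `cell` handles only single-letter columns.

```python
# Row 1 holds the variable names; rows 2 through 6 hold one participant each.
# The short names are hypothetical abbreviations of the Table 1-1 headers.
worksheet = [
    ["ID", "Age", "Sex", "Weight", "Height"],
    [1, 24, "F", 125, 63],
    [2, 21, "F", 140, 68],
    [3, 32, "M", 165, 68],
    [4, 27, "M", 170, 72],
    [5, 25, "M", 195, 71],
]

header, observations = worksheet[0], worksheet[1:]


def column(name):
    """Return every observation for one variable (one worksheet column)."""
    position = header.index(name)
    return [row[position] for row in observations]


def cell(address):
    """Look up a single-letter address such as 'E6' (columns A through E only)."""
    col = ord(address[0].upper()) - ord("A")
    row = int(address[1:]) - 1
    return worksheet[row][col]


print(column("Weight"))  # the five weights from column D
print(cell("E6"))        # 71, the height of Participant 5
```

Keeping the names in the first row means each column can be retrieved by name, which is the same reason the text gives for labeling columns before running an analysis.
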


There is no restriction on the format or length of variable names. However, it can be easier to work with shorter names, simply for convenient viewing of the names and data on the worksheet. It is important to choose informative names that reflect the information entered.

In Figure 1–3, the current or active cell is E6. The cell address is shown just below the menu bar in the top-left portion of the screen. The contents of the cell (71 in the example) are shown just to the right of the active cell’s address. Once data are entered, we can change or modify entries simply by retyping over the contents of the current cell or by typing into the top row where the active cell’s contents are shown (see Figure 1–3). We can move from cell to cell in the worksheet by using the mouse or by using the arrow keys on the keyboard.

There are some instances where the same data are repeated. In Example 1.1, there are two women and three men. Suppose we enter the sex of Participant 1 (i.e., the participant with identification number 1 whose data are in row 2 of the Data worksheet) into the C2 cell as “F”. Rather than entering the sex of Participant 2 into cell C3 directly, we can copy the data from the C2 cell. First, we make the C2 cell the active cell by moving the cursor to that cell. We then click on the Copy icon on the menu bar. To let us know that the contents of the active cell have been selected, Excel shows the borders of the cell with a bold flashing dotted line (as opposed to a bold solid line). We then move the cursor to the destination cell (e.g., C3) and click on the Paste icon. The contents of cell C2 are copied and pasted into cell C3.

The same idea can be used to copy the contents of one cell to several cells. Suppose we enter the sex of Participant 3 into cell C4 as “M” and want to copy the contents of cell C4 into cells C5 and C6. We make cell C4 the active cell and click on the Copy icon. We then highlight the destination cells (in this case, cells C5 and C6) simultaneously. To do this, we place our cursor on cell C5 (the top or first cell in the range) and, holding the left mouse button down, we drag the cursor to cell C6. This highlights both cells C5 and C6. We then click the Paste icon and the contents of cell C4 are copied into cells C5 and C6.

To insert a row into a worksheet, we use the Insert/Row command on the menu bar. Once we select the Insert/Row option, a row is inserted above the active cell. The same approach can be taken to insert a column. Selecting the Insert/Column option from the menu bar inserts a column to the left of the active cell.

For reporting purposes, we often want to format data or results for a consistent presentation. The Format option on the main menu bar can be used to format the contents of any cell or range of cells in a worksheet. For example, we entered weights in pounds into our Data worksheet, as shown:

Weight
125
140
165
170
195

Suppose that weights were actually measured to the nearest hundredth and the data were entered as follows:

Weight
125.45
140.05
165.16
170.39
195.47

Suppose that we want to present the weights to the nearest tenth (i.e., round the weights to one decimal place). This can be performed using the Format option on the main menu bar. We first highlight the range of cells we want to format (in our example, cells D2 through D6). We then click on the Format/Cells option. This is shown in Figure 1–4. Choosing the Format/Cells option brings up the dialog box shown in Figure 1–5, where we specify the format we want for the selected cells. In Figure 1–5, we select the Number format from the category list on the left side of the dialog box and specify 1 decimal place. Once we click OK, the display in the worksheet changes to the following:

Weight
125.5
140.1
165.2
170.4
195.5

The Format option is particularly useful for formatting results. For example, when we compute the mean or standard deviation of a sample, Excel carries more decimal places than we generally want to present. Recall that, as a general rule, we report summary statistics using one more decimal place than the raw data. Reporting too many decimal places implies a false level of precision. We illustrate how to format results in Chapter 4 through Chapter 9 of the Excel workbook.

FIGURE 1–4 Formatting Data
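
The one-decimal display produced through the Format/Cells option can be mimicked in a few lines. This is a rough Python sketch rather than a description of how Excel works internally. It assumes half-up decimal rounding, which reproduces the displayed weights in this example, and the function name `display_value` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights as typed into cells D2 through D6, kept as text so no digits are lost.
entered = ["125.45", "140.05", "165.16", "170.39", "195.47"]


def display_value(text, places=1):
    """Round a typed value half-up to `places` decimals, for display only."""
    step = Decimal(10) ** -places  # 1 place -> Decimal('0.1')
    return str(Decimal(text).quantize(step, rounding=ROUND_HALF_UP))


displayed = [display_value(weight) for weight in entered]
print(displayed)  # ['125.5', '140.1', '165.2', '170.4', '195.5']
print(entered)    # the entered values themselves are unchanged
```

The `decimal` module is used here because binary floating-point values such as 125.45 are not stored exactly, so formatting a plain float could round a trailing 5 in either direction.
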


FIGURE 1–5 Formatting Numeric Data
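
The reporting rule mentioned in Section 1.3, one more decimal place in summary statistics than in the raw data, can be checked against the whole-number weights in Table 1–1. The snippet below is an illustrative Python sketch using the standard `statistics` module. It stands in for the Excel tools covered in later chapters, and the variable names are assumptions.

```python
import statistics

weights = [125, 140, 165, 170, 195]  # pounds, recorded as whole numbers

mean_weight = statistics.mean(weights)
sd_weight = statistics.stdev(weights)  # sample standard deviation (n - 1 denominator)

# The raw data have 0 decimal places, so the summaries are reported with 1.
report_decimals = 1
mean_text = f"{mean_weight:.{report_decimals}f}"
sd_text = f"{sd_weight:.{report_decimals}f}"

print(f"mean = {mean_text} lb")  # mean = 159.0 lb
print(f"SD   = {sd_text} lb")    # SD   = 27.2 lb
```

Only the printed text is shortened; `mean_weight` and `sd_weight` keep their full precision for any further calculation.
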


1.4 SAVING FILES
The File/Save As option is used to save a workbook (and its associated worksheets). Figure 1–6 shows the commands. After selecting the File/Save As option, Excel prompts us to enter a file name to store the workbook. Once a workbook is saved as a file, we can open it using the File/Open option for further use.

FIGURE 1–6 Saving the Workbook

1.5 PRACTICE PROBLEMS
1. Use Excel to create a worksheet with the data shown in Table 1–2. The data were presented in Table 4–13 in the textbook and were measured in a subsample of n = 10 participants who attended the seventh examination of the Framingham Offspring Study. Place the variable names in the first row of the worksheet.


TABLE 1–2 Data for Practice Problem 1


Participant ID   Systolic Blood Pressure   Diastolic Blood Pressure   Total Serum Cholesterol   Weight (lb)   Height (in)
1                141                       76                         199                       138           63.00
2                119                       64                         150                       183           69.75
3                122                       62                         227                       153           65.75
4                127                       81                         227                       178           70.00
5                125                       70                         163                       161           70.50
6                123                       72                         210                       206           70.00
7                105                       81                         205                       235           72.00
8                113                       63                         275                       151           60.75
9                106                       67                         208                       213           69.00
10               131                       77                         159                       142           61.00

2. Rename the worksheet with the data from Problem 1 as Data.
3. Save the Excel workbook as a file.

REFERENCES
1. SAS Version 9.1. © 2002–2003 by SAS Institute Inc., Cary, NC.
2. SPSS® Version 15.0. © 2006 by SPSS Inc., Chicago, IL.
3. S-PLUS Version 7.0. © 1999–2006 by Insightful Corp., Seattle, WA.
4. Dretzke, B.J. Statistics with Microsoft Excel (3rd ed.). Upper Saddle River, NJ: Pearson Prentice Hall, 2005.



CHAPTER 2

Formulas, Functions, and the Data Analysis ToolPak


Once data are entered into a Microsoft Office® Excel® worksheet, we can create formulas and functions to organize, manipulate, and analyze the data. For example, Excel can be used to create new variables from existing variables (e.g., to convert variables from one scale of measurement to another, to standardize variables into z scores, or to create new variables from those that are measured directly) or to compute summary statistics (e.g., the mean, standard deviation, or median of a dataset, or the minimum or maximum values in a set).

2.1 BASIC MATHEMATICAL OPERATIONS
Basic mathematical operations are performed in Excel as they are on a calculator or in other statistical computing packages. In Excel, basic operations are implemented with the operators shown in Table 2–1.

TABLE 2–1 Mathematical Operators in Excel

Operation        Operator
Multiplication   *
Division         /
Addition         +
Subtraction      -
Exponentiation   ^

The order of operations is exponentiation first, then multiplication and division, and then addition and subtraction. To implement mathematical operations, we “program” specific operations into the cells of a worksheet. For example, we use these operations to convert variables measured on one scale to another or to create new variables from existing variables.

In Example 1.1 of the Excel workbook, we presented data on n = 5 participants. We measured age (in years), sex (male/female), weight (in pounds), and height (in inches). The data for Participant 1 are shown in Table 2–2. Using Excel, we could convert age measured in years to age in months by multiplying age in years by 12 (e.g., \(\text{Age}_{\text{months}} = \text{Age}_{\text{years}} \times 12\)). We could also convert weight in pounds to weight in kilograms using \(\text{Weight}_{\text{kilograms}} = \text{Weight}_{\text{pounds}} \times 0.4536\). Excel can be used to make these transformations easily. We illustrate how this is done in Section 2.2.

Example 2.1. Consider a study designed to assess the impact of a medication designed to lower systolic blood pressure. Suppose we measure each participant’s baseline systolic blood pressure (SBP) and their systolic blood pressure after 6 months on treatment. The data on n = 3 participants are entered into Excel and shown in Figure 2–1.

Notice that the distinct variables (e.g., ID and the baseline and 6-month systolic blood pressures) are shown in the columns and the data for each participant are shown in the rows of the worksheet. To analyze these data, we use methods for dependent, matched, or paired samples and focus specifically on differences in blood pressures. For each participant, we need to first compute the differences. We can take differences as

\[\text{Difference} = \text{6-Month SBP} - \text{Baseline SBP}\]
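
The difference calculation for paired measurements can also be sketched outside Excel. The Python snippet below is only an illustration and is not part of the workbook: the three pairs of SBP values are hypothetical placeholders (the values in Figure 2–1 are not reproduced here), and the variable names are assumptions chosen for readability.

```python
# Hypothetical SBP values for n = 3 participants (placeholders, not the data in Figure 2-1).
participants = [
    {"id": 1, "baseline_sbp": 150, "sbp_6_months": 138},
    {"id": 2, "baseline_sbp": 142, "sbp_6_months": 145},
    {"id": 3, "baseline_sbp": 160, "sbp_6_months": 147},
]

# Difference = 6-Month SBP - Baseline SBP, computed one participant (row) at a time.
differences = [p["sbp_6_months"] - p["baseline_sbp"] for p in participants]

for p, d in zip(participants, differences):
    print(f"Participant {p['id']}: difference = {d}")
```

With this sign convention, a negative difference means that SBP was lower at 6 months than at baseline.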


TABLE 2–2 Participant Data

Subject Identification Number   Age   Sex   Weight (lb)   Height (in)
1                               24    F     125           63
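
The unit conversions described in Section 2.1 can be checked on the Participant 1 data in Table 2–2. The following Python lines are a minimal sketch, not part of the workbook; the constant and variable names are assumptions, and the pounds-to-kilograms factor is the one stated in Section 2.3 (1 pound = 0.4536 kilograms).

```python
# Participant 1 from Table 2-2.
age_years = 24
weight_lb = 125

# Conversion factors: 12 months per year; 1 pound = 0.4536 kilograms.
MONTHS_PER_YEAR = 12
KG_PER_LB = 0.4536

age_months = age_years * MONTHS_PER_YEAR   # age in months
weight_kg = weight_lb * KG_PER_LB          # weight in kilograms

print(age_months, round(weight_kg, 1))
```

The age in months (288) matches the value that Section 2.2 obtains in cell F2.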

FIGURE 2–1 Systolic Blood Pressure Measured at Baseline and 6 Months Later

2.2 RELATIVE AND ABSOLUTE CELL REFERENCES
The operations described in Section 2.1 can be implemented in Excel by “programming” the operations into cells in the Excel worksheet. The programming amounts to specifying equations in Excel to perform the desired operations. We are essentially creating new variables as functions of existing variables using specific operations (e.g., converting from one scale to another, creating difference scores). To implement these operations, we first choose a column location for the new variable and specify a name for the new variable. The new variable name is placed in the first row of the worksheet along with the other variable names. We then input the operation or formula to create the new variable. In Excel, these operations are indicated by an equals sign (“=”). When Excel sees an equals sign in a cell, it is expecting a formula to follow.

The formula in a cell is implemented to produce the desired result, which is placed into that cell. Figure 2–2 shows the data from Example 1.1 of the Excel workbook in a worksheet. Suppose we want to create a new variable, age in months, and label it “Age, months.” We first choose a location for the variable. Suppose we want to place the new variable in column F of the worksheet. We enter the new name into row 1 of column F, as shown in Figure 2–2. Age in months is computed by multiplying age in years (which is contained in column B) by 12. Specifically, the formula to create age in months is age (in years) × 12. This formula is entered into cell F2 as “=B2*12”. In this formula, B2 represents the address of the cell containing the age in years for Participant 1. Once the formula is entered, Excel takes the value from B2 (24 years) and multiplies it by 12. The result (288) is placed into cell F2.

FIGURE 2–2 Creating New Variables

To perform this operation for each participant, we copy the formula in cell F2 and paste it into cell F3 through cell F6. This is done by making F2 the active cell and clicking the Copy icon. The border of cell F2 is shown in a bold, flashing dotted line. We next highlight cell F3 through cell F6 and click the Paste icon. The formula is copied from cell F2 into cell F3 through cell F6, and Excel automatically updates the cell referencing (i.e., the locations of the cells that contain the ages in years for each participant). Specifically, when we enter the formula to compute age in months for Participant 1 into F2, we specify that Excel should take the data in cell B2 and multiply it by 12. We want to do the same for the remaining participants: for each participant, we want to multiply their age in years by 12. When we copy and paste the formula from cell F2 into cell F3 through cell F6, Excel updates the cell references as shown in Figure 2–3.

FIGURE 2–3 Using Relative Cell References

Excel automatically updates the formula to compute age in months for Participant 2 through Participant 5 by updating the cell references (i.e., to cell B3 through cell B6). These references are called relative cell references. The formula to compute age in months for each participant uses the relevant information, the age in years for that participant, contained in column B.

Consider again the data in Example 2.1 shown in Figure 2–1. Suppose we now want to create the difference variable. Figure 2–4 shows the new variable (column D) and the formula to compute it in cell D2. The difference score is computed by subtracting the baseline SBP (column B) from the SBP measured at 6 months (column C). The formula for Participant 1 is “=C2-B2”. If we copy the contents of cell D2 into cell D3 and cell D4, Excel automatically updates the cell referencing. The new formulas are “=C3-B3” and “=C4-B4”, respectively. Once the formulas are entered, Excel computes the differences and the results are shown in Figure 2–5.

FIGURE 2–4 Computing Differences

FIGURE 2–5 Difference Scores

Example 2.2. Suppose we measure the lengths (in centimeters) of n = 6 infant boys who are 12 months of age. Suppose we want to standardize the lengths by subtracting the mean and dividing by the standard deviation. The mean length for 12-month-old boys is 75 centimeters and the standard deviation is 2.1 centimeters. The data are entered into an Excel worksheet as shown in Figure 2–6.

FIGURE 2–6 Lengths of Boys 12 Months of Age

The lengths in centimeters are shown in column A. We standardize the lengths by subtracting the mean and dividing by the standard deviation: \(Z = (\text{Length} - 75)/2.1\). To create the new variable Z, we enter the formula as shown in Figure 2–7. If we copy the formula from cell B2 into cell B3 through cell B7, Excel updates the reference to cell A2 to use the lengths in centimeters in cell A3 through cell A7, respectively.

FIGURE 2–7 Standardizing the Lengths

There is a second way to perform the standardization. Suppose we enter the data into an Excel worksheet as shown in Figure 2–8. The mean and standard deviation are now shown in cell B9 and cell B10, respectively. We again create Z scores by taking each length in column A, subtracting the mean of 75 and dividing by the standard deviation of 2.1. What we do here is refer Excel to the cells containing the mean and standard deviation (i.e., cell B9 and cell B10) in the worksheet. If we again place the Z scores in column B, the formula entered in cell B2 is “=(A2-$B$9)/$B$10”. A dollar sign before a column or row in a cell address freezes or fixes that column or row so that, as opposed to relative addresses as in the previous examples, Excel is not allowed to update it. In this example, we are fixing both the columns and rows of the addresses of the mean and standard deviation. These are called absolute cell references. Figure 2–9 displays the formulas that are copied into cell B3 through cell B7; notice that the cell addresses for the mean and standard deviation do not change from cell to cell.

FIGURE 2–8 Lengths of Boys with Mean and Standard Deviation

When we enter the formulas, the results are shown in column B. Both of the methods illustrated in Figure 2–7 and Figure 2–9 produce the results shown in Figure 2–10. The boy of length 71 cm is 1.9 standard deviations below the mean, whereas the boy of length 79 cm is 1.9 standard deviations above the mean. For presentation purposes, we format the cells in column B to two decimal places (using the Format/Cells option; see Figure 1–4 and Figure 1–5).
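
The difference between relative and absolute references when a formula is copied down a column can be mimicked with a small text-rewriting function. The Python sketch below is a toy model written for illustration only; it is not how Excel is implemented, the function name `fill_down` is hypothetical, and it only handles copying downward within the same column.

```python
import re

# Matches A1-style cell references such as B2, $B2, B$9, or $B$9.
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula: str, rows: int) -> str:
    """Return the formula as it would read after being copied `rows` rows down.

    Toy model: only row numbers are adjusted, and only when the row is not
    fixed with a dollar sign. Column shifts and function names that end in
    digits (e.g., LOG10) are not handled.
    """
    def shift(match: re.Match) -> str:
        col_fixed, col, row_fixed, row = match.groups()
        if row_fixed:            # absolute row such as $B$9: leave unchanged
            return match.group(0)
        return f"{col_fixed}{col}{int(row) + rows}"
    return CELL_REF.sub(shift, formula)

print(fill_down("=B2*12", 1))             # relative reference moves to row 3
print(fill_down("=C2-B2", 2))             # both relative references move to row 4
print(fill_down("=(A2-$B$9)/$B$10", 1))   # A2 moves; $B$9 and $B$10 stay fixed
```

The three calls correspond to the age-in-months, difference, and standardization formulas discussed in this section.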

FIGURE 2–9 Using Absolute Cell References


FIGURE 2–10 Standardizing Lengths
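
The standardization in Example 2.2 can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than part of the workbook: the function and constant names are hypothetical, and only the two lengths quoted in the text (71 cm and 79 cm) are used because the remaining four measurements are not reproduced here.

```python
MEAN_LENGTH_CM = 75.0   # mean length for 12-month-old boys (Example 2.2)
SD_LENGTH_CM = 2.1      # standard deviation (Example 2.2)

def z_score(length_cm: float) -> float:
    """Standardize a length: subtract the mean, then divide by the standard deviation."""
    return (length_cm - MEAN_LENGTH_CM) / SD_LENGTH_CM

# Only these two lengths are quoted in the text.
for length in (71, 79):
    print(length, round(z_score(length), 2))
```

Rounded to two decimal places, the two Z scores are 1.9 standard deviations below and above the mean, in agreement with the values reported for Figure 2–10.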

2.3 CREATING FORMULAS AND FUNCTIONS
We now describe how Excel is used to compute summary statistics (e.g., \(\bar{X}\), \(s\), median) using formulas and functions.

Example 2.3. In many studies of cardiovascular disease (e.g., the Framingham Heart Study), body mass index is assessed as a risk factor. Body mass index (BMI) is defined as

\[\text{BMI} = \frac{\text{weight}_{\text{kilograms}}}{\text{height}_{\text{meters}}^{2}}\]

Often weights are measured in pounds and heights in inches. Thus, the observed measurements must be converted to kilograms and meters, respectively, and then divided to produce BMI scores. In Example 1.1 of the Excel workbook, we measured weight in pounds and height in inches in n = 5 participants. Suppose we now want to create a BMI for each participant. The conversion from pounds to kilograms is 1 pound = 0.4536 kilograms, and the conversion from inches to meters is 1 inch = 0.0254 meters. The formula to compute BMI from weight in pounds and height in inches is

\[\text{BMI} = \frac{\text{weight}_{\text{pounds}} \times 0.4536}{(\text{height}_{\text{inches}} \times 0.0254)^{2}}\]

Figure 2–11 shows the computation of BMI. The formula in cell H2 can be copied to cell H3 through cell H6 to compute BMI for each participant. Notice the power operator “^2”, used to square the height in the denominator of the formula. Once the formula is copied, the BMI scores are computed as shown in Figure 2–12. Before we compute summary statistics on the BMI data, we first use the Format/Cells option to format the BMIs to two decimal places (Figure 2–13).

Excel has a number of functions and formulas that can be used to summarize and analyze data. We now use Excel to compute the sample mean BMI (i.e., \(\bar{X} = \sum X / n\)) using these functions. We first sum the BMI scores and place the sum in cell H8. This is done with the SUM function. In cell H8, we enter the formula “=SUM(H2:H6)”. The SUM function sums the data in the cells listed in the range shown in parentheses. In this example, we want to sum the data in cell H2 through cell H6. We then compute the sample size using the COUNT function and place the sample size into cell H9. In cell H9, we enter the formula “=COUNT(H2:H6)”. The COUNT function tallies the number of cells in the range that contain numbers (i.e., non-missing data). The sample mean is computed by dividing the sum by the sample size, and we place the sample mean into cell H10. Specifically, in cell H10 we enter the formula “=H8/H9”. We use column G for labels (Figure 2–13).

Suppose we now wish to compute the standard deviation of BMI (i.e., \(s = \sqrt{\sum (X - \bar{X})^{2} / (n - 1)}\)). We need to first subtract the mean BMI (in cell H10) from each BMI and square the difference. We place the squared differences in column I, and the formula entered in cell I2 is “=(H2-$H$10)^2” and then copied into cell I3 through cell I6. We then sum the squared differences using the SUM function (i.e., “=SUM(I2:I6)”) and place the result in cell I8. The variance is computed by dividing the

FIGURE 2–11 Computing BMI From Height and Weight

FIGURE 2–12 Computing Body Mass Index
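
The BMI computation in Example 2.3 can be sketched in Python as a check on the arithmetic. This is an illustration under stated assumptions, not the workbook's method: the function and constant names are hypothetical, and the only data used are the weight and height of Participant 1 from Table 2–2.

```python
KG_PER_LB = 0.4536     # 1 pound = 0.4536 kilograms
M_PER_INCH = 0.0254    # 1 inch = 0.0254 meters

def bmi_from_lb_in(weight_lb: float, height_in: float) -> float:
    """BMI = weight in kilograms divided by the square of height in meters."""
    weight_kg = weight_lb * KG_PER_LB
    height_m = height_in * M_PER_INCH
    return weight_kg / height_m ** 2

# Participant 1 from Table 2-2: 125 lb, 63 in.
bmi = bmi_from_lb_in(125, 63)
print(round(bmi, 2))
```

As in the Excel formula, the height is converted to meters first and the converted value is squared, so the parentheses around the height conversion matter.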


FIGURE 2–13 Computing the Sample Mean
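
The step-by-step computation of the sample mean, variance, and standard deviation in Example 2.3 can be mirrored in Python. The snippet below is a sketch only: the five BMI values are hypothetical placeholders rather than the workbook data, and the comments pair each step with the Excel formula and cell it is meant to imitate. The last two lines use Python's built-in `statistics` functions as a cross-check, which is an addition for illustration.

```python
import math
import statistics

# Hypothetical BMI values for n = 5 participants (placeholders, not the workbook data).
bmi = [22.14, 27.31, 24.85, 30.02, 19.67]

total = sum(bmi)                                 # like =SUM(H2:H6) in cell H8
n = len(bmi)                                     # like =COUNT(H2:H6) in cell H9
mean = total / n                                 # like =H8/H9 in cell H10

squared_diffs = [(x - mean) ** 2 for x in bmi]   # like =(H2-$H$10)^2 copied down column I
sum_sq = sum(squared_diffs)                      # like =SUM(I2:I6) in cell I8
variance = sum_sq / (n - 1)                      # like =I8/(H9-1) in cell I11
sd = math.sqrt(variance)                         # like =SQRT(I11)

print(round(mean, 2), round(variance, 2), round(sd, 2))

# Cross-check against built-in functions that compute the same quantities directly.
print(round(statistics.mean(bmi), 2), round(statistics.stdev(bmi), 2))
```

Dividing by n − 1 rather than n gives the sample variance, which is the definition used in the text.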


sum of the squared differences by \((n - 1)\). We compute the variance in cell I11 as “=I8/(H9-1)”. The standard deviation is then computed as “=SQRT(I11)”. Figure 2–14 displays the results.

FIGURE 2–14 Computing the Sample Variance and Sample Standard Deviation

As in many statistical computing packages, there are several ways to compute summary statistics in Excel. One way is to use the mathematical operations to program the formulas. A second method is to use one of Excel’s many built-in functions that directly compute summary statistics on a continuous variable. For example, Excel has an AVERAGE function that computes a mean. First, we select a cell for the result. Suppose we wish to compute the mean age for the data shown in Figure 2–14, and we want to place the mean age in cell B8. We enter the following into cell B8: “=AVERAGE(B2:B6)”. Once the formula is entered, the mean of the observations contained in cell B2 through cell B6 is computed and placed in cell B8. The AVERAGE function sums the data specified in parentheses (in the worksheet, the age data are in cell B2 through cell B6) and divides by the sample size (i.e., the total number of non-missing values). There are other functions available in Excel that are useful for computing summary statistics; these are discussed in Chapter 4 of the Excel workbook.

A third option for computing summary statistics is through the Excel Data Analysis ToolPak. The ToolPak offers a number of modules designed to perform various statistical analyses. We introduce the ToolPak here and use it extensively in Chapter 4 through Chapter 9 of the Excel workbook to perform statistical analysis.

2.4 THE DATA ANALYSIS TOOLPAK
The Excel Data Analysis ToolPak is an additional module (or add-in) that must be loaded either at the time of installation of Excel or at a later date. If the ToolPak was loaded at installation, it will be available as an option on the Tools menu (Figure 2–15).

FIGURE 2–15 The Data Analysis ToolPak

If the Data Analysis ToolPak is not available, it can be loaded at any time. This is done using the Tools/Add-Ins option (Figure 2–15). Once the Add-Ins option is selected, a dialog box appears with several additional modules, one of which is the Data Analysis ToolPak. If we check the box next to “Analysis ToolPak” (Figure 2–16) and click on OK, the ToolPak will be added and available under the Tools menu.

The Data Analysis ToolPak can be used to perform many statistical computations. Table 2–3 lists the analyses that are available through the Data Analysis ToolPak that are discussed in the textbook. Table 2–3 lists the procedures alphabetically, along with the chapters from the textbook where these procedures are discussed in more detail. (Excel offers other procedures, but we restrict our attention to only those discussed in the textbook.) There are some analyses (e.g., chi-square tests) that are not available in the Data Analysis ToolPak. In addition to those in
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


95313_Ch02_009_020.qxd 3/23/11 3:35 PM Page 18

18 Formulas, Functions, and the Data Analysis ToolPak

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
FIGURE 2–16 Adding the Data Analysis ToolPak

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

n  10 participants who attended the seventh exam-


TABLE©2–3
Jones & Bartlett
Analyses Available Learning, LLC
in the Data Analysis © Jones & Bartlett Learning, LLC
ination of the Framingham Offspring Study. Place
ToolPakNOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
the variable names in the first row of the worksheet.
Analysis Chapter in Textbook 2. Compute two new variables for each participant,
Analysis of variance 7 body mass index (BMI) and mean arterial pressure
Descriptive statistics 4, 6 (MAP). The formulas for the variables are:
© JonesHistogram
& Bartlett Learning, LLC 4 © Jones & Bartlettweight
Learning, LLC
pounds  0.4536
NOT FOR
Regression
SALE OR DISTRIBUTION
9 
NOT FOR SALE OR
BMI  DISTRIBUTION
t Test: Paired two sample for means 7 (height  0.0254)2
inches
t Test: Two sample assuming equal variances 7
z Test: Two sample for means 7 (2  Diastolic Blood Pressure)  Systolic Blood Pressure
MAP  
3

© Jones & Bartlett Learning, LLC


3. Compute the sample size© Jones & Bartlett
for the MAP data in Learning, LLC
NOT FOR SALE OR DISTRIBUTION Problem 2 using the NOT FOR SALE the
COUNT function and store OR DISTRIBUTION
the ToolPak, Excel also offers many built-in statistical functions result in the Data worksheet.
(e.g., CHITEST for a chi-square test) that can be used to per- 4. Compute the mean MAP in Problem 2 by program-
form specific tests and procedures. As we discuss specific analy- ming the formula
X
© Jones
ses in Chapter &Chapter
4 through Bartlett11 ofLearning, LLC we
the Excel workbook, © Jones &
for the mean (i.e., X   ) and store the result in the
Bartlett
n Learning, LLC
present options Data worksheet.
NOTfor analysis
FOR SALEusing the
ORToolPak and Excel’s statis-
DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
tical functions. 5. Compute the standard deviation of the MAP values in
Problem 2 by programming the formula for the stan-
2.5 PRACTICE PROBLEMS dard deviation


2
1. Use Excel to create a worksheet, called Data, with
 (i.e., s  (X – X
) ) and store the result in the
© Jones & Bartlett Learning,
the data in LLC
Table 2–4. The data were presented in © Jones & BartlettnLearning,
–1 LLC
NOT FOR SALE
Table OR
4–13 DISTRIBUTION
and were measured in a subsample of NOT FOR DataSALE OR DISTRIBUTION
worksheet.
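The two formulas in Problem 2 can be checked outside Excel before they are programmed into the worksheet. The sketch below is only an illustration in Python, not part of the workbook's Excel procedure; the function names are made up for this example, and the input values are those listed for participant 1 in Table 2–4.

```python
# Conversion factors taken from the formulas in Problem 2.
KG_PER_POUND = 0.4536
METERS_PER_INCH = 0.0254


def body_mass_index(weight_lb, height_in):
    """Return BMI in kg/m^2 from weight in pounds and height in inches."""
    weight_kg = weight_lb * KG_PER_POUND
    height_m = height_in * METERS_PER_INCH
    return weight_kg / height_m ** 2


def mean_arterial_pressure(systolic, diastolic):
    """Return MAP = (2 x diastolic + systolic) / 3."""
    return (2 * diastolic + systolic) / 3


# Participant 1 in Table 2-4: SBP 141, DBP 76, weight 138 lb, height 63.00 in.
bmi_1 = body_mass_index(138, 63.00)
map_1 = mean_arterial_pressure(141, 76)
print(f"BMI = {bmi_1:.1f}, MAP = {map_1:.1f}")
```

For participant 1 this gives a BMI of roughly 24.4 and a MAP of roughly 97.7. In a worksheet the same arithmetic might be entered as a cell formula such as `=(E2*0.4536)/(F2*0.0254)^2`, assuming (hypothetically) that weight is stored in column E and height in column F, following the column order of Table 2–4.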

TABLE 2–4 Data for Practice Problems

Participant ID   Systolic Blood Pressure   Diastolic Blood Pressure   Total Serum Cholesterol   Weight (lb)   Height (in)
1                141                       76                         199                       138           63.00
2                119                       64                         150                       183           69.75
3                122                       62                         227                       153           65.75
4                127                       81                         227                       178           70.00
5                125                       70                         163                       161           70.50
6                123                       72                         210                       206           70.00
7                105                       81                         205                       235           72.00
8                113                       63                         275                       151           60.75
9                106                       67                         208                       213           69.00
10               131                       77                         159                       142           61.00
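As a cross-check on Problems 3 through 5, the sample size, mean, and standard deviation of the MAP values can be computed directly from their defining formulas. The following is a minimal Python sketch rather than the Excel solution the problems ask for; the variable names are illustrative, and the blood pressure pairs are copied from Table 2–4.

```python
import math
import statistics

# (systolic, diastolic) blood pressures for participants 1-10 in Table 2-4.
blood_pressures = [
    (141, 76), (119, 64), (122, 62), (127, 81), (125, 70),
    (123, 72), (105, 81), (113, 63), (106, 67), (131, 77),
]

# MAP = (2 x diastolic + systolic) / 3, as defined in Problem 2.
map_values = [(2 * dbp + sbp) / 3 for sbp, dbp in blood_pressures]

n = len(map_values)                       # Problem 3: sample size
mean_map = sum(map_values) / n            # Problem 4: sum of X divided by n

# Problem 5: square root of the summed squared deviations over (n - 1).
squared_devs = sum((x - mean_map) ** 2 for x in map_values)
sd_map = math.sqrt(squared_devs / (n - 1))

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd_map, statistics.stdev(map_values))

print(n, round(mean_map, 2), round(sd_map, 2))
```

Dividing by n - 1 rather than n matches the sample standard deviation formula given in Problem 5, which is why the result agrees with `statistics.stdev`.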

CHAPTER 3

Creating Tables and Graphs

Microsoft Office® Excel® is very useful for creating tables and graphs to summarize and present statistical information. Many investigators use Excel to prepare tables and graphs for reports, presentations, and manuscripts. Often, presentations and manuscripts are prepared in other packages (e.g., PowerPoint®, Word®) but tables and charts are prepared in Excel and then imported into those presentations or manuscripts.

3.1 CREATING AND FORMATTING TABLES

In Chapter 4 of the textbook, we presented a number of statistics to summarize continuous, ordinal, and categorical variables. Investigators must determine which statistics most accurately and completely describe sample data. For example, for continuous variables we can compute the sample mean, median, and mode to describe central tendency and the sample range, interquartile range, variance, and standard deviation to describe variability. For ordinal and categorical variables, we can compute frequencies, relative frequencies, and cumulative relative frequencies (appropriate for ordinal variables). For presentation purposes, we must decide which statistics to present and how.

In almost all research reports, investigators include a description of the study sample. The description usually includes socio-demographic or background characteristics (e.g., age, gender, educational level) and might include data to describe clinical history (e.g., prevalent disease, symptom severity at the start of the study). With a cross-sectional or cohort study, the description is often based on the full sample or cohort. In clinical trials, descriptions are usually provided for each treatment group, considered separately. Regardless of whether data are presented for a single group or for separate groups, the investigator must select the most appropriate statistics to summarize key information.

Example 3.1. Suppose we conduct a cross-sectional study of 125 undergraduate students to estimate the prevalence of cigarette smoking. Before presenting data on the prevalence of smoking, the investigators wish to provide a description of the study sample. Suppose that several background variables are analyzed (with Excel or with another statistical computing package). For each continuous variable, sample means and standard deviations are produced, and for each ordinal and categorical variable, frequencies and relative frequencies are produced. The results are shown in Table 3–1.

TABLE 3–1 Summary Statistics on Background Variables

Age                                    $\bar{X} = 19.567$     $s = 1.867$
Sex                                    Men 79 (63.2%)    Women 46 (36.8%)
Year in school                         Freshmen 35 (28.0%)    Sophomore 41 (32.8%)    Junior 31 (24.8%)    Senior 18 (14.4%)
Number of hours of exercise per week   $\bar{X} = 5.821$      $s = 2.989$
Weight                                 $\bar{X} = 165.352$    $s = 14.857$
Height                                 $\bar{X} = 67.463$     $s = 4.655$

Table 3–2 was developed in Excel to present the information shown here. For the continuous variables, we present means and standard deviations rounded to one decimal place. For ordinal and categorical variables, we present the frequencies and relative frequencies. The table can be copied from Excel into a Word document for presentation.

TABLE 3–2 Description of Study Sample

Characteristic            Mean (SD) or n (%)
Age (years)               19.6 (1.9)
Gender
  Men                     79 (63%)
  Women                   46 (37%)
Year in School
  Freshmen                35 (28%)
  Sophomore               41 (33%)
  Junior                  31 (25%)
  Senior                  18 (14%)
Exercise per week (h)     5.8 (3.0)
Weight (lb)               165.4 (14.9)
Height (in)               67.5 (4.7)

It is always important to include a clear and concise title in a table. It is also important to specify clear variable names with appropriate units. Finally, the data (i.e., summary statistics) presented in the table must be clearly defined. Table 3–2 was prepared in Excel, as shown in Figure 3–1.

FIGURE 3–1 Description of Study Sample

The title is entered in the first row of the table. To accommodate the length of the title and lengths of variable names, the widths of the columns (e.g., A, B) in the worksheet can be increased. It is not necessary to enlarge column A to accommodate all of the long title because the title is the only item in the first row. Thus, to accommodate the length of the title, we merge cell A1 and cell B1 into one larger cell. This is done by first entering the title into cell A1. Because the title is longer than the width of the A1 cell, the title runs across into cell B1 through cell D1, as shown in Figure 3–2.

Before we merge cells in the first row to accommodate the title, we first resize column A and column B to accommodate the variable names, units, and the descriptive statistics. To increase the width of column A, we place the cursor on the vertical line between column A and column B. The cursor changes shape to a bold cross with arrows running right and left. Clicking and holding the mouse while moving the cursor to the right or to the left then increases or decreases the width of column A. We do the same for column B, using the vertical line between column B and column C. Figure 3–3 shows the widened column A and column B.

We now merge cell A1 and cell B1 to accommodate the title. This is done by highlighting cell A1 and cell B1 and clicking on the Merge and Center icon shown on the menu bar (see Figure 3–3). The result is shown in Figure 3–4. We now enter the variable names, units, and the summary statistics into column A and column B.

Excel can be used to create tables for presentation purposes, and it offers options similar to those offered in Word for formatting text in terms of font size, type, justification, and so on. Excel also has options to round numeric information, for example, to round statistics to one or two decimal places.
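The rounding that turns the unrounded statistics of Table 3–1 into the presentation entries of Table 3–2 can be reproduced in a few lines. The sketch below uses Python purely as an illustration of the arithmetic; the helper names `mean_sd` and `count_pct` are invented for this example and have no counterpart in Excel.

```python
# Unrounded summary statistics from Table 3-1: (label, mean, standard deviation).
continuous = [
    ("Age (years)", 19.567, 1.867),
    ("Exercise per week (h)", 5.821, 2.989),
    ("Weight (lb)", 165.352, 14.857),
    ("Height (in)", 67.463, 4.655),
]

# Frequencies from Table 3-1; the study sample has 125 students.
categorical = [("Men", 79), ("Women", 46), ("Freshmen", 35),
               ("Sophomore", 41), ("Junior", 31), ("Senior", 18)]
n_total = 125


def mean_sd(mean, sd):
    """Format a mean and SD to one decimal place, e.g. '19.6 (1.9)'."""
    return f"{mean:.1f} ({sd:.1f})"


def count_pct(count, total):
    """Format a frequency with a whole-number percentage, e.g. '79 (63%)'."""
    return f"{count} ({100 * count / total:.0f}%)"


rows = [(label, mean_sd(m, s)) for label, m, s in continuous]
rows += [(label, count_pct(c, n_total)) for label, c in categorical]

for label, cell in rows:
    print(f"{label:<24}{cell}")
```

Each printed entry matches the corresponding cell of Table 3–2, which confirms that the table reports means and standard deviations to one decimal place and percentages to the nearest whole number.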
FIGURE 3–2 Entering the Title for the Table

FIGURE 3–3 Increasing the Widths of the Columns

FIGURE 3–4 Merging Cell A1 and Cell B1


Suppose we compute the mean and standard deviation of several continuous variables as shown in Figure 3–5. For presentation, we wish to display the summary statistics to one decimal place. This can be done by first highlighting the desired range of cells (B2 through C5), and then selecting the Format/Cells option (Figure 3–6).

Once the Format/Cells menu option is selected, a dialog box appears. Various options are available for formatting. To format the numerical values to one decimal place, we select the "Number" category and then the desired number of decimal places. Here we choose 1 decimal place, as shown in Figure 3–7. Once we click OK, the data are formatted to one decimal place.

FIGURE 3–5 Summary Statistics for Continuous Variables

FIGURE 3–6 Formatting Summary Statistics

FIGURE 3–7 Formatting Numeric Data

3.2 FREQUENCY DISTRIBUTION TABLES

Excel has a built-in menu option that can be used to create frequency tables for presenting information. This is especially useful for ordinal and categorical variables. The option is illustrated below.

Example 3.2. Suppose we have a small study of n = 10 patients with rheumatoid arthritis, and we record their gender and the severity of their symptoms of arthritis. The data are shown in Figure 3–8. Suppose we want to present the gender distribution of the sample in a frequency distribution table. We first select the Data option on the main menu bar and then the Pivot Table and Pivot Chart Report option, as shown in Figure 3–9. Once we select this option, Excel opens the Pivot Table and Pivot Chart Wizard, as shown in Figure 3–10.

FIGURE 3–8 Data on n = 10 Patients with Rheumatoid Arthritis

FIGURE 3–9 Creating a Frequency Distribution Table

The wizard essentially asks for the details necessary to generate the frequency distribution table. The first detail concerns the data. Excel asks where the data reside, and the default response is a Microsoft Office Excel list or database, and this response is already checked. This response applies when the data are in an Excel worksheet, as is the case here. Thus, we do not need to modify the default response. We would choose an alternate response if, for example, the data were stored in a different file. The second detail concerns the type of report we wish to produce. The default response is checked and is a pivot table. This again is the appropriate response for a frequency distribution table. Once we click Next, we are presented with a second dialog box, as shown in Figure 3–11.

Excel then asks specifically for the range of cells containing the study data. We want to generate a frequency distribution table for gender, so we specify cell B1 through cell B11. Notice that we include cell B1, which actually contains the name of the variable as opposed to data (Figure 3–12).

Once we click Next, we are presented with a third dialog box where Excel asks for the location for the resultant frequency distribution table (see Figure 3–13). We need only to specify the cell address for the top-left corner of the frequency distribution table. We request that Excel place the top-left corner of the frequency distribution table in cell E1. (Note that we could have checked the first option and requested that the frequency distribution table be placed in a different worksheet.) When we click Finish, Excel sets up a template in the current worksheet for the frequency distribution table. The template is shown in Figure 3–14. Notice that the top-left corner of the template is in cell E1.


FIGURE 3–10 Creating a Frequency Distribution Table


FIGURE 3–11 Creating a Frequency Distribution Table


FIGURE 3–12 Creating a Frequency Distribution Table


FIGURE 3–13 Creating a Frequency Distribution Table

FIGURE 3–14 Creating a Frequency Distribution Table


In addition to the template (shown in the background), Excel also displays the variable we specified for the analysis (in this case, sex) in the pivot table field list. To produce the frequency distribution table, we must specify that sex is the row variable for the table and, in addition, that sex is the data variable. We first specify that sex is the row variable by clicking the Add To button in the dialog box. The sex variable is specified in the list and we are requesting that Excel use sex in the row area. Once we click Add To, Excel automatically makes sex the row variable in the frequency distribution table, as shown in Figure 3–15.

We now need to specify that sex is also the data variable. The data variable is the variable that will be summarized. For a frequency distribution table on one variable, the row and data variable are the same. In other instances, it may be of interest to summarize a second variable (e.g., compute frequencies of symptom severity by sex, in which case the data variable would be different from the row variable). This is done by selecting Data Area from the drop-down list at the bottom of the dialog box and clicking Add To. This is shown in Figure 3–16. Once we add sex to the data area, the frequency distribution table is generated by Excel (Figure 3–17).

Using the same sequence of steps, we can also generate a frequency distribution table for symptom severity. The results are shown in Figure 3–18. In the specifications, we requested that Excel place the top-left corner of the frequency distribution table in cell E7.

Suppose we also want to present relative frequencies. The relative frequencies of females and males are computed by entering "=F3/$F$5" and "=F4/$F$5" into cell G3 and cell G4, respectively. The dollar signs make the reference to the total in cell F5 absolute, so it does not shift when the formula is copied. The same can be done for the symptom severity data using "=F9/$F$12" in cell G9 and copying to cell G10 and cell G11. The results are shown in Figure 3–19.
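The frequency and relative frequency calculations described above amount to counting each category and dividing by the grand total. The following Python sketch mirrors that logic outside Excel. Because the values in Figure 3–8 are not reproduced in the text, the ten entries below are hypothetical and serve only to illustrate the computation.

```python
from collections import Counter

# HYPOTHETICAL data: made up for illustration, not the values in Figure 3-8.
sex = ["F", "M", "F", "F", "M", "F", "M", "F", "F", "M"]

counts = Counter(sex)            # frequency of each category
total = sum(counts.values())     # grand total, the analogue of cell $F$5

# Relative frequency = category count / grand total (like =F3/$F$5).
table = {category: (count, count / total)
         for category, count in sorted(counts.items())}

for category, (count, rel_freq) in table.items():
    print(f"{category}  {count}  {rel_freq:.2f}")
print(f"Total  {total}")
```

Keeping the grand total in a single variable plays the same role as the absolute reference `$F$5` in the worksheet: every category is divided by the same denominator, so the relative frequencies sum to 1.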
FIGURE 3–15 Specifying the Variable for the Frequency Distribution Table

FIGURE 3–16 Selecting the Data Variable

FIGURE 3–17 Frequency Distribution Table for Sex


3.3 HISTOGRAMS AND BAR CHARTS

Excel is very powerful for generating graphical displays. Investigators often run statistical analyses in other packages, such as SAS®, and then use Excel to produce graphical displays of the statistical results. There are several ways to generate histograms and bar charts for ordinal and categorical variables, respectively. We describe two techniques. The first is a follow-on to the Data/Pivot Table and Pivot Chart Report menu option, and the second uses the Excel Chart Wizard.

Similar to the wizard we used to create frequency distribution tables in the preceding section, Excel has a Chart Wizard that is very useful for generating graphical displays. The Chart Wizard can be accessed through the graphic icon on the main toolbar, as shown in Figure 3–20. Clicking on the Chart Wizard opens the dialog box shown in Figure 3–20, which offers various options for graphical displays. Excel offers a number of standard graphical displays as well as some custom displays (under the respective tabs). We first illustrate how to produce

FIGURE 3–18 Frequency Distribution Table for Symptom Severity

FIGURE 3–19 Adding Relative Frequencies

a graphical display following the Data/Pivot Table and Pivot Chart Report option, and then using the Chart Wizard.

In Example 3.1 of the Excel workbook, we used the Data/Pivot Table and Pivot Chart Report to generate frequency distribution tables for participant’s sex and symptom severity. Once a frequency distribution table is produced, we can easily generate a graphical display. Suppose we want to generate a histogram for the distribution of symptom severity. This is done by selecting (highlighting) the frequency distribution for symptom severity and then clicking on the Chart Wizard (Figure 3–21). Once we click on the Chart Wizard, a new worksheet is generated, as shown in Figure 3–22. The new worksheet is called Chart1. (This is the default name, but it can be changed.)

There are a number of options available for formatting the display. First, we can hide some of the templates and labels that Excel has placed on the display. This can be done by

FIGURE 3–20 The Chart Wizard


FIGURE 3–21 Creating a Histogram with the Chart Wizard

FIGURE 3–22 Display with Default Options

placing the cursor over the label, “Count of Symptom Severity.” If we right-click while over this label, several options are available (Figure 3–23). When we click on Hide Pivot Chart Field Buttons, the field labels are removed, producing the display shown in Figure 3–24.

Notice in Figure 3–24 that we also changed the title. This is done by double-clicking on the default title (“Total”); when the default title is highlighted, we enter a new title (“Frequency Histogram of Symptom Severity”). By default, Excel generates a bar chart. Because symptom severity is an ordinal variable

FIGURE 3–23 Formatting Options


FIGURE 3–24 Removing Field Labels

and not a categorical variable, we want to display a histogram (i.e., the adjacent bars should run together). To change the bar chart into a histogram, we double-click on one of the three bars (double-clicking on any of the three will produce the same result). Double-clicking opens the menu of formatting options shown in Figure 3–25.

To convert the bar chart to a histogram, we select the Options tab, shown in Figure 3–26. The default gap width between bars is 150. To produce a histogram, we change the gap width to 0 (Figure 3–27). Once we click OK, Excel generates the histogram shown in Figure 3–28.

Excel offers many options for formatting graphical displays. We describe only a few here. A second method for generating graphical displays uses the Chart Wizard directly (i.e., does not involve first using the Data/Pivot Table and Pivot Chart Report).

Example 3.3. In Example 4.2 in the textbook, we presented data from the seventh examination of the offspring in the Framingham Heart Study on blood pressure (n = 3539). Systolic and diastolic blood pressures were measured as continuous variables and organized into ordinal categories. Table 4–5 in the textbook showed a frequency distribution table for the ordinal blood pressure variable and is shown here as Table 3–3.

TABLE 3–3 Frequency Distribution Table for Blood Pressure Categories

Blood Pressure           Frequency   Relative Frequency (%)
Normal                   1206         34.1
Pre-hypertension         1452         41.1
Stage I hypertension      653         18.5
Stage II hypertension     222          6.3
Total                    3533        100.0

We want to generate a relative frequency histogram to present the blood pressure data. We first enter the information shown in Table 3–3 into Excel. Figure 3–29 contains the

FIGURE 3–25 Formatting the Bars

FIGURE 3–26 Reducing the Gap Between Bars

FIGURE 3–27 Setting the Gap Width to Zero

FIGURE 3–28 Frequency Histogram for Symptom Severity

data. To generate the relative frequency histogram, we click the Chart Wizard icon on the menu bar. This opens the dialog box with various options shown in Figure 3–30.

The first option is a column chart, and Excel can generate various forms of this display (e.g., one-dimensional, three-dimensional). Suppose we click the top-left option (the default option shown in black in Figure 3–30) under Chart Subtype. Once we click Next, Excel then asks for the range of the data for the display (Figure 3–31).

We specify both the location of the response labels (to label the bars) and the location of the relative frequencies. The response labels are in column A and the relative frequencies are

FIGURE 3–29 Blood Pressure Data

in column C. The data are specified as “A1:A5,C1:C5.” The first range contains the response labels and the second range contains the relative frequencies (Figure 3–32).

When we click Next, Excel provides a snapshot of the display and opens a dialog box with a number of formatting options (Figure 3–33). The first tab contains options for labels and titles. The chart title is entered first, followed by labels for the x- and y-axis of the display (Figure 3–34). At this stage, we can also format other aspects of the display by selecting other tabs that are shown in Figure 3–34. However, if we click Next, Excel asks for the location of the display (Figure 3–35).
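
The relative frequencies plotted in this example can be cross-checked outside Excel. The following is a minimal sketch in Python (not part of the workbook's Excel procedure) that recomputes the percentages from the frequencies in Table 3–3 and prints a rough text version of the histogram. The function names `relative_frequencies` and `text_histogram` and the 40-character bar width are illustrative assumptions.

```python
# Frequencies copied from Table 3-3 (blood pressure categories).
frequencies = {
    "Normal": 1206,
    "Pre-hypertension": 1452,
    "Stage I hypertension": 653,
    "Stage II hypertension": 222,
}


def relative_frequencies(counts):
    """Return each category's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


def text_histogram(percents, width=40):
    """Build one row of '#' marks per category; 100% would span `width` marks."""
    rows = []
    for label, pct in percents.items():
        bar = "#" * round(pct * width / 100)
        rows.append(f"{label:<22}{bar} {pct:.1f}%")
    return rows


percents = relative_frequencies(frequencies)
for row in text_histogram(percents):
    print(row)
```

Rounded to one decimal place, the computed percentages should agree with the Relative Frequency (%) column of Table 3–3, which is the column the Chart Wizard reads from the second data range.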

FIGURE 3–32 Specifying the Data Range



FIGURE 3–33 Formatting Options


FIGURE 3–34 Specifying a Title and Labels for the Axes

FIGURE 3–35 Specifying a Location for the Display


We can place the display in a new worksheet, in which case we provide a name for the new worksheet, or we can place the display in the current worksheet. In Figure 3–35, we specify that we would like the display placed in a new worksheet called Histogram. We click Finish and Excel generates the new worksheet containing the display (Figure 3–36).

We can continue to format the display. We first reduce the gap between adjacent bars to zero by double-clicking on any of the bars, which opens the dialog box shown in Figure

FIGURE 3–37 The Relative Frequency Histogram for Blood Pressure Categories

FIGURE 3–38 Data for Scatter Diagram

type is selected. Once we click Next, Excel displays a template of the scatter diagram along with a dialog box in which the range of the data is indicated (Figure 3–40). Note that the data range is automatically filled (we actually selected the data range before invoking the Chart Wizard).

We click Next and are presented with a template of the scatter diagram and options for formatting (Figure 3–41). We can input a title as well as labels for the x (horizontal) and y (vertical) axes. We can use the other options to remove the gridlines and the legend. When we click Next, Excel asks for a location for the scatter diagram (Figure 3–42).

We can place the scatter diagram in a new worksheet, in which case we provide a name for the new worksheet, or we can place the display in the current worksheet. In Figure 3–42, we specify that we would like the display placed in a new worksheet called Scatter Diagram. We click Finish and Excel generates the new worksheet containing the display (Figure 3–43).

Excel automatically scales the x- and y-axes from zero to a value larger than the maximum value in the dataset. There are many variables whose theoretical minimum is much larger

FIGURE 3–39 Using the Chart Wizard to Generate a Scatter Diagram

FIGURE 3–40 Template for Scatter Diagram

FIGURE 3–41 Formatting Options

FIGURE 3–42 Specifying a Location for the Chart

FIGURE 3–45 Scatter Diagram of BMI and SBP

major unit represents the distance between tick marks shown on that axis. We then click OK. We follow the same sequence for the y-axis (SBP) and change the background from grey to white (by double-clicking anywhere in the background, which opens a dialog box where we can either select another color or choose None under the Area section for no background color). We also remove the horizontal lines by right-clicking on any line and choosing Delete (or Clear). The scatter diagram is as shown in Figure 3–45.

The scatter diagram illustrates the positive association between body mass index and systolic blood pressure. The scatter diagram can now be copied from Excel into a manuscript, report, or presentation.

3.5 PRACTICE PROBLEMS

1. The data in Table 3–4 were measured in n = 15 college seniors in a cross-sectional study of alcohol consumption. Each participant was asked their gender, year in school, the age at which they first consumed alcohol, and the number of alcoholic drinks they consume on a typical drinking night. Generate frequency distribution tables for gender and year in school using the “Data/Pivot Table and Pivot Chart Report” option on the menu bar.

2. For the data in Table 3–4, generate a frequency bar chart for gender using the Pivot Table and Pivot Chart Report option.

3. For the data in Table 3–4, generate a frequency histogram for year in school using the Pivot Table and Pivot Chart option.

4. For the data in Table 3–4, create a frequency distribution table for drinking status, defined by the following numbers of drinks per night:

   Abstinent   0
   Light       1–3
   Moderate    4–5
   Heavy       6 or more

5. For the data in Table 3–4, generate a frequency histogram for drinking status using the Chart Wizard.

6. For the data in Table 3–4, generate a scatter diagram to display the association between age at first drink and number of drinks per night. (Note that the sample size for analysis is n = 13.)
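
The drinking status categories in Problem 4 amount to a simple binning rule applied to the number of drinks per night. As a rough sketch of that rule in Python (an aid for checking work done in Excel, not the workbook's own method), the function below assigns a category and tallies a frequency distribution. The names `drinking_status` and `frequency_table` are hypothetical, and the values in `example_drinks` are made up for illustration; they are not the data in Table 3–4.

```python
from collections import Counter

# Category order follows the definition given in Problem 4.
CATEGORIES = ["Abstinent", "Light", "Moderate", "Heavy"]


def drinking_status(drinks_per_night):
    """Map a whole number of drinks per night to a drinking status category."""
    if drinks_per_night < 0:
        raise ValueError("number of drinks cannot be negative")
    if drinks_per_night == 0:
        return "Abstinent"
    if drinks_per_night <= 3:
        return "Light"
    if drinks_per_night <= 5:
        return "Moderate"
    return "Heavy"


def frequency_table(drinks):
    """Return (category, frequency, relative frequency in %) for each category."""
    counts = Counter(drinking_status(d) for d in drinks)
    n = len(drinks)
    return [(c, counts[c], 100 * counts[c] / n) for c in CATEGORIES]


# Hypothetical values for illustration only; NOT the data in Table 3-4.
example_drinks = [0, 2, 5, 7, 1, 0, 4, 6, 3, 3]

for category, freq, rel in frequency_table(example_drinks):
    print(f"{category:<10}{freq:>3}{rel:>7.1f}%")
```

Listing the categories in a fixed order keeps the ordinal sequence (Abstinent through Heavy) intact in the output, which matters when the table is later displayed as a histogram.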

CHAPTER 4

Summarizing Continuous Variables in a Sample


In Chapter 4 of the textbook, we presented summary statistics for dichotomous, ordinal, categorical, and continuous variables. We discussed both numerical and graphical summaries. Numerical summaries for continuous variables include the sample mean, standard deviation, median, and quartiles. Numerical summaries for ordinal and categorical data use frequency distribution tables, and these were discussed in detail in Chapter 3 of the Microsoft Office® Excel® workbook. In Chapter 3 of the Excel workbook, we also presented several graphical displays for sample data. Numerical summaries for continuous variables measured in a sample are discussed here.

The most appropriate numerical summaries for continuous variables depend on whether or not there are outliers. Regardless of whether there are outliers, the summary should always include the sample size, a measure of central tendency or a typical value (e.g., the mean or median), and a measure of variability (e.g., standard deviation or interquartile range). Here we generate summaries that include numerous statistics; the investigator must choose those that are most appropriate to summarize a particular characteristic.

4.1 THE DESCRIPTIVE STATISTICS ANALYSIS TOOL

Here we generate summary statistics on a continuous variable using the Descriptive Statistics Analysis Tool available in the Excel Data Analysis ToolPak.

Example 4.1. In Example 4.3 in the textbook, we analyzed data from a subset of n = 10 participants attending the seventh examination of the offspring in the Framingham Heart Study. The data values were presented in Table 4–13 of the text and are shown here in Table 4–1.

The means, standard deviations, and other statistics can be computed using the Descriptive Statistics Analysis Tool in the Data Analysis ToolPak. Figure 4–1 shows the data entered into an Excel worksheet. We now use the Data Analysis ToolPak to generate descriptive statistics on each continuous variable. We begin with body mass index (BMI). When we select the Tools/Data Analysis option from the menu bar, a dialog box with the various analysis tools appears (Figure 4–2). From the list of analysis tools, we choose Descriptive Statistics. We are then presented with a second dialog box. Figure 4–3 displays this dialog box, which requires input for the descriptive statistics.

In the dialog box, we must provide the location of the data values we wish to summarize. This is requested under Input Range. In the Input Range text field, we specify cell G1 through cell G11. The data actually reside in cell G2 through cell G11, with the variable name in cell G1. We must check the box to indicate that the variable name or label is in the first row. By doing so, Excel prints the variable name on the output. If we do not check the box to indicate that the variable label is in the first row, then we must specify the input range as “G2:G11”. In the next section of the dialog box, we specify where we would like the results placed. The options are to place the output in the current worksheet, in a new worksheet within the same workbook, or in a new workbook. If we select Output Range, we must

TABLE 4–1 Subsample of n = 10 Participants Attending the Seventh Examination of the Framingham Offspring Study

Participant  Systolic Blood  Diastolic Blood  Total Serum  Weight  Height
ID           Pressure        Pressure         Cholesterol  (lb)    (in)    BMI
 1           141             76               199          138     63.00   24.4
 2           119             64               150          183     69.75   26.4
 3           122             62               227          153     65.75   24.9
 4           127             81               227          178     70.00   25.5
 5           125             70               163          161     70.50   22.8
 6           123             72               210          206     70.00   29.6
 7           105             81               205          235     72.00   31.9
 8           113             63               275          151     60.75   28.8
 9           106             67               208          213     69.00   31.5
10           131             77               159          142     61.00   26.8


FIGURE 4–1 Data for Analysis Entered into Excel



specify the cell address for the top-left corner of the table containing the results (e.g., I1). In Figure 4–3, we select New Worksheet Ply to place the results in a new worksheet in the same workbook, and we specify the name of the new worksheet as Descriptive Statistics. In the last section of the dialog box, we request that Excel generate summary statistics for BMI by checking the appropriate box in the dialog box. The results of the analysis are shown in Figure 4–4.

Notice that the results are contained in a new worksheet in the workbook called Descriptive Statistics. The data are contained in the Data worksheet. When the Descriptive Statistics option is selected, Excel generates all of the statistics shown in Figure 4–4. The default (or automatic) statistics produced
FIGURE 4–2 Invoking the Data Analysis ToolPak


FIGURE 4–3 The Descriptive Statistics Analysis Tool

FIGURE 4–4 Descriptive Statistics for BMI

TABLE 4–2 Default Statistics in Descriptive Statistics Analysis Tool

Statistic                    Formula/Description
Sample mean                  $\bar{X} = \dfrac{\sum X}{n}$
Standard error               $SE = \dfrac{s}{\sqrt{n}}$
Median                       Middle value (50% above and 50% below)
Mode                         The most frequent value. If there is no value that appears more than any other, Excel indicates this with "#N/A"
Sample standard deviation    $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
Sample variance              $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
Kurtosis                     Reflects the thickness of the tails of a distribution of a continuous characteristic as compared to a normal distribution (see Chapter 5 of the textbook). The kurtosis of a normal distribution is 0.
Skewness                     Reflects the symmetry of a distribution of a continuous characteristic as compared to a normal distribution (see Chapter 5 of the textbook). The skewness of a normal distribution is 0.
Range                        Range = Maximum − Minimum
Minimum                      The smallest value in the dataset
Maximum                      The largest value in the dataset
Sum                          The sum of the observations, $\sum X$
Count                        The number of observations, n

with the Descriptive Statistics Analysis Tool are shown in Table 4–2. In Chapter 4 of the textbook, we discussed many (but not all) of these statistics. The most relevant statistics for BMI are the sample size n = 10 (referred to as Count), the sample mean $\bar{X} = 27.26$, and the sample standard deviation s = 3.07. Notice that Excel generates summary statistics with many more decimal places than would be reasonable to report.

It is also possible to generate descriptive statistics for several continuous variables simultaneously. Again, we select Tools/Data Analysis, and then Descriptive Statistics from the list of analysis tools. We then specify the range of all variables in the input range field in the first section of the dialog box. In Figure 4–5, we specify the input range as B1 through G11. This range includes all variables and variable names or labels (which are contained in row 1). Again, we request that the output be placed in a new worksheet, here called All Variables. A screenshot of partial results is shown in Figure 4–6. Descriptive statistics are computed on each of the variables but cannot be displayed on the screen without substantially reducing the font size.
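For readers who want to cross-check the ToolPak output outside Excel, the following is a minimal sketch in Python (not part of the workbook) that recomputes several of the Table 4–2 statistics for the BMI values in Table 4–1. It relies only on the standard `statistics` module; the variable names are illustrative choices, and the sketch assumes the BMI column is transcribed exactly as in Table 4–1.

```python
import math
import statistics

# BMI values for the n = 10 participants in Table 4-1
bmi = [24.4, 26.4, 24.9, 25.5, 22.8, 29.6, 31.9, 28.8, 31.5, 26.8]

count = len(bmi)                      # Count (sample size n)
mean = statistics.mean(bmi)           # Sample mean
sd = statistics.stdev(bmi)            # Sample standard deviation (n - 1 denominator)
variance = statistics.variance(bmi)   # Sample variance (n - 1 denominator)
se = sd / math.sqrt(count)            # Standard error of the mean, s / sqrt(n)
median = statistics.median(bmi)       # Median
value_range = max(bmi) - min(bmi)     # Range = Maximum - Minimum

print(f"n = {count}, mean = {mean:.2f}, s = {sd:.2f}, SE = {se:.2f}")
print(f"median = {median:.2f}, range = {value_range:.1f}, sum = {sum(bmi):.1f}")
```

Rounded to two decimal places, the mean and standard deviation should agree with the values reported in the text (27.26 and 3.07).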

FIGURE 4–5 Descriptive Statistics for Several Variables


4.2 DESCRIPTIVE STATISTICS USING EXCEL FUNCTIONS

There are other statistics that we discussed in Chapter 4 of the textbook that are useful for continuous variables and are not automatically generated with the Descriptive Statistics Analysis Tool. An example is the first (and third) quartile. Excel does not include the quartiles in the output of the Descriptive Statistics Analysis Tool, but they can be computed with an Excel function. The function is invoked as "=QUARTILE(data range, quartile number)". The data range is defined by the addresses of the cells containing the first and last observations in the dataset, separated by a colon. For example, using the data in Figure 4–1, systolic blood pressures occupy the range B2:B11, diastolic blood pressures occupy the range C2:C11, and total cholesterol values occupy the range D2:D11. The quartile numbers range from 0 to 4. Table 4–3 indicates which values are generated for each quartile number.

TABLE 4–3 Quartiles Produced by the QUARTILE Function

Quartile Number   Statistic
0                 Minimum
1                 First quartile (holds 25% of values below it)
2                 Median
3                 Third quartile (holds 25% of values above it)
4                 Maximum

We now use the quartile function to generate quartiles for the diastolic blood pressures (DBP) shown in Figure 4–1. In Figure 4–7, we use the quartile function to compute the first quartile of DBP and place it in cell C13. Notice that we also added a label into cell B13. In Figure 4–8, we compute the second (median) and third (Q3) quartiles of DBP. These are computed using the formulas "=QUARTILE(C2:C11,2)" and "=QUARTILE(C2:C11,3)", respectively. Excel uses an
FIGURE 4–6 Descriptive Statistics for All Variables


FIGURE 4–7 First Quartile of DBP

algorithm to determine the quartiles that involves interpolation. When we compute quartiles by hand as described in Chapter 4 of the textbook, we do not interpolate, and thus Excel may produce different values for the quartiles than those we determine by hand. Results are not likely to be different when the sample size is large.

Other Excel functions that are useful for generating summary statistics for continuous variables are shown in Table 4–4. For each function, the data (i.e., the range of cells in the worksheet containing the data to be analyzed) are specified in parentheses. The exception is the QUARTILE function, which also requires specification of the quartile number. To compute the standard deviation of the systolic blood pressures, we enter "=STDEV(B2:B11)". Similarly, we can compute the median systolic blood pressure using the median function, "=MEDIAN(B2:B11)". Note that "=MEDIAN(B2:B11)" is equivalent to "=QUARTILE(B2:B11,2)". In Excel and in most other statistical computing packages, there are often several
FIGURE 4–8 Second and Third Quartiles of DBP

TABLE 4–4 Excel Functions for Summary Statistics

Function Description
AVERAGE Computes the sample mean
COUNT Computes the sample size
MAX Computes the maximum value
MEDIAN Computes the sample median
MIN Computes the minimum value
MODE Computes the mode
QUARTILE Computes the quartiles
STDEV Computes the sample standard
deviation
VAR Computes the sample variance

ways to perform the same analysis. The analyst can choose the method with which they are the most comfortable or that best suits their style.

4.3 PRACTICE PROBLEMS

1. A study is conducted to estimate the mean total cholesterol level in children 2 to 6 years of age. A sample of nine participants is selected and their total cholesterol levels are measured as follows.

   185 225 240 196 175 180 194 147 223

   a. Use the Data Analysis ToolPak to compute the sample mean, standard deviation, and median.
   b. Use the QUARTILE function to compute the first and third quartiles.

2. The following data were collected as part of a study of coffee consumption among graduate students. The following reflect cups per day consumed:

   3 4 6 8 2 1 0 2

   a. Use the Data Analysis ToolPak to compute the sample mean, standard deviation, and median.
   b. Use the QUARTILE function to compute the first and third quartiles.

3. In the study of a new antihypertensive medication, systolic blood pressures are measured at baseline (or the start of the study before any treatment is administered). The data are as follows:

   120 112 138 145 135 150 145 163
   148 128 143 156 160 142 150

   Use Excel functions to compute the sample mean, standard deviation, median, and quartiles.

4. The following are height measurements (in cm) for a sample of infants participating in a study of infant health:

   28 30 41 48 29 48 62 49 51 39

   Use the Data Analysis ToolPak to compute summary statistics.
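As a cross-check for the quartile calculations in Section 4.2 and in the practice problems above, the sketch below computes interpolated quartiles in Python for the diastolic blood pressures in Table 4–1. This is not the workbook's method: it uses `statistics.quantiles` with `method="inclusive"`, which is assumed here to mirror the interpolation that QUARTILE performs, so small differences from Excel (and from hand calculations without interpolation) are possible.

```python
import statistics

# Diastolic blood pressures (DBP) for the 10 participants in Table 4-1
dbp = [76, 64, 62, 81, 70, 72, 81, 63, 67, 77]

# method="inclusive" interpolates between sorted observations and treats the
# sample minimum and maximum as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(dbp, n=4, method="inclusive")

print("Q1 =", q1, " median =", q2, " Q3 =", q3)
print("min =", min(dbp), " max =", max(dbp))
print("mean =", statistics.mean(dbp), " s =", round(statistics.stdev(dbp), 2))
```

The second quartile returned by `quantiles` equals `statistics.median(dbp)`, which parallels the equivalence of MEDIAN and QUARTILE with quartile number 2 noted in Section 4.2.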

CHAPTER 5

Working with Probability Functions

Microsoft Office® Excel® has a number of probability functions that can be used to compute probabilities or to find percentiles. In Chapter 5 of the textbook, we discussed in detail two probability models, the binomial and normal distributions, which are appropriate for discrete and continuous outcomes, respectively. As we discussed in Chapter 5 of the textbook, there are many other probability distributions that describe discrete and continuous outcomes; we focused exclusively on these two. Excel has a number of functions that can be used to compute probabilities for various distributions. We focus on the binomial and normal distributions.

5.1 COMPUTING PROBABILITIES WITH THE BINOMIAL DISTRIBUTION

In Chapter 5 of the textbook, we discussed the binomial distribution model and computed probabilities using the model:

$P(x \text{ successes}) = \dfrac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$

where n denotes the number of times the application or process is repeated (sometimes called the number of trials), x denotes the number of successes out of n trials of interest, and p is the probability of success for any individual.

To use the binomial distribution model, we need to specify n, p, and x. Excel has a probability function to compute probabilities from a binomial distribution. The function is invoked as "=BINOMDIST(x, n, p, cumulative)". The first three inputs for the function are the same as those we use in computing probabilities by hand with the binomial distribution model (i.e., x, n, and p). Excel requires one additional input, labeled cumulative. This last entry in the BINOMDIST function is a logical value (i.e., one whose responses are true or false). We either use the cumulative distribution function by specifying "true" or do not use the cumulative distribution function by specifying "false." The cumulative distribution function returns the probability of observing x or fewer (rather than exactly x) successes. For example, if we specify "true" and indicate x = 5 in the function, then Excel computes P(X ≤ 5). In contrast, if we specify "false" and indicate x = 5 in the function, then Excel computes P(X = 5). We illustrate the use of the function in the following example.

Example 5.1. In Example 5.9 in the textbook, we presented an example assessing the extent to which adults with allergies report relief from allergic symptoms with a specific medication. We know that the medication is effective in 80% of patients with allergies. If we provide the medication to 10 patients with allergies, what is the probability that it is effective in exactly 7?

For this example, n = 10, p = 0.80, and x = 7. We now use Excel to compute the desired probability. In Figure 5–1, we enter n, p, and x into an Excel worksheet. We wish to compute the probability of 7 successes when n = 10 and the probability of success for any individual is 0.80. We use the BINOMDIST function and specify the cell locations for x (A2), n (B2), and p (C2). Because we want to compute the probability of exactly 7 successes, we set cumulative = false. The specification of the formula is shown in Figure 5–2.
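The binomial formula above can also be evaluated directly, which gives a way to check the spreadsheet result. The following is a minimal Python sketch, not the workbook's method; the helper names `binom_pmf` and `binom_cdf` are hypothetical and are meant to play the roles of BINOMDIST with cumulative set to false and true, respectively.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): probability of x or fewer successes in n trials."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Example 5.1: n = 10 patients, p = 0.80, x = 7
p_exactly_7 = binom_pmf(7, 10, 0.80)   # analogous to cumulative = false
p_at_most_7 = binom_cdf(7, 10, 0.80)   # analogous to cumulative = true

print(f"P(X = 7)  = {p_exactly_7:.4f}")
print(f"P(X <= 7) = {p_at_most_7:.4f}")
```

Summing the individual probabilities for 0 through x is the simplest way to obtain the cumulative value and is adequate for the small n used in these examples.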


FIGURE 5–3 P(X = 7) When n = 10 and p = 0.8


FIGURE 5–4 Requesting P(X  7) When n = 10 and p = 0.8

FIGURE 5–5 P(X  7) When n = 10 and p = 0.8


FIGURE 5–8 P(X ≤ 85) When μ = 70 and σ = 10


with x entered in column A. We specify the mean and standard deviation of the distribution in column B and column C, respectively, and use the NORMDIST function to generate probabilities less than or equal to x. For example, in cell D2 we enter "=NORMDIST(A2,B2,C2,true)". For a normal distribution with mean μ = 70 and standard deviation σ = 10, P(X ≤ 85) = 0.9332.

Example 5.2. In Example 5.11 in the textbook, we analyzed body mass index (BMI), which is assumed to be normally distributed for specific gender and age groups. The mean BMI for men aged 60 is 29 with a standard deviation of 6, and for women aged 60 the mean is 28 with a standard deviation of 7.

For men aged 60, we now use Excel to compute the following: P(X < 35), P(X < 41), and P(X < 30). The results are shown in Figure 5–9. Suppose we now wish to compute the probability that a male has a BMI between 30 and 35, i.e., P(30 < X < 35). This can be done using the NORMDIST function to compute the probabilities that a male has BMI less than 35 and less than 30 (as in Figure 5–9) and subtracting (Figure 5–10) to compute the desired probability: P(30 < X < 35) = P(X < 35) − P(X < 30) = 0.275.

Now consider BMI in women. What are the probabilities that a female aged 60 has a BMI less than 30 and less than 35? What is the probability that a female aged 60 has a BMI between 30 and 35? We use the same approach, but recall for women aged 60, the mean is 28 and the standard deviation is 7. The results are shown in Figure 5–11.
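The subtraction of two cumulative probabilities described in Example 5.2 can be sketched in Python with the standard library's `statistics.NormalDist`, whose `cdf` method is used here in the role of NORMDIST with cumulative set to true. This is an illustrative cross-check rather than the workbook's procedure, and the helper name `prob_between` is hypothetical.

```python
from statistics import NormalDist

men = NormalDist(mu=29, sigma=6)     # BMI in men aged 60 (Example 5.2)
women = NormalDist(mu=28, sigma=7)   # BMI in women aged 60 (Example 5.2)

def prob_between(dist, low, high):
    """P(low < X < high) as the difference of two cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)

print(f"Men:   P(X < 35) = {men.cdf(35):.4f}, P(X < 30) = {men.cdf(30):.4f}")
print(f"Men:   P(30 < X < 35) = {prob_between(men, 30, 35):.3f}")
print(f"Women: P(30 < X < 35) = {prob_between(women, 30, 35):.3f}")
```

For men the result should round to the 0.275 reported in the text. Because the normal distribution is continuous, using "less than" or "less than or equal to" gives the same probability.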

FIGURE 5–9 Probabilities from the Normal Distribution Using the NORMSDIST Function


FIGURE 5–10 P(30 < X < 35) When μ = 29 and σ = 6

FIGURE 5–11 P(30 < X < 35) When μ = 28 and σ = 7

5.3 FINDING PERCENTILES OF THE NORMAL DISTRIBUTION

Recall from Chapter 5 in the textbook that a percentile is a score that holds a specified percentage or proportion of scores below it. For example, the 80th percentile is the score that holds 80% of the scores below it. Excel can also be used to compute percentiles for the standard normal distribution and for any normal distribution. The two functions are NORMSINV and NORMINV, respectively. The inputs for the functions are shown in the following text.

The function to compute percentiles for the standard normal distribution is invoked as "=NORMSINV(probability)". The input for the function is the desired percentile, entered as a probability or proportion. For example, to compute the 80th or 95th percentiles, we specify 0.80 or 0.95, respectively.

The function to compute percentiles for any normal distribution is invoked as "=NORMINV(probability, μ, σ)". The inputs for the function are the desired percentile entered as a probability and the mean (μ) and standard deviation (σ) of the normal distribution.

Example 5.3. Using the data in Example 5.2, we now use Excel to compute the 90th and 95th percentiles of BMI for men and women. The results are shown in Figure 5–12. In men, 90% of BMIs are below 36.7 and 95% are below 38.9. In women, 90% of BMIs are below 37.0 and 95% are below 39.5.

FIGURE 5–12 Percentiles of BMI

5.4 PRACTICE PROBLEMS

1. Total cholesterol in children 10 to 15 years of age is assumed to follow a normal distribution with a mean of 191 and a standard deviation of 22.4.
   a. What proportion of children 10 to 15 years of age have total cholesterol between 180 and 190?
   b. What proportion of children 10 to 15 years of age would be classified as hyperlipidemic (assume that hyperlipidemia is defined as a total cholesterol level over 200)?
   c. What is the 90th percentile of total cholesterol?

2. Among coffee drinkers, men drink a mean of 3.2 cups per day with a standard deviation of 0.8 cups. Assume the number of coffee drinks per day follows a normal distribution.
   a. What proportion drink 2 cups per day or more?
   b. What proportion drink no more than 4 cups per day?
   c. If the top 5% of coffee drinkers are considered heavy coffee drinkers, what is the minimum number of cups consumed by a heavy coffee drinker? (Hint: Find the 95th percentile.)

3. A study is conducted to assess the impact of caffeine consumption, smoking, alcohol consumption, and physical activity on the risk of cardiovascular disease. Suppose that 40% of participants consume caffeine and smoke. If 8 participants are evaluated, what is the probability that:
   a. Exactly half of them consume caffeine and smoke?
   b. At most 6 consume caffeine and smoke?

4. A recent study of cardiovascular risk factors reports that 30% of adults meet the criteria for hypertension. If 15 adults are assessed, what is the probability that:
   a. Exactly 5 meet the criteria for hypertension?
   b. None meet the criteria for hypertension?
   c. Less than or equal to 7 meet the criteria for hypertension?
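As a cross-check for the percentile calculations in Section 5.3 (and for the percentile questions in the practice problems), the sketch below uses the `inv_cdf` method of Python's `statistics.NormalDist` in the role of NORMINV. It is an illustration only; the helper name `normal_percentile` is hypothetical.

```python
from statistics import NormalDist

def normal_percentile(probability, mu, sigma):
    """Value that holds the given proportion of a normal(mu, sigma) distribution below it."""
    return NormalDist(mu, sigma).inv_cdf(probability)

# Example 5.3: BMI at age 60 (means and standard deviations from Example 5.2)
for label, mu, sigma in [("men", 29, 6), ("women", 28, 7)]:
    p90 = normal_percentile(0.90, mu, sigma)
    p95 = normal_percentile(0.95, mu, sigma)
    print(f"{label}: 90th percentile = {p90:.1f}, 95th percentile = {p95:.1f}")
```

Calling `normal_percentile(probability, 0, 1)` gives percentiles of the standard normal distribution, which corresponds to the role of NORMSINV.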

CHAPTER 6

Confidence Interval Estimates

In Chapter 6 of the textbook, we presented formulas to generate confidence intervals for means ($\mu$) and proportions ($p$) in one sample and for differences in means ($\mu_1 - \mu_2$) and differences in proportions ($p_1 - p_2$) in two independent samples. We also discussed confidence intervals for the mean difference ($\mu_d$) when two samples are matched or paired. For each application, we used the same general approach. Confidence intervals for each parameter take the following form:

Point estimate ± Margin of error

The point estimate depends on the parameter being estimated. For example, when estimating the mean of a population, $\mu$, the point estimate is the sample mean, $\bar{X}$. When estimating the population proportion, $p$, the point estimate is the sample proportion, $\hat{p}$. The margin of error includes two components. The first component is from a probability distribution (e.g., z or t) and reflects the selected confidence level (e.g., 90%, 95%) and the second component is the standard error of the point estimate. For example, the standard error of the sample mean, $\bar{X}$, is $SE = s/\sqrt{n}$. The standard error of the sample proportion, $\hat{p}$, is $SE = \sqrt{\hat{p}(1 - \hat{p})/n}$.

In Chapter 6 of the textbook, we presented formulas for confidence intervals for various parameters (see Table 6–22 in Section 6.7 of the textbook for details). Here we use Microsoft Office® Excel® to generate confidence intervals for various parameters. Excel does not generate confidence intervals directly; instead, we use specific Excel functions to produce the z or t values that reflect the desired confidence level and then construct the confidence interval.

6.1 CONFIDENCE INTERVALS FOR ONE SAMPLE, CONTINUOUS OUTCOME

In Chapter 6 of the textbook, we presented the following formulas for confidence intervals for the mean of a continuous variable in one sample.

$n \ge 30$: $\bar{X} \pm z \dfrac{s}{\sqrt{n}}$ (Find z in Table 1B)

$n < 30$: $\bar{X} \pm t \dfrac{s}{\sqrt{n}}$ (Find t in Table 2, $df = n - 1$)

When computing the confidence intervals by hand, we computed the sample size, mean, and standard deviation, and then used Table 1B or Table 2 in the Appendix of the textbook to find the appropriate z or t value to reflect the desired confidence level. Here we use Excel to compute summary statistics and to determine the appropriate z or t value for the confidence interval. Once all of the requisite components are determined, we construct the confidence interval.

Excel has a function to compute z values that can be used in confidence intervals. The function is "=NORMSINV(lower tail area)". To use this function for confidence intervals, we specify the area under the curve in the lower tail of the standard normal distribution. For example, for a 95% confidence interval, the area in the lower tail is 0.975. Figure 6–1 shows the standard normal distribution, z, and the z values that hold the middle 95% of the distribution, $P(-1.96 < X < 1.96) = 0.95$. To produce the z value for a 95% confidence interval, we specify "=NORMSINV(0.975)", which returns 1.96.
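The two steps just described, finding the z value from a lower tail area and then forming point estimate ± margin of error, can be sketched in Python as a cross-check on the spreadsheet. The sketch is not the workbook's method: `NormalDist().inv_cdf` is used in the role of NORMSINV, the function names are hypothetical, and the inputs are the systolic blood pressure summary statistics listed in Table 6–1. The standard library has no counterpart to TINV, so the t-based interval is not covered here.

```python
from math import sqrt
from statistics import NormalDist

def z_for_confidence(level):
    """z value that holds the given central area (e.g., 0.95) of the standard normal curve."""
    lower_tail_area = 1 - (1 - level) / 2   # 0.975 for a 95% interval
    return NormalDist().inv_cdf(lower_tail_area)

def z_interval(mean, sd, n, level=0.95):
    """Confidence interval for a mean: point estimate +/- z * s / sqrt(n)."""
    margin = z_for_confidence(level) * sd / sqrt(n)
    return mean - margin, mean + margin

# Systolic blood pressure summary statistics from Table 6-1
lower, upper = z_interval(mean=127.3, sd=19.0, n=3534)
print(f"z = {z_for_confidence(0.95):.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Passing `level=0.90` reproduces the 90% case, for which the lower tail area is 0.95.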


FIGURE 6–1 Z Value for 95% Confidence
[Standard normal curve, horizontal axis from -3 to 3: area 0.95 in the middle and 0.025 in each tail.]

Excel has a second function that computes t values that can also be used in confidence intervals. The function is "=TINV(total tail area, df)". To use this function for confidence intervals, we specify the total area in the tail of the t distribution along with degrees of freedom, df. For a 95% confidence interval, the total tail area is 0.05, and for one sample, the degrees of freedom are df = n − 1. We now illustrate the use of these formulas to compute confidence intervals.

Example 6.1. In Example 6.1 in the textbook, we analyzed data on n = 3539 participants who attended the seventh examination of the offspring in the Framingham Heart Study. Summary statistics on variables measured in the sample are shown in Table 6–1. We use Excel to generate 95% confidence intervals for each characteristic. Because the sample size is large, we use the confidence interval formula with z as opposed to t. The data are entered into an Excel worksheet as shown in Figure 6–2.

First, we compute the standard errors for each characteristic as $SE = s/\sqrt{n}$ and place these values in column E. For systolic blood pressure, the following is entered into cell E2: "=D2/SQRT(B2)". We then compute the z scores for 95% confidence intervals and place these in column F. The computation is the same for each characteristic; for example, in cell F2 we enter "=NORMSINV(0.975)". The standard errors and z values are shown in Figure 6–3. Note that had we wanted 90% confidence intervals, we would have specified "=NORMSINV(0.95)" in column F.

We now compute the lower and upper limits of the 95% confidence intervals using $\bar{X} \pm z \dfrac{s}{\sqrt{n}}$. For systolic blood pressure, the lower limit is "=C2-(F2*E2)" and the upper limit is "=C2+(F2*E2)". (The product of the z value and the standard error produces the margin of error.) The confidence intervals are shown in Figure 6–4.

TABLE 6–1 Summary Statistics, Framingham Heart Study Offspring

Characteristic              n      Mean ($\bar{X}$)   Standard Deviation (s)
Systolic blood pressure     3534   127.3              19.0
Diastolic blood pressure    3532   74.0               9.9
Total serum cholesterol     3310   200.3              36.8
Weight (lb)                 3506   174.4              38.7

Example 6.2. In Example 6.2 in the textbook, we presented data on a subsample of n = 10 participants who attended the seventh examination of the Framingham Offspring Study. Summary statistics on variables measured in the subsample are shown in Table 6–2.

We use Excel to generate 95% confidence intervals for each characteristic. Because the sample size is small, we use the confidence interval formula with t as opposed to z. The data are entered into an Excel worksheet as shown in Figure 6–5.

First we compute the standard errors for each characteristic as $SE = s/\sqrt{n}$ and place these values in column E, e.g.,

FIGURE 6–2 Data from Framingham Offspring Study

FIGURE 6–3 Standard Errors and z Values for Confidence Intervals

FIGURE 6–4 Upper and Lower Limits of 95% Confidence Intervals

for systolic blood pressure, the following is entered into cell E2: “=D2/SQRT(B2)”. We then compute the t scores for 95% confidence intervals and place these in column F. The computation is the same for each characteristic; for example, in cell F2 we enter “=TINV(0.05,B2-1)”. The standard errors and t values are shown in Figure 6–6.
We now compute the lower and upper limits of the 95% confidence intervals using $\bar{X} \pm t\,\frac{s}{\sqrt{n}}$. For systolic blood pressure, the lower limit is “=C2-(F2*E2)” and the upper limit is “=C2+(F2*E2)”. The confidence intervals are shown in Figure 6–7.

TABLE 6–2 Summary Statistics, Framingham Heart Study Offspring Subsample

Characteristic            n   Mean (X̄)  Standard Deviation (s)
Systolic blood pressure   10  121.2     11.1
Diastolic blood pressure  10  71.3      7.2
Total serum cholesterol   10  202.3     37.7
Weight (lb)               10  176.0     33.0
Height (in)               10  67.175    4.205
Body mass index (BMI)     10  27.26     3.10

Example 6.3. In Example 4.3 in the textbook, we presented data on the subset of n = 10 participants, which were summarized in Example 6.2. The data values were presented in Table 4–13 of the textbook and are shown in Table 6–3. The data are entered into an Excel worksheet and shown in Figure 6–8.
In Chapter 4 of the Excel workbook, we described the Data Analysis ToolPak and generated summary statistics on continuous variables using the Descriptive Statistics Analysis Tool. This tool can be used to generate summary statistics and also to generate information that can be used to produce a confidence interval. Suppose we wish to generate descriptive statistics on systolic blood pressures (SBP). Using the Data Analysis ToolPak and selecting the Descriptive Statistics Module produces the dialog box shown in Figure 6–9.
In the dialog box, we specify the range of the data, we request that the results be placed in a new worksheet entitled Descriptives on SBP, and we request summary statistics and confidence interval information for a 95% confidence level. In the Confidence Level for Mean input field, any level from 0% to 100% can be specified (typical values are 90%, 95%, and 99%). The information in the new worksheet Descriptives on SBP is shown in Figure 6–10.
The information in cell B16 is the margin of error (i.e., the product of the t value for 95% confidence and the standard error). We must now take the margin of error and add it to and subtract it from the sample mean (point estimate) to produce the confidence limits. This is done and shown in Figure 6–11. Notice that the confidence limits shown in Figure 6–11 are identical to those shown in Figure 6–7.
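The margin-of-error arithmetic of Example 6.3 can also be sketched in Python from the raw systolic blood pressures in Table 6–3. This is a minimal sketch, not the workbook's method: the Python standard library has no t distribution, so the critical value is hard-coded from a t table (an assumption labeled in the code), and the variable names are our own.

```python
from math import sqrt
from statistics import mean, stdev

# Systolic blood pressures for the n = 10 participants in Table 6-3
sbp = [141, 119, 122, 127, 125, 123, 105, 113, 106, 131]

# Assumption: two-sided 95% critical value of t with df = 9, taken from a t table;
# this is the value that =TINV(0.05, 9) returns in Excel (about 2.262).
T_CRIT_DF9 = 2.262157

n = len(sbp)
x_bar = mean(sbp)                     # point estimate
s = stdev(sbp)                        # sample standard deviation (n - 1 denominator)
margin = T_CRIT_DF9 * s / sqrt(n)     # margin of error = t * standard error

lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.1f}, s = {s:.1f}, margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The mean and standard deviation computed here should agree with the systolic blood pressure row of Table 6–2.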

FIGURE 6–5 Data From Framingham Offspring Subsample

FIGURE 6–6 Standard Errors and t Values for Confidence Intervals

FIGURE 6–7 Upper and Lower Limits of 95% Confidence Intervals

TABLE 6–3 Subsample of n = 10 Participants Attending the Seventh Examination of the Framingham Offspring Study

Participant ID  Systolic Blood Pressure  Diastolic Blood Pressure  Total Serum Cholesterol  Weight (lb)  Height (in)  BMI
1 141 76 199 138 63.00 24.4
2 119 64 150 183 69.75 26.4
3 122 62 227 153 65.75 24.9
4 127 81 227 178 70.00 25.5
5 125 70 163 161 70.50 22.8
6 123 72 210 206 70.00 29.6
7 105 81 205 235 72.00 31.9
8 113 63 275 151 60.75 28.8
9 106 67 208 213 69.00 31.5
10 131 77 159 142 61.00 26.8

FIGURE 6–8 Data Measured in Subsample of n = 10 Participants Attending Seventh Examination of the Framingham Offspring Study


FIGURE 6–9 Descriptive Statistics Analysis Tool in Data Analysis ToolPak

FIGURE 6–10 Descriptive Statistics on SBP

6.2 CONFIDENCE INTERVALS FOR ONE SAMPLE, DICHOTOMOUS OUTCOME
In Chapter 6 of the textbook, we presented the following formula for the confidence interval for a proportion (of a dichotomous variable) in one sample:

$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (Find z in Table 1B)

When computing the confidence intervals by hand, we computed the sample size and sample proportion and then used Table 1B in the Appendix of the textbook to find the appropriate z value to reflect the desired confidence level. We now use Excel to compute the sample proportion and to determine the appropriate z value for the confidence interval.
Example 6.4. In Example 6.4 in the textbook, we analyzed
data on the prevalence of cardiovascular disease (CVD) mea-
sured in men and women at the fifth examination of the
Framingham Offspring Study. The data are shown in Table 6–4.
The data are entered into an Excel worksheet and shown in
Figure 6–12.
FIGURE 6–11 Computing Confidence Limits From Point Estimate and Margin of Error

In Figure 6–13, we compute the sample proportions with prevalent CVD by dividing the numbers with prevalent CVD by the respective totals, and then generate the z values for 95% confidence using the NORMSINV function. The sample proportions and z values are shown in Figure 6–13.
In Figure 6–14, we compute the upper and lower limits of the 95% confidence interval using $\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. Notice how the formula is implemented. For example, the lower 95% confidence limit for the total sample is in cell G4 and is computed as “=E4-F4*SQRT(E4*(1-E4)/D4)”. The upper 95% confidence limit for the total sample is in cell H4 and is computed as “=E4+F4*SQRT(E4*(1-E4)/D4)”.

FIGURE 6–13 Sample Proportions and z Values for Confidence Intervals

6.3 CONFIDENCE INTERVALS FOR TWO INDEPENDENT SAMPLES, CONTINUOUS OUTCOME
In Chapter 6 of the textbook, we presented the following formulas for confidence intervals for the difference in means of a continuous variable in two independent samples:

$n_1 \ge 30$ and $n_2 \ge 30$: $(\bar{X}_1 - \bar{X}_2) \pm z\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ (Find z in Table 1B)

$n_1 < 30$ or $n_2 < 30$: $(\bar{X}_1 - \bar{X}_2) \pm t\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ (Find t in Table 2, df = n1 + n2 − 2)

where

$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

When computing the confidence intervals by hand, we compute the sample sizes, means, and standard deviations in each sample. We then compute the pooled estimate of the common standard deviation, Sp, and use Table 1B or Table 2 in the Appendix of the textbook to find the appropriate z or t value to reflect the desired confidence level. We now use Excel to compute summary statistics and to determine the appropriate z or t value for the confidence interval. Once all of the requisite components are determined, we construct the confidence interval.
Example 6.5. In Example 6.5 in the textbook, we analyzed data on n = 3539 participants who attended the seventh examination of the offspring in the Framingham Heart Study and compared men and women on the characteristics shown in Table 6–5.
We use Excel to generate 95% confidence intervals for the difference in means between men and women. Because the sample sizes are large, we use the confidence interval formula with z as opposed to t. The data are entered into an Excel worksheet as shown in Figure 6–15.
FIGURE 6–15 Data for Confidence Intervals for Differences in Means

First we compute the pooled estimates of the common standard deviations $S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ and place these in column H. For systolic blood pressure, the following is entered into cell H3: “=SQRT(((B3-1)*D3^2+(E3-1)*G3^2)/(B3+E3-2))”. We then compute the z scores for 95% confidence intervals and place these in column I. The computation is the same for each characteristic; for example, in cell I3 we enter “=NORMSINV(0.975)”. The pooled estimates of the common standard deviations Sp and z values are shown in Figure 6–16.
We now compute the point estimates for the difference in means $(\bar{X}_1 - \bar{X}_2)$ and the lower and upper limits of the 95% confidence intervals using $(\bar{X}_1 - \bar{X}_2) \pm z\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. The confidence intervals are shown in Figure 6–17. The 95% confidence limits are shown in column K and column L for each characteristic.
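The pooled standard deviation and the z-based interval for a difference in means can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the summary statistics are hypothetical values chosen only to illustrate the arithmetic (they are not the Framingham values from Table 6–5), and the function names `pooled_sd` and `diff_means_z_ci` are our own.

```python
from math import sqrt
from statistics import NormalDist


def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def diff_means_z_ci(n1, mean1, s1, n2, mean2, s2, confidence=0.95):
    """z-based confidence interval for the difference in means (large samples)."""
    sp = pooled_sd(n1, s1, n2, s2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # role of NORMSINV
    margin = z * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2                                  # point estimate
    return diff - margin, diff + margin


# Hypothetical summary statistics, used only for illustration
sp = pooled_sd(50, 17.5, 60, 20.0)
lower, upper = diff_means_z_ci(50, 128.0, 17.5, 60, 126.0, 20.0)
print(f"Sp = {sp:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

Because Sp is a weighted combination of the two sample variances, it always falls between the two sample standard deviations, which gives a quick sanity check on the worksheet formula.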

FIGURE 6–16 Sp and z Values for Confidence Intervals

FIGURE 6–17 Upper and Lower Limits of 95% Confidence Intervals

The same approach is used to compute confidence intervals for the difference in means when the sample sizes are small (i.e., when one or both of the sample sizes are less than 30), except that the TINV function is used to compute the appropriate value from the t distribution with degrees of freedom equal to n1 + n2 − 2.

6.4 CONFIDENCE INTERVALS FOR MATCHED SAMPLES, CONTINUOUS OUTCOME
In Chapter 6 of the textbook, we presented the following formulas for confidence intervals for the mean difference of a continuous variable in two dependent or matched samples.

$n \ge 30$: $\bar{X}_d \pm z\,\frac{s_d}{\sqrt{n}}$ (Find z in Table 1B)

$n < 30$: $\bar{X}_d \pm t\,\frac{s_d}{\sqrt{n}}$ (Find t in Table 2, df = n − 1)

where n is the number of participants or pairs and $\bar{X}_d$ and $s_d$ are the mean and standard deviation of the difference scores (where differences are computed on each participant or between members of a matched pair).
We now use Excel to compute summary statistics and to determine the appropriate z or t value for the confidence interval. Once all of the requisite components are determined, we construct the confidence interval.
Example 6.6. In Example 6.7 in the textbook, we analyzed systolic blood pressures measured at the sixth and seventh examinations of the offspring in the Framingham Heart Study in a subsample of n = 15 randomly selected participants. The data are shown in Table 6–6.

TABLE 6–6 Systolic Blood Pressures Measured at Examinations 6 and 7

Subject Identification Number  Examination 6  Examination 7
1                              168            141
2                              111            119
3                              139            122
4                              127            127
5                              155            125
6                              115            123
7                              125            113
8                              123            106
9                              130            131
10                             137            142
11                             130            131
12                             129            135
13                             112            119
14                             141            130
15                             122            121

We use Excel to compute difference scores, to generate summary statistics on the difference scores, and to generate a 95% confidence interval for the mean difference in systolic blood pressures over time. The data are entered into an Excel worksheet as shown in Figure 6–18. Difference scores are computed for each participant by subtracting the systolic blood pressure measured at Exam 6 from that measured at Exam 7.

FIGURE 6–18 Data for Confidence Interval for Mean Difference

We next use the Descriptive Statistics Analysis Tool to generate descriptive statistics on the differences and we request the confidence interval information. The specifications for the analysis are shown in Figure 6–19. In the following, we request that the results be placed in the current worksheet and we specify the top-left corner of the results table as cell F1. The descriptive statistics are shown in Figure 6–20.
Cell G16 contains the margin of error (i.e., the product of the t value for 95% confidence and the standard error). We now take the margin of error and add it to and subtract it from the mean difference in the sample (the point estimate in cell G3) to produce the confidence limits. This is done and shown in Figure 6–21.
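The matched-sample calculation of Example 6.6 can be sketched in Python using the data in Table 6–6. This is a minimal sketch rather than the workbook's procedure: the t critical value is hard-coded from a t table because the Python standard library has no t distribution (an assumption labeled in the code), and the variable names are our own.

```python
from math import sqrt
from statistics import mean, stdev

# Systolic blood pressures from Table 6-6 (n = 15 participants)
exam6 = [168, 111, 139, 127, 155, 115, 125, 123, 130, 137, 130, 129, 112, 141, 122]
exam7 = [141, 119, 122, 127, 125, 123, 113, 106, 131, 142, 131, 135, 119, 130, 121]

# Difference scores: Exam 7 minus Exam 6, as in the worksheet
diffs = [after - before for before, after in zip(exam6, exam7)]

# Assumption: two-sided 95% critical value of t with df = 14, from a t table;
# this is the value that =TINV(0.05, 14) returns in Excel (about 2.145).
T_CRIT_DF14 = 2.144787

n = len(diffs)
d_bar = mean(diffs)                    # mean difference (point estimate)
s_d = stdev(diffs)                     # standard deviation of the differences
margin = T_CRIT_DF14 * s_d / sqrt(n)   # margin of error

lower, upper = d_bar - margin, d_bar + margin
print(f"mean difference = {d_bar:.1f}, s_d = {s_d:.1f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The interval includes zero, so these data alone do not show a change in mean systolic blood pressure between the two examinations at the 95% confidence level.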

6.5 CONFIDENCE INTERVALS FOR TWO INDEPENDENT SAMPLES, DICHOTOMOUS OUTCOME
In Chapter 6 of the textbook, we presented the following formula for the confidence interval for a difference in proportions in two independent samples.

$\hat{p}_1 - \hat{p}_2 \pm z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ (Find z in Table 1B)

FIGURE 6–19 Computing a Confidence Interval on the Difference Scores

FIGURE 6–20 Descriptive Statistics on Difference Scores

FIGURE 6–21 Upper and Lower Limits of 95% Confidence Interval

When computing the confidence intervals by hand, we computed the sample sizes and sample proportions, and then used Table 1B in the Appendix of the textbook to find the appropriate z value to reflect the desired confidence level. We now use Excel to compute the sample proportions and to determine the appropriate z value for the confidence interval.
Example 6.7. In Example 6.4 of the Excel workbook, we analyzed data on the prevalence of cardiovascular disease (CVD) measured in men and women at the fifth examination of the Framingham Offspring Study. The data are shown in Table 6–4. The data are entered into an Excel worksheet and shown in Figure 6–22.
In Figure 6–23, we compute the sample proportions with prevalent CVD by dividing the numbers with prevalent CVD by the respective totals. We then compute the point estimate as the difference in sample proportions and generate the z value for 95% confidence using the NORMSINV function.
In Figure 6–24, we compute the upper and lower limits of the 95% confidence interval using $\hat{p}_1 - \hat{p}_2 \pm z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. Notice how the formula is implemented (see the formula for the upper limit in the formula bar).
Excel can be used to generate confidence intervals for relative risks and odds ratios using a similar approach. The exact formulas for these confidence intervals can be found in Chapter 6 of the textbook.

FIGURE 6–22 Data for Confidence Interval for Difference in Proportions

FIGURE 6–23 Sample Proportions, Difference in Proportions, and z Value for Confidence Interval

FIGURE 6–24 Upper and Lower Limits of 95% Confidence Interval

6.6 PRACTICE PROBLEMS
1. A study is run to estimate the mean total cholesterol level in children 2 to 6 years of age. A sample of 9 participants is selected and their total cholesterol levels are measured as follows:

185 225 240 196 175 180 194 147 223

Generate a 95% confidence interval for the true mean total cholesterol levels in children.
2. A clinical trial is planned to compare an experimental medication designed to lower blood pressure to a placebo. Before starting the trial, a pilot study is conducted involving 10 participants. The objective of the study is to assess how systolic blood pressure changes over time when untreated. Systolic blood pressures are measured at baseline and again 4 weeks later. Compute a 95% confidence interval for the mean difference in blood pressures over 4 weeks.

Baseline: 120 145 130 160 152 143 126 121 115 135

4 Weeks: 122 142 135 158 155 140 130 120 124 130

3. After the pilot study described in Problem 2, the main trial is conducted and involves a total of 200 patients. Patients are enrolled and randomized to receive either the experimental medication or the placebo. The data shown in Table 6–7 are data collected at the end of the study after 6 weeks on the assigned treatment. Generate a 95% confidence interval for the difference in proportions of patients with hypertension between groups.

TABLE 6–7 Data for Problem 3

                Experimental (n = 100)  Placebo (n = 100)
% Hypertensive  14%                     22%

4. The following data were collected as part of a study of coffee consumption among male and female undergraduate students. The following reflect cups per day consumed:

Male: 3 4 6 8 2 1 0 2

Female: 5 3 1 2 0 4 3 1

Generate a 95% confidence interval for the difference in mean numbers of cups of coffee consumed between men and women.
5. A clinical trial is conducted comparing a new pain reliever for arthritis to a placebo. Participants are randomly assigned to receive the new medication or a placebo. The outcome is pain relief within 30 minutes. The data are shown in Table 6–8.
a. Generate a 95% confidence interval for the proportion of patients on the new medication who report pain relief.
b. Generate a 95% confidence interval for the difference in proportions of patients who report pain relief.

TABLE 6–8 Data for Problem 5

                Pain Relief  No Pain Relief
New medication  44           76
Placebo         21           99

CHAPTER 7
Hypothesis Testing Procedures

In Chapter 7 of the textbook, we presented the approach for hypothesis testing for means (μ) and proportions (p) in one sample, for differences in means (μ1 − μ2) and differences in proportions (p1 − p2) in two independent samples, for the mean difference in two dependent samples (μd), and for differences in means and proportions in more than two independent samples. For each test we used the same general five-step approach, which is outlined below:
• Step 1: Set up hypotheses (H0 and H1) and select a level of significance, α.
• Step 2: Choose the appropriate test statistic (e.g., z, t, F, χ²).
• Step 3: Determine critical values and set up the decision rule (which depends on α, the test statistic, and whether the test is upper-, lower-, or two-tailed).
• Step 4: Compute the test statistic based on observed sample data.
• Step 5: Draw a conclusion by comparing the test statistic to the critical value.
The test statistic (Step 2) varies depending on the specific test. When we conducted tests of hypothesis by hand in Chapter 7 of the textbook, we ultimately drew a conclusion by comparing the test statistic to the critical value, which was derived from an appropriate probability distribution. There is an alternative means of drawing a conclusion, and it involves comparing the p-value of a test (defined as the exact significance level) to the selected level of significance, α. The p-value is the probability of observing a test statistic as or more extreme than that observed, and it can be one-sided or two-sided. For example, suppose we conduct an upper-tailed test for the population mean (i.e., H1: μ > μ0) and observe a test statistic z = 2.04. The p-value is P(z > 2.04). If we conduct a two-sided test for the population mean (i.e., H1: μ ≠ μ0) and observe a test statistic z = 2.04, the p-value is P(z > 2.04) + P(z < −2.04) = 2 × P(z > 2.04). We use Microsoft Office® Excel® to compute the test statistics for each test and to compute p-values for each test to draw conclusions based on the following:

Reject H0 if p ≤ α.

p-values for tests involving z statistics are computed with the NORMSDIST function, and p-values for tests involving t statistics are computed with the TDIST function.
To compute p-values for tests involving a z statistic, we use the NORMSDIST function: “=NORMSDIST(z)”. To use the NORMSDIST function, we specify the value of the test statistic, z. The function returns the area under the standard normal curve below z. To use the NORMSDIST function to compute p-values, we make the following modifications:

One-sided z test = “=1-NORMSDIST(ABS(z))”

Two-sided z test = “=2*(1-NORMSDIST(ABS(z)))”

The ABS function takes the absolute value of the test statistic. By using the ABS function, we can use the preceding one-sided formula for both upper- and lower-tailed tests. This shortcut assumes that the observed test statistic falls in the direction specified by H1; if it falls in the opposite direction, the one-sided p-value is 1 minus the value returned by the formula.
To compute p-values for tests involving t statistics, we use the TDIST function: “=TDIST(t, df, test type)”. To use the TDIST function, we specify the test statistic t, the degrees of freedom (e.g., for a one-sample test of means, df = n − 1), and then the test type. The test type indicates whether the test is one- or two-tailed (i.e., test type = 1 for upper- or lower-tailed tests and test type = 2 for two-tailed tests). The function returns the area in the t distribution in one or two tails (depending on the test type). The TDIST function is used as follows to compute p-values:

One-sided t test = “=TDIST(ABS(t), df, 1)”

Two-sided t test = “=TDIST(ABS(t), df, 2)”

Again, the ABS function is used to take the absolute value of the test statistic.
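The z-based p-value formulas can be mirrored in Python as a cross-check. This is a minimal sketch, assuming the test statistic z = 2.04 from the example above; the function names `one_sided_p` and `two_sided_p` are our own, and `NormalDist().cdf` from the standard library plays the role of NORMSDIST. The one-sided function is appropriate only when z falls in the direction of H1.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_p(z):
    """Mirror of =1-NORMSDIST(ABS(z)); assumes z lies in the direction of H1."""
    return 1 - STANDARD_NORMAL.cdf(abs(z))


def two_sided_p(z):
    """Mirror of =2*(1-NORMSDIST(ABS(z)))."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


z = 2.04
print(f"one-sided p = {one_sided_p(z):.4f}")
print(f"two-sided p = {two_sided_p(z):.4f}")
```

The two-sided value is exactly twice the one-sided value, matching P(z > 2.04) + P(z < −2.04) = 2 × P(z > 2.04).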

7.1 TESTS WITH ONE SAMPLE, CONTINUOUS OUTCOME
For a one-sample test of a hypothesis with a continuous outcome, the hypotheses are as follows:

H0: μ = μ0
H1: μ > μ0, H1: μ < μ0, or H1: μ ≠ μ0

where μ is the mean of the population of interest and μ0 is a known mean (e.g., an historical control).
In Chapter 7 of the textbook, we presented the following formulas for test statistics:

$n \ge 30$: $z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (Find critical value in Table 1C)

$n < 30$: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (Find critical value in Table 2, df = n − 1)

When performing the test of hypothesis by hand, we compute the sample size, the mean and standard deviation, and then the test statistic. We use Table 1C or Table 2 in the Appendix of the textbook to find the appropriate critical values of z or t and compare the test statistic to the critical value to draw a conclusion. Excel does not have a specific analysis tool for a one-sample test of means. However, Excel can be used to compute the test statistic and the p-value to draw a conclusion.
Example 7.1. In Example 7.1 in the textbook, we analyzed
data on expenditures on health care and prescription drugs. We
specifically analyzed whether there was significant evidence of
a reduction in expenditures from the reported value of $3302
per year. To test the hypothesis, a sample of 100 Americans
Example 7.2. In Example 7.2 in the textbook, we tested whether the mean total cholesterol level in the Framingham Offspring Study was different from the national mean value of 203. The following statistics on total cholesterol levels of participants in the Framingham Offspring Study were available: n = 3310, $\bar{X}$ = 200.3, and s = 36.8. Here we use Excel to test if there is statistical evidence of a difference in mean cholesterol level in the Framingham offspring as compared to the national mean of 203.
The hypotheses are as follows:

H0: μ = 203
H1: μ ≠ 203
α = 0.05.

Because the sample size is large (n ≥ 30), the appropriate test statistic is $z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$.
We use Excel to compute the test statistic and the p-value. The data are entered into an Excel worksheet, as shown in Figure 7–3. We now compute the test statistic, z, and place it in cell B8. The formula is “=(B3-B6)/(B4/SQRT(B2))”. The two-sided p-value is computed with the NORMSDIST function using “=2*(1-NORMSDIST(ABS(z)))”. The test statistic and p-value are shown in Figure 7–4.
We reject H0 because p = 0.00002 < α = 0.05. (Note that the p-value is given as 2.43E−05, which is equivalent to 2.43 × 10⁻⁵ = 0.0000243.) We have significant evidence to show that the mean total cholesterol level in the Framingham offspring is different from the national value of 203. (Recall that when this test was done by hand, we compared the test statistic z = −4.22 to the critical value from the standard normal distribution and rejected H0 because −4.22 ≤ −1.96.)

FIGURE 7–5 Data for One-Sample Test of Proportions
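The one-sample z test of Example 7.2 can be sketched in Python as a cross-check on the worksheet formulas. This is a minimal sketch using the summary statistics quoted in the example; the variable names are our own, and `NormalDist().cdf` stands in for NORMSDIST.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 3310, 200.3, 36.8   # summary statistics from Example 7.2
mu0 = 203                         # national mean under H0
alpha = 0.05                      # level of significance

z = (x_bar - mu0) / (s / sqrt(n))               # =(B3-B6)/(B4/SQRT(B2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # =2*(1-NORMSDIST(ABS(z)))
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, two-sided p = {p_value:.2e}, reject H0: {reject_h0}")
```

The p-value is printed in scientific notation, the same form in which Excel displays very small values.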

7.2 TESTS WITH ONE SAMPLE, DICHOTOMOUS OUTCOME
For a one-sample test of hypothesis with a dichotomous outcome, the hypotheses are as follows:

H0: p = p0
H1: p > p0, H1: p < p0, or H1: p ≠ p0

where p is the proportion of successes in the population of interest and p0 is a known proportion (e.g., an historical control).
In Chapter 7 of the textbook, we presented the following test statistic:

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$ (Find critical value in Table 1C)

When performing this test of hypothesis by hand, we compute the sample size and sample proportion, and then the test statistic. We use Table 1C in the Appendix of the textbook to find the appropriate critical value of z and compare the test statistic to the critical value to draw a conclusion. We now use Excel to conduct the test of hypothesis.

FIGURE 7–6 Test Statistic and p-Value

Example 7.3. In Example 7.4 in the textbook, we tested
whether the prevalence of smoking in the Framingham
Offspring Study was lower than the prevalence of smoking
among American adults, reported as 21.1%. In the
Framingham Offspring Study, 482 of 3536 (13.6%) of the re-
spondents were currently smoking at the time of the exam.
The hypotheses are as follows:

H0: p = 0.211
H1: p < 0.211
α = 0.05.

The appropriate test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$.
We use Excel to compute the test statistic and the p-value. The data are entered into an Excel worksheet, as shown in Figure 7–5. The sample proportion is shown in cell B4 and is computed by dividing the number of smokers in the sample by the sample size (i.e., B3 / B2).
We now compute the test statistic, z, and place it in cell B8. The formula is “=(B4-B6)/(SQRT(B6*(1-B6)/B2))”. The one-sided p-value is computed with the NORMSDIST function using “=1-NORMSDIST(ABS(z))”. The test statistic and p-value are shown in Figure 7–6.
In this test, the test statistic is z = −10.89 and we reject H0 because p ≈ 0 < α = 0.05. We have statistically significant evidence to show that the prevalence of smoking in the Framingham offspring is lower than the prevalence of smoking among American adults reported at 21.1%.
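The one-sample test of proportions in Example 7.3 can be sketched in Python in the same way. This is a minimal sketch using the counts quoted in the example (482 smokers among 3536 respondents); the variable names are our own, and the one-sided p-value formula is appropriate here because the observed statistic falls in the direction of H1.

```python
from math import sqrt
from statistics import NormalDist

n, smokers = 3536, 482   # sample size and number of smokers, Example 7.3
p0 = 0.211               # prevalence among American adults under H0
alpha = 0.05             # level of significance

p_hat = smokers / n                            # =B3/B2
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # =(B4-B6)/(SQRT(B6*(1-B6)/B2))
p_value = 1 - NormalDist().cdf(abs(z))         # =1-NORMSDIST(ABS(z))
reject_h0 = p_value < alpha

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_value:.3g}")
```

With a test statistic this far into the tail, the p-value is effectively zero in floating-point arithmetic, consistent with the value of 0 reported in the example.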

7.3 TESTS WITH ONE SAMPLE, CATEGORICAL AND ORDINAL OUTCOMES: THE CHI-SQUARE GOODNESS-OF-FIT TEST
For a χ² goodness-of-fit test, the hypotheses are as follows:

H0: p1 = p10, p2 = p20, ..., pk = pk0
H1: H0 is false

where the pi are the population proportions of successes in each response category in the population and the pi0 are the known proportions in each response category.
In Chapter 7 of the textbook, we presented the following formula for the test statistic:

$\chi^2 = \sum \frac{(O - E)^2}{E}$ (Find critical value in Table 3, df = k − 1)

where O = observed frequency and E = expected frequency in each of the response categories and k = the number of response options.
When performing the goodness-of-fit test by hand, we compute the expected frequencies for each category and then compute the test statistic. We then use Table 3 in the Appendix of the textbook to find the appropriate critical value from the χ² distribution and compare the test statistic to the critical value to draw a conclusion.
Excel does not have a specific analysis tool to perform the χ² goodness-of-fit test. However, it does have a CHIDIST function, which can be used to produce p-values. The CHIDIST function is used as “=CHIDIST(χ², df)”. To use the CHIDIST function, we specify the test statistic, χ², and the degrees of freedom, df. For the χ² goodness-of-fit test, df = k − 1, where k represents the number of response categories. The CHIDIST function returns the area in the right tail of the distribution, which is the p-value for the χ² goodness-of-fit test. We now use Excel to conduct a goodness-of-fit test.

TABLE 7–1 Data from University Survey

                    No Regular Exercise  Sporadic Exercise  Regular Exercise  Total
Number of students  255                  125                90                470

exercise question following the implementation of the health-promotion campaign on campus?
The hypotheses are as follows:

H0: p1 = 0.60, p2 = 0.25, p3 = 0.15, or equivalently
H0: Distribution of responses is 0.60, 0.25, 0.15
H1: H0 is false
α = 0.05

The appropriate test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$.
Recall that the expected frequencies (E) are computed based on the assumption that H0 is true. The data for the test are entered into an Excel worksheet, as shown in Figure 7–7. The sample data (i.e., the numbers of students in each response category) are the observed frequencies. The total sample size is computed using the SUM function and is shown in cell B6.

FIGURE 7–7 Data for Chi-Square Goodness-of-Fit Test

Example 7.4. In Example 7.6 of the textbook, we analyzed
a university’s survey of its graduates, in which demographic
and health information were collected for future planning pur-
poses. In response to a question on regular exercise, 60% of
all graduates reported getting no regular exercise, 25% reported
exercising sporadically, and 15% reported exercising regularly
as undergraduates. The next year, the university launched a
health-promotion campaign on campus in an attempt to in-
crease healthy behaviors among undergraduates and conducted
another survey that was completed by 470 graduates. The data
shown in Table 7–1 were collected. Based on the data, is there
evidence of a shift in the distribution of responses to the
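The goodness-of-fit computation for these data can also be sketched outside Excel. The Python snippet below is a minimal illustration rather than part of the workbook: the function names `chi_square_gof` and `chidist_even_df` are hypothetical, and the right-tail helper uses a closed-form chi-square tail area that is valid only when df is even (here df = 2), standing in for Excel's CHIDIST.

```python
import math

def chi_square_gof(observed, hypothesized_props):
    """Return expected frequencies, the chi-square statistic, and df = k - 1."""
    n = sum(observed)
    expected = [p * n for p in hypothesized_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat, len(observed) - 1

def chidist_even_df(x, df):
    """Right-tail chi-square area; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Observed counts from Table 7-1 and the proportions stated in H0.
observed = [255, 125, 90]
props = [0.60, 0.25, 0.15]

expected, stat, df = chi_square_gof(observed, props)
p_value = chidist_even_df(stat, df)
print(f"expected = {expected}, chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

For odd degrees of freedom the closed form does not apply, so a statistics library or a chi-square table would be needed instead.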

We compute the expected frequencies by multiplying the hypothesized or expected proportions in each response category (from \(H_0\)) by the total sample size. The expected proportions are first entered into the Excel worksheet in column C. We then multiply the expected proportions in column C by the sample size to produce the expected frequencies. For example, the expected frequency in cell D2 is computed using “C2*$B$6”. (Notice that we use the absolute cell address for the total sample size so that the same value is used to compute each expected frequency.) The expected proportions and expected frequencies are shown in Figure 7–8.

We now compute \((O - E)^2 / E\) in each response category and sum to produce the \(\chi^2\) statistic. The \(\chi^2\) test statistic is shown in cell E6. The p-value is computed with the CHIDIST function using “CHIDIST(E6,2)”, where “2” reflects the degrees of freedom (\(df = k - 1 = 3 - 1 = 2\)). The test statistic and p-value are shown in Figure 7–9.

In this test, the test statistic is \(\chi^2 = 8.46\) and we reject \(H_0\) because \(p = 0.0146 < \alpha = 0.05\). We have statistically significant

FIGURE 7–8 Expected Frequencies



FIGURE 7–9 Test Statistic and p-Value

evidence to show that the distribution of responses is not 0.60, 0.25, 0.15.

7.4 TESTS WITH TWO INDEPENDENT SAMPLES, CONTINUOUS OUTCOME

For a two-independent-samples test of hypothesis with a continuous outcome, the hypotheses are as follows:

\(H_0\): \(\mu_1 = \mu_2\)

\(H_1\): \(\mu_1 > \mu_2\), \(H_1\): \(\mu_1 < \mu_2\), or \(H_1\): \(\mu_1 \ne \mu_2\)

where \(\mu_1\) and \(\mu_2\) are the means of the two independent populations of interest.

In Chapter 7 of the textbook, we presented the following formulas for test statistics:

\(n_1 \ge 30\) and \(n_2 \ge 30\):
\[ z = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]
(Find critical value of z in Table 1C)

\(n_1 < 30\) or \(n_2 < 30\):
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}} \]
(Find critical value of t in Table 2, \(df = n_1 + n_2 - 2\))

where
\[ S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]

When performing tests of hypothesis by hand, we compute the sample sizes, means, and standard deviations in each sample. We then compute the pooled estimate of the common standard deviation, \(S_p\), and the test statistic. We then use Table 1C or Table 2 in the Appendix of the textbook to find the appropriate critical values of z or t and compare the test statistic to the critical value to draw a conclusion. We now use Excel to conduct the test of hypothesis.

Example 7.5. A clinical trial is run to compare an experimental drug to a placebo for its effectiveness in lowering systolic blood pressure. A total of 18 participants are enrolled in the study and randomly assigned to receive either the experimental drug or placebo. After 6 weeks on the assigned treatment, each patient’s systolic blood pressure is measured and the data are shown here:

Experimental drug   125 130 135 121 140 137 129 145 115

Placebo             145 140 132 129 145 150 160 140 120

Is there statistical evidence of a difference in mean systolic blood pressures between treatments? We now run the test using Excel.

The hypotheses are:

\(H_0\): \(\mu_1 = \mu_2\)

\(H_1\): \(\mu_1 \ne \mu_2\)

\(\alpha = 0.05\)

Because the sample sizes are small (both \(n_1 < 30\) and \(n_2 < 30\)), the appropriate test statistic is \(t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}}\). Excel has an analysis tool to perform a two-independent-samples test of means in its Data Analysis ToolPak. We first enter the data into an Excel worksheet, as shown in Figure 7–10.

Under the Tools/Data Analysis option, we choose the t Test: Two-Sample Assuming Equal Variances analysis tool, as shown in Figure 7–11. Excel offers other options to perform a two-independent-samples test for the equality of means that do not assume that the variances are equal and thus would not involve \(S_p\). In Chapter 7 of the textbook, we presented guidelines for using the formulas that assume equal variances. Once we click OK, Excel presents the dialog box shown in Figure 7–12.

FIGURE 7–10 Data for Two-Independent-Samples Test for Difference in Means

FIGURE 7–11 Two-Independent-Samples Test Using Data
Analysis Tool


FIGURE 7–12 Specifications for Test
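As a cross-check on the pooled-variance formulas given in this section, the Example 7.5 data can be run through a short Python sketch. This is an illustration under stated assumptions, not the workbook's method: the helper names `t_pdf` and `two_sided_p` are hypothetical, and the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule rather than by Excel's analysis tool.

```python
import math
from statistics import mean, variance

drug = [125, 130, 135, 121, 140, 137, 129, 145, 115]
placebo = [145, 140, 132, 129, 145, 150, 160, 140, 120]

n1, n2 = len(drug), len(placebo)
df = n1 + n2 - 2

# Pooled variance Sp^2 and the t statistic (drug minus placebo).
pooled_var = ((n1 - 1) * variance(drug) + (n2 - 1) * variance(placebo)) / df
t_stat = (mean(drug) - mean(placebo)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

p_value = two_sided_p(t_stat, df)
print(f"Sp^2 = {pooled_var:.1f}, t = {t_stat:.2f}, df = {df}, two-sided p = {p_value:.3f}")
```

The sign of t depends only on which group is subtracted from which; the two-sided p-value is unaffected.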



In the dialog box, we specify the range of the data for each group. The data for variable 1 (experimental drug) is in cell A1 through cell A10 and the data for variable 2 (placebo) is in cell B1 through cell B10. Because we included the first row of labels (A1 and B1), we click the Labels box. We then specify the difference in means in the Hypothesized Mean Differences input field. For most situations, the difference is zero. We then specify the level of significance, \(\alpha = 0.05\), and specify a location for the results. Finally, we request that Excel place the results in the current worksheet and we specify the top-left corner of the results table as E1. The results are shown in Figure 7–13.

The mean systolic blood pressure for patients on the experimental drug is 130.8 as compared to 140.1 for patients on the placebo. The estimate of the pooled variance is \(S_p^2 = 116.3\), and the test statistic is \(t = -1.84\). The two-sided p-value is 0.085, and thus we do not have significant evidence to show that there is a difference in mean blood pressures between treatments because \(p = 0.085 > \alpha = 0.05\).

For two-independent-samples tests of means, as long as the variances are assumed to be equal, the t Test: Two Sample Assuming Equal Variances analysis tool can be used, regardless of the sample sizes (\(n_1\) and \(n_2\)). If the sample sizes are large, Excel makes the appropriate adjustments for larger samples (essentially, uses a z statistic) and produces an appropriate test statistic and p-value. The test statistic is always labeled t, even for large samples.

7.5 TESTS WITH MATCHED SAMPLES, CONTINUOUS OUTCOME

For a two-dependent-samples test of hypothesis with a continuous outcome, the hypotheses are as follows:

\(H_0\): \(\mu_d = 0\)

\(H_1\): \(\mu_d > 0\), \(H_1\): \(\mu_d < 0\), or \(H_1\): \(\mu_d \ne 0\)

where \(\mu_d\) is the mean difference of the two dependent, matched, or paired populations.

In Chapter 7 of the textbook, we presented the following formulas for test statistics:

\(n \ge 30\):
\[ z = \frac{\bar{X}_d - \mu_d}{s_d / \sqrt{n}} \]
(Find critical value of z in Table 1C)

\(n < 30\):
\[ t = \frac{\bar{X}_d - \mu_d}{s_d / \sqrt{n}} \]
(Find critical value of t in Table 2, \(df = n - 1\))

To perform the test of hypothesis by hand, we first compute difference scores, and then the sample size and mean and standard deviation of the difference scores in the sample. We then compute the test statistic and determine the appropriate critical values of z or t from Table 1C or Table 2 in the Appendix of the
textbook, and compare the test statistic to the critical value to draw a conclusion.

Excel has a procedure to perform a two-dependent-samples test of means in its Data Analysis ToolPak called the t Test: Paired Two Sample for Means. We use this procedure to perform the test.

Example 7.6. In Example 7.11 of the textbook, we evaluated the efficacy of a new drug for lowering cholesterol. Fifteen patients had a pretreatment or baseline total cholesterol level measured, and then after taking the drug for 6 weeks, each patient’s total cholesterol level was measured again. The data are shown in Table 7–2.

TABLE 7–2 Data from Cholesterol Study

Subject Identification Number   Baseline   6 Weeks
 1                              215        205
 2                              190        156
 3                              230        190
 4                              220        180
 5                              214        201
 6                              240        227
 7                              210        197
 8                              193        173
 9                              210        204
10                              230        217
11                              180        142
12                              260        262
13                              210        207
14                              190        184
15                              200        193

The hypotheses are as follows:

\(H_0\): \(\mu_d = 0\)

\(H_1\): \(\mu_d > 0\)

\(\alpha = 0.05\)

Because the sample size is small, the appropriate test statistic is \(t = \frac{\bar{X}_d - \mu_d}{s_d / \sqrt{n}}\).

We first enter the data into an Excel worksheet, as shown in Figure 7–14. Under the Tools/Data Analysis option, we choose the t Test: Paired Two-Sample For Means analysis tool as shown in Figure 7–15. Once we click OK, Excel presents the dialog box shown in Figure 7–16.

FIGURE 7–14 Data for Test of Mean Difference

In the dialog box, we specify the range of the data for each measurement. The first measurement on each participant (variable 1) is the baseline measurement, and these values are in cell B1 through cell B16. The second measurement on each participant (variable 2) is the measurement taken at 6 weeks and these values are in cell C1 through cell C16. Because we included the first row of labels (B1 and C1), we click the Labels box. We then specify the difference in means in the Hypothesized Mean Differences input field. For most situations, the mean difference is zero. We then specify the level of significance, \(\alpha = 0.05\), and specify a location for the results. In Figure 7–16, we specify the top-left corner of the results table as E1. The results are shown in Figure 7–17.

The mean cholesterol level at baseline is 212.8 and the mean cholesterol level at 6 weeks is 195.9. The two-dependent-samples test is based on difference scores (see Chapter 7 of the textbook for details). The test statistic is \(t = 4.63\) and the one-sided p-value is 0.0002. We reject \(H_0\) because \(p = 0.0002 < \alpha = 0.05\). We have statistically significant evidence at \(\alpha = 0.05\) to show that there is a reduction in cholesterol levels over 6 weeks. Notice that Excel does not produce summary statistics on the difference scores (i.e., \(\bar{X}_d\), \(s_d\)). However, these values are used in the computation of the test statistic.
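Because Excel's output omits the summary statistics on the difference scores, it may help to see them computed directly. The Python sketch below is an illustration only; it assumes each difference is taken as baseline minus 6 weeks (so that a reduction is positive), and the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

# Table 7-2: total cholesterol at baseline and after 6 weeks, subjects 1-15.
baseline = [215, 190, 230, 220, 214, 240, 210, 193, 210, 230, 180, 260, 210, 190, 200]
week6 = [205, 156, 190, 180, 201, 227, 197, 173, 204, 217, 142, 262, 207, 184, 193]

# Difference scores (assumed direction: baseline minus 6 weeks).
diffs = [b - w for b, w in zip(baseline, week6)]

n = len(diffs)
mean_d = mean(diffs)   # sample mean of the differences
sd_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)

# Test statistic under H0: mu_d = 0, with df = n - 1.
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n))
print(f"n = {n}, mean diff = {mean_d:.2f}, sd = {sd_d:.2f}, t = {t_stat:.2f}, df = {n - 1}")
```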

7.6 TESTS WITH TWO INDEPENDENT SAMPLES, DICHOTOMOUS OUTCOME

For a two-independent-samples test of a hypothesis with a dichotomous outcome, the hypotheses are as follows:


FIGURE 7–15 Two-Matched-Samples Test Data Analysis Tool


FIGURE 7–16 Specifications for Test


FIGURE 7–17 Results of Test for Mean Difference


\(H_0\): \(p_1 = p_2\)

\(H_1\): \(p_1 > p_2\), \(H_1\): \(p_1 < p_2\), or \(H_1\): \(p_1 \ne p_2\)

where \(p_1\) and \(p_2\) are the proportions of successes in the two populations of interest.

In Chapter 7 of the textbook, we presented the following test statistic:

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}} \]

(Find critical value of z in Table 1C)

where \(\hat{p}_1\) is the proportion of successes in sample 1, \(\hat{p}_2\) is the proportion of successes in sample 2, and \(\hat{p}\) is the proportion of successes in the pooled sample, \(\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\). Excel does not have a specific analysis tool to perform this test. We instead use Excel to compute the z statistic and the p-value.

Example 7.7. In Example 7.13 of the textbook, we analyzed data from a randomized trial designed to evaluate the effectiveness of a newly developed pain reliever as compared to a standard pain reliever in reducing pain in patients following joint replacement surgery. A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial and were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery, and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0 to 10, with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The data in Table 7–3 were observed in the trial.

TABLE 7–3 Data from Clinical Trial

                                  Reduction of 3+ Points
Treatment Group          n        Number     Proportion
New pain reliever        50       23         0.46
Standard pain reliever   50       11         0.22

We use Excel to test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points). The hypotheses are as follows:

\(H_0\): \(p_1 = p_2\)

\(H_1\): \(p_1 \ne p_2\)

\(\alpha = 0.05\)

The appropriate test statistic is \(z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}\).

We now use Excel to compute the test statistic and the p-value. The data are entered into an Excel worksheet, as shown in Figure 7–18. The sample proportions are computed by dividing the numbers of successes (column C) by the sample sizes (column B) in each group.

Before computing the test statistic, z, we need to compute the overall proportion. This is placed in cell D5 and is computed as “(C2+C3)/(B2+B3)”. We next compute the test statistic, z, and place it in cell D7. The formula is “(D2-D3)/SQRT(D5*(1-D5)*(1/B2+1/B3))”. The overall proportion and z statistic are shown in Figure 7–19. The last step involves computing the two-sided p-value using the NORMSDIST function as “2*(1-NORMSDIST(ABS(z)))”. The p-value is shown in Figure 7–20.
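The same three worksheet steps (pooled proportion, z statistic, two-sided p-value) can be mirrored in a few lines of Python. This is a sketch for checking the arithmetic, not the workbook's procedure; `NormalDist().cdf` from the standard library plays the role of NORMSDIST, and the variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Table 7-3: sample sizes and numbers reporting a reduction of 3 or more points.
n1, x1 = 50, 23   # new pain reliever
n2, x2 = 50, 11   # standard pain reliever

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # overall proportion, as in cell D5

# z statistic, matching (D2-D3)/SQRT(D5*(1-D5)*(1/B2+1/B3)).
z = (p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Two-sided p-value, matching 2*(1-NORMSDIST(ABS(z))).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"pooled p = {p_pooled:.2f}, z = {z:.2f}, two-sided p = {p_value:.3f}")
```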

FIGURE 7–18 Data for Two-Independent-Samples Test for Difference in Proportions

FIGURE 7–19 Test Statistic

FIGURE 7–20 p-Value


In this test, the test statistic is \(z = 2.53\) and we reject \(H_0\) because \(p = 0.011 < \alpha = 0.05\). We have statistically significant evidence at \(\alpha = 0.05\) to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever.

7.7 TESTS WITH MORE THAN TWO INDEPENDENT SAMPLES, CONTINUOUS OUTCOME: ANALYSIS OF VARIANCE

In analysis of variance, the hypotheses are as follows:

\(H_0\): \(\mu_1 = \mu_2 = \ldots = \mu_k\)

\(H_1\): Means are not all equal

where \(\mu_j\) is the mean in the jth group and \(k\) is the number of independent comparison groups.

In Chapter 7 of the textbook, we presented the test statistic for analysis of variance as:

\[ F = \frac{\sum n_j (\bar{X}_j - \bar{X})^2 / (k - 1)}{\sum\sum (X - \bar{X}_j)^2 / (N - k)} \]

(Find critical value in Table 4, \(df_1 = k - 1\), \(df_2 = N - k\))

where \(n_j\) is the sample size in the jth group, \(\bar{X}_j\) is the sample mean in the jth group, and \(\bar{X}\) is the overall mean. \(k\) represents the number of independent groups (\(k > 2\)), and \(N\) represents the total number of observations in the analysis.

Example 7.8. In Example 7.14 of the textbook, we analyzed data from a clinical trial comparing four weight-loss programs. The outcome of interest was weight loss, defined as the difference in weight at the start of the study (baseline) and weight at the end of the study (8 weeks), in pounds. A total of 20 patients agreed to participate in the study and were randomly assigned to one of the four diet groups. The data are shown in Table 7–4.

TABLE 7–4 Data from Clinical Trial

Low-Calorie   Low-Fat   Low-Carbohydrate   Control
8             2         3                   2
9             4         5                   2
6             3         4                  -1
7             5         2                   0
3             1         3                   3

We use Excel to conduct an ANOVA to test whether there is a statistically significant difference in the mean weight loss among the four diets. Excel has an analysis tool to perform an

FIGURE 7–21 Data for ANOVA


FIGURE 7–22 ANOVA Data Analysis Tool


FIGURE 7–24 Results of ANOVA
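The F statistic defined in Section 7.7 can be computed by hand-style arithmetic in Python as a check on the Excel output. The sketch below is illustrative only: the function name `one_way_anova` is hypothetical, and the third control-group value is entered as -1 on the assumption that the minus sign was lost from the printed Table 7–4 (with a value of 1 the result would differ). No p-value is computed; the statistic would be compared with the critical value from Table 4.

```python
from statistics import mean

# Weight loss (pounds) by diet group, Example 7.8.
groups = {
    "low_calorie": [8, 9, 6, 7, 3],
    "low_fat": [2, 4, 3, 5, 1],
    "low_carbohydrate": [3, 5, 4, 2, 3],
    "control": [2, 2, -1, 0, 3],  # -1 is an assumption, see the note above
}

def one_way_anova(samples):
    """Return (F, df1, df2) for a list of independent samples."""
    all_values = [x for s in samples for x in s]
    k, big_n = len(samples), len(all_values)
    grand_mean = mean(all_values)
    # Between-group sum of squares: sum of n_j * (group mean - overall mean)^2.
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group (error) sum of squares: squared deviations from group means.
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    f_stat = (ssb / (k - 1)) / (sse / (big_n - k))
    return f_stat, k - 1, big_n - k

f_stat, df1, df2 = one_way_anova(list(groups.values()))
print(f"F = {f_stat:.2f} with df1 = {df1}, df2 = {df2}")
```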

where \(O\) = observed frequency (i.e., sample data), \(E\) = expected frequency in each of the cells of the table, \(r\) = the number of rows in the two-way table, and \(c\) = the number of columns in the two-way table (where \(r\) and \(c\) correspond to the number of comparison groups and the number of response options in the outcome).

When performing the \(\chi^2\) test of independence by hand, we compute the expected frequencies for each cell and then compute the test statistic. We then use Table 3 in the Appendix of the textbook to find the appropriate critical value from the \(\chi^2\) distribution and compare the test statistic to the critical value to draw a conclusion.

Excel does not have a specific analysis tool to perform the \(\chi^2\) test of independence. We will use the CHIDIST function to produce p-values. We used this function for the \(\chi^2\) goodness-of-fit test as “CHIDIST(\(\chi^2\), df)”.

Again, to use the CHIDIST function we specify the test statistic, \(\chi^2\), and the degrees of freedom, df. For the \(\chi^2\) test of independence, \(df = (r - 1) \times (c - 1)\). The CHIDIST function returns the area in the right tail of the distribution, which is the p-value for the \(\chi^2\) test of independence. We now use Excel to conduct a test of independence.

Example 7.9. In Example 7.4 of the Excel workbook, we examined data from a survey of university graduates that assessed, among other things, how frequently students exercised. Here we want to test whether there is a relationship between exercise and students’ living arrangements. The data are shown in Table 7–5.

TABLE 7–5 Data from University Survey

             No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Dormitory    32                    30                  28                 90

The hypotheses are as follows:

\(H_0\): Living arrangement and exercise are independent

\(H_1\): \(H_0\) is false

\(\alpha = 0.05\)

The appropriate test statistic is \(\chi^2 = \sum \frac{(O - E)^2}{E}\).
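To make the mechanics of the test of independence concrete, here is a minimal Python sketch. It uses the usual rule that each expected frequency is (row total × column total) / N, and it runs on a small hypothetical 2 × 3 table, not the survey data; the function name `chi_square_independence` and the counts are assumptions made for illustration.

```python
def chi_square_independence(table):
    """Return (expected, statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell under independence.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df

# Hypothetical counts: 2 comparison groups (rows) by 3 response options (columns).
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

expected, stat, df = chi_square_independence(observed)
print(f"expected = {expected}")
print(f"chi-square = {stat:.2f}, df = {df}")
```

The resulting statistic and df would then be passed to CHIDIST (or an equivalent right-tail chi-square function) to obtain the p-value.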
We place a “$” in front of the column to freeze the column address on row totals that are contained in column E. We do the same for the column total, B$6, except we freeze the row address on row 6, which contains the column totals. The total sample size is in cell $E$6. When we copy this formula from cell H2 to cell H3, for example, the formula is updated to “$E3*B$6/$E$6”. The sums of the expected frequencies across rows and down columns are equal to the sums of the observed frequencies across rows and down columns.

We now compute \((O - E)^2 / E\) in each cell of the table. Once we compute these, we sum to produce the \(\chi^2\) statistic. The \((O - E)^2 / E\) values for each cell are shown in Figure 7–27. For example, in cell F9 the formula is “(B2-H2)^2/H2”. When we copy this formula to the other cells in the bottom table, the cell references are automatically updated (e.g., the formula in cell H12 is “(D5-J5)^2/J5”).

The \(\chi^2\) test statistic is computed by summing the \((O - E)^2 / E\) values in the twelve cells. The test statistic is placed in cell G14 and is computed using “SUM(F9:H12)”. The p-value is computed with the CHIDIST function using “CHIDIST(G14,6)”, where “6” reflects the degrees of freedom, \(df = (r - 1) \times (c - 1) = 3(2) = 6\). The test statistic and p-value are shown in Figure 7–28.

In this test, the test statistic is \(\chi^2 = 60.44\) and the p-value is practically zero (\(3.66 \times 10^{-11}\)). We reject \(H_0\) because \(p \approx 0 < \alpha = 0.05\). We have statistically significant evidence at \(\alpha = 0.05\) to show that \(H_0\) is false, or that living arrangement and exercise are not independent (i.e., they are dependent or related).

7.9 PRACTICE PROBLEMS

1. Data are collected in a clinical trial evaluating a new compound designed to improve wound healing in trauma patients. The new compound is compared against a placebo. After treatment for 5 days with the new compound or placebo, the extent of wound healing is measured and the data are shown in Table 7–6. Is there a difference in the extent of wound healing by treatment? (Hint: Are treatment and the percent wound healing independent?) Run the appropriate test at a 5% level of significance.

2. Use the data in Problem 1 and pool the data across the treatments into one sample of size \(n = 250\). Use the pooled data to test whether the distribution of the percent wound healing is approximately normal. Specifically, use the following distribution: 30%,

FIGURE 7–27 Computing the Test Statistic

FIGURE 7–28 Test Statistic and p-Value


40%, 20%, and 10% and \(\alpha = 0.05\) to run the appropriate test.

3. Data are collected in an experiment designed to investigate the impact of different positions of the mother during ultrasound on fetal heart rate. Fetal heart rate is measured by ultrasound in beats per minute. The study includes 20 women who are assigned to one position and have the fetal heart rate measured in that position. Each woman is between 28 weeks and 32 weeks gestation. The data are shown in Table 7–7. Is there a significant difference in mean fetal heart rates by position? Run the test at a 5% level of significance.

4. A clinical trial is conducted comparing a new pain reliever for arthritis to a placebo. Participants are randomly assigned to receive the new treatment or a placebo, and the outcome is pain relief within 30 minutes. The data are shown in Table 7–8. Is there a significant difference in the proportions of patients reporting pain relief? Run the test at a 5% level of significance.

TABLE 7–6 Data for Practice Problems 1 and 2

                          Percent Wound Healing
Treatment                 0–25    26–50    51–75    76–100
New compound (n = 125)    15      37       32       41
Placebo (n = 125)         36      45       34       10

TABLE 7–7 Data for Practice Problem 3

Back    Side    Sitting    Standing
140     141     144        147
144     143     145        145
146     145     147        148
141     144     148        149
139     136     144        145


TABLE 7–8 Data for Practice Problem 4

                  Pain Relief    No Pain Relief
New medication    44             76
Placebo           21             99

5. A clinical trial is planned to compare an experimental medication designed to lower blood pressure to a placebo. Before starting the trial, a pilot study is conducted involving 7 participants. The objective of the study is to assess how systolic blood pressure changes over time untreated. Systolic blood pressures are measured at baseline and again 4 weeks later. Is there a statistically significant difference in blood pressures over time? Run the test at a 5% level of significance.

Baseline:   120 145 130 160 152 143 126

4 Weeks:    122 142 135 158 155 140 130

6. A hypertension trial is mounted and 12 participants are randomly assigned to receive either a new medication or a placebo. Each participant takes the assigned medication and their systolic blood pressure (SBP) is recorded after 6 months on the assigned medication. The data are shown in Table 7–9. Is there a difference in mean SBP between treatments? Run the appropriate test at \(\alpha = 0.05\).

TABLE 7–9 Data for Practice Problem 6

Placebo    New Medication
134        114
143        117
148        121
142        124
150        122
160        128




In Chapter 8 in the textbook, we presented various formulas to determine the sample size for statistical inference. In applications where the goal is to generate a confidence interval estimate for an unknown parameter, the sample size is computed to ensure that the margin of error is sufficiently small. In applications where the goal is to perform a test of hypothesis, the sample size is computed to ensure that the test has a high
102 Power and Sample Size Determination

FIGURE 8–1 Z value for 95% Confidence (standard normal curve with central area 0.95 and 0.025 in each tail; horizontal axis from -3 to 3)

FIGURE 8–2 Data to Estimate Sample Size

argument for the NORMSINV function is “(1-(1-C2)/2)”. This returns the value 1.96, which is the z value for a 95% confidence interval. The sample size is computed using the preceding formula implemented in Excel as “=(D2*B2/A2)^2”. The result is in cell E2 and is shown in Figure 8–3.

FIGURE 8–3 Determining the Sample Size Required

Recall that the sample size formula always produces the minimum number of subjects required to ensure that the confidence interval has a margin of error not exceeding E. To determine the number of subjects required for the study, we must round up. This is done using Excel’s ROUNDUP function. The ROUNDUP function is invoked as “ROUNDUP(number to round, number of decimal places)”. For sample size computations, we round the value produced by the formula up to the next integer (i.e., zero decimal places). The sample size required for the study is shown in Figure 8–4.

FIGURE 8–4 Sample Size Required for Study

To ensure that the 95% confidence interval estimate of the mean systolic
blood pressure in children between the ages of 3 and 5 years with congenital heart disease is within 5 units of the true mean, a sample of size 62 is needed.

Once the Excel formulas are programmed, other scenarios can be considered. For example, suppose we wish to consider other margins of error (e.g., E = 5, 4, 3, 2) and other standard deviations (e.g., 20 and 15). The sample sizes for these other scenarios are determined by copying the formulas from cells D2 through F2 to cells D3 through F9. The results are shown in Figure 8–5.

FIGURE 8–5 Sample Size Estimates for Various Scenarios

If the standard deviation is 20, then to ensure that a 95% confidence interval estimate of the mean systolic blood pressure in children between the ages of 3 and 5 with congenital heart disease is within 2 units of the true mean, a sample of size 385 is needed. If the standard deviation is 15, a sample of size 217 is needed. It is extremely important to accurately estimate the standard deviation, as it can dramatically affect the sample size.

8.2 SAMPLE SIZE ESTIMATES FOR CONFIDENCE INTERVALS WITH A DICHOTOMOUS OUTCOME IN ONE SAMPLE

In Chapter 8 of the textbook, we presented the following formula to determine the sample size required to estimate the proportion of successes in a dichotomous outcome variable in a single population:

$$n = p(1-p)\left(\frac{z}{E}\right)^2$$
where z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., z = 1.96 for 95%), E is the desired margin of error, and p is the proportion of successes in the population. If there is no information available to approximate p, then p = 0.5 can be used to generate the most conservative, or largest, sample size.

Example 8.2. In Example 8.3 of the textbook, we determined the sample size required to estimate the proportion of freshmen at a university who currently smoke (i.e., the prevalence of smoking). The investigator wanted to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke was within 5% of the true proportion. No information was available on the prevalence of smoking, thus p = 0.5 was used.

The margin of error (E = 0.05), proportion (p = 0.5), and confidence level are input into Excel as shown in Figure 8–6. The z value is computed using the NORMSINV function, as shown in Figure 8–6. Recall that the argument for the NORMSINV function is the area in the lower tail of the standard normal curve (Figure 8–1). The sample size is computed using the formula, implemented here in Excel as “=B2*(1-B2)*(D2/A2)^2”. The result is in cell E2 and is shown in Figure 8–7.

The final step is to round up to the next integer using the ROUNDUP function. The sample size required for the study is shown in Figure 8–8. To ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion, a sample of size 385 is needed.

FIGURE 8–6 Data to Estimate Sample Size

FIGURE 8–7 Determining the Sample Size Required

FIGURE 8–8 Sample Size Required for Study

8.3 SAMPLE SIZE ESTIMATES FOR CONFIDENCE INTERVALS WITH A CONTINUOUS OUTCOME IN TWO INDEPENDENT SAMPLES

In Chapter 8 of the textbook, we presented the following formula to determine the sample size required to estimate the difference in means in two independent populations:

$$n_i = 2\left(\frac{z\sigma}{E}\right)^2$$

where $n_i$ is the sample size required in each group (i = 1, 2), z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., z = 1.96 for 95%), and E is the desired margin of error. $\sigma$ again reflects the standard deviation of the outcome variable. Recall from Chapter 6 in the textbook that when we generated a confidence interval estimate for the difference in means, we used $S_p$, the pooled estimate of the common standard deviation, as a measure of variability in the outcome, where $S_p$ is computed as

$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

If data are available on variability of the outcome in each comparison group, then $S_p$ can be computed and used in the sample size formula. However, it is more often the case that data on the variability of the outcome are available from only one group, often the untreated (e.g., placebo/control) or unexposed group. This value can be used to determine the sample sizes.

Example 8.3. In Example 8.6 of the textbook, we determined the sample sizes required to compare two diet programs in obese children. The plan is to enroll children
and weigh them at the start of the study. Each child will then be randomly assigned to one of the competing diets (low-fat or low-carbohydrate) and followed for 8 weeks, at which time they will again be weighed. The number of pounds lost will be computed for each child. A 95% confidence interval will be estimated to quantify the difference in weight lost between the two diets, and the investigator would like the margin of error to be no more than 3 pounds. Based on adult studies, the common standard deviation was estimated at 8.1 pounds.

The margin of error, standard deviation, and confidence level are input into an Excel worksheet. The z value is computed using the NORMSINV function as shown in Figure 8–9. The sample size required per group is computed using the formula, implemented here in Excel as “=2*(D2*B2/A2)^2”. The result is in cell E2 and is shown in Figure 8–10. The final step is to round up to the next integer using the ROUNDUP function. The sample size required in each group for the study is shown in Figure 8–11.

FIGURE 8–9 Data to Estimate Sample Size

FIGURE 8–10 Determining the Sample Size per Group Required

FIGURE 8–11 Sample Size per Group Required for Study

Samples of size $n_1 = 57$ and $n_2 = 57$ will ensure that the 95% confidence interval for the difference in weight lost between diets will have a margin of error of no more than 3 pounds. (Note that in Chapter 8 of the textbook, we estimated the sample size at 56 per group because we carried only 2 decimal places in the by-hand computations. Excel carries more decimal places and therefore rounding up produces sample sizes of 57 per group.)

8.4 SAMPLE SIZE ESTIMATES FOR CONFIDENCE INTERVALS WITH A CONTINUOUS OUTCOME IN MATCHED SAMPLES

In Chapter 8 of the textbook, we presented the following formula to determine the sample size required to estimate the mean difference of a continuous outcome variable in two matched populations:

$$n = \left(\frac{z\sigma_d}{E}\right)^2$$
where z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., z = 1.96 for 95%), E is the desired margin of error, and $\sigma_d$ is the standard deviation of the difference scores (e.g., the difference based on measurements over time or the difference between matched pairs). It is extremely important that the standard deviation of the difference scores is used here to appropriately estimate the sample size.

Example 8.4. Consider again the diet study proposed in Example 8.3 of the Excel workbook (and in Example 8.7 in the textbook). The investigator considers an alternative design, a crossover trial, where each participant will follow each diet for 8 weeks. At the end of each 8-week period, the weight lost during that period will be measured. The difference in weight lost on the low-fat diet and the low-carbohydrate diet will be computed for each child and a confidence interval for the mean difference in weight lost will be computed. The investigator wants to determine the sample size required to ensure that a 95% confidence interval estimate of the mean difference in weight lost between diets is within 3 units of the true mean difference. Suppose that the standard deviation of the difference in weight loss between a low-fat diet and a low-carbohydrate diet is approximately 9.1 lbs. based on a crossover trial conducted in adults.

The margin of error, standard deviation of the differences in weights, and the confidence level are input into an Excel worksheet. The z value is computed using the NORMSINV function as shown in Figure 8–12. The sample size required is computed using the formula, implemented here in Excel as “=(D2*B2/A2)^2”. The result is in cell E2 and is shown in Figure 8–13.

The final step is to round up to the next integer using the ROUNDUP function. The sample size required for the study is shown in cell F2 in Figure 8–13. To ensure that the 95% confidence interval estimate of the mean difference in weight lost between diets is within 3 units of the true mean difference, a sample of size 36 children is needed.

FIGURE 8–12 Data to Estimate Sample Size

FIGURE 8–13 Sample Size Required for Study
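The confidence interval examples up to this point (the one-sample mean example and Examples 8.2 through 8.4) follow the same three worksheet steps: obtain z with NORMSINV, apply the sample size formula, and round up with ROUNDUP. As a cross-check outside Excel, here is a minimal Python sketch of those steps. It is not part of the workbook. The function names are illustrative, `statistics.NormalDist().inv_cdf` is assumed to play the role of NORMSINV, and `math.ceil` stands in for ROUNDUP with zero decimal places.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """z value for a two-sided interval; mirrors NORMSINV(1-(1-level)/2)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def n_mean_one_sample(sd, margin, level=0.95):
    """n = (z*sd/E)^2, rounded up to the next integer."""
    return math.ceil((z_for_confidence(level) * sd / margin) ** 2)


def n_proportion_one_sample(p, margin, level=0.95):
    """n = p(1-p)(z/E)^2, rounded up to the next integer."""
    return math.ceil(p * (1 - p) * (z_for_confidence(level) / margin) ** 2)


def n_means_two_independent(sd, margin, level=0.95):
    """n per group = 2(z*sd/E)^2, rounded up to the next integer."""
    return math.ceil(2 * (z_for_confidence(level) * sd / margin) ** 2)


# Scenarios from the text: standard deviation 20 or 15, margin of error 2.
print(n_mean_one_sample(sd=20, margin=2))        # 385
print(n_mean_one_sample(sd=15, margin=2))        # 217
# Example 8.2: p = 0.5, margin of error 0.05.
print(n_proportion_one_sample(p=0.5, margin=0.05))  # 385
# Example 8.3: standard deviation 8.1 pounds, margin of error 3 pounds.
print(n_means_two_independent(sd=8.1, margin=3))    # 57
# Example 8.4: standard deviation of the differences 9.1, margin of error 3.
print(n_mean_one_sample(sd=9.1, margin=3))       # 36
```

The matched-samples case reuses the one-sample function because its formula has the same form, with the standard deviation of the difference scores in place of the standard deviation of the outcome.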

8.5 SAMPLE SIZE ESTIMATES FOR CONFIDENCE INTERVALS WITH A DICHOTOMOUS OUTCOME IN TWO INDEPENDENT SAMPLES

In Chapter 8 of the textbook, we presented the following formula to determine the sample size required to estimate the difference in proportions between two independent populations (i.e., to estimate the risk difference):

$$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{z}{E}\right)^2$$

where $n_i$ is the sample size required in each group (i = 1, 2), z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., z = 1.96 for 95%), E is the desired margin of error, and $p_1$ and $p_2$ are the proportions of successes in each comparison group. Again, here we are planning a study to generate a 95% confidence interval for the difference in unknown proportions, and the formula to estimate the sample sizes needed requires $p_1$ and $p_2$. To estimate the sample size, we need approximate values of $p_1$ and $p_2$. The values of $p_1$ and $p_2$ that maximize the sample size are $p_1 = p_2 = 0.5$. Thus, if there is no information available to approximate $p_1$ and $p_2$, then 0.5 can be used to generate the most conservative, or largest, sample sizes.

Example 8.5. In Example 8.9 in the textbook, an investigator determined the sample size to estimate the impact of smoking on the incidence of prostate cancer. Men who are free of prostate cancer will be enrolled at age 50 and followed for 30 years. The plan is to enroll approximately equal numbers of smokers and nonsmokers in the study and to follow them prospectively for the outcome of interest, a diagnosis of prostate cancer. The plan is to generate a 95% confidence interval for the difference in proportions of smoking and nonsmoking men who develop prostate cancer. How many men should be enrolled in the study to ensure that the 95% confidence interval for the difference in proportions has a margin of error of no more than 5%? Estimates of the incidence of prostate cancer from a previous study were used to design the current study: $p_1 = 0.34$ and $p_2 = 0.17$.

The margin of error, estimates of proportions, and the confidence level are input into an Excel worksheet. The z value is computed using the NORMSINV function as shown in Figure 8–14. The sample size required per group is computed using the formula, implemented here in Excel as “=(B2*(1-B2)+C2*(1-C2))*(E2/A2)^2”. The result is in cell F2 and is shown in Figure 8–15.

The final step is to round up to the next integer using the ROUNDUP function. The sample size required in each group for the study is shown in Figure 8–16. Samples of size $n_1 = 562$ men who smoke and $n_2 = 562$ men who do not smoke will ensure that the 95% confidence interval for the difference in incidence of prostate cancer will have a margin of error of no more than 5%.

8.6 ISSUES IN ESTIMATING SAMPLE SIZE FOR HYPOTHESIS TESTING

In Chapter 8 of the textbook, we presented formulas to determine the sample size required to ensure a specified power in a test of hypothesis. Excel does not have an analysis tool to perform the computations, but the formulas can be programmed into Excel to determine the appropriate sample sizes. The sample size formulas for hypothesis testing depend on the nature of the outcome variable (e.g., continuous or dichotomous) and also the number of comparison groups involved (e.g., one, two independent, or two matched).

FIGURE 8–14 Data to Estimate Sample Size

FIGURE 8–15 Determining the Sample Size per Group Required

FIGURE 8–16 Sample Size per Group Required for Study
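The computation summarized in Figures 8–14 through 8–16 (Example 8.5) can be cross-checked outside Excel. The following is a minimal Python sketch rather than part of the workbook: the function name is hypothetical, `statistics.NormalDist().inv_cdf` is assumed to stand in for NORMSINV, and `math.ceil` stands in for ROUNDUP with zero decimal places.

```python
import math
from statistics import NormalDist


def n_proportions_two_independent(p1, p2, margin, level=0.95):
    """n per group = [p1(1-p1) + p2(1-p2)] * (z/E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # lower-tail area, as for NORMSINV
    variability = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variability * (z / margin) ** 2)


# Example 8.5: p1 = 0.34, p2 = 0.17, margin of error 0.05.
print(n_proportions_two_independent(0.34, 0.17, 0.05))  # 562
# Most conservative choice when nothing is known: p1 = p2 = 0.5.
print(n_proportions_two_independent(0.5, 0.5, 0.05))    # 769
```

The second call illustrates why using 0.5 for both proportions is called conservative: it yields a larger sample size per group than the estimates from the previous study.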

All of the sample size formulas contain the following two terms: $z_{1-\alpha/2}$ and $z_{1-\beta}$, where $\alpha$ is the probability of a Type I error or the specified level of significance (e.g., 0.05), $\beta$ is the probability of a Type II error, and $1-\beta$ is the specified power (e.g., 0.80, 0.90). $z_{1-\alpha/2}$ is the value from the standard normal distribution holding $1-\alpha/2$ below it and $z_{1-\beta}$ is the value from the standard normal distribution holding $1-\beta$ below it.

The NORMSINV function is used to compute these values. The NORMSINV function returns the value from the standard normal distribution, z, which holds a specified area below it (i.e., in the lower tail): “NORMSINV(lower-tail area)”. For example, if $\alpha = 0.05$, then $z_{1-\alpha/2} = z_{0.975}$ is computed by “=NORMSINV(0.975)”. If power = 0.80, then $z_{0.80}$ is computed by “=NORMSINV(0.80)”.

8.7 SAMPLE SIZE ESTIMATES FOR TESTS OF MEANS IN ONE SAMPLE

In Chapter 8 of the textbook, we presented a formula to determine the sample size required to ensure adequate power to test the following hypotheses about the mean of a continuous outcome variable in a single population:

$$H_0: \mu = \mu_0$$
$$H_1: \mu \neq \mu_0$$

where $\mu_0$ is the known mean (e.g., a historical control). The formula for determining sample size to ensure that the test has a specified power is given below:

$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$$

where $\alpha$ is the selected level of significance and $z_{1-\alpha/2}$ is the value from the standard normal distribution holding $1-\alpha/2$ below it. $1-\beta$ is the selected power and $z_{1-\beta}$ is the value from the standard normal distribution holding $1-\beta$ below it. ES is the effect size, defined as follows:

$$ES = \frac{|\mu_1 - \mu_0|}{\sigma}$$

where $\mu_1$ is the mean under the alternative hypothesis, $\mu_0$ is the mean under the null hypothesis, and $\sigma$ is the standard deviation of the outcome of interest.

Example 8.6. In Example 8.10 of the textbook, we determined the sample size required to test whether the mean blood glucose level in people who drink at least two cups of coffee per day is different from the reported mean of 95 mg/dl (the standard deviation was 9.8 mg/dl). Investigators wanted a sample size that would ensure 80% power to detect a mean of 100 mg/dl. A two-sided test is planned with a 5% level of significance.

Before we compute the sample size, we first must compute the effect size. This is done by entering the mean under the null hypothesis, the mean under the alternative hypothesis, and the standard deviation into an Excel worksheet, as shown in Figure 8–17.

FIGURE 8–17 Data to Estimate Sample Size

The effect size is shown in cell B7 and is computed as “=ABS(B3-B1)/B5”, where ABS is the Excel function that returns the absolute value, here of the difference in means under the null and alternative hypotheses. The next step is to compute the z value for the selected level of significance (i.e., $z_{1-\alpha/2}$) and the z value for the desired power (i.e., $z_{1-\beta}$). We first enter the level of significance, $\alpha$, and the desired power. This is shown in Figure 8–18.

Recall that the argument for the NORMSINV function is the area in the lower tail of the standard normal curve (Figure 8–1). If a two-sided test is planned (which is generally the case for sample size planning) with a 5% level of significance, the area in the lower tail is defined as $1-\alpha/2$. Thus, we specify 1-B9/2 as the argument to the NORMSINV function as shown in Figure 8–18. $z_{1-\beta}$ is determined in the same way using “=NORMSINV(B11)”. The computations are shown in Figure 8–19. The next step is to compute the sample size based on the effect size and the appropriate z values for the selected $\alpha$ and power. This is shown in Figure 8–20.

FIGURE 8–18 Level of Significance and Power

FIGURE 8–19 Computing z Values
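The two z values used in every hypothesis-testing sample size formula can be reproduced outside Excel. Below is a minimal Python sketch, assuming that `statistics.NormalDist().inv_cdf` behaves like NORMSINV (it takes the lower-tail area and returns the standard normal value). The wrapper name `norm_s_inv` is hypothetical and is used only to make the parallel with the worksheet visible.

```python
from statistics import NormalDist


def norm_s_inv(lower_tail_area):
    """Standard normal value holding the given area below it (NORMSINV analogue)."""
    return NormalDist().inv_cdf(lower_tail_area)


alpha = 0.05   # level of significance for a two-sided test
power = 0.80   # desired power, 1 - beta

z_alpha = norm_s_inv(1 - alpha / 2)  # worksheet analogue: NORMSINV(1-B9/2)
z_power = norm_s_inv(power)          # worksheet analogue: NORMSINV(B11)

print(round(z_alpha, 4), round(z_power, 4))  # 1.96 0.8416
```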

The sample size formula produces the minimum number of subjects required to ensure that the test has the specified power to detect the desired effect size at the specified level of significance. To determine the number of subjects required for the study, we must therefore round up. This is done using Excel’s ROUNDUP function. The sample size required for the study is shown in Figure 8–21. A sample of size n = 31 will ensure that a two-sided test with $\alpha = 0.05$ has 80% power to detect a 5-mg/dl difference in mean fasting blood glucose levels.

FIGURE 8–20 Determining the Sample Size Required

FIGURE 8–21 Sample Size Required for Study
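The full chain for Example 8.6 (effect size, z values, sample size formula, rounding up) can be sketched in a few lines of Python as a cross-check on the worksheet. This is an illustrative sketch, not the workbook's method: the function name is hypothetical, `statistics.NormalDist().inv_cdf` is assumed to stand in for NORMSINV, and `math.ceil` stands in for ROUNDUP with zero decimal places.

```python
import math
from statistics import NormalDist


def n_test_one_sample(effect_size, alpha=0.05, power=0.80):
    """n = ((z_{1-alpha/2} + z_{1-beta}) / ES)^2, rounded up to the next integer."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


# Example 8.6: null mean 95 mg/dl, alternative mean 100 mg/dl, sd 9.8 mg/dl.
effect_size = abs(100 - 95) / 9.8   # worksheet analogue: ABS(B3-B1)/B5
print(round(effect_size, 4))            # 0.5102
print(n_test_one_sample(effect_size))   # 31
```

Because the effect size sits in the denominator and is squared, halving the detectable difference roughly quadruples the required sample size.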

8.8 SAMPLE SIZE ESTIMATES FOR TESTS OF PROPORTIONS IN ONE SAMPLE

In Chapter 8 of the textbook, we presented a formula to determine the sample size required to ensure adequate power to test the following hypotheses about the proportion of successes in a dichotomous outcome variable in a single population:

$$H_0: p = p_0$$
$$H_1: p \neq p_0$$

where $p_0$ is the known proportion (e.g., a historical control). The formula for determining sample size to ensure that the test has a specified power is given below:

$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$$

where $\alpha$ is the selected level of significance and $z_{1-\alpha/2}$ is the value from the standard normal distribution holding $1-\alpha/2$ below it. $1-\beta$ is the selected power and $z_{1-\beta}$ is the value from the standard normal distribution holding $1-\beta$ below it. ES is the effect size, defined as:

$$ES = \frac{|p_1 - p_0|}{\sqrt{p_0(1-p_0)}}$$

where $p_0$ is the proportion of successes under $H_0$ and $p_1$ is the proportion of successes under $H_1$. The numerator of the effect size, the absolute value of the difference in proportions $|p_1 - p_0|$, again represents what is considered a clinically meaningful or practically important difference in proportions.

Example 8.7. In Example 8.13 in the textbook, we determined the sample size required to test whether the percentage of defective stents produced by a manufacturer in one of his plants was more than 10%. The manufacturer wanted the test to have 90% power to detect an absolute difference in proportions of 0.05 (i.e., from 0.10 to 0.15 defectives). How many stents must be evaluated? A two-sided test will be used with a 5% level of significance.

Before we compute the sample size, we first must compute the effect size. This is done by entering the proportion under the null hypothesis and the proportion under the alternative hypothesis into an Excel worksheet, as shown in Figure 8–22.

FIGURE 8–22 Data to Estimate Sample Size

The effect size is shown in cell B5 and is computed as “=ABS(B3-B1)/SQRT(B1*(1-B1))”, where ABS is the Excel function that returns the absolute value, here of the difference in proportions under the null and alternative hypotheses. The next step is to compute the z value for the selected level of significance (i.e., $z_{1-\alpha/2}$) and the z value for the desired power (i.e., $z_{1-\beta}$). We first enter the level of significance, $\alpha$, and the desired power. We then use the NORMSINV function twice to compute $z_{1-\alpha/2}$ and $z_{1-\beta}$. This is shown in Figure 8–23.

FIGURE 8–23 Computing z Values

The next step is to compute the sample size based on the effect size and the appropriate z values for the selected $\alpha$ and power. This is shown in Figure 8–24. As the final step, we round up to the next integer using the ROUNDUP function. The sample size for the study is shown in Figure 8–25.

FIGURE 8–24 Determining the Sample Size Required

A sample of size n = 379 stents will ensure that a two-sided test with $\alpha = 0.05$ has 90% power to detect an absolute difference of 0.05 in the proportion of defective stents produced. (When we computed the sample size by hand in the textbook, we determined that n = 364 stents were needed. The difference arises because Excel carries more decimal places in the computations.)

8.9 SAMPLE SIZE ESTIMATES FOR TESTS OF DIFFERENCES IN MEANS IN TWO INDEPENDENT SAMPLES

In Chapter 8 of the textbook, we presented a formula to determine the sample size required to ensure adequate power to test the following hypotheses about the difference in means in two independent populations:

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \neq \mu_2$$

where $\mu_1$ and $\mu_2$ are the means in the two comparison populations. The formula for determining sample size required in each group to ensure that the test has a specified power follows:

$$n_i = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$$
where $n_i$ is the sample size required in each group (i = 1, 2), $\alpha$ is the selected level of significance, $z_{1-\alpha/2}$ is the value from the standard normal distribution holding $1-\alpha/2$ below it, $1-\beta$ is the selected power, and $z_{1-\beta}$ is the value from the standard normal distribution holding $1-\beta$ below it. ES is the effect size, defined as follows:

$$ES = \frac{|\mu_1 - \mu_2|}{\sigma}$$

where $|\mu_1 - \mu_2|$ is the absolute value of the difference in means between the two groups representing what is considered a clinically meaningful or practically important difference in means. $\sigma$ is the standard deviation of the outcome of interest. If data
are available on variability of the outcome in each comparison group, then $S_p$ (the pooled estimate of the common standard deviation) can be computed and used to generate the sample sizes. However, it is more often the case that data on the variability of the outcome are available from only one group, usually the untreated (e.g., placebo/control) or unexposed group.

Example 8.8. In Example 8.14 in the textbook, we determined the sample sizes required for a clinical trial to evaluate the efficacy of a new drug designed to reduce systolic blood pressure. The plan was to enroll participants and to randomly assign them to receive either the new drug or a placebo and to measure systolic blood pressure in each participant after 12 weeks on the assigned treatment. Investigators indicated that a 5-unit difference in mean systolic blood pressure would represent a clinically meaningful difference. How many patients should be enrolled in the trial to ensure that the power of the test is 80% to detect this difference? A two-sided test is planned with a 5% level of significance and the standard deviation is assumed to be 19.0, based on data from the Framingham Heart Study.

We first compute the effect size based on the hypothesized difference in means under the alternative hypothesis and the standard deviation. The data are entered into an Excel worksheet as shown in Figure 8–26. The effect size is shown in cell B5 and is computed as “=ABS(B1)/B3”. Notice that the hypothesized difference in means is specified in Figure 8–26. In some applications, the means under the null and alternative hypotheses are specified, in which case the difference is computed and used as the numerator in the computation of the effect size. We next enter the level of significance, $\alpha$, and the desired power to compute $z_{1-\alpha/2}$ and $z_{1-\beta}$. This is shown in Figure 8–27. The next step is to compute the sample size per group based on the effect size and the appropriate z values for the selected $\alpha$ and power. This is shown in Figure 8–28.

FIGURE 8–25 Sample Size Required for Study

FIGURE 8–26 Data to Estimate Sample Size

FIGURE 8–27 Computing z Values

FIGURE 8–28 Determining the Sample Size Required per Group

Finally, the sample size formula produces the minimum number of subjects per group required to ensure that the test has the specified power to detect the desired effect size at the specified level of significance. To determine the numbers of subjects per group required for the study, we must therefore round up. This is done using Excel’s ROUNDUP function. The sample sizes required per group are shown in Figure 8–29.

Samples of size $n_1 = 227$ and $n_2 = 227$ will ensure that the test of hypothesis will have 80% power to detect a 5-unit difference in mean systolic blood pressures in patients receiving the new drug as compared to patients receiving the placebo.

FIGURE 8–29 Sample Size Required per Group for Study
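Examples 8.7 and 8.8 use the same pattern with different effect sizes, and the two-sample case doubles the result to obtain the number per group. The following minimal Python sketch reproduces both results as a cross-check on the worksheets. It is not part of the workbook: the function names are hypothetical, `statistics.NormalDist().inv_cdf` is assumed to stand in for NORMSINV, and `math.ceil` stands in for ROUNDUP with zero decimal places.

```python
import math
from statistics import NormalDist


def z_terms(alpha, power):
    """Return z_{1-alpha/2} + z_{1-beta} for a two-sided test."""
    standard_normal = NormalDist()
    return standard_normal.inv_cdf(1 - alpha / 2) + standard_normal.inv_cdf(power)


def n_test_proportion_one_sample(p0, p1, alpha=0.05, power=0.80):
    """One-sample test of a proportion: ES = |p1 - p0| / sqrt(p0(1 - p0))."""
    effect_size = abs(p1 - p0) / math.sqrt(p0 * (1 - p0))
    return math.ceil((z_terms(alpha, power) / effect_size) ** 2)


def n_test_means_two_independent(difference, sd, alpha=0.05, power=0.80):
    """Two independent samples: n per group = 2 * ((z terms) / ES)^2, ES = |difference| / sd."""
    effect_size = abs(difference) / sd
    return math.ceil(2 * (z_terms(alpha, power) / effect_size) ** 2)


# Example 8.7: p0 = 0.10, p1 = 0.15, 90% power, 5% level of significance.
print(n_test_proportion_one_sample(0.10, 0.15, power=0.90))  # 379
# Example 8.8: 5-unit difference, sd 19.0, 80% power, 5% level of significance.
print(n_test_means_two_independent(5, 19.0))                 # 227
```

Carrying full precision throughout, as Excel does, is what yields 379 rather than the hand-computed 364 reported for Example 8.7.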

8.10 SAMPLE SIZE ESTIMATES FOR TESTS OF MEAN DIFFERENCES IN MATCHED SAMPLES

In Chapter 8 of the textbook, we presented a formula to determine the sample size required to ensure adequate power to test the following hypotheses about the mean difference in a continuous outcome based on matched populations:

$$H_0: \mu_d = 0$$
$$H_1: \mu_d \neq 0$$

where $\mu_d$ is the mean difference in the population. The formula for determining the sample size (i.e., number of participants, each of whom will be measured twice) required to ensure that the test has a specified power is as follows:

$$n = \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{ES}\right)^2$$

where $\alpha$ is the selected level of significance, $z_{1-\alpha/2}$ is the value from the standard normal distribution holding $1-\alpha/2$ below it, $1-\beta$ is the selected power, and $z_{1-\beta}$ is the value from the standard normal distribution holding $1-\beta$ below it. ES is the effect size, defined as follows:

$$ES = \frac{\mu_d}{\sigma_d}$$

where $\mu_d$ is the mean difference expected under the alternative hypothesis, $H_1$, and $\sigma_d$ is the standard deviation of the difference in the outcome (e.g., the difference based on measurements over time or the difference between matched pairs).

Example 8.9. In Example 8.16 of the textbook, we generated sample size requirements for a crossover trial to compare two diet programs for their effectiveness in promoting weight loss. The proposed study will have each child follow each diet for 8 weeks, and at the end of each 8-week period, the weight lost during that period will be measured. The difference in weight lost between the diets will be computed for each child and the plan is to test if there is a statistically significant difference in weight loss between the diets. How many children are required to ensure that a two-sided test with a 5% level of significance has 80% power to detect a mean difference of 3 pounds in weight lost between the two diets? Based on a previous study, the standard deviation in the differences in weight loss is estimated at 9.1 pounds.

We first compute the effect size based on the hypothesized mean difference between weight-loss programs and the standard deviation of the differences in weight loss. The data are entered into an Excel worksheet as shown in Figure 8–30. The effect size is shown in cell B5 and is computed as “=B1/B3”. We next enter the level of significance, $\alpha$, and the
© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


95313_Ch08_101_124.qxd 3/23/11 3:42 PM Page 118

118 Power and Sample Size Determination

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
FIGURE 8–30 Data to Estimate
Sample Size

FIGURE 8–31 Computing z Values

desired power to compute z1/2 and z1. This is shown in size at the specified level of significance, to determine
Figure 8–31. The next step is to compute the sample size based the numbers of subjects required for the study, we
on the effect size and the appropriate z values for the selected must round up. This is done using Excel’s ROUNDUP func-
 and power. This is shown in Figure 8–32. tion. The sample sizes required per group are shown in
Finally, because the sample size formula always produces Figure 8–33.
the minimum number of subjects required to ensure that A sample of size n  73 children will ensure that a two-
the test has the specified power to detect the desired effect sided test with   0.05 has 80% power to detect a mean dif-

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


95313_Ch08_101_124.qxd 3/23/11 3:42 PM Page 119

Sample Size Estimates for Tests of Proportions in Two Independent Samples 119

FIGURE 8–32 Determining the Sample Size Required

FIGURE 8–33 Sample Size Required for Study

ference of 3 pounds between diets using a crossover trial (i.e., test the following hypotheses about the difference in propor-
each child will be measured on each diet). tions in two independent populations:

H0: p1  p2
8.11 SAMPLE SIZE ESTIMATES FOR TESTS OF
H1: p1  p2
PROPORTIONS IN TWO INDEPENDENT SAMPLES
In Chapter 8 of the textbook, we presented a formula to de- where p1 and p2 are the proportions in the two comparison
termine the sample size required to ensure adequate power to populations. The formula for determining sample size

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


120 Power and Sample Size Determination

required in each group to ensure that the test has a specified


power is as follows: FIGURE 8–34 Sample Proportions
z1/2  z1
 
2
ni  2  
ES

where ni is the sample size required in each group (i  1,2), 


is the selected level of significance, z1/2 is the value from the
standard normal distribution holding 1  /2 below it, 1  
is the selected power, and z1 is the value from the standard
normal distribution holding 1   below it. ES is the effect
size, defined as follows:
|p1p2|
ES   
p(1p)

where |p1  p2| is the absolute value of the difference in pro-


portions between the two groups expected under the alterna- FIGURE 8–35 Computing Overall
tive hypothesis, H1, and p is the overall proportion, based on Proportions
pooling the data from the two comparison groups. (p can be
computed by taking the mean of the proportions in the two
comparison groups, assuming that the groups will be of ap-
proximately equal size.)
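
The two sample-size formulas above can also be cross-checked outside Excel. The following is a minimal sketch in Python, not part of the workbook: the function names `z_sum`, `n_matched`, and `n_per_group_proportions` are hypothetical, and the standard-library `statistics.NormalDist().inv_cdf` is assumed to stand in for the z values that the worksheet computes.

```python
import math
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Return z_{1-alpha/2} + z_{1-beta} for a two-sided test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def n_matched(mean_diff, sd_diff, alpha=0.05, power=0.80):
    """Participants needed for a test of the mean difference in matched samples."""
    effect_size = abs(mean_diff) / sd_diff          # ES = mu_d / sigma_d
    n = (z_sum(alpha, power) / effect_size) ** 2
    return math.ceil(n)                             # always round up


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a test of two independent proportions."""
    p = (p1 + p2) / 2                               # overall proportion, equal group sizes assumed
    effect_size = abs(p1 - p2) / math.sqrt(p * (1 - p))
    n = 2 * (z_sum(alpha, power) / effect_size) ** 2
    return math.ceil(n)


# Example 8.9: mean difference of 3 pounds, standard deviation of 9.1 pounds.
print(n_matched(3, 9.1))  # prints 73
```

Rounding up with `math.ceil` plays the role that ROUNDUP plays in the worksheet. With the inputs of Example 8.9 the sketch returns 73 children.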
Example 8.10. In Example 8.18 of the textbook, we determined the sample size needed for a clinical trial proposed to evaluate the efficacy of a new drug designed to reduce systolic blood pressure. The primary outcome is diagnosis of hypertension (true/false), defined as a systolic blood pressure above 140 or a diastolic blood pressure above 90. In planning the trial, investigators hypothesized that 30% of the participants would meet the criteria for hypertension in the placebo group and that the new drug would be considered efficacious if there was a 20% reduction in the proportion of patients receiving the new drug who meet the criteria for hypertension (i.e., if the proportion is 24% among patients receiving the new drug). How many patients should be enrolled in the trial to ensure that the power of the test is 80% to detect this difference in the proportions of patients with hypertension? A two-sided test will be used with a 5% level of significance.

We first compute the effect size based on the hypothesized difference in proportions. The proportion expected in the placebo group is entered into cell B1, and the proportion expected in the treatment group is computed in cell B3 as a 20% reduction using “=B1*(1-0.2)”. The data in the Excel worksheet are shown in Figure 8–34.

Before computing the effect size, we need to compute the overall proportion. This is done by taking the mean of the proportions in the two treatment groups using “=(B1+B3)/2”. The computation is shown in Figure 8–35. In Figure 8–36, we compute the effect size.
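
The worksheet steps of Example 8.10 can be traced in a few lines of Python as a check on the Excel formulas. This is a sketch under stated assumptions rather than the workbook's method: the variable names are hypothetical, the comments map them to the cells named in the text, and `statistics.NormalDist().inv_cdf` is assumed to supply the z values. The last lines carry the computation through to the per-group sample size.

```python
import math
from statistics import NormalDist

p_placebo = 0.30                          # cell B1: proportion expected on placebo
p_drug = p_placebo * (1 - 0.2)            # cell B3: 20% reduction, i.e. 0.24
p_overall = (p_placebo + p_drug) / 2      # mean of the two proportions

# ES = |p1 - p2| / sqrt(p(1 - p))
effect_size = abs(p_placebo - p_drug) / math.sqrt(p_overall * (1 - p_overall))

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - 0.05 / 2)   # two-sided test, alpha = 0.05
z_power = std_normal.inv_cdf(0.80)           # power = 80%

# n_i = 2 * ((z_{1-alpha/2} + z_{1-beta}) / ES)^2, rounded up
n_per_group = math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(round(effect_size, 4), n_per_group)
```

The result, 860 per group, agrees with the sample sizes reported at the end of the example.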

We next enter the level of significance, $\alpha$, and the desired power to compute $z_{1-\alpha/2}$ and $z_{1-\beta}$. This is shown in Figure 8–37. The next step is to compute the sample size per group based on the effect size and the appropriate z values for the selected $\alpha$ and power. This is shown in Figure 8–38.

FIGURE 8–37 Computing z Values

FIGURE 8–38 Determining the Sample Size Required per Group

Finally, because the sample size formula always produces the minimum number of subjects per group required to ensure that the test has the specified power to detect the desired effect size at the specified level of significance, to determine the numbers of subjects per group required for the study, we must round up. This is done using Excel’s ROUNDUP function. The sample sizes required per group are shown in Figure 8–39.

FIGURE 8–39 Sample Size Required per Group for Study

Samples of size $n_1 = 860$ patients on the new drug and $n_2 = 860$ patients on placebo will ensure that the test of hypothesis will have 80% power to detect a 20% reduction in the proportions of patients who meet the criteria for hypertension.

Once the Excel formulas are programmed to compute the sample sizes required to ensure a specified power in a test of hypothesis, other scenarios can be considered easily by changing the inputs (e.g., $\alpha$, the desired power, the difference in the parameter reflecting a clinically meaningful change, or the standard deviation).

8.12 PRACTICE PROBLEMS
1. We want to design a new placebo-controlled trial to evaluate an experimental medication to increase lung capacity. The primary outcome is peak expiratory flow rate, a continuous variable measured in liters per minute. The primary outcome will be measured after 6 months on treatment. The mean peak expiratory flow rate in adults is 300 with a standard deviation of 50. How many subjects should be enrolled to ensure 80% power to detect a difference of 15 liters per minute with a two-sided test and $\alpha = 0.05$?
2. An investigator wants to estimate caffeine consumption in high school students. How many students would be required to ensure that a 95% confidence interval estimate for the mean caffeine intake (measured in mg) is within 15 mg of the true mean? Assume that the standard deviation in caffeine intake is 68 mg.
3. Consider the study proposed in Problem 2. How many students would be required to estimate the proportion of students who consume coffee? Suppose we want the estimate to be within 5% of the true proportion with 95% confidence.
4. A clinical trial was conducted comparing a new compound designed to improve wound healing in trauma patients to a placebo. After treatment for 5 days, 58% of the patients taking the new compound had a substantial reduction in the size of their wound as compared to 44% in the placebo group. The trial failed to show significance. How many subjects would be required to detect the difference in proportions observed in the trial with 80% power? A two-sided test is planned at $\alpha = 0.05$.
5. A crossover trial is planned to evaluate the impact of an educational intervention program to reduce alcohol consumption in patients determined to be at risk for alcohol problems. The plan is to measure alcohol consumption (the number of drinks on a typical drinking day) before the intervention and then again after participants complete the educational intervention program. How many participants would be required to ensure that a 95% confidence interval for the mean difference in the number of drinks is within two drinks of the true mean difference? Assume that the standard deviation of the difference in the mean number of drinks is 6.7 drinks.
6. An investigator wants to design a study to estimate the difference in the proportions of men and women who develop early-onset cardiovascular disease (defined as cardiovascular disease before age 50). A study conducted 10 years ago found that 15% and 8% of men and women, respectively, developed early-onset cardiovascular disease. How many men and women are needed to generate a 95% confidence interval estimate for the difference in proportions with a margin of error not exceeding 4%?
7. The mean body mass index (BMI) for boys age 12 is 23.6. An investigator wants to test if the BMI is higher in boys age 12 living in New York City. How many boys are needed to ensure that a two-sided test of hypothesis has 80% power to detect a difference of 2 units in BMI? Assume that the standard deviation in BMI is 5.7.
8. An investigator wants to design a study to estimate the difference in the mean BMI between 12-year-old boys and girls living in New York City. How many boys and girls are needed to ensure that a 95% confidence interval estimate for the difference in mean BMI between boys and girls has a margin of error not exceeding 2 units? Use the estimate of the variability in BMI from Problem 7.


CHAPTER 9
Regression Analysis

In Chapter 9 of the textbook, we introduced regression analysis. We noted that regression analysis is a very general and widely applied technique. In the textbook, we focused more on the use of regression analysis to assess confounding and effect modification. We limit our focus here to estimating simple linear and multiple linear regression models using the linear regression tool in the Data Analysis ToolPak.

We use data collected from n = 40 randomly selected participants of the Sixth Examination of the Framingham Offspring Study to illustrate regression analysis using Microsoft Office® Excel®. The data are shown in Table 9–1 and include the participant’s age (in years), gender (which is coded 1 for males and 0 for females), body mass index (BMI), systolic and diastolic blood pressures, total cholesterol, HDL cholesterol, diabetes (coded 1 for participants diagnosed with diabetes and 0 otherwise), and current smoking status (coded 1 for current smokers and 0 otherwise).

9.1 SIMPLE LINEAR REGRESSION ANALYSIS
In Chapter 9 of the textbook, we introduced simple linear regression analysis as a technique for estimating the equation that best describes the linear association between a continuous dependent or outcome variable, y, and a single independent or predictor variable, x. The independent variable can be continuous or dichotomous (sometimes called an indicator variable). The regression equation is as follows:

$$\hat{y} = b_0 + b_1 x$$

where $\hat{y}$ is the predicted or expected value of the dependent or outcome variable, $b_0$ is the estimated y-intercept, and $b_1$ is the estimated slope. Excel has an analysis tool that can be used to estimate the y-intercept and slope.

Example 9.1. Suppose we wish to estimate the equation of the line that best describes the relationship between systolic blood pressure (SBP) and age. The data from Table 9–1 are entered into an Excel worksheet as shown in Figure 9–1. Note that the Excel worksheet contains n = 40 observations; only the first 20 are shown in Figure 9–1.

To estimate the simple linear regression equation, we use the Tools/Data Analysis menu option. We select the Regression Analysis tool as shown in Figure 9–2 and click OK. Excel then requests specification of the variables for analysis in the dialog box shown in Figure 9–3.

We first specify the dependent or outcome variable (y). In our example, the dependent variable is systolic blood pressure, which is contained in cell D1 through cell D41. We then specify the independent variable (x), which in this example is age. The age data are contained in cell A1 through cell A41. Because we include the first row of labels (A1 and D1), we click on the Labels box to indicate that the labels are contained in these cells. We then specify a location for the results of the regression analysis. For this example, we request that Excel place the results in a new worksheet entitled Simple regression. Excel offers a number of additional details, such as analysis of residuals and normal probability plots. These are used to examine the fit of the regression equation, and are called regression diagnostics. (We introduced only the basic applications of regression analysis in the textbook, and we restrict our attention to the same in the Excel applications.)

The results of the regression analysis are shown in Figure 9–4.


TABLE 9–1 Data from n = 40 Randomly Selected Participants of the Sixth Examination of the Framingham Offspring Study
Age Male BMI SBP DBP Total Cholesterol HDL Diabetes Smoke
48.2683 1 27.92 140 88 184 35 1 1
47.3347 1 32.61 118 77 178 48 0 1
47.1129 1 34.83 112 69 177 33 0 0
49.0541 1 28.76 128 84 246 54 0 0
45.9548 1 26.76 121 85 193 43 0 0
54.5243 1 27.01 126 77 182 40 0 0
56.2409 1 28.76 124 77 246 50 0 1
52.1068 1 24 131 80 167 40 1 0
56.011 1 30.37 129 81 176 39 1 0
58.2012 1 27.88 121 85 210 45 0 1
51.1129 1 19.67 93 59 174 63 0 0
53.1444 1 25.45 111 79 180 58 0 0
68.8241 1 23.1 151 75 192 31 1 0
66.8611 1 27.44 132 76 180 50 0 1
66.8446 1 29.03 137 56 129 39 0 0
62.152 1 27.25 144 82 216 57 0 0
69.2293 1 24.68 109 75 184 64 0 0
64.3723 1 34.44 133 77 271 50 1 0
61.2567 1 22.86 104 68 198 51 0 0
66.9624 1 27.84 122 60 180 33 1 0
71.7454 1 27.7 137 81 198 44 1 0
71.0089 1 31.04 136 75 213 62 0 0
77.4456 1 34.06 110 57 181 45 0 0
34.6557 0 21.8 99 60 178 33 0 0
59.0773 0 23.59 124 76 212 47 0 0
45.7659 0 22.39 118 77 258 56 0 0
55.9808 0 26.18 110 66 263 50 0 0
47.5729 0 24.86 103 66 183 47 0 0
59.4798 0 32.89 123 85 203 40 0 0
58.3381 0 24.47 118 61 230 81 0 0
50.0589 0 21.98 110 68 168 72 0 1
52.6845 0 25.12 105 67 201 61 0 0
51.7016 0 39.93 131 80 197 43 0 0
58.8255 0 38.14 107 69 224 29 0 0
64.5859 0 25.86 138 68 205 53 0 0
67.0418 0 30.95 135 72 210 36 0 0
62.642 0 31.99 123 65 209 70 0 0
71.8248 0 19.03 103 50 206 63 0 0
76.6899 0 21.8 137 85 176 74 0 0
73.9932 0 33.07 135 80 254 57 0 0

Excel produces a number of statistics and analyses in its standard regression analysis. We again focus only on the analyses discussed in the textbook. Specifically, the estimates of the regression coefficients are in the last section of the results under the column headed Coefficients. The estimate of the y-intercept is $b_0 = 89.40$ and the estimate of the slope is $b_1 = 0.56$. (Notice that the slope is the coefficient associated with age.) The regression equation relating age to systolic blood pressure is:

$$\hat{y} = 89.40 + 0.56(\text{Age})$$

where $\hat{y}$ is the predicted or expected SBP. Excel also provides standard errors of the regression coefficients, t statistics, and p-values to test whether the regression coefficients are statistically significantly different from zero. Usually, we are not interested in whether the intercept is significantly different from zero. However, it is of interest to test whether the slope in the population is significantly different from zero. Specifically, we test $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. Excel provides a p-value of 0.0107, indicating that there is a statistically significant association between age and systolic blood pressure. The regression equation indicates that each additional year of age is associated with a 0.56-unit increase in systolic blood pressure. (The other analyses that Excel generates are useful and interested readers should see some of the references at the end of Chapter 9 in the textbook for more details.)

FIGURE 9–1 Data for Regression Analysis

FIGURE 9–2 Regression Analysis Tool

FIGURE 9–3 Specification of Variables for Simple Linear Regression Analysis

FIGURE 9–4 Results of Simple Linear Regression Analysis

Example 9.2. Suppose we wish to assess whether there is an association between systolic blood pressure and current smoking status using the data in Table 9–1, which were entered into an Excel worksheet as shown in Figure 9–1. We again use the Tools/Data Analysis menu option and select the Regression Analysis tool. When we click OK, Excel requests specification of the variables for analysis in the dialog box shown in Figure 9–5.

FIGURE 9–5 Specification of Variables for Simple Linear Regression Analysis

We again specify the dependent or outcome variable (y). In this example, the outcome is systolic blood pressure, which is contained in cell D1 through cell D41. We then specify the independent variable (x), which in this example is smoking status. The smoking data are contained in cell I1 through cell I41. Because we included the first row of labels (I1 and D1), we click on the Labels box to indicate that the labels are contained in these cells. We then specify a location for the results of the regression analysis. For this example, we request that Excel place the results in a new worksheet entitled SBP and smoking. The results of the regression analysis are shown in Figure 9–6.

The estimate of the y-intercept is $b_0 = 121.85$ and the estimate of the slope is $b_1 = 2.31$. The regression equation relating current smoking status to systolic blood pressure is:

$$\hat{y} = 121.85 + 2.31(\text{Current Smoking Status})$$

where $\hat{y}$ is the predicted or expected SBP. The p-value for the test of significance for the slope is p = 0.7098, indicating that there is no statistically significant association between current smoking status and systolic blood pressure in the population. The regression equation indicates that smokers have higher systolic blood pressures by approximately 2.31 units, as compared to nonsmokers. However, this difference is not statistically significantly different from zero (because p = 0.7098).
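
The coefficients in Examples 9.1 and 9.2 can be reproduced from Table 9–1 with the closed-form least-squares estimates, $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below is an illustration in Python, not the Excel procedure: `fit_line` is a hypothetical helper, and the three lists are the Age, SBP, and Smoke columns of Table 9–1 typed in row order.

```python
from statistics import mean

# Age, SBP, and Smoke columns of Table 9-1, in row order (n = 40).
age = [48.2683, 47.3347, 47.1129, 49.0541, 45.9548, 54.5243, 56.2409, 52.1068,
       56.011, 58.2012, 51.1129, 53.1444, 68.8241, 66.8611, 66.8446, 62.152,
       69.2293, 64.3723, 61.2567, 66.9624, 71.7454, 71.0089, 77.4456, 34.6557,
       59.0773, 45.7659, 55.9808, 47.5729, 59.4798, 58.3381, 50.0589, 52.6845,
       51.7016, 58.8255, 64.5859, 67.0418, 62.642, 71.8248, 76.6899, 73.9932]
sbp = [140, 118, 112, 128, 121, 126, 124, 131, 129, 121,
       93, 111, 151, 132, 137, 144, 109, 133, 104, 122,
       137, 136, 110, 99, 124, 118, 110, 103, 123, 118,
       110, 105, 131, 107, 138, 135, 123, 103, 137, 135]
smoke = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
         0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         1, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def fit_line(x, y):
    """Least-squares intercept and slope for y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_age, b1_age = fit_line(age, sbp)
b0_smoke, b1_smoke = fit_line(smoke, sbp)
print(f"SBP = {b0_age:.2f} + {b1_age:.2f}(Age)")
print(f"SBP = {b0_smoke:.2f} + {b1_smoke:.2f}(Current Smoking Status)")
```

Printed to two decimals, the fitted equations should agree with the coefficients reported above. Because smoking status is coded 0/1, the intercept of the second fit is the mean SBP among nonsmokers and the slope is the difference in mean SBP between smokers and nonsmokers.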



9.2 MULTIPLE LINEAR REGRESSION ANALYSIS
In Chapter 9 of the textbook, we introduced multiple linear regression analysis as a technique for estimating the equation that best describes the association between a continuous outcome variable y and a set of independent variables, $x_1, x_2, \ldots, x_p$. The independent variables can be continuous or dichotomous. The regression equation is as follows:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

where $\hat{y}$ is the predicted or expected value of the dependent variable, $x_1$ through $x_p$ are p distinct independent or predictor variables, $b_0$ is the predicted value of y when all of the independent variables ($x_1$ through $x_p$) are equal to zero, and $b_1$ through $b_p$ are the estimated regression coefficients. Excel has an analysis tool that can be used to estimate the coefficients of a multiple regression equation.
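
To make the estimation step concrete, the sketch below fits a multiple linear regression by solving the normal equations $(X^T X)b = X^T y$ with Gauss-Jordan elimination. It is written in plain Python as an illustration only, and it is not a description of how Excel's tool is implemented: `fit_multiple` is a hypothetical helper, and the five observations are made-up values generated from $y = 2 + 3x_1 - x_2$ so that the expected coefficients are known in advance.

```python
def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y-hat = b0 + b1*x1 + ... + bp*xp.

    rows holds one list of p predictor values per observation.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    k = len(design[0])
    # Augmented matrix of the normal equations: [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(design, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up data generated from y = 2 + 3*x1 - 1*x2 (not taken from Table 9-1).
predictors = [[0, 1], [1, 0], [1, 1], [2, 3], [3, 1]]
outcome = [1, 5, 4, 5, 10]
b0, b1, b2 = fit_multiple(predictors, outcome)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the made-up outcome is an exact linear function of the two predictors, the fitted coefficients recover 2, 3, and -1 up to floating-point error. With real data, the same call would take one row of predictor values per participant.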
Example 9.3.

FIGURE 9–7 Specification of Variables for Multiple Linear Regression Analysis

FIGURE 9–8 Results of Multiple Linear Regression Analysis


age and sex are p = 0.0122 and p = 0.1930, respectively. The p-values indicate that there is a statistically significant association between age and systolic blood pressure accounting for sex, but not between sex and systolic blood pressure, once age is considered. The multiple regression equation indicates that each additional year of age is associated with a 0.54-unit increase in systolic blood pressure, holding sex constant, and that men have higher systolic blood pressures than women by about 5.38 units, holding age constant.

Example 9.4. We now consider HDL as our dependent or outcome variable and want to assess the association between BMI and sex, considered simultaneously, and HDL using the data in Table 9–1. We again use the Tools/Data Analysis menu option and select the Regression Analysis tool. When we click OK, Excel requests specification of the variables for analysis in the dialog box shown in Figure 9–9.

FIGURE 9–9 Specification of Variables for Multiple Linear Regression Analysis

We first specify the location of the data for our dependent or outcome variable (y). In this example, the outcome is HDL, which is contained in cell G1 through cell G41. We then specify the independent variables ($x_1$ and $x_2$), which in this example are sex and BMI. The sex data are contained in cell B1 through cell B41 and the BMI data are contained in cell C1 through cell C41. The range “B1:C41” includes both independent variables. Because we included the first row of labels (B1, C1, and G1), we click on the Labels box to indicate that the labels are contained in these cells. We then specify a location for the results of the regression analysis. For this example, we request that Excel place the results in a new worksheet entitled Multiple regression 2. The results of the regression analysis are shown in Figure 9–10.

FIGURE 9–10 Results of Multiple Linear Regression Analysis

The estimates of the coefficients of the multiple regression equation are as follows: $b_0 = 79.38$, $b_1 = -6.31$, and $b_2 = -0.94$. The regression equation relating sex and BMI to HDL is:

$$\hat{y} = 79.38 - 6.31(\text{Male sex}) - 0.94(\text{BMI})$$

where $\hat{y}$ is the predicted or expected HDL. The p-values for the tests of significance for the regression coefficients associated with gender and BMI are p = 0.0986 and p = 0.0190, respectively. The p-values indicate that there is a marginally significant association between sex and HDL (often when p-values fall in the range of 0.05 to 0.10, they are described as marginally significant), accounting for BMI, and a statistically significant association between BMI and HDL, accounting for sex. The multiple regression equation indicates that men have lower HDL than women by about 6.31 units, holding BMI constant, and that each additional unit of BMI is associated with a 0.94-unit reduction in HDL, holding sex constant. Thus, increased BMI is associated with decreased HDL. Recall that HDL is the “good cholesterol” and that higher values are healthier.

It is important to note that for multiple regression analysis, the independent or predictor variables ($x_1, x_2, x_3, \ldots, x_p$) must be in adjacent columns in the Excel worksheet. When we specify the location of the cells containing the independent variables (i.e., Input X Range text field, Figure 9–9), we specify the locations of the first and last cells in the adjacent columns containing the data. For example, suppose in Example 9.3 we wished to consider sex and diabetes as the independent variables. To use the Regression Analysis Tool (to correctly specify these independent variables), we would need to reorganize the data in the Excel worksheet so that gender and diabetes are in adjacent columns. This can be done in several different ways. An easy way is to copy the data from column B and column H into column K and column L, as shown in Figure 9–11.

To estimate the multiple regression equation, we use the Tools/Data Analysis menu option and select the Regression Analysis tool. When we click OK, Excel requests specification of the variables for analysis in the dialog box shown in Figure 9–12.

We again specify the location of the data for our dependent or outcome variable (y = HDL), which is contained in cell G1 through cell G41. We then specify the independent variables ($x_1$ and $x_2$, or sex and diabetes), which are now contained in cell K1 through cell L41. The analysis is performed as described in Example 9.3 and Example 9.4.

9.3 PRACTICE PROBLEMS
1. Consider the data shown in Table 9–2 measured in a sample of n = 25 undergraduates in an on-campus survey of health behaviors. Enter the data into an Excel worksheet for analysis.

FIGURE 9–11 Organizing the Data for a Multiple Regression Analysis

FIGURE 9–12 Specification of Variables for Multiple Linear Regression Analysis

TABLE 9–2 Data for Practice Problems

ID  Age  Female  Year in School  GPA  Current Smoker  Exercise per Week (h)  Average Drinks per Week  Cups of Coffee per Week
1 18 1 Fr 3.85 1 7 3 3
2 21 0 Jr 3.27 1 3 2 4
3 19 1 So 2.90 0 0 4 7
4 22 0 Sr 3.65 1 0 2 4
5 21 1 Sr 3.41 1 0 1 3
6 20 0 Jr 3.20 0 2 5 8
7 19 1 Jr 2.89 1 1 4 10
8 17 0 Fr 3.75 0 6 0 0
9 18 0 So 4.00 0 6 2 6
10 17 1 So 3.18 0 3 5 7
11 21 0 Jr 2.58 1 3 12 12
12 22 1 Sr 2.98 0 2 3 4
13 19 0 Fr 3.16 1 2 0 6
14 21 1 Jr 3.36 1 3 1 2
15 22 1 So 3.72 0 6 3 0
16 19 0 So 3.30 1 4 0 6
17 16 0 Fr 3.28 0 4 0 5
18 22 0 Sr 2.98 0 0 8 5
19 17 1 Fr 3.90 0 7 0 2
20 20 1 Sr 3.78 1 4 6 2
21 21 1 So 3.26 1 2 3 4
22 23 0 Jr 3.01 0 1 9 7
23 23 0 Sr 3.83 1 5 4 4
24 17 1 Fr 3.76 0 5 2 1
25 22 1 Sr 3.05 0 1 5 5

2. Using the data in Table 9–2, estimate the simple linear regression equation relating number of cups of coffee per week to GPA (consider GPA the dependent or outcome variable).
3. Using the data in Table 9–2, estimate the simple linear regression equation relating female gender to GPA (consider GPA the dependent or outcome variable).
4. Using the data in Table 9–2, estimate the multiple linear regression equation relating number of cups of coffee per week, female gender, and number of hours of exercise per week, considered simultaneously, to GPA (consider GPA the dependent or outcome variable).


CHAPTER 10
Nonparametric Procedures

In Chapter 10 of the textbook we presented hypothesis testing procedures for situations with small sample sizes and outcomes that are ordinal, ranked, or continuous and cannot be assumed to be normally distributed. Nonparametric tests are based on ranks that are assigned to the ordered data. The tests involve the same five steps as parametric tests: specifying the null hypothesis and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule, and drawing a conclusion.

Four tests were presented in Chapter 10: the Mann–Whitney U Test for comparing a continuous outcome in two independent samples, the Sign and Signed Rank tests for comparing a continuous outcome in two matched or paired samples, and the Kruskal–Wallis test for comparing a continuous outcome in more than two independent samples.

For each test we used the same general approach, a five-step approach, which is outlined below:

• Step 1: Set up hypotheses (H0 and H1) and select a level of significance, α.
• Step 2: Choose the appropriate test statistic (e.g., U, the smaller of the number of positive or negative signs, or W or H).
• Step 3: Determine the critical value(s) and set up the decision rule (which depends on α, the test statistic, and whether the test is upper-, lower-, or two-tailed).
• Step 4: Compute the test statistic based on observed sample data.
• Step 5: Draw a conclusion by comparing the test statistic to the critical value.

The test statistic (Step 2) varies depending on the specific test. When we conducted nonparametric tests of hypothesis by hand in Chapter 10 of the textbook, we ultimately drew a conclusion by comparing the test statistic to the critical value, which was derived from an appropriate probability distribution table. Critical values can be found in Table 5 through Table 8 of the Appendix. We now use Microsoft Office® Excel® 2003 to compute the test statistics for each test. Before illustrating the computation of test statistics with Excel, we first illustrate how to use Excel to rank data, which is a key component of all nonparametric tests.

10.1 RANKING DATA
The nonparametric procedures that we describe here follow the same general procedure. The outcome variable (ordinal, interval, or continuous) is ranked from lowest to highest, and the analysis focuses on ranks as opposed to the measured or raw values. For example, suppose we measure self-reported pain using a visual analog scale with anchors at 0 (no pain) and 10 (agonizing pain) and record the following in a sample of six participants (n = 6):

7 5 9 3 0 2

The data are entered into Excel as shown in Figure 10-1. The ranks, which are used to perform a nonparametric test, are assigned to the data, which are ordered from smallest to largest. Notice that the data do not have to be ordered when they are entered into Excel. The smallest value is assigned a rank of 1, the next smallest is assigned a rank of 2, and so on.
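
The ranking rule described above can be sketched in a few lines of Python. This is an illustration only: `ascending_ranks` is a hypothetical helper that ranks each value as one plus the number of strictly smaller values, which matches the description when there are no ties.

```python
def ascending_ranks(values):
    """Rank each value as 1 + the number of strictly smaller values."""
    return [1 + sum(other < v for other in values) for v in values]


pain = [7, 5, 9, 3, 0, 2]          # the six pain scores, in the order recorded
print(ascending_ranks(pain))       # [5, 4, 6, 3, 1, 2]
```

The scores do not need to be sorted first; each rank is returned in the position of the score it belongs to.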

FIGURE 10-1 Observed Data

FIGURE 10-2 Ranked Data

The largest value is assigned a rank of n (in this example, n = 6). Excel has a built-in RANK function that assigns ranks to data. The function is used as follows:

=RANK(value, data range, order)

The value is the address of the cell that we wish to rank. The data range contains the addresses of the cells containing the first and last observations in the dataset, separated by a colon. For example, the data in Figure 10-1 occupy the range A2:A7. Order is specified as either 0 or 1. Order = 0 indicates that ranks are assigned in descending order, and order = 1 indicates that ranks are assigned in ascending order. Because we want to assign ranks from smallest to largest, we specify order = 1. To rank the data shown in Figure 10-1 in ascending order, we use the RANK function as follows: “=RANK(value,A$2:A$7,1)”, where value changes depending on which data point we wish to rank (i.e., which cell in cell A2 through cell A7). Notice that we use absolute row references to indicate the data range (A$2:A$7) to ensure that the same data range is considered as we copy the RANK function from cell to cell. The rank function is specified in cell B2 as “=RANK(A2,A$2:A$7,1)” and is copied from cell B2 to cell B3 through cell B7 as shown in Figure 10-2. (Notice the formula in cell B7 showing in the formula bar.)

Using Excel, we rank the data in one step, as opposed to first ordering the data and then assigning ranks as we did when we assigned ranks by hand. The RANK function ranks the data in ascending or descending order. However, if there are ties, the same ranks are assigned to the tied values. For example, suppose the following are recorded in a sample of six participants (n = 6):

7 7 9 3 0 2

The data are entered into Excel as shown in Figure 10-3, and the rank function is again invoked as “=RANK(value,A$2:A$7,1)”. The ranks are shown in Figure 10-3.

FIGURE 10-3 Ranks in Data with Ties

Notice that the 4th and 5th ordered values are both assigned ranks of 4. In nonparametric testing, we wish to assign

the mean rank to values that are tied. Specifically, we assign
ranks of 4.5 to the two values of 7. To assign mean ranks to the
tied values, we use the following as a correction. In each cell
(see cell C2 in Figure 10-4), we enter
“=RANK(A2,A$2:A$7,1)+(COUNT(A$2:A$7)+1-RANK(A2,A$2:A$7,0)-RANK(A2,A$2:A$7,1))/2”.
This formula is then copied from cell C2
to cell C3 through cell C7 as shown in Figure 10-4. (Notice the
formula in cell C2 showing in the menu bar.)

FIGURE 10-4 Assigning Ranks Accounting for Ties

The corrected formula assigns the mean rank to tied values.
As we work through applications, only the formula shown
in column C is needed to assign ranks, as it makes the appropriate
adjustment to assign mean ranks when there are ties.
Suppose the following are recorded in a sample of six participants
(n = 6):

7 7 7 3 0 2

The data are entered into Excel as shown in Figure 10-5, and
the rank function with the correction factor is specified as
“=RANK(A2,A$2:A$7,1)+(COUNT(A$2:A$7)+1-RANK(A2,A$2:A$7,0)-RANK(A2,A$2:A$7,1))/2”
in cell B2. This
formula is then copied from cell B2 to cell B3 through cell B7
as shown in Figure 10-5.

FIGURE 10-5 Ranking Data with Ties

Notice that there are three values of 7. We assign a rank of
5 (the mean of 4, 5, and 6) to the 4th, 5th, and 6th ordered

values. Using this approach of assigning the mean rank when
there are ties ensures that the sum of the ranks is the same in
each sample and always equal to n(n + 1)/2. When conducting
nonparametric tests, it is useful to check the sum of the
ranks before proceeding with the analysis.

10.2 TESTS WITH TWO INDEPENDENT SAMPLES

The Mann–Whitney U test is used to compare a continuous
outcome in two independent samples with small sample sizes
and outcomes that are ordinal, ranked, or continuous and cannot
be assumed to be normally distributed. The hypotheses,
with a two-sided alternative (as is generally the case), are as
follows:

H0: The two populations are equal.
H1: The two populations are not equal.

In Chapter 10 of the textbook, we presented the following
formula for the test statistic, U, in the Mann–Whitney U Test:
U is the smaller of

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \quad \text{and} \quad U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2,$$

where R1 and R2 are the sums of the ranks
in groups 1 and 2, respectively.
When performing the test of hypothesis by hand, we computed
the test statistic U and found the appropriate critical
value in Table 5 in the Appendix to set up the decision rule:
Reject H0 if U ≤ critical value from Table 5. Excel does not
have a specific analysis tool for the Mann–Whitney U test.
However, Excel can be used to assign ranks and compute the
test statistic. The conclusion of the test is based on a comparison
of the test statistic to the appropriate critical value from
Table 5 in the Appendix.
Example 10.1. In Example 10.1 in the textbook we analyzed
data from a Phase II clinical trial designed to investigate
the effectiveness of a new drug to reduce symptoms of asthma
in children. A total of 10 participants (n = 10) were randomized
to receive either the new drug or a placebo. Participants
recorded the number of episodes of shortness of breath over
a 1-week period following receipt of the assigned treatment.
The data are shown below.

Placebo    7  5  6  4  12
New Drug   3  6  4  2  1

The question of interest is whether there is a difference in
the number of episodes of shortness of breath over a 1-week
period in participants receiving the new drug compared to
those receiving the placebo. The hypotheses to be tested are
given below and we run the test at the 5% level of significance
(i.e., α = 0.05).

H0: The two populations are equal.
H1: The two populations are not equal.

The data are entered into Excel as shown in Figure 10-6.

FIGURE 10-6 Data for Mann–Whitney U Test

The first step is to assign ranks, and this is done on the
combined or total sample (i.e., pooling the data from the two
treatment groups), and assigning ranks from 1 to 10. Using
Excel, we enter the data for each comparison group and then
rank the pooled sample (n = 10). This is done using the procedure
outlined in Section 10.1. Specifically, we use the rank function
with the correction factor to assign ranks to the observed
data in the placebo group and new-drug group, combined.
Recall that we must maintain the group assignments. To perform
the ranking, we create two new columns: one to contain
the ranks of the data in the placebo group and the other to contain
the ranks of the data in the new-drug group (see Figure 10-7).
The first observation in the placebo group is in cell A2. To
rank this value, in cell C2 we specify
“=RANK(A2,$A$2:$B$6,1)+(COUNT($A$2:$B$6)+1-RANK(A2,$A$2:$B$6,0)-RANK(A2,$A$2:$B$6,1))/2”.
This formula is then copied
from cell C2 to cell C3 through cell D6 as shown in Figure 10-7.
Notice that we specify the range of the data as $A$2:$B$6.
Specifically, we use absolute cell references for both the row and
column addresses of the cells containing the data so that when
the formula is copied we continue to rank the observed data,
which is located in cell A2 through cell B6. See the formula for
cell D6 showing in the menu bar.
Before proceeding, we check the assignment of the ranks
by summing the ranks in each group. These are denoted R1
and R2, for the placebo and new-drug groups, respectively.
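The pooled ranking, the rank sums, and the U statistic defined above can
also be checked with a short Python sketch. This is an illustration only,
not part of the Excel procedure: the helper name mean_ranks is hypothetical,
and it mirrors the worksheet tie correction by counting smaller and larger
values instead of calling RANK.

```python
def mean_ranks(values):
    """Ascending ranks in which tied values share the mean of their ranks.

    Illustrative helper that mirrors the worksheet formula
    RANK(v, range, 1) + (COUNT(range) + 1 - RANK(v, range, 0) - RANK(v, range, 1)) / 2.
    """
    n = len(values)
    ranks = []
    for v in values:
        rank_asc = 1 + sum(1 for x in values if x < v)    # like RANK(v, range, 1)
        rank_desc = 1 + sum(1 for x in values if x > v)   # like RANK(v, range, 0)
        ranks.append(rank_asc + (n + 1 - rank_desc - rank_asc) / 2)
    return ranks


placebo = [7, 5, 6, 4, 12]
new_drug = [3, 6, 4, 2, 1]

# Rank the pooled sample, keeping track of which ranks belong to which group.
pooled_ranks = mean_ranks(placebo + new_drug)
r1 = sum(pooled_ranks[:len(placebo)])
r2 = sum(pooled_ranks[len(placebo):])

n1, n2 = len(placebo), len(new_drug)
n = n1 + n2
assert r1 + r2 == n * (n + 1) / 2   # check the rank sum before proceeding

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)
print(r1, r2, u1, u2, u)
```

For the Example 10.1 data the rank sums are 37 and 18, which add to
10(11)/2 = 55. The sketch only computes the statistic; the critical value
still comes from Table 5 in the Appendix.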

FIGURE 10-7 Ranking Data in the Pooled Sample

Recall that the sum of the ranks will always equal n(n + 1)/2
= 10(11)/2 = 55. This is shown in Figure 10-8.

FIGURE 10-8 Summing the Ranks in Each Group

Thus, R1 = 37 and R2 = 18. The test statistic for the
Mann–Whitney U Test is denoted U and is the smaller of U1
and U2, defined below.

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

To compute the test statistic in Excel, we first compute the
sample sizes, n1 and n2, using the COUNT function. We specify
the cells we wish to count as follows: “=COUNT(A2:A6)”.
This totals the number of cells in the range of A2 through A6
that contain numeric data (numbers). The computation of
sample sizes is shown in Figure 10-9.

FIGURE 10-9 Determining Sample Sizes

In this example, the sample size of group 1 (n1) is in cell
C13 and is determined by “=COUNT(A2:A6)”. The sample
size of group 2 (n2) is in cell D13 and is determined by
“=COUNT(B2:B6)”. We now use R1, R2, n1, and n2 to compute
U1 and U2 using Excel. The computations are shown in
Figure 10-10. The formula used to compute U2 in cell D16 is
shown in the top menu bar.

FIGURE 10-10 Computing U1 and U2

The final step is to produce the test statistic, U, which is the
smaller of U1 and U2. This is computed using the MIN (minimum)
function and shown in Figure 10-11.

FIGURE 10-11 Computing the Test Statistic U

Thus, U = 3. To draw a final conclusion in the test, we
must determine whether the observed test statistic, U, supports
the null or the research hypothesis. This is done by determining
a critical value of U such that if the observed value
of U is less than or equal to the critical value, we reject H0 in
favor of H1, and if the observed value of U exceeds the critical
value we do not reject H0. The critical value of U is found in
Table 5 in the Appendix. For n1 = n2 = 5 and a two-sided level
of significance α = 0.05, the critical value is 2, and the decision
rule is to reject H0 if U ≤ 2. We do not reject H0 because 3 >
2. We do not have statistically significant evidence at α = 0.05
to show that the two populations of numbers of episodes of
shortness of breath are not equal.

10.3 TESTS WITH MATCHED SAMPLES

The Sign test and the Wilcoxon Signed Rank test are used to
compare a continuous outcome in two matched or paired samples
with small sample sizes and outcomes that are ordinal,
ranked, or continuous and cannot be assumed to be normally
distributed. Recall that when data are matched or paired, we
compute difference scores for each individual and analyze difference
scores. The hypotheses for both tests, with a two-sided
alternative (as is generally the case), are as follows:

H0: The median difference is zero.
H1: The median difference is not zero.

In Chapter 10 of the textbook, we presented the Sign test,
where the test statistic is the smaller of the number of positive
or negative signs of the difference scores. When performing
the test of hypothesis by hand, we found the appropriate critical
value in Table 6 in the Appendix to set up the decision
rule: Reject H0 if the smaller of the number of positive or negative
signs ≤ critical value from Table 6. Excel does not have a
specific analysis tool for the Sign test. However, Excel can be
used to compute difference scores, to count the numbers of
positive and negative signs, and to determine the smaller of
the two, which is the test statistic. The conclusion of the test is
based on a comparison of the test statistic to the appropriate
critical value from Table 6 in the Appendix.
In Chapter 10 of the textbook, we also presented the
Wilcoxon Signed Rank test and the test statistic W, which is defined
as the smaller of W+ and W−, which are the sums of the
positive and negative ranks of the difference scores, respectively.
When performing the test of hypothesis by hand, we
found the appropriate critical value in Table 7 in the Appendix
to set up the decision rule: Reject H0 if W ≤ critical value from
Table 7. Excel does not have a specific analysis tool for the
Wilcoxon Signed Rank test. However, Excel can be used to
compute difference scores, to rank the differences, to attach
the signs, to determine W+ and W−, and to compute the test
statistic W. The conclusion of the test is based on a comparison
of the test statistic to the appropriate critical value from
Table 7 in the Appendix.
Example 10.2. In Example 10.5 in the textbook, we analyzed
data from a study to assess quality of life (QOL) in patients
with breast cancer following a new chemotherapy
treatment. QOL was measured on an ordinal scale, and for
analysis purposes, numbers were assigned to each response
category as follows: 1 = Poor, 2 = Fair, 3 = Good, 4 = Very
Good, 5 = Excellent. The data are shown in Table 10-1.

TABLE 10-1 QOL Before and After Chemotherapy Treatment

Patient   QOL Before Chemotherapy Treatment   QOL After Chemotherapy Treatment
1         3                                   2
2         2                                   3
3         3                                   4
4         2                                   4
5         1                                   1
6         3                                   4
7         2                                   4
8         3                                   3
9         2                                   1
10        1                                   3
11        3                                   4
12        2                                   3

The question of interest is whether there is a difference in
QOL after chemotherapy treatment as compared to before.
The test is run at a 5% level of significance and the hypotheses
are as follows:

H0: The median difference is zero.
H1: The median difference is not zero.

The data are entered into Excel as shown in Figure 10-12.

FIGURE 10-12 Data on Matched Pairs for Sign Test

We analyze the data using the Sign test. The test statistic
for the Sign test is the smaller of the number of positive or
negative difference scores. We first compute difference scores
by subtracting the QOL measured before treatment from that
measured after (i.e., the measurement in column C minus the
measurement in column B for each participant). The difference
scores are shown in Figure 10-13.

FIGURE 10-13 Difference Scores

With the Sign test, we only concern ourselves with the
signs of the difference scores. Thus, we take each difference
score in column D and retain only the sign of the difference
score. This is done in Excel using the IF function. Specifically,

we evaluate each difference score in column D (in cell D2
through cell D13) and retain only the sign of the difference by
using the IF function as follows: IF(condition, result if condition
is true, result if condition is false). For example, we specify the
following in cell E2: “=IF(D2<0,"-"," ")”. The condition determines
whether the value in cell D2 is negative (i.e., <0), and
if so the IF function places a negative sign “-” in cell E2; if
not, cell E2 is left blank, “ ”. We actually wish to assign a positive
sign “+” to a cell with a positive difference and a negative
sign “-” to a cell with a negative difference, and so to do this,
we specify the following in cell E2:
“=IF(D2<0,"-",IF(D2>0,"+"," "))”. This IF expression first determines
whether the value in cell D2 is negative and if so, a negative sign
“-” is placed in cell E2. If not, the expression then determines
whether the value in cell D2 is positive, and if so, places a positive
sign “+” in cell E2. If not, cell E2 is left blank. This is entered
into cell E2 and is copied into cell E3 through cell E13 as
shown in Figure 10-14.
Notice that cells E6 and E9 are blank because the differences
are neither positive nor negative, but are equal to zero.
Recall that if there is an even number of zeros in a dataset, we
randomly assign positive and negative signs to them. In this example,
we assign one negative sign (i.e., “-” to patient 5) and
one positive sign (i.e., “+” to patient 8). The signs are entered
directly into cells E6 and E9 as shown in Figure 10-15.
Next, we count the number of positive and negative signs
using the COUNTIF function. The COUNTIF function counts
the number of cells in a specified range that meet a certain criterion.
In this example the number of positive signs is computed
as “=COUNTIF(E2:E13,"+")” and the number of negative
signs is computed as “=COUNTIF(E2:E13,"-")” as
shown in Figure 10-16.
The test statistic is the smaller of the number of positive
or negative signs of the difference scores, which is equal to 3.
The appropriate critical value for the Sign test is found in Table
6 in the Appendix based on the sample size (or number of
matched pairs, n = 12), and our two-sided level of significance
(α = 0.05). The critical value for this two-sided test with n =
12 and α = 0.05 is 2, and the decision rule is as follows: Reject
H0 if the smaller of the number of positive or negative signs ≤
2. We do not reject H0 because 3 > 2. We do not have statistically
significant evidence at α = 0.05 to show that there is a
difference in QOL after chemotherapy treatment compared to
before treatment.
With the Sign test it is possible to compute a p-value for
the test using the binomial distribution. In Chapter 7 of the
Excel workbook, we computed p-values for the parametric
tests analyzing means, proportions, differences in means and
proportions, and mean differences using the Z and t probability
distributions. We use the same approach here.
Specifically, we use Excel to compute the test statistic (with the
Sign test, the smaller of the number of positive or negative
signs) and a p-value, and the investigator then compares the
p-value to the predetermined level of significance to draw a
conclusion about the hypotheses using the following rule: Reject
H0 if p ≤ α.
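The Sign test steps for Example 10.2 (difference scores, signs, counts, and
the binomial p-value) can be cross-checked with a small Python sketch. It is
an illustration under stated assumptions rather than part of the Excel
procedure: the variable names and the helper binomial_cdf are hypothetical,
each sign is taken to be equally likely under H0 (p = 0.5), and the two zero
differences receive the same signs that are assigned in this example.

```python
from math import comb

# QOL scores from Table 10-1 (patients 1 through 12)
before = [3, 2, 3, 2, 1, 3, 2, 3, 2, 1, 3, 2]
after = [2, 3, 4, 4, 1, 4, 4, 3, 1, 3, 4, 3]

# Difference scores: after minus before
diffs = [a - b for a, b in zip(after, before)]

# Keep only the sign of each difference; zeros are left blank for now
signs = ["-" if d < 0 else "+" if d > 0 else "" for d in diffs]

# Zero differences (patients 5 and 8): one "-" and one "+", as in the example
signs[4] = "-"
signs[7] = "+"

n_pos = signs.count("+")
n_neg = signs.count("-")
statistic = min(n_pos, n_neg)   # the Sign test statistic
n = n_pos + n_neg


def binomial_cdf(x, n, p):
    """P(X <= x) for a binomial(n, p) variable (illustrative helper)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


# Two-sided p-value, capped at 1
p_value = min(1.0, 2 * binomial_cdf(statistic, n, 0.5))
print(n_pos, n_neg, statistic, round(p_value, 4))
```

For these data the sketch counts 9 positive and 3 negative signs and returns
a two-sided p-value of about 0.146, which exceeds α = 0.05.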

FIGURE 10-14 Signs of the Difference Scores

FIGURE 10-15 Assigning Signs to Differences of Zero

FIGURE 10-16 Summing the Numbers of Positive and Negative Signs

The test statistic for the Sign test is the smaller of the number
of positive or negative signs of the difference scores, and it
follows a binomial distribution with n = the number of subjects
in the study and p = 0.5. In this example, n = 12 and p =
0.5. The two-sided p-value for the test is p = 2 × P(X ≤ 3). In
Chapter 5 of the Excel workbook, we used the BINOMDIST
function to compute probabilities from the binomial distribution.
The BINOMDIST function is specified as follows:
“=BINOMDIST(x,n,p,cumulative)”. The first three inputs for
the function are the same as those we use in computing probabilities
by hand with the binomial distribution model (i.e., x,
n, and p). Excel requires one additional input, labeled cumulative.
This last argument in the BINOMDIST function is a
logical value (i.e., one whose responses are true or false). We use
the cumulative distribution function by specifying “true” or
do not use the cumulative distribution function by specifying
“false.” The cumulative distribution function returns the probability
of observing x or fewer successes. For example, if we
specify “true” and indicate x = 3 in the function, then Excel
computes P(X ≤ 3). In contrast, if we specify “false” and indicate
x = 3 in the function, then Excel computes P(X = 3). We
wish to compute P(X ≤ 3) with n = 12 and p = 0.5. The computation
is shown in Figure 10-17. Notice that we indicate the
number of successes (x) as the smaller of E15 and E16, and n
(the sample size for analysis) is determined using the COUNTIF
function to count the number of positive and negative
signs (see the formula in the top menu bar).

FIGURE 10-17 Computing the p-Value for the Sign Test

Because p-value = 0.1460 exceeds the level of significance
(α = 0.05) we do not have statistically significant evidence at
α = 0.05 to show that there is a difference in QOL after
chemotherapy treatment compared to before treatment.
Another popular nonparametric test for matched or
paired data is called the Wilcoxon Signed Rank test. Like the
Sign test, it is based on difference scores, but in addition to
analyzing the signs of the differences, it also takes into account the
magnitude of the observed differences.
Example 10.3. In Example 10.7 in the textbook, we analyzed
data from a study to evaluate the effectiveness of an exercise
program in reducing systolic blood pressure in patients
with prehypertension (defined as a systolic blood pressure between
120 mmHg and 139 mmHg or a diastolic blood pressure
between 80 mmHg and 89 mmHg). A total of 15 patients with
prehypertension enrolled in the study, and their systolic blood
pressures were measured. Each patient then participated in an
exercise training program where they learned proper techniques
and execution of a series of exercises. Patients were instructed
to do the exercise program 3 times per week for 6
weeks. After 6 weeks, systolic blood pressures were again measured
and are shown in Table 10-2.
The question of interest is whether there is a difference in
systolic blood pressures after participating in the exercise program
compared to before the exercise program. The test is run
at a 5% level of significance and the hypotheses are below:

H0: The median difference is zero.
H1: The median difference is not zero.


TABLE 10-2 Blood Pressure Before and After Exercise Program

Patient   Systolic Blood Pressure Before Exercise Program   Systolic Blood Pressure After Exercise Program
1         125                                               118
2         132                                               134
3         138                                               130
4         120                                               124
5         125                                               105
6         127                                               130
7         136                                               130
8         139                                               132
9         131                                               123
10        132                                               128
11        135                                               126
12        136                                               140
13        128                                               135
14        127                                               126
15        130                                               132

The data are entered into Excel as shown in Figure 10-18.

FIGURE 10-18 Data on Matched Pairs for Wilcoxon Signed Rank Test

We analyze the data using the Wilcoxon Signed Rank test.
The test statistic is W, the smaller of W+ and W−, which are
the sums of the positive and negative ranks, respectively.
We first compute difference scores by subtracting the SBP
measured after the exercise program from that measured before
the exercise program. The difference scores are shown in
Figure 10-19.

FIGURE 10-19 Difference Scores

The next step is to rank the ordered absolute values of the
difference scores. First, we generate a column of the absolute
values of the difference scores using the ABS (absolute value)
function. The absolute values of the difference scores are shown
in column E in Figure 10-20.
Next we assign ranks from 1 through n to the smallest
through the largest absolute values of the difference scores,
respectively, and assign the mean rank when there are ties in the
absolute values of the difference scores using the approach
outlined in Section 10.1. The ranks of the absolute values of the
difference scores are shown in Figure 10-21.
In the next step, we attach the signs (“+” or “-”) of the
observed differences to each rank using the IF function as
shown in Figure 10-22. Specifically, if the difference (value in
column D) is less than zero, the signed rank is equal to −1 ×
the rank (in column F). If not, the signed rank is equal to the
rank in column F. Notice that for patient 2 (row 3) the difference
is −2, and thus the signed rank is −2.5.
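The signed-rank steps for Example 10.3 can be reproduced with the Python
sketch below, offered as a cross-check rather than as part of the Excel
procedure. The helper name mean_ranks and the variable names are
hypothetical. As with the IF function described above, a difference that is
not negative keeps a positive rank; these data contain no zero differences.

```python
def mean_ranks(values):
    """Ascending ranks; tied values share the mean of their ranks (illustrative)."""
    n = len(values)
    ranks = []
    for v in values:
        rank_asc = 1 + sum(1 for x in values if x < v)
        rank_desc = 1 + sum(1 for x in values if x > v)
        ranks.append(rank_asc + (n + 1 - rank_desc - rank_asc) / 2)
    return ranks


# Systolic blood pressures from Table 10-2 (patients 1 through 15)
before = [125, 132, 138, 120, 125, 127, 136, 139, 131, 132, 135, 136, 128, 127, 130]
after = [118, 134, 130, 124, 105, 130, 130, 132, 123, 128, 126, 140, 135, 126, 132]

# Difference scores: before minus after, as in the worksheet
diffs = [b - a for b, a in zip(before, after)]

# Rank the absolute differences, then reattach the signs
ranks = mean_ranks([abs(d) for d in diffs])
signed_ranks = [-r if d < 0 else r for d, r in zip(diffs, ranks)]

# Sums of the positive ranks and of the absolute values of the negative ranks
w_plus = sum(r for r in signed_ranks if r > 0)
w_minus = abs(sum(r for r in signed_ranks if r < 0))
w = min(w_plus, w_minus)
print(signed_ranks[1], w_plus, w_minus, w)
```

Patient 2 receives a signed rank of −2.5, matching the worksheet, and the two
sums are 89 and 31, so the smaller sum is 31.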

FIGURE 10-20 Absolute Values of the Difference Scores
FIGURE 10-21 Ranking the Absolute Values of the Difference Scores
FIGURE 10-22 Signed Ranks

We now compute the sums of the positive and negative
ranks, W+ and W−, respectively, using the SUMIF function.
The SUMIF function sums the values in a specified range of
cells (e.g., cell G2 through cell G16) that meet a specified
criterion. W+ is the sum of the positive ranks, computed as
“=SUMIF(G2:G16,">0")”. W− is the sum of the negative
ranks, computed as “=ABS(SUMIF(G2:G16,"<0"))”. Notice
that when we sum the negative ranks, we want to sum the absolute
values of the negative ranks, which is done using the
absolute value (ABS) function. This is shown in Figure 10-23.

FIGURE 10-23 Computing W+ and W−

The test statistic is W, the smaller of W+ and W−. W =
31 as shown in Figure 10-24. The critical value of W is found
in Table 7 in the Appendix based on the sample size (n = 15)
and our two-sided level of significance (α = 0.05). The critical
value for this two-sided test with n = 15 and α = 0.05 is 25, and
the decision rule is as follows: Reject H0 if W ≤ 25. We do not
reject H0 because 31 > 25. We do not have statistically significant
evidence at α = 0.05 to show that the median difference
in systolic blood pressures is not zero (i.e., that there is a significant
difference in systolic blood pressures after the exercise
program as compared to before).

10.4 TESTS WITH MORE THAN TWO INDEPENDENT SAMPLES

The Kruskal–Wallis test is used to compare medians of a continuous
outcome in more than two independent samples with
small sample sizes when the outcome is ordinal, ranked, or
continuous and cannot be assumed to be normally distributed.
The Kruskal–Wallis test is used to compare medians
among k comparison groups (k > 2) and is sometimes described
as an ANOVA with the data replaced by their ranks. The
null and research hypotheses for the Kruskal–Wallis nonparametric
test are as follows:

H0: The k population medians are equal.
H1: The k population medians are not all equal.

The procedure for the test involves pooling the observations
from the k samples into one combined sample, keeping track
of which sample each observation comes from, and then ranking
lowest to highest from 1 to N, where N = n1 + n2 + ... + nk.
In Chapter 10 of the textbook, we presented the following
formula for the test statistic, H, in the Kruskal–Wallis test:

$$H = \left( \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right) - 3(N + 1),$$

where k = the number
of comparison groups, N = the total sample size, nj is the sample
size in the jth group, and Rj is the sum of the ranks in the
jth group.
When performing the test of hypothesis by hand, we computed
the test statistic H and found the appropriate critical
value in Table 8 in the Appendix to set up the decision rule:
Reject H0 if H ≥ critical value from Table 8. Excel does not
have a specific analysis tool for the Kruskal–Wallis test.
However, Excel can be used to assign ranks and to compute
the test statistic. The conclusion of the test is based on a comparison
of the test statistic to the appropriate critical value
from Table 8.
Example 10.4. In Example 10.8 of the textbook, we analyzed
a clinical study designed to assess differences in albumin
levels in adults following different low-protein diets. Three
diets were compared, ranging from 5% to 15% protein, and the
15% protein diet represents a typical American diet. The albumin
levels of participants following each diet are shown in
Table 10-3.

The question of interest is whether there is a difference
in albumin levels among the three different diets. The test is
run at a 5% level of significance, and the hypotheses are as
follows:

H0: The three population medians are equal.
H1: The three population medians are not all equal.

The data are entered into Excel as shown in Figure 10-25.

FIGURE 10-25 Data for the Kruskal–Wallis Test

To conduct the test we assign ranks using the procedures
outlined in Section 10.1. This is done on the combined or total
sample (i.e., pooling the data from the three comparison
groups), and ranks are assigned from 1 to 12. We also need to
keep track of the group assignments in the total sample (n =
12). The ranks are shown in Figure 10-26.
Notice that the range of the data specified in the RANK
function is $A$2 through $C$6 (see formula in the top menu
bar). To compute the test statistic H, we need the sum of the
ranks in each group, Rj. These are denoted R1, R2, and R3 and
are shown in Figure 10-27. Recall that the sum of the ranks
will always equal n(n + 1)/2 = 12(13)/2 = 78. This is shown in
Figure 10-28.
The test statistic for the Kruskal–Wallis test is denoted H
and is defined as follows:

$$H = \left( \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right) - 3(N + 1)$$

To compute the test statistic in Excel, we first compute the
sample sizes in each group, nj, using the COUNT function. We
also sum the sample sizes to compute the total sample size N.
This is shown in Figure 10-29.
We now use Rj, nj, and N to compute the test statistic H
using Excel. The formula to compute the test statistic H in cell
E15 is shown in the top menu bar in Figure 10-30. Thus, H =
FIGURE 10-30 Computing the Test Statistic H
FIGURE 10-31 Data for the Kruskal–Wallis Test


groups), and ranks are assigned from 1 to 20. We also need to
keep track of the group assignments in the total sample (n =
20). The ranks are shown in Figure 10-32.

FIGURE 10-32 Ranking the Data in the Pooled Sample

Notice that the range of the data specified in the RANK
function is $A$2 through $D$6 (see formula in the top menu
bar). To compute the test statistic H, we need the sum of the
ranks in each group, Rj. These are denoted R1, R2, R3, and R4
and are shown in Figure 10-33. Recall that the sum of the ranks
will always equal n(n + 1)/2 = 20(21)/2 = 210. This is shown
in Figure 10-34.
The test statistic for the Kruskal–Wallis test is denoted H
and is defined as follows:

$$H = \left( \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right) - 3(N + 1)$$

To compute the test statistic in Excel, we first compute the
sample sizes in each group, nj, using the COUNT function. We
also sum the sample sizes to compute the total sample size N.
This is shown in Figure 10-35.
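The computation of H from the rank sums can be sketched in Python as follows.
This is an illustration under loud assumptions: the three small groups are
hypothetical numbers chosen for easy checking (they are not the data from
either example), the helper names are our own, and the chi-square helper
plays the role that Excel's CHIDIST function plays, using the large-sample
approximation with k − 1 degrees of freedom.

```python
from math import erfc, exp, gamma, sqrt


def mean_ranks(values):
    """Ascending ranks; tied values share the mean of their ranks (illustrative)."""
    n = len(values)
    ranks = []
    for v in values:
        rank_asc = 1 + sum(1 for x in values if x < v)
        rank_desc = 1 + sum(1 for x in values if x > v)
        ranks.append(rank_asc + (n + 1 - rank_desc - rank_asc) / 2)
    return ranks


def kruskal_wallis_h(groups):
    """H = (12 / (N(N+1)) * sum(Rj^2 / nj)) - 3(N+1), from pooled mean ranks."""
    pooled = [x for g in groups for x in g]
    ranks = mean_ranks(pooled)
    big_n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])   # rank sum for this group
        total += r_j ** 2 / len(g)
        start += len(g)
    return 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a whole-number df >= 1."""
    half = x / 2
    if df % 2 == 0:
        return exp(-half) * sum(half ** i / gamma(i + 1) for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * tail


# Hypothetical data: three groups of three observations, no ties
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
p_value = chi2_sf(h, len(groups) - 1)
print(round(h, 2), round(p_value, 4))
```

With samples this small the chi-square p-value is only a rough guide, so the
sketch is meant to show the arithmetic rather than to replace the critical
values in Table 8.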


FIGURE 10-33 Summing the Ranks in Each Group

FIGURE 10-34 Checking the Sum of the Ranks

We now use Rj, nj, and N to compute the test statistic H
using Excel. The formula to compute the test statistic H in cell
G15 is shown in the top menu bar in Figure 10-36.
Thus, H = 9.11. We can now compute a p-value for the
test using the χ² distribution and the CHIDIST function in
Excel. The CHIDIST function requires specification of the test
statistic (H in this case) and the degrees of freedom. The degrees
of freedom is defined as k − 1, where k is the number of
comparison groups. Computation of the p-value for the test is
shown in Figure 10-37.
Because the p-value = 0.0278 is less than the level of
significance (α = 0.05), we reject H0. We have statistically
FIGURE 10-35 Determining the Sample Sizes, nj, and Total Sample Size N


FIGURE 10-36 Computing the Test Statistic H

FIGURE 10-37 Computing the p-Value Using the Chi-Square Distribution
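The upper-tail chi-square probability that CHIDIST returns can be cross-checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `chi_square_upper_tail` is hypothetical, it handles positive integer degrees of freedom only (using the closed-form sums for even and odd degrees of freedom), and it takes H = 9.11 and k = 4 groups from the example.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with positive integer degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return math.exp(-half) * sum(
            half ** i / math.factorial(i) for i in range(df // 2)
        )
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return tail


k = 4        # number of comparison groups in the example
h = 9.11     # test statistic reported in the text (rounded)
p_value = chi_square_upper_tail(h, k - 1)
print(round(p_value, 4))
```

Because H is rounded to two decimals here, the result should land close to, but not necessarily exactly on, the p-value shown in the worksheet.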


significant evidence at α = 0.05 to show that there is a difference in median anaerobic thresholds among the four different groups of elite athletes.

10.5 PRACTICE PROBLEMS
1. A company is evaluating the impact of a wellness program offered on-site as a means of reducing employee sick days. A total of 8 employees agree to participate in the evaluation, which lasts 12 weeks. Their sick days in the 12 months prior to the start of the wellness program and again over the 12 months after the completion of the program are recorded and are shown in Table 10-5. Is there a significant reduction in the number of sick days taken after completing the wellness program? Use the Sign test at a 5% level of significance.

TABLE 10-5 Data for Practice Problems 1 and 2

Employee    Sick Days Taken in 12 Months    Sick Days Taken in 12 Months
            Prior to Program                Following Program
1           8                               7
2           6                               6
3           4                               5
4           12                              11
5           10                              7
6           8                               4
7           6                               3
8           2                               1

2. Using the data in Problem 1, assess whether there is a significant reduction in the number of sick days taken after completing the wellness program using the Wilcoxon Signed Rank test at a 5% level of significance.
3. A small study (n = 10) is designed to assess whether there is an association between smoking in pregnancy and low birth weight. Low-birth-weight babies are those weighing less than 5.5 pounds at birth. The following data represent the birth weights (in pounds) of babies born to mothers who reported smoking in pregnancy and to those who did not.

Mother smoked in pregnancy                5.0  4.2  4.8  3.3  3.9
Mother did not smoke during pregnancy     5.1  4.9  5.3  5.4  4.6

Is there a significant difference in birth weights between mothers who smoked during pregnancy and those who did not? Run the appropriate test at a 5% level of significance.
4. The following data represent the number of playground injuries occurring among children aged 5 to 9 years over a 3-month period in 12 playgrounds in and around the neighborhoods of Boston. Playground injuries include fractures, internal injuries, lacerations, and dislocations. The question of interest is whether there are differences in the numbers of injuries at playgrounds in various locations. The data below represent the numbers of injuries recorded at four randomly selected playgrounds located on school properties, at day-care centers, and in residential neighborhoods.

School properties            39  51  42  29
Day-care centers             28  25  30  15
Residential neighborhoods    28  16  25  22

Run the appropriate test at a 5% level of significance.
5. The recommended daily allowance of Vitamin A for children between 1 and 3 years of age is 400 micrograms (mcg). Vitamin A deficiency is linked to a number of adverse health outcomes including poor eyesight, susceptibility to infection, and dry skin. The following are Vitamin A concentrations in children with and without poor eyesight, a history of infection, and dry skin.

With poor eyesight, a history of infection, and dry skin       270  420  180  345  390  430
Free of poor eyesight, a history of infection, and dry skin    450  500  395  380  430

Is there a significant difference in Vitamin A concentrations between children with and without poor eyesight, a history of infection, and dry skin? Run the appropriate test at a 5% level of significance.
6. A study is conducted to assess the potential benefits of an ayurvedic treatment to reduce high cholesterol. Seven patients agree to participate in the study. Each has their cholesterol measured at the start of the study and then again after 4 weeks taking a popular herb called arjuna (see Table 10-6). Is there a significant difference in total cholesterol after taking the herb? Use the Sign test at a 5% level of significance.

TABLE 10-6 Data for Practice Problem 6

Participant    Total Cholesterol    Total Cholesterol
               Before Treatment     After Treatment
1              250                  241
2              265                  260
3              240                  253
4              233                  230
5              255                  224
6              275                  227
7              241                  232

7. Using the data in Problem 6, assess whether there is a difference in total cholesterol after taking the herb using the Wilcoxon Signed Rank test at a 5% level of significance.
8. An investigator wants to test if there is a difference in endotoxin levels in children who are exposed to endotoxin as a function of their proximity to operating farms. The following are endotoxin levels in units per milligram of dust sampled from children’s mattresses, organized by children’s proximity to farms.

Within 5 miles      54  62  78  90  70
5–24.9 miles        28  42  39  81  65
25–49.9 miles       37  29  30  50  53
50 miles or more    36  19  22  28  27

Run the appropriate test at a 5% level of significance.


CHAPTER 11

Survival Analysis

In Chapter 11 of the textbook we presented techniques to analyze time-to-event data, or survival data. Because of the unique features of survival data, most specifically the presence of censoring, special statistical procedures are necessary to analyze these data. In survival analysis applications, it is often of interest to estimate the survival function, or survival probabilities over time. We presented two popular nonparametric techniques called the life table or actuarial table approach and the Kaplan–Meier approach to constructing cohort life tables or follow-up life tables. Both approaches generate estimates of the survival function which can be used to estimate the probability that a participant survives to a specific time (e.g., 5 or 10 years) or the median survival time. Microsoft Office® Excel® 2003 does not have a built-in analysis tool to construct a follow-up life table. Here we use Excel to program the formulas to estimate survival probabilities and to generate graphical displays of survival functions.
It is also often of interest to assess whether there are statistically significant differences in survival between groups, e.g., between competing treatment groups in a clinical trial, or between men and women, or between patients with and without a specific risk factor in an observational study. There are many statistical tests available; in Chapter 11 of the textbook we presented the log-rank test, which is a popular nonparametric test to compare survival between two independent groups. It makes no assumptions about the survival distributions and can be conducted relatively easily using life tables based on the Kaplan–Meier approach. Again, Excel does not have a built-in analysis tool to conduct the log-rank test. However, we use Excel to program the computation of the test statistic and to determine a p-value which can be used to assess the statistical significance of the difference in survival between groups.
In Chapter 11 of the textbook we also discussed Cox proportional hazards regression analysis, which is a popular multivariable technique to estimate the effect of several risk factors, considered simultaneously, on survival. Excel does not have the capability to estimate the parameters of the Cox proportional hazards model. Interested readers should see Allison¹ for details regarding the estimation of parameters in a Cox proportional hazards regression model using SAS®.

11.1 ESTIMATING THE SURVIVAL FUNCTION
There are several different ways to estimate a survival function or a survival curve. A number of popular parametric methods are used to model survival data, and they differ in terms of the assumptions that are made about the distribution of survival times in the population. In Chapter 11 of the textbook, we focused on two nonparametric methods, the actuarial or life table approach and the Kaplan–Meier approach, which make no assumptions about how the probability that a person develops the event changes over time. The approaches are summarized here and their implementation is illustrated using Excel.
With the actuarial or life table approach, we first organize the observed follow-up times into equally spaced intervals. We might, for example, consider 1-, 2-, or 5-year intervals depending on the duration of the follow-up. We then sum the number of participants who are at risk at the beginning of each interval, the number who suffer the event of interest, and the number who are censored or lost to follow-up in each interval. We compute the proportions who suffer the event of interest and who do not in each interval, and then we compute the survival probability. The notation we presented in Chapter 11 of the textbook is summarized as follows:

Nt = number of participants who are event-free and considered at risk during interval t
Dt = number of participants who suffer the event of interest during interval t
Ct = number of participants who are censored during interval t
Nt* = average number of participants at risk during interval t, Nt* = Nt – Ct/2
qt = proportion suffering the event of interest during interval t, qt = Dt/Nt*
pt = proportion remaining event-free during interval t, pt = 1 – qt
St = proportion remaining event-free past interval t, $S_{t+1} = p_{t+1} \times S_t$, where $S_0 = 1$.

With the Kaplan–Meier approach, we do not consider equally spaced intervals; instead, we re-estimate the survival probability at each observed event time. At each observed time (event time or censored time), we compute the number of participants at risk at that time (Nt), the number of deaths at that time (Dt), the number censored (Ct), and the survival probability (St). The survival probabilities are computed using $S_{t+1} = S_t \times ((N_{t+1} - D_{t+1})/N_{t+1})$. It is important to note that the calculations using the Kaplan–Meier approach are similar to those using the actuarial life table approach. The main difference is the time intervals. With the actuarial life table approach we consider equally spaced intervals, while with the Kaplan–Meier approach we use observed event times and censoring times.
Example 11.1. In Example 11.2 in the textbook, we analyzed a small prospective cohort study with death as the primary outcome. The study involved participants who were 65 years of age and older who were followed for up to 24 years. The study involved 20 participants (n = 20) who were enrolled over a period of 5 years and followed until death, until the study ended, or until they dropped out of the study (lost to follow-up). The data are shown in Table 11-1. We use Excel to construct a life table using the actuarial approach. The data are entered into Excel as shown in Figure 11-1.

TABLE 11-1 Year of Death or Year of Last Contact

Participant Identification Number    Year of Death    Year of Last Contact
1     24
2     3
3     11
4     19
5     24
6     13
7     14
8     2
9     18
10    17
11    24
12    21
13    12
14    1
15    10
16    23
17    6
18    5
19    9
20    17

To construct the life table, we create two new variables. The first is the observed time (either year of death or year of last contact). We create the time variable in column D, and we use the IF function to copy either the year of death or the year of last contact from column A or B, respectively, depending on which was measured. Specifically, in cell D2 we enter “=IF(A2>0, A2,B2)”. Recall that the IF function checks the specified criterion (in this case, whether A2>0). Here, if the criterion is met, then the value of A2 is placed in cell D2, otherwise the value of B2 is placed in cell D2. The formula is then copied from cell D2 into cell D3 through cell D21 (see the formula in the top menu bar for the entry in cell D21 in Figure 11-2).

FIGURE 11-1 Data for Life Table Using Actuarial Approach
FIGURE 11-2 Creating the Time Variable

We now create an indicator variable (coded 0 or 1) to indicate whether the participant suffered the event of interest (death in this example) or not. We again use the IF function, and if a year of death is recorded in column A we assign a 1; otherwise, we assign a 0 to indicate that the observed time is censored. The event variable is created in column E as shown in Figure 11-3.

FIGURE 11-3 Creating the Event Indicator

Using Excel, we now construct the life table using the time variable and the event indicator. (It is not necessary to sort the data to construct the table, although sorting the data by time does facilitate interpretation.)
To construct the life table, we first organize the follow-up times into equally spaced intervals. In this example we have a maximum follow-up of 24 years, and we consider 5-year intervals (0–4 years, 5–9 years, 10–14 years, 15–19 years, and 20–24 years). In Excel we now create two new variables that indicate the start and end of each interval. The start and end of the desired intervals are entered into columns G and H, respectively, as shown in Figure 11-4.

FIGURE 11-4 Entering the Start and End of Intervals for the Life Table

Next, we sum the number of participants who are alive at the beginning of each interval (Nt), and the number who die (Dt) and the number who are censored (Ct) in each interval. To compute these sums, we use the COUNTIF function. For example, to compute the number of participants alive at the beginning of each interval we specify “=COUNTIF(D2:D21,">="start of interval)”, where D2:D21 refers to the range of the observed times, and start of interval refers to the value in cell G2 through cell G6. We illustrate the computation of Nt for each interval in Figure 11-5.
Note in cell I6, we specify “=COUNTIF(D$2:D$21,">="&G6)”. We use absolute cell references to indicate the data range (i.e., D$2:D$21) so that when the formula is copied from cell I2 to cell I6 we continue to count the same data. The second argument to the COUNTIF function is the criterion we wish Excel to consider. Specifically, for each interval we want to count all times that are greater than or equal to the start of the interval. In row 6 this is indicated by ">="&G6, for example. To consider a cell reference in the criterion of the COUNTIF function, we include “&” before the cell address.
We now sum the number of participants who die (Dt) in each interval. We again use the COUNTIF function. However, we sum the number of participants with year of death in each interval as follows. For interval 1 (0–4 years), we sum the number of deaths that occur between 0 and 4 years of follow-up (i.e., if 0 ≤ year of death ≤ 4). To implement the two criteria (year of death greater than or equal to 0 years and year of death less than or equal to 4 years), we first sum the number of participants with year of death greater than or equal to 0 years and then subtract the number of participants with year of death greater than 4 years. Specifically, in cell J2 we enter “=COUNTIF(A$2:A$21,">="&G2) – COUNTIF(A$2:A$21,">"&H2)”. We then copy this formula from cell J2 into cell J3 through cell J6, as shown in Figure 11-6.
We use the same approach to sum the number of participants who are censored (Ct) in each interval. We again use the COUNTIF function. However, we sum the number of participants with year of last contact in each interval as follows. For interval 1 (0–4 years), we sum the number of participants with year of last contact between 0 and 4 years of follow-up (i.e., if 0 ≤ year of last contact ≤ 4). Specifically, in cell K2 we enter “=COUNTIF(B$2:B$21,">="&G2) – COUNTIF(B$2:B$21,">"&H2)”. We then copy this formula from cell K2 into cell K3 through cell K6, as shown in Figure 11-7.
We now compute Nt*, the average number of participants at risk in each interval using Nt* = Nt – Ct/2. This is shown in Figure 11-8.
Next we compute the proportion who suffer the event of interest in each interval (in this example, the proportion who die), qt = Dt/Nt*. This is shown in Figure 11-9.
Next we compute the proportion who remain event-free in each interval, pt = 1 – qt. This is shown in Figure 11-10.

FIGURE 11-5 Computing the Number at Risk, Nt
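For readers who want to check the worksheet logic outside Excel, the time variable, the event indicator, and the number at risk can be sketched in Python. The records below are hypothetical (they are not the data in Table 11-1), and `None` stands in for a blank cell; the comments map each step to the Excel formula it mirrors.

```python
# Hypothetical records: (year_of_death, year_of_last_contact); None marks a blank cell.
records = [(3, None), (None, 7), (12, None), (None, 12), (None, 18), (21, None)]

# Column D analogue: observed time, mirroring =IF(A2>0, A2, B2).
times = [death if death is not None else last for death, last in records]

# Column E analogue: event indicator, 1 = death recorded, 0 = censored.
events = [1 if death is not None else 0 for death, _ in records]

# Columns G and H analogue: start and end of each 5-year interval.
intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]

# Column I analogue: mirrors =COUNTIF(D$2:D$21, ">="&G2), the number at risk
# at the start of each interval.
n_at_risk = [sum(1 for t in times if t >= start) for start, _ in intervals]

print(times, events, n_at_risk)
```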


FIGURE 11-6 Computing Number of Events, Dt



FIGURE 11-7 Computing the Number Censored, Ct
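The two-criteria count used for Dt and Ct (a count of values at or above the interval start, minus a count of values beyond the interval end) can be sketched in Python as follows. The helper name `count_in_interval` and the data are hypothetical; blank cells are represented by `None` and skipped, on the assumption that COUNTIF ignores empty cells for numeric criteria.

```python
def count_in_interval(values, start, end):
    """Count recorded values with start <= value <= end as a difference of two one-sided counts."""
    recorded = [v for v in values if v is not None]            # skip blank cells
    at_least_start = sum(1 for v in recorded if v >= start)    # like COUNTIF(range, ">="&start)
    beyond_end = sum(1 for v in recorded if v > end)           # like COUNTIF(range, ">"&end)
    return at_least_start - beyond_end


# Hypothetical column A (year of death) and column B (year of last contact).
death_years = [3, None, 12, None, None, 21]
last_contact_years = [None, 7, None, 12, 18, None]
intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]

deaths = [count_in_interval(death_years, s, e) for s, e in intervals]          # D_t
censored = [count_in_interval(last_contact_years, s, e) for s, e in intervals]  # C_t
print(deaths, censored)
```

Every participant should fall in exactly one interval, so the deaths and censored counts together should add up to the sample size.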

FIGURE 11-8 Computing the Average Number at Risk, Nt*


FIGURE 11-9 Computing the Proportion who Suffer Event, qt

FIGURE 11-10 Computing Proportion who Remain Event-Free, pt


FIGURE 11-11 Computing the Survival Probabilities, St
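The remaining columns of the actuarial life table (Nt*, qt, pt, and St) follow directly from the counts. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the counts are made-up values rather than those of Example 11.1, and every interval is assumed to have Nt* greater than zero.

```python
def actuarial_life_table(n_at_risk, deaths, censored):
    """Return one dict per interval with N*, q, p, and the cumulative survival S."""
    rows = []
    survival = 1.0  # S0 = 1
    for n_t, d_t, c_t in zip(n_at_risk, deaths, censored):
        n_star = n_t - c_t / 2       # average number at risk, Nt* = Nt - Ct/2
        q_t = d_t / n_star           # proportion suffering the event (assumes n_star > 0)
        p_t = 1 - q_t                # proportion remaining event-free
        survival *= p_t              # S for this interval = p_t times the previous S
        rows.append({"n_star": n_star, "q": q_t, "p": p_t, "s": survival})
    return rows


# Hypothetical counts for five intervals.
table = actuarial_life_table(
    n_at_risk=[6, 5, 4, 2, 1],
    deaths=[1, 0, 1, 0, 1],
    censored=[0, 1, 1, 1, 0],
)
for row in table:
    print(row)
```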

The final step is to compute the survival probabilities, St, using $S_{t+1} = p_{t+1} \times S_t$. Recall, $S_0 = 1$ and thus we compute the survival probability for the first interval by specifying “=N2*1” in cell O2. The survival probabilities for the subsequent intervals use the formula $S_{t+1} = p_{t+1} \times S_t$, as shown in Figure 11-11.
While it is tedious to construct the life table the first time, once the structure is entered into Excel, construction of life tables for new datasets is relatively easy. Only the data range in the formulas must be modified.
Example 11.2. Consider a prospective cohort study where 30 participants (n = 30) are enrolled and followed until time of death, until the study ends, or until they drop out of the study (i.e., are lost to follow-up). The data are shown in Table 11-2, and follow-up times are measured in years.

TABLE 11-2 Year of Death or Year of Last Contact

Participant Identification Number    Year of Death    Year of Last Contact
1     19
2     21
3     6
4     34
5     7
6     23
7     12
8     18
9     11
10    23
11    12
12    28
13    29
14    32
15    17
16    19
17    21
18    29
19    19
20    30
21    30
22    16
23    21
24    21
25    28
26    18
27    3
28    24
29    9
30    27

We use Excel to construct a life table using the actuarial approach and the template we developed for Example 11.1. We first copy the worksheet with all of the formulas we developed for Example 11.1 and enter the data above into the worksheet as shown in Figure 11-12.

FIGURE 11-12 Data for Life Table Using Actuarial Approach

Notice that we copied the formulas from cells D21 and E21 into cells D22 and E22 through cells D31 and E31 (see Figure 11-3) to accommodate the larger sample size. Because the follow-up times extend to 34 years, we need to add two additional time intervals to our life table. Specifically, we add 25–29 and 30–34 years to cells G7–H7 and G8–H8, respectively. In addition, we must update the data range from 21 rows to 31 rows for the computations of Nt, Dt, and Ct in columns I, J, and K, respectively (see Figure 11-5 through Figure 11-7). Finally, we copy the formulas to compute Nt*, qt, pt, and St in cell L6 through cell O6 into cell L7 through cell O8. The life table for the data in Example 11.2 is shown in Figure 11-13.
We now illustrate the Kaplan–Meier approach to estimate survival probabilities using Excel.
Example 11.3. Consider again Example 11.1 (Example 11.2 in the textbook), where we analyzed a small prospective cohort study with death as the primary outcome. The study involved 20 participants (n = 20) who were 65 years of age and older, who were followed for up to 24 years until they died, until the study ended, or until they dropped out of the study (were lost to follow-up). The data are shown in Table 11-1.
We now use Excel to construct a life table using the Kaplan–Meier approach. The data were entered into Excel as shown in Figure 11-1 and the two new variables, time and an indicator of event status (event), were created as shown in Figure 11-2 and Figure 11-3, respectively.
The first step is to sort the data by the time variable (column D). This is done by highlighting the data (column A through column E) and choosing Sort under the Data option from the top menu bar, as shown in Figure 11-14.
Once we select the Sort option, Excel presents the dialog box shown in Figure 11-15. We specify that the primary sort variable is time and that we want the data sorted by time in ascending order. We specify that we then wish to sort the data by the variable event and that we want the data also sorted by event in descending order. This is shown in Figure 11-15. Notice that we also indicate that there is a Header row in the dataset (option at bottom of dialog box). When we indicate that there is a Header row, Excel unselects the first row. When we click OK, Excel produces the data shown in Figure 11-16.

FIGURE 11-14 Sorting the Data by Time and Event
FIGURE 11-15 Sorting by Time in Ascending Order
FIGURE 11-16

Using the sorted data, in particular the time and event variables, we now construct the life table using the Kaplan–Meier approach. Specifically, we compute the number at risk, Nt; the number of deaths, Dt; the number censored, Ct; and the survival probability, St, for each time. First, we compute the number at risk, Nt, using the COUNTIF function. Specifically, for each time, we count the number of participants whose observed time is greater than or equal to the current time. For example, at time 1 (1 year), there are 20 participants at risk (i.e., 20 participants who survive one year or more). To compute the number at risk at the first observed time, we enter the following into cell G2: “=COUNTIF(D$2:D$21,">="&D2)”. We then copy this formula from cell G2 into cell G3 through cell G21 as shown in Figure 11-17.
Next we compute the number of events (deaths) at each time, Dt. This is done using the IF function. Specifically, at each time, if the event (column E) occurs (event = 1), then we count that event. If the event does not occur (event = 0), we do not. In cell H2 we enter “=IF(E2=1,1,0)”. Recall that the IF function checks the specified criterion (in this case whether E2 = 1). Here, if the criterion is met then a “1” is placed in cell H2, otherwise a “0” is placed in cell H2. This is shown in Figure 11-18, and the formula is copied from cell H2 into cell H3 through cell H21.
Next we compute the number of participants who are censored at each time, Ct. This is again done using the IF function. Specifically, at each time, if the event (column E) is coded 0 (censored) then we count that event as censored. For example, in cell I2 we enter “=IF(E2=0,1,0)”. This is shown in Figure 11-19, and the formula is copied from cell I2 into cell I3 through cell I21.
The final step is to compute the survival probabilities using $S_{t+1} = S_t \times ((N_{t+1} - D_{t+1})/N_{t+1})$. Before computing the survival probabilities at each time, we first insert a row to represent the start of the study (i.e., baseline or time = 0) using the Insert Row option on the top menu bar. In cell D2 we enter 0 (i.e., time = 0), and in cell J2 we enter 1 (i.e., survival probability = 1, $S_0 = 1$). The survival probabilities for each subsequent time are then computed using the formula $S_{t+1} = S_t \times ((N_{t+1} - D_{t+1})/N_{t+1})$, as shown in Figure 11-20 (see formula in the top menu bar). Notice that the survival probabilities only change when there are observed events (in this example, deaths). Censored times do not affect the estimates of the survival probabilities.
Again, while it is tedious to construct the life table using the Kaplan–Meier approach the first time, once the structure is entered into Excel, construction of life tables for new datasets is relatively easy. The new data need to be entered and sorted, and only the data range in the formulas needs to be modified.
In Chapter 11 of the textbook, we also computed standard errors of the survival estimates. Several formulas are used to produce standard errors; we discussed Greenwood’s formula,

$$SE(S_t) = S_t \sqrt{\sum \frac{D_t}{N_t (N_t - D_t)}}$$

where the quantity $\frac{D_t}{N_t (N_t - D_t)}$ is summed for numbers at risk (Nt) and numbers of deaths (Dt) occurring through the time of interest (i.e., cumulative, across all times before the time of interest). In Example 11.4 we use Excel to compute standard errors for the survival estimates for the data in Example 11.3. We also produce 95% confidence intervals for the survival probabilities using $S_t \pm 1.96 \times SE(S_t)$.
Example 11.4. Consider again Example 11.3, where we analyzed a small prospective cohort study with death as the primary outcome. The study involved 20 participants (n = 20) who were 65 years of age and older who were followed for up to 24 years until they died, until the study ended, or until they dropped out of the study (i.e., were lost to follow-up). Using Excel, we estimated the survival function as shown in Figure 11-20. We now use Excel to compute standard errors and 95% confidence limits for the estimates of the survival probabilities.
Before we begin to add the computations, we first hide column A through column D of the spreadsheet to allow for better visualization of the additional columns needed for the computations of the standard errors and confidence limits.

FIGURE 11-18 Computing the Number of Events (Deaths), Dt
FIGURE 11-19 Computing the Number Censored, Ct

FIGURE 11-20 Computing the Survival Probabilities, St
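The Kaplan–Meier recursion can be sketched in Python as a cross-check on the worksheet. This sketch is not the workbook's layout: it uses one row per distinct observed time rather than one row per participant (so tied times are grouped), the function name is hypothetical, and the data are made up for illustration.

```python
def kaplan_meier(times, events):
    """Return (time, N_t, D_t, C_t, S_t) for each distinct observed time, in ascending order."""
    table = []
    survival = 1.0  # S0 = 1 at baseline (time 0)
    for t in sorted(set(times)):
        n_t = sum(1 for x in times if x >= t)                            # number at risk
        d_t = sum(1 for x, e in zip(times, events) if x == t and e == 1)  # deaths at t
        c_t = sum(1 for x, e in zip(times, events) if x == t and e == 0)  # censored at t
        survival *= (n_t - d_t) / n_t   # S changes only when d_t > 0
        table.append((t, n_t, d_t, c_t, survival))
    return table


# Hypothetical observed times and event indicators (1 = death, 0 = censored).
demo_times = [3, 7, 12, 12, 18, 21]
demo_events = [1, 0, 1, 0, 0, 1]
km_table = kaplan_meier(demo_times, demo_events)
for row in km_table:
    print(row)
```

In this made-up sample the survival estimate is unchanged at times 7 and 18, where only censoring occurs, which matches the behavior noted for the worksheet.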


FIGURE 11-21 Hiding Columns


This is done using the Hide Column option under Format on the top menu bar as shown in Figure 11-21.
We now create the quantity $D_t / [N_t (N_t - D_t)]$ at each time as shown in Figure 11-22 (see the formula in cell K22 showing in the top menu bar).
Next we sum the quantities $D_t / [N_t (N_t - D_t)]$ in column K through each time of interest (i.e., cumulatively, across all times up to and including the time of interest). This is done with a running sum. For example, in cell L3 we enter “=K3”. In cell L4, we enter “=K4+L3”. We then copy the formula in cell L4 into cell L5 through cell L22, as shown in Figure 11-23.
The next step is to compute the standard error for each time, $SE(S_t) = S_t \sqrt{\sum D_t / [N_t (N_t - D_t)]}$. Specifically, we multiply the survival probability (column J) by the square root of the cumulative sum $\sum D_t / [N_t (N_t - D_t)]$ (column L) to produce the standard errors at each time. This is shown in Figure 11-24.
The next step is to compute the margins of error for the 95% confidence intervals using $1.96 \times SE(S_t)$. This is shown in Figure 11-25.

FIGURE 11-22 Computing the Standard Errors
FIGURE 11-23 Computing the Standard Errors
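The column K, column L, and standard error steps described above can be sketched in a few lines of Python. This is only an illustration of the same arithmetic (often called Greenwood's formula). The function name and the life-table values are hypothetical, and the handling of a row in which everyone at risk dies is an assumption, since that term would divide by zero.

```python
import math


def greenwood_standard_errors(at_risk, deaths, survival):
    """SE(S_t) = S_t * sqrt(running sum of D_t / (N_t * (N_t - D_t)))."""
    standard_errors = []
    running_sum = 0.0  # plays the role of column L
    for n, d, s in zip(at_risk, deaths, survival):
        if n > d:
            running_sum += d / (n * (n - d))  # column K term
        # If n == d the term is undefined (division by zero); S_t is then 0,
        # so the standard error is reported as 0 here (an assumption).
        standard_errors.append(s * math.sqrt(running_sum))
    return standard_errors


# Hypothetical life-table columns (not the workbook's data)
at_risk = [5, 4, 3, 2, 1]
deaths = [1, 0, 1, 0, 0]
survival = [0.8, 0.8, 0.8 * 2 / 3, 0.8 * 2 / 3, 0.8 * 2 / 3]

se = greenwood_standard_errors(at_risk, deaths, survival)
margins = [1.96 * value for value in se]  # margins of error for 95% limits
for s, e, m in zip(survival, se, margins):
    print(round(s, 4), round(e, 4), round(m, 4))
```

As in the worksheet, rows without deaths add nothing to the running sum, so their standard errors repeat the value from the previous row.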


FIGURE 11-25 Computing the Margins of Error


FIGURE 11-26 95% Confidence Limits for the Survival Probabilities

The last step is to produce the 95% confidence limits using the formula point estimate ± margin of error. In this case, we use $S_t \pm 1.96 \times SE(S_t)$. In Figure 11-26 we create two new variables representing the lower and upper limits of the 95% confidence interval, respectively. The lower limit is computed by subtracting the margin of error (column N) from the survival probability (column J), and the upper limit is computed by adding the margin of error (column N) to the survival probability (column J). The limits are computed in Figure 11-26.

FIGURE 11-27 Data for the Graphical Display
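The point estimate ± margin of error step can be sketched as follows. The function name, the `truncate` option, and the input values are hypothetical. By default the sketch keeps the raw limits, as the worksheet does at this stage; the optional truncation to the range 0 to 1 mirrors the recoding the workbook applies later when the limits are plotted.

```python
def confidence_limits(survival, std_errors, z=1.96, truncate=False):
    """Return (lower, upper) limits S_t - z*SE and S_t + z*SE for each time."""
    limits = []
    for s, se in zip(survival, std_errors):
        margin = z * se
        lower, upper = s - margin, s + margin
        if truncate:
            # Probabilities cannot fall outside 0 to 1
            lower, upper = max(0.0, lower), min(1.0, upper)
        limits.append((lower, upper))
    return limits


# Hypothetical survival probabilities and standard errors
survival = [0.95, 0.80]
std_errors = [0.049, 0.100]

raw = confidence_limits(survival, std_errors)
plotted = confidence_limits(survival, std_errors, truncate=True)
print(raw)
print(plotted)
```

With these illustrative inputs the first raw upper limit exceeds 1.0, which is the situation the truncation option is meant to handle.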
11.2 PLOTTING A SURVIVAL FUNCTION
A graphical display of the survival function is a very useful
means of reporting or presenting survival information. A
graphical display of the Kaplan–Meier survival curve can be
produced from the life table we created using Excel.
Example 11.5. Consider again Example 11.3 where we analyzed a small prospective cohort study with death as the primary outcome. The study involved 20 participants (n = 20) who were 65 years of age and older who were followed for up to 24 years until they died, until the study ended, or until they dropped out of the study (i.e., were lost to follow-up). Using Excel we estimated the survival function as shown in Figure 11-20. We now use Excel to create a graphical display of the survival function.
In Chapter 11 of the textbook, we presented graphical displays of survival functions that we produced in Excel, with time along the x-axis and the survival probabilities along the y-axis. In order to produce the graphical displays, which take the form of step functions, some manipulation of the data is required. Here we detail the steps needed to produce the displays.
The data for the graphical display are shown in Figure 11-27. The key data elements are the time (column D) and the survival probability (column J). Notice that the observed data (entered in column A and column B) as well as a blank column C are hidden.
Excel has a number of built-in graphical displays which are available using the Chart Wizard (see Chapter 3 of the workbook for examples using the Chart Wizard). We use the Chart Wizard here to produce the displays of the survival function. However, we first need to do some formatting of the data. To illustrate why the formatting is needed, consider the following. Suppose we open the Chart Wizard by clicking the chart icon in the top menu bar. This is shown in Figure 11-28.

FIGURE 11-28 Using the Chart Wizard
Once we select the Chart Wizard, Excel opens a dialog box with various options. We first select the chart type. In this case, we select the XY (Scatter) type from the list on the left and then the bottom left option from the chart sub-types. Once we click Next, Excel then asks for the range of the data for the display (see Figure 11-29). In the Data Range input field, we specify “D1:D22,J1:J22”. The data for the x-axis (time) are contained in column D, and the data for the y-axis (survival probability) are contained in column J.
As soon as the data range is entered, Excel shows a preliminary version of the graph in the upper portion of the dialog box. Once we click Next, Excel then asks for a chart title as well as labels for the x- and y-axes (see Figure 11-30).

FIGURE 11-29 Data for the Display
FIGURE 11-30 Specifying a Title and Labels for the Axes
FIGURE 11-31 Specifying a Location for the Display

We enter the chart title and the labels for the x- and y-
axes. When we click Next, Excel then asks where we would like
to place the display. The options are a new worksheet in the
current Excel workbook or an object (an overlay) on the cur-
rent worksheet. In Figure 11-31 we select a new worksheet and
provide a name for the new worksheet. We click Finish, and
Excel generates the new worksheet containing the display
shown in Figure 11-32.
We will make a number of formatting changes in the dis-
play prior to presenting it. However, notice that the display is
not a step function. The survival curve connects observed
changes in survival over time using diagonal lines (i.e., using
interpolation). We do not wish to interpolate; instead, we want
to show steps, i.e., vertical lines from an observed survival
probability connecting to a horizontal line at the next observed
survival probability, taking a shape like an “L” at each transi-
tion point.
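The "L" shape described above amounts to adding one extra point at each change in survival probability. A minimal sketch follows, using the rule the workbook applies to its inserted rows (time from the row above, survival probability from the row below). The function name and the data are hypothetical.

```python
def step_points(times, survival):
    """Expand (time, survival) pairs so straight-line plotting draws steps."""
    xs, ys = [], []
    for i, (t, s) in enumerate(zip(times, survival)):
        if i > 0 and s != survival[i - 1]:
            # Extra point: time from the row above, probability from this row
            xs.append(times[i - 1])
            ys.append(s)
        xs.append(t)
        ys.append(s)
    return xs, ys


# Hypothetical life-table columns (time and survival probability)
times = [0, 1, 2, 3]
survival = [1.0, 0.8, 0.8, 0.5]
xs, ys = step_points(times, survival)
print(list(zip(xs, ys)))
```

Connecting these points with straight lines gives vertical and horizontal segments only. Some plotting tools instead hold the previous probability until the next event time; that variant would take the time from the row below and the probability from the row above.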

FIGURE 11-32 The Display


In order to produce a step function, we must format the data before invoking the Chart Wizard as follows. Using the data shown in Figure 11-27, we first must replace the formulas that we used to compute the survival probabilities with their calculated values. We do this so that, in the process of formatting the data to generate the plot, we do not inadvertently change the estimates of the survival probabilities. Specifically, we want to replace the formulas currently in cell G2 through cell J21 with their computed values. This is done by highlighting these cells and clicking on the copy icon along the top menu bar. This generates a flashing dashed line around the highlighted cells. We then click on the arrow next to the paste icon along the top menu bar and select the Values option as shown in Figure 11-33.
The next step is to format the data so that, at each transition point (i.e., change in survival probability), the graphical display shows a step. This is done by inserting a new row in between each transition point (which is equivalent to inserting a row before each observed event (death) in the dataset). This is done by placing the cursor on a specific row and selecting the Insert Row option from the top menu bar as shown in Figure 11-34. The resulting worksheet after inserting rows at each transition point is shown in Figure 11-35.
The next step involves inserting data (times and survival probabilities) into the newly inserted rows. Specifically, in each newly inserted (blank) row, we copy the time from the cell above and the survival probability from the cell below. For example, in cell D3 we enter “=D2” (the time from cell D2) and in cell J3 we enter “=J4” (the survival probability from cell J4). The same procedure is followed for rows 6, 8, 16, 18, and 24. For example, in cell D24 we enter “=D23” and in cell J24 we enter “=J25”. The updated worksheet is shown in Figure 11-36.
Using the updated worksheet, we follow the same steps illustrated in Figure 11-28 through Figure 11-32. Note that the
data range is now larger to include the newly inserted rows. The data for the x-axis now reside in cell D1 through cell D28, and the data for the y-axis now reside in cell J1 through cell J28. Thus, when specifying the data range (see Figure 11-29), we indicate “D1:D28,J1:J28”. Following the steps outlined above, we produce the display shown in Figure 11-37.

FIGURE 11-33 Replacing Formulas with Their Calculated Values
FIGURE 11-34 Inserting a Row
FIGURE 11-35 Inserting Rows at Each Transition Point

We now format the display for presentation. First, we change the background from grey to white (or no background color). This is done by double-clicking on any part of the background, which opens up the dialog box shown in Figure 11-38. Under the Area section, we select None.
Next, we remove the legend by clicking on the legend box and clicking Delete. We do the same to the horizontal lines in the display. If we click on any of the lines and click Delete, all of the horizontal lines are removed. Finally, we format the y-axis from 0 to 1. This is done by double-clicking anywhere along the y-axis, which brings up the dialog box shown in Figure 11-39. Under the Scale tab, we specify the maximum as 1.0 (instead of 1.2, which was the default). Clicking OK updates the y-axis as shown in Figure 11-40.
The display can now be easily copied into papers, reports, or other presentations. In the survival curve shown in Figure
11-40, the symbols represent each event time, either a death or a censored time.

FIGURE 11-37 The Display
FIGURE 11-38 Formatting the Display

Sometimes it is of interest to plot the estimates of the survival probabilities (as shown in Figure 11-40) along with 95% confidence limits. This is done using exactly the same approach used to plot the survival function. Specifically, we first generate the estimates of the survival probabilities and the 95% confidence limits as shown in Example 11.4 and in Figure 11-26. We then format the data to produce the step functions (in this case for the estimates as well as the upper and lower limits of the confidence interval). We illustrate the approach in the next example.
Example 11.6. Consider again the data in Example 11.4, where we generated estimates of survival probabilities and 95%
confidence limits. The results of the computations are shown in Figure 11-26. To facilitate interpretation, we first hide the columns in the worksheet that are not directly related to the graphical display. We then replace the formulas in the columns of the worksheet that are still showing by their calculated values. This is done by first highlighting the cells (columns D, J, O, and P in Figure 11-41), clicking on the copy icon on the top menu bar, and then selecting the Values option next to the paste icon on the top menu bar. Next we insert rows at each transition in survival probability (to produce a step function) as shown in Figure 11-41.

FIGURE 11-39 Rescaling the y-Axis

Because survival probabilities are between 0 and 1, inclusive, we recode any confidence limits that fall outside that range. Specifically, we recode the estimates of the upper limits from above 1.0 to
1.0 as needed. This is shown in Figure 11-42. In addition, in each newly inserted (blank) row, we copy the time from the cell above and the survival probability and the lower and upper limits of the confidence interval from the cells below. For example, in cell D3 we enter “=D2” (the time from cell D2) and in cells J3, O3, and P3 we enter “=J4”, “=O4”, and “=P4”, respectively. The same procedure is followed for rows 6, 8, 16, 18, and 24. For example, in cell D24 we enter “=D23”, and in cells J24, O24, and P24 we enter “=J25”, “=O25”, and “=P25”, respectively. The updated worksheet is shown in Figure 11-42.

FIGURE 11-40 Plot of the Survival Function

We again use the Chart Wizard to create the display. The easiest way to produce the plot is to highlight the data shown in Figure 11-42 (specifically, the data for the x-axis in column D and the data for the y-axis in column J, column O, and column P). With the data highlighted, we click on the Chart
Wizard icon as shown in Figure 11-43, and we select the XY (Scatter) plot under Chart Types and the chart subtype showing in the bottom left corner.

FIGURE 11-41 Inserting Rows at Each Transition Point
FIGURE 11-42 Data to Display Survival Probabilities and 95% Confidence Limits

When we click Next, Excel produces a preliminary version of the display, and Excel then asks for a chart title as well as labels for the x- and y-axes (see Figure 11-44).
Once we enter the title and labels, Excel prompts for a location for the display (see Figure 11-31). If we select a new worksheet as the location, Excel produces the display shown in Figure 11-45.
Again, there are a number of formatting changes that we make prior to presenting the display. First, we change the background color to white or to no color, remove the legend and horizontal lines, and rescale the y-axis to a maximum of 1.0. (All of these changes were illustrated in Example 11.5.) The formatted display is shown in Figure 11-46.
An additional formatting step is needed to change the colors of the lines (the estimated survival probability and the lower and upper limits of the 95% confidence interval) to black. This is done by clicking on any part of the line (one at a time). For example, suppose we click on the estimated survival probability line (the middle line shown in Figure 11-46). This brings up the dialog box shown in Figure 11-47, where we specify a new line color of black under the Line option in the Patterns tab. We also change the color of the marker (both the foreground and the background) to black using the same approach. This is done under the Markers option in the Patterns tab.
When we click OK, Excel updates the color scheme. We do the same for the upper and lower limits of the 95% confidence interval. We also perform one additional step in formatting the confidence limits. We change the style of the lines to dashed
(as opposed to solid) to distinguish the estimates of the survival probabilities (shown in solid black) from the confidence limits (shown in dashed black). The style of the lines can be changed by clicking on the arrow to the right of the Style box under the Patterns tab shown in Figure 11-47. The final display is shown in Figure 11-48.
Notice that we also select triangles as markers for both the upper and lower limits of the 95% confidence interval.

FIGURE 11-46 The Formatted Display
FIGURE 11-47 Changing the Line Color

11.3 COMPARING SURVIVAL CURVES
In many survival analysis applications, we are interested in assessing whether there are differences in survival among different groups of participants. For example, in a clinical trial with a survival outcome, we are often interested in comparing survival between participants receiving a new drug as compared to a placebo (or other appropriate comparator). In an observational study, we might be interested in comparing survival between men and women, or between participants with and without a particular risk factor (e.g., hypertension or diabetes).
There are several tests available to compare survival among independent groups. In Chapter 11 of the textbook we presented the log-rank test, which is a popular test to compare survival between two or more independent groups. The test compares the entire survival experience between groups and can be thought of as a test of whether the survival curves are identical (overlapping) or not. Survival curves are estimated for each group, considered separately, using the Kaplan–Meier approach and are compared statistically using the log-rank test. The log-rank test is computed using the five-step approach for hypothesis testing. The test statistic for the log-rank test is as follows: $\chi^2 = \sum \frac{(\sum O_{jt} - \sum E_{jt})^2}{\sum E_{jt}}$, where $\sum O_{jt}$ represents the
sum of the observed number of events in the jth group (e.g., j = 1, 2) and $\sum E_{jt}$ represents the sum of the expected number of events in the jth group over time. The test statistic is approximately distributed as $\chi^2$. Using Excel we can compute the value of the test statistic and compute a p-value using the CHIDIST function. The log-rank test statistic has degrees of freedom equal to k – 1, where k represents the number of comparison groups.

FIGURE 11-48 Estimates of Survival Probabilities with 95% Confidence Limits
FIGURE 11-49 Data for the Log-Rank Test
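Once the observed and expected totals are available for each group, the test statistic and its p-value are short computations. The following is a minimal sketch under stated assumptions: the totals are hypothetical, the function names are illustrative, and the p-value helper covers only the two-group case (1 degree of freedom), where the upper-tail chi-square probability that CHIDIST returns can be written with the complementary error function.

```python
import math


def log_rank_chi2(observed, expected):
    """Sum over groups of (observed total - expected total)^2 / expected total."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_p_value_df1(chi2):
    """Upper-tail p-value for 1 degree of freedom, comparable to CHIDIST(chi2, 1)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical totals of observed and expected events in two groups
observed = [12, 8]
expected = [9.5, 10.5]

chi2 = log_rank_chi2(observed, expected)
p_value = chi2_p_value_df1(chi2)
print(round(chi2, 3), round(p_value, 3))
```

For more than two groups the degrees of freedom are k – 1 and a general chi-square routine would be needed in place of the 1-degree-of-freedom helper.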
Example 11.7. In Example 11.3 in the textbook we analyzed a small clinical trial comparing two combination treatments in patients with advanced gastric cancer. Twenty participants with stage IV gastric cancer who consented to participate in the trial were randomly assigned to receive chemotherapy before surgery or chemotherapy after surgery. The primary outcome was death and participants were followed for up to 48 months (4 years) following enrollment into the trial. The experiences of participants in each arm of the trial are shown in Table 11-3. The question of interest is whether there is a difference in survival between the two treatments. The test is run at a 5% level of significance and the hypotheses are as follows:

$H_0$: The two survival curves are identical (or $S_{1t} = S_{2t}$).
$H_1$: The two survival curves are not identical (or $S_{1t} \neq S_{2t}$ at any time t).

TABLE 11-3 Month of Death or Month of Last Contact in Each Treatment Group

Chemotherapy Before Surgery      Chemotherapy After Surgery
Month of     Month of            Month of     Month of
Death        Last Contact        Death        Last Contact
8            8                   33           48
12           32                  28           48
26           20                  41           25
14           40                               37
21                                            48
27                                            25
                                              43

FIGURE 11-50 Creating New Variables

The data are entered into Excel as shown in Figure 11-49. To prepare the data for the computations, we now create two new variables, event times and group. The event times are the times of the observed events (deaths in this example). Group indicates whether the events occurred in group 1 or in group 2. The two new variables are shown in Figure 11-50. Notice that the event times are not sorted; they are simply copied from column A and column D. The next step is to sort the data. Specifically, we sort the data by the two new variables, first by event times in ascending order and then by group in ascending order. This is done by highlighting the data (column A through column H) and selecting Sort under the Data option on the top menu bar. Once we select the Sort option, Excel presents the dialog box shown in Figure 11-51. We spec-
© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


95313_CH11_157_196.qxd 3/23/11 3:39 PM Page 184

184 Survival Analysis

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
Next, we want to compute the numbers of participants at
FIGURE 11-51 Sorting the Data risk in each group at each observed event time. To do this, we
use the COUNTIF function. To compute the numbers of par-
ticipants in group 1 at risk at each event time, N1t, we count the
© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
numbers of participants in group 1 with observed times (either
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
month of death or month of last contact) that are greater than
or equal to each observed event time in column G. Specifically,
in cell J2 we enter “=COUNTIF(A$2:B$7,“>=”&G2)”. Notice
that we specify the range of the data for group 1 as cell A$2
© Jones & Bartlett Learning, LLC through cell B$7 © Jones
using & cell
absolute Bartlett Learning,
references LLC
so that we con-
NOT FOR SALE OR DISTRIBUTION tinue to refer to NOT FOR SALE OR DISTRIBUTION
the same data range when we copy the formula
from cell J2 to cell J3 through cell J10. The numbers of partic-
ipants in group 1 at risk at each event time are shown in Figure
11-53.
We use the same approach to compute the numbers of
© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
participants in group 2 at risk at each event time, N2t. We count
NOT FOR SALE OR DISTRIBUTION NOT FOR of
the numbers SALE OR DISTRIBUTION
participants in group 2 with observed times (ei-
ther month of death or month of last contact) that are greater
than or equal to each observed event time in column G.
Specifically, in cell K2 we enter “=COUNTIF(D$2:E$8,“>=”
© Jones & Bartlett Learning, LLC
&G2)”. Again, we specify the range© of Jones & Bartlett
the data for group 2 as Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOTcell
cell D$2 through cell E$8 using absolute FOR SALE
references OR DISTRIBUTION
so that
we continue to refer to the same data range when we copy the
formula from cell K2 to cell K3 through cell K10. The numbers
ify that the primary sort variable is event times and that we of participants in group 2 at risk at each event time are shown
want the data sorted by event times in ascending order. We in Figure 11-54.
© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
In the next step, we compute the numbers of observed
specify that we then want the data sorted by group in ascend-
NOT FOR SALE OR DISTRIBUTION NOT OFOR
events in each group, SALE OR DISTRIBUTION
ing order. This is shown in Figure 11-51. Notice that we also in- 1t and O2t. These are computed in
dicate that there is a header row in the dataset using the option columns L and M using the IF function. Specifically, in col-
at the bottom of dialog box. When we indicate that there is a umn L we compute the numbers of observed events in group
Header row, Excel unselects the first row. 1 by specifying “=IF(H2=1,1,0)” in cell L2 and copying this
© Jones &
TheBartlett Learning,
sorted data are shown inLLC
Figure 11-52. © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION
FIGURE 11-52 Sorted Data FIGURE 11-53 Computing the Number of
Participants in Group 1 at Risk at Each Event Time

formula into cell L3 through cell L10. Similarly, in column M we compute the numbers of observed events in group 2 by specifying =IF(H2=2,1,0) in cell M2 and copying this formula into cell M3 through cell M10. The numbers of observed events in each group are shown in Figure 11-55.

FIGURE 11-54 Computing the Number of Participants in Group 2 at Risk at Each Event Time

Next, we compute the total number of participants at risk at each event time, Nt, by summing the numbers of participants at risk across groups (i.e., summing N1t and N2t) at each event time. This is shown in Figure 11-56.

Next, we compute the total number of observed events at each event time, Ot, by summing the numbers of observed events across groups (i.e., summing O1t and O2t) at each event time. This is shown in Figure 11-57.

Next we compute the expected number of events in each group, E1t and E2t, at each event time using: E1t = N1t × (Ot/Nt) for group 1 and E2t = N2t × (Ot/Nt) for group 2. Specifically, in column P we compute the expected number of events in group 1 by entering =J2*(O2/N2) in cell P2 and copying this formula into cell P3 through cell P10. This is shown in Figure 11-58. We do the same for group 2 by entering =K2*(O2/N2) in cell Q2 and copying this formula into cell Q3 through cell Q10.

The final step is to compute the test statistic. However, to do so we need the total number of observed events in each group (i.e., the sums of O1t and O2t, respectively) and the total number of expected events in each group (i.e., the sums of E1t and E2t, respectively). These are computed using the SUM function as shown in Figure 11-59.

The next step is to compute the log-rank test statistic:

$$\chi^2 = \sum_{j} \frac{\left(\sum O_{jt} - \sum E_{jt}\right)^2}{\sum E_{jt}}$$

where ΣOjt represents the sum of the observed number of events in the jth group (e.g., j = 1, 2) and ΣEjt represents the sum of the expected number of events in the jth group over time. We compute the test statistic in two steps. First, we take the ratio of the square of the difference between the sum of the observed and the sum of the expected numbers of events to the sum of the expected number of events in each group (see Figure 11-60), and then we sum to produce the test statistic (see Figure 11-61).

Thus, χ² = 6.148. We now compute a p-value for the test using the χ² distribution and the CHIDIST function in Excel.
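
The worksheet steps above (at-risk counts, observed and expected events, and the test statistic) can also be sketched outside Excel. The following Python sketch is an illustration only, not part of the workbook's method. The names `before`, `after`, and `log_rank` are hypothetical, and the data are transcribed from Table 11-3. The p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom (two comparison groups), in place of Excel's CHIDIST.

```python
import math

# Table 11-3 as (month, died) pairs: died = 1 for death, 0 for last contact.
before = [(8, 1), (12, 1), (26, 1), (14, 1), (21, 1), (27, 1),
          (8, 0), (32, 0), (20, 0), (40, 0)]
after = [(33, 1), (28, 1), (41, 1),
         (48, 0), (48, 0), (25, 0), (37, 0), (48, 0), (25, 0), (43, 0)]


def log_rank(group1, group2):
    """Two-group log-rank statistic: sum over groups of (O - E)^2 / E."""
    # One row per observed event, sorted by time and then group
    # (the analogue of the sorted event time and group columns).
    events = sorted([(t, 1) for t, d in group1 if d] +
                    [(t, 2) for t, d in group2 if d])
    observed = {1: 0.0, 2: 0.0}
    expected = {1: 0.0, 2: 0.0}
    for t, group in events:
        # At-risk counts: observed times >= event time (the COUNTIF step).
        n1 = sum(1 for time, _ in group1 if time >= t)
        n2 = sum(1 for time, _ in group2 if time >= t)
        observed[group] += 1              # the IF step; Ot = 1 for each row
        expected[1] += n1 / (n1 + n2)     # E1t = N1t * (Ot / Nt)
        expected[2] += n2 / (n1 + n2)     # E2t = N2t * (Ot / Nt)
    return sum((observed[j] - expected[j]) ** 2 / expected[j] for j in (1, 2))


chi2 = log_rank(before, after)
p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for 1 degree of freedom only
print(round(chi2, 3), round(p_value, 3))  # prints 6.148 0.013
```

Because each event occupies its own row, tied event times are handled the same way the worksheet handles them: each row contributes its own expected count.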

FIGURE 11-55 Computing the Number of Observed Events in Each Group at Each Event Time

FIGURE 11-56 Computing the Total Number of Participants at Risk at Each Event
Time

FIGURE 11-57 Computing the Total Number of Observed Events at Each Event
Time


The CHIDIST function requires specification of the test statistic (χ²) and the degrees of freedom. The log-rank statistic has degrees of freedom equal to k – 1, where k represents the number of comparison groups. In this example, k = 2, so the test statistic has 1 degree of freedom. The p-value for the test is computed as shown in Figure 11-62.

Because the p-value = 0.013 is less than the level of significance α = 0.05, we reject H0. We have significant evidence at α = 0.05 to show that the two survival curves are different.

Again, while it is tedious to construct the template for the computation of the log-rank test the first time, once the structure is entered into Excel, implementation of the test for new
FIGURE 11-58 Computing the Number of Expected Events in Group 1 at Each Event Time


FIGURE 11-59 Computing the Total Numbers of Observed and Expected Events in Each Group


datasets is relatively easy. Only the data range in the formulas must be modified.

Example 11.8. In Example 11.4 in the textbook, we evaluated the efficacy of a brief intervention to prevent alcohol consumption in pregnancy. Pregnant women with a history of heavy alcohol consumption were recruited into the study and randomized to receive either the brief intervention focused on abstinence from alcohol or standard prenatal care. The outcome of interest was relapse to drinking. Women were recruited into the study at approximately 18 weeks gestation and followed through the course of pregnancy to delivery (approximately 39 weeks gestation). The data are shown in
FIGURE 11-60 Computing the Log-Rank Test Statistic


FIGURE 11-61 The Log-Rank Test Statistic


FIGURE 11-62 Computing the p-Value for the Log-Rank Test

Table 11-4 and indicate whether women relapsed to drinking and if so, the time of their first drink measured in the number of weeks from randomization. For women who did not relapse, we recorded the number of weeks from randomization that they were alcohol-free.

The question of interest is whether there is a difference in time to relapse between women assigned to standard prenatal care as compared to those assigned to the brief intervention. The test is run at a 5% level of significance and the hypotheses are as follows:

H0: Relapse-free time is identical between groups.
H1: Relapse-free time is not identical between groups.

TABLE 11-4 Number of Weeks to First Drink or Number of Weeks Alcohol Free by Treatment Group

Standard Prenatal Care        Brief Intervention
Relapse    No Relapse         Relapse    No Relapse
19         20                 16         21
6          19                 21         15
5          17                 7          18
4          14                            18
                                         5

To conduct the test, we first copy the worksheet with all of the formulas we developed to conduct the log-rank test in Example 11.7, and we enter the data in Table 11-4 into the worksheet as shown in Figure 11-63.

FIGURE 11-63 Data for the Log-Rank Test

Next we copy the observed event times in column A and column D into column G and indicate the group (1 or 2) in which each event occurred. We then sort the data by the event times and group variables as shown in Figure 11-51. The sorted data are shown in Figure 11-64.
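
The copy-and-sort step can be sketched in Python as a rough illustration. The variable names are hypothetical, and the relapse times are transcribed from Table 11-4. Python tuples sort by their first element and then by their second, which mirrors sorting by event time and then by group.

```python
# Relapse (event) times in weeks from Table 11-4; the "no relapse" times are
# not needed for this step because only observed events are copied and sorted.
relapse_standard = [19, 6, 5, 4]       # group 1: standard prenatal care
relapse_intervention = [16, 21, 7]     # group 2: brief intervention

# Build (event time, group) rows, then sort by time and then by group.
events = [(t, 1) for t in relapse_standard] + \
         [(t, 2) for t in relapse_intervention]
events.sort()

print(events)
# [(4, 1), (5, 1), (6, 1), (7, 2), (16, 2), (19, 1), (21, 2)]
```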

FIGURE 11-64 Sorted Data for Analysis

Next, we update the formulas used to compute the numbers of participants at risk in each group, N1t and N2t, to reflect the data ranges for these data (i.e., A$2:B$5 for group 1 and D$2:E$6 for group 2). Once the data ranges are updated in cells J2 and K2, the formulas are copied into cell J3 through cell K8 (see formula in top menu bar for cell K8), as shown in Figure 11-65. Excel then computes the observed numbers of events, the total numbers of participants and events, and the expected numbers of events in each group at each event time. The test statistic is then computed along with a p-value.

For this example, χ² = 0.727 (degrees of freedom = k – 1 = 2 – 1 = 1) and the p-value is p = 0.394. Because the p-value exceeds the level of significance (α = 0.05), we do not reject H0. We do not have statistically significant evidence at α = 0.05 to show that the time to relapse is different between groups.

11.4 COMPARING TWO SURVIVAL CURVES GRAPHICALLY

In Section 11.2, we used Excel to generate graphical displays of a survival curve in one sample. We outlined the steps to produce graphical displays of the survival curve as well as displays that incorporated 95% confidence limits around the estimated survival probabilities. Here we present techniques to produce graphical displays of Kaplan–Meier survival functions in two independent groups using Excel.

Example 11.9. Consider again Example 11.7 where we compared survival between two competing combination treatments in patients with advanced gastric cancer. In Example 11.7 we conducted a log-rank test and found a statistically significant difference in survival between competing treatments (p = 0.013). Here we use Excel to produce a graphical display of the survival functions.

The first step is to produce the Kaplan–Meier estimates of the survival probabilities for each group using the template

FIGURE 11-65 The Log-Rank Test Statistic and p-Value




we developed in Excel to compute survival probabilities using
the Kaplan–Meier approach shown in Figure 11-20. We copy
the worksheet (Figure 11-20) and enter the data for group 1.
Here we need only to adjust the data ranges for the time and
event variables (column D and column E) to produce the sur-
vival estimates shown in Figure 11-66.
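
As a cross-check on the worksheet, the Kaplan–Meier estimates for group 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the template in Figure 11-20. The function name `kaplan_meier` is hypothetical, the data are transcribed from the chemotherapy-before-surgery arm of Table 11-3, and a participant whose last contact falls in the same month as a death is assumed to be still at risk in that month.

```python
def kaplan_meier(times, died):
    """Return (event time, estimated survival probability) pairs."""
    estimates = []
    survival = 1.0
    # Distinct times at which at least one death was observed.
    event_times = sorted({t for t, d in zip(times, died) if d})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, d in zip(times, died) if d and x == t)
        # Multiply by the conditional probability of surviving past time t.
        survival *= (at_risk - deaths) / at_risk
        estimates.append((t, survival))
    return estimates


# Group 1 of Table 11-3: month of death (1) or month of last contact (0).
times = [8, 12, 26, 14, 21, 27, 8, 32, 20, 40]
died = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

km_group1 = kaplan_meier(times, died)
for month, s in km_group1:
    print(month, round(s, 4))
```

The same function can be applied to the group 2 data by passing that arm's times and event indicators.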
Next we copy the data for group 2 into the worksheet and
again copy the formulas from row 2 through row 11 to produce
the survival estimates for the data in group 2 (see specifically
row 14 through row 23), as shown in Figure 11-67. We again
need to update the data ranges for the computations. For ex-
ample, to compute the number of participants at risk in group
2, we specify the data range as D$14:D$23 in cell G23 (see top
menu bar).
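
The at-risk count depends entirely on pointing the formula at the right data range, so it may help to see the same count written out. The short Python sketch below assumes that the number at risk is the count of observed times greater than or equal to the event time, as in the COUNTIF formulas used for the log-rank test. The names `at_risk` and `group2_times` are hypothetical, and the times are transcribed from the chemotherapy-after-surgery arm of Table 11-3.

```python
def at_risk(times, event_time):
    """Count participants whose observed time is >= the event time."""
    return sum(1 for t in times if t >= event_time)


# Group 2 of Table 11-3: months of death followed by months of last contact.
group2_times = [33, 28, 41, 48, 48, 25, 37, 48, 25, 43]

for event_time in (28, 33, 41):          # the three deaths in group 2
    print(event_time, at_risk(group2_times, event_time))
```

Passing the group 1 times instead of `group2_times` is the analogue of leaving the data range pointing at the wrong rows.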
Next we must prepare the data for the display. First, we
hide the columns that are not directly involved in the display
using the Format Column option on the top menu bar. We
retain the time (column D) and the survival probability (col-
umn J). Next we convert the formulas we used to compute the
time and survival probability variables to their calculated val-
ues by highlighting those columns, and clicking on the copy
icon and then on the Values option to the right of the paste
icon. These steps produce the data shown in Figure 11-68.
In order to produce a display with two survival curves, we need some additional formatting. First, we move the survival probability for group 2 into column K as shown in Figure 11-69, and we remove the header rows for the group 2 data (i.e., remove rows 12 and 13 in Figure 11-68) using the Delete option under Edit on the top menu bar. We also rename the columns to S1t and S2t to represent survival in groups 1 and 2, respectively.

Once again, we must format the data to produce the step functions for the survival probabilities. We first insert a row and enter “0” for the time and “1” for the survival probability in each group (i.e., enter “0” for time and “1” for S1t in cell D2 and in cell J2 for group 1, and “0” for time and “1” for S2t in cell D13 and in cell K13 for group 2). This is shown in Figure 11-70.

Next we insert rows at each transition point (i.e., changes in survival probability), and we then copy the time from the row above and the survival probability from the row below. The updated worksheet is shown in Figure 11-71.
We now use the Chart Wizard to create the display. We
highlight the data (column D, column J, and column K) and se-
lect the chart icon in the top menu bar. Once we select the
Chart Wizard, Excel opens a dialog box where we select the
chart type XY (Scatter). This is shown in Figure 11-72.
Once we click Next, Excel then asks for a chart title as well
as labels for the x- and y-axes. We enter this information and
indicate that we would like to place the display in a new work-
sheet. We click Finish, and Excel generates the new worksheet
containing the display shown in Figure 11-73.
The final step is to format the display for presentation.
This involves changing the background from grey to white or
no background color, removing the horizontal lines, rescaling
the y-axis to a maximum of 1.0, rescaling the x-axis to a max-
FIGURE 11-68 Data for the Display
FIGURE 11-69 Formatting the Data for the Display
FIGURE 11-70 Data for the Display
FIGURE 11-71 Data to Create Step Functions
FIGURE 11-72 Using the Chart Wizard

FIGURE 11-73 The Display

FIGURE 11-74 Survival in Each Treatment Group

11.5 PRACTICE PROBLEMS

1. A study is conducted to estimate survival in patients following kidney transplant. Key factors that adversely affect success of the transplant include advanced age and diabetes. This study involves 25 participants (n = 25) who are 65 years of age and older, and all have diabetes. Following transplant, each participant is followed for up to 10 years. The following are times to death, in years, or the time to last contact (at which time the participant was known to be alive).

Deaths: 1.2, 2.5, 4.3, 5.6, 6.7, 7.3, and 8.1 years
Alive: 3.4, 4.1, 4.2, 5.7, 5.9, 6.3, 6.4, 6.5, 7.3, 8.2, 8.6, 8.9, 9.4, 9.5, 10, 10, 10, and 10 years

a. Use the life table approach to estimate the survival function.
b. Use the Kaplan–Meier approach to estimate the survival function.
c. Graph the survival function based on the estimates in (b) using Excel.

2. A clinical trial is run to assess the effectiveness of a new anti-arrhythmic drug designed to prevent atrial fibrillation (AF). Thirty participants (n = 30) enroll in the trial and are randomized to receive the new drug or placebo. The primary outcome is AF and participants are followed for up to 12 months following randomization. The experiences of participants in each arm of the trial are shown in Table 11-5.

a. Estimate the survival functions for each treatment group using the Kaplan–Meier approach.
b. Test whether there is a significant difference in survival between treatment groups using the log-rank test and a 5% level of significance.

3. Using the results from Problem 2, sketch the survival functions for the new anti-arrhythmic drug and the placebo groups.

4. An observational cohort study is conducted to compare time to early failure in patients undergoing joint replacement surgery. Of specific interest is whether there is a difference in time to early failure between
patients who are considered obese versus those who are not. The study is run for 40 weeks, and times to early joint failure, measured in weeks, are shown in Table 11-6 for participants classified as obese or not at the time of surgery.

a. Estimate the survival functions (time to early joint failure) for each group using the Kaplan–Meier approach.
b. Test whether there is a significant difference in time to early joint failure between obese and non-obese patients undergoing joint replacement surgery using the log-rank test and a 5% level of significance.

5. Using the results from Problem 4, sketch the survival functions for each group (obese and non-obese).

6. A study of patients with stage I breast cancer is run to assess time to progression to stage II over an observation period of 15 years. The data are shown in Table 11-7. Times to progression are measured in years from the time at which the chemotherapy regimen was initiated. Of interest is whether there is a difference in time to progression between women on two different chemotherapy regimens.

a. Estimate the survival functions (time to progression) for each chemotherapy regimen using the Kaplan–Meier approach.
b. Test whether there is a significant difference in time to progression between treatment regimens using the log-rank test and a 5% level of significance.

7. A clinical trial is conducted to evaluate the efficacy of a new drug for prevention of hypertension in patients with pre-hypertension (defined as systolic blood pressure between 120 mmHg and 139 mmHg or diastolic blood pressure between 80 mmHg and 89 mmHg). A total of 20 patients are randomized to receive the new drug or a currently available drug for treatment of high blood pressure. Participants are followed for up to 12 months, and time to progression to hypertension is measured. The experiences of participants in each arm of the trial are shown in Table 11-8.

TABLE 11-5 Data for Practice Problems 2 and 3

Placebo                                New Drug
Month of AF   Month of Last Contact    Month of AF   Month of Last Contact
4 5 7 6
6 5 9 6
7 6 12 7
8 7 8
9 8 9
11 9 9
11 10
12 10
12 11
11
12
12

TABLE 11-6 Data for Practice Problem 4

Obese                    Not Obese
Failure   No Failure     Failure   No Failure
28 39 27 37
25 41 31 36
31 37 34 39
32 35 40
38 36
36 32
29 39
41

TABLE 11-7 Data for Practice Problem 6

Regimen 1                       Regimen 2
Progression   No Progression    Progression   No Progression
2 12 9 11
6 14 4 14
7 13 7 13
3 11 9
4 15 14
10 13
8 6
6 9
9 7
12

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC
NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.


TABLE 11-8 Data for Practice Problem 7

New Drug                              Currently Available Drug
Hypertension   Free of Hypertension   Hypertension   Free of Hypertension
7 8 6 8
8 8 7 9
10 8 9 11
9 10 11
11 11 12
12
12

a. Estimate the survival functions (time to progression to hypertension) for each treatment group using the Kaplan–Meier approach.
b. Test whether there is a significant difference in time to progression between treatment groups using the log-rank test and a 5% level of significance.

8. The data in Table 11-9 reflect the time to first surgery in children born with congenital heart disease. Time is measured in years from birth up until the age of 10 years. Construct a life table using the Kaplan–Meier approach. Also include standard errors and 95% confidence limits for the estimates of survival probability.

TABLE 11-9 Data for Practice Problem 8

Participant Identification Number   Year of First Surgery   Year of Last Contact
1 8
2 10
3 4
4 4
5 7
6 6
7 9
8 5
9 3
10 8
11 9
12 10
13 3
14 2
15 6
16 7
17 8
18 9

REFERENCE

1. Allison, P. Survival Analysis Using the SAS System. Cary, NC: SAS Institute, 1995.
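
The Kaplan–Meier life table requested in the practice problems above, with standard errors and 95% confidence limits, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kaplan_meier`, the use of Greenwood's formula for the standard error, the limits computed as the estimate plus or minus 1.96 standard errors, and the small data set are all hypothetical choices made here. The data are not taken from any table in this workbook.

```python
import math

def kaplan_meier(times, events, z=1.96):
    """Build a Kaplan-Meier life table.

    times  : follow-up time for each participant
    events : 1 if the event occurred at that time, 0 if censored
    Returns one row per distinct event time:
    (time, number at risk, number of events, survival, SE, lower, upper).
    SE uses Greenwood's formula (an assumption of this sketch).
    """
    n_at_risk = len(times)
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for t in sorted(set(times)):
        d = sum(1 for x, e in zip(times, events) if x == t and e)      # events at t
        c = sum(1 for x, e in zip(times, events) if x == t and not e)  # censored at t
        if d > 0:
            survival *= (n_at_risk - d) / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            se = survival * math.sqrt(greenwood_sum)
            lower = max(0.0, survival - z * se)
            upper = min(1.0, survival + z * se)
            rows.append((t, n_at_risk, d, survival, se, lower, upper))
        n_at_risk -= d + c  # everyone observed at t leaves the risk set
    return rows

# Hypothetical data: 8 participants, 4 events and 4 censored observations.
times = [2, 3, 3, 5, 6, 8, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 0]

rows = kaplan_meier(times, events)
for t, n, d, s, se, lo, hi in rows:
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.3f}  "
          f"SE={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Censored participants contribute to the number at risk up to their last contact but never reduce the survival estimate directly, which is why rows appear only at times where at least one event occurs.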

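
The log-rank comparisons requested in the practice problems above can be sketched in the same way. The version below is an assumption of this sketch rather than the workbook's own procedure: it uses the hypergeometric variance form of the statistic, whereas some texts use the approximation that sums (observed minus expected) squared over expected across groups. The function name `log_rank` and the two small groups are hypothetical and are not taken from any table in this workbook.

```python
def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test.

    Returns (observed events in group 1, expected events in group 1,
    chi-square statistic with 1 degree of freedom).
    """
    # Distinct times at which at least one event occurred in either group.
    event_times = sorted(
        {t for t, e in zip(times1, events1) if e}
        | {t for t, e in zip(times2, events2) if e}
    )
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)  # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        d2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi_square = (observed1 - expected1) ** 2 / variance
    return observed1, expected1, chi_square

# Hypothetical data: event indicator 1 = event, 0 = censored.
group1_times, group1_events = [2, 4, 6, 8], [1, 1, 1, 0]
group2_times, group2_events = [5, 7, 9, 10], [1, 0, 0, 0]

observed1, expected1, chi_square = log_rank(
    group1_times, group1_events, group2_times, group2_events
)
CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, 5% level
print(f"O1={observed1:.0f}  E1={expected1:.3f}  chi-square={chi_square:.3f}")
print("Reject H0" if chi_square > CRITICAL_VALUE else "Do not reject H0")
```

At a 5% level of significance, the null hypothesis of identical survival functions is rejected when the statistic exceeds 3.841.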
