Evaluation of an Observer Form of the Coping Inventory for Stressful Situations
The present study evaluates a prospective observer form of the Coping Inventory for
Stressful Situations (CISS) by comparing the two forms in terms of factor structure,
mean differences, and reliability, and by examining correlations between self-report and peer
ratings. A total of 163 pairs of friends complete the CISS and an observer form of the
CISS. Confirmatory factor analysis results indicate that for both rating forms, the
four-factor solution fits better than the three-factor solution. Although self-rating data fit the theoretical model bet-
ter, the peer ratings show higher reliability. The correlation between self and peer
latent factors is moderate for Avoidance-oriented coping and for its subscales, but low
for Task- and Emotion-oriented coping. Internal consistency coefficients for the CISS
scales are high across rating types, and a significant cross-form mean difference is
found on the Task latent factor. Overall, the results provide evidence of substantial
measurement equivalence between the self-rating form and the observer form, and the
authors propose its use in dispositional coping research.
The Coping Inventory for Stressful Situations (CISS; Endler & Parker, 1990,
1999) is a 48-item self-report inventory designed to measure multiple dimen-
sions of coping. Endler and Parker were prompted to develop the CISS for a number of
reasons: (a) to resolve a lack of agreement among researchers about the fundamental
dimensions of dispositional coping; (b) to address psychometric deficiencies
Authors’ Note: Correspondence concerning this article should be addressed to Kyunghee Han, Central
Michigan University, Department of Psychology, Sloan #103, Mt. Pleasant, MI 48859; e-mail:
han1k@cmich.edu.
Method
Instruments
CISS. Each participant completed the CISS (Endler & Parker, 1990, 1999), a 48-item
self-report inventory. Instructions for the CISS are as follows: ‘‘The following are ways
people react to various difficult, stressful, or upsetting situations. Please circle a number
from 1 to 5 for each item. Indicate how much you engage in these types of activities
when you encounter a difficult, stressful, or upsetting situation’’ (Endler & Parker,
1990). There are 16 items each for scales measuring Task-oriented coping (focusing on
the problem or analyzing the problem), Emotion-oriented coping (feeling tense or
angry), and Avoidance-oriented coping; within Avoidance coping, Distraction has eight
items (shopping or sleeping) and Social Diversion (being with other people or calling a
friend) has five items. The CISS manual does not explain why three Avoidance items (3,
23, and 32) are scored on neither of the two Avoidance subscales (Endler & Parker,
1990, 1999). We infer that these items were not sufficiently conceptually related to either
of the two Avoidance subdomains. Participants’ item responses were summed to form
the total raw scores for the CISS subscales. The potential range of these scores on the
Task, Emotion, and Avoidance scales is from 16 to 80. The possible range for the Dis-
traction subscale is from 8 to 40; for Social Diversion the range is 5 to 25. Alpha coeffi-
cients have been reported to range from .87 to .92 for Task, from .82 to .90 for Emotion,
from .76 to .85 for Avoidance, from .69 to .79 for Distraction, and from .74 to .84 for
Social Diversion (Endler & Parker, 1999). Test-retest reliabilities of the CISS scales over
a 6-week period were found to be moderate to high (ranging from .51 for Distraction to
.73 for Task) in an undergraduate sample. Many studies support the construct validity of
the CISS scales (Endler & Parker, 1999).
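For concreteness, the scoring and reliability computations just described can be sketched in a few lines of Python; the simulated responses and the item-to-scale assignment below are placeholders (the actual CISS item key is proprietary), so the sketch illustrates the arithmetic rather than reproducing the instrument.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) response matrix."""
    k = items.shape[1]
    sum_of_item_variances = items.var(axis=0, ddof=1).sum()
    total_score_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_of_item_variances / total_score_variance)

# Simulated data: 163 respondents x 48 Likert items scored 1 to 5.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(163, 48))

# Hypothetical key: the real CISS assigns 16 items each to Task, Emotion,
# and Avoidance; the column indices used here are placeholders.
task_items = responses[:, :16]
task_raw = task_items.sum(axis=1)  # possible raw-score range 16 to 80
print(task_raw.min(), task_raw.max(), round(cronbach_alpha(task_items), 2))
```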
CISS-Peer Form (CISS-PF). Each participant also rated his or her friend on a
peer version of the CISS, identical to the self-report version, except in instructions:
‘‘The following is the 48-item test that you completed earlier. We will administer
the test again, but this time we would like you to rate your friend on the same
items. Please indicate how much your friend engages in these types of activities
when he or she encounters a difficult, stressful, or upsetting situation. For example,
the item ‘Schedule my time better’ should be answered as though it read ‘My friend
schedules his/her time better.’’’ Unlike most personality tests (e.g., Minnesota Mul-
tiphasic Personality Inventory-2 [Butcher et al., 2001] or NEO Personality
Inventory-Revised [Costa & McCrae, 1992a]), the majority (two thirds) of CISS
items do not contain first-person singular pronouns, possessive adjectives, or reflexive pro-
nouns (‘‘I,’’ ‘‘my,’’ or ‘‘myself,’’ respectively); such items therefore do not contain
referents that contradict the peer form instructions. One third of the CISS items do,
however, contain first-person possessive adjectives. Rather than undertake a risky
attempt to modify the instrument to minimize the contradiction between items and
the peer form instructions, we made an effort to ensure that instructions were clear.
Participants were primed to rate peers by the completion of the ADF-F2 immedi-
ately prior to the CISS peer-rating form. In addition, both written and oral instruc-
tions on how to complete the peer-rating form of CISS were provided at the
beginning of testing to ensure that participants were fully aware of rating
peers, and an oral reminder was provided approximately midway through the CISS
peer-rating form.
Statistical Analysis
Normality assumption check. The relative fit of the two factor models (three-factor
vs. four-factor) and the robustness of each model across rating methods were tested
using LISREL 8.80 (Jöreskog & Sörbom, 2006) by specifying the appropriate
covariance matrices. Maximum likelihood (ML) estimation assumes that the data
represent a normal distribution. Following the recommendation of Finney and DiS-
tefano (2006), univariate skewness, kurtosis, and multivariate kurtosis were exam-
ined, using cutoff values of 2 for univariate skewness, 7 for univariate kurtosis, and
3 for multivariate kurtosis. No items showed skew or kurtosis that exceeded the
cutoff values, indicating univariate normality. However, a multivariate normality
assumption was not met: Mardia’s normalized multivariate kurtosis (21.11 for self-
report and 24.65 for peer report) exceeded the suggested cutoff value of 3 (Finney
& DiStefano, 2006). Consequently, we used the Satorra-Bentler scaled χ² statistic
(Satorra & Bentler, 1994; S-B χ²), which is based on ML estimation but adjusts the
χ² statistic and standard errors for nonnormality of the data. Following recommen-
dation by Curran, West, and Finch (1996), we report both normal theory ML and
S-B scaling estimation results for testing the standard CFA model. However, for
measurement invariance, we report results from normal theory ML only; S-B scal-
ing estimation failed to converge and required excessive computing
resources.
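A minimal sketch of this screening step, assuming NumPy/SciPy and the cutoffs cited above; the normalization of Mardia's multivariate kurtosis shown here is one common variant, and LISREL's exact computation may differ.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def normality_screen(data, skew_cut=2.0, kurt_cut=7.0, mardia_cut=3.0):
    """Screen an (n_respondents, n_items) matrix against the cutoffs of
    Finney and DiStefano (2006) as applied in the text."""
    n, p = data.shape
    uni_skew = skew(data, axis=0)
    uni_kurt = kurtosis(data, axis=0)  # excess kurtosis, the SEM convention
    # Mardia's multivariate kurtosis: mean squared Mahalanobis distance.
    centered = data - data.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(data, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', centered, s_inv, centered)
    b2p = (d2 ** 2).mean()
    # One common normalization, approximately N(0, 1) under normality.
    mardia_z = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return {
        "univariate_ok": bool((np.abs(uni_skew) < skew_cut).all()
                              and (np.abs(uni_kurt) < kurt_cut).all()),
        "mardia_z": mardia_z,
        "multivariate_ok": bool(mardia_z < mardia_cut),
    }
```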
Overall fit indices. Chi-square goodness-of-fit statistics were used to index the
overall fit of the models. Following the recommendations of Hu and Bentler (1998,
1999), root mean square error of approximation (RMSEA; Browne & Cudeck,
1993) and standardized root mean square residual (SRMR) were used as additional
fit indexes, with cutoff values close to .06 for RMSEA and .08 for SRMR, suggest-
ing good fit. RMSEA estimates the lack of fit in a model compared to a perfect
(saturated) model, whereas SRMR indexes the average difference between sample
variances and covariances and estimated population variances and covariances. In
addition, the nonnormed fit index (NNFI; Bentler & Bonett, 1980; Tucker &
Lewis, 1973) and the comparative fit index (CFI; Bentler & Bonett, 1980) tested
model fit against a baseline (independent) model of uncorrelated variables, with
values above .95 suggesting acceptable fit.
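These indices can be computed directly from the model and baseline (independence-model) chi-squares; the sketch below uses the standard textbook formulas (SRMR is omitted because it requires the full residual covariance matrix), and LISREL's exact computations may differ slightly. The null-model values in the example call are hypothetical placeholders, not figures from this study.

```python
import numpy as np

def fit_indices(chi2, df, chi2_null, df_null, n):
    """RMSEA, NNFI/TLI, and CFI from chi-square statistics (textbook forms)."""
    rmsea = np.sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    nnfi = ((chi2_null / df_null) - (chi2 / df)) / ((chi2_null / df_null) - 1)
    cfi = 1 - max(chi2 - df, 0) / max(chi2 - df, chi2_null - df_null, 1e-12)
    return {"RMSEA": rmsea, "NNFI": nnfi, "CFI": cfi}

# Self-report four-factor ML values from Table 1; the null-model chi-square
# below is a placeholder chosen only to make the call runnable.
print(fit_indices(chi2=1890.56, df=939, chi2_null=12000.0, df_null=1128, n=326))
```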
Model specification and testing invariance. The relative fit of the two factor
models (three vs. four) was initially tested. The pattern of factor loadings was set to
be consistent with the models specified in the CISS manual (Endler & Parker,
1999). More specifically, one factor loading was freed for each item, and all other
cross-loadings were fixed to zero. The metric for each latent variable was set by
fixing one of its items' loadings to unity. Latent variables were also
allowed to covary freely.
Next, we constructed a series of increasingly restrictive measurement models to
test the invariance of the four factor model across rating methods. Because both
raters were rating the same individual, the assumption of independence required
for multigroup CFA was unmet (Cheung, 1999); therefore, to evaluate measure-
ment invariance, a multiple trait–multiple source single CFA model was tested.
Using a framework described by Cheung, the covariance matrix of both self-ratings
and peer ratings for all 326 participants was included in each CFA with separate
latent factors for self-reports and peer reports.
With this method, invariance can be tested through the application of equality
constraints within the same sample rather than in the more common multisample
method. In selecting a specific sequence and method of invariance testing, we inte-
grated into seven models the procedures proposed by several researchers (e.g., Byrne &
Watkins, 2003; Cheung, 1999; Cheung & Rensvold, 2000, 2002; Vandenberg &
Lance, 2000). Our Model 1 tested the equality of the overall structure and served as
a base model. This model tests configural invariance or weak factorial invariance
(Horn & McArdle, 1992) in which only the number of dimensions and patterns of
fixed and freed factor loadings are specified to be equal across rating methods. Our
Model 2 included a constraint of equal factor loadings across rating methods, test-
ing whether the strength of the relationships between items and their underlying
constructs are the same across rating methods. This has been labeled a test of fac-
torial invariance (Drasgow, 1984) or metric invariance (Horn & McArdle, 1992).
Because it is a prerequisite for testing higher levels of equivalence, most of the subse-
quent models were compared against Model 2.
Model 3 included the restriction from Model 2 plus the additional constraint of
equal error or residual variances. Equal residual variances indicate that the portion of
item variance unattributable to the variance of the associated latent construct,
referred to as the item's unique variance or unreliability, is of similar magnitude
across rating methods. Model 4
included a constraint of equal intercepts across rating methods in addition to the
restriction from Model 2. This model tests whether the values of each item corre-
sponding to the zero value of the underlying construct are invariant across rating
methods. Support for this model would suggest the existence of scalar equivalence
(Mullen, 1995) or strong factorial invariance (Meredith, 1993). Model 5 included
the restriction from Model 2 plus the additional constraint of equal factor variances.
It tests whether the two rating forms exhibit an equal range or variance of latent
factors. Model 6 included the restriction from Model 2 plus the additional con-
straint of equal factor covariances, testing whether the relationships among latent
constructs are the same across rating methods.
Last, Model 7 included the restriction from Model 4 plus the additional con-
straint of equal factor means, examining whether the mean level of each latent con-
struct is the same across rating methods. Equal intercepts (Model 4) is a necessary
condition for latent mean comparisons because it implies that the measurement
scales have the same zero point (origin). In the presence of intercept invariance,
differences in latent means can be interpreted as true differences in the latent
scale level, unconfounded with differential origin of the latent variable (Byrne &
Watkins, 2003; Cheung & Rensvold, 2000, 2002; Vandenberg & Lance, 2000).
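For reference, the seven-model sequence just described can be summarized as a simple mapping; the labels paraphrase the text, and "compare_to" names the baseline against which each model's change in fit is evaluated.

```python
# Equality constraints imposed across rating methods, per model.
INVARIANCE_MODELS = {
    "M1": {"adds": "configural: equal factor pattern only", "compare_to": None},
    "M2": {"adds": "M1 + equal factor loadings",            "compare_to": "M1"},
    "M3": {"adds": "M2 + equal error variances",            "compare_to": "M2"},
    "M4": {"adds": "M2 + equal intercepts",                 "compare_to": "M2"},
    "M5": {"adds": "M2 + equal factor variances",           "compare_to": "M2"},
    "M6": {"adds": "M2 + equal factor covariances",         "compare_to": "M2"},
    "M7": {"adds": "M4 + equal latent means",               "compare_to": "M4"},
}
```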
When overall invariance test models were rejected, construct-level constrained
models were tested (Cheung & Rensvold, 2000) to locate the source of misfit and
establish partial measurement invariance. For example, if the overall factor loading
invariance test produced significant misfit, invariance tests for each construct
would be performed.
Traditionally, the change in chi-square (Δχ²) has been used to index the difference
in fit between models; however, given its sensitivity to sample size, Cheung and
Rensvold (2002) proposed that the change in CFI (ΔCFI) be used to avoid this bias. A sig-
nificant decrease in fit across models is indicated by a significant Δχ² or
by a decrease in CFI of more than .01.
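As a sketch, this decision rule can be written directly. The Δχ² computation below applies to normal theory ML chi-squares (S-B values would require a scaled difference test), and the example call reuses the M1 and M2 figures reported in Table 2.

```python
from scipy.stats import chi2 as chi2_dist

def compare_nested(chi2_c, df_c, chi2_f, df_f, cfi_c, cfi_f,
                   alpha=0.05, delta_cfi_cut=0.01):
    """Compare a constrained (_c) model against a freer (_f) nested model
    by the chi-square difference test and the Cheung-Rensvold CFI change."""
    d_chi2, d_df = chi2_c - chi2_f, df_c - df_f
    p = chi2_dist.sf(d_chi2, d_df)  # upper-tail chi-square probability
    d_cfi = cfi_f - cfi_c
    return {"delta_chi2": d_chi2, "delta_df": d_df, "p": p,
            "worse_by_chi2": p < alpha, "worse_by_cfi": d_cfi > delta_cfi_cut}

# M2 vs. M1 from Table 2: delta chi-square = 42.16 on 41 df, delta CFI = .000.
print(compare_nested(6685.69, 3928, 6643.53, 3887, cfi_c=0.903, cfi_f=0.903))
```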
Results
Table 1
Fit Indices for the Three- and Four-Factor Models by Rating Form

Normal theory ML
Form   Factors   χ²        df     p      RMSEA (CI90)       SRMR   NNFI   CFI
Self   3         2401.77   1077   <.001  .068 (.065~.072)   .079   .898   .893
Peer   3         2771.84   1077   <.001  .080 (.077~.083)   .095   .896   .901
Self   4         1890.56    939   <.001  .060 (.057~.064)   .072   .910   .914
Peer   4         2179.73    939   <.001  .071 (.067~.074)   .084   .916   .920

Satorra-Bentler scaled
Form   Factors   χ²        df     p      RMSEA (CI90)       SRMR   NNFI   CFI
Self   3         1143.86   1077   .08    .014 (.000~.021)   .079   .994   .995
Peer   3         1341.76   1077   <.001  .028 (.022~.032)   .095   .984   .984
Self   4          864.00    939   .956   .000 (.000~.000)   .072   1.00   1.00
Peer   4         1120.19    939   <.001  .024 (.018~.030)   .084   .988   .988

Note: N = 326. RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; NNFI = nonnormed fit index;
CFI = comparative fit index.
Comparison of the fit indices suggests that the self-report data provide a closer approximation
than the peer report data to the implied models.
Examination of the standardized structure coefficients for the three-factor and
four-factor solutions (available from the first author) revealed relatively high pri-
mary factor loadings, indicating substantial relationships between items and their
corresponding constructs. For the three-factor solution, all items showed their strongest
loadings on the intended factors and fairly low loadings on unintended factors, reflect-
ing low correlations among the three factors. In the four-factor solution, however,
there were substantial secondary loadings of Distraction items on the Social Diver-
sion factor and of Social Diversion items on the Distraction factor, which is not sur-
prising considering that these two factors are subsumed under the Avoidance
factor. It is interesting that mean factor loadings for peer report were slightly higher
than those of the self-report, especially for the Task factor (.64 vs. .55).
Table 2
Tests of Measurement Invariance of the Four-Factor Model Across Rating Methods

Model                                           χ²          df    Δχ² (comparison)       Δdf  RMSEA (CI90)      SRMR  NNFI  CFI   ΔCFI
M1: Base model                                  6643.53***  3887                              .048 (.046~.050)  .069  .900  .903
M2: Equal factor loading                        6685.69***  3928  42.16 (M2 – M1)        41   .048 (.046~.050)  .070  .901  .903  .000
M3: Equal error variance                        6835.28***  3973  149.59*** (M3 – M2)    45   .049 (.047~.051)  .070  .899  .899  .004
M3a: Equal error variance for Task              6748.03***  3944  62.34*** (M3a – M2)    16   .048 (.046~.050)  .070  .900  .902  .001
M3b: Equal error variance for Emotion           6752.70***  3944  67.01*** (M3b – M2)    16   .049 (.047~.051)  .070  .900  .901  .002
M3c: Equal error variance for Distraction       6704.72***  3936  19.03* (M3c – M2)       8   .048 (.046~.050)  .070  .901  .903  .000
M3d: Equal error variance for Social Diversion  6687.01***  3933  1.32 (M3d – M2)          5   .048 (.046~.050)  .070  .902  .903  .000
M4: Equal intercept                             6916.88***  3969  231.19*** (M4 – M2)    41   .049 (.047~.051)  .070  .896  .897  .006
M4a: Equal intercept for Task                   6770.85***  3943  85.16*** (M4a – M2)    15   .049 (.047~.050)  .070  .899  .901  .002
M4b: Equal intercept for Emotion                6800.82***  3943  115.13*** (M4b – M2)   15   .049 (.047~.051)  .070  .898  .900  .003
M4c: Equal intercept for Distraction            6714.05***  3935  28.36*** (M4c – M2)     7   .048 (.046~.050)  .069  .901  .902  .001
M4d: Equal intercept for Social Diversion       6688.24***  3932  2.55 (M4d – M2)          4   .048 (.046~.050)  .069  .901  .903  .000
M5: Equal factor variance                       6692.86***  3932  7.17 (M5 – M2)           4   .048 (.047~.052)  .071  .901  .903  .000
M6: Equal factor covariance                     6702.14***  3938  16.45 (M6 – M2)         10   .048 (.047~.051)  .072  .901  .903  .000
M7: Equal latent mean                           6935.45***  3973  18.57** (M7 – M4)        4   .049 (.047~.051)  .069  .895  .896  .000
M7a: Equal mean for Task                        6784.23***  3944  13.38*** (M7a – M4a)     1   .049 (.047~.050)  .069  .899  .900  .001
M7b: Equal mean for Emotion                     6801.39***  3944  .57 (M7b – M4b)           1   .049 (.047~.051)  .070  .898  .900  .000
M7c: Equal mean for Distraction                 6716.57***  3936  2.52 (M7c – M4c)          1   .048 (.046~.050)  .069  .901  .902  .000
M7d: Equal mean for Social Diversion            6690.54***  3933  2.30 (M7d – M4d)          1   .048 (.046~.050)  .070  .901  .903  .000

Note: N = 326. Δχ² = χ² of the more constrained model minus χ² of the less constrained model; ΔCFI = CFI of the less constrained model
minus CFI of the more constrained model. RMSEA = root mean square error of approximation; SRMR = standardized root mean square
residual; NNFI = nonnormed fit index; CFI = comparative fit index.
*p < .05. **p < .01. ***p < .001.
the two sources of ratings. By comparing each model to the appropriate previous
model, a thorough test of measurement invariance was provided.
First, equality constraints were added to the factor loadings, specifying that item
loadings across sources were equal (M2). The constraints did not significantly
reduce fit, Δχ²(41) = 42.16, p > .05, ΔCFI < .01, demonstrating factorial invar-
iance. Next, additional constraints were added to test the invariance of error terms.
As indicated in Table 2 (M3), inclusion of this set of constraints resulted in a highly
significant reduction in fit according to the Δχ² value but not according to the ΔCFI
value, Δχ²(45) = 149.59, p < .001, ΔCFI < .01. To locate the source of the signifi-
cant misfit, we repeated the analysis, adding the equality constraints on error terms
for each construct separately (M3a to M3d). The most significant misfit resulted
when error terms of Emotion items were constrained to be equal across rating methods,
Δχ²(16) = 67.01, p < .001, ΔCFI < .01, followed by Task items, Δχ²(16) = 62.34,
p < .001, ΔCFI < .01. The highest equality in error variances was found in Social
Diversion items. Although the equality constraints reached statistical significance
for Distraction items, Δχ²(8) = 19.03, p < .05, virtually no change was found in CFI,
ΔCFI < .0001.
A further test of measurement invariance involves specifying that intercepts are
invariant (M4). As mentioned previously, invariant intercepts suggest that, across
rating methods, each item has the same predicted value given the zero value of the
underlying construct. The hypothesis of overall intercept
invariance was rejected, Δχ²(41) = 231.19, p < .001, ΔCFI < .01. Upon examining
the individual constructs for intercept invariance (M4a – M4d), it was found that
Social Diversion was invariant, but Task, Emotion, and Distraction were not (Emo-
tion being the most variant). Based on the ΔCFI criterion, however, overall inter-
cept invariance and construct-level intercept invariance were supported. The next
two models (M5 and M6) tested equality of latent factor variances and covariances
across rating methods. The finding that latent variable variances were invariant
across rating sources suggests that the range of the constructs is the same,
Δχ²(4) = 7.17, p > .1, ΔCFI < .01. The invariance test of factor covariances was also
supported, Δχ²(10) = 16.45, p > .1, ΔCFI < .01, indicating that ratings from differ-
ent sources possess the same pattern of relationships among factors. A final addi-
tional model (M7) was tested to obtain information about the latent mean of the
measurement models. Building from the previous model (M4), additional con-
straints were added to test the equivalence of latent means across rating methods.
As the final step listed in Table 2 shows, these constraints resulted in a significant
decrease in fit in χ², Δχ²(4) = 18.57, p < .001, but did not decrease fit by the
Cheung and Rensvold (2002) criterion (ΔCFI < .001). Construct-level mean invar-
iance tests (M7a – M7d) showed that the latent means differed significantly only
for Task, with peers providing slightly higher ratings than the self-ratings. To index
the magnitude of these differences, we then calculated an effect size by dividing
the mean difference between factors by the square root of average error variance.
The standardized mean differences were –.87 for Task, .71 for Emotion, –.26 for
Distraction, and –.20 for Social Diversion.
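A sketch of this effect size computation: the Task latent means (self = 3.13, peer = 3.74) are reported later in the Discussion, while the error variances below are hypothetical placeholders chosen only so the example reproduces the reported Task value of –.87; the study's actual error variance estimates are not given in this excerpt.

```python
import math

def latent_mean_effect_size(mean_self, mean_peer, err_var_self, err_var_peer):
    """Mean difference between the self and peer latent factors divided by
    the square root of the average error variance, as described above."""
    return (mean_self - mean_peer) / math.sqrt((err_var_self + err_var_peer) / 2)

# Task means from the Discussion; error variances are illustrative only.
print(round(latent_mean_effect_size(3.13, 3.74, 0.50, 0.48), 2))  # -0.87
```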
[Table 3, Correlations Among Latent Variables, Alpha Coefficients, and Confidence
Intervals of Alpha Coefficients for the Self and Peer Coping Inventory for Stressful
Situations Scales, appears here; only the caption and the column heads (T_self, E_self,
A_self, D_self, S_self, T_peer, E_peer, A_peer, D_peer, α, CI95%) are recoverable.]

Discussion
The initial CFA results indicate that both self-ratings and peer ratings on the
CISS fit the three-factor and four-factor models adequately under normal theory ML
estimation and fit well under S-B scaling. Results further suggest that the four-factor model
more closely approximates the observed covariance among items. The observed
superiority of the four-factor model is in line with a previous CFA study by Cook
and Heppner (1997). We recommend use of a four-factor model, preferring the
Social Diversion and Distraction subscales to the Avoidance scale. As we discussed
in the introduction, these two scales have demonstrated different patterns of rela-
tionships with other personality and psychopathology variables. Social Diversion
has been associated with constructive methods of coping (seeking help and support
from others), whereas Distraction has been related to self-isolation, a less healthy
way to cope. This pattern of differential relationships was also found in our study:
as indicated in Table 3, Social Diversion was positively correlated with Task, but
Distraction showed no relationship with Task in self-report and a negative correla-
tion with Task in peer report.
A central component of our study involved demonstrating measurement invar-
iance of the CISS across two types of rating method: self and peer. The CFA mea-
surement models examined provided an evaluation of the extent to which both data
sources equivalently interpret and use items within each scale and the extent to
which the CISS constructs possess the same pattern of relationships with one
another, regardless of the source. According to the Cheung and Rensvold (2002)
criterion for assessing change in fit, ΔCFI, the CISS showed all levels of invariance across data
sources. According to the χ² difference test, invariance in factor loadings and in fac-
tor variance and covariance was achieved, but there were significant differences in
the error terms, in the intercepts, and in the latent means across data sources. Invar-
iance in factor form and factor loadings suggests the existence of conceptual agree-
ment (Cheung, 1999) between the two data sources, meaning not only that
participants and their friends agree on the number of coping dimensions and on the
particular scale items associated with each dimension but also that they agree on
the strength of the relationship between scale items and the underlying coping
dimensions. An example of the latter invariance is that both individuals and their
peers agree that having a snack or food (#12) or going out for a snack (#18) are sub-
stantial indicators of Distraction coping, but that taking time off (#44) is a weaker
marker of Distraction, although they agree that all of these items load on the Dis-
traction dimension.
The χ² difference test showed significant differences in the error variances
across the data source. Although specification of invariant error variances and cov-
ariances has been argued to be excessively stringent and of less importance than
the other steps (Byrne, 1998), such an analysis does provide information about the
reliability of the instrument across methods. To better understand this difference in
the model constraining error variances, the values in the baseline model were
examined more closely. Surprisingly, this examination revealed that for
almost every item, and for Emotion and Task items in particular (as indicated in Table 2,
M3a through M3d), the error variances were higher for the self-ratings than for the
peer ratings, indicating that the reliabilities of these items were lower for self-
ratings. The higher error terms of each item also explain the lower magnitude of
factor loadings for self-ratings. This finding may be an artifact of a tendency
toward halo effects in peer ratings. However, the pattern of correlations among
peer ratings did not significantly differ from that among self-ratings.
Invariance of factor variances and covariances across data sources was sup-
ported. This result suggests that individuals and their friends used a similar range
of response intervals. Furthermore, the pattern of intercorrelations among CISS
factors for self-report was very similar to that for peer report. Task was
uncorrelated with Emotion but showed a low
positive correlation with Social Diversion. Emotion showed a low or moderate
positive correlation with Avoidance and Avoidance subscales. In sum, results pro-
vided substantial evidence of the discriminant validity of CISS scales.
Tests of invariance of item intercept and factor means revealed that Emotion
items showed the most significant variance in intercepts (M4b in Table 2), but once
intercept differences were controlled, there was no significant mean difference
across data sources (M7b in Table 2). The large effect size (.71) on Emotion, there-
fore, may be due mainly to the difference in intercept values between self-report
and peer report, rather than to a difference on the Emotion factor. It can be
concluded that individuals tend to endorse Emotion items more strongly than their
friends do, given an Emotion factor score of zero. For Task items, although intercept
differences were significant across rating sources, there was no uniform pattern in
origin: some items showed higher intercepts for self-report, whereas
other items showed higher intercepts for peer report. Peers (3.74) provided a higher
average rating on Task than self-raters (3.13) did, and the mean difference remained
significant (M7a) even when intercept differences were controlled, suggesting
that peers were lenient in reporting desirable behaviors about their
friends, relative to self-report.
The correlations between the self- and peer-report latent factors for Avoidance (.37),
Distraction (.27), and Social Diversion (.33) were moderate and consistent with pre-
vious studies (e.g., Funder, 1995; McCrae, 1982, 1994). Although correlations across
rating sources for Task and Emotion were significant, their magnitudes were low, for
Emotion in particular. One possible reason for these poor correlations is that the
behavioral patterns defining Avoidance are readily observable (e.g., going to a party)
in social contexts and therefore are easy to judge by social peers, whereas those
defining Task (e.g., focusing on problems) or Emotion (e.g., blaming oneself) may
be less observable to friends and judged less accurately (Funder & Dobroth, 1987;
McCrae, 1982). Cheung (1999) argues that informational constraints are one of the
reasons for inequality in error variances in the context of self-other rating compari-
sons. Our finding of significant inequality in error variances on Task and Emotion
across rating source supports this argument. More study is, however, necessary to
determine the source of the lack of correlation between self-ratings and peer ratings
for these two dimensions.
This study has a number of notable limitations. First, as discussed earlier, the objec-
tive level of intimacy between friends in this study is largely unknown, both in
absolute terms and relative to that of other studies that have examined self-peer
report convergence. Although intimacy is often assumed to be a potent moderator of dya-
dic agreement, moderating effects are difficult to assess when little is known about
what levels of variability in intimacy can be expected in samples of various popula-
tions. Future studies are needed to evaluate the extent to which self-observer corre-
lation on CISS constructs is superior for couples or peers with higher levels of
dyadic intimacy. More fundamentally, there is a need for the development and vali-
dation of intimacy indices, both to ensure psychometrically sound measurement of
this potentially important moderator variable and to facilitate comparison across
studies.
Second, although there is good reason to believe that our sample size was ade-
quate for the standard CFA models (Table 1) under normal theory ML estimation, a
greater sample size would undoubtedly have allowed us to produce a stable asympto-
tic covariance matrix to test measurement invariance via the S-B χ². Future research
involving complicated models with a large number of parameters and skewed data
needs to secure a large sample size to produce a stable asymptotic matrix. A power-
ful computer is also required to speed S-B χ² calculation time when a model con-
tains numerous indicators (e.g., over 60). Because our data departed significantly
from multivariate normality, the use of normal theory ML estimation might have
produced biased results. We believe, however, that the main findings of this study
would not change much had the S-B χ² been used, for the following reasons. First, the χ²
statistic may be sensitive to nonnormality, but the Δχ² test is less so (Rensvold & Cheung,
1998), and measurement invariance testing relies on the Δχ² test rather than the χ²
test. Second, S-B scaling corrects χ² and standard errors but does not modify para-
meter estimates (Finney & DiStefano, 2006). Therefore, structure coefficients and
correlation coefficients among latent factors (Table 3) would not change with S-B
scaling, although the significance levels of the correlations might have changed.
Third, as indicated in the Instruments section, peer-rating items were unchanged
from the original instrument. There is a possibility that unaltered items inflated the
measurement equivalence between self-report and peer report. As stated earlier, we
judged that participants were primed sufficiently by the completion of the ADF-F2
immediately prior to the CISS peer-rating form and were fully aware of rating
peers due to repeated oral and written instructions on how to complete the peer-
rating form of CISS. It is also worth noting that the CISS has only 48 brief items
and therefore presented a minimal challenge to the attentional focus of the exami-
nees in this regard. In summary, despite our efforts to prime, instruct, and remind
examinees, there remains a possibility that participants mistakenly interpreted
peer-rating items as referring to themselves.
Finally, as each participant provided both ratings of the self and ratings of the
friend, shared source variance may affect our overall results. In other words, indivi-
duals may project their own coping habits onto their evaluations of their friends.
One way to eliminate this possible contamination would be to halve the data set
(reducing the sample size to N = 163) and analyze data from each member of the
pair separately. Although reducing the sample size so drastically makes it difficult
to draw firm conclusions, we examined results of analyses conducted in this man-
ner and concluded that results were very similar. We decided to report results from
the full sample here because of their superior power but acknowledge that we have
not ruled out the possibility of this source of contamination.
Despite these limitations, our results provide evidence of substantial measure-
ment equivalence between the self-rating form and the observer form and support
the use of a novel observer form of the CISS as an alternative or supplementary
source of information about characteristic responses to stress. There is increasing
recognition of the need to obtain assessment data from multiple sources to more
fully understand the phenotypic expression of dispositional phenomena (e.g.,
Conway & Huffcutt, 1997; McCrae, 1994; Piedmont, 1994); the availability of this
new measurement tool should facilitate this end with respect to coping behaviors.
Researchers who wish to use both forms within the same study should be aware
that although the peer form appears to be the equivalent of the self-report form in
structure and psychometrics, self–peer correlation was only moderate for Avoid-
ance coping (and its subscales) and quite modest for Task and Emotion coping. For
the purposes of many studies, the magnitude of the self–peer correlation may yet
be inadequate to justify combination of self and peer ratings into composites. As
with the modest self–peer correlation observed for some other personality variables
and the self-parent correlation observed with adolescents, researchers may wish to
consider self-assessment and peer assessment of coping styles with the CISS as par-
tially overlapping but tapping predominantly independent perspectives on an indi-
vidual’s coping behaviors.
References
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covar-
iance structures. Psychological Bulletin, 88, 588-606.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S.
Long (Eds.), Testing structural models (pp. 136-162). Newbury Park, CA: Sage.
Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B.
(2001). Minnesota Multiphasic Personality Inventory-2: Manual for administration and scoring.
Minneapolis: University of Minnesota Press.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic con-
cepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum.
Byrne, B. M., & Watkins, D. (2003). The issue of measurement invariance revisited. Journal of Cross-
cultural Psychology, 34, 155-175.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multi-
method matrix. Psychological Bulletin, 56, 81-105.
Cheung, G. W. (1999). Multifaceted conceptions of self-other ratings disagreement. Personnel Psychol-
ogy, 52, 1-36.
Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in cross-
cultural research using structural equation modeling. Journal of Cross-cultural Psychology, 31,
187-212.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement
invariance. Structural Equation Modeling, 9, 233-255.
Conway, J. M., & Huffcutt, A. I. (1997). Psychometric properties of multisource performance ratings: A
meta-analysis of subordinate, supervisor, peer, and self-ratings. Human Performance, 10, 331-360.
Cook, S. W., & Heppner, P. P. (1997). A psychometric study of three coping measures. Educational and
Psychological Measurement, 57, 906-923.
Costa, P. T., & McCrae, R. R. (1988). Personality in adulthood: A six-year longitudinal study of self-
reports and spouse ratings on the NEO personality inventory. Journal of Personality and Social Psy-
chology, 54, 853-863.
Costa, P. T., & McCrae, R. R. (1992a). NEO PI-R: Professional manual. Odessa, FL: PAR.
Costa, P. T., & McCrae, R. R. (1992b). Personality in adulthood: A six-year longitudinal study of self-
reports and spouse ratings on the NEO personality inventory. Journal of Personality and Social Psy-
chology, 54, 853-863.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and
specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations
with external variables are the central issues. Psychological Bulletin, 95, 134-135.
Endler, N. S., & Parker, J. D. A. (1990). Coping Inventory for Stressful Situations (CISS): Manual.
Toronto, Ontario, Canada: Multi-Health Systems.
Endler, N. S., & Parker, J. D. A. (1994). Assessment of multidimensional coping: Task, emotion, and
avoidance strategies. Psychological Assessment, 6, 50-60.
Endler, N. S., & Parker, J. D. A. (1999). Coping Inventory for Stressful Situations (CISS): Manual (2nd
ed.). Toronto, Ontario, Canada: Multi-Health Systems.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An
EPM guidelines editorial. Educational and Psychological Measurement, 61, 517-531.
Finney, S. J., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling.
In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 269-
314). Greenwich, CT: Information Age.
Foltz, C., Morse, J. Q., Calvo, N., & Barber, J. P. (1997). Self- and observer ratings on the NEO-FFI
in couples: Initial evidence of the psychometric properties of an observer form. Assessment, 4,
287-295.
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological
Review, 102, 652-670.
Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with inter-
judge agreement. Journal of Personality and Social Psychology, 52, 409-418.
Harkness, A. R., Tellegen, A., & Waller, N. (1995). Differential convergence of self-report and infor-
mant data for multidimensional personality questionnaire traits: Implications for the construct of
negative emotionality. Journal of Personality Assessment, 64, 185-204.
Henson, R. K. (2001). Understanding internal consistency reliability estimates: A conceptual primer on
coefficient alpha. Measurement and Evaluation in Counseling and Development, 34, 177-189.
Hill, R. W., Zrull, M. C., & McIntire, K. (1998). Differences between self- and peer ratings of interper-
sonal problems. Assessment, 5, 67-83.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in
aging research. Experimental Aging Research, 18, 117-144.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underpara-
meterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conven-
tional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jöreskog, K. G., & Sörbom, D. (2006). LISREL 8.80 for Windows. Lincolnwood, IL: Scientific Software
International.
McCrae, R. R. (1982). Consensual validation of personality traits: Evidence from self-reports and rat-
ings. Journal of Personality and Social Psychology, 43, 293-303.
McCrae, R. R. (1994). The counterpoint of personality assessment: Self-reports and observer ratings.
Assessment, 1, 159-172.
McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instru-
ments and observers. Journal of Personality and Social Psychology, 52, 81-90.
McWilliams, L. A., Cox, B. J., & Enns, M. W. (2003). Use of the Coping Inventory for Stressful Situa-
tions in a clinically depressed sample: Factor structure, personality correlates, and prediction of dis-
tress. Journal of Clinical Psychology, 59, 423-437.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika,
58, 525-543.
Mullen, M. (1995). Diagnosing measurement equivalence in cross-national research. Journal of Interna-
tional Business Studies, 3, 573-596.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Piedmont, R. L. (1994). Validation of the NEO PI-R observer form for college students: Toward a para-
digm for studying personality development. Assessment, 3, 259-268.
Rensvold, R. B., & Cheung, G. W. (1998). Testing measurement models for factorial invariance: A sys-
tematic approach. Educational and Psychological Measurement, 58, 1017-1034.
Satorra, A., & Bentler, P. M. (1994). Correction to test statistics and standard errors in covariance struc-
tural analysis. In A. Von Eye & C. C. Clogg (Eds.), Latent variables analysis: Application for devel-
opmental research (pp. 399-419). Thousand Oaks, CA: Sage.
Stöber, J. (1998). Reliability and validity of two widely-used worry questionnaires: Self-report and self-
peer convergence. Personality and Individual Differences, 24, 887-890.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psy-
chometrika, 38, 1-10.
Turner, R. A., King, P. R., & Tremblay, P. F. (1992). Coping styles and depression among psychiatric
outpatients. Personality and Individual Differences, 13, 1145-1147.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance litera-
ture: Suggestions, practices, and recommendations for organizational research. Organizational
Research Methods, 3, 4-70.
Watson, D., & Clark, L. A. (1991). Self- versus peer ratings of specific emotional traits: Evidence of
convergent and discriminant validity. Journal of Personality and Social Psychology, 60, 927-940.
Wright, P. H. (1985). The acquaintance description form. In S. Duck & D. Perlman (Eds.), Understand-
ing personal relationships: An interdisciplinary approach (pp. 39-62). Beverly Hills, CA: Sage.