RELIABILITY
Reliability
• Reliability of a test is a criterion of test quality relating to the
accuracy of psychological measurements.
• The higher the reliability of a test, the freer it is of measurement
error.
• Some regard it as the stability of results in repeated testing, that
is, the same individual or object is tested in the same way, so
that it yields the same value from moment to moment, provided
that the thing measured has itself not changed in the meantime.
• The concept of reliability underlies the computation of the error of
measurement of a single score, whereby we can predict the range of
fluctuation likely to occur in a single individual's score as a result
of irrelevant chance factors (see the formula below).
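This range is usually expressed through the standard error of measurement. The formula below is the standard classical-test-theory expression; it is not given in the original text:

\[ \mathrm{SEM} = \sigma_X \sqrt{1 - r_{XX}} \]

where \(\sigma_X\) is the standard deviation of the obtained scores and \(r_{XX}\) is the reliability coefficient. Assuming normally distributed errors, roughly 68 per cent of chance fluctuations fall within one SEM of the obtained score.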
Reliability (cont…)
• In technical terms, the measures of test reliability make it
possible to estimate what proportion of the total test-score
variance is error variance. The greater the error variance, the
lower the reliability. Practically speaking, this means that if we
can estimate the error variance in any measure, then we can also
estimate its reliability.
• This brings us to two equivalent definitions of reliability (written
in symbols below):
• 1. Reliability is the proportion of the ‘true’ variance to the total
obtained variance of the data yielded by the measuring
instrument; and
• 2. Reliability is 1.00 minus the proportion of error variance to the
total obtained variance of the data yielded by the measuring
instrument. An index of 1.00 indicates perfect reliability.
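In symbols (the notation is assumed here; the original states the definitions only in words):

\[ r_{XX} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}} = 1 - \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{total}}}, \qquad \sigma^2_{\text{total}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}} \]

The two definitions agree because the total obtained variance is the sum of the true variance and the error variance.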
Methods (estimating reliability)
• Time Sampling: The Test–Retest Method
• Item Sampling: Parallel Forms Method
• Split-Half Method
• Method of rational equivalence
• Cronbach's alpha
Time Sampling: The Test–Retest Method
• In the test–retest method, the same test is administered to the same group
of people on two separate occasions, and the two sets of scores are
correlated. The resulting coefficient indicates how stable scores remain over
time, provided the trait measured is not expected to change between testings.
Item Sampling: Parallel Forms Method
• In the parallel (equivalent) forms method, two comparable forms of the same
test are given to the same group, and the paired scores are correlated; the
coefficient reflects consistency across different samples of items.
Split-Half Method
In the split-half method, a test is given once and divided into halves that are scored
separately. The results of one half of the test are then compared with the results
of the other. The two halves of the test can be created in a variety of ways. If the
test is long, the best method is to divide the items randomly into two halves. For
ease in computing scores for the different halves, however, some people prefer to
calculate a score for the first half of the items and another score for the second
half. Although convenient, this method can cause problems when items on the
second half of the test are more difficult than items on the first half. If the items
get progressively more difficult, then you might be better advised to use the odd–
even system, whereby one subscore is obtained for the odd-numbered items in
the test and another subscore for the even-numbered items. With this approach
only one testing is needed. This technique is also more convenient than the
parallel form method because only one test is required. In this method, the test is scored
for the single testing to get two halves, so that variation brought about by
differences between the two testing situations is eliminated. A weakness
of the split-half technique lies in the fact that chance errors may affect scores on
the two halves of the test in the same way, thus tending to make the reliability
coefficient too high. This follows because the test is administered only once. The
longer the test, the smaller the probability that the effects of temporary and
variable disturbances will be cumulative in one direction, and the more accurate
the estimate of reliability will be.
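As a concrete illustration, here is a minimal Python sketch of the odd–even split described above. The 0/1 item matrix is hypothetical, and the Spearman–Brown step-up at the end is standard practice but goes beyond the passage itself:

import numpy as np

# Hypothetical item responses: rows = examinees, columns = items, scored 0/1.
items = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
])

odd_half  = items[:, 0::2].sum(axis=1)   # subscore on items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # subscore on items 2, 4, 6

# Correlate the two half-test scores.
r_halves = np.corrcoef(odd_half, even_half)[0, 1]

# The half-test correlation understates full-length reliability; the usual
# Spearman-Brown step-up (an addition beyond the passage above) corrects it.
r_full = 2 * r_halves / (1 + r_halves)
print(f"half correlation = {r_halves:.3f}, stepped-up reliability = {r_full:.3f}")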
Method of Rational Equivalence
Internal consistency can also be estimated from a single administration, without
splitting the test, by using the Kuder–Richardson formula number 20. One of the
techniques for item analysis is the item difficulty index. Item difficulty (ID) is the
proportion or percentage of those answering an item correctly; the symbol ‘p’ is
used to represent the difficulty index. Suppose
an Item ‘X’ has p = 0.74. This means Item ‘X’ was answered correctly by 74 per cent of
those who answered the item. To compute reliability with the help of Kuder–
Richardson formula number 20, the following procedure is used. First, set up a
worksheet whose first column lists the item numbers. The second column should give
the difficulty value (p) of each item obtained during item analysis. The third column is
given as q where q = 1 – p. The fourth column is taken as (p) (q). This column is the
product of column two and column three. The Kuder–Richardson formula number 20
is then applied:

\[ \mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^2_t}\right) \]

where k is the number of items, Σpq is the sum of the fourth column, and σ²ₜ is the
variance of the total test scores.
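The worksheet procedure and the formula translate directly into code; a minimal Python sketch (the small 0/1 item matrix is hypothetical):

import numpy as np

# Rows = examinees, columns = items, scored 1 (correct) or 0 (incorrect).
items = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
])

k = items.shape[1]                    # number of items (column one of the worksheet)
p = items.mean(axis=0)                # difficulty p of each item (column two)
q = 1 - p                             # q = 1 - p (column three)
sum_pq = (p * q).sum()                # sum of the (p)(q) column (column four)
var_total = items.sum(axis=1).var()   # variance of total scores (population form)

kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
print(f"KR-20 = {kr20:.3f}")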
Cronbach alpha
• The KR20 formula is not appropriate for evaluating internal consistency in some cases.
The KR20 formula requires that you find the proportion of people who got each item
“correct.” There are many types of tests, though, for which there are no right or wrong
answers, such as many personality and attitude scales. For example, on an attitude
questionnaire you might be asked to respond to the statement “Extramarital
sexual intercourse is immoral.” You must indicate whether you strongly
disagree, disagree, are neutral, agree, or strongly agree. None of these choices is
incorrect, and none is correct. Rather, your response indicates where you stand on the
continuum between agreement and disagreement. To apply the internal-consistency
method with this sort of item, Cronbach developed a formula that estimates the
internal consistency of tests in which the items are not scored as 0 or 1 (right or
wrong). This estimate is known as coefficient alpha.
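Coefficient alpha replaces the Σpq term of KR-20 with the sum of the item variances, so it works for multi-point items. A minimal Python sketch with hypothetical 5-point ratings:

import numpy as np

# Hypothetical Likert responses, 1 = strongly disagree ... 5 = strongly agree;
# rows = respondents, columns = attitude items.
ratings = np.array([
    [4, 5, 4, 3],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = ratings.shape[1]
item_vars = ratings.var(axis=0)        # variance of each item
total_var = ratings.sum(axis=1).var()  # variance of the total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")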
Factor Analysis
• Factor analysis is one popular method for dealing with
the situation in which a test apparently measures
several different characteristics. This can be used to
divide the items into subgroups, each internally
consistent; however, the subgroups of items will not be
related to one another.
• Factor analysis can help a test constructor build a test
that has sub measures for several different traits. When
factor analysis is used correctly, these subtests will be
internally consistent (highly reliable) and independent
of one another.
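As a sketch of how such a grouping might be carried out (scikit-learn's FactorAnalysis is one option among several; the simulated data and the two-factor choice are assumptions for illustration):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 examinees on 6 items that actually tap two distinct traits:
# items 0-2 reflect trait A, items 3-5 reflect trait B, plus noise.
trait_a = rng.normal(size=(200, 1))
trait_b = rng.normal(size=(200, 1))
X = np.hstack([trait_a.repeat(3, axis=1),
               trait_b.repeat(3, axis=1)]) + rng.normal(scale=0.5, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(X)

# Assign each item to the factor on which it loads most strongly; each
# resulting subgroup of items can then be checked for internal consistency.
loadings = fa.components_                       # shape: (n_factors, n_items)
assignment = np.abs(loadings).argmax(axis=0)
print("item -> factor:", assignment)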