RELIABILITY

The document discusses the concept of reliability in psychological testing, emphasizing its importance in ensuring accurate measurements and minimizing errors. It outlines various methods for estimating reliability, including test-retest, parallel forms, split-half, and internal consistency measures like Cronbach's alpha. Each method has its advantages and limitations, and the document provides detailed descriptions of how to implement these methods effectively.

Uploaded by Naman Mansotra

RELIABILITY

Reliability
• Reliability of a test is a criterion of test quality relating to the accuracy of psychological measurements.
• The higher the reliability of a test, the freer it is of measurement errors.
• Some regard reliability as the stability of results in repeated testing: the same individual or object, tested in the same way, yields the same value from moment to moment, provided that the thing measured has not itself changed in the meantime.
• The concept of reliability underlies the computation of the error of measurement of a single score, whereby we can predict the range of fluctuation likely to occur in a single individual's score as a result of irrelevant chance factors.
Reliability (cont…)
• In technical terms, measures of test reliability make it possible to estimate what proportion of the total test-score variance is error variance. The greater the error, the lower the reliability. Practically speaking, this means that if we can estimate the error variance in any measure, we can also estimate its reliability.
• This brings us to two equivalent definitions of reliability:
• 1. Reliability is the proportion of 'true' variance to the total obtained variance of the data yielded by the measuring instrument, and
• 2. Reliability is 1.00 minus the proportion of error variance to the total obtained variance of the data yielded by the measuring instrument. An index of 1.00 indicates perfect reliability.
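The equivalence of the two definitions above can be checked numerically; a minimal sketch with hypothetical variance figures (not from the source):

```python
# Hypothetical variance components, illustrating the two equivalent
# definitions of reliability given above.
true_variance = 80.0   # assumed 'true' score variance
error_variance = 20.0  # assumed error variance
total_variance = true_variance + error_variance

r1 = true_variance / total_variance          # definition 1
r2 = 1.0 - error_variance / total_variance   # definition 2

print(r1, r2)  # both 0.8, confirming the two definitions agree
```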
Methods (estimating reliability)
• Time Sampling: The Test–Retest Method
• Item Sampling: Parallel Forms Method
• Split-Half Method
• Method of Rational Equivalence
• Cronbach's Alpha
Time Sampling: The Test–Retest Method

• Test–retest reliability estimates are used to evaluate the error associated with administering a test at two different times. This type of analysis is of value only when we measure "traits" or characteristics that do not change over time.
• For instance, we usually assume that an intelligence test measures a consistent general ability. As such, if an IQ test administered at two points in time produces different scores, then we might conclude that the lack of correspondence is the result of random measurement error. Usually we do not assume that a person got more or less intelligent in the time between tests.
• Test–retest reliability is relatively easy to evaluate: just administer the same test on two well-specified occasions and then find the correlation between scores from the two administrations.
The Test–Retest Method
• The most frequently used method to find the reliability of a test is to repeat the same test on a second occasion. The reliability coefficient (r) in this case is the correlation between the scores obtained by the same persons on the two administrations of the test.
• The error variance corresponds to the random fluctuations of performance from one test session to the other. The main difficulty with this method is the choice of interval between the two administrations. If the interval is long (say, six months) and the subjects are young children, growth changes will affect the test scores: growth generally raises the initial score by varying amounts and so tends to lower the reliability coefficient. Owing to the difficulty of controlling the factors that influence scores on retest, the retest method is generally less useful than the other methods.
• Formula: Pearson correlation
Parallel Form Method
• Parallel forms reliability compares two equivalent forms of a test that
measure the same attribute. The two forms use different items; however,
the rules used to select items of a particular difficulty level are the same.
When two forms of the test are available, one can compare performance
on one form versus the other. Some textbooks refer to this process as
equivalent forms reliability, whereas others call it simply parallel forms.
Sometimes the two forms are administered to the same group of people
on the same day.
• The Pearson product moment correlation coefficient is used as an estimate
of the reliability. When both forms of the test are given on the same day,
the only sources of variation are random error and the difference between
the forms of the test. (The order of administration is usually
counterbalanced to avoid practice effects.) Sometimes the two forms of
the test are given at different times. In these cases, error associated with
time sampling is also included in the estimate of reliability.
Parallel Form Method
• The method of parallel forms provides one of the
most rigorous assessments of reliability commonly
in use. Unfortunately, the use of parallel forms
occurs in practice less often than is desirable.
Often test developers find it burdensome to
develop two forms of the same test, and practical
constraints make it difficult to retest the same
group of individuals. Instead, many test developers
prefer to base their estimate of reliability on a
single form of a test.
Parallel Form Method
• To overcome the difficulty of practice and time interval in case of test–retest
method, the method of parallel or alternate form is used. Using the equivalent or
parallel forms has some advantages like lessening the possible effect of practice and
recall. But this method presents an additional problem of construction and
standardisation of the second form.
• According to Freeman, both forms should meet all of the test specifications as
follows:
• 1. The number of items should be the same,
• 2. The kinds of items in both should be uniform with respect to content, operations
or traits involved, levels and range of difficulty, and adequacy of sampling,
• 3. The items should be uniformly distributed as to difficulty,
• 4. Both test forms should have the same degree of item homogeneity in the
operations or traits being measured. The degree of homogeneity may be shown by
intercorrelations of items with subtest scores, or with total-test scores,
• 5. The means and the standard deviations of both the forms should correspond closely, and
• 6. The mechanics of administering and scoring should be uniform.
Parallel Form method
• Freeman (1962) states that the above are the ideal criteria of equivalent forms, but complete uniformity in all respects cannot be expected. However, it is necessary that uniformity be closely approximated. The parallel forms are administered to the same group of individuals and the correlation coefficient is calculated between one form and the other. For instance, the 1937 Stanford-Binet Scale has Form L and Form M (Terman and Merrill 1973). The content of the two forms was derived from one and the same process of standardisation. A correlation of 0.91 is obtained between these two forms at the chronological age of seven years. The formula and the method used to find the reliability with parallel or alternate forms are the same as those used in the test–retest method.
Split-half reliability
• In split-half reliability, a test is given and divided into halves that are scored separately. The results of one half of the test are then compared with the results of the other. The two halves of the test can be created in a variety of ways. If the test is long, the best method is to divide the items randomly into two halves. For ease in computing scores for the different halves, however, some people prefer to calculate a score for the first half of the items and another score for the second half. Although convenient, this method can cause problems when items on the second half of the test are more difficult than items on the first half. If the items get progressively more difficult, then you might be better advised to use the odd–even system, whereby one subscore is obtained for the odd-numbered items in the test and another for the even-numbered items.
Split-half reliability
• The advantage that this method has over the test–retest method is that only one testing is needed. This technique is also better than the parallel form method for finding reliability, because only one test is required. In this method, the test from a single administration is scored as two halves, so variation brought about by differences between two testing situations is eliminated. A drawback of the split-half technique lies in the fact that chance errors may affect scores on the two halves of the test in the same way, thus tending to make the reliability coefficient too high. This follows because the test is administered only once. The longer the test, the smaller the probability that the effects of temporary and variable disturbances will be cumulative in one direction, and the more accurate the estimate of score reliability.
Split-Half reliability
• The two halves of the test can be made by counting the number of odd-numbered items answered correctly as one half and the number of even-numbered items answered correctly as the other half. In other words, the odd items and the even items are scored separately and treated as two halves. There are other ways to split the test into halves: for example, Items 1 and 2 go to the first score, Items 3 and 4 to the second score, Items 5 and 6 to the first score, Items 7 and 8 to the second score, and so on. Another method is to take the first 50 per cent of the items as one half and the second 50 per cent as the other half. Whenever the difficulty level of the test items is not the same, we apply the odd–even method; if the difficulty level is the same, we apply the first-half/second-half method. Once the two half-scores have been obtained for each individual, they are correlated with the help of the Pearson product–moment formula, and the reliability of the whole test is then estimated with the Spearman–Brown formula:
• r(whole) = 2r / (1 + r), where r is the correlation between the two halves.
Method of Rational Equivalence
• The coefficient of internal consistency can also be obtained with the help of Kuder–Richardson formula number 20. One of the techniques of item analysis is the item difficulty index. Item difficulty is the proportion or percentage of those answering an item correctly; the symbol 'p' is used to represent the difficulty index. Suppose an Item 'X' has p = 0.74. This means Item 'X' was answered correctly by 74 per cent of those who attempted it. To compute reliability with the help of Kuder–Richardson formula number 20, the following procedure is used. First, write a column in a worksheet showing the item numbers. The second column should give the difficulty value (p) of each item obtained during item analysis. The third column is q, where q = 1 – p. The fourth column is (p)(q), the product of columns two and three. The Kuder–Richardson formula number 20 is then applied: KR20 = [n/(n – 1)] × [1 – Σpq/σt²], where n is the number of items and σt² is the variance of the total scores.
Cronbach alpha
• The KR20 formula is not appropriate for evaluating internal consistency in some cases. The KR20 formula requires that you find the proportion of people who got each item "correct." There are many types of tests, though, for which there are no right or wrong answers, such as many personality and attitude scales. For example, on an attitude questionnaire, you might be presented with a statement such as "I believe extramarital sexual intercourse is immoral." You must indicate whether you strongly disagree, disagree, are neutral, agree, or strongly agree. None of these choices is incorrect, and none is correct. Rather, your response indicates where you stand on the continuum between agreement and disagreement. To extend the Kuder–Richardson method to this sort of item, Cronbach developed a formula, coefficient alpha, that estimates the internal consistency of tests in which the items are not scored simply as 0 or 1 (right or wrong).
Factor analysis
• Factor analysis is one popular method for dealing with the situation in which a test apparently measures several different characteristics. It can be used to divide the items into subgroups, each internally consistent; however, the subgroups of items will not be related to one another.
• Factor analysis can help a test constructor build a test
that has sub measures for several different traits. When
factor analysis is used correctly, these subtests will be
internally consistent (highly reliable) and independent
of one another.
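Coefficient alpha, described above for items without right or wrong answers, can be sketched with invented Likert-type ratings; the formula assumed here is the standard α = [k/(k − 1)] × [1 − Σσᵢ²/σt²]:

```python
import statistics

# Invented 1-5 agreement ratings of five respondents on a
# four-item attitude scale, one row per respondent.
ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
]
k = len(ratings[0])

# Variance of each item across respondents, and of the total scores.
item_vars = [statistics.pvariance([row[i] for row in ratings]) for i in range(k)]
var_total = statistics.pvariance([sum(row) for row in ratings])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / var_total)
print(round(alpha, 3))
```

Unlike KR-20, nothing here requires the item scores to be 0 or 1, which is why alpha applies to personality and attitude scales.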
