UNIT 9 INTRODUCTION TO TEST CONSTRUCTION
Structure
9.0 Objectives
9.1 Introduction
9.2 Psychological Test or Measure
9.2.1 Purpose of Psychological Test
9.3 Validity
9.3.1 Types of Validity
9.4 Reliability
9.4.1 Measuring Reliability
9.5 Norms
9.5.1 Types of Norms
9.6 Test Construction
9.6.1 Standardization of Psychological Tests
9.6.2 Classification of Standardized Testing
9.6.3 Steps for constructing Standardized Tests
9.7 Let Us Sum Up
9.8 Unit End Questions
9.9 Answers to Self Assessment Questions
9.10 Glossary
9.11 Suggested Readings and References
9.0 OBJECTIVES
After reading this unit, you will be able to:
• Define Reliability, Validity, Norms and Standardized Tests;
• Describe the process of test construction;
• Illustrate the different types of reliability and validity; and
• Explain the steps of constructing a standardized test.
9.1 INTRODUCTION
The present unit of this block deals with the meaning and methods of test construction
with reference to psychological research. You will also be introduced to the
concepts of reliability, validity, norms and standardization of psychological tests.
Towards the end of the unit, the various steps of constructing a standardized test
will be explained.
9.3 VALIDITY
Not all tests possess the same degree of validity; the validity of a test depends
entirely on the extent to which it actually measures the dimension it is meant to
measure, and only when this is established, along with detailed norms, can judgments
based upon the test's results be treated as objective. For example, validity
for intelligence tests may be attributed only to those tests that actually succeed
in testing the individual's level of intelligence. A test can be accepted as valid
only to that degree to which it can correctly gauge the dimension of the participant
that it claims to measure. In this way, the validity of a test is that
quality on the basis of which the correctness or incorrectness of judgments based
upon it is evaluated. For example, the validity of interest tests is generally lower
than that of intelligence tests. Here there is a slight difficulty. Suppose, for the moment, that
the intelligence of some students was measured by one particular method. Now
the validity of the test will depend upon whether the students tested do, in fact,
possess the intelligence that they are indicated as possessing. The problem that
arises here is how one can ascertain whether the students do or do not possess the
degree of intelligence indicated by the above test. Evidently, there must be
some independent criterion for deciding upon the validity of the particular test
in question, or of tests in general. In this case of measuring the intelligence level of students,
the examination results can serve as the basis for measuring the validity of the test.
Generally speaking, it can be said that if there is a substantial correlation between marks
obtained at an examination and the results of the test, then the test is valid.
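This logic of checking test scores against an external criterion can be shown with a short computation. The sketch below is only a hypothetical illustration in Python: the test scores and examination marks are invented, and the validity coefficient is simply the Pearson correlation between the two sets of scores.

# Hypothetical sketch: validity as the correlation between test scores
# and an external criterion (here, invented examination marks).
import numpy as np

# Invented test scores and examination marks for ten students
test_scores = np.array([112, 98, 105, 120, 92, 101, 117, 95, 108, 110])
exam_marks = np.array([68, 55, 61, 75, 50, 58, 72, 52, 63, 66])

# The validity coefficient is the Pearson correlation between the two
validity_coefficient = np.corrcoef(test_scores, exam_marks)[0, 1]
print(f"Validity coefficient: {validity_coefficient:.2f}")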
9.3.1 Types of Validity
As is evident from the foregoing description, validity is a relative term, as no
test can have complete validity. Hence, whenever a particular test is termed valid,
or whenever the lack of validity of a test is in question, it is necessary to indicate
the sense in which it is considered to be valid or invalid.
Validity is of many kinds, and psychologists have broadly accepted the
following kinds of validity:
1) Face validity: It focuses only on the outward form of the test. Such validity is
attributed to a test whose items merely appear, on inspection, to measure
what the test claims to measure.
2) Content validity: Another kind of validity is content validity, in which the
validity of the content forms the basis of the validity of the test. In order to
obtain this kind of validity in a particular test, it becomes imperative that
the items of the test achieve the objective for which they are originally
designed. For example, content validity in the case of an intelligence test
will be attributed only in the event of its succeeding in discovering all the
factors that are concerned with intelligence.
3) Factorial validity: This concerns the validity of the factors underlying the test.
To judge whether a test has factorial validity, the test is examined by
the method of factor analysis, and the correlation between the factors so
obtained and the factors evident in the results of other tests is established.
4) Predictive validity: This is the most popular form of validity. In it, the test
scores are correlated with a criterion measure obtained at a later date (for
example, subsequent examination performance). The choice of a criterion
requires much care and attention. The coefficient obtained from this correlation
between scores and criterion is called the validity coefficient. The validity
coefficient typically lies between 0.5 and 0.8; a lower coefficient makes the test
inapplicable and lacking in utility, while a higher coefficient is not normally
obtained.
5) Concurrent validity: It resembles predictive validity, since in it, too, a
correlation between the test and some definite criterion is established. Despite
this common feature, there is a definite difference: in concurrent validity the
criterion measure is obtained at about the same time as the test scores, rather
than at some later date.
From the above analysis of the various kinds of validity, it is evident that
validity exists in a particular context; in other words, every test is valid
for a particular objective and for a specific age group of individuals. It
may just as well be invalid for a different age group or in a different
context. Hence, to attribute validity to a test without qualification
is completely unjustified and inaccurate. For the statement to have any value
or meaning, it is essential to state the context and conditions in which it is
applicable.
9.4 RELIABILITY
In addition to validity, it is essential that every test possess a definite element
of reliability. Only then can the conclusions of the test be considered
dependable and worthy of trust. The term basically refers to the extent to which a
test can be relied upon, i.e., it yields consistent scores even when it is administered
to the same group after an interval of time.
Reliability of a test refers to the quality of the test that inspires confidence
and trust in the measurement, and this quality can be attributed only to a test
that provides the same score every time it is administered to the same individual.
Now, if an intelligence test yields one score for an individual at one time,
and a different score when it is applied to the same individual at another time,
it is evident that such a test cannot be considered reliable. The reliability of a
test, moreover, does not lie in any one part of it, but in its wholeness or completeness;
its reliability will be considerably weakened if even one part of
it is deficient in some respect. Hence, it is essential that the internal parts of a test
possess internal consistency and uniformity. It is only on the basis of such a
reliable test that guidance can be given.
9.4.1 Measuring Reliability
Reliability can be measured in the following four ways:
1) Test-retest method: One method of gauging reliability is to administer the
same test to the same group of individuals on two different occasions, and
then compare the scores obtained. For example, a group of
individuals can be given the Binet intelligence test, and later on the
same group of individuals can again be given the Binet intelligence
test. If the results obtained on the two occasions do not tally, then the test cannot be
considered reliable.
2) Parallel form method: In the parallel form of reliability, the same group is
given two different forms of the test measuring the same dimension or construct.
The scores on the two forms are then compared or correlated to judge
the reliability of the test. Gulliksen has suggested that more than one parallel
form be devised for greater accuracy. This is also known as the equivalent form
of reliability.
3) Split-half method: The reliability of a test can also be judged by dividing
the items of the test into even- and odd-numbered halves, whose scores are
obtained separately. The scores on the two halves can then be correlated
to check the reliability of the test.
4) Inter-item consistency: In this method of measuring reliability, the test is
administered only once. The mutual relation between the scores
obtained for each specific item in the test is observed. At the same time, the
relation between the marks obtained for one specific item and the marks
obtained for the whole test is also ascertained. This method of measuring
reliability involves considerable statistical skill in correlation; psychologists
Kuder and Richardson have devised some formulae for application in this
method. (A computational sketch of these correlation-based methods follows this list.)
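To make the correlation-based methods listed above concrete, the following sketch in Python uses invented data: six persons answering eight pass/fail items, with made-up function names. Under these assumptions, the test-retest and split-half estimates are simple Pearson correlations (the split-half value is stepped up with the Spearman-Brown formula), and inter-item consistency is computed with Kuder-Richardson formula 20 (KR-20) for dichotomous items.

# Hypothetical sketch of three ways of estimating reliability.
# Data and function names are invented for illustration only.
import numpy as np


def test_retest_reliability(scores_time1, scores_time2):
    # Pearson correlation between scores from two administrations
    return np.corrcoef(scores_time1, scores_time2)[0, 1]


def split_half_reliability(item_scores):
    # Correlate odd- and even-item half scores, then step the half-test
    # correlation up to full length with the Spearman-Brown formula
    odd_half = item_scores[:, 0::2].sum(axis=1)
    even_half = item_scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r_half / (1 + r_half)


def kr20(item_scores):
    # Kuder-Richardson formula 20 for dichotomous (0/1) items
    k = item_scores.shape[1]                     # number of items
    p = item_scores.mean(axis=0)                 # proportion passing each item
    q = 1 - p
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * q).sum() / total_variance)


# Invented scores of six persons on two administrations of the same test
time1 = np.array([12, 15, 9, 18, 11, 14])
time2 = np.array([13, 14, 10, 17, 12, 15])

# Invented responses of six persons to eight dichotomous items (1 = pass)
items = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
])

print("Test-retest reliability:", round(test_retest_reliability(time1, time2), 2))
print("Split-half reliability: ", round(split_half_reliability(items), 2))
print("KR-20 reliability:      ", round(kr20(items), 2))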
As has been indicated previously, the implication and meaning of reliability
change with the method used to judge it. Hence, it is not sufficient to
remark that a particular test is reliable; it is equally essential to mention the sense
in which its reliability has been judged.
Of the above-mentioned methods of judging the reliability of psychological tests,
the third is the most prevalent and useful, since it is the easiest: it
obviates the necessity of assembling the same group of individuals more
than once. Reliability is expressed by a coefficient, known as the reliability
coefficient.
In this manner, both reliability and validity are important qualities of tests.
Validity is related to the scale or structure of the test, while reliability is an
attribute of the consistency of its scores.
9.5 NORMS
Norm refers to the typical performance level for a certain group of individuals. A raw
score on a psychological test is meaningless by itself until it is supplemented
by additional data that allow it to be interpreted. Therefore, the score on a
psychological test is generally interpreted by referring to norms that
depict the performance of the standardization sample. Norms are established empirically
by determining the performance of individuals from a specific group on the test.
To determine accurately a subject's (individual's) position with respect to the
standardization sample, the raw score is transformed into a relative measure; a
brief computational sketch follows the list below. There are two purposes of this
derived score:
1) It indicates the individual's standing in relation to the normative sample and
so helps in evaluating the performance.
2) It provides measures that can be compared across tests and so allows an
individual's performance on various tests to be gauged.
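As a brief illustration of how a raw score is converted into such derived scores, the following hypothetical Python sketch computes a standard (z) score and a percentile rank for an invented raw score against an invented normative sample.

# Hypothetical sketch: converting a raw score into derived scores
# relative to a normative sample. All values are invented.
import numpy as np

# Invented raw scores from the standardization (normative) sample
norm_sample = np.array([34, 41, 38, 45, 29, 50, 43, 37, 40, 46, 35, 42])

raw_score = 44  # invented raw score of the individual being assessed

# Standard (z) score: distance from the norm-group mean in SD units
z_score = (raw_score - norm_sample.mean()) / norm_sample.std(ddof=1)

# Percentile rank: percentage of the normative sample scoring below the individual
percentile = (norm_sample < raw_score).mean() * 100

print(f"z-score: {z_score:.2f}")
print(f"Percentile rank: {percentile:.0f}")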
9.9 ANSWERS TO SELF ASSESSMENT QUESTIONS
Self Assessment Questions I
1) It refers to the degree to which a test can correctly gauge the dimension of
the participant that it claims to measure.
2) It refers to consistency in scores even when the test is administered to the
same group after an interval of time.
3) It refers to a self-report measure in which the answers are scored and combined
to yield a total score.
Self Assessment Questions II
1) Planning
2) Standard Score
3) Norm
4) Grade Norms
9.10 GLOSSARY
Validity : It refers to the degree to which a test can correctly gauge the dimension
of the participant that it claims to measure.
Reliability : It refers to consistency in scores even when the test is administered
to the same group after an interval of time.
Psychological Test : It refers to a self-report measure in which the answers are
scored and combined to yield a total score.
Norms : The norm of a psychological test refers to the typical performance level
for a certain group of individuals.