Psychology 8a Module II

Module II of Psychology 8A focuses on the fundamental concepts of psychological measurement and statistics, covering topics such as reliability, validity, item analysis, and the use of computers in test interpretation. It aims to equip students with the ability to understand measurement in psychological testing, apply statistical concepts, and evaluate the characteristics of effective tests. The module includes lessons on statistical concepts, reliability, validity, and item analysis, emphasizing the importance of norms and the systematic evaluation of test scores.


PSYCHOLOGY 8A: Psychological Testing

MODULE II:
BASIC CONCEPTS IN PSYCHOLOGICAL MEASUREMENT AND
STATISTICS APPLIED TO NORMS AND INTERPRETATION OF TESTS

Scope of the Module:


This module consists of five lessons, namely:
Lesson 1  Statistical Concepts and Developmental Norms
Lesson 2  Reliability: The Consistency of Test Scores
Lesson 3  Validity of Measurement
Lesson 4  Item Analysis
Lesson 5  Computer Application in the Interpretation of Tests

Overview of the Module:


This module gives us a proper understanding of the term “measurement", which is
critical for the interpretation and evaluation of psychological tests. It helps us
understand the basic statistical concepts (e.g., how norms are applied to the
interpretation of tests). Furthermore, reliability of test scores, validity of measurement,
and item analysis are discussed to assist psychologists and other test users in the
development, selection, and interpretation of tests. It also introduces us to the
application of computers in the interpretation of tests.

Objectives of the Module:


At the end of this module, the student should be able to:
1. identify what measurement is and how it is used in interpretation and
evaluation of psychological tests;
2. determine the basic statistical concepts and their application to tests;
3. discuss validity, reliability, and item analysis as characteristics of good tests;
and
4. apply computers in the scoring and interpretation of tests.
Psychology 8A
Module II, Lesson 1:
STATISTICAL CONCEPTS AND DEVELOPMENTAL NORMS

Lesson Objectives:
At the end of this lesson, the student should be able to:
1. define and explain what measurement is;
2. determine the importance of the basic statistical concepts applied to
psychological testing;
3. discuss how norms are used as reference for scoring tests; and
4. identify the characteristics of a good test.

Defining Measurement
Measurement is the process of assigning numbers to objects in such a way that
specific properties of objects are faithfully represented by properties of numbers. This
definition can be refined slightly when applied to psychological measurement, which is
concerned with attributes of persons rather than attributes of objects. Psychological
measurement is the process of assigning numbers (e.g., test scores) to persons in such
a way that some attributes of the persons being measured are faithfully reflected by
some properties of the numbers.
Psychological measurement attempts to represent some attributes of persons in
terms of some properties of numbers. In other words, psychological tests do not attempt
to measure the total person but only some specific set of attributes of that person.
The foundation of psychological measurement is the assumption that individuals
differ in behavior, interests, preferences, perceptions and beliefs. The task of a
psychologist interested in measurement is to devise systematic procedures for
translating these differences into quantitative terms. In other words, psychological
measurement specialists are interested in assigning numbers to individuals that will
reflect their differences.
Psychological tests are designed to measure specific attributes of persons, not the
whole person. There is no test that measures whether one is a good person or a worthy
human being. Psychological tests only tell us ways in which individuals are similar or
different.

Statistical Concepts
What is the importance of statistics to psychology? Psychological measurement
leaves us with lots of numbers, and statistics give us a method for answering questions
about the meaning of those numbers. The primary objective of statistical method is to
organize and summarize quantitative data in order to facilitate their understanding.
Statistics can be used to describe test scores. It can be used to make inferences about
the meaning of test scores. It can provide a method for communicating information
about test scores and for determining what conclusions can and cannot be drawn from
those scores.
Statistical methods have found extensive application in the psychological and
educational testing field and in the study of human ability. Since the time of Binet, who
developed the first extensively used and successful test of intelligence, a
comprehensive body of theory and technique has been developed which is primarily
statistical in nature. This body of theory and technique is concerned with the
construction of instruments for measuring human ability, personal characteristics,
attitudes, interests, and many other aspects of behavior; with the logical conditions
which such measurement instruments must satisfy; with the quantitative prediction of
human behavior; and with other related topics.
Scores on psychological tests are interpreted by reference to norms, which
represent the test performance of the standardization sample. The norms are
empirically established by determining what the persons in a representative group
actually do on the test.

Norms
Scores on psychological tests rarely provide absolute, ratio-scale measures of
psychological attributes. Thus, it rarely makes sense to ask, in an absolute sense, how
much intelligence, motivation, or depth perception a person has. Scores on
psychological tests do, however, provide useful relative measures. It makes perfect
sense to ask whether Juan is more intelligent, more motivated or has better depth
perception than Jose. Psychological tests provide a systematic method of answering
such questions.
One of the most useful ways of describing a person’s performance on a test is to
compare his or her test score to the test scores of some other person or group of
people. Many psychological tests base their scores on a comparison between each
examinee and some standard population that has already taken the test.
When a person’s test score is interpreted by comparing that score to the scores of
several other people, this is referred to as a norm-based interpretation. The scores to
which each individual is compared are referred to as norms which provide standards for
interpreting test scores. A norm-based score indicates where an individual stands in
comparison to the particular normative group that defined the set of standards.
Characteristics of a Good Test:
There are three characteristics of a good test. These are:
1. Validity:
It is the closeness of agreement between the scores of the test and some other
measures. It is the general worthiness of an examination. It also refers to the
degree to which the test parallels the curriculum and good teaching practice. It is
the degree to which a test measures what it is supposed to measure.
2. Reliability:
It is the test’s self-consistency. A test is considered highly reliable if it yields
approximately the same scores when given a second time or when alternative
forms of tests are administered to the same person. It is the extent to which a
test measures something consistently.
3. Practicability:
Tests must also be usable. Therefore, tests should be selected on the basis of the
extent to which they can be used without unnecessary expenditure of time, effort,
and money.
Psychology 8A
Module II, Lesson 2:
RELIABILITY: THE CONSISTENCY OF TEST SCORES

Lesson Objectives:
At the end of this lesson, the student should be able to:
1. define reliability;
2. discuss the four ways of evaluating or determining reliability;
3. explain coefficient of correlation; and
4. compute coefficient of correlation.

Reliability Defined
Test scores are reliable when they are reproducible and consistent. Tests may be
unreliable for a number of reasons. Confusing or ambiguous test items may mean
different things to a test taker at different times. Tests may be too short to sample the
abilities being tested adequately, or scoring may be too subjective. If a test yields
different results when it is administered on different occasions or scored by different
people, it is unreliable. A simple analogy is a rubber yardstick. If we did not know how
much it stretched each time we took a measurement, the results would be unreliable no
matter how carefully we marked the measurement. Tests must be reliable if the results
are to be used with confidence.
Reliability can be evaluated or determined in four ways:
1. Retest Reliability:
This can be done by obtaining two measures of the same individual/groups on
the same test. The two sets of scores obtained from the same test given to the
same individual/group at different times are correlated.
2. Equivalent Form Reliability:
This can be done by giving the test in two different but equivalent forms. Scores
obtained on two forms of the same test, both of which are supposed to sample
the same ability, are correlated.
3. Split-Half Reliability:
This can be done by treating each half of the test separately; that is, by
correlating scores on one half of a test with scores on the other half.
If each individual/group tested achieves roughly the same scores on both
measures, then the test is reliable. Of course, even for a reliable test, some
differences are to be expected between the pair of scores due to chance and
errors of measurement. Consequently, a statistical measure of the degree of
relationship between the set of paired scores is needed. This degree of
relationship is provided by the coefficient of correlation. The coefficient of
correlation between paired scores is called a reliability coefficient. Well-constructed
tests usually have a reliability coefficient of r = .90 or greater.
4. Internal Consistency Method:
The essential characteristic of this method is that the criterion is the total score
on the test itself. The performance of the upper criterion group on each test item
is then compared with that of the lower criterion group.
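The split-half method above can be sketched in a few lines of code. This is a minimal illustration, not a complete reliability analysis: the response matrix is entirely hypothetical, and the halves are formed by the common odd/even split of items.

```python
# A minimal sketch of the split-half method: each examinee's score on the
# odd-numbered items is correlated with the score on the even-numbered items.
def pearson_r(x, y):
    """Product-moment correlation between two lists of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical response matrix: rows are examinees, columns are six items
# (1 = correct, 0 = incorrect).
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
]

odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6

print(round(pearson_r(odd_half, even_half), 2))  # 0.44 for this sample
```

A higher correlation between the two halves would indicate that the two half-tests measure the same thing consistently.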

Coefficient of Correlation
Correlation refers to the concomitant variation of paired measures. Suppose that a
test is designed to predict success in college. If it is a good test, high scores on it will be
related to high performance in college and low scores will be related to poor
performance. The coefficient of correlation gives us a way of stating the degree of
relationship more precisely. The coefficient of correlation is used in retest reliability to
determine the degree of relationship between the two sets of scores obtained from the
same test given to the same individual/group at different times. It is also applicable to
the other ways of evaluating reliability.
The most frequently used method of determining the coefficient of correlation is the
product-moment method, which yields the index conventionally designated r. The
product-moment coefficient r varies between perfect positive correlation (r = +1.00) and
perfect negative correlation (r = -1.00). Lack of any relationship yields r = .00.

Students   x-score        y-score     dx    dy    (dx)(dy)
           (First Test)   (Re-test)
A          71             39          +6    +9    +54
B          67             27          +2    -3    -6
C          65             33           0    +3     0
D          63             30          -2     0     0
E          59             21          -6    -9    +54
Sum        325            150          0     0    +102
Mean       65             30

σx = 4; σy = 6

One of the paired measures has been labeled the x-score; the other, the y-score.
The dx and dy refer to the deviations of each score from its mean. N is the number of
paired measures; σx and σy are the standard deviations of the x-scores and y-scores.

Computation of Standard Deviation

σ = √(Σd² / N)

Students   x-score   y-score   dx    dy    dx²   dy²
A          71        39        +6    +9    36    81
B          67        27        +2    -3     4     9
C          65        33         0    +3     0     9
D          63        30        -2     0     4     0
E          59        21        -6    -9    36    81
Sum                                        80    180

Therefore:

σx = √(80/5) = √16 = 4        σy = √(180/5) = √36 = 6

Computation of Coefficient of Correlation

r = Σ(dx)(dy) / (N·σx·σy) = 102 / (5 × 4 × 6) = .85

r = .85 is less than 1.00, so the correlation is high but not perfect.
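The retest computation above can be reproduced step by step. The sketch below uses the five students' first-test and retest scores and follows the same deviation-score (product-moment) formula from the lesson.

```python
# Deviation-score computation of the product-moment correlation, using the
# five students' first-test and retest scores from the worked example.
x = [71, 67, 65, 63, 59]  # first-test scores
y = [39, 27, 33, 30, 21]  # retest scores

n = len(x)
mean_x = sum(x) / n  # 65
mean_y = sum(y) / n  # 30

dx = [xi - mean_x for xi in x]  # deviations from the mean of x
dy = [yi - mean_y for yi in y]  # deviations from the mean of y

# Standard deviations: sqrt(sum of squared deviations / N)
sd_x = (sum(d * d for d in dx) / n) ** 0.5  # 4.0
sd_y = (sum(d * d for d in dy) / n) ** 0.5  # 6.0

# Product-moment coefficient: r = sum(dx * dy) / (N * sd_x * sd_y)
r = sum(a * b for a, b in zip(dx, dy)) / (n * sd_x * sd_y)
print(round(r, 2))  # 0.85
```

The result matches the hand computation: a reliability coefficient of r = .85.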


Psychology 8A
Module II, Lesson 3: VALIDITY OF MEASUREMENT

Lesson Objective:
At the end of this lesson, the student should be able to:
1. define validity;
2. discuss the different procedures used to gather evidence for validity; and
3. explain what a criterion is.

Validity Defined
Once you have established reliability, validity comes into the picture. Like reliability of
results, valid interpretations contribute to accuracy in evaluation. Somewhat different
procedures are used to gather evidence for the validity of different kinds of interpretation.
1. Content Validation:
This involves gathering evidence that assessment tasks adequately represent the
“domain” of knowledge or skills to be assessed. Observations should not only be
adequate in number but they should also represent whatever is to be assessed and
do so in proportion to instruction. The result of a paper and pencil test of basketball
rules, for example, can represent knowledge of basketball rules, but it cannot
represent basketball skills, which are in the psychomotor domain. If you teach
psychomotor skills, you must test psychomotor skills in proportion to instruction in
these skills for the results to be interpreted as indications of them. Evidence of content
validity is usually gathered by matching assessment tasks to anticipated learning
outcomes.
2. Construct Validation:
This involves gathering evidence that assessment results represent only what is to
be assessed. A “construct” is a meaningful interpretation of observations. An
interpretation has construct validity if it matches expectations based on theory. For
example, suppose you develop a fifth-grade life science test to assess
understanding of systems of the human body using multiple-choice items. After the
test, which you intended to assess comprehension of those systems, you discover that
some students did not know some of the general vocabulary words in some of the
test items. Factors such as limited vocabulary, tension, test anxiety, fatigue, or
dishonesty can change the meaning of test results so that they do not signify what
was intended. Evidence of construct validity can be gathered by finding ways to
verify that assessment results are consequences of only what the test was intended
to assess and not of some other factor.
3. Criterion-related Validation:
This involves gathering evidence that assessment results have some value for
estimating a standard of performance (criterion). Validity can be assessed by
correlating the test score with some external criterion. For example, a positive
correlation between scores on a scholastic aptitude test and freshman grades in
college indicates that the test has reasonable validity.

Criteria
A criterion is a measure which can be used to determine the accuracy of a
decision. In psychological testing, criteria typically represent measures of the outcomes
that specific treatments or decisions are designed to produce. For example, workers are
selected for jobs on the basis of predictions the personnel department makes regarding
their future performance on the job. The job applicants who are actually hired are those
who, on the basis of test scores or other measures, are predicted to perform at the
highest level. Actual measures of performance on the job serve as criteria for evaluating
the personnel department’s decisions. If the workers who were hired actually do perform
at a higher level than those who were not hired, the predictions of the personnel
department are confirmed, or validated. In similar ways, measures of grade point
average or years to complete a degree might serve as criteria for evaluating selection
and placement decisions in the schools.
The correlation between test score and a measure of the outcome of a decision
(criterion) provides an overall measure of the accuracy of predictions. Therefore, the
correlation between test scores and criterion scores can be thought of as a measure of
the validity of decisions. The validity coefficient or the correlation between test scores
and criterion scores provides the basic measure of the validity of a test for making
decisions.
Psychology 8A
Module II, Lesson 4: ITEM ANALYSIS

Lesson Objectives:
At the end of this lesson, the student should be able to:
1. discuss the purpose of item analysis;
2. identify the critical features of test items;
3. learn how to carry out distracter analysis and measure item difficulty; and
4. recognize the importance of item analysis.

Purpose of Item Analysis


The term “item analysis” refers to a loosely structured group of statistics that can be
computed for each item in a test. There are many “recipes” for item analysis. The exact
choice and interpretation of the statistics that make up an item analysis is partly
determined by the purpose of testing, and partly by the person designing the analysis.
Item analysis can help increase our understanding of tests. Close examination of
individual items is essential to understanding why a test shows specific levels of
reliability and validity. The reliability coefficient reveals something about the effects of
measurement error on test scores. The validity coefficient reveals something about the
accuracy of predictions that are based on test scores. A good item analysis often can be
very informative when tests are unreliable or when they fail to show expected levels of
validity. An
item analysis can show why a test is reliable or unreliable. It may help in understanding
why test scores can be used to predict some criteria but not others. Item analysis may
also suggest ways of improving the measurement characteristics of a test.
Tests are sometimes limited in their reliability or validity because they contain items
that are poorly worded or that are “trick” questions requiring complex mental
gymnastics. Other items may look fine, but don’t actually measure the construct or the
content domain that the test is designed to assess. The reliability and validity of a test
generally can be improved by removing such items. Item analysis helps to locate items
that don’t meet the assumption that all test items measure the same thing, and
removing them improves the reliability of tests.

Critical Features of Test Items


The question that should be asked when examining each test item is "Does this item do
a good job of measuring the same thing that is measured by the rest of the test?”
Attempts to answer this question usually draw upon information provided by three
types of measures:
1. Distracter Analysis:
In examining each item of a test, the first and most natural question to ask is
“How many people chose each response?” There is only one right answer; the
rest are referred to as distracters. Therefore, examining the total pattern of
responses to each item of a test is referred to as distracter analysis.
2. Item Difficulty:
Another natural question is “How many people answered the item correctly?” To
answer this question, an analysis is made of item difficulty.
3. Item Discrimination:
A third question is “Are responses to this item, related to responses to other
items on the test?” Answering this question entails an analysis of item
discrimination.

Distracter Analysis
Typically, there is one correct or preferred answer for each multiple-choice item on a
test. A lot can be learned about test items by examining the frequency with which each
of the incorrect responses is chosen by a group of examinees.
Examine the example below:
Paranoid schizophrenia often involves delusions of persecution or grandeur. Which secondary
symptom would be most likely for a paranoid schizophrenic?
a. Auditory hallucinations
b. Motor paralysis
c. Loss of memory
d. Aversion to food
Correct response is (a):

Response   Number choosing   Percent choosing
a          47                55
b          13                15
c          25                29
d           1                 1

The table shows that most of the students answered the item correctly. A fair
number of students chose either b or c; very few chose d.
A perfect test item would have two characteristics.
1. People who “knew” the answer to that question would always choose the
correct response.
2. People who did not know the answer would choose randomly among the
responses, meaning that some people would guess correctly. It also means
that each of the possible incorrect responses should be equally popular.
For the test item shown in the table, responses b, c, and d served as distracters.
Fifty-five percent (55%) of the students answered this item correctly. If this were a
perfect test item, we might expect the responses of the other forty-five percent (45%) of
the students to be equally divided among the three distracters. In other words, we might
expect about fifteen percent (15%) of the students to choose each of the three incorrect
responses.

What to look for in a Distracter Analysis


We can compute the number of people expected to choose each of the distracters
using the simple formula:

No. of persons expected to choose each distracter
    = No. of persons answering the item incorrectly / No. of distracters

Thirty-nine people (86 - 47) answered the item incorrectly. We therefore would expect
thirteen people to choose each distracter:

No. of persons expected to choose each distracter = 39 / 3 = 13
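This check can be sketched directly from the response counts of the example item. The code simply compares each distracter's observed count against the even split expected by the formula.

```python
# A short sketch of the distracter-analysis check: examinees who miss the item
# are expected to spread evenly across the distracters. Response counts are
# taken from the example item (correct answer: a).
counts = {"a": 47, "b": 13, "c": 25, "d": 1}
correct = "a"

total = sum(counts.values())            # 86 examinees in all
n_incorrect = total - counts[correct]   # 39 chose some distracter
n_distracters = len(counts) - 1         # 3 distracters (b, c, d)

expected = n_incorrect / n_distracters  # 13.0 expected per distracter
for option in sorted(counts):
    if option != correct:
        print(f"{option}: observed {counts[option]}, expected {expected:.0f}")
```

Here b draws exactly the expected number, c draws far more, and d draws almost none, which is the kind of pattern a distracter analysis is meant to reveal.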

Item Difficulty
Difficulty is a surprisingly slippery concept. Consider the following two items:
a. (6 x 3) + 4 = _____
b. 9π [ln(-3.68) x (1 - ln(-3.68))] = _____
Most people would agree that b is more difficult than a. If asked why item b is more
difficult, they might say that it involves more complex and advanced procedures than the
first.
Consider next another set of items:
a. Who was Savonarola?
b. Who was Babe Ruth?
Most people will agree that a is more difficult than b. If asked why item a is more
difficult, they might say that answering it requires more specialized knowledge than b.
As these examples might suggest, there is a strong temptation to define difficulty in
terms of the complexity or obscurity of the test item. Yet, if you were to look at any test
you have recently taken, you would likely find yourself hard pressed to explain precisely
why some items were more difficult than others.
The psychologist doing an item analysis is faced with a similar problem. Some test
items are harder than others, but it is difficult to explain or define difficulty in terms of
some intrinsic characteristics of the items. The strategy adopted by psychometricians is
to define difficulty in terms of the number of people who answer each test item correctly.
If everyone chooses the correct answer, the item is defined as an easy item. If only one
person in one hundred answers an item correctly, the item is defined as a difficult item.

Measuring Item Difficulty


The most common measure of item difficulty is the percentage of examinees that
answer the item correctly, or the p value. An item’s p value is obtained using the
following formula:

p value for item X = No. of persons answering item X correctly / No. of persons taking the test

The use of p values as a measure of item difficulty has several interesting
implications. First, the p value is basically a behavioral measure. Rather than defining
difficulty in terms of some intrinsic characteristics of the item, with this method difficulty
is defined in terms of the relative frequency with which those taking the test choose the
correct response. Second, difficulty is a characteristic of both the item and the
population taking the test. A math problem that is very difficult when given in a high
school course will be very easy when given in a graduate physics course.
The most useful implication of the p value is that it provides a common measure of
the difficulty of test items that measure completely different domains.
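The p-value formula can be applied to a whole test at once. The sketch below uses a small, hypothetical response matrix; each column's p value is just the proportion of examinees who answered that item correctly.

```python
# A minimal sketch of computing p values from a hypothetical response matrix:
# rows are examinees, columns are items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

n_examinees = len(responses)
# p value for each item = proportion of examinees answering it correctly
p_values = [sum(item) / n_examinees for item in zip(*responses)]
print(p_values)  # [0.75, 0.75, 0.25, 0.75]
```

By this definition, the third item (p = .25) is the most difficult: only one examinee in four answered it correctly.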
Psychology 8A
Module II, Lesson 5:
COMPUTER APPLICATION IN THE INTERPRETATION OF TESTS

Lesson Objectives:
At the end of this lesson, the student should be able to:
1. determine how tests are generated from computers;
2. discuss how tests are administered by automated machines on computers;
3. explain adaptive testing and other applications of computer to test design;
and
4. demonstrate how computers are applied to scoring and interpretation of test
results.

Generating Test by Computers


The computer plays many roles behind the testing scene and can do a star turn as a
tester. Because of its consistency, the computer carries standardization to an extreme,
yet it can achieve standardized measurement while presenting different questions and
personalized feedback to every test taker.
The most familiar kind of test is an organized whole, a printed booklet filled with
questions. A file of items (an “item bank”) allows compilation of tests to order. A school,
for example, asks a publisher for a social studies test on topics stressed in its
curriculum. Any number of equivalent tests can be made up by selecting sets of
items according to the same plan. Information on the difficulty of items in tryout groups
provides a basis for comparing individuals or groups who respond to new compilations
of items. Computers do much more than store the information. A computer can sort out
items, produce tests that meet the specifications of a school system, direct the printer
that produces test copies, and, once the students have taken the test, prepare a report
comparing them with samples to whom the items have been given previously. A human
staff can assemble no more than a few test forms for each job. With numerous items on
a computer tape, the computer can assemble a form for every applicant.
The advantage becomes considerably greater when the test covers problems a
computer can solve. Classroom teachers using the computer can prepare innumerable
forms of an instructional test. At the start of the week, the teacher could stock up on
tests over the week’s lessons. A common test may be given to the whole class, then
discussed and followed by a new form. Alternatively, copies may be put on a shelf; each
student can test himself when and as often as he chooses. The content elements the
teacher prepared are saved on cards or tape for use with future classes.
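The item-bank idea can be sketched in code. This is a hedged illustration of drawing equivalent forms "according to the same plan"; the bank, the topics, and the per-topic plan are all hypothetical.

```python
# A sketch of assembling equivalent test forms from an item bank: every form
# draws the same number of items per topic, so forms follow the same plan.
import random

item_bank = {  # hypothetical bank of item identifiers, grouped by topic
    "maps": ["m1", "m2", "m3", "m4"],
    "history": ["h1", "h2", "h3", "h4"],
    "civics": ["c1", "c2", "c3", "c4"],
}
plan = {"maps": 2, "history": 2, "civics": 1}  # items per topic on every form

def assemble_form(bank, plan, rng=random):
    """Draw a fresh test form that satisfies the per-topic plan."""
    form = []
    for topic, n_items in plan.items():
        form.extend(rng.sample(bank[topic], n_items))  # draw without replacement
    return form

form_a = assemble_form(item_bank, plan)
form_b = assemble_form(item_bank, plan)
print(len(form_a), len(form_b))  # every form has 5 items
```

Each call produces a different compilation of items, yet every form covers the topics in the same proportions, which is what makes the forms comparable.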
Automated Administration
Computers are good at giving structured tests directly to individuals. In an
audiometric test, for example, the computer delivers directions on a visual display or
from an audiotape, resets the signal without error and times signals precisely. The
computer can locate the threshold in a matter of seconds. After the last response, it can
display the final audiogram and print a permanent record. Satellite transmission makes
it possible to send the information instantly to a specialist halfway around the world.
Among the features of the computer are the following:
1. Precise adherence to schedules and plans.
2. Delicate control of stimuli and of judgment about responses.
3. Choice of successive test items in the light of performance to date.
4. Immunity to fatigue, boredom, lapse of attention and inadvertent scoring error.
5. Instant and accurate scoring.
6. Legible records in several forms, with multiple copies and distant
transmission.
For the test taker, interacting with the computer can be fascinating. A study in which
reasoning problems were presented to college students either by a computer or by a
human tester led to the following observations:
1. Computer-run subjects are more apt to pause for long periods of time without
typing anything into the computer. Human-run subjects may feel under some
pressure to keep behaving even at the risk of doing something wrong.
2. Computer-run subjects have been heard to shout for joy, curse, bang the
walls, and sing. These are modes of expression which a typical subject
avoids in the presence of an experimenter.
It might be thought that the computer is adapted only to literate and emotionally
stable subjects. But preschool children and mental patients respond well to automated
displays. The computer’s patience is inexhaustible. If the deteriorated schizophrenic
does not make a move for three hours, the display simply waits. If the distractible child
can only be captured for testing by four preliminary sessions with animated cartoons,
the computer provides them without fidgeting.
These long drawn out procedures are not expensive even when a full size computer
is used. The computer is called on only when a response is made. While “waiting” it
spreads its time among the subjects who are responding, perhaps on terminals in
distant places. Microcomputers now are in many counseling centers, clinics and schools
for on-line testing or for scoring alone.
Adaptive Testing
Not much use has been made of the computer’s ability to choose suitable test items
in the light of previous successes and failures. This is called sequential, adaptive, or
tailored testing. The computer can follow elaborate rules in choosing items and can
easily compare persons who took different items.
Adjusting difficulty so that the examinee works mostly on items that are neither too
hard for him nor too easy might improve motivation. It is easy for the computer to
indicate instantly whether an answer is right or wrong and that might heighten
motivation.
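The adjustment of difficulty described above can be sketched with a deliberately simple rule. This is not any specific published adaptive-testing algorithm, just an illustration of stepping difficulty up after a correct answer and down after an incorrect one.

```python
# A hedged sketch of a simple adaptive (tailored) rule: difficulty steps up
# after a correct answer and down after an incorrect one, within fixed bounds.
def next_difficulty(current, was_correct, step=1, lo=1, hi=10):
    """Choose the next item's difficulty level from the last response."""
    proposed = current + step if was_correct else current - step
    return max(lo, min(hi, proposed))  # clamp to the available levels

level = 5
for answer in [True, True, False, True]:  # hypothetical response pattern
    level = next_difficulty(level, answer)
print(level)  # 5 -> 6 -> 7 -> 6 -> 7
```

The effect is that the examinee spends most of the session on items near the edge of his or her ability, which is the motivational point made above.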
Computer-assisted instruction embodies many features of adaptive testing. When
the student logs into the terminal today, the computer recalls not only yesterday’s level
of performance but also the tasks on which the student was least adequate. The
exercises themselves serve as tests. Therefore, the computer’s record becomes an
up-to-the-minute multidimensional description of the person.

Other Applications of Computer to Test Design


Besides providing the opportunity to adapt testing to the abilities and needs of
individual takers, computers can help circumvent other limitations of traditional group
tests. A major potential contribution is in expanding the variety of usable item types.
One possibility is to require the test taker to produce or construct a response rather
than choose one from given alternatives. Another is to allow the test taker to receive
immediate feedback until the correct response is chosen. Still another involves
response procedures especially suitable for investigating the test taker's
problem-solving techniques. For instance, following the
initial presentation of the problem, examinees may have to ask the computer for further
information needed to proceed at each step in the solutions or they may be required to
respond by indicating the steps they follow in arriving at the solution. This approach has
been employed in medical education to test the diagnostic and therapeutic decision-
making skills of medical school students and graduates. An important educational
application is illustrated by a program designed for college freshmen in need of remedial
instruction in basic academic skills.

Computer-related Aptitudes
The rapid growth in the use of computers for office work has led to the publication of
several tests for computer-related aptitudes. These tests differ from the usual job
sample procedures employed to assess the competence of trained computer operators.
They are designed primarily for such purposes as the counseling or selection of
potential trainees, or the assignment of present employees to newly established functions within an office.
An example is the Computer Aptitude, Literacy and Interest Profile. Standardized on a nationally representative sample of about 1,200 persons, this test provides standard score norms for each of its six subtests and for the total score; the subtests measure largely reasoning as applied to visual, non-verbal content.
Other tests have been developed for computer programmers, computer operators
and word processors. Particular tests vary in the extent of prior specialized training they
assume. Hence, they are designed for somewhat different populations of test takers. All
these tests clearly represent a timely application of psychometric techniques to
personnel assessment.

Computer Use in the Interpretation of Test Scores


Computers have a major impact on every phase of testing, from test construction to administration, scoring, reporting and interpretation. The most obvious use of computers is the unprecedented increase in the speed with which data analyses and scoring processes can be carried out. The use of computers in the automated administration of conventional tests may also be considered in this category, insofar as they provide easier and better ways of administering tests.
Also significant is the contribution of computers to the exploration of new procedures
and approaches to psychological testing that would have been impossible without the
flexibility and data-processing capabilities they provide.
Most current tests, especially those designed for group administration, are now adapted for computer scoring. Several test publishers, as well as independent test-scoring organizations, are equipped to provide such scoring services to test users.
Narrative computer interpretation of test results is also available for certain tests. The
computer program associates prepared verbal statements with particular patterns of test
responses. This approach has been pursued with both personality and aptitude tests.
For example, with the Minnesota Multiphasic Personality Inventory (MMPI) test users
may obtain computer printouts of diagnostic and interpretative statements about the test
taker’s personality tendencies and emotional condition, together with the numerical
scores.
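A rough sketch of how such a program can associate prepared verbal statements with score patterns follows. The scale names, cut-off values, and statements below are invented for illustration and are not taken from the MMPI or any published scoring system.

```python
def narrative_report(scores, rules):
    """Return the prepared statements whose score-pattern rules match.

    scores: dict of scale name -> standard score.
    rules:  list of (predicate, statement) pairs; a predicate inspects the
            scores dict and returns True when its pattern is present.
    """
    return [statement for predicate, statement in rules if predicate(scores)]

# Illustrative rules only -- hypothetical scales, cut-offs, and wording.
RULES = [
    (lambda s: s["Depression"] >= 70,
     "Scores suggest a markedly depressed mood."),
    (lambda s: s["Anxiety"] >= 70 and s["Depression"] < 70,
     "Elevated anxiety without accompanying depressive features."),
    (lambda s: all(v < 65 for v in s.values()),
     "No clinically significant elevations."),
]
```

A printout would then pair the matching statements with the numerical scores themselves, as described above.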
For test users who have access to a microcomputer, there are increasing
opportunities to purchase computer programs that yield not only numerical but also
interpretative reports for particular tests, such as the revised Wechsler Intelligence Scales for Children (WISC-R) and for adults (WAIS-R).

Hazards and Guidelines


Although computers have undoubtedly opened the way for unprecedented
improvements in all the aspects of psychological testing, certain applications of
computers may lead to misuse and misinterpretations of test scores. In an effort to
guard against these hazards, considerable attention has been given to developing
guidelines for computer-based testing.
Psychology 8A
Module II, Lessons 1 to 5
SELF-PROGRESS CHECK TESTS

LESSON 1
I. Fill in the blanks with the correct word or group of words.
_____ 1. Measurement is the process of assigning (___) to objects/persons in such
a way that specific properties/attributes being measured are faithfully
reflected by some properties of the numbers.
_____ 2. A psychological test does not attempt to measure the (___) but only some
specific set of attributes of the person.
_____ 3. The primary objective of statistical method is to organize and summarize
(___) data in order to facilitate their understanding.
_____ 4. (___) provides a method for communicating information about test scores
and for determining what conclusions can and cannot be drawn from
those scores.
_____ 5. Scores on psychological tests are interpreted by reference to (___), which
represent the test performance of the standardization sample.
_____ 6. When a person's test score is interpreted by comparing that score to the
scores of several other people, this is referred to as a (___) interpretation.
_____ 7. Since the time of (___), who developed the first extensively used and
successful test of intelligence, a comprehensive body of theory and
technique has been developed which is primarily statistical in nature.
_____ 8. (___) is the extent to which a test measures something consistently.
_____ 9. The general worthiness of an examination is referred to as (___).
_____ 10. Tests should be selected on the basis of the extent to which they can be
used without unnecessary expenditure of (___), (___), and (___).
II. True or False.
_____ 1. Individuals differ only in interests and preferences but not in perceptions
and beliefs.
_____ 2. There is a test that measures whether one is a good person or a worthy
human being.
_____ 3. Statistical methods have found extensive application in psychological
testing.
_____ 4. A norm-based score indicates where an individual stands in comparison to
other persons.
_____ 5. A test is considered highly valid if it yields approximately the same scores
when given a second time.
_____ 6. Tests must also be usable.
_____ 7. Reliability is the closeness of agreement between the scores of the test
and some other measures.
_____ 8. Psychological measurement specialists are interested in assigning
numbers to individuals that will reflect their differences.
_____ 9. Statistics can be used to make inferences about the meaning of test
scores.
_____ 10. Scores on psychological tests always provide absolute ratio scale
measures of psychological attributes.

LESSON 2
I. Fill in the blank space with the correct word or group of words.
_____ 1. Test scores are (___) when they are reproducible and consistent.
_____ 2. (___) reliability can be done by treating half of the test separately.
_____ 3. If a test yields different results when it is administered on different
occasions or scored by different people, it is (___).
_____ 4. The coefficient of (___) between paired scored is called a reliability
coefficient.
_____ 5. A well-constructed test usually has a reliability coefficient r of (___) or
greater.
_____ 6. In the product-moment method, lack of any relationship between two tests
yields r equal to (___).
_____ 7. (___) reliability can be done by giving the test in two different but
equivalent forms.
_____ 8. (___) refers to the concomitant variation of paired measures.
_____ 9. If it is a good test, high scores on it will be related to high performance in
college and low scores will be related to (___).
_____ 10. In the method of (___) consistency, the essential characteristic is that the
criterion is none other than the total score on the test itself.
II. Compute the coefficient of correlation and analyze.

Students First Test (x-score) Re-test (y-score)


A 85 72
B 73 61
C 65 60
D 60 54
E 48 40
F 30 25

LESSON 3
I. True or False.
_____ 1. Once you have established reliability, validity is out of the picture.
_____ 2. The result of a paper and pencil test of basketball rules can represent
basketball skills.
_____ 3. Validity can be assessed by correlating the test scores with some external
criterion.
_____ 4. In psychological testing criteria typically represent measures of the
outcomes that specific treatments or decisions are designed to produce.
_____ 5. Actual measures of performance on the job cannot serve as criteria for
evaluating the personnel department's decisions.
II. Fill in the blanks with the correct word or group of words.
_____ 1. A (___) is a meaningful interpretation of observations.
_____ 2. A criterion is a measure, which could be used to determine the (___) of
decision.
_____ 3. (___) validation involves gathering evidence that assessment results
have some value for estimating a standard.
_____ 4. (___) validation involves gathering evidence that assessment tasks
adequately represent the "domain" of knowledge or skills to be assessed.
_____ 5. Evidence of (___) validity can be gathered by finding ways to verify that
assessment results are the consequence of only what the test intended
to assess and not of some other factor(s).
LESSON 4

Answer the following briefly but clearly.


1. What are some advantages of item analysis?
2. The correct response for a particular item is b. 25 persons out of 35 answered the
question correctly.
Option    Number choosing each answer
A                38
B                10
C                16
D                13

a. Compute the number of persons expected to choose each distracter.
b. Compute the p value of the particular item.

LESSON 5
Answer the following questions briefly but clearly.
1. List the features of computers as related to psychological testing.
2. Discuss the significant contributions of computers to scoring and interpretation of
tests.
3. Explain how classroom teachers can use computers in the preparation of instructional
tests.
Psychology 8A
Module II
ANSWERS TO THE SELF-PROGRESS CHECK TESTS

Lesson 1
1. Numbers 6. Norm-based
2. Total person 7. Binet
3. Quantitative 8. Reliability
4. Statistics 9. Validity
5. Norms 10. Time, effort & money
II.
1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

F F T T F T F T T F

Lesson 2
1. Reliable 6. .00
2. Split 7. Equivalent form
3. Unreliable 8. Correlation
4. Correlation 9. Poor Performance
5. .90 10. Internal
II. Coefficient of correlation r = .99, which is very close to 1; the correlation between the first-test and re-test scores is nearly perfect.
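The r = .99 reported above can be verified directly from the six score pairs in the Lesson 2 exercise. This is a minimal product-moment computation in plain Python; no statistics package is assumed.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                    # variation in x
    syy = sum((b - my) ** 2 for b in y)                    # variation in y
    return sxy / math.sqrt(sxx * syy)

first_test = [85, 73, 65, 60, 48, 30]   # x-scores for students A-F
retest     = [72, 61, 60, 54, 40, 25]   # y-scores for students A-F
r = pearson_r(first_test, retest)        # approximately .99
```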

Lesson 3
Test I.
1. F 2. F 3. T 4. T 5. F
II.
1. Construct
2. Accuracy
3. Criterion-related
4. Content
5. Construct
Lesson 4
1. a. Helps increase understanding of the test
b. Shows whether the test is reliable or unreliable
c. Helps determine why a test fails to show expected levels of validity
d. Helps explain why a test can be used to predict some criteria
e. Suggests ways of improving the measurement of characteristics
2. Number of persons expected to choose each distracter = 8; p value = .71
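The p value in answer 2 is simply the proportion of examinees answering the item correctly; a short check using the figures from the question (25 correct out of 35):

```python
def item_difficulty(num_correct, num_examinees):
    """p value of an item: the proportion of examinees answering it correctly."""
    return num_correct / num_examinees

p = item_difficulty(25, 35)   # approximately .71
```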

Lesson 5
1. a. Precise adherence to schedules and plans
b. Delicate control of stimuli and of judgments about responses
c. Choice of successive test items in the light of performance to date
d. Immunity to fatigue, boredom, lapses of attention and inadvertent scoring
errors
e. Instant and accurate scoring
f. Legible records in several forms, with multiple copies and distant
transmission
2. a. Speed in data analyses and scoring
b. Automated administration provides easier and better ways of administering tests
3. a. Teachers can prepare innumerable forms of instructional tests
b. Tests can be saved on cards or tape for future use
c. Teachers can build up a stock of tests over the weeks' lessons
