Cada 4/05/21
AB-PSY III Prof. Quing
TEST UTILITY
B. 1. An empirical standard used to divide a group of data into two or more distinct
categories describes a:
a. norm-referenced test
b. cut score
c. predictive yield
d. hit rate
A. 2. An educational psychologist conducts a utility analysis of a teaching program used to
improve the handwriting of very young children. The measure of utility in this analysis
will most likely be:
a. increase in performance level
b. decrease in costs
c. increase in revenue
d. reduction in accidents
B. 3. An index of utility can be distinguished from an index of reliability and an index of
validity in that an index of utility can tell us something about:
a. whether a test measures what it purports to measure in a particular context
b. the practical value of the information derived from what a test measures
c. how consistently a test measures what it measures in a particular context
d. none of the above
A. 4. A problem with using the known group method of setting cut score is that:
a. no standards are in place for choosing contrasting groups
b. test users must be personally familiar with each member in the known group
c. there is no consistent method of obtaining contrasting groups
d. strong deterrents to test user acceptance of the data are in place
B. 5. The "miss rate" is equivalent to:
a. the base rate/the selection ratio
b. the number of incorrect classifications/the selection ratio
c. the number of correct classifications/total number of classifications
d. the success rate/base rate of successful performance
D 6. The term "bookmark method" refers to an IRT-based method:
a. of setting cut scores based on expert judgment
b. of marking items with disregard to difficulty in a book of items
c. with no possible drawbacks such as floor or ceiling effects
d. all are correct
D 7. The Angoff method of setting cutting scores relies heavily on ________.
a. data and empirical findings
b. both data and empirical findings and the scholarly research literature
c. the scholarly research literature
d. the judgment of experts
B 8. When evaluating the utility of a particular test, an evaluation is made of the _____
incurred by testing as compared to the ________ accrued from testing.
a. benefits ; costs
b. costs ; benefits
c. gain ; benefits
d. expenses ; gains
D 9. Which of the following is a benefit of a good university admissions program?
a. enhances the university's reputation for having successful graduates
b. reduced load on counselors and on disciplinary personnel and boards
c. high morale and a good learning environment for students
d. all of the above
C 10. Which of the following is a noneconomic cost of not testing or ineffective testing?
a. cost of lawsuits filed due to negligence or injury
b. loss of ticket sales due to loss of public confidence
c. injuries, death, or other forms of serious harm
d. rentals, insurance, professional fees
D 11. The first step in constructing a psychological test is to:
a. determine the sample size to which the test is administered
b. review the relevant literature
c. identify a likely publisher for the test
d. be clear about the construct or constructs to be assessed with the test
D 12. Professor Smith, who teaches an American history course, is putting together a
100-item exam. You would advise him to initially write:
a. 110 items
b. 200 items
c. 150 items
d. only the items he will use
C 13. The process of administering a preliminary test form to a group of subjects is:
a. content analysis
b. item analysis
c. pilot testing
d. standardization
A 14. Distractors in a multiple choice test should be:
a. equally attractive to the test taker
b. clearly correct or incorrect
c. substantially shorter than the stem
d. more confusing to those who know the correct answer
B 15. The difficulty of an item is defined as:
a. the percentage of persons who answer incorrectly
b. the percentage of persons who answer correctly
c. the actual number of people who know the right answer
d. the actual number of people who don’t know the right answer
A 16. The discriminability of an item refers to the capacity of the item to:
a. separate those who are high and low on the trait of interest
b. distinguish between minority and majority groups
c. identify those who get the item correct
d. identify those who get the difficult items right
D 17. A variant of the completion item is the:
a. multiple choice
b. true or false
c. matching type
d. fill in the blanks
C 18. In computing item discrimination indices, the best strategy from a statistical point of
view is to compare:
a. those above the median versus those below
b. “A” students versus failing students
c. upper 27% versus the lower 27%
d. those with delta scores of above 50 versus below 50
A 19. If you intend to determine whether the items on a test appear to measure the same thing,
what statistical tool should you use?
a. factor analysis
b. correlation
c. regression
d. analysis of variance
A 20. The item validity is the:
a. correlation of the item score with the total score on the test
b. correlation of the item with an external criterion measure of the construct being
tested
c. average correlation of the item with all other items
d. correlation of the item with the average score on all other items
PERFORMANCE TASK:
1. Suppose the Federal Bureau of Investigation hires 5,000 new employees every year,
and that each employee stays with the FBI for an average of four years. Let’s further suppose
that the standard deviation of performance of the employees is about $16,000 (which is
40% of their annual salary). The validity coefficient of their newly acquired
psychological test is .75, and the average test score of the employees is .90. The test
costs $67 per applicant, and the FBI tests 30,000 applicants annually. Compute
the utility gain.
Guide Information:
• 5,000 new employees every year
• Each stay for an average of 4 years
• SD about $16,000
• Validity coeff = .75
• Average test score = .90
• Test cost $67
• 30,000 applicants annually
ΔU = (N)(T)(SDy)(rxy)(Zx) - (N)(Cy)
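For reference, a minimal Python sketch of how the ΔU formula above could be applied to the guide information. It assumes Zx is read as the mean standard (z) test score of those hired, and that the cost term covers every applicant tested (30,000 × $67); both readings are assumptions, not given explicitly in the problem.

```python
# Minimal sketch of the utility gain (ΔU) calculation above.
# Assumptions: Zx = mean standard test score of those hired;
# cost term = total testing cost for all applicants tested.

N_hired = 5_000      # new employees hired per year
T = 4                # average tenure in years
SD_y = 16_000        # standard deviation of performance, in dollars
r_xy = 0.75          # validity coefficient of the test
Z_x = 0.90           # mean standard test score of those hired (assumed reading)
N_tested = 30_000    # applicants tested annually
C_y = 67             # cost of testing one applicant, in dollars

productivity_gain = N_hired * T * SD_y * r_xy * Z_x   # dollar gain from selection
testing_cost = N_tested * C_y                         # total annual testing cost
utility_gain = productivity_gain - testing_cost

print(f"Productivity gain: ${productivity_gain:,.0f}")
print(f"Testing cost:      ${testing_cost:,.0f}")
print(f"Utility gain (dU): ${utility_gain:,.0f}")
```

Under this reading the sketch prints a utility gain of about $213,990,000 per year; a different reading of the cost term (for example, charging only for the 5,000 hires) would change the result.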
2. Ten students have taken an objective assessment. The quiz contained 10 questions. In the
table below, the students’ scores have been listed from high to low (Joe, Dave, Sujie,
Darrell, and Eliza are in the upper half). There are five students in the upper half and five
students in the lower half. The number “1” indicates a correct answer on the question; a
“0” indicates an incorrect answer. Calculate the Difficulty Index (p), assuming that no
random guessing occurred, and the Discrimination Index (D) for each question.
Student    Total      Questions
Name       Score (%)  1  2  3  4  5  6  7  8  9  10
Joe        100        1  1  1  1  1  1  1  1  1  1
Dave        90        1  1  1  1  1  1  1  1  0  1
Sujie       80        1  1  0  1  1  1  1  1  0  0
Darrell     70        0  1  1  1  1  1  0  1  0  1
Eliza       70        1  1  1  0  1  1  1  0  0  1
Zoe         60        0  1  1  0  1  1  0  1  0  0
Grace       60        0  1  1  0  1  1  0  1  0  1
Hannah      50        0  1  1  1  0  0  1  0  1  0
Ricky       40        0  1  1  0  1  0  0  0  0  1
Anita       30        0  1  0  0  0  1  0  0  1  0
Item         # Correct        # Correct        Difficulty   Discrimination
             (Upper group)    (Lower group)    (p)          (D)
Question 2   5                5                1.0          0
Question 3   4                4                0.8          0
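For reference, a minimal Python sketch of the two indices applied to the table above, assuming p is the proportion of all ten students answering an item correctly and D is the upper-group correct count minus the lower-group correct count, divided by the number of students per group (5), with the upper/lower split given in the task.

```python
# Minimal sketch of the Difficulty (p) and Discrimination (D) calculations.
# Assumptions: p = proportion of all students answering correctly;
# D = (upper-group correct - lower-group correct) / students per group.

scores = {  # 1 = correct, 0 = incorrect, questions 1-10 in order
    "Joe":     [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Dave":    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    "Sujie":   [1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    "Darrell": [0, 1, 1, 1, 1, 1, 0, 1, 0, 1],
    "Eliza":   [1, 1, 1, 0, 1, 1, 1, 0, 0, 1],
    "Zoe":     [0, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    "Grace":   [0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
    "Hannah":  [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    "Ricky":   [0, 1, 1, 0, 1, 0, 0, 0, 0, 1],
    "Anita":   [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}
upper = ["Joe", "Dave", "Sujie", "Darrell", "Eliza"]   # upper half per the task
lower = ["Zoe", "Grace", "Hannah", "Ricky", "Anita"]   # lower half per the task

for q in range(10):
    total_correct = sum(scores[s][q] for s in scores)
    upper_correct = sum(scores[s][q] for s in upper)
    lower_correct = sum(scores[s][q] for s in lower)
    p = total_correct / len(scores)                      # difficulty index
    D = (upper_correct - lower_correct) / len(upper)     # discrimination index
    print(f"Question {q + 1}: p = {p:.1f}, D = {D:.1f}")
```

For Questions 2 and 3 this reproduces the sample rows given above (p = 1.0, D = 0 and p = 0.8, D = 0).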