Test Development

Test Development Process
1. Test Conceptualization

2. Test Construction

3. Test Tryout

4. Item Analysis

5. Test Revision
Test Conceptualization - Preliminary Questions
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take this test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Test Conceptualization - Preliminary Questions
What special training will be required of test users for administering or interpreting the test?
What type of response will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as the result of an administration of this test?
How will meaning be attributed to scores on this test?
Test Conceptualization - Item Development Issues
Norm-referenced test
Criterion-referenced test
Pilot work
Test Construction
Scaling
Writing Items
Scoring Items
Test Construction
Scaling - the process of setting rules for assigning numbers in measurement.
Absolute scaling - a procedure for obtaining a measure of item difficulty across samples of testtakers who vary in ability.
Test Construction - Types of Scale
Age-based scale - interest is in test performance as a function of age.
Grade-based scale - interest is in test performance as a function of grade.
Stanine scale - raw scores are transformed into scores that range from 1 to 9 (a conversion sketch follows this slide).
Unidimensional vs. multidimensional scales
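As an illustration of the stanine transformation, here is a minimal Python sketch using the standard 4-7-12-17-20-17-12-7-4 percent bands; the function name and sample scores are hypothetical.

```python
def to_stanines(raw_scores):
    """Convert raw scores to stanines (1-9) using the standard
    4-7-12-17-20-17-12-7-4 percent bands."""
    # Cumulative upper bounds of each stanine band, as proportions.
    bounds = [0.04, 0.11, 0.23, 0.40, 0.60, 0.77, 0.89, 0.96, 1.00]
    n = len(raw_scores)
    # Rank testtakers from lowest to highest raw score.
    order = sorted(range(n), key=lambda i: raw_scores[i])
    stanines = [0] * n
    for rank, i in enumerate(order):
        percentile = (rank + 1) / n  # proportion scoring at or below this person
        for s, b in enumerate(bounds, start=1):
            if percentile <= b:
                stanines[i] = s
                break
    return stanines

print(to_stanines([55, 62, 70, 70, 81, 90, 95, 40, 66, 74]))
```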
Test Construction - Scaling Methods
Rating scale - a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker.
Summative scale - the final test score is obtained by summing the ratings across all the items (a scoring sketch follows this slide).
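As a minimal sketch of summative scoring, the following assumes 1-to-5 item ratings and one hypothetical reverse-keyed item; the names and data are illustrative.

```python
def summative_score(ratings, reverse_keyed=(), scale_min=1, scale_max=5):
    """Sum the item ratings; reverse-keyed items are flipped first so that
    a higher total always means more of the trait being measured."""
    total = 0
    for i, r in enumerate(ratings):
        if i in reverse_keyed:
            r = scale_max + scale_min - r  # on a 1-5 scale: 1 <-> 5, 2 <-> 4
        total += r
    return total

# Five items rated 1 (strongly disagree) to 5 (strongly agree);
# the item at index 2 is worded in the opposite direction.
print(summative_score([4, 5, 2, 3, 4], reverse_keyed={2}))  # -> 20
```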
Test Construction - Scaling Methods
Likert scale - contains 5-7 alternative responses along a continuum such as agree/disagree or approve/disapprove.
Paired comparisons - testtakers are presented with two stimuli, which they must compare in order to select one.
Comparative scale - entails judgment of a stimulus in comparison with every other stimulus on the scale.
Test Construction - Scaling Methods
Categorical scale - stimuli are placed into alternative categories that differ quantitatively.
Guttman scale - items are ordered so that respondents who agree with the stronger statements will also agree with the milder statements (a pattern check follows this slide).
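A minimal sketch of checking whether a response pattern fits a Guttman scale, assuming items are ordered from mildest to strongest and coded 1 = agree, 0 = disagree; the patterns are invented.

```python
def is_guttman_pattern(responses):
    """True when the pattern is a run of 1s (agree) followed by 0s (disagree):
    anyone agreeing with a stronger statement also agreed with every milder one."""
    return all(a >= b for a, b in zip(responses, responses[1:]))

patterns = [
    [1, 1, 1, 0, 0],  # scalable: agrees up through the third-mildest item
    [1, 0, 1, 0, 0],  # error: agrees with item 3 but not the milder item 2
]
for p in patterns:
    print(p, is_guttman_pattern(p))
```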
Test Construction - Writing Items
Questions to be considered by the test developer:
1. What range of content should the items cover?
2. Which of the many different types of item formats should be employed?
3. How many items should be written?
Test Construction - Writing Items
Item pool - the reservoir or well from which items will be drawn for, or discarded from, the final version of the test.
Items may be derived from the test developer's personal experience or academic acquaintance with the subject matter.
Help may also be sought from experts in the relevant fields.
Test Construction - Writing Items
Item format - the form, plan, structure, arrangement, and layout of individual test items.
1. Selected-response - requires testtakers to select a response from a set of alternative responses.
- Multiple choice
- Matching
- Binary choice
Test Construction - Writing Items
2. Constructed-response - requires testtakers to supply or create the correct answer.
- Completion item - requires the examinee to provide a word or phrase that completes a sentence.
Test Construction - Writing Items
Writing items for computer administration:
Item bank - a large, easily accessible collection of test questions.
Computerized Adaptive Testing (CAT) - an interactive, computer-administered testtaking process wherein the items presented to the testtaker are based in part on the testtaker's performance on previous items (a toy adaptive loop follows this slide).
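A deliberately toy sketch of the adaptive idea: after each response the ability estimate is nudged and the next item is chosen to match it. Operational CAT systems use item response theory for item selection and scoring; the item bank, step-size rule, and simulated testtaker below are all invented for illustration.

```python
def simple_cat(item_bank, n_items, answer):
    """Toy adaptive test: item_bank maps item id -> difficulty; after each
    response the ability estimate moves up or down and the nearest unused
    item is administered next."""
    ability, step = 0.0, 1.0  # start from an average ability estimate
    remaining = dict(item_bank)
    for _ in range(n_items):
        # Pick the unadministered item whose difficulty is closest to
        # the current ability estimate.
        item_id = min(remaining, key=lambda i: abs(remaining[i] - ability))
        correct = answer(item_id, remaining.pop(item_id))
        # Crude step rule: move the estimate after each response,
        # shrinking the step as the test proceeds.
        ability += step if correct else -step
        step *= 0.7
    return ability

# Hypothetical bank: difficulties on a z-score-like scale.
bank = {i: d for i, d in enumerate([-2, -1, -0.5, 0, 0.5, 1, 2])}
# Simulated testtaker who passes any item with difficulty at or below 0.6.
print(round(simple_cat(bank, n_items=5, answer=lambda i, d: d <= 0.6), 2))
```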
Test Construction - Writing Items
Item branching - the ability of the computer to tailor the content and order of presentation of test items.
Test Construction - Scoring Items
Cumulative scoring - the higher the score on a test, the higher the testtaker stands on the ability or trait it measures.
Class/category scoring - responses earn credit toward placement in a particular class or category with other testtakers whose patterns of responses are similar.
Ipsative scoring - comparison of a testtaker's score on one scale within a test with the testtaker's score on another scale within that same test (a contrast of the scoring models follows this slide).
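A small sketch contrasting cumulative and ipsative scoring; the scale names and numbers are made up.

```python
def cumulative_score(item_scores):
    """Cumulative model: credit simply sums across items."""
    return sum(item_scores)

def ipsative_ranking(scale_scores):
    """Ipsative model: rank one testtaker's scales against each other,
    describing relative strengths within the person, not versus a norm group."""
    return sorted(scale_scores, key=scale_scores.get, reverse=True)

print(cumulative_score([1, 0, 1, 1, 1]))                 # -> 4
print(ipsative_ranking({"dominance": 12, "autonomy": 18,
                        "affiliation": 9}))              # strongest scale first
```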
Test Tryout
The test should be tried out on people similar in critical respects to the people for whom the test was designed.
A rule of thumb is to have no fewer than 5 subjects, and ideally as many as 10, for every item on the test; the more subjects, the better.
The tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.
Item Analysis
Item-difficulty index - obtained by calculating the proportion of the total number of testtakers who answered the item correctly.
Values can range from 0 to 1.
Optimal item difficulty should be determined with respect to the number of response options (a computation follows this slide).
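A minimal sketch of both calculations, assuming 0/1 item scoring; the optimal value uses the common midpoint rule, halfway between the chance success rate (1 divided by the number of options) and 1.0.

```python
def item_difficulty(responses):
    """p = proportion of testtakers who got the item right
    (responses is a list of 0/1 values, one per testtaker)."""
    return sum(responses) / len(responses)

def optimal_difficulty(n_options):
    """Midpoint between the chance success rate and a perfect 1.0."""
    chance = 1.0 / n_options
    return (chance + 1.0) / 2

print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))  # -> 0.7
print(optimal_difficulty(5))                            # -> 0.6 for 5 options
```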
Item Analysis
Item-reliability index - provides an indication of the internal consistency of a test; the higher the index, the greater the test's internal consistency.
It can be obtained using factor analysis (a common hand computation follows this slide).
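Assuming the index is computed by hand as the product of the item-score standard deviation and the item-total correlation (one common formulation of the item-reliability index), here is a minimal sketch with invented data.

```python
from statistics import pstdev

def correlation(xs, ys):
    """Pearson correlation computed from population moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / (pstdev(xs) * pstdev(ys))

def item_reliability_index(item_scores, total_scores):
    """s * r: item standard deviation times the item-total correlation."""
    return pstdev(item_scores) * correlation(item_scores, total_scores)

item  = [1, 0, 1, 1, 0, 1, 1, 0]           # 0/1 scores on one item
total = [25, 14, 22, 27, 12, 20, 24, 16]   # total test scores, same testtakers
print(round(item_reliability_index(item, total), 3))
```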
Item Analysis
Item-validity index - a statistic designed to provide an indication of the degree to which a test measures what it purports to measure; the higher the item-validity index, the greater the test's criterion-related validity (a computation follows this slide).
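On the parallel assumption that the item-validity index is the item-score standard deviation times the item-criterion correlation, a minimal self-contained sketch with invented data:

```python
from statistics import pstdev

def item_validity_index(item_scores, criterion_scores):
    """s * r: item standard deviation times the item-criterion correlation."""
    n = len(item_scores)
    mi = sum(item_scores) / n
    mc = sum(criterion_scores) / n
    cov = sum((x - mi) * (c - mc)
              for x, c in zip(item_scores, criterion_scores)) / n
    r = cov / (pstdev(item_scores) * pstdev(criterion_scores))
    return pstdev(item_scores) * r

item = [1, 0, 1, 1, 0, 1, 1, 0]                       # 0/1 scores on one item
criterion = [3.1, 1.8, 2.9, 3.5, 1.5, 2.6, 3.0, 2.0]  # hypothetical criterion ratings
print(round(item_validity_index(item, criterion), 3))
```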
Item Analysis
Item-discrimination index - indicates how adequately an item separates, or discriminates, between high scorers and low scorers on an entire test (a computation follows this slide).
Qualitative item analysis - nonstatistical procedures designed to explore how individual test items work, such as "think aloud" test administration and expert panels.
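A minimal sketch of the upper/lower-group computation d = (U - L) / n, using the conventional top and bottom 27% of scorers; the data are invented.

```python
def discrimination_index(records, fraction=0.27):
    """d = (U - L) / n: the item's pass count in the upper scoring group
    minus the pass count in the lower group, divided by the group size.
    records is a list of (total_test_score, item_correct_0_or_1) pairs."""
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper = sum(correct for _, correct in ranked[:n])   # top scorers
    lower = sum(correct for _, correct in ranked[-n:])  # bottom scorers
    return (upper - lower) / n

data = [(95, 1), (90, 1), (88, 1), (80, 1), (72, 0), (70, 1),
        (65, 0), (60, 1), (55, 0), (50, 0), (45, 0), (40, 0)]
print(discrimination_index(data))  # -> 1.0 for this invented item
```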
Test Revision
Characterize each item according to its strengths and weaknesses.
Balance the various strengths and weaknesses across items.
Administer the revised test under standardized conditions to a second appropriate sample of examinees.
Test Revision
Characteristics of tests that are due for revision:
1. Current testtakers cannot relate to the test.
2. The vocabulary is no longer readily understood by testtakers.
3. Words have taken on inappropriate meanings as popular culture has changed.
4. Test norms are no longer adequate as a result of group membership changes.
Test Revision
Characteristics of tests that are due for revision (continued):
5. Test norms are no longer adequate as a result of age-related shifts.
6. Reliability and validity can be improved by revision.
7. The theory on which the test was based has been improved.