
SESSION 3: APPLYING PRINCIPLES TO THE EVALUATION OF CLASSROOM TESTS

On completion of this session, you will be able to:


• Apply the five key principles to the evaluation of classroom tests in order to produce good-quality tests

Introduction
The five principles of practicality, reliability, validity, authenticity, and washback go a long way
toward providing useful guidelines for both evaluating an existing assessment procedure and
designing one on your own. Quizzes, tests, final exams, and standardized proficiency tests can
all be scrutinized through these five lenses.
Are there other principles that should be invoked in evaluating and designing assessments?
The answer, of course, is yes. Language assessment is an extraordinarily broad discipline with
many branches, interest areas, and issues. The process of designing effective assessment
instruments is far too complex to be reduced to five principles. Good test construction, for
example, is governed by research-based rules of test preparation, sampling of tasks, item
design and construction, scoring responses, ethical standards, and so on. But the five
principles cited here serve as an excellent foundation on which to evaluate existing
instruments and to build your own.
We will look at how to design tests in Session 4 and at standardized tests in Session 5. The
questions that follow here, indexed by the five principles, will help you evaluate existing tests
for your own classroom. It is important for you to remember, however, that the sequence of
these questions does not imply a priority order. Validity, for example, is certainly the most
significant cardinal principle of assessment evaluation. Practicality may be a secondary issue
in classroom testing. Or, for a particular test, you may need to place authenticity as your
primary consideration. When all is said and done, however, if validity is not substantiated, all
other considerations may be rendered useless.
1. Are the test procedures practical?
Practicality is determined by the teacher's (and the students') time constraints, costs, and
administrative details, and to some extent by what occurs before and after the test. To
determine whether a test is practical for your needs, you may want to use the checklist below.
Practicality checklist
o 1. Are administrative details clearly established before the test?
o 2. Can students complete the test reasonably within the set time frame?
o 3. Can the test be administered smoothly, without procedural "glitches"?
o 4. Are all materials and equipment ready?
o 5. Is the cost of the test within budgeted limits?
o 6. Is the scoring/evaluation system feasible in the teacher's time frame?
o 7. Are methods for reporting results determined in advance?
As this checklist suggests, after you account for the administrative details of giving a test, you
need to think about the practicality of your plans for scoring the test. In teachers' busy lives,
time often emerges as the most important factor, one that overrides other considerations in
evaluating an assessment. If you need to tailor a test to fit your own time frame, as teachers
frequently do, you need to accomplish this without damaging the test's validity and washback.
Teachers should, for example, avoid the temptation to offer only quickly scored multiple-
choice selection items that may be neither appropriate nor well-designed. Everyone knows
teachers secretly hate to grade tests (almost as much as students hate to take them!) and will
do almost anything to get through that task as quickly and effortlessly as possible. Yet good
teaching almost always implies an investment of the teacher's time in giving feedback (comments and suggestions) to students on their tests.
2. Is the test reliable?
Reliability applies to both the test and the teacher, and at least four sources of unreliability must be guarded against, as noted earlier. Test and test administration reliability can be achieved by making sure that all students receive the same quality of input, whether written or auditory. Part of achieving test reliability depends on the physical context: making sure, for example, that
• every student has a cleanly photocopied test sheet,
• sound amplification is clearly audible to everyone in the room,
• video input is equally visible to all,
• lighting, temperature, extraneous noise, and other classroom conditions are equal (and optimal) for all students, and
• objective scoring procedures leave little debate about the correctness of an answer.
Rater reliability, another common issue in assessments, may be more difficult to achieve, perhaps because we too often overlook it as an issue. Since classroom tests rarely involve two scorers, inter-rater reliability is seldom an issue. Instead, intra-rater reliability is of constant concern to teachers: What happens to our fallible concentration and stamina over the period of time during which we are evaluating a test? Teachers need to find ways to maintain their
concentration and stamina over the time it takes to score assessments. In open-ended
response tests, this issue is of paramount importance. It is easy to let mentally established
standards erode over the hours you require to evaluate the test.
Intra-rater reliability for open-ended responses may be enhanced by the following guidelines:
• Use consistent sets of criteria for a correct response.
• Give uniform attention to those sets throughout the evaluation time.
• Read through tests at least twice to check for your consistency.
• If you have made "mid-stream" modifications of what you consider as a correct response, go back and apply the same standards to all.
• Avoid fatigue by reading the tests in several sittings, especially if the time requirement is a matter of several hours.
3. Does the procedure demonstrate content validity?
The major source of validity in a classroom test is content validity: the extent to which the
assessment requires students to perform tasks that were included in the previous classroom
lessons and that directly represent the objectives of the unit on which the assessment is based.
If you have been teaching an English language class to fifth graders who have been reading,
summarizing, and responding to short passages, and if your assessment is based on this work,
then to be content valid, the test needs to include performance in those skills.
There are two steps to evaluating the content validity of a classroom test.
1. Are classroom objectives identified and appropriately framed? Underlying every good classroom test are the objectives of the lesson, module, or unit of the course in question.
So the first measure of an effective classroom test is the identification of objectives.
Sometimes this is easier said than done. Too often teachers work through lessons day after
day with little or no cognizance of the objectives they seek to fulfill. Or perhaps those objectives are so poorly framed that determining whether or not they were accomplished is impossible. Consider the following objectives for lessons, all of which appeared on lesson plans
designed by students in teacher preparation programs:
a. Students should be able to demonstrate some reading comprehension.
b. To practice vocabulary in context.
c. Students will have fun through a relaxed activity and thus enjoy their learning.
d. Students will produce yes/no questions with final rising intonation.
Only the last objective is framed in a form that lends itself to assessment. In (a), the modal
should is ambiguous and the expected performance is not stated. In (b), everyone can fulfill
the act of "practicing"; no standards are stated or implied. For obvious reasons, (c) cannot be
assessed. Objective (d), on the other hand, includes a performance verb and a specific
linguistic target. By specifying acceptable and unacceptable levels of performance, the goal
can be tested. An appropriate test would elicit an adequate number of samples of student
performance, have a clearly framed set of standards for evaluating the performance (say, on
a scale of 1 to 5), and provide some sort of feedback to the student.

2. Are lesson objectives represented in the form of test specifications? The next content-validity issue that can be applied to a classroom test centres on the concept of test specifications. Don't let this term scare you. It simply means that a test should have a structure that follows logically from the lesson or unit you are testing (a hypothetical sketch of such a structure appears after the list below). Many tests have a design that:

• divides them into a number of sections (corresponding, perhaps, to the objectives that are
being assessed),
• offers students a variety of item types, and
• gives an appropriate relative weight to each section.
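For the fifth-grade reading unit mentioned above, a purely hypothetical specification might look like the sketch below. The sections, item types, and percentage weights are invented for illustration; nothing in the session prescribes these particular values.

```python
# Hypothetical test specification: each section maps to one unit objective,
# names an item type, and carries a relative weight. Values are illustrative.
test_spec = [
    {"section": "A", "objective": "comprehend a short reading passage",
     "item_type": "multiple-choice questions", "weight": 40},
    {"section": "B", "objective": "summarize the passage",
     "item_type": "short written summary", "weight": 35},
    {"section": "C", "objective": "respond to the passage with an opinion",
     "item_type": "open-ended paragraph", "weight": 25},
]

# Sanity check: the relative weights should account for the whole test.
assert sum(s["weight"] for s in test_spec) == 100

for s in test_spec:
    print(f'Section {s["section"]}: {s["item_type"]} '
          f'({s["weight"]}%) - {s["objective"]}')
```

Laying the sections out this way makes it easy to see whether every objective is represented and whether the weights reflect the emphasis each objective received in class.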
Some tests, of course, do not lend themselves to this kind of structure. A test in a course in
academic writing at the university level might justifiably consist of an in-class written essay on a given topic: only one "item" and one response, in a manner of speaking. But in this case the
specs (specifications) would be embedded in the prompt itself and in the scoring or evaluation
rubric used to grade it and give feedback.
The content validity of an existing classroom test should be apparent in how the objectives of
the unit being tested are represented in the form of the content of items, clusters of items,
and item types. Do you clearly perceive the performance of test-takers as reflective of the
classroom objectives? If so, and you can argue this, content validity has probably been
achieved.
4. Is the procedure face valid and "biased for best"?
This question integrates the concept of face validity with the importance of structuring an
assessment procedure to elicit the optimal performance of the student. Students will
generally judge a test to be face valid if
• directions are clear,
• the structure of the test is organized logically,
• its difficulty level is appropriately pitched,
• the test has no "surprises," and
• timing is appropriate.
A phrase that has come to be associated with face validity is "biased for best," a term that
goes a little beyond how the student views the test to a degree of strategic involvement on
the part of student and teacher in preparing for, setting up, and following up on the test itself.
According to Swain (1984), to give an assessment procedure that is "biased for best," a
teacher
• offers students appropriate review and preparation for the test,
• suggests strategies that will be beneficial, and
• structures the test so that the best students will be modestly challenged and the weaker
students will not be overwhelmed.
It's easy for teachers to forget how challenging some tests can be, and so a well-planned testing experience will include some strategic suggestions on how students might optimize their performance. In evaluating a classroom test, consider the extent to which before-, during-, and after-test options are fulfilled.

Test-taking strategies

Before the Test


1. Give students all the information you can about the test: Exactly what will the test cover? Which topics will be the most important? What kind of items will be on it? How long will it be?
2. Encourage students to do a systematic review of material. For example, students should
skim the textbook and other material, outline major points, write down examples.
3. Give them practice tests or exercises, if available.
4. Facilitate formation of a study group, if possible.

5. Caution students to get a good night's rest before the test.


6. Remind students to get to the classroom early.
During the Test
1. After the test is distributed, tell students to look over the whole test quickly in order to
get a good grasp of its different parts.
2. Remind them to mentally figure out how much time they will need for each part.
3. Advise them to concentrate as carefully as possible.
4. Warn students a few minutes before the end of the class period so that they can finish on
time, proofread their answers, and catch careless errors.

After the Test


1. When you return the test, include feedback on specific things the student did well, what
he or she did not do well, and, if possible, the reasons for your comments.
2. Advise students to pay careful attention in class to whatever you say about the test results.
3. Encourage questions from students.
4. Advise students to pay special attention in the future to points on which they are weak.

Keep in mind that what comes before and after the test also contributes to its face validity.
Good class preparation will give students a comfort level with the test, and good feedback (washback) will allow them to learn from it.

5. Are the test tasks as authentic as possible?


Evaluate the extent to which a test is authentic by asking the following questions:
• Is the language in the test as natural as possible?
• Are items as contextualized as possible rather than isolated?
• Are topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided, such as through a story line or episode?
• Do tasks represent, or closely approximate, real-world tasks?

Consider the following two excerpts from tests, and the concept of authenticity may become
a little clearer.

Multiple-choice tasks: contextualized
"Going To"
1. What ------------this weekend?
a. you are going to do
b. are you going to do
c. your gonna do

2. I'm not sure. ------------- anything special?


a. Are you going to do
b. You are going to do
c. Is going to do

3. My friend Melissa and I -------------------a party. Would you like to come?


a. am going to
b. are going to go to
c. go to

4. I'd love to!


a. What's it going to be?
b. Who's going to be?
c. Where's it going to be?

5. It is-------------- to be at Ruth's house.


a. go
b. going
c. gonna

Multiple-choice tasks: decontextualized
1. There are three countries I would like to visit. One is Italy.
a. The other is New Zealand and other is Nepal.
b. The others are New Zealand and Nepal.
c. Others are New Zealand and Nepal.

2. When I was twelve years old, I used--------------- every day.


a. swimming
b. to swimming
c. to swim

3. When Mr. Brown designs a website, he always creates it


a. artistically
b. artistic
c. artist

4. Since the beginning of the year, I -------------at Millennium Industries.

a. am working
b. had been working
c. have been working

5. When Mona broke her leg, she asked her husband-------------- her to work.
a. to drive
b. driving
c. drive

The sequence of items in the contextualized tasks achieves a modicum of authenticity by contextualizing all the items in a story line. The conversation is one that might occur in the
real world, even if with a little less formality. The sequence of items in the decontextualized
tasks takes the test-taker into five different topic areas with no context for any. Each sentence
is likely to be written or spoken in the real world, but not in that sequence. Given the
constraints of a multiple-choice format, on a measure of authenticity I would say the first
excerpt is "good" and the second excerpt is only "fair."

6. Does the test offer beneficial washback to the learner?

The design of an effective test should point the way to beneficial washback. A test that
achieves content validity demonstrates relevance to the curriculum in question and thereby
sets the stage for washback. When test items represent the various objectives of a unit,
and/or when sections of a test clearly focus on major topics of the unit, classroom tests can
serve in a diagnostic capacity even if they aren't specifically labelled as such.
Other evidence of washback may be less visible from an examination of the test itself. Here
again, what happens before and after the test is critical. Preparation time before the test can
contribute to washback since the learner is reviewing and focusing in a potentially broader
way on the objectives in question. When classroom time after the test is spent reviewing the content, students discover their areas of strength and weakness. Teachers can raise the
washback potential by asking students to use test results as a guide to setting goals for their
future effort. The key is to play down the "Whew, I'm glad that's over" feeling that students
are likely to have, and play up the learning that can now take place from their knowledge of
the results.

Some of the alternatives in assessment referred to in Session 2 may also enhance washback
from tests. Self-assessment may sometimes be an appropriate way to challenge students to
discover their own mistakes. This can be particularly effective for writing performance: once
the pressure of assessment has come and gone, students may be able to look back on their
written work with a fresh eye. Peer discussion of the test results may also be an alternative
to simply listening to the teacher tell everyone what they got right and wrong and why.
Journal writing may offer students a specific place to record their feelings, what they learned,
and their resolutions for future effort.

The five basic principles of language assessment were expanded here into six essential
questions you might ask yourself about an assessment. As you use the principles and the
guidelines to evaluate various forms of tests and procedures, be sure to allow each one of the
five to take on greater or lesser importance, depending on the context. In large-scale standardized testing, for example, practicality is usually more important than washback, but
the reverse may be true of a number of classroom tests. Validity is of course always the final
arbiter. And remember, too, that these principles, important as they are, are not the only
considerations in evaluating or making an effective test. Leave some space for other factors
to enter in.

In the next session, the focus is on how to design a test. These same five principles underlie
test construction as well as test evaluation, along with some new facets that will expand your
ability to apply principles to the practicalities of language assessment in your own classroom.
