Principles of Language Assessment 2003
Practicality, reliability, validity, authenticity, and washback are the five cardinal
criteria for “testing a test.”
PRACTICALITY:
RELIABILITY:
A reliable test is consistent and dependable. If you give the same test to the same
Ss or matched Ss on two different occasions, the test should yield similar results.
The reliability of a test may be addressed by considering a number of factors that
may contribute to its unreliability.
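The chapter states the idea of consistency in words only. As an illustration that is not part of the chapter, one common way to put a number on "similar results on two occasions" is to correlate the two sets of scores. The sketch below assumes hypothetical scores for the same five Ss; the function name `pearson_r` and all the data are invented for illustration. A value near 1 suggests the Ss kept roughly the same rank order across the two administrations.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two lists of scores for the same Ss (hypothetical helper)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Covariance and variances, left unscaled because the n terms cancel in the ratio
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary on both occasions")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for the same five Ss on two different occasions
occasion_1 = [72, 85, 60, 90, 78]
occasion_2 = [70, 88, 62, 91, 75]
print(round(pearson_r(occasion_1, occasion_2), 2))
```

A correlation only reflects how consistently the Ss are ordered. It does not reveal *why* scores shifted, which is what the student-, rater-, administration-, and test-related factors below try to explain.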
*Ss-Related Reliability:
Caused by temporary illness, fatigue, a “bad day,” anxiety, and other physical or
psychological factors, which may make an “observed” score deviate from one's “true”
score.
*Rater Reliability:
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater
unreliability occurs when two or more scorers yield inconsistent scores on the same
test, possibly because of lack of attention to scoring criteria, inexperience,
inattention, or even preconceived biases.
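As an illustration that is not taken from the chapter, the simplest way to check whether two scorers are consistent is to count how often their scores for the same scripts agree. The sketch below assumes hypothetical 1-5 essay ratings from two teachers; the function name `percent_agreement` and its `tolerance` parameter are invented for illustration. Setting `tolerance=1` counts scores one point apart as agreeing ("adjacent agreement").

```python
def percent_agreement(rater_a, rater_b, tolerance=0):
    """Share of scripts on which two raters differ by at most `tolerance` points (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return agreements / len(rater_a)


# Hypothetical 1-5 ratings given by two teachers to the same six essays
teacher_1 = [4, 3, 5, 2, 4, 3]
teacher_2 = [4, 2, 5, 2, 3, 3]

exact = percent_agreement(teacher_1, teacher_2)                   # 4 of 6 essays match exactly
adjacent = percent_agreement(teacher_1, teacher_2, tolerance=1)   # all 6 are within one point
print(f"exact: {exact:.0%}, adjacent: {adjacent:.0%}")
```

Percent agreement is easy to compute by hand, but it does not correct for agreement that would occur by chance. Low exact agreement is a signal to revisit the scoring criteria with the raters rather than a verdict on its own.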
*Test Administration Reliability:
Unreliability may also result from the conditions in which the test is administered.
For instance, in a test of aural comprehension in which a tape recording is played,
street noise outside the classroom may prevent the Ss sitting next to the windows
from hearing the tape accurately.
*Test Reliability:
Sometimes the nature of the test itself can cause measurement errors. If a test is
too long, test-takers may become fatigued by the time they reach the later items and
hastily respond incorrectly.
VALIDITY:
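A valid test actually measures what it claims to measure: a valid test of reading
ability measures reading ability, and a valid test of writing ability measures writing
ability. Suppose, for example, that writing ability were measured simply by counting
the number of words Ss can write in a set time. Such a test would be easy to
administer (practical) and the scoring quite dependable (reliable). But it would not
constitute a valid test of writing ability without some consideration of
comprehensibility, rhetorical discourse elements, and the organization of ideas,
among other factors.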
*Content-Related Evidence
You can usually identify content-related evidence observationally if you can clearly
define the achievement that you are measuring. A test of tennis competency that
asks someone to run a 100-yard dash obviously lacks content validity.
*Criterion-Related Evidence
In the case of teacher-made classroom assessments, criterion-related evidence is best
demonstrated through a comparison of the results of an assessment with the results
of some other measure of the same criterion. Criterion-related evidence falls into two
categories: concurrent and predictive validity. An assessment has concurrent validity
if its results are supported by other concurrent performance beyond the assessment
itself. The predictive validity of an assessment becomes important in the case of
placement tests, admissions assessment batteries, language aptitude tests, and so on.
*Construct-Related Evidence
*Consequential Validity
*Face Validity
Face validity refers to the degree to which a test looks right and appears to
measure the knowledge or abilities it claims to measure, based on the subjective
judgment of the examinees who take it, the administrative personnel who decide on
its use, and other psychologically unsophisticated observers. In other words, face
validity means that the Ss perceive the test to be valid.
WASHBACK:
The design of an effective test should point the way to beneficial washback. A test
that achieves content validity demonstrates relevance to the curriculum in question
and thereby sets the stage for washback.
Other evidence of washback may be less visible from an examination of the test
itself. What happens before and after the test is critical. Preparation time before the
test can contribute to washback since the learner is reviewing and focusing in a
potentially broader way on the objectives in question. By spending classroom time
after the test reviewing the content, students discover their areas of strength and
weakness. Teachers can raise the washback potential by asking students to use
test results as a guide to setting goals for their future effort.
Some of the “alternatives” in assessment referred to in Chapter 1 may also enhance
washback from tests.
As you use the five principles and the guidelines to evaluate various forms of tests
and procedures, be sure to allow each one of the five to take on greater or lesser
importance, depending on the context. In large-scale standardized testing, for
example, practicality is usually more important than washback, but the reverse may
be true of a number of classroom tests. Validity is, of course, always the final
arbiter. These principles, important as they are, are not the only considerations in
evaluating or making an effective test.
Exercises:
1-(I/C) Review the five principles of language assessment that are defined and
explained in this chapter. Be sure to differentiate among several types of evidence
that support the validity of tests, as well as four kinds of reliability.
2-(I/C) Do you think that consequential and face validity are appropriate
considerations in classroom-based assessment? Explain.
4-(I/C) Washback is described here as a positive effect. Can tests provide negative
washback? Explain.