
PRINCIPLES OF LANGUAGE ASSESSMENT
What is Assessment?
Assessment is …

 Assessment is one component of teaching and learning activities.
 By doing assessment, teachers can gain information about every aspect of their students, especially their achievement.
 An aspect that plays a crucial role in assessment is tests.
 A good test is constructed by considering the principles of language assessment.
(Taken from Brown, H.D. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education, p. 5)
They are ….

 Practicality

 Validity

 Reliability

 Authenticity

 Washback/Backwash
Practicality

 Practicality can be simply defined as the relationship between the resources available for the test, i.e. human resources, material resources, time, etc., and the resources that will be required in the design, development, and use of the test (Bachman & Palmer, 1996:35-36).

 Brown (2004:19) defines practicality in terms of:
1) Cost
2) Time
3) Administration
4) Scoring / Evaluation
Cost

 The test should not be too expensive to conduct.
 The cost of the test has to stay within the budget.
 Avoid conducting a test that requires an excessive budget.

What do you think if a teacher conducts a “Proficiency Test” for one class of 30 Upper-Intermediate students that costs $500 for every student?
Is it practical in terms of cost?
Time

 The test should stay within appropriate time constraints.
 The test should not be too long or too short.

What do you think if a teacher wants to conduct a language proficiency test that will take a student ten hours to complete?
Is that practical in terms of time?
Administration

 The test should not be too complicated or complex to conduct.
 The test should be quite simple to administer.

What do you think if a teacher in a remote area, whose students know nothing about computers, conducts a test which requires the test-takers to know at least how to interact with a computer in order to complete it?
Is it practical in terms of administration?
Scoring / Evaluation

 The scoring/evaluation process should fit into the time allocation.
 A test should be accompanied by scoring rubrics, answer keys, and so on to make it easy to score/evaluate.

What do you think if a teacher conducts a test that takes students a couple of minutes to complete and takes the teacher several hours to score/evaluate?
Is it practical in terms of scoring/evaluation?
Validity

 Validity does not reside in the test or any assessment method itself, but in the …
 “…validity of interpretations of test/assessment outcomes. Are the interpretations, decisions and actions based on the assessment justified by the evidence collected, and are the interpretations, decisions and actions supported by what is the current theoretical view of the knowledge, skills, competencies, attitudes etc.?” (Järvinen, p. 108)
Validity

 The validity of a test is the extent to which it measures exactly what it is supposed to measure (Hughes, 2003:26).
 A test must aim to provide a true measure of the particular skill it is intended to measure, not of external knowledge and other skills at the same time (Heaton, 1990:159).

For example, if a student is given a reading test about the metamorphosis of a butterfly, a valid test will measure reading ability (such as identifying general or specific information in the text), not the student’s prior knowledge of biology. The test should make the student rely on his/her reading ability to complete it.
There are at least three ways to establish validity:

1. Content Validity
2. Construct Validity
3. Face Validity
Content Validity

 The correlation between the contents of the test and the language skills, structures, etc. that it is meant to measure has to be crystal clear.
 The test items should truly represent the course objectives.

What do you think if a listening test requires students to read passages to complete it instead of requiring them to listen attentively?
Does the test have content validity?
Direct testing: the test requires students to perform the very skill being measured.
Indirect testing: the test measures the abilities underlying the skill rather than having students perform the task itself.
Is the test fully representative of what it aims to measure?
Construct Validity

 Construct validity refers to the concepts or theories underlying the use of a certain ability, including language ability.
 Construct validity shows that the result of the test really represents the same construct as the students’ ability being measured (Djiwandono, 1996:96).
 “Communicative competence”: the interrelation and correlation between oral production and behaviour.
Example: Does the test measure the concepts that it is intended to measure?
Face Validity

 A test is said to have face validity if it looks to other testers, teachers, moderators, and students as if it measures what it is supposed to measure (Heaton, 1990:159).
 In a speaking test, for instance, face validity can be shown by making speaking activities the main activities of the test. The test should focus on students’ speaking activities, not anything else.
 A test can be judged to have face validity simply by looking at its items.
 Note that face validity can affect students in doing the test (Brown, 2004:27 & Heaton, 1988:160).
 To address this, the test constructor has to consider the following:
a. Students will be more confident if they face a well-constructed, expected format with familiar tasks.
b. Students will be less anxious if the test is clearly doable within the allotted time limit.
c. Students will be optimistic if the items are clear and uncomplicated (simple).
d. Students will find it easy to do the test if the directions are very clear.
e. Students will be less worried if the tasks are related to their course work (content validity).
f. Students will be at ease if the difficulty level presents a reasonable challenge.

Does the content of the test appear to be suitable to its aims?
Reliability

 Reliability refers to the consistency of the scores obtained (Gronlund, 1977:138).
 It means that if the test is administered to the same students on different occasions (with no language practice taking place between those occasions), it produces (almost) the same results.
 Reliability does not really deal with the test itself; it deals with the results of the test. The test results should be consistent.
Take a look at the two sets of scores below. Which one is more reliable?
Note the size of the difference between the two scores for each student.

[Table of students’ scores on Test A and Test B, taken from Hughes, A. 2003. Testing for Language Teachers, 2nd Edition. Cambridge: Cambridge University Press, p. 37]
Here are some factors that can threaten reliability:

 Psychological factors
 Scoring criteria (unclear or carelessly applied criteria)
 Bias (no favourites; the same opportunities for all test-takers)
 Carelessness (items should be well stated)
Reliability

… refers to the extent to which the test (or any assessment method) produces consistent results. For example, a test is reliable if it produces the same results when taken by the same students on several occasions (Time 1 vs. Time 2). Another indicator of reliability is whether the same scores are given by different markers/raters.
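As a minimal illustrative sketch (not part of the original slides), consistency between two administrations of a test, or between two raters, is often summarized with a correlation coefficient. The short Python example below uses made-up scores for five hypothetical students and computes a Pearson correlation; a value close to 1.0 indicates highly consistent (reliable) results.

from math import sqrt

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists of scores.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same five students at Time 1 and Time 2
# (illustrative numbers only, not taken from Hughes, 2003).
time_1 = [68, 74, 55, 82, 60]
time_2 = [70, 73, 57, 80, 62]

print(round(pearson(time_1, time_2), 2))  # a value near 1.0 suggests consistent scores

The same calculation could be applied to scores from two different raters to get a rough picture of inter-rater consistency.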
Authenticity
 Authenticity deals with the “real world”.
 Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task (Brown, 2004:28).
 Teachers should construct a test whose items are likely to be used or applied in real contexts of daily life.
 Brown (2004:28) also proposes considerations that might be helpful for building authenticity into a test. They are:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) to the learners.
4. Some thematic organization of the items is provided, such as through a story or episode.
5. Tasks represent, or closely approximate, real-world tasks.
Authentic assessment
 “A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills.”
 Student performance on a task is typically scored on a rubric to determine how successfully the student has met specific standards.
(Mueller, 2011)
Authentic assessment
TRADITIONAL ASSESSMENT        AUTHENTIC ASSESSMENT
Selecting a response          Performing a task
Contrived                     Real-life
Recall/recognition            Construction/application
Teacher-structured            Student-structured
Indirect evidence             Direct evidence
Which of these is NOT an authentic assessment task?

a. Preparing a meal (on a cooking course)
b. Driving a car in traffic
c. Doing a multiple-choice quiz
d. Doing an experiment and recording the results
e. Playing a piece of music on the piano
f. Writing a letter to the local government about an environmental issue in the students’ neighbourhood
Authenticity

 Authenticity means the test is represented in a real context.
 It is presented in the following ways:
 The language is natural.
 Items are contextualized, not isolated.
 Topics are interesting for students.
 Topics are connected and organized.
 Tasks are appropriate to the real world.
 All skills are included.
Washback/Backwash
 The term washback is commonly used in applied linguistics, but it is rarely found in dictionaries.
 However, the word backwash can be found in certain dictionaries; the Cambridge Advanced Learner’s Dictionary defines it as “an effect that is not the direct result of something”.
 In dealing with the principles of language assessment, these two words are somewhat interchangeable.
 Washback (Brown, 2004) or backwash (Heaton, 1990) refers to the influence of testing on teaching and learning.
 The influence itself can be positive or negative (Cheng et al. (Eds.), 2008:7-11).
Positive Washback
 Positive washback is a beneficial influence on teaching and learning. It means teachers and students have a positive attitude toward the examination or test, and work willingly and collaboratively towards its objectives (Cheng & Curtis, 2008:10).
 A good test should have a good effect.
 For example, the UN (National Examination) requires students to pay closer attention to their lessons, prepare everything dealing with the UN more thoroughly, learn the lessons by heart, and so on. The UN also requires teachers to teach harder than before, give their students extra lessons, and give tips and tricks for studying effectively and efficiently. To the extent that the test increases such activity and motivation, the UN can be said to have positive washback.
Negative Washback
 Negative washback does not give any beneficial influence on teaching and learning (Cheng and Curtis, 2008:9).
 Tests which have negative washback are considered to have a negative influence on teaching and learning.
Conclusion

 A test is good if it has practicality, good validity, high reliability, authenticity, and positive washback.
 The five principles provide guidelines for both constructing and evaluating tests.
 Teachers should apply these five principles when constructing or evaluating tests to be used in assessment activities.
References:

Alderson, J.C., Clapham, C., and Wall, D. 1995. Language Test Construction and Evaluation. Cambridge: Cambridge University Press
Bachman, L.F., and Palmer, A.S. 1996. Language Testing in Practice: Designing and Developing Useful Language Tests. New York: Oxford University Press
Brown, H.D. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education
Brown, H.D. 2007. Teaching by Principles: An Interactive Approach to Language Pedagogy. New York: Pearson Education
Cheng, L., Watanabe, Y., and Curtis, A. (Eds.). 2008. Washback in Language Testing: Research Contexts and Methods. New Jersey: Lawrence Erlbaum Associates
Djiwandono, M.S. 1996. Tes Bahasa dalam Pengajaran. Bandung: ITB Bandung
Fulcher, G. and Davidson, F. 2007. Language Testing and Assessment: An Advanced Resource Book. New York: Routledge
Gronlund, N.E. 1977. Constructing Achievement Tests. New Jersey: Prentice-Hall Inc.
Heaton, J.B. 1990. Writing English Language Tests. New York: Longman
Hughes, A. 2003. Testing for Language Teachers, 2nd Edition. Cambridge: Cambridge University Press
