Group 6 - Issues in Foreign Language Testing
Uploaded by Minh Nguyet

APPLIED LINGUISTICS

ISSUES IN FOREIGN
LANGUAGE TESTING
TOPIC 7 - GROUP 6 (PG45)
Members - Group 6:
1. Nguyen Thi Khanh Linh
2. Truong Anh Phuong
3. Nguyen Thi Thanh
4. Tran Le Gia Linh
5. Pham Hong Ngoc
6. Nguyen Thu Hien
CONTENT OUTLINE:
01. Testing and language testing
02. Different approaches to foreign language testing
03. Types of foreign language test
04. Practical steps in designing a foreign language test
05. The current situation and issues of English language testing in schools in Vietnam
06. The relationships between foreign language testing and FLT
01
Testing and
Language
testing
Presented by Nguyen Thi Khanh Linh
A - Testing
Definition of terms

➹ ➹ ➹
1. Measurement 2. Test 3. Evaluation
1. MEASUREMENT
➽ the process of quantifying the characteristics of
persons according to explicit procedures and rules.

➽ three distinguishing features:


+ Quantification
+ Characteristics
+ Rules and procedures

➽ There are many different types of measures in the social


sciences, including rankings, rating scales, and tests.
2. TEST

2.1 What is a test?
2.2 What is testing?
2.3 What is the purpose of testing?
2.4 The effect of testing
2.5 Why are tests inaccurate?
2.1. What is a test?

- A general definition:
“a procedure intended to establish the
quality, performance, or reliability of something,
especially before it is taken into widespread use.”
(Concise Oxford English Dictionary)
- Carroll (1968) provides the following definition of a
test:
A psychological or educational test is a
procedure designed to elicit certain behavior
from which one can make inferences about
certain characteristics of an individual.
(Carroll 1968: 46)
2.1. What is a test?
- In simple terms: It is a method of measuring a person's
ability, knowledge or performance in a given domain.

Most language tests: measure one’s ability to perform


language (speak, write, read, and listen to a subset of
language).
A proficiency test: even though the actual performance
on the test involves only a sample of skills, the domain is
overall proficiency in a language: general competence in
all skills of a language.
Other tests may have very specific criteria.
E.g: A test of pronunciation might cover only a limited
set of phonemic minimal pairs.
A vocabulary test may focus on only the set of
words covered in a particular lesson or unit.
2.1. What is a test?
- Method: It is an instrument, a set of
techniques, procedures or items that
requires performance on the part of the
test takers. To qualify as a test, the
method must be explicit and structured.
- Measure:
+ General ability
+ Specific competencies
2.1. What is a test?

- A test is a measurement instrument designed to elicit a


specific sample of an individual’s behavior. As one type of
measurement, a test necessarily quantifies characteristics of
individuals according to explicit procedures.
- What distinguishes a test from other types of measurement is
that it is designed to obtain a specific sample of behavior.
2.1. What is a test?

E.g: The Interagency Language Roundtable (ILR)
oral interview (Lowe 1982) is a test of speaking
consisting of:
(1) a set of elicitation procedures, including a
sequence of activities and sets of question types
and topics;
(2) a measurement scale of language
proficiency, ranging from a low level of '0' to a high
level of '5', on which samples of oral language obtained
via the elicitation procedures are rated. Each of
the six scale levels is carefully defined by an
extensive verbal description.

https://www.researchgate.net/figure/nteragency-language-roundtable-ilr-scale-Oral-Proficiency-Interview-OPI-
descriptor_tbl1_295539760
Language tests can provide the means for more
carefully focusing on the specific language abilities
that are of interest.
➽ They could be viewed as supplemental to other
methods of measurement.

Conclusion:
+ Other measures are no less valuable than tests.
+ The value of tests lies in their capability for
eliciting the specific kinds of behavior that the test user
can interpret as evidence of the attributes or abilities
which are of interest.
2.2. What is testing?

Testing is the practice of making objective


judgements regarding the extent to which
the system meets, exceeds or fails to
meet stated objectives.
2.3. What is the purpose of testing?

➷ To measure language proficiency.
➷ To discover how successful students have been in achieving the objectives of a course of study.
➷ To diagnose students' strengths and weaknesses, to identify what they know and what they don't know.
➷ To assist placement of students by identifying the stage or part of a teaching programme most appropriate to their ability.
(Arthur Hughes and Jake Hughes, Testing for Language Teachers. Oxford University Press.)
2.4. What is the effect of testing?

➽ The effect of testing on teaching and learning is known as


backwash.
2.5. Why are tests inaccurate?
3. EVALUATION

➽ Evaluation can be defined as the systematic gathering of information for


the purpose of making decisions (Weiss 1972).

➽ Evaluation does not necessarily entail testing. By the same token, tests in
and of themselves are not evaluative.
3. EVALUATION
Comparison:

Measurement: quantification of observations; provides information.
Tests: quantification of observations; elicit a specific sample of behavior; used for pedagogical purposes or purely descriptive purposes.
Evaluation: qualitative descriptions; involves decision making.
3. EVALUATION

➽ Not all measures are tests, not all tests are evaluative, and not all evaluation
involves either measurement or tests.
4. ESSENTIAL MEASUREMENT QUALITIES

Reliability Validity
4. ESSENTIAL MEASUREMENT QUALITIES

Reliability
Reliability is a quality of test scores, and a perfectly reliable score, or
measure, would be one which is free from errors of measurement.
(American Psychological Association 1985)

E.g: A student receives a low score on a test one day and a high score on the
same test two days later, the test does not yield consistent results, and the
scores cannot be considered reliable indicators of the individual’s ability.

➽ Reliability thus has to do with the consistency of measures across


different times, test forms, raters, and other characteristics of the
measurement context.
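Consistency across administrations can also be illustrated numerically. A minimal sketch (the scores below are invented for illustration, not from the source): test-retest reliability estimated as the correlation between two sittings of the same test.

```python
def pearson(xs, ys):
    # Pearson correlation coefficient between two lists of scores.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores of five students on the same test, two days apart.
day1 = [52, 68, 75, 80, 91]
day2 = [55, 66, 78, 79, 90]
print(round(pearson(day1, day2), 3))  # close to 1.0: consistent scores
```

A correlation near 1.0 suggests the test ranks the students consistently; the student in the slide's example (low score one day, high score two days later) would drag this coefficient down.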
4. ESSENTIAL MEASUREMENT QUALITIES
Validity
The most important quality of test interpretation or use is validity, or the
extent to which the inferences or decisions we make on the basis of test
scores are meaningful, appropriate, and useful.
(American Psychological Association 1985)

E.g: If we ask students to listen to a lecture and then to write a short essay
based on that lecture, the essays they write will be affected by both their
writing ability and their ability to comprehend the lecture. Ratings of their
essays, therefore, might not be valid measures of their writing ability.

➽ In examining validity*, we must be concerned with the appropriateness


and usefulness of the test score for a given purpose.

(*validity: content validity, criterion-related validity (concurrent – predictive


validity), face validity, validity in scoring)
5. PROPERTIES OF MEASUREMENT SCALES
The scales we define can be distinguished in terms of FOUR PROPERTIES:

+ Distinctiveness
+ Ordering
+ Equal intervals
+ Absolute zero point
5. PROPERTIES OF MEASUREMENT SCALES
Measurement specialists have defined FOUR TYPES OF MEASUREMENT SCALES:

+ Nominal (E.g: right - wrong, male - female…)
+ Ordinal (E.g: first, second, third…)
+ Interval
+ Ratio
5. PROPERTIES OF MEASUREMENT SCALES

➽ The test scores indicate that these individuals are not equally distant from each other on
the ability measured.
5. PROPERTIES OF MEASUREMENT SCALES
➽ Conclusion:
These four different scales are also sometimes referred to as levels of measurement.
+ The nominal scale is the lowest type of scale, or level of measurement,
since it is only capable of distinguishing among different categories.
+ The ratio scale is the highest level, possessing all four properties and thus
capable of providing the greatest amount of information.
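The four levels and their cumulative properties can be summarized in a small data structure. This is an illustrative sketch; the example values are assumptions, not from the source.

```python
# Each level of measurement supports all the properties of the levels
# below it, plus one more (distinctiveness -> ordering -> equal
# intervals -> absolute zero).
scales = {
    "nominal":  {"example": ["male", "female"],
                 "supports": ["distinctiveness"]},
    "ordinal":  {"example": ["first", "second", "third"],
                 "supports": ["distinctiveness", "ordering"]},
    "interval": {"example": [10, 20, 30],  # equal steps, arbitrary zero
                 "supports": ["distinctiveness", "ordering",
                              "equal intervals"]},
    "ratio":    {"example": [0, 15, 30],   # true zero: 30 is twice 15
                 "supports": ["distinctiveness", "ordering",
                              "equal intervals", "absolute zero"]},
}

for name, info in scales.items():
    print(f"{name}: supports {len(info['supports'])} of 4 properties")
```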
B - LANGUAGE
TESTING
1. What is language testing?
● Language testing lies at the core of both language teaching and applied linguistics.
(Alan Davies)
● Language testing is the practice and study of evaluating the proficiency of an
individual in using a particular language effectively.
(Winning entry from the 2009/10, Priscilla Allen, University of Washington)

● As a psychometric activity, language testing traditionally was more concerned with


the production, development and analysis of tests. Recent critical and ethical
approaches to language testing have placed more emphasis on the uses of language
tests. The purpose of a language test is to determine a person’s knowledge and/or
ability in the language and to discriminate that person’s ability from that of
others.
(Alan Davies, University of Edinburgh)
2. Uses of language tests in educational programs
Information: qualitative (non-measurement) or quantitative (measurement)
Evaluation: value judgements, or decisions
3. Types of decisions

Decisions:
+ About individuals: students, teachers
+ About programs
02
Different approaches to
foreign language testing
Presented by Truong Anh Phuong
How do you describe the
relationship between teaching and
testing in your educational
context?
Types of Approach

+ Essay - translation
+ Structuralist
+ Integrative
+ Communicative
+ Performance - based
Essay - translation
- It is referred to as the pre-scientific stage of language
testing.
- No special skill in testing is required.
- Tests often consist of essay writing, translation and
grammatical analysis.
Essay - translation

- It is straightforward to implement and requires minimal training for


test administrators.
- It may be used for testing any level of examinees.
- It provides a broad overview of a learner's language proficiency,
encompassing writing, reading, and translation skills.
Essay - translation
- Subjective judgement of the teacher can be biased.
- The approach often fails to accurately measure real-world
language use.
- Impractical for large-scale assessments.
- The tests have a heavy literary and cultural bias.
Structuralist
- It views language as several components that interact with
each other and form the rules of language:
Phonology: Sounds of the language
Vocabulary: Words and their meanings
Grammar: Language structure
Structuralist

Key characteristics:
- Discrete-point testing
- Objective scoring
- Emphasis on form over function
- Efficiency
Structuralist

Strengths:
● Objectivity and Reliability: allows for clear and consistent scoring criteria,
reducing rater bias - ensures a higher degree of reliability in test results.
● Efficiency: tests can be administered quickly, allowing for large-scale
assessment.
● Practicality: easy to implement and requires minimal resources.
● Diagnostic Value: provide valuable information for targeted instruction.
● Foundation for Further Development: provides a solid foundation for more
complex language tests that incorporate communicative elements.
Structuralist
Weaknesses:
● Lack of Communicative Validity: fails to accurately measure real-world language use -
isolates language components and neglects the communicative context in which language is
used
● Artificiality: Test items often appear artificial and unrelated to authentic language use, leading
to a mismatch between test performance and real-world language ability.
● Limited Scope: By focusing on discrete language points, the structuralist approach overlooks
higher-order language skills such as fluency, coherence, and strategic competence.
● Neglect of Context: It fails to consider the influence of cultural and social factors on language
use.
● Overemphasis on Form: The excessive focus on grammar and vocabulary can lead to a
neglect of meaning and discourse.
Integrative

It involves the testing of language in context and is concerned
primarily with meaning and the total communicative effect of
discourse.
Integrative

Advantages of the Integrative Approach


● Better reflection of real-world language use: Tests align more
closely with how language is used in everyday life.
● Assessment of communicative competence: It goes beyond
grammar and vocabulary to evaluate overall language
proficiency.
● Development of language skills: Integrative tests can promote
the development of all language skills simultaneously.
Integrative

Limitations of the Integrative Approach


● Subjectivity: Scoring integrative tasks can be more subjective, leading to potential inconsistencies
among raters.
● Time-consuming: Developing, administering, and scoring integrative tests can be more time-
consuming than traditional tests.
● Practical Constraints: In some contexts, such as large-scale assessments, it may be difficult to
implement integrative tests due to resource limitations.
● Difficulty in Identifying Specific Language Problems: Because integrative tests assess language as a
whole, it can be challenging to pinpoint specific areas of weakness for individual learners.
● Limited Focus on Specific Skills: While integrative tests assess multiple skills, they may not provide
detailed information on specific language components, such as grammar or vocabulary.
Communicative

The communicative approach is intended to provide teachers with
information about the learners' ability to perform in the target
language in certain context-specific tasks.
Communicative

Key Characteristics of the Communicative Approach

● Authentic Tasks: Tests employ real-life language tasks, such as role-plays,


simulations, and problem-solving activities.
● Focus on Communicative Competence: Emphasis on assessing a learner's ability
to use language effectively in real-world contexts, rather than just grammatical

accuracy

● Integration of Language Skills: Tests often combine reading, writing, listening,


and speaking in a single task to simulate real-life communication.
Communicative

Key Characteristics of the Communicative Approach

● Meaning-Centered: Prioritizes the communication of meaning over form,


reflecting real-world language use.
● Contextualization: Tests are designed to reflect real-life situations and contexts,
making them more relevant to learners.
● Learner-Centered: The focus is on the learner's ability to use language to achieve
communicative goals, rather than simply demonstrating knowledge of language
rules.
● Dynamic Assessment: Often involves interaction between the tester and the test-
taker, allowing for assessment of language development over time.
Communicative

Requirements for communicative approach testing


(according to H. Douglas Brown, in his book "Language Assessment: Principles and Classroom Practices" (2005))

● Meaningful communication
● Authentic situation
● Unpredictable language input
● Creative language output
● Integrated language skills
Performance - based
Performance-based assessment believes students will learn best when
they are given a chance to perform and show what they know, for
example by being asked to:

★ plan
★ collect data
★ draw conclusions
★ infer patterns
★ take a stand or deliver presentations
Performance - based
Principles (according to Brown, 2004)
● Authenticity: Assessments should closely mirror real-world language use, providing learners with opportunities
to demonstrate their abilities in authentic contexts.
● Complexity: Tasks should require learners to integrate multiple language skills and engage in higher-order
thinking processes, challenging them to apply their knowledge in meaningful ways.
● Contextualization: Language should be assessed within its social and cultural context to reflect authentic
communication.
● Learner-centeredness: Assessments should focus on the learner's ability to use language for communicative
purposes, rather than simply testing isolated language skills.
● Reliability and Validity: While performance-based assessments can be more challenging to score reliably,
efforts should be made to develop clear scoring rubrics and train raters to ensure consistency and accuracy.
● Washback: Assessments should positively influence teaching and learning by providing feedback that helps
learners improve their language skills.
Performance - based
Strengths
● Authenticity: It closely reflects real-world language use, providing a more valid
assessment of learners' communicative competence.
● Holistic assessment: Evaluates multiple language skills simultaneously, providing a
comprehensive picture of language proficiency.
● Focus on process: Allows for evaluation of the learner's thinking and problem-solving
skills, beyond just the final product.
● Motivation: Engaging tasks can increase learner motivation and interest in the
assessment process.
● Formative assessment: Can provide valuable feedback to learners and teachers for
improvement.
● Differentiation: Can accommodate learners with diverse learning styles and abilities.
Performance - based
Challenges of the Performance-Based Approach
● Complexity and Time-Consuming: Developing and administering
performance-based assessments can be demanding.
● Subjectivity: Scoring can be influenced by rater bias and
inconsistency.
● Practical Constraints: Implementing performance-based
assessment on a large scale can be challenging due to resource
requirements.
03
Types of foreign language tests

Presented by Nguyen Thi Thanh


MAIN POINTS

+ Language test purposes
+ Types of foreign language tests
+ Language test features
Language test purposes

↪ to measure proficiency
↪ to measure achievement
↪ to diagnose linguistic strengths and
weaknesses
↪ to help place students in appropriate classes
Types of foreign language tests:
1. Proficiency tests
2. Achievement tests (final achievement tests; progress achievement tests)
3. Diagnostic tests
4. Placement tests
5. Screening tests
(Arthur Hughes, Jake Hughes, 2004)
1. Proficiency tests
Purposes:
- To assess an individual's ability in a foreign language, independent of any prior
training or specific course content.
- To determine whether a candidate can perform adequately in the language for
specific situations, such as academic study or professional roles.

Characteristics:
- They have detailed specifications of what successful candidates have
demonstrated they can do.
- They are not based on courses that candidates may have previously taken.
- All users of a test (teachers, students, employers, etc.) can then judge whether
the test is suitable for them, and can interpret test results.
2. Achievement tests
Purpose:
- To measure how well students have met the objectives of a specific language
course, focusing on individual or group performance.

Types:

● Final Achievement Tests: Administered at the end of a course, these tests assess
whether course objectives have been met. They can be based on a course
syllabus or objectives, though aligning them with course objectives generally
yields more accurate outcomes.
● Progress Achievement Tests: These tests gauge student progress towards course
objectives, often using short-term objectives to measure improvement over time.
3. Diagnostic tests
Purposes:
- To identify students' strengths and weaknesses in language skills,
- To guide future instruction.
- To ascertain specific areas where learners struggle, facilitating targeted
teaching.

Challenges: Creating detailed diagnostic tests can be complicated, as accurately


assessing nuanced grammatical knowledge requires extensive assessment items.
Nonetheless, tools like DIALANG have emerged to provide broader diagnostic insights.

DIALANG is an online language diagnosis system, which offers versions in fourteen


European languages, each having five modules: reading, writing, listening,
grammatical structures, and vocabulary.
4. Placement tests
Purposes:
- To assign students to appropriate levels within a language program based on
their current abilities.
- To ensure that each student is placed in a learning environment suited to their
skills.

Characteristics: Successful placement tests are often customized for specific


educational contexts rather than generic, commercial tests. While they can be
enhanced by brief interviews to gather additional context about the student, their
primary function is to determine class placement quickly.
5. Screening tests

Purposes:
- To provide preliminary assessments to determine if students qualify to
take more extensive tests, thus saving time and resources.

Implementation: a simpler multiple-choice screening test might be


administered before a more complex proficiency test, helping to filter
students more effectively.
Important features of language tests

1. Direct versus indirect testing

2. Discrete point versus integrative testing

3. Norm-referenced versus criterion-referenced


testing

4. Objective testing versus subjective testing


1. Direct versus indirect testing

Direct Testing: Measures candidates' performance in skills directly (e.g.,


speaking, writing). This method captures the actual use of language. Direct
tests are often favored for proficiency and achievement assessment due to
their straightforward nature.

Indirect Testing: Assesses underlying skills (e.g., grammar and vocabulary)


without requiring direct application, often yielding less reliable results
regarding practical language abilities.
2. Discrete point versus integrative testing
Discrete Point Testing: Focuses on one language element at a time, typically
using multiple-choice formats.

Integrative Testing: Requires combining various language elements to


complete tasks (e.g., writing, conversing), fostering a more realistic
assessment of language use. Most proficiency tests tend to lean towards
integrative testing.

Discrete point tests will almost always be indirect, while integrative tests will tend to
be direct.

Diagnostic tests of grammar of the kind referred to in an earlier section will tend to be
discrete point.
3. Norm-referenced versus criterion-reference testing

Norm-Referenced: Compares a test taker’s performance against that of


peers.

Criterion-Referenced: Evaluates a candidate’s performance based on


predefined standards, indicating specific language capabilities rather than
relative standing among peers. Contemporary tests like IELTS increasingly
adopt a criterion-referenced approach to provide specific benchmarks for
language proficiency.
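The contrast between the two reference frames can be sketched in code. The scores and the cut score of 60 are invented for illustration, not taken from any real test.

```python
def percentile_rank(score, cohort):
    # Norm-referenced view: percentage of the cohort scoring
    # below this candidate.
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

def meets_criterion(score, cut_score=60):
    # Criterion-referenced view: pass/fail against a predefined
    # standard, regardless of how anyone else performed.
    return score >= cut_score

cohort = [42, 55, 58, 61, 67, 70, 74, 81, 88, 93]
print(percentile_rank(67, cohort))  # 40.0 -> better than 40% of peers
print(meets_criterion(67))          # True -> meets the fixed standard
```

Note that the same raw score of 67 looks mediocre norm-referenced (below the cohort median) yet passes the criterion; the two approaches answer different questions.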
4. Objective testing versus subjective testing

Objective Testing: Results are based on clearly defined correct answers (e.g.,
multiple-choice).

Subjective Testing: Evaluation involves personal judgment by the scorer,


often seen in written assessments or oral presentations.

A balance between objective and subjective methodologies often enhances


test reliability and validity.
Means of test delivery

1. Paper-and-Pencil Tests

2. Face-to-Face Tests

3. Computer-Based Tests
Sample analysis:
An IELTS test
04
Practical steps in
designing a foreign
language test
Presented by Tran Le Gia Linh
Language tests must be developed using a rational series of 3 steps that
connect the observed performance to the supposed skill or construct
(Bachman, 1991):

1. Determining and conceptually defining the construct
2. Determining the construct's operational definition
3. Creating protocols for calculating observations
1. Determining and conceptually defining the construct

Measuring a given language ability requires distinguishing the
construct teachers wish to measure from other similar constructs
clearly, precisely, and unambiguously.
This can be accomplished by determining what specific
characteristics are relevant to the given construct.
2. Determining the construct's operational definition

Elicitation of the sort of performance that will reveal the extent to


which the given construct is present in the individual must be
decided. Relevant operations will be suggested by the theoretical
specification itself.
The procedures are also influenced by the environment of the
language exam.
2. Determining the construct's operational definition

- Reflecting both the theoretical definition and the context of language


use, these operations or tests serve as the construct's operational
definitions.
- To be a reliable measuring tool, an operational definition must
provoke linguistic performance in a consistent manner and under
uniform circumstances.
2. Determining the construct's operational definition

Almost all language examinations have variations in testing
methodologies. When establishing tests, it is important to outline
the testing technique in detail to reduce inconsistencies and ensure
that the results accurately reflect the linguistic abilities being
assessed.


3. Creating protocols for calculating observations

- Physical attributes (height and weight) may be measured and


compared directly to established standard scales.
- When measuring mental constructs, however, observations are
indirect, and no such standards exist to define the units of
measurement.
Language tests can be measured in 2 ways: as points or levels of
performance on a scale.
Some practical steps to test construction (Davies,
1990)

- Assessing clear, unambiguous objectives


- Drawing up test specifications
- Devising test tasks
- Designing multiple-choice test items
Assessing clear, unambiguous objectives

Taking a careful look at everything that teachers think the students


should “know” or be able to “do” based on the material that the
students are responsible for.
The objectives may already be stated clearly in performance terms,
or teachers may have to go back through a unit and formulate them.
Each objective is stated in terms of the performance elicited and the
target linguistic domain; it is not possible to test every single one
of them.
=> Need to choose a possible subset of the objectives to test.
Drawing up test specifications

The specifications comprise a broad outline, evaluated skills,


and item types and tasks. They give an indication of the topics
that the teachers will cover, the implied elicitation and
response formats for items, the number of items in each
section, and the time to be allocated for each. Additionally, a
plan for scoring and assigning relative weight to each section
and each item within is also included.
Devising test tasks
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple-choice item have appropriate distractors, that is, are the
wrong items clearly wrong and yet sufficiently “alluring” that they aren’t
ridiculously easy?
6. Is the difficulty of each item appropriate for our students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning
objectives?
Designing multiple-choice test items

Hughes (2003, p.76-78) cautions against a number of weaknesses


of multiple-choice items:
➔ The technique tests only recognition knowledge.
➔ Guessing may have a considerable effect on test scores.
➔ The technique severely restricts what can be tested.
➔ It is very difficult to write successful items.
➔ Washback may be harmful.
➔ Cheating may be facilitated.
The two key principles here are practicality and reliability. If
teachers’ objective is to design a large-scale standardized test for
repeated administrations, then a multiple-choice format does indeed
become viable.
1. Design each item to measure a specific objective.
2. State both stem and options as simply and directly as possible.
3. Make certain that the intended answer is clearly the only
correct one.
4. Use item indices to accept, discard, or revise items.
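Principle 4 refers to item indices. Two classical ones are the facility index (the proportion of test takers answering correctly) and a simple discrimination index (facility in the top-scoring half minus facility in the bottom-scoring half). The response data below are invented for illustration.

```python
def facility(responses):
    # Facility (item difficulty) index: proportion of correct
    # answers, coded 1 = correct, 0 = wrong.
    return sum(responses) / len(responses)

def discrimination(responses_with_totals):
    # Simple discrimination index: facility among the strongest half
    # of test takers minus facility among the weakest half.
    ranked = sorted(responses_with_totals, key=lambda rt: rt[1], reverse=True)
    half = len(ranked) // 2
    upper = [r for r, _ in ranked[:half]]
    lower = [r for r, _ in ranked[-half:]]
    return facility(upper) - facility(lower)

# (item_correct, total_test_score) for eight hypothetical test takers
data = [(1, 92), (1, 85), (1, 80), (1, 74),
        (0, 66), (1, 60), (0, 52), (0, 45)]
print(facility([r for r, _ in data]))  # 0.625
print(discrimination(data))            # 0.75
```

A facility near 0.5 and a clearly positive discrimination index are the usual grounds for accepting an item; a discrimination near zero (or negative) flags an item to revise or discard.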
Stages of test development (Hughes, 2004)
1. Make a full and clear statement of the testing ‘problem’.
2. Write complete specifications for the test.
3. Write and moderate items.
4. Trial the items informally on native speakers and reject or modify
problematic ones as necessary.
5. Trial the test on a group of non-native speakers similar to those for
whom the test is intended.
6. Analyze the results of the trial and make any necessary changes.
7. Calibrate scales.
8. Validate.
9. Write handbooks for test takers, test users, and staff.
10. Train any necessary staff (interviewers, raters, etc.).
1. Stating the problem
(i) What kind of test is it to be? Achievement (final or progress),
proficiency, diagnostic, or placement?
(ii) What is its precise purpose?
(iii) What abilities are to be tested?
(iv) How detailed must the results be?
(v) How accurate must the results be?
(vi) How important is backwash?
(vii) What constraints are set by the unavailability of expertise and
facilities for construction, administration, and scoring?
→ Once the problem is clear, steps can be taken to solve it.
2. Writing specifications for the test

A set of specifications for the test must be written at the outset. This will
include information on:
- Content
- Test structure, timing, medium/channel, techniques to be used
- Criterial levels of performance
- Scoring procedures.
(i) Content
This refers to the entire potential content of any number of versions.
→ The fuller the information on content, the less arbitrary should be the
subsequent decisions as to what to include in the writing of any version of
the test.
- The way in which content is described will vary with its nature.
(ii) Structure, timing, medium/channel, and techniques

The following should be specified:


Test structure: What sections will the test have and what will be tested in
each?
(iii) Criterial levels of performance
The required level(s) of performance for different levels of success should be specified. For
speaking or writing, one can expect a description of the criteria level to be much more
complex.
Example: The handbook of the Cambridge Certificates in Communicative Skills in English
(CCSE) specifies the following degree of skill for the award of the Certificate in Oral
Interaction at level 2:
Accuracy: Pronunciation must be clearly intelligible, even if still obviously influenced by L1.
Grammatical/lexical accuracy is generally high, although some errors that do not destroy
communication are acceptable.
Appropriacy: The use of language must be generally appropriate to function. The overall
intention of the speaker must be generally clear.
Range: A fair range of language must be available to the candidate. Only in complex
utterances is there a need to search for words.
Flexibility: There must be some evidence of the ability to initiate and concede a conversation
and to adapt to new topics or changes of direction.
Size: Must be capable of responding with more than short-form answers where appropriate.
Should be able to expand simple utterances with occasional prompting from the interlocutor.
(iv) Scoring procedures

Where scoring will be subjective.


→ The test developers should be clear as to how they will achieve high
reliability and validity in scoring. What rating scale will be used? How
many people will rate each piece of work? What happens if two or more
raters disagree about a piece of work?
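One possible answer to the last two questions, sketched here purely as an assumption for illustration (not a procedure prescribed by the source): average two raters' band scores when they agree within one band, and refer the piece of work to a third rater for adjudication otherwise.

```python
def resolve_score(rater_a, rater_b, adjudicator=None):
    # If the two raters agree within one band, report their average.
    if abs(rater_a - rater_b) <= 1:
        return (rater_a + rater_b) / 2
    # Otherwise a third rater adjudicates: keep the pair of scores
    # that lie closest together and average those.
    if adjudicator is None:
        raise ValueError("raters disagree by more than one band; "
                         "adjudication needed")
    scores = sorted([rater_a, rater_b, adjudicator])
    if scores[1] - scores[0] <= scores[2] - scores[1]:
        return (scores[0] + scores[1]) / 2
    return (scores[1] + scores[2]) / 2

print(resolve_score(6, 7))     # 6.5: within one band, simple average
print(resolve_score(5, 8, 7))  # 7.5: adjudicator sides with the higher rater
```

The design choice here is that adjudication discards the outlying score rather than averaging all three, which keeps one aberrant rating from dragging the reported result.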
3. Write and moderate items.

(i) Sampling
- Not everything found under the heading of ‘Content’ in the specifications
can be covered by the items in any one version of the test. Choices have
to be made.
- For content validity and for beneficial backwash, sample widely and
unpredictably from the whole area of content, although one will always
wish to include elements that are particularly important.
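The sampling policy just described (always include the particularly important areas, then fill the remaining slots unpredictably from the whole bank) can be sketched as follows. The bank layout, area names, and function name are illustrative assumptions:

```python
import random

def draw_version(item_bank, must_cover, n_items, seed=None):
    """Draw one version of a test from a bank organised by content area.

    One item is guaranteed from each particularly important area listed
    in `must_cover`; the remaining slots are sampled unpredictably from
    the whole bank, so successive versions differ.
    """
    rng = random.Random(seed)
    chosen = [rng.choice(item_bank[area]) for area in sorted(must_cover)]
    pool = [item for items in item_bank.values()
            for item in items if item not in chosen]
    chosen += rng.sample(pool, n_items - len(chosen))
    return chosen

bank = {
    "tenses": ["T1", "T2", "T3"],
    "articles": ["A1", "A2"],
    "prepositions": ["P1", "P2", "P3"],
}
# Always include a tense item; draw the other three slots widely.
print(draw_version(bank, must_cover={"tenses"}, n_items=4, seed=7))
```

Because selection is randomised, learners cannot predict which parts of the content will appear, which is what makes wide sampling good for backwash.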
(ii) Writing items
- It is no use writing ‘good’ items if they are not consistent with
the specifications. Try to look at each item through the eyes of test
takers and imagine how learners might misinterpret it (in which
case it will need to be rewritten).
- Mention of the intended response is a reminder that the key to
an item is an integral part of it. An item without a key is
incomplete.
- The best way to identify items that have to be improved or
abandoned is through the process of moderation.
(iii) Moderating items
Moderation is the scrutiny of proposed items by (ideally) at least
two colleagues, neither of whom is the author of the items being
examined.
→ The moderators’ task is to try to find weaknesses in the items and,
where possible, remedy them. Where successful modification is
not possible, they must reject the item.
4. Informal trialling of items on native speakers

- Items that have been through the process of moderation should be


presented in the form of a test (or tests) to a number of native speakers—
twenty or more, if possible.
+ The ‘test’ can be taken in the participants’ own time.
+ The native speakers should be similar to the people for whom the test is
being developed, in terms of age, education, and general background.
+ ‘Experts’ are unlikely to behave in the way that typical native test
takers would, which is the behaviour being looked for.
- Items that prove difficult for the native speakers almost certainly need
revision or replacement.
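A facility-value cut-off is one simple way to operationalise "items that prove difficult for the native speakers". In this sketch the 0.9 floor, the item names, and the function name are all illustrative assumptions, not a published standard:

```python
def flag_for_revision(native_facility, floor=0.9):
    """Return the items whose proportion-correct among the native-speaker
    pilot group falls below `floor`; such items almost certainly need
    revision or replacement."""
    return sorted(item for item, p in native_facility.items() if p < floor)

# Proportion of (say) twenty native speakers answering each item correctly.
pilot = {"item1": 1.0, "item2": 0.95, "item3": 0.6}
print(flag_for_revision(pilot))  # ['item3']
```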
5. Trial the test on a group of non-native speakers
similar to those for whom the test is intended.
- Those items that have survived moderation and informal trialling on
native speakers should be put together into a test, which is then
administered under test conditions to a group similar to that for
which the test is intended. Problems in administration and scoring
are noted.
- In some situations, a group for trialling may simply not be available.
In other situations, although a suitable group exists, it may be
thought that the security of the test might be put at risk.
→ It is often the case that faults in a test are discovered only after it
has been administered to the target group.
6. Analyze the results of the trial and make any
necessary changes.

- Statistical analysis: reveal qualities (such as reliability) of the test as a


whole and of individual items (for example, how difficult they are, how well
they discriminate between stronger and weaker candidates).
- Qualitative analysis: responses should be examined in order to discover
misinterpretations, unanticipated but possibly correct responses, and any
other indicators of faulty items.
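Two of the classical item statistics mentioned above, difficulty (the facility value) and discrimination between stronger and weaker candidates, can be computed directly from 0/1-scored trial responses. The top-third/bottom-third discrimination index below is one common variant; the grouping and the rounding are illustrative choices:

```python
def item_analysis(responses):
    """Classical item analysis of dichotomously scored trial data.

    responses[i][j] == 1 if candidate i answered item j correctly.
    Facility = proportion of all candidates answering correctly.
    Discrimination = facility in the top-scoring third of candidates
    minus facility in the bottom-scoring third.
    """
    n = len(responses)
    ranked = sorted(range(n), key=lambda i: sum(responses[i]))
    third = max(1, n // 3)
    low, high = ranked[:third], ranked[-third:]
    report = []
    for j in range(len(responses[0])):
        facility = sum(row[j] for row in responses) / n
        disc = (sum(responses[i][j] for i in high) / len(high)
                - sum(responses[i][j] for i in low) / len(low))
        report.append({"item": j, "facility": round(facility, 2),
                       "discrimination": round(disc, 2)})
    return report

# Six trial candidates, three items; everyone answers item 2 correctly,
# so it is very easy and cannot discriminate between candidates.
trial = [[1, 1, 1], [1, 1, 1], [1, 0, 1],
         [0, 1, 1], [0, 0, 1], [0, 0, 1]]
for row in item_analysis(trial):
    print(row)
```

An item with facility near 1.0 and discrimination near 0.0, like item 2 here, tells the developers nothing about candidates and is a candidate for removal or revision.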
7. Calibrate scales

- Where rating scales are going to be used for oral testing or the
testing of writing, these should be calibrated.
- Calibration involves collecting samples of performance (for example,
pieces of writing) that cover the full range of the scales.
- A team of ‘experts’ then looks at these samples and assigns
each of them to a point on the relevant scale. The assigned
samples provide reference points for all future uses of the scale,
as well as being necessary training materials.
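Turning a team of expert judgments into a single reference point per sample is often done by taking a central value. A minimal sketch, assuming the median of the experts' band assignments is used (the sample names and bands are invented for illustration):

```python
from statistics import median

def calibrate(expert_judgments):
    """Assign each benchmark sample to a point on the rating scale.

    Each sample is judged independently by several experts; the median
    judgment becomes that sample's reference point on the scale.
    """
    return {name: median(bands) for name, bands in expert_judgments.items()}

judgments = {
    "essay_A": [5, 5, 4],  # near consensus on band 5
    "essay_B": [2, 3, 3],  # settles on band 3
}
print(calibrate(judgments))  # {'essay_A': 5, 'essay_B': 3}
```

The calibrated samples then serve both as reference points for future rating and as training materials for new raters.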
8. Validate

- For a high-stakes or published test, this should be regarded as essential.


- For relatively low-stakes tests that are to be used within an institution,
this may not be thought necessary, although where the test is likely to be
used many times over a period of time, informal, small-scale validation is
still desirable.
9. Write handbooks for test takers, test users, and
staff.
★ The rationale for the test;
★ An account of how the test was developed and validated;
★ A description of the test (which may include a version of the
specifications);
★ Sample items (or a complete sample test);
★ Advice on preparing for taking the test;
★ An explanation of how test scores are to be interpreted;
★ Training materials (for interviewers, raters, etc.);
★ Details of test administration.
10. Train any necessary staff ( such as interviewers,
raters, etc.).
Using the handbook and other materials, all staff who will be
involved in the test process should be trained. This may include
interviewers, raters, scorers, computer operators, and invigilators
(proctors).
05
The current situation and issues of
English language testing in schools
in Vietnam
Presented by Pham Hong Ngoc
5.1. Assessment
5.1.1. Definition:

Payne (2003): “the process of collecting, synthesizing, and interpreting
information to aid in decision making”
Airasian (2010): “the interpretive integration of application tasks
(procedures) to collect objectives-relevant information for educational
decision making and communication about the impact of the
teaching-learning process”

In short, assessment is a form of data collection that becomes meaningful
when used to make judgments about students’ learning.
5.1. Assessment
5.1.2. Types of assessment:
● Diagnostic assessment: identifies students’ current knowledge of a
subject, their skill sets and capabilities, and clarifies misconceptions
before teaching takes place.
● Formative assessment: provides feedback and information during the
instructional process, while learning is taking place, and measures both
students’ and teachers’ progress.
● Summative assessment: takes place after the learning has been completed
and provides information and feedback that sums up the teaching and
learning process.
?
What forms of diagnostic, formative and summative
assessment do you know?
● Diagnostic assessment: pre-tests, self-assessments, discussion board
responses, interviews.
● Formative assessment: observations, homework exercises, reflection
journals, question and answer sessions, conferences, in-class activities,
student feedback.
● Summative assessment: examinations, term papers, projects, portfolios,
performances, student evaluation of the course, instructor
self-evaluation.
5.1. Assessment
5.1.3. English assessment in Vietnam
In mainstream K-12 public schools:

● Teachers tend to be responsible for low-stakes assessment.
● High-stakes assessment is managed by
either schools or local departments of
education and training (LDOETs).
5.1. Assessment
5.1.3. English assessment in Vietnam
In specialised or gifted high schools:

Teachers tend to have more autonomy over


instruction and assessment to prepare students
for district, provincial, regional and national
excellent student competitions.
5.1. Assessment
5.1.3. English assessment in Vietnam
At the tertiary level:
While formal assessment tasks such as quizzes,
portfolios, presentations, and projects are
optional or non-existent in non-major programs,
such classroom-based assessment is highly
valued and carries substantial weighting in
English-major programs.
5.1. Assessment
5.1.3. English assessment in Vietnam
- The striking presence of international
tests - IELTS, TOEFL and TOEIC:

● The country’s global integration and


economic growth
● Overseas studies
● A means to fulfill English requirements
for graduation from universities and
employment in the private sector.

- Two other ETS products: TOEFL Primary
and TOEFL Junior.
5.1. Assessment
5.1.3. English assessment in Vietnam
VSTEP:
● Stands for the Vietnamese Standardised Test
of English Proficiency.
● Used to meet English requirements for
graduation from undergraduate courses,
entry into postgraduate programs,
English teacher registration, and
admission into the state tenure.
5.2. The current situation and issues of English
language testing in Vietnam
❖ A big gap between theory and practice:
- Vietnamese authorities have affirmed the goal of instruction as to develop students’ ability
to communicate in English (Vietnamese Government, 2008) and demanded a shift from
knowledge-based to competence-based assessment (MOET, 2014b) with speaking and
listening assessment explicitly mentioned in various documents such as Circular 32 and
Dispatch 5333.
- In practice:
● In primary schools: English is an elective subject.
● In some lower secondary and upper secondary schools in cities and affluent areas, and
foreign languages specialized upper secondary schools: Communicational tests
● In the remaining majority of schools across the country, especially upper secondary
schools: non-communicative tests (testing students’ linguistic competence in phonology,
lexis and grammar, plus only one communicative skill: reading).

(Hoang, 2017)
?

Why have speaking and writing skills not been included in
exams until now?
Reasons:
● Unrealistic:
- Time-consuming
- Costly in both human and financial resources
- The subjectivity of the examiners while marking candidates’
speaking skills
● Unreasonable:
- English language proficiency differs by region/area.
- Writing (paragraph writing and essay writing) is rarely taught
properly in schools.
5.2. The current situation and issues of English
language testing in Vietnam

❖ The mode of test design is monotonous (Hoang, 2017)


All test items are designed in the multiple-choice mode. This mode cannot test all the
knowledge and skills of the English language, because “Many of the elements of any language
course may not be testable in the most objective test types, such as multiple-choice, true-false
and matching” (Brown, [18, 31]).

The tests:

- Lose some features of the criterion of validity


- Create undesired negative backwash effects on classroom teaching and learning
- Challenge the communication goal of foreign language education in schools in
Vietnam.
5.2. The current situation and issues of English
language testing in Vietnam

❖ Policymakers’ endorsement of formative and alternative


assessment vs. Dominance of summative tests
- Summative tests have continued to predominate English assessment practices.
- Test results are not often accompanied by teacher feedback or used to guide subsequent
instruction. There is a serious lack of self- and peer- assessment and group-based
assessment as well as little attempt to provide feedback to students, to link the test scores
to learning outcomes, and use test results to enhance teaching and learning.
5.2. The current situation and issues of English
language testing in Vietnam

❖ Domestic vs. International standardised tests

- VSTEP tests have faced and will possibly continue to encounter fierce competition from
other established international counterparts such as IELTS, TOEFL, and TOEIC to name just
a few.
- There has been a persistent lack of trust in domestic tests among tertiary institutions
(Nguyen & Gu, 2020) and in the private business sector (Huyen, 2020).
- On a macro level, MOET’s push for benchmarking students and institutions’ performance
against international standards has further motivated some universities to designate
international tests as their preferred exit tests (Nguyen et al., 2020; Tran, 2015) and even
to offer priority admission to students with IELTS and TOEFL scores (Le, 2020).
5.3. Implications and solutions:
● The Vietnamese authorities should not view high-stakes standardised
testing as a low-cost instrument to implement English education reforms.
● Assessment reforms should involve teachers and other stakeholders
(students, parents).
● Classroom-based assessment (CBA) needs to play a more prominent role;
teachers should be given opportunities to develop their assessment
literacy during pre-service education and in-service professional
development activities.
● Ensure the VSTEP suite’s compliance with the strict technical standards
expected of high-stakes standardised tests such as IELTS and TOEFL.
● Enhance the public’s perception and uptake of VSTEP tests.
06
Relationship between
foreign language testing
and FLT
Presented by Nguyen Thu Hien
- Teaching and testing are closely interrelated.
- Tests, especially high-stakes ones, influence test takers,
teachers, and other stakeholders.
-> the washback effect (which can be negative or positive)
Positive washback
- Tests are well-aligned with educational goals and reflect a
broad range of language skills

E.g.: when a test emphasizes communicative competence,
-> teachers incorporate more interactive,
communicative activities
-> a more holistic language learning experience
(students are prepared not just for the test, but for
real-world language use)
-> innovative teaching methods are encouraged (task-based
learning, project-based learning) and a more interactive and
communicative classroom environment is promoted
(e.g. MOET’s 2006 and 2018 English curricula)
Positive washback
- Teachers receive feedback on the students’ weaknesses
and strengths, and their progress and achievement in the
course
You will never know how valid your theory is unless you
systematically measure the success of your learners – the
success of your theory – in practice (Brown, 1987)
Negative washback

- Narrowing of Content

Teachers restrict classroom activities to those directly
related to the test
-> limiting students’ exposure to authentic language
materials (real-life conversations, etc.)

-> less room for innovative teaching methods or


exploration of topics
Negative washback

- Focus on Test Preparation

Teachers adopt instructional strategies that prioritize test preparation

-> focusing heavily on the format and timing of the test

-> these may improve test scores, but they do not foster
genuine language learning
Negative washback

- Reduced Autonomy
- Teachers feel pressured to follow a rigid curriculum designed to
maximize test scores
-> reducing their motivation to adopt new teaching methods or to
tailor lessons to the specific needs of students

-> stress and burn-out


We cannot expect testing only to follow teaching.
It should be supportive of good teaching and, where
necessary, exert a corrective influence on bad teaching.
QUIZ
&
DISCUSSION
*DISCUSSION QUESTIONS:
1. How can you ensure that your test is free from bias and cultural
sensitivity issues?
2. How will you ensure a balance of question types (e.g., multiple choice,
essay, practical)?
3. How can you ensure that your test items are clear, concise, and
unambiguous?
4. What types of data can be collected to evaluate test effectiveness?
REFERENCES
Bachman, L. F. (1991). Fundamental Considerations in Language Testing. Oxford: Oxford University
Press.
Bachman, L. F. & A. S. Palmer. (1996). Language Testing in Practice: Designing and Developing Useful
Language Tests. Oxford: Oxford University Press.
Brown, H. D. (1987). Principles of language learning and teaching (2nd ed.). Prentice Hall.
Davies, A. (1990). Principles of Language Testing. Oxford: Basil Blackwell.
Hoang, V. V. (2017). The 2016 National Matriculation and General Certificate of Secondary
Education English Test: A Challenge to the Goal of Foreign Language Education in Vietnamese
Schools. VNU Journal of Educational Research, 33(4), 1-16. https://doi.org/10.25073/2588-1159/vnuer.4118
Hughes, A. (2004). Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A., & Hughes, J. (2020). Testing for Language Teachers. Cambridge: Cambridge University Press.
