Group 6 - Issues in Foreign Language Testing
ISSUES IN FOREIGN LANGUAGE TESTING
TOPIC 7 - GROUP 6 (PG45)
Members - Group 6:
1. Nguyen Thi Khanh Linh
2. Truong Anh Phuong
3. Nguyen Thi Thanh
4. Tran Le Gia Linh
5. Pham Hong Ngoc
6. Nguyen Thu Hien
CONTENT OUTLINE:
01. Testing and language testing
02. Different approaches to foreign language testing
03. Types of foreign language tests
04. Practical steps in designing a foreign language test
05. The current situation and issues of English language testing in schools in Vietnam
06. The relationships between foreign language testing and FLT
01. Testing and language testing
Presented by Nguyen Thi Khanh Linh
A - Testing
Definition of terms:
1. Measurement
2. Test
3. Evaluation
1. MEASUREMENT
➽ the process of quantifying the characteristics of
persons according to explicit procedures and rules.
2. TEST
2.1. What is a test?
2.4. The effect of testing
2.5. Why are tests inaccurate?
2.1. What is a test?
- A general definition:
“a procedure intended to establish the
quality, performance, or reliability of something,
especially before it is taken into widespread use.”
(Concise Oxford English Dictionary)
- Carroll (1968) provides the following definition of a
test:
A psychological or educational test is a
procedure designed to elicit certain behavior
from which one can make inferences about
certain characteristics of an individual.
(Carroll 1968: 46)
2.1. What is a test?
- In simple terms: it is a method of measuring a person's ability, knowledge or performance in a given domain.
[Figure: Interagency Language Roundtable (ILR) scale, Oral Proficiency Interview (OPI) descriptors. Source: https://www.researchgate.net/figure/nteragency-language-roundtable-ilr-scale-Oral-Proficiency-Interview-OPI-descriptor_tbl1_295539760]
Language tests can provide the means for more
carefully focusing on the specific language abilities
that are of interest.
➽ They could be viewed as supplemental to other
methods of measurement.
Conclusion:
+ Other measures are no less valuable than tests.
+ The value of tests lies in their capability for
eliciting the specific kinds of behavior that the test user
can interpret as evidence of the attributes or abilities
which are of interest.
2.2. What is testing?
➷ To measure language proficiency.
➷ To discover how successful students have been in achieving the objectives of a course of study.
➷ To diagnose students' strengths and weaknesses, to identify what they know and what they don't know.
➷ To assist placement of students by identifying the stage or part of a teaching programme most appropriate to their ability.
(Arthur Hughes and Jake Hughes, Testing for Language Teachers. Oxford University Press.)
2.4. What is the effect of testing?

3. EVALUATION
➽ Evaluation does not necessarily entail testing. By the same token, tests in and of themselves are not evaluative.
Comparison:
➽ Not all measures are tests, not all tests are evaluative, and not all evaluation
involves either measurement or tests.
4. ESSENTIAL MEASUREMENT QUALITIES
- Reliability
- Validity
4. ESSENTIAL MEASUREMENT QUALITIES
Reliability
Reliability is a quality of test scores, and a perfectly reliable score, or
measure, would be one which is free from errors of measurement.
(American Psychological Association 1985)
E.g: If a student receives a low score on a test one day and a high score on the same test two days later, the test does not yield consistent results, and the scores cannot be considered reliable indicators of the individual's ability.
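Supplementary note (not from the original slides): in classical test theory this idea is usually written by modelling an observed score as a true score plus an error term:

    X = T + E                                          % observed score = true score + measurement error
    \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}   % reliability = share of observed variance that is true-score variance

A perfectly reliable score has \sigma_E^2 = 0, so \rho_{XX'} = 1; large swings between two administrations of the same test, as in the example above, indicate a large error variance and hence low reliability.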
Validity
E.g: If we ask students to listen to a lecture and then to write a short essay based on that lecture, the essays they write will be affected by both their writing ability and their ability to comprehend the lecture. Ratings of their essays, therefore, might not be valid measures of their writing ability.
5. PROPERTIES OF MEASUREMENT SCALES
Four properties: Distinctiveness, Ordering, Equal intervals, Absolute zero point.
Measurement specialists have defined FOUR TYPES OF MEASUREMENT SCALES:
- Nominal (e.g. right - wrong, male - female…)
- Ordinal (e.g. first, second, third…)
- Interval
- Ratio
5. PROPERTIES OF MEASUREMENT SCALES
➽ The test scores indicate that these individuals are not equally distant from each other on
the ability measured.
5. PROPERTIES OF MEASUREMENT SCALES
➽ Conclusion:
These four different scales are also sometimes referred to as levels of measurement.
+ The nominal scale is thus the lowest type of scale, or level of measurement, since it is only capable of distinguishing among different categories.
+ The ratio scale is the highest level, possessing all four properties and thus
capable of providing the greatest amount of information.
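A brief illustrative example (not from the slides, using hypothetical scores on a 0-100 language test treated as an interval scale) of why the interval/ratio distinction matters:

    85 - 70 = 70 - 55 = 15      % equal intervals: the same difference in measured ability
    \frac{80}{40} = 2           % but this does NOT mean "twice the ability"

On an interval scale only differences are meaningful; ratio statements such as "twice the ability" require a true absolute zero point, i.e. a ratio scale.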
B - LANGUAGE TESTING
1. What is language testing?
● Language testing lies at the core of both language teaching and applied linguistics.
(Alan Davies)
● Language testing is the practice and study of evaluating the proficiency of an
individual in using a particular language effectively.
(Winning entry from the 2009/10, Priscilla Allen, University of Washington)
2. [Diagram: quantitative information (measurement) → evaluation → value judgements or decisions]
3. Types of decisions
Decisions about:
- Individuals: students, teachers
- Programs
02. Different approaches to foreign language testing
Presented by Truong Anh Phuong
How do you describe the relationship between teaching and testing in your educational context?
Types of Approach:
- Essay-translation
- Structuralist
- Integrative
- Communicative
- Performance-based
Essay - translation
- It is referred to as the pre-scientific stage of language
testing.
- No special skill in testing is required of the tester.
- Tests often consist of essay writing, translation and
grammatical analysis.
Structuralist
Key characteristics:
- Discrete-point testing
- Objective scoring
- Emphasis on form over function
- Efficiency
Structuralist
Strengths:
● Objectivity and Reliability: allows for clear and consistent scoring criteria, reducing rater bias and ensuring a higher degree of reliability in test results.
● Efficiency: tests can be administered quickly, allowing for large-scale
assessment.
● Practicality: easy to implement and requires minimal resources.
● Diagnostic Value: provide valuable information for targeted instruction.
● Foundation for Further Development: provides a solid foundation for more
complex language tests that incorporate communicative elements.
Structuralist
Weaknesses:
● Lack of Communicative Validity: fails to accurately measure real-world language use, because it isolates language components and neglects the communicative context in which language is used.
● Artificiality: Test items often appear artificial and unrelated to authentic language use, leading
to a mismatch between test performance and real-world language ability.
● Limited Scope: By focusing on discrete language points, the structuralist approach overlooks
higher-order language skills such as fluency, coherence, and strategic competence.
● Neglect of Context: It fails to consider the influence of cultural and social factors on language
use.
● Overemphasis on Form: The excessive focus on grammar and vocabulary can lead to a
neglect of meaning and discourse.
Integrative
● Accuracy

Communicative
● Meaningful communication
● Authentic situation
● Unpredictable language input
● Creative language output
● Integrated language skills
Performance - based
Performance-based assessment assumes that students learn best when they are given a chance to perform and show what they know, for example as they:
★ plan
★ draw conclusions
★ collect data
★ take a stand or deliver presentations
★ infer patterns
Performance - based
Principles of the approach (according to Brown, 2004)
● Authenticity: Assessments should closely mirror real-world language use, providing learners with opportunities
to demonstrate their abilities in authentic contexts.
● Complexity: Tasks should require learners to integrate multiple language skills and engage in higher-order
thinking processes, challenging them to apply their knowledge in meaningful ways.
● Contextualization: Language should be assessed within its social and cultural context to reflect authentic
communication.
● Learner-centeredness: Assessments should focus on the learner's ability to use language for communicative
purposes, rather than simply testing isolated language skills.
● Reliability and Validity: While performance-based assessments can be more challenging to score reliably,
efforts should be made to develop clear scoring rubrics and train raters to ensure consistency and accuracy.
● Washback: Assessments should positively influence teaching and learning by providing feedback that helps
learners improve their language skills.
Performance - based
Strengths
● Authenticity: It closely reflects real-world language use, providing a more valid
assessment of learners' communicative competence.
● Holistic assessment: Evaluates multiple language skills simultaneously, providing a
comprehensive picture of language proficiency.
● Focus on process: Allows for evaluation of the learner's thinking and problem-solving
skills, beyond just the final product.
● Motivation: Engaging tasks can increase learner motivation and interest in the
assessment process.
● Formative assessment: Can provide valuable feedback to learners and teachers for
improvement.
● Differentiation: Can accommodate learners with diverse learning styles and abilities.
Performance - based
Challenges of the Performance-Based Approach
● Complexity and Time-Consuming: Developing and administering
performance-based assessments can be demanding.
● Subjectivity: Scoring can be influenced by rater bias and
inconsistency.
● Practical Constraints: Implementing performance-based
assessment on a large scale can be challenging due to resource
requirements.
03. Types of foreign language tests
↪ to measure proficiency
↪ to measure achievement
↪ to diagnose linguistic strengths and
weaknesses
↪ to help place students in appropriate classes
Types of foreign language tests:
1. Proficiency tests
2. Achievement tests
   - Final achievement tests
   - Progress achievement tests
3. Diagnostic tests
4. Placement tests
5. Screening tests
(Arthur Hughes & Jake Hughes, 2004)
1. Proficiency tests
Purposes:
- To assess an individual's ability in a foreign language, independent of any prior
training or specific course content.
- To determine whether a candidate can perform adequately in the language for
specific situations, such as academic study or professional roles.
Characteristics:
- There are detailed specifications of what successful candidates have demonstrated that they can do.
- They are not based on courses that candidates may have previously taken.
- All users of a test (teachers, students, employers, etc.) can then judge whether the test is suitable for them and can interpret test results.
2. Achievement tests
Purpose:
- To measure how well students have met the objectives of a specific language
course, focusing on individual or group performance.
Types:
● Final Achievement Tests: Administered at the end of a course, these tests assess
whether course objectives have been met. They can be based on a course
syllabus or objectives, though aligning them with course objectives generally
yields more accurate outcomes.
● Progress Achievement Tests: These tests gauge student progress towards course objectives, often using short-term objectives to measure improvement over time.
3. Diagnostic tests
Purposes:
- To identify students' strengths and weaknesses in language skills,
- To guide future instruction.
- To ascertain specific areas where learners struggle, facilitating targeted
teaching.
5. Screening tests
Purposes:
- To provide preliminary assessments to determine whether students qualify to take more extensive tests, thus saving time and resources.
Discrete point tests will almost always be indirect, while integrative tests will tend to
be direct.
Diagnostic tests of grammar of the kind referred to in an earlier section will tend to be
discrete point.
3. Norm-referenced versus criterion-referenced testing
Objective Testing: Results are based on clearly defined correct answers (e.g.,
multiple-choice).
1. Paper-and-Pencil Tests
2. Face-to-Face Tests
3. Computer-Based Tests
Sample analysis: an IELTS test
04. Practical steps in designing a foreign language test
Presented by Tran Le Gia Linh
Language tests must be developed using a rational series of 3 steps that connect the observed performance to the supposed skill or construct (Bachman, 1991):
1. Determining and conceptually defining the construct
2. Determining the construct's operational definition
3. Creating protocols for quantifying observations
1. Determining and conceptually defining the construct
A set of specifications for the test must be written at the outset. This will
include information on:
- Content
- Test structure, timing, medium/channel, techniques to be used
- Criterial levels of performance
- Scoring procedures.
(i) Content
This refers to the entire potential content of any number of versions.
→ The fuller the information on content, the less arbitrary should be the
subsequent decisions as to what to include in the writing of any version of
the test.
- The way in which content is described will vary with its nature.
(ii) Structure, timing, medium/channel, and techniques
(i) Sampling
- Not everything found under the heading of ‘Content’ in the specifications can be covered by the items in any one version of the test. Choices have to be made.
- For content validity and for beneficial backwash: choose widely from
the whole area of content, sample widely and unpredictably, although
one will always wish to include elements that are particularly important.
(ii) Writing items
- It is no use writing ‘good’ items if they are not consistent with the specifications. Try to look at each item through the eyes of test takers and imagine how learners might misinterpret it (in which case it will need to be rewritten).
- Mention of the intended response is a reminder that the key to
an item is an integral part of it. An item without a key is
incomplete.
- The best way to identify items that have to be improved or
abandoned is through the process of moderation.
(iii) Moderating items
Moderation is the scrutiny of proposed items by (ideally) at least
two colleagues, neither of whom is the author of the items being
examined.
→ The moderators’ task is to try to find weaknesses in the items and, where possible, remedy them. Where successful modification is not possible, they must reject the item.
4. Informal trialling of items on native speakers

Calibration of scales
- Where rating scales are going to be used for oral testing or the testing of writing, these should be calibrated.
- Collecting samples of performance (for example, pieces of
writing) that cover the full range of the scales.
- A team of ‘experts’ then looks at these samples and assigns
each of them to a point on the relevant scale. The assigned
samples provide reference points for all future uses of the scale,
as well as being necessary training materials.
8. Validate
Definitions of assessment:
Payne (2003): “the process of collecting, synthesizing, and interpreting information to aid in decision making”
Airasian (2010): “the interpretive integration of application tasks (procedures) to collect objectives-relevant information for educational decision making and communication about the impact of the teaching-learning process”
Diagnostic, formative and summative assessment (Hoang, 2017):
- Diagnostic assessment: identifies students’ current knowledge of a subject, their skill sets and capabilities, and clarifies misconceptions before teaching takes place.
- Formative assessment: provides feedback and information during the instructional process, while learning is taking place; measures both students’ and teachers’ progress.
- Summative assessment: takes place after the learning has been completed and provides information and feedback that sums up the teaching and learning process.

? What forms of diagnostic, formative and summative assessment do you know?
? What are the reasons why we cannot combine listening and speaking skills into exams until now?
Reasons:
- Unrealistic:
  ● Time-consuming
  ● Costly (human resources and financial resources)
  ● The subjectivity of the examiners while marking candidates’ speaking skills
- Unreasonable:
  ● English language proficiency differs by region/area.
5.2. The current situation and issues of English language testing in Vietnam
- VSTEP tests have faced and will possibly continue to encounter fierce competition from
other established international counterparts such as IELTS, TOEFL, and TOEIC to name just
a few.
- There has been a persistent lack of trust in domestic tests among tertiary institutions
(Nguyen & Gu, 2020) and in the private business sector (Huyen, 2020).
- On a macro level, MOET’s push for benchmarking students and institutions’ performance
against international standards has further motivated some universities to designate
international tests as their preferred exit tests (Nguyen et al., 2020; Tran, 2015) and even
to offer priority admission to students with IELTS and TOEFL scores (Le, 2020).
5.3. Implications and solutions:
- The Vietnamese authorities should not view high-stakes standardised testing as a low-cost instrument to implement English education reforms.
- Assessment reforms should involve teachers and other stakeholders (students, parents).
- Narrowing of Content
- Reduced Autonomy
- Teachers feel pressured to follow a rigid curriculum designed to maximize test scores
→ this reduces their motivation to adopt new teaching methods or to tailor lessons to the specific needs of students.