Developing Writing Test for EFL Learners
DEFINITION OF WRITING
Writing is a communicative act and a way of sharing observations,
information, thoughts, and ideas with others through written language (Cohen
et al., 1989; Troyka, 1987). As a communicative act, writing involves both
physical and mental processes. Cohen et al. (1989) state further that as a physical
process, writing means producing graphemes and orthographic symbols, in the
form of letters or combinations of letters that relate to the sounds of spoken
language, with the hands while the eyes move over the words or
sentences. While physical activity, according to Richards (1990:101), can be
captured visually, mental activity, on the other hand, cannot be seen or observed
directly. It is the process in the writer's mind that includes making connections
between ideas and processing thoughts to be expressed in a meaningful written
text through linguistic organization.
in the first drafts, using paraphrases and synonyms, soliciting peer and
instructor feedback, and using feedback for revising and editing.

Since the students' writing skill will still be guided to a large extent, English
teachers are supposed to vary the activities given to the students in order to
avoid a monotonous approach. Thus, the teacher's task of developing the
students' writing skill is more complicated than for the other skills.
WRITING TEST
Many experts have proposed the aspects that should be judged in a
writing test. First, Inman et al. (1979) asserted that five aspects
should be assessed in a composition: logic, organization, development, style,
and mechanics. Jacobs et al. (1981) likewise proposed five aspects for the ESL
context, namely content, organization, vocabulary, language use, and
mechanics. Harris (1974), in addition, proposed five general components:
content, form, grammar, style, and mechanics. Moreover, Heaton (1988) states
five general components or main areas for writing good prose: language use,
mechanics, content, style, and judgment skills, that is, the ability to write in an
appropriate manner for a particular purpose with a particular audience in mind,
together with an ability to select, organize, and order relevant information. This
is in line with Burgess and Head's (2005) statement that an answer in a writing
test that has some errors but achieves its communicative purpose will get a
higher mark than an answer that is grammatically accurate but does not meet
the task requirements. Other experts support this view by saying that effective
writing ability is reached through a combination of sociocultural competence,
involving appropriate conventions, register, and (rhetorical) style; discourse
competence, involving ideas and their structuring, coherence, and cohesion with
an intended audience in mind; and linguistic competence, involving
appropriate and broad lexis, fluent and accurate syntax, and accurate
mechanics.
Furthermore, Langan (1985) takes a somewhat different approach to
evaluating essay writing. He proposed four areas on which to base the
assessment of writing: unity, support, coherence, and sentence skills. In fact,
however, these various sets of writing elements have much in common; they are
more or less the same. Language, for example, has two elements: sentence
structure and diction. Mechanics has four points: paragraphing, punctuation,
spelling, and capitalization. Style has five aspects: economy, simplicity, clarity,
congruity, and courtesy. Organization is the rhetorical form, while logic has to
do with content. In short, the several features of composition to be assessed
have many things in common, and they can be grouped into four:
content, organization, language, and style.
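To make the grouping concrete, the sketch below represents the four components and the sub-elements named above as a simple data structure. This is only a minimal illustration of the grouping, in Python; the dictionary layout is an assumption for exposition, not a standardized scoring instrument.

```python
# A minimal sketch of the four-component grouping discussed above.
# Component names and sub-elements come from this section; the
# mapping itself is an illustrative assumption. Mechanics
# (paragraphing, punctuation, spelling, capitalization) is here
# assumed to be handled alongside language.
WRITING_COMPONENTS = {
    "content":      ["logic", "development"],
    "organization": ["rhetorical form"],
    "language":     ["sentence structure", "diction"],
    "style":        ["economy", "simplicity", "clarity",
                     "congruity", "courtesy"],
}

for component, elements in WRITING_COMPONENTS.items():
    print(f"{component}: {', '.join(elements)}")
```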
PROCEDURES OF SCORING
As mentioned previously, there are two approaches that can be used to
measure students' writing ability, namely direct and indirect
measurement. Since each approach results in different tasks, a
different scoring procedure is applied accordingly. In direct measurement,
the score that the students get is derived from the rater's judgment on the basis
of pre-determined criteria stated in a scoring guide. In indirect measurement,
on the other hand, the score is obtained from an objective scoring
procedure based on an answer key.
In holistic scoring, a piece of writing is viewed as a whole and complete idea
rather than as separate elements. The rater bases his judgment on his overall
impression of the composition, and he may be guided by a holistic scoring guide
in scoring the composition. Frequency count marking, on the other hand, is a
procedure for evaluating a piece of writing by tallying and/or enumerating
certain elements of the composition, such as the number of cohesive devices,
spelling errors, grammatical errors, and punctuation errors.
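Counting some of these elements can even be partly automated. The Python sketch below tallies occurrences of cohesive devices against a small word list; the list is invented for illustration, and error tallies (spelling, grammar, punctuation) would normally come from a rater's annotations rather than from code.

```python
# A minimal sketch of frequency count marking. The device list is
# illustrative, not exhaustive; error counts would normally be
# tallied by a human rater.
COHESIVE_DEVICES = {"however", "therefore", "moreover",
                    "furthermore", "in addition",
                    "on the other hand"}

def count_cohesive_devices(text: str) -> int:
    """Tally occurrences of known cohesive devices in a composition."""
    lowered = text.lower()
    return sum(lowered.count(device) for device in COHESIVE_DEVICES)

sample = ("The library is crowded. However, it is quiet. "
          "Moreover, the staff are helpful.")
print(count_cohesive_devices(sample))  # -> 2
```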
Another classification of direct measurement in writing is introduced by
Spandel and Stiggins (1990). They classify writing assessment into three types:
primary trait, holistic, and analytic scoring procedures. Primary trait scoring is
a procedure for scoring a piece of writing by focusing on the trait that is primary
to the piece, as in descriptive, narrative, or argumentative writing. Holistic
scoring means scoring a piece of writing as a whole, where each paper receives
only one score; the final score is not the total of sub-scores. The difference
between primary trait scoring and holistic scoring lies in the emphasis. In the
first procedure, each mode of writing has a different scoring guide depending on
the type of discourse, whereas in holistic scoring there is no specific emphasis.
That is why holistic scoring can be applied to all types of discourse. Analytic
scoring, unlike the first two procedures, scores a piece of writing by referring to
a list of features or sub-skills on which the rater bases his judgment. The
writing quality is shown by the total of the sub-scores.
Still another classification of direct measurement in writing is given by
White (1985) and Brown (2005), who classify the evaluation of writing into two
basic scoring procedures: holistic scoring and analytic scoring. In holistic
scoring, a rater judges a piece of writing as a whole, without separable aspects
and their sub-scores; the holistic evaluation must come up with a single score
which does not result from summing up sub-scores. Analytic scoring, in
contrast, comes up with a single score resulting from summing up the
sub-scores, which are derived from scoring the features or aspects of the piece.
In short, there are two common types of writing scoring. If the
scoring procedure is based on the analysis of features, it is called analytic.
When the scoring is based on the rater's (or raters') judgment of the piece as a
whole, without separating features or aspects, it is called holistic.
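The contrast can be illustrated with a short sketch. In the Python fragment below, the analytic procedure sums weighted sub-scores, while the holistic procedure simply returns a single impression band; the particular weights are an assumption loosely modeled on analytic profiles such as Jacobs et al. (1981), not a prescribed standard.

```python
# Illustrative weights only; loosely modeled on analytic profiles
# such as Jacobs et al. (1981), not a prescribed standard.
ANALYTIC_WEIGHTS = {"content": 0.30, "organization": 0.20,
                    "vocabulary": 0.20, "language_use": 0.25,
                    "mechanics": 0.05}

def analytic_score(sub_scores: dict) -> float:
    """Analytic: the total is the sum of weighted sub-scores (0-100)."""
    return sum(ANALYTIC_WEIGHTS[feature] * score
               for feature, score in sub_scores.items())

def holistic_score(impression_band: int) -> int:
    """Holistic: a single score; nothing is summed."""
    assert 1 <= impression_band <= 6  # e.g., a six-band scale
    return impression_band

print(analytic_score({"content": 80, "organization": 75,
                      "vocabulary": 70, "language_use": 85,
                      "mechanics": 90}))  # -> 78.75
print(holistic_score(4))                  # -> 4
```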
Indirect measurement usually yields highly reliable scores because it uses an
objective scoring system with definite answers, so there is no subjective
judgment. With direct measurement, in contrast, it is usually difficult to obtain
a high reliability coefficient because the scores depend on the raters' judgment,
and avoiding subjectivity is extremely difficult. To overcome this problem, at
least two raters, or even more, are needed. A third rater is required in case the
scores from the two raters differ by more than a pre-determined maximum
acceptable difference.
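The adjudication procedure can be sketched as follows. The tolerance of 10 points and the rule of averaging the two closest scores are assumed policies for illustration; testing programs set their own.

```python
# A minimal sketch of the multiple-rater procedure described above.
# The tolerance (10 points) and the closest-pair averaging rule are
# assumed policies, not fixed standards.
MAX_ACCEPTABLE_DIFF = 10

def final_score(rater1: float, rater2: float, third_rater=None) -> float:
    """Average two ratings; require a third rater if they diverge."""
    if abs(rater1 - rater2) <= MAX_ACCEPTABLE_DIFF:
        return (rater1 + rater2) / 2
    if third_rater is None:
        raise ValueError("Scores diverge; a third rater is required.")
    scores = sorted([rater1, rater2, third_rater])
    # One common policy: average the two closest of the three scores.
    low_pair, high_pair = (scores[0], scores[1]), (scores[1], scores[2])
    pair = min(low_pair, high_pair, key=lambda p: p[1] - p[0])
    return sum(pair) / 2

print(final_score(78, 84))      # within tolerance -> 81.0
print(final_score(60, 85, 80))  # third rater resolves -> 82.5
```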
A writing test, like any other test, should also meet the requirements of being a
good test. A good language test should possess three qualities, i.e. validity,
reliability, and practicality (Harris, 1974).
Validity
Every test, whether it be a short, informal classroom test or a public
examination, should be as valid as the constructor can make it (Heaton,
1988:159). Validity, according to Ebel and Frisbie (1986, in Latief, 2000:98),
refers to the appropriateness of making specific inferences or certain decisions
on the basis of scores from a test. In other words, the test must aim to provide
a true measure of the particular skill it is intended to measure; to the extent
that it measures external knowledge and other skills at the same time, it will
not be a valid test (Heaton, 1988:159).
Differing from the conventional view that there are many kinds of validity,
Djiwandono (2008:165) points out that validity is a unitary concept. This means
that conceptually there is only one kind of validity; what varies is how the
validity is demonstrated. There are mainly three ways to provide support or
evidence for validity, and we can choose whichever is most appropriate for
obtaining evidence of our test's validity: content validity, criterion-related
validity, and construct validity.
Content validity is a kind of validity that depends on a careful analysis of
the language being tested and of the particular course objectives (Heaton,
1988:160). It means that a test should contain a representative sample of the
course, with the relationship between the test items and the course objectives
always being apparent. Furthermore, Gronlund (1985, in Latief, 2000:1) states
that content validation is the process of determining the extent to which a set
of test tasks provides a relevant and representative sample of the domain of
tasks under consideration. Heaton (1988:160) states that if we want to use
content validity as evidence for our test, the test writer should first draw up a
table of test specifications, describing in very clear and precise terms the
particular language skills and areas to be included in the test. The important
point is that the test writer has attempted to quantify and balance the test
components, assigning a certain value to indicate the importance of each
component in relation to the other components in the test. By doing so, the test
will achieve content validity and reflect the component skills and areas which
the test writer wishes to include in the assessment.
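As a simple illustration of such a table of specifications, the sketch below assigns each skill area a weight and checks that the weights balance. The skill names and values are invented for illustration, not taken from Heaton.

```python
# A minimal sketch of a table of test specifications: skill areas
# with values indicating their relative importance. The skills and
# weights are invented for illustration only.
SPECIFICATIONS = {
    "task fulfillment":        25,  # weight in percent
    "paragraph organization":  20,
    "grammar in context":      30,
    "vocabulary range":        15,
    "mechanics":               10,
}

assert sum(SPECIFICATIONS.values()) == 100, "weights must balance"
for skill, weight in SPECIFICATIONS.items():
    print(f"{skill:<24}{weight:>3}%")
```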
As cited from Djiwandono (2008:165), criterion-related validity can be
demonstrated by comparing the students' scores on the test with their scores on
a similar test that is already established as a good one. When the two tests are
administered at about the same time, we speak of 'concurrent validity'. For
instance, we compare the students' achievement in an English course with
their scores on the TOEFL. When the correlation between the students'
achievement in the course and their achievement on the TOEFL is high, the
result of the measurement in the English course has strong criterion-related
validity evidence, and the result of the English test is believed to have high
concurrent validity. On the contrary, when the comparison of the two test
results yields a low correlation, the result of the test in the English course is
said to have weak or low concurrent validity evidence. If it concerns the degree
to which a test can predict the test takers' future performance, we speak of
'predictive validity'.
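Gathering concurrent-validity evidence of this kind amounts to computing a correlation between two score sets. The sketch below uses Python's statistics.correlation (available from Python 3.10) on invented pairs of course and TOEFL scores.

```python
from statistics import correlation  # requires Python 3.10+

# Invented (course score, TOEFL score) pairs for the same students,
# for illustration only.
course = [72, 85, 64, 90, 78, 69, 88]
toefl  = [480, 560, 450, 590, 520, 470, 570]

r = correlation(course, toefl)
print(f"Pearson r = {r:.2f}")  # a high r supports concurrent validity
```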
Reliability
The reliability of the result of a language test refers to the preciseness of the
result in representing the actual level of the language proficiency of the
examinees (students) (Latief, 2001:214). If the test is administered to the same
candidates on different occasions, then, to the extent that it produces differing
results, it is not reliable (Heaton, 1988:162). Reliability measured in this way is
commonly referred to as test/re-test reliability, to distinguish it from
mark/re-mark reliability. The latter denotes the extent to which the same marks
or grades are awarded if the same test papers are marked by (i) two or more
different examiners or (ii) the same examiner on different occasions. In short,
in order to be reliable, a test must be consistent in its measurements.
Since there are many kinds of language proficiency tests, there are also
many ways of calculating the level of reliability (see Djiwandono, 2008:171).
Calculating the level of reliability always requires two sets of scores to measure
the consistency of the test; the correlation coefficient, as a measure of this
consistency, is obtained by calculation. There are many methods of estimating
the reliability of a test. Based on the scores used to calculate the correlation
coefficient, eight kinds of reliability can be distinguished: (1) test-retest
reliability, (2) equivalent-forms or alternate-forms reliability, (3) split-half
reliability, (4) Cronbach alpha reliability, (5) Cronbach alpha for writing ability,
(6) Kuder-Richardson (KR) reliability, (7) scorer or rater reliability, and (8)
estimated reliability.
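As an example of method (4), Cronbach's alpha can be computed directly from item-level scores with the familiar formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals). The data below (four students, three items) are invented for illustration.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per item, students in order."""
    k = len(item_scores)
    item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Invented scores: four students on three writing-test items.
items = [[3, 4, 2, 5],
         [2, 4, 2, 4],
         [3, 5, 1, 4]]
print(f"alpha = {cronbach_alpha(items):.2f}")  # -> alpha = 0.93
```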
Establishing reliability is an entirely empirical matter involving statistical
analysis. The statistical analysis is used to show the degree of correlation,
expressed in the form of a correlation coefficient. Since reliability comes in
varying degrees, it is actually a spectrum of levels rather than a dichotomy of
reliable versus unreliable. Reliability ranges from the highest to the lowest
level, with many levels in between; it forms a continuum of coefficients. The
highest reliability is statistically expressed as 1.00. A reliability of 1.00 means
the scores have absolute consistency without any deviation at all. This kind of
reliability is theoretical, because in reality there is almost no result of
measurement that is absolutely consistent without any difference at all,
especially in the measurement of a multi-aspect subject such as language. In
practice, the level of reliability is usually found to be lower than the absolute
correlation coefficient (1.00), e.g., 0.95, 0.90, 0.70, and so forth.
Practicality
Many experts in language testing have discussed the practicality of tests.
According to Djiwandono (2008:190), the practicality of a test has nothing to do
with anything abstract or theoretical; it concerns the test's application, mainly
(1) the practicality of administering the test and (2) the financial aspect. This is
in line with Bell (1981:200), who involves two parameters in determining
practicality: economy (in terms of money and time) and ease. In addition,
Harris (1974) asserted that a test is said to be practical if it is economical in
terms of cost and time and easy to administer, score, and interpret.
CONCLUSION
Every teaching and learning process needs evaluation. In this paper I
have briefly touched upon several issues related to assessment that writing
teachers should be aware of. A solid understanding of assessment issues should
be part of every teacher's knowledge base, and teachers should be encouraged
to equip themselves with this knowledge as part of their ongoing professional
development.
REFERENCES
Bachman, L.F. 1990. Fundamental Considerations in Language Testing. Oxford:
Oxford University Press.
Bell, R.T. 1981. An Introduction to Applied Linguistics. London: Batsford
Academic and Educational Ltd. (Appendix C, “Language Testing”).
Brown, H. D. 2001. Teaching by Principles. An Interactive Approach to Language
Pedagogy (2nd ed). White Plains, New York: Pearson Education.
Brown, J.D. 2005. Testing in Language Programs: A Comprehensive Guide to
English Language Assessment. New York: McGraw-Hill.
Burgess, S. & Head, K. 2005. How to Teach for Exams. Jeremy Harmer (Ed.).
Harlow: Pearson Education Limited.
Byrne, D. 1988. Teaching Writing Skills: Handbook for Language Teachers.
London: Longman Group UK Limited Company.
Cohen, M. & Riel, M. 1989. The Effect of Distance on Students' Writing.
American Educational Research Journal, 26 (2): 143-159.
Das, B.K. (Ed.). 1989. Principles of Language Learning and Teaching.
Singapore: SEAMEO Regional Language Centre.
Djiwandono, M. Soenardi. 2008. Tes Bahasa: Pegangan Bagi Pengajar Bahasa.
Jakarta: PT. Indeks.
Eanes, R. 1983. Content Area Literacy: Teaching for Today and Tomorrow.
Albany: Delmar Publishers.
Harris, D.D. 1974. Testing English as a Second Language. New York: McGraw-
Hill.
Heaton, J.B. 1988. Writing English Language Tests. New York: Longman, Inc.
Hughes, A. 1989. Testing for Language Teachers. Cambridge: Cambridge
University Press.
Inman, B.A. & Gardner, R. 1979. Aspects of Composition (2nd ed.). New York:
Harcourt Brace Jovanovich, Inc.
Jacobs, H.L., Zinkgraf, S.A., Wormuth, D.R., Hartfiel, V.F. & Hughey, J.B. 1981.
Testing ESL Composition: A Practical Approach. Massachusetts: Newbury
House Publishers.
Langan, J. 1985. College Writing Skills with Readings. New York: McGraw-Hill.
Latief, M.A. 2000. Validitas Hasil Pengukuran. Bahasa dan Seni, 28 (1): 95-104.
Leki, I. 1994. Teaching Second-Language Writing: Where We Seem to Be. In Karl,
T. (Ed.), Teacher Development: Making the Right Moves (pp. 170-178).
Washington, DC: USIA.
Reid, J. 1993. Teaching ESL Writing. New York: Prentice Hall.
Richards, J.C. 1990. The Language Teaching Matrix. New York: Cambridge
University Press.
Rivers, W. M. 1987. Interactive Language Teaching. Cambridge: Cambridge
University Press.
Savignon, S.J. 1983. Communicative Competence: Theory and Classroom
Practice. Massachusetts: Addison-Wesley Publishing Company.
Swales, J.M. 1990. Genre Analysis: English in Academic and Research Settings.
Cambridge: Cambridge University Press.
Troyka, L.Q. 1987. Handbook for Writers. New Jersey: Prentice Hall Inc.
Weir, C.J. 1995. Understanding and Developing Language Tests. Singapore:
Phoenix.