ISSUES IN TESTING
Validity
Reliability
Practicality
Backwash effect
Discrimination – whether the test discriminates between students who know the subject well and
those who do not.
VALIDITY
Face validity – the first impression the candidate gets on seeing the question/test paper:
whether the questions are typed well and whether the charts are clearly presented.
o Is the time allocation appropriate? A paper may be allotted two hours but need three hours
to answer, or be allotted two hours but be finished within one hour.
o Are the items arranged in a logical order: simple to complex, known to unknown (e.g. MCQs,
then structured questions, then essays)? If the paper opens with essays, followed by MCQs
and then structured questions, candidates will find it difficult to manage their time.
There should be a proper arrangement.
She will only ask short questions, not essay questions, and will never ask for definitions.
Content validity – e.g. take the syllabus for Testing and Evaluation and look at three
approaches: structural, communicative and eclectic. We took 9 hours to complete these, and the
syllabus also instructs that more time be given to them. If no question on these three
approaches appears in the question paper, the paper has not covered important elements of the
syllabus, so it fails to achieve content validity. Content validity means:
The test should represent all the important areas of the syllabus, or of what was taught,
in suitable proportions. (Something may not have been in the syllabus, but you may have
taught it.) E.g. if 9 hours were spent on those three approaches, there should be more
questions on them; "Issues in testing" took only ½ hour, so there should be a limited
number of questions on it. Questions are given in appropriate proportions, i.e. according
to the time allocated for teaching and the hours allocated in the syllabus. The number of
questions you get on each area in the exam should be proportional to that.
Content validity is concerned with how representative the question paper is of the
respective syllabus. The question paper should be a representative sample of the
syllabus.
It refers to the extent to which the question paper can be treated as a representative
sample of the subject matter content that was actually learnt or taught, or was expected
to be learnt or taught.
A representative sample of the subject matter is needed because it is impossible to
test everything learnt over a long period of time within a very short span of time.
The test can be considered a representative sample of the subject matter content only
if proper weightage has been given to each area of the subject matter, taking into
consideration the length of time expected to be devoted to teaching and learning each
of these areas. When preparing the test paper, you need to consider how long it took to
teach each area.
In addition, it is important that the question paper tries to identify and assess the
skills that are expected to have been attained during the learning process.
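The weighting rule above is simple arithmetic: each area's share of the marks follows its share of the teaching time. Below is a minimal Python sketch. It assumes that marks are split in direct proportion to teaching hours. Only the 9 hours for the three approaches and the ½ hour for issues in testing come from these notes; the other two areas and their hours are hypothetical. The largest-remainder rounding is our own choice, made so that the whole-number marks still add up to 100.

```python
from math import floor


def allocate_marks(hours: dict[str, float], total_marks: int) -> dict[str, int]:
    """Share total_marks across syllabus areas in proportion to teaching hours.

    Uses largest-remainder rounding so the whole-number marks still add up
    exactly to total_marks.
    """
    total_hours = sum(hours.values())
    exact = {area: total_marks * h / total_hours for area, h in hours.items()}
    marks = {area: floor(x) for area, x in exact.items()}
    leftover = total_marks - sum(marks.values())
    # Hand the remaining marks to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - marks[a], reverse=True)
    for area in by_remainder[:leftover]:
        marks[area] += 1
    return marks


# Only the 9 h and 0.5 h figures come from the notes; the other two areas are hypothetical.
hours = {
    "Approaches (structural, communicative, eclectic)": 9,
    "Issues in testing": 0.5,
    "Types of tests": 3,
    "Test construction": 5,
}
print(allocate_marks(hours, 100))
```

With these hours the three approaches receive 51 of the 100 marks and issues in testing receives only 3.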
The Table of Specifications (Blueprint) is used to secure the content validity of a test.
We want to ensure that each area is tested in suitable proportions, so a blueprint/table
of specifications should be prepared. It is a summary of the test paper. How much time
will we allocate for this paper (e.g. 2 hours)? Are we going to give MCQs? If so, how many,
and how many marks for them? How many marks altogether? For a language paper: how many
questions on grammar? How many structured or essay questions? How many on
reading/writing/speaking? Are we going to test speaking at all? Everything regarding the
test paper should be given in it. This summary of the test paper should be prepared to
secure the content validity of the test.
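A table of specifications can also be kept as a small data structure and checked before the paper is set. The sketch below is hypothetical. The `BlueprintRow` fields, the areas, and the item counts are invented for illustration. The only check it makes is that the marks per area add up to the planned total for the paper.

```python
from dataclasses import dataclass


@dataclass
class BlueprintRow:
    area: str            # e.g. grammar, reading, writing
    item_type: str       # e.g. MCQ, structured, essay
    num_items: int
    marks_per_item: int

    @property
    def marks(self) -> int:
        return self.num_items * self.marks_per_item


def check_blueprint(rows: list[BlueprintRow], planned_total: int) -> dict[str, int]:
    """Total the marks per area and fail loudly if they miss the planned total."""
    per_area: dict[str, int] = {}
    for row in rows:
        per_area[row.area] = per_area.get(row.area, 0) + row.marks
    total = sum(per_area.values())
    if total != planned_total:
        raise ValueError(f"blueprint totals {total} marks, expected {planned_total}")
    return per_area


# Hypothetical blueprint for a two-hour language paper marked out of 100.
blueprint = [
    BlueprintRow("Grammar", "MCQ", 20, 1),
    BlueprintRow("Reading", "Structured", 5, 4),
    BlueprintRow("Writing", "Essay", 2, 20),
    BlueprintRow("Vocabulary", "Fill in the blanks", 20, 1),
]
print(check_blueprint(blueprint, planned_total=100))
```

Keeping the blueprint as data makes it easy to see at a glance whether one area (here, writing) carries more weight than the teaching time justifies.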
The greater the test's content validity, the more likely it is to be an accurate measure of
what it is supposed to measure. Compare GCE O/L and A/L: at O/L students normally get results
such as 8 As, but A/L is more difficult. In A/L Chemistry, content validity is maintained to
the maximum: unless you study the whole syllabus, you cannot get a good score. At O/L,
comparatively, content validity is lower. In O/L Maths, you can get an A pass from geometry and
trigonometry; you don't need to study the whole syllabus. By studying selected items you know
you will definitely get an A pass. That shows the difference in content validity.
Construct validity – the theory used in language teaching should be followed in setting
the paper.
It means finding out to what extent the test conforms to the theory that served as its basis.
The test paper should be set following the methods that we use in teaching: whatever
methodology or approach you used in the teaching/learning process, your test paper should
tally with it.
For a test to have construct validity, it should match the theory on which the language
teaching is based.
E.g. if the language teaching was based on the Audio-Lingual Method, the test should
match Behaviourist theory.
The Audio-Lingual Method is not in our syllabus. It is based on the behaviourist
theory of language learning: language learning is similar to habit formation, and students
should form one language habit at a time. So in teaching, one structure was taught at a time,
and students were tested at the same rate, one language habit at a time. The Audio-Lingual
Method uses drilling, e.g. "I like to eat apples / bananas", one structure at a time. In the
exam you will get the same kind of structures and substitution tables, e.g. fill-in-the-blanks
activities.
Construct validity is more important in intelligence tests, aptitude tests and personality
tests.
Another example: in the Communicative Approach, language learning is based on the
information-gap principle. One person knows something and the other does not, so you need to
use language to communicate and bridge the gap. If you use this approach in your
teaching/learning process, your paper will include role-plays, dialogues, dialogue construction,
and fill-in-the-blanks in dialogue format.
Today we use the eclectic approach, a mixture of all these methods and techniques. The paper
will therefore have a variety of items: fill-in-the-blanks, cloze, grammar items, seed tests.
Concurrent validity – to what extent does the test being standardized agree with other
contemporary tests?
This is important in speaking tests as some speaking tests may lack content validity.
Concurrent validity is a research-based inquiry: you have to calculate a value for it.
E.g. a teacher wants to know whether the English Language paper he/she made
is a valid test. To find out, it has to be compared with the marks the students
obtained on a contemporary test. Here the teacher compares the marks the students
obtained for English Language at the mid-term test with the marks they obtained on
the teacher-made test. If the correlation coefficient shows agreement, the
teacher-made test can be considered a test with concurrent validity.
Check whether the students' marks correlate. E.g. Ayesha got 75% on the
teacher-made test and 80% on the mid-term test, but Dasuni got 80% on the
teacher-made test and 40% on the mid-term test. If the majority of students' marks
do not correlate like this, the teacher-made test does not agree with the
standardized test. Concurrent validity requires a correlation/similarity between the
marks on the teacher-made test and on the standardized test: students who score low
on the teacher's test also score low on the other test, and vice versa.
In concurrent validity, you prepare the test items and then compare your test with
a standardized test, e.g. the O/L paper, to see how far your test agrees with it.
GCE O/L is a standardized test prepared by the National Evaluation Unit. Suppose
you prepare the monthly test paper and want to see how far it agrees with that
standardized test. For the monthly test, you teach, you prepare the paper and you
mark the paper. For the national paper, an external body prepares the paper,
another external body marks it, and the learners are taught by their teachers:
three separate parties come together. That is how a standardized test paper is
constructed.
You are finding out how far your test conforms to the standardized test. You
have to calculate a value using a formula (the formula is not important for now).
Concurrent validity is one element of validity, and it is a research-based inquiry.
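The notes leave the formula aside. A common choice for measuring this kind of agreement is Pearson's correlation coefficient r, and that choice is an assumption here. The sketch pairs each student's mark on the teacher-made test with the same student's mark on the standardized test. The first two pairs reuse Ayesha's (75, 80) and Dasuni's (80, 40) marks from the example above; the other three students are hypothetical. An r close to +1 means the two tests rank students in much the same way, and an r near 0 means they hardly agree.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between paired marks (same student, two tests)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two marks each")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when every mark on one test is the same")
    return cov / sqrt(ss_x * ss_y)


# Marks (%) for the same five students: Ayesha, Dasuni, then three hypothetical students.
teacher_test = [75, 80, 55, 40, 90]
standardized_test = [80, 40, 60, 35, 85]
print(round(pearson_r(teacher_test, standardized_test), 2))  # 0.63
```

Dasuni's mismatched marks pull r down to about 0.63 for this small group, which is only moderate agreement. With real data, many more students would be needed before drawing a conclusion.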
Predictive validity
Predictive validity means the extent to which the test can predict whether individuals
will be competent to perform a task in the future. E.g. if a student performs well in a test,
we should be able to predict his/her future performance. Do the students who pass the
Grade 5 scholarship exam also perform well in future exams? If they do, we can say that the
Grade 5 scholarship exam has predictive value: they will enter university. But that is not
true, because students who do not pass the Grade 5 scholarship exam also enter medical
colleges and engineering faculties or become entrepreneurs. We can't measure success based
on the Grade 5 exam. If we can predict students' future performance from a particular exam,
we call it predictive validity.
If a test can predict the future performance of the child, that test has predictive validity.
A good test should predict a candidate’s future performance.
E.g. if it is expected that those who obtain high marks in the Grade 5 scholarship
exam will do well in their studies in future, the scholarship exam is considered to
have predictive validity.
Face validity
Content validity
Concurrent validity
Predictive validity
Validity means whether the test accurately measures what it is intended to measure (Sinhala:
valangubaawaya). E.g. an English Language test should measure the students' language skills.
In an English test, a question such as "Who wrote the novel Wuthering Heights / Oliver Twist?"
is not a good question. In a language test you have to test language skills (listening,
speaking, reading, writing, etc.), not general knowledge.
The tester should include enough samples. E.g. with only 10 MCQs worth 10 marks each
(10 × 10 = 100), you can't measure the students' ability. There should be enough items.
Candidates should not be given too many questions to choose from. E.g. if the second paper
has 20 questions and you have to answer only 2, each candidate answers a different pair, so
their performances are hard to compare.
Instructions should be clear and to the point. You should not give ambiguous
instructions, especially when you ask candidates to select questions, e.g. "The first
question is compulsory; answer four more questions."
Test items should not be ambiguous. Otherwise students fail to answer, not because they
don't know the content, but because they do not understand the question. You may have had
this experience: one student says "I wrote this", another says "I wrote that".
Test items should be legible (readable).
Candidates should be familiar with the format of the test items: how many MCQs you will
give, and whether you are giving structured, short-answer or essay questions. E.g. in
national-level testing, if the structure of the paper is changed, sessions are conducted to
make teachers and students aware of it. You should make candidates aware of the
format/structure of the paper.
The tester should make every attempt to minimize the subjectivity of the test items.
Subjectivity is the opposite of objectivity: our personal opinion should not influence the
mark. Personal opinions may carry weight in essay questions, e.g. "What is your opinion of
the current government? Does it rule better than the previous one?" This is a very
subjective question, and the mark will vary according to the opinion of the marker. We have
to minimize that, although we can't eradicate it, because some subjects have questions like
that (e.g. political science, religion, mother tongue). One step is to make your scoring
very objective: e.g. if these content points are present, 10 marks; if these points are
present, 9 marks. The O/L paper is marked by two examiners. Try your best to make your
scoring procedure objective by giving a detailed marking scheme.
Try your best to identify students by number, not by name. If names are written, a
favourite student may be given more marks. So use the index number only.
Employ multiple independent scoring to maintain scorer reliability, i.e. each script is
marked by more than one examiner, e.g. by two.
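Double marking can be sketched as follows. The sketch assumes that the two examiners' marks are recorded against index numbers rather than names, and that the final mark is their average. The `tolerance` of 3 marks, above which a script is sent for re-marking, is a hypothetical value and not a rule from these notes.

```python
def combine_marks(first: dict[str, float], second: dict[str, float],
                  tolerance: float = 3) -> tuple[dict[str, float], list[str]]:
    """Average two independent markers' scores, keyed by index number.

    Scripts whose two marks differ by more than `tolerance` are listed
    separately for re-marking (the tolerance is a hypothetical choice).
    """
    if first.keys() != second.keys():
        raise ValueError("both markers must score the same scripts")
    final: dict[str, float] = {}
    to_review: list[str] = []
    for index_no in sorted(first):
        a, b = first[index_no], second[index_no]
        final[index_no] = (a + b) / 2
        if abs(a - b) > tolerance:
            to_review.append(index_no)
    return final, to_review


# Hypothetical index numbers and marks out of 15.
marker_1 = {"1021": 12, "1022": 9, "1023": 14}
marker_2 = {"1021": 11, "1022": 14, "1023": 14}
final, review = combine_marks(marker_1, marker_2)
print(final)   # {'1021': 11.5, '1022': 11.5, '1023': 14.0}
print(review)  # ['1022']
```

Script 1022 ends up with the same average as 1021, but the 5-mark gap between its two markers shows that the markers disagreed. Averaging alone would hide that disagreement, which is why the sketch flags the script as well.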
Standardization of the marking scheme (this enables reliability to be maintained). E.g. a
letter is marked out of 15: content 5, language 5, mechanics of writing 3, organization 2.
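The letter scheme above (content 5, language 5, mechanics of writing 3, organization 2) can be written down once and applied to every script, so that all markers use the same components and maxima. This is a minimal sketch. The function name and the range check are illustrative assumptions.

```python
# Component maxima from the notes: a letter marked out of 15.
LETTER_SCHEME = {"content": 5, "language": 5, "mechanics": 3, "organization": 2}


def score_letter(awarded: dict[str, int],
                 scheme: dict[str, int] = LETTER_SCHEME) -> int:
    """Total a script's marks, checking every component against the scheme."""
    if awarded.keys() != scheme.keys():
        raise ValueError(f"expected components {sorted(scheme)}")
    for part, mark in awarded.items():
        if not 0 <= mark <= scheme[part]:
            raise ValueError(f"{part}: {mark} is outside 0..{scheme[part]}")
    return sum(awarded.values())


print(score_letter({"content": 4, "language": 3, "mechanics": 2, "organization": 2}))  # 11
```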
DISCRIMINATION (Sinhala: wen kota salakanawa, "to distinguish") – whether the test
discriminates between students who know the subject and those who do not, i.e. whether it
identifies the candidates who really know the subject as distinct from those who don't.
A good test should distinguish the performance of one student from that of another.
Discrimination is important when you select students for courses, especially in selection
tests. E.g. the A/L paper is tough because, unlike O/L, it is a selection exam for university
admission, so discrimination is maintained.
Include items of different standards: very simple questions and challenging
ones. In the Physics, Chemistry and Biology papers, for example, not all questions
are simple; there are very tough ones that only the bright students can answer.
Have a very detailed marking scheme. A detailed marking scheme helps to
discriminate between learners: even very minute details are included, and full
marks are given only if all these elements are present.
Use different testing techniques: not only essays, but also structured questions,
fill-in-the-blanks, and some questions outside the syllabus, not directly related to it.
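The notes do not give a way to measure discrimination. One common classical measure, assumed here, is the upper-lower discrimination index D = (U - L) / n. To compute it:

* Rank the candidates by their total score.
* Take equal-sized top and bottom groups of n candidates each. Using 27% of the candidates for each group is a widely used convention.
* Count how many candidates in each group answered the item correctly. These counts are U (top group) and L (bottom group).

A D close to +1 means the item separates strong candidates from weak ones. A D near 0, or a negative D, means it does not. The marks in the example are hypothetical.

```python
def discrimination_index(item_correct: list[bool], total_scores: list[float],
                         group_fraction: float = 0.27) -> float:
    """Upper-lower discrimination index D = (U - L) / n for a single item.

    Candidates are ranked by total test score. U and L count the correct
    answers to the item among the top n and bottom n candidates.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item result per candidate")
    if not 0 < group_fraction <= 0.5:
        raise ValueError("group_fraction must be in (0, 0.5]")
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    n = max(1, round(len(ranked) * group_fraction))
    upper = sum(item_correct[i] for i in ranked[:n])
    lower = sum(item_correct[i] for i in ranked[-n:])
    return (upper - lower) / n


# Hypothetical class of 10: total marks and whether each student got the item right.
totals = [90, 85, 80, 70, 65, 60, 50, 45, 40, 30]
correct = [True, True, False, True, False, True, False, True, False, False]
d = discrimination_index(correct, totals)
print(round(d, 2))  # 0.33 (2 of the top 3 correct, 1 of the bottom 3)
```

An item that every candidate gets right, or every candidate gets wrong, scores D = 0. This fits the point above that a paper needs items of different standards.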
BACKWASH EFFECT
Beneficial backwash means that testing and the instructional process influence each other
reciprocally, and the testing is beneficial for teaching and learning.
Norm-referenced testing (NRT): e.g. take Shamla's mark; we compare the other students' marks
with hers on the bell curve. There is no fixed mark she has to reach; each mark is
interpreted by comparing it with the others.
In criterion-referenced testing (CRT), we judge students' marks against a criterion. E.g. in a
driving test, if you are able to drive and you know the road signals, you pass. There is no
competition and no need to compete with one another to pass the test; you get the licence.
But for entrance to medical college in Sri Lanka, even if everyone has scored 250, there is
room for only 800 students, so the cut-off mark is decided by that number. Though you got
250, if you are not in that group you cannot enter.
Similarly, for law entrance, even if you score better than another candidate, you will not
enter if you are not in the top 100.
When you draw the curve, you take the best candidates, e.g. the top 30%. If only 10% are
taken, you must come within that 10%. If all students scored well, the cut-off mark may be
as high as 98%.
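The difference between the two kinds of testing can be sketched with hypothetical scores. Under CRT, every candidate who reaches the criterion passes. Under NRT, only a fixed number of places is filled, so the cut-off depends on how the other candidates performed. The candidate labels, pass mark and number of places are illustrative assumptions. Ties at the cut-off (such as several candidates on 250) would need a tie-break rule, and this sketch does not implement one.

```python
def criterion_referenced(scores: dict[str, float], pass_mark: float) -> list[str]:
    """CRT: every candidate who reaches the criterion passes; there is no quota."""
    return sorted(name for name, score in scores.items() if score >= pass_mark)


def norm_referenced(scores: dict[str, float], places: int) -> tuple[list[str], float]:
    """NRT: rank candidates and admit only the best `places` of them.

    Returns the admitted candidates and the cut-off, i.e. the lowest admitted
    score. Ties at the cut-off are not resolved here.
    """
    if not 1 <= places <= len(scores):
        raise ValueError("places must be between 1 and the number of candidates")
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    admitted = ranked[:places]
    return admitted, scores[admitted[-1]]


# Hypothetical scores: everyone does well, but there are only two places.
scores = {"A": 98, "B": 95, "C": 91, "D": 88, "E": 70}
print(criterion_referenced(scores, pass_mark=60))  # ['A', 'B', 'C', 'D', 'E']
print(norm_referenced(scores, places=2))           # (['A', 'B'], 95)
```

Candidate C's 91 is enough to pass under the criterion but not to gain a place under the quota, as in the law entrance example above.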