MID-TERM SCOPE
DPE 104

BASIC CONCEPTS IN ASSESSMENT


1.0 Introduction

As teachers, we are continually faced with the challenge of assessing the progress of our
students as well as our own effectiveness as teachers. Assessment decisions could substantially
improve student performance, guide the teachers in enhancing the teaching-learning process and
assist policy makers in improving the educational system. At the same time, however, poor
assessment procedures could adversely affect the students, teachers and administrators. Assessment
of learning is a tricky business, indeed, for it requires measuring concepts, ideas and abstract
constructs, quite unlike the assessment of physical quantities, which can be done with an appropriate degree of accuracy. In assessment of learning, we deal with intangibles and attempt to characterize
them in a manner that would be widely understood.

Not too long ago, assessment of learning was confined to techniques and procedures for
determining whether or not cognitive knowledge (memorization of facts and theories) was
successfully acquired. Thus, assessment was essentially confined to pencil-paper testing of the
cognitive levels of learning (Bloom, 1954). In the past two decades, however, educators and
educationists recognized that not only are we expected to know facts and figures in today’s society,
but we are also expected to function effectively in the modern world, interact with other people,
and adjust to situations. Such expectations have not been matched with appropriate assessment
methods which could identify successful acquisition of skills other than cognitive skills until the
early to late 1990’s. Consequently, the traditional assessment method of pencil-and-paper testing sometimes identified students as potentially high performing even though they were not successful in coping with the demands of modern society.

The most common method of assessing student learning is through tests (teacher-made or
standardized). Despite some criticisms leveled against using tests in determining if students are
learning or if schools are successful, these tests will continue to be used in the foreseeable future
(Shepard, 2000). Test results provide an easy and easily understood means of informing the student about his progress or the school about its performance. Standardized tests, in particular, provide
clear targets to aim for when teachers and administrators want improvement (Jason, 2003). Tests,
coupled with other observational performance-based techniques, provide a powerful combination
for objective and precise assessment procedure.

1.1 Educational Measurement

The first step towards elevating a field of study into a science is to take measurements of the
quantities and qualities of interest in the field. In the Physical Sciences, such measurements are
quite easily understood and well-accepted. For instance, to measure the length of a piece of string, we compare it with a standard ruler or meter stick; to find the weight of an object, we compare the heaviness of the object with a standard kilogram or pound, and so on. Sometimes, we can measure physical quantities by combining directly measurable quantities into derived quantities. For example, to find the area of a rectangular piece of paper, we simply multiply the lengths of the sides of the paper. In the field of educational measurement, however, the quantities and qualities of interest are more abstract: they cannot be seen, touched or directly observed, which makes the measurement process in education much more difficult.

For instance, knowledge of the subject matter is often measured through standardized test results. In this case, the measurement procedure is testing. The same concept can be measured in another way: we can ask a group of experts to rate a student’s (or teacher’s) knowledge of the subject matter, in which case knowledge is measured through perceptions.

1.1.1 Types of Measurement

Measurements can therefore be objective (as in testing) or subjective (as in perceptions). In the example above, testing produces objective measurements while expert ratings provide subjective measurements. Objective measurements are more stable than subjective measurements in the sense that repeated measurements of the same quantity or quality of interest will produce more or less the same outcome. For this reason many people prefer objective measurements over subjective measurements whenever they are available. However, there are certain facets of the quantity or quality of interest that cannot be successfully captured by objective procedures but which can be captured by subjective methods, e.g. the aesthetic appeal of a product or project of a student. It follows that it may be best to use both methods of assessment whenever the constraints of time and resources permit.

Objective measurements are measurements that do not depend on the person or individual
taking the measurements. Regardless of who is taking the measurement, the same measurement
values should be obtained when using an objective assessment procedure. In contrast, subjective
measurements often differ from one assessor to the next even if the same quantity or quality is
being measured.

1.1.2 Indicators, Variables and Factors

An educational variable (denoted by a letter of the English alphabet, like X) is a measurable characteristic of a student. Variables may be directly measurable, as in X = age or X = height of a student. However, many variables cannot be directly measured, as when we want to measure the “class participation” of a student. For those variables where direct measurements are not feasible, we introduce the concept of indicators.
An indicator, I, denotes the presence or absence of a measured characteristic. Thus:
I = 1, if the characteristic is present
I = 0, if the characteristic is absent

For the variable X= class participation, we can let I1, I2, …, In denote the participation of a
student in n class recitations and let X = sum of the I’s divided by n recitations. Thus, if there were
n = 10 recitations and the student participated in 5 of these 10, then X = 5/10 or 50%.
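
Example (Python sketch): The computation above can be expressed in a few lines of code. This is only an illustration; the indicator values below are hypothetical.

# Hypothetical recitation record for one student: 1 = participated, 0 = did not.
indicators = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]    # n = 10 recitations

n = len(indicators)
class_participation = sum(indicators) / n      # X = sum of the I's divided by n

print(f"X = {sum(indicators)}/{n} = {class_participation:.0%}")   # X = 5/10 = 50%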

Indicators are the building blocks of educational measurement upon which all other forms of measurement are built. A group of indicators constitutes a variable. A group of variables forms a construct or a factor. The variables which form a factor correlate highly with each other but have low correlations with variables in another group.

Example: The following variables were measured in a battery of tests:


X1 = computational skills
X2 = reading skills
X3 = vocabulary
X4 = logic and reasoning
X5 = sequences and series
X6 = manual dexterity

These variables can be grouped as follows:


Group 1 : (X1, X4, X5) = mathematical ability factor
Group 2 : (X2, X3) = language ability factor
Group 3 : (X6) = psychomotor ability factor

In educational measurement, we shall be concerned with indicators, variables and factors of interest in the field of education.
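
Example (Python sketch): The statement that variables within a factor correlate highly with each other, while correlating weakly with variables in other groups, can be illustrated with a short computation. The scores below are invented for illustration only; in practice the groupings would come from a formal factor analysis.

import numpy as np

# Invented scores of five students on three of the variables listed above.
x1 = np.array([10, 14, 9, 16, 12])   # computational skills
x4 = np.array([11, 15, 8, 17, 13])   # logic and reasoning
x2 = np.array([15, 14, 11, 12, 13])  # reading skills

r = np.corrcoef([x1, x4, x2])
print(r.round(2))
# In this toy data X1 and X4 correlate strongly with each other (about 0.98)
# and only weakly with X2, so X1 and X4 would be grouped into one factor
# (mathematical ability) and X2 into another (language ability).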

1.2 Assessment

Once measurements are taken of an educational quantity or quality of interest, the next step is to assess the status of the educational phenomenon. For example, suppose that the quantity of interest is the level of Mathematics achievement of Grade VI pupils in the district, and that the proposed measurement is an achievement test in Mathematics for these pupils. If the District Office sets a target level of achievement, then from the achievement test results the school officials can assess whether their Grade VI pupils are within a reasonable range of this target, i.e. whether they are above or below the achievement level target.

1.2.1 Various Roles of Assessment

Assessment plays a number of roles in making instructional decisions:

Summative Role. An assessment may be done for summative purposes as in the illustration
given above for grade VI mathematics achievement. Summative assessment tries to determine the
extent to which the learning objectives for a course (like Grade VI Mathematics) are met and why.

Diagnostic Role. Assessment may also be done for diagnostic purposes. In this case, we are interested in determining gaps in learning. Thus, on the topic of sentence construction, a diagnostic examination may reveal the difficulties encountered by the students in matching subject and verb, in identifying subject and predicate, in vocabulary, etc. This function of assessment is akin to that of a medical doctor who performs laboratory tests to determine a patient’s illness or disease.

Like a medical doctor, a teacher needs to be extra careful in making diagnoses of learning difficulties. Accurate diagnosis leads to appropriate teaching measures and intervention programs, but a single misdiagnosis can have potentially disastrous consequences. For instance, a pupil who is unable to read and unable to score well in a test of grammar may not actually have difficulties with the language per se but could, in fact, be suffering from a learning disability like dyslexia. Consequently, the teacher may provide tutorial sessions believing that the student is just a slow learner or has language difficulties, thus wasting precious resources to no avail.

Formative Role. Another purpose of assessment is formative. In this role, assessment guides the teacher in his/her day-to-day teaching activity. Should a topic be taught again? Should there be more drills and exercises? In the context of the teaching-learning situation, the formative value of assessment is perhaps the most important. It allows the teacher to redirect and refocus the course of teaching a subject matter.

Placement Role. The final role of assessment in curricular decisions concerns placement. Assessment plays a vital role in determining the appropriate placement of a student both in terms of achievement and aptitude. Aptitude refers to the area or discipline where a student would most likely excel or do well. Thus, an aptitude test determines whether a student would do well in scientific or humanities courses, in technical-vocational or academic courses, etc. Placement examinations also determine whether a student’s ability is equivalent to, say, that of a typical 3rd year or 4th year high school student. The Department of Education administers this type of placement examination and has used it to calibrate the placement of personalities such as Manny Pacquiao and other famous personalities on TV or the wide screen. The defunct NCEE (National College Entrance Examination) was an academic placement examination.

1.3 Evaluation of Learning and Programs

Evaluation models are important in the context of education. Evaluation implies that measurements and assessments of an educational characteristic have been done and that it is now desired to pass a value judgment on the educational outcome. In evaluating an outcome, we consider the objectives of the educative process and analyze whether the outputs and outcomes satisfy these objectives. If they do not, then we need to find the possible reasons for our failure to meet such objectives. The possible reasons can, perhaps, be identified from the context, inputs, process and outputs of the educational system. Figure 1 illustrates these ideas:

CONTEXT → INPUTS → PROCESS → OUTPUT → OUTCOME

FIGURE 1. A Systems Model for Evaluation
Evaluation provides a tool for determining the extent to which an educational process or program is effective and, at the same time, indicates directions for remediation of those parts of the curriculum that do not contribute to successful student performance. To this end, evaluation enhances organizational efficiency by providing focus for teacher and administrator efforts and by allowing resources to be directed to areas of greatest need.

Improving student performance is inextricably linked to improvement in the inputs and processes that shape the effectiveness of teaching and learning. Evaluation, therefore, is of greatest interest to both teachers and administrators, who plan and orchestrate the entire set of learning activities.

According to Brainard (1996), effective program evaluation is a systematic process that focuses on program improvement and renewal and on discovering peaks of program excellence. Program evaluation needs to be viewed as an important ongoing activity, one that goes beyond research or simple fact-finding to inform decisions about the future shape of the program under study. Program evaluation contributes to quality services by providing feedback from program activities and outcomes to those who can introduce changes in the program or who decide which services are to be carried out effectively.

Program evaluation need not be limited to evaluation of educational processes and systems. Program evaluation is also important in many programs of government agencies. For instance, agencies of the government undertake programs, with the assistance of foreign funding agencies, that target specific social concerns such as poverty reduction or governance and empowerment at the local government level. Such programs normally involve millions in funding, and it is very important that proper evaluation be undertaken to ensure that the resources invested in these programs are not wasted. In such cases, the method called PERT (Program Evaluation Review Technique) is an indispensable quantitative evaluation tool.

SUMMARY

Evaluation

- is the process of gathering and interpreting evidence regarding the problems and progress of individuals in achieving desirable educational goals.

Chief Purpose of Evaluation


- The Improvement of the individual learner

Other Purposes of Evaluation


- To maintain standard
- To select students
- To motivate learning
- To guide learning
- To furnish instruction
- To appraise educational instrumentalities

Functions of Evaluation

- Prediction
- Diagnosis
- Research

Areas of Educational Evaluation


- Achievement
- Aptitude
- Interest
- Personality

A well-defined system of evaluation enables one to:

- Clarify goals
- Check upon each phase of development
- Diagnose learning difficulties
- Plan carefully for remediation
Evaluation & the Teaching-Learning Process

Teaching, learning and evaluation are three interdependent aspects of the educative process (Gronlund, 1981). This interdependence is clearly seen when the main purpose of instruction is conceived in terms of helping pupils achieve a set of learning outcomes which include changes in the intellectual, emotional or physical domains. Instructional objectives, or in other words desired changes in the pupils, are brought about by planned learning activities, and pupils’ progress is evaluated by tests and other devices.

This integration of evaluation into the teaching-learning process can be seen in the following
stages of the process:

- Setting instructional objectives
- Determining pupil variables that can affect instruction
- Providing instructional activities that are relevant and necessary to achieve the desired learning outcomes
- Determining the extent to which desired outcomes are achieved

Principles of Educational Evaluation

- Evaluation must be based on previously accepted educational objectives.
- Evaluation should be a continuous, comprehensive and cumulative process.
- Evaluation should recognize that the total individual personality is involved in learning.
- Evaluation should be democratic and cooperative.
- Evaluation should be positive and action-directed.
- Evaluation should give opportunity to the pupil to become increasingly independent in self-appraisal and self-direction.
- Evaluation should include all significant evidence from every possible source.
- Evaluation should take into consideration the limitations of the particular educational situations.

Measurement
- is a part of the educational evaluation process whereby some tools or instruments are used to
provide a quantitative description of the progress of students towards desirable educational
goals.

Test or Testing

- is a systematic procedure to determine the presence or absence of certain characteristics or qualities in a learner.

Types of Evaluation

- Placement
- Formative
- Diagnostic
- Summative
(These types show that evaluation is integrated with the various phases of instruction)

Placement

- Evaluation accounts for a student’s entry behavior or performance. It determines the knowledge and skills he possesses which are necessary at the beginning of instruction in a given subject area.

Formative

- Evaluation provides the students with feedback regarding his success or failure in attaining
instructional objectives.
It identifies the specific learning errors that need to be corrected and provides reinforcement for
successful performance as well.
For the teacher, formative evaluation provides information for making instruction and remedial
work more effective.

Pointers in Formative Evaluation

1. There should be an achievement continuum at every level of instruction.
2. Criteria and standards are well-defined at the beginning of the learning process.
3. Criterion-referenced tests should be administered.
4. Students should always be informed of their progress.
5. Enrichment/remedial opportunities should be made available.

Diagnostic

- Evaluation is used to detect students’ learning difficulties which are not revealed by formative tests or corrected by remedial instruction and other instructional adjustments.
Since it discloses the underlying causes of learning difficulties, diagnostic tests are therefore more comprehensive and detailed.

Summative

- Evaluation is concerned with what students have learned. This implies that the instructional
activity has for the most part been completed and that little correction of learning deficiencies
is possible.

Stages of Teaching-Learning in which Educational Evaluation is Integrated

1. Clarifying objectives
2. Identifying variables that affect learning
3. Providing relevant instructional activities to achieve objectives
4. Determining the extent to which the objectives are achieved.

PRINCIPLES OF HIGH QUALITY ASSESSMENT

2.1 Clarity of Learning Targets

Assessment can be made precise, accurate and dependable only if the targets to be achieved are clearly stated and feasible. To this end, we consider learning targets involving knowledge, reasoning, skills, products and effects. Learning targets need to be stated in behavioral terms which
denote something which can be observed through the behavior of the students. Thus, the objective
“to understand the concept of buoyancy” is not stated in behavioral terms. It is not clear how one
measures “understanding”. On the other hand, if we restate the target as “to determine the volume
of water displaced by a given object submerged”, then we can easily measure the extent to which a
student understands “buoyancy”.

2.1.1 Cognitive Targets

As early as the 1950s, Bloom (1954) proposed a hierarchy of educational objectives at the cognitive level. These are:

Level 1. KNOWLEDGE which refers to the acquisition of facts, concepts and theories. Knowledge
of historical facts like the date of the EDSA revolution, discovery of the Philippines or of
scientific concepts like the scientific name of milkfish, the chemical symbol of argon etc.
all fall under knowledge.

Knowledge forms the foundation of all other cognitive objectives, for without knowledge it is not possible to move up to the next higher level of thinking skills in the
hierarchy of educational objectives.

Level 2. COMPREHENSION refers to the same concept as “understanding”. It is a step higher than mere acquisition of facts and involves a cognition or awareness of the interrelationships of facts and concepts.

EXAMPLE: The Spaniards ceded the Philippines to the Americans in 1898 (knowledge of facts). In effect, the Philippines declared independence from Spanish rule only to be ruled by yet another foreign power, the Americans (comprehension).

Level 3. APPLICATION refers to the transfer of knowledge from one field of study to another or
from one concept to another concept in the same discipline.

EXAMPLE: The classic experiment of Pavlov on dogs showed that animals can be conditioned to respond in a certain way to certain stimuli. The same principle can be
applied in the context of teaching and learning on behavior modification for school
children.

Level 4. ANALYSIS refers to the breaking down of a concept or idea into its components and explaining the concept as a composition of these components.

EXAMPLE: Poverty in the Philippines, particularly at the barangay level, can be traced back to the low income levels of families in such barangays and the propensity for large households, with an average of about 5 children per family. (Note: Poverty is analyzed in the context of income and number of children.)

Level 5. SYNTHESIS refers to the opposite of analysis and entails putting together the
components in order to summarize the concept.

EXAMPLE: The field of geometry is replete with examples of synthetic lessons. From
the relationship of the parts of a triangle for instance, one can deduce that the sum of the
angles of a triangle is 180 degrees. (Padua, Roberto and Rosita G. Santos (1997). Educational Evaluation and Measurement. Quezon City: Katha Publishing, pp. 21-22.)

Level 6. EVALUATION AND REASONING refers to valuing and judgment, or placing a value on the “worth” of a concept or principle.

2.1.2. Skills, Competencies and Abilities Targets


Skills refer to specific activities or tasks that a student can proficiently do, e.g. skills in coloring, language skills. Skills can be clustered together to form specific competencies that characterize a student’s ability, so that his/her program of study can be designed to optimize his/her innate abilities.

Abilities can be roughly categorized into: cognitive, psychomotor and affective abilities. For
instance, the ability to work well with others and to be trusted by every classmate (affective ability) is
an indication that the student can most likely succeed in work that requires leadership abilities. On the
other hand, other students are better at doing things alone, like programming and web designing
(cognitive ability) and, therefore, they would be good at highly technical individualized work.

2.1.3. Product, Outputs and Projects Targets

Products, outputs and projects are tangible and concrete evidence of a student’s ability. A clear target for products and projects needs to clearly specify the expected level of workmanship, e.g. expert, skilled or novice level. For instance, a novice level output can be characterized by the indicator “at most four (4) imperfections noted”, etc.

2.2. Appropriateness of Assessment Methods

Once the learning targets are clearly set, it is necessary to determine an appropriate assessment procedure or method. We discuss the general categories of assessment methods or
instruments below.

2.2.1 Written-Response Instruments

Written-response instruments include objective tests (multiple choice, true-false, matching or short answer), essays, examinations and checklists. Objective tests are appropriate for assessing the various levels of the hierarchy of educational objectives. Multiple choice tests, in particular, can be constructed in such a way as to test higher order thinking skills. Essays, when properly planned, can test the student’s grasp of the higher level cognitive skills, particularly in the areas of application, analysis, synthesis and judgment. However, when the essay question is not sufficiently precise and when its parameters are not properly defined, there is a tendency for the students to write irrelevant and unnecessary things just to fill in blank spaces. When this happens, both the teacher and the students will experience difficulty and frustration.

EXAMPLE: (POOR) Write an essay about the first EDSA revolution.

(BETTER) Write an essay about the first EDSA revolution, identifying its main characters and their respective roles.

In the second essay question, the assessment foci are narrowed down to: (a) the main characters
of the event, and (b.) the roles of each character in the revolution leading to the ouster of the incumbent
President at that time. It becomes clear what the teacher wishes to see and what the students are
supposed to write.

2.2.2 Product Rating Scales



A teacher is often tasked to rate products. Examples of products that are frequently rated in
education are book reports, maps, charts, diagrams, notebooks, essays and creative endeavors of all
sorts. An example of a product rating scale is the classic “handwriting” scale used in the California
Achievement Test, Form W (1957). There are prototype handwriting specimens of pupils and students
(of various grades and ages). The sample handwriting of a student is then moved along the scale until
the quality of the handwriting sample is most similar to the prototype handwriting. To develop a
product rating scale for the various products in education, the teacher must possess prototype products
over his/her years of experience.

2.2.3 Performance Tests

One of the most frequently used measurement instruments is the checklist. A performance
checklist consists of a list of behaviors that make up a certain type of performance (e.g. using a
microscope, typing a letter, solving a mathematics problem and so on). It is used to determine
whether or not an individual behaves in a certain (usually desired) way when asked to complete a
particular task. If a particular behavior is present when an individual is observed, the teacher places a
check opposite it on the list.

EXAMPLE: (Performance Checklist in Solving a mathematics problem)


Behavior:
1. Identifies the given information____
2. Identifies what is being asked____
3. Uses variables to replace the unknown____
4. Sets up an equation____
5. Performs algebraic operations____
6. Obtains an answer____
7. Checks if the answer makes sense____

2.2.4 Oral Questioning

The ancient Greeks used oral questioning extensively as an assessment method. Socrates himself,
considered the epitome of a teacher, was said to have handled his classes solely based on questioning
and oral interactions.

Oral questioning is an appropriate assessment method when the objectives are: (a) to assess the
student’s stock knowledge and/or (b) to determine the student’s ability to communicate ideas in
coherent verbal sentences. While oral questioning is indeed an option for assessment, several factors
need to be considered when using this option. Of particular significance are the student’s state of mind
and feelings, anxiety and nervousness in making oral presentations which could mask the student’s
true ability.

2.2.5 Observation and Self Reports

A tally sheet is a device often used by teachers to record the frequency of student behaviors,
activities or remarks. How many high school students follow instructions during a fire drill, for example? How many instances of aggression or helpfulness are observed when elementary students are observed in the playground? In Mr. Sual’s class in elementary statistics, how often do students ask questions about inference? Observational tally sheets are most useful in answering these kinds of
questions.

A self-checklist is a list of several characteristics or activities presented to the subjects of a study. The individuals are asked to study the list and then to place a mark opposite the characteristics which they possess or the activities which they have engaged in for a particular length of time. Self-checklists are often employed by teachers when they want to diagnose or to appraise the performance of students from the point of view of the students themselves.

Observation and self-reports are useful supplementary assessment methods when used in
conjunction with oral questioning and performance tests. Such methods can offset the negative impact
on the students brought about by their fears and anxieties during oral questioning or when performing
an actual task under observation. However, since there is a tendency to overestimate one’s own capability, it
may be useful to consider weighing self-assessment and observational reports against the results of
oral questioning and performance tests.

2.3 Properties of Assessment Methods

The quality of the assessment instrument and method used in education is very important since
the evaluation and judgments that the teacher gives on a student are based on the information he
obtains using these instruments. Accordingly, teachers follow a number of procedures to ensure that
the assessment process is valid and reliable.

Validity had traditionally been defined as the instrument’s ability to measure what it purports to
measure. We shall learn in this section that the concept has recently been modified to accommodate
a number of concerns regarding the scope of this traditional definition. Reliability, on the other hand,
is defined as the instrument’s consistency.

2.3.1 Validity

Validity, in recent years, has been defined as referring to the appropriateness, correctness,
meaningfulness and usefulness of the specific conclusions that a teacher reaches regarding the
teaching-learning situation. Content-validity refers to the content and format of the instrument. How
appropriate is the content? How comprehensive is it? Does the instrument logically get at the intended
variable or factor? How adequately does the sample of items or questions represent the content to be
assessed? Is the format appropriate? The content and format must be consistent with the definition of
the variable or factor to be measured. Some criteria for judging content validity are given as follows:

1. Do students have adequate experience with the type of task posed by the item?
2. Did the teachers cover sufficient material for most students to be able to answer the item correctly?
3. Does the item reflect the degree of emphasis received during instruction?

With these as a guide, a content validity table may be constructed in two (2) forms as provided below:

FORM A: ITEM VALIDITY

Criteria                                                    ITEM NO.
                                                            1   2   3   4   5   6
1. Material covered sufficiently
2. Most students are able to answer the item correctly
3. Students have prior experience with the type of task
4. Decision: Accept or reject

FORM B: ENTIRE TEST

Knowledge/Skills Area    Estimated Percent of Instruction    Percentage of Items Covered in Test
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation

Based on Form B, adjustments in the number of items that relate to a topic can be made
accordingly.
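
Example (Python sketch): One way to use Form B is to allocate test items in proportion to the estimated percent of instruction. The percentages and total test length below are hypothetical.

# Hypothetical estimated percent of instruction per level and an assumed 40-item test.
percent_of_instruction = {
    "Knowledge": 30, "Comprehension": 25, "Application": 20,
    "Analysis": 15, "Synthesis": 5, "Evaluation": 5,
}
total_items = 40

for level, pct in percent_of_instruction.items():
    suggested = round(total_items * pct / 100)
    print(f"{level:13s} {pct:3d}% of instruction -> about {suggested} items")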

While content validity is important, there are other types of validity that one needs to verify. Face
validity refers to the outward appearance of the test. It is the lowest form of test validity. A more
important type of validity is called criterion-related validity. In criterion related validity, the test item
is judged against a specific criterion e.g. relevance to a topic like the topic on conservation, for
example. The degree to which the item measures the criterion is said to constitute its criterion validity.
Criterion validity can also be measured by correlating the test with a known valid test (as a criterion).
Finally, a test needs to possess construct validity. A “construct” is another term for a factor, and we
already know that a group of variables that correlate highly with each other form a factor. It follows
that an item possesses construct validity if it loads highly on a given construct or factor. A technique
called factor analysis is required to determine the construct validity of an item. Such technique is
beyond the scope of this book.

2.3.2 Reliability

The reliability of an assessment method refers to its consistency. It is also a term that is
synonymous with dependability or stability.

Stability or internal consistency, as reliability measures, can be estimated in several ways. The split-half method involves scoring two halves (usually odd items versus even items) of a test separately for each person and then calculating a correlation coefficient for the two sets of scores. The coefficient indicates the degree to which the two halves of the test provide the same results and hence describes the internal consistency of the test. The reliability of the whole test is calculated using what is known as the Spearman-Brown prophecy formula:

Reliability of test = (2 x r_half) / (1 + r_half)

where r_half = correlation between the two halves of the test (the reliability of half the test)

The Kuder-Richardson formulas are the more frequently employed formulas for determining internal consistency, particularly KR-20 and KR-21. We present the latter formula since KR-20 is more difficult to calculate and requires a computer program:

KR-21 = [K / (K - 1)] x [1 - {M(K - M)} / (K x Variance)]

where K = number of items on the test, M = mean of the test scores, and Variance = variance of the test scores.

The mean of a set of scores is simply the sum of the scores divided by the number of scores; its variance is given by:

Variance = (sum of squared differences between individual scores and the mean) / (n - 1)

where n is the number of test takers.

Reliability of a test may also mean the consistency of test results when the same test is
administered at two different time periods. This is the test-retest method of estimating reliability. The
estimate of test reliability is then given by the correlation of the two test results.
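
Example (Python sketch): The three reliability estimates above can be computed in a few lines of plain Python. The item responses and retest scores below are invented; only the formulas follow the text (split-half with the Spearman-Brown correction, KR-21, and test-retest correlation).

from statistics import mean, variance   # variance() uses the n - 1 divisor

def pearson_r(x, y):
    # Pearson correlation coefficient between two lists of scores.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Invented 0/1 responses of five students to a 6-item test (one row per student).
responses = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 1],
]

# Split-half reliability with the Spearman-Brown correction.
odd_half = [sum(row[0::2]) for row in responses]     # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]    # items 2, 4, 6
r_half = pearson_r(odd_half, even_half)
split_half_reliability = (2 * r_half) / (1 + r_half)

# KR-21 = (K / (K - 1)) * (1 - M(K - M) / (K * variance of total scores)).
totals = [sum(row) for row in responses]
K, M, var = len(responses[0]), mean(totals), variance(totals)
kr21 = (K / (K - 1)) * (1 - (M * (K - M)) / (K * var))

# Test-retest: correlate total scores from two administrations of the same test.
retest_totals = [5, 2, 6, 2, 5]                      # invented second-sitting scores
test_retest_reliability = pearson_r(totals, retest_totals)

print(split_half_reliability, kr21, test_retest_reliability)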

2.3.3 Fairness

An assessment procedure needs to be fair. This means many things. First, students need to know
exactly what the learning targets are and what method of assessment will be used. If students do not
know what they are supposed to be achieving, then they could get lost in the maze of concepts being
discussed in class. Likewise, students have to be informed how their progress will be assessed in order
to allow them to strategize and optimize their performance.

Second, assessment has to be viewed as an opportunity to learn rather than an opportunity to weed out poor and slow learners. The goal should be that of diagnosing the learning process rather than the learning object.

Third, fairness also implies freedom from teacher-stereotyping. Some examples of stereotyping
include: boys are better than girls in Mathematics or girls are better than boys in language. Such

stereotyped images and thinking could lead to unnecessary and unwanted biases in the way that
teachers assess their students.

2.3.4 Practicality and Efficiency

Another quality of a good assessment procedure is practicality and efficiency. An assessment procedure should be practical in the sense that the teacher is familiar with it, that it does not require too much time, and that it is, in fact, implementable. A complex assessment procedure tends to be difficult to score and interpret, resulting in a lot of misdiagnoses or too long a feedback period, which may render the test inefficient.

2.3.5 Ethics in Assessment

The term “ethics” refers to questions of right and wrong. When teachers think about ethics, they
need to ask themselves if it is right to assess a specific knowledge or investigate a certain question. Are
there some aspects of the teaching-learning situation that should not be assessed? Here are some
situations in which assessments may not be called for:
 Requiring students to answer checklists of their sexual fantasies;
 Asking elementary pupils to answer sensitive questions without consent of their parents;
 Testing the mental abilities of pupils using an instrument whose validity and reliability are
unknown;

When a teacher thinks about ethics, the basic question to ask in this regard is: “Will any physical
or psychological harm come to any one as a result of the assessment or testing?” Naturally, no teacher
would want this to happen to any of his/her students.

Webster defines ethical (behavior) as “conforming to the standards of conduct of a given profession or group”. What teachers consider ethical is therefore largely a matter of agreement among them. Perhaps the most important ethical consideration of all is the fundamental responsibility of a teacher to do all in his or her power to ensure that participants in an assessment program are protected from physical or psychological harm, discomfort or danger that may arise due to the testing procedure.
For instance, a teacher who wishes to test a student’s physical endurance may ask students to climb a
very steep mountain thus endangering them physically!

Test results and assessment results are confidential. Such results should be known only by the
student concerned and the teacher. Results should be communicated to the students in such a way that
other students would not be in possession of information pertaining to any specific member of the
class.

The third ethical issue in assessment is deception. Should students be deceived? There are
instances in which it is necessary to conceal the objective of the assessment from the students in order
to ensure fair and impartial results. When this is the case, the teacher has a special responsibility to (a)
determine whether the use of such techniques is justified by the educational value of the assessment,
(b) determine whether alternative procedures are available that do not make use of concealment, and
(c) ensure that students are provided with sufficient explanation as soon as possible.

Finally, the temptation to assist certain individuals in class during assessment or testing is ever
present. In this case, it is best if the teacher does not administer the test himself if he believes that such
a concern may, at a later time, be considered unethical.

DEVELOPMENT OF ASSESSMENT TOOLS:
KNOWLEDGE AND REASONING
Types of Objective Tests
We are concerned with developing objective tests for assessing the attainment of educational
objectives based on Bloom’s taxonomy in this Chapter. For this purpose, we restrict our attention to
the following types of objective tests: (a) true-false items, (b) multiple-choice type items, (c) matching
types, (d) enumeration and filling of blanks and (e) essays. The first four types of objective tests are
used to test the first four to five levels of the hierarchy of educational objectives while the last (essay)
is used for testing higher order thinking skills.
Development of objective tests requires careful planning and expertise in terms of actual test
construction. The more seasoned teachers can produce true-false items that can test even higher order
thinking skills and not just mere rote memory learning. Essays are easier to construct than the other
types of objective tests but the difficulty with which objective grades are derived from essay
examinations often discourages teachers from using this particular form of examination in actual
practice.

3.2 Planning a Test and Construction of Table of Specifications (TOS)


The important steps in planning for a test are:
 Identifying test objectives
 Deciding on the type of objective test to be prepared
 Preparing a Table of Specifications (TOS)
 Constructing the draft test items
 Try-out and validation

We will discuss these steps in more detail below:

Identifying Test Objectives. An objective test, if it is to be comprehensive, must cover the various levels of Bloom’s taxonomy. Each objective consists of a statement of what is to be achieved and, preferably, by what percentage of the students.
Example. We want to construct a test on a topic:
“Subject-Verb Agreement in English” for a Grade V class. The following are typical objectives:
Knowledge. The students must be able to identify the subject and the verb in a given sentence.
Comprehension. The students must be able to determine the appropriate form of a verb to be
used given the subject of a sentence.
Application. The students must be able to write sentences observing rules to be followed
regarding subject-verb agreement.

Analysis. The students must be able to break down a given sentence into its subject and
predicate.
Synthesis. The students must be able to formulate rules to be followed regarding subject-verb
agreement.
Deciding on the type of objective test. The test objectives dictate the kind of objective tests that
will be designed and constructed by the teacher. For instance, for the first four (4) levels, we may want
to construct a multiple-choice type of test while for application and judgment, we may opt to give an
essay test or a modified essay test.
Preparing a table of specifications (TOS). A table of specifications or TOS is a test map that
guides the teacher in constructing a test. The TOS ensures that there is a balance between items that
test lower level thinking skills and those which test higher order thinking skills (or alternatively, a
balance between easy and difficult items) in the test. The simplest TOS consists of four (4) columns: (a)
level of objective to be tested, (b) statement of objective, (c) item numbers where such an objective is
being tested, and (d) Number of items and percentage out of the total for that particular objective. A
prototype table is shown below:

Table of Specifications Prototype

LEVEL              OBJECTIVE                                ITEM NUMBERS         NO.    %
1. Knowledge       Identify subject-verb                    1, 3, 5, 7, 9          5
2. Comprehension   Forming appropriate verb forms           2, 4, 6, 8, 10         5
3. Application     Determining subject and predicate        11, 13, 15, 17, 19     5
4. Analysis        Formulating rules on agreement           12, 14, 16, 18, 20     5
5. Synthesis       Writing of sentences observing           Part II            10 pts
                   rules on subject-verb agreement
TOTAL                                                                             30

In the table of specifications above, we see that there are five items that deal with knowledge, namely items 1, 3, 5, 7 and 9. Similarly, from the same table we see that five items represent analysis, namely items 12, 14, 16, 18 and 20. The first four levels of Bloom’s taxonomy are equally represented in the test, while synthesis (tested through the essay in Part II) is weighted equivalent to ten (10) points, or double the weight given to any of the first four levels. The table of specifications guides the teacher in formulating the test. As we can see, the TOS also ensures that each of the objectives in the hierarchy of educational objectives is well represented in the test. As such, the resulting test constructed by the teacher will be more or less comprehensive. Without the table of specifications, the tendency of the test maker is to focus too much on facts and concepts at the knowledge level.
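
Example (Python sketch): The TOS can also be kept as a simple data structure so that the balance of items can be checked mechanically. The dictionary representation below is just one possible choice, based on the prototype table above.

# Table of Specifications from the prototype above, held as a dictionary.
tos = {
    "Knowledge":     [1, 3, 5, 7, 9],
    "Comprehension": [2, 4, 6, 8, 10],
    "Application":   [11, 13, 15, 17, 19],
    "Analysis":      [12, 14, 16, 18, 20],
}
essay_points = 10   # Synthesis, tested through the Part II essay

objective_items = sum(len(items) for items in tos.values())
total_points = objective_items + essay_points   # 20 + 10 = 30

for level, items in tos.items():
    print(f"{level:13s} {len(items)} items ({len(items) / total_points:.0%} of {total_points} points)")
print(f"Synthesis (essay) {essay_points} points ({essay_points / total_points:.0%})")
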
Constructing the test items. The actual construction of the test items follows the TOS. As a general rule, it is advised that the number of items constructed in the draft be double the desired number of items. For instance, if there are five (5) knowledge level items to be included in the final test form, then at least ten (10) knowledge level items should be included in the draft. The subsequent test try-out and item analysis will most likely eliminate many of the constructed items in the draft (either they are too difficult, too easy or non-discriminatory); hence, it will be necessary to construct more items than will actually be included in the final test form.
Item analysis and try-out. The test draft is tried out on a group of pupils or students. The purpose of this try-out is to determine: (a) the item characteristics, through item analysis, and (b) the characteristics of the test itself: validity, reliability and practicality.

3.3 Constructing a True-False Test


Binomial-choice tests are tests that have only two (2) options, such as true or false, right or wrong, good or bad, and so on. A student who knows nothing of the content of the examination would have a 50% chance of getting the correct answer by sheer guesswork. Although correction-for-guessing formulas exist (a sketch is given after the rules below), it is best that the teacher ensures that a true-false item is able to discriminate properly between those who know and those who are just guessing. A modified true-false test can offset the effect of guessing by requiring students to explain their answer and by disregarding a correct answer if the explanation is incorrect. Here are some rules of thumb in constructing true-false items.
Rule 1. Do not give a hint (inadvertently) in the body of the question.
Example. The Philippines gained its independence in 1898 and therefore celebrated its
centennial year in 2000.____

Obviously, the answer is FALSE because 100 years from 1898 is not 2000 but
1998.
Rule 2. Avoid using the words “always”, “never”, “often” and other adverbs that tend to make statements either always true or always false.
Example: Christmas always falls on a Sunday because it is a Sabbath day._______.
Statements that use the word “always” are almost always false. A test-wise student can easily guess his way through a test like this and get high scores even if he does not know anything about the subject.
Rule 3: Avoid long sentences as these tend to be “true”. Keep sentences short.
Example: Tests need to be valid, reliable and useful, although it would require a great amount of time and effort to ensure that tests possess these characteristics.______
Notice that the statement is true. However, we are also not sure which part of the sentence is deemed true by the student. It is just fortunate that in this case, all parts of the above sentence are true. The following example illustrates what can go wrong in long sentences:
Example: Tests need to be valid, reliable and useful since it takes very little time, money and effort to construct tests with these characteristics.______
The first part of the sentence is true but the second part is debatable and may, in
fact, be false. Thus, a “true” response is correct and also, a “false” response is correct.
Rule 4. Avoid trick statements with some minor misleading word, spelling anomaly, misplaced phrase, etc. A test-wise student who does not know the subject matter may detect this strategy and thus get the answer correctly.
Example: True or False. The Principle of our school is Mr. Albert P. Panadero.
The principal’s name may actually be correct, but since the word “Principle” is misspelled and the entire sentence takes on a different meaning, the answer would be false! This is an example of a tricky but utterly useless item.

Rule 5. Avoid quoting verbatim from reference materials or textbooks. This practice sends the wrong signal to the students that it is necessary to memorize the textbook word for word, and thus the acquisition of higher level thinking skills is not given due importance.

Rule 6. Avoid specific determiners or give-away qualifiers. Students quickly learn that strongly worded statements are more likely to be false than true, for example, statements with “never”, “no”, “all” or “always”. Moderately worded statements are more likely to be true than false. Statements with “many”, “often”, “sometimes”, “generally”, “frequently” or “some” should therefore also be avoided.

Rule 7. With true or false questions, avoid a grossly disproportionate number of either true or
false statements or even patterns in the occurrence of true and false statements.

3.4 Multiple Choice Tests


A generalization of the true-false test, the multiple choice type of test offers the student more than two (2) options per item to choose from. Each item in a multiple choice test consists of two parts: (a) the stem, and (b) the options. In the set of options, there is a “correct” or “best” option, while all the others are considered “distracters”. The distracters are chosen in such a way that they are attractive to those who do not know the answer or are guessing but, at the same time, have no appeal to those who actually know the answer. It is this feature of multiple choice type tests that allows the teacher to test higher order thinking skills even if the options are clearly stated. As with true-false items, there are certain rules of thumb to be followed in constructing multiple choice tests.
Guidelines for Constructing Multiple Choice Items
1. Do not use unfamiliar words, terms and phrases. The ability of the item to discriminate, or its level of difficulty, should stem from the subject matter rather than from the wording of the question.
Example: What would be the system reliability of a computer system whose slave and peripherals are connected in parallel circuits and each one has a known time-to-failure probability of 0.05?
A student completely unfamiliar with the terms “slave” and “peripherals” may not be able to answer correctly even if he knew the subject matter of reliability.
2. Do not use modifiers that are vague and whose meanings can differ from one person to the next
such as much, often, usually etc.

Example:
Much of the process of photosynthesis takes place in the:
a. bark
b. leaf
c. stem
The qualifier “much” is vague and could have been replaced by a more specific qualifier like “90% of the photosynthetic process” or some similar phrase that would be more precise.

3. Avoid complex or awkward word arrangements. Also, avoid use of negatives in the stem as this may
add unnecessary comprehension difficulties.
Example:

(Poor) As President of the Republic of the Philippines, Corazon Cojuangco Aquino would stand
next to which President of the Philippines Republic subsequent after Corazon C. Aquino?
4. Do not use negatives or double negatives as such statements tend to be confusing. It is best to use
simpler sentences rather than sentences that would require expertise in grammatical construction.
Example:
(Poor) Which of the following will not cause inflation in the Philippine economy?
(Better) Which of the following will cause inflation in the Philippine economy?
Poor: What does the statement “Development patterns acquired during the formative years are
NOT Unchangeable” imply?
A.
B.
C.
D.
Better: What does the statement “Development patterns acquired during the formative years are
changeable” imply?
A.
B.
C.
D.
5.) Each item stem should be as short as possible; otherwise you risk testing more for reading and
comprehension skills.
6.) Distracters should be equally plausible and attractive.
Example:
The short story: May Day’s Eve, was written by which Filipino author?
a. Jose Garcia Villa
b. Nick Joaquin
c. Genoveva Edrosa Matute
d. Robert Frost
e. Edgar Allan Poe
If distracters had all been Filipino authors, the value of the item would be greatly increased. In this
particular instance, only the first three carry the burden of the entire item since the last two can be
essentially disregarded by the students.
7.) All multiple choice options should be grammatically consistent with the stem.
8.) The length, explicitness, or degree of technicality of alternatives should not be the determinants of
the correctness of the answer. The following is an example of this rule:
Example:
If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles are congruent
b. similar
c. equiangular and therefore, must also be congruent
d. equilateral if they are equiangular
The correct choice, “b” may be obvious from its length and explicitness alone. The other
choices are long and tend to explain why they must be the correct choices forcing the students to
think that they are, in fact, not the correct answers!
9.) Avoid stems that reveal the answer to another item.
10.) Avoid alternatives that are synonymous with others or those that include or overlap others.

Example:
What causes ice to transform from solid state to liquid state?
a. Change in temperature
b. Changes in pressure
c. Change in the chemical composition
d. Change in heat levels
The options a and d are essentially the same. Thus, a student who spots these identical choices would right away narrow down the field of choices to b and c, since two identical options cannot both be the single correct answer. The identical distracters would then play no significant role in increasing the value of the item.
11.) Avoid presenting sequenced items in the same order as in the text.
12.) Avoid use of assumed qualifiers that many examinees may not be aware of.
13.) Avoid use of unnecessary words or phrases which are not relevant to the problem at hand (unless
such discriminating ability is the primary intent of the evaluation). The item’s value is
particularly damaged if the unnecessary material is designed to distract or mislead. Such items
test the student’s reading comprehension rather than knowledge of the subject matter.
Example: The side opposite the thirty-degree angle in a right triangle is equal to half the length of the hypotenuse. If the sine of a 30-degree angle is 0.5 and the hypotenuse is 5, what is the length of the side opposite the 30-degree angle?
a. 2.5
b. 3.5
c. 5.5
d. 1.5
The sine of the 30-degree angle is really quite unnecessary since the first sentence already gives the method for finding the length of the side opposite the thirty-degree angle. This is a case of a teacher who wants to make sure that no student in his class gets the wrong answer!
14.) Avoid use of non-relevant sources of difficulty such as requiring a complex calculation when only
knowledge of a principle is being tested.
Note in the previous example, knowledge of the sine of a 30-degree angle would have led some
students to use the sine formula for calculation even if a simpler approach would have sufficed.
15.) Avoid extreme specificity requirements in responses.
16.) Include as much of the item as possible in the stem. This allows less repetition and shorter choice
options.
17.) Use the “None of the above” option only when the keyed answer is totally correct. When choice of
the “best” response is needed, “none of the above” is not appropriate, since the implication has
already been made that the correct response may be partially inaccurate.
18.) Note that use of “all of the above” may allow credit for partial knowledge. In a multiple option
item, (allowing only one option choice) if a student only knew that two (2) options were correct,
he could then deduce the correctness of “all of the above”. This assumes you are allowed only
one correct choice.
19.) Having compound response choices may purposefully increase difficulty of an item.
20.) The difficulty of a multiple choice item may be controlled by varying the homogeneity or degree of
similarity of responses. The more homogeneous, the more difficult the item.

Example:
(Less Homogeneous)
Thailand is located in:

a. Southeast Asia
b. Eastern Europe
c. South America
d. East Africa
e. Central America

(More Homogeneous)
Thailand is located next to:
a. Laos and Kampuchea
b. India and China
c. China and Malaya
d. Laos and China
e. India and Malaya

3.5 Matching Type and Supply Items


The matching type items may be considered as modified multiple choice type items where
the choices progressively reduce as one successfully matches the items on the left with the items on
the right.
Example: Match the items in column A with the items in column B.

A B
___1. Magellan a. First President of the Republic
___2. Mabini b. National Hero
___3. Rizal c. Discovered the Philippines
___4. Lapu-Lapu d. Brain of Katipunan
___5. Aguinaldo e. The great painter
f. Defended Limasawa Island
Normally, column B will contain more items than column A to prevent guessing on the part
of the students. Matching type items, unfortunately, often test lower order thinking skills
(knowledge level) and are unable to test higher order thinking skills such as application and
judgment skills.
A variant of the matching type items is the data sufficiency and comparison type of test
illustrated below:
Example: Write G if the item on the left is greater than the item on the right; L if the item on the left is less than the item on the right; E if the item on the left equals the item on the right; and D if the relationship cannot be determined.

A B
1. Square root of 9 ______ a. -3
2. Square of 25 ______ b. 615
3. 36 inches ______ c. 3 meters
4. 4 feet ______ d. 48 inches
5. 1 kilogram ______ e. 1 pound

The data sufficiency test above can, if properly constructed, test higher order thinking skills.
Each item goes beyond simple recall of facts and, in fact, requires the student to make decisions.
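
Example (Python sketch): For illustration, the comparisons in the example above can be checked with a few lines of arithmetic. The unit conversions used (1 meter = 39.37 inches, 1 kilogram = 2.2 pounds) are approximate.

import math

def compare(a, b):
    # Return G, L or E according to the rule in the example above.
    if math.isclose(a, b):
        return "E"
    return "G" if a > b else "L"

INCHES_PER_METER = 39.37        # approximate conversion
POUNDS_PER_KILOGRAM = 2.2       # approximate conversion

print(compare(math.sqrt(9), -3))              # principal root 3 > -3   -> G
print(compare(25 ** 2, 615))                  # 625 > 615               -> G
print(compare(36, 3 * INCHES_PER_METER))      # 36 in < about 118 in    -> L
print(compare(4 * 12, 48))                    # 48 in = 48 in           -> E
print(compare(1 * POUNDS_PER_KILOGRAM, 1))    # about 2.2 lb > 1 lb     -> G
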
Another useful device for testing lower order thinking skills is the supply type of test. Like
the multiple choice test, the items in this kind of test consist of a stem and a blank where the
students would write the correct answer.
Example: The study of life and living organisms is called______________.
Supply type tests depend heavily on the way that the stems are constructed. These tests allow
for one and only one answer and, hence, often test only the students’ knowledge. It is, however,
possible to construct supply type of tests that will test higher order thinking as the following
example will show:

Example: Write an appropriate synonym for each of the following. Each blank corresponds to
a letter:
Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _
The appropriate synonym for the first is CHANGE with six (6) letters while the appropriate
synonym for the second is GROW with four (4) letters. Notice that these questions require not
only mere recall of words but also understanding of these words.

3.6 Essays
Essays, classified as non-objective tests, allow for the assessment of higher order thinking skills.
Such tests require students to organize their thoughts on a subject matter in coherent sentences in order
to inform an audience. In essay tests, students are requested to write one or more paragraphs on a
specified topic.
Essay questions can be used to measure attainment for a variety of objectives. Stecklein (1955)
has listed 14 types of abilities that can be measured by essay items:
1. Comparisons between two or more things
2. The development and defense of an opinion
3. Questions of cause and effect
4. Explanations of meanings
5. Summarizing of information in a designated area
6. Analysis
7. Knowledge of relationships
8. Illustrations of rules, principles, procedure, and applications
9. Applications of rules, laws, and principles to new situations
10. Criticisms of the adequacy, relevance, or correctness of a concept, idea, or information
11. Formulation of new questions and problems
12. Reorganization of facts
13. Discriminations between objects, concepts, or events
14. Inferential thinking
Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.
The following are rules of thumb which facilitate the grading of essay papers:
Rule 1: Phrase the direction in such a way that students are guided on the key concepts to be
included.
Example: Write an essay on the topic: “Plant Photosynthesis” using the following keywords and
phrases: chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, stomata.

Note that the students are properly guided in terms of the keywords that the teacher is
looking for in this essay examination. An essay such as the one given below will get a
score of zero (0). Why?

Plant Photosynthesis
Nature has its own way of ensuring the balance between food producers and
consumers. Plants are considered producers of food for animals. Plants produce food
for animals through a process called photosynthesis. It is a complex process that
combines various natural elements on earth into the final product which animals can
consume in order to survive. Naturally, we all need to protect plants so that we will
continue to have food on our table. We should discourage burning of grasses, cutting of
trees and illegal logging. If the leaves of plants are destroyed, they cannot perform
photosynthesis and animals will also perish.
Rule 2: Inform the students on the criteria to be used for grading their essays. This rule allows
the students to focus on relevant and substantive materials rather than on peripheral
and unnecessary facts and bits of information.
Example: Write an essay on the topic: “Plant Photosynthesis” using the keywords indicated. You
will be graded according to the following criteria: (a) coherence, (b) accuracy of
statements, (c) use of keywords, (d) clarity and (e) extra points for innovative
presentation of ideas.
Rule 3: Put a time limit on the essay test.
Rule 4: Decide on your essay grading system prior to getting the essays of your students.
Rule 5: Evaluate all of the students' answers to one question before proceeding to the next
question.
Scoring or grading essay tests question by question, rather than student by student,
makes it possible to maintain a more uniform standard for judging the answers to each
question. This procedure also helps offset the halo effect in grading. When all of the
answers on one paper are read together, the grader’s impression of the paper as a whole
is apt to influence the grades he assigns to the individual answers. Grading question by
question, of course, prevents the formation of this overall impression of the student’s
paper. Each answer is more apt to be judged on its own merits when it is read and
compared with other answers to the same question, than when it is read and compared
with other answers by the same student.
Rule 6: Evaluate answers to essay questions without knowing the identity of the writer. This is
another attempt to control personal bias during scoring. Answers to essay questions
should be evaluated in terms of what is written, not in terms of what is known about the
writers from other contacts with them. The best way to prevent our prior knowledge
from influencing our judgment is to evaluate each answer without knowing the identity
of the writer. This can be done by having the students write their names on the back of
the paper or by using code numbers in place of names.
Rule 7: Whenever possible, have two or more persons grade each answer. The best way to
check on the reliability of the scoring of essay answers is to obtain two or more
independent judgments. Although this may not be a feasible practice for routine
classroom testing, it might be done periodically with a fellow teacher (one who is
equally competent in the area). Obtaining two or more independent ratings becomes
especially vital where the results are to be used for important and irreversible decisions,
such as in the selection of students for further training or for special awards. Here the
pooled ratings of several competent persons may be needed to attain a level of reliability
that is commensurate with the significance of the decision being made.
Some teachers use cumulative criteria, i.e., adding the weights given to each
criterion, as the basis for grading, while others use the reverse. In the latter method, each
student begins with a score of 100. Points are then deducted every time the teacher
encounters a mistake or when a criterion is missed by the student in his essay.
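To make the two grading schemes concrete, here is a minimal Python sketch; the criteria, weights, and deductions are hypothetical and only illustrate the arithmetic of cumulative versus deduction-based scoring.

# Minimal sketch of the two essay-grading schemes described above.
# The criteria, weights, and deductions are hypothetical examples.

CRITERIA_WEIGHTS = {          # weights for the cumulative scheme
    "coherence": 30,
    "accuracy": 30,
    "use of keywords": 20,
    "clarity": 20,
}

def cumulative_score(ratings):
    """Add the points earned on each criterion (rating of 0.0-1.0 per criterion)."""
    return sum(CRITERIA_WEIGHTS[c] * ratings.get(c, 0.0) for c in CRITERIA_WEIGHTS)

def deduction_score(deductions, starting_score=100):
    """Start from 100 and subtract points for every mistake or missed criterion."""
    return max(starting_score - sum(deductions), 0)

# Example: a paper rated 0.8 on every criterion, or a paper with 5, 10 and 3 points deducted.
print(cumulative_score({"coherence": 0.8, "accuracy": 0.8,
                        "use of keywords": 0.8, "clarity": 0.8}))   # 80.0
print(deduction_score([5, 10, 3]))                                  # 82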

Benjamin Bloom's Taxonomy of Behavioral Objectives

In 1956, Benjamin Bloom headed a group of educational psychologists who developed a classification
of levels of intellectual behavior important in learning. This became a taxonomy including three
overlapping domains: the cognitive, affective and psychomotor.

Cognitive learning is demonstrated by knowledge recall and the intellectual skills: comprehending
information, organizing ideas, analyzing and synthesizing data, applying knowledge, choosing among
alternatives in problem-solving, and evaluating ideas or actions. This domain on the acquisition and
use of knowledge is predominant in the majority of courses. Bloom identified six levels within the
cognitive domain, from the simple recall or recognition of facts, as the lowest level, through
increasingly more complex and abstract mental levels, to the highest order which is classified as
evaluation. Verb examples that represent intellectual activity on each level are listed here, and each
level is linked to questions appropriate to the level.

1. Knowledge: arrange, define, duplicate, label, list, memorize, name, order, recognize, relate,
recall, repeat, reproduce, state.
2. Comprehension: classify, describe, discuss, explain, express, identify, indicate, locate,
recognize, report, restate, review, select, translate.
3. Application: apply, choose, demonstrate, dramatize, employ, illustrate, interpret, operate,
practice, schedule, sketch, solve, use, write.
4. Analysis: analyze, appraise, calculate, categorize, compare, contrast, criticize, differentiate,
discriminate, distinguish, examine, experiment, question, test.
5. Synthesis: arrange, assemble, collect, compose, construct, create, design, develop, formulate,
manage, organize, plan, prepare, propose, set up, write.
6. Evaluation: appraise, argue, assess, attach, choose, compare, defend, estimate, judge, predict,
rate, score, select, support, value, evaluate.

Affective learning is demonstrated by behaviors indicating attitudes of awareness, interest, attention,


concern, and responsibility, ability to listen and respond in interactions with others, and ability to
demonstrate those attitudinal characteristics or values which are appropriate to the test situation and
the field of study. This domain relates to emotions, attitudes, appreciations, and values, such as

enjoying, conserving, respecting, and supporting. Verbs applicable to the affective domain include
accepts, attempts, challenges, defends, disputes, joins, judges, praises, questions, shares, supports, and
volunteers.

Psychomotor learning is demonstrated by physical skills; coordination, dexterity, manipulation,


grace, strength, speed; actions which demonstrate the fine motor skills such as use of precision
instruments or tools, or actions which evidence gross motor skills such as the use of the body in dance
or athletic performance. Verbs applicable to the psychomotor domain include bend, grasp, handle,
operate, reach, relax, shorten, stretch, write, differentiate (by touch), express (facially), perform
(skillfully).

 KNOWLEDGE
o remembering;
o memorizing;
o recognizing;
o identification and
o recall of information
 Who, what, when, where, how ...?
 Describe

 COMPREHENSION
o interpreting;
o translating from one medium to another;
o describing in one's own words;
o organization and selection of facts and ideas
 Retell...

 APPLICATION
o problem solving;
o applying information to produce some result;
o use of facts, rules and principles
 How is...an example of...?
 How is...related to...?
 Why is...significant?

 ANALYSIS
o subdividing something to show how it is put together;
o finding the underlying structure of a communication;
o identifying motives;
o separation of a whole into component parts
 What are the parts or features of...?
 Classify...according to...
 Outline/diagram...
 How does...compare/contrast with...?

 What evidence can you list for...?

 SYNTHESIS
o creating a unique, original product that may be in verbal form or may be a physical
object;
o combination of ideas to form a new whole
 What would you predict/infer from...?
 What ideas can you add to...?
 How would you create/design a new...?
 What might happen if you combined...?
 What solutions would you suggest for...?

 EVALUATION
o making value decisions about issues;
o resolving controversies or differences of opinion;
o development of opinions, judgements or decisions
 Do you agree...?
 What do you think about...?
 What is the most important...?
 Place the following in order of priority...
 How would you decide about...?
 What criteria would you use to assess...?

Learning Domains or Bloom's


Taxonomy

The Three Types of Learning


There is more than one type of learning. A committee of colleges, led by Benjamin Bloom,
identified three domains of educational activities:

o Cognitive: mental skills (Knowledge)
o Affective: growth in feelings or emotional areas (Attitude)
o Psychomotor: manual or physical skills (Skills)

Since the work was produced by higher education, the words tend to be a little bigger than we
normally use. Domains can be thought of as categories. Trainers often refer to these three
domains as KSA (Knowledge, Skills, and Attitude). This taxonomy of learning behaviors can
be thought of as "the goals of the training process." That is, after the training session, the
learner should have acquired new skills, knowledge, and/or attitudes.

The committee also produced an elaborate compilation for the cognitive and affective
domains, but none for the psychomotor domain. Their explanation for this oversight was that
they had little experience in teaching manual skills at the college level (I guess they
never thought to check with their sports or drama department).

This compilation divides the three domains into subdivisions, starting from the simplest
behavior to the most complex. The divisions outlined are not absolutes and there are other
systems or hierarchies that have been devised in the educational and training world.
However, Bloom's taxonomy is easily understood and is probably the most widely applied
one in use today.

Cognitive (1)

The cognitive domain involves knowledge and the development of intellectual skills. This
includes the recall or recognition of specific facts, procedural patterns, and concepts that
serve in the development of intellectual abilities and skills. There are six major categories,
which are listed in order below, starting from the simplest behavior to the most complex. The
categories can be thought of as degrees of difficulty. That is, the first one must be mastered
before the next one can take place.

Knowledge: Recall data or information.
Examples: Recite a policy. Quote prices from memory to a customer. Knows the safety rules.
Key Words: defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states.

Comprehension: Understand the meaning, translation, interpolation, and interpretation of instructions and problems. State a problem in one's own words.
Examples: Rewrites the principles of test writing. Explain in one's own words the steps for performing a complex task. Translates an equation into a computer spreadsheet.
Key Words: comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, gives examples, infers, interprets, paraphrases, predicts, rewrites, summarizes, translates.

Application: Use a concept in a new situation or unprompted use of an abstraction. Applies what was learned in the classroom into novel situations in the work place.
Examples: Use a manual to calculate an employee's vacation time. Apply laws of statistics to evaluate the reliability of a written test.
Key Words: applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows, solves, uses.

Analysis: Separates material or concepts into component parts so that its organizational structure may be understood. Distinguishes between facts and inferences.
Examples: Troubleshoot a piece of equipment by using logical deduction. Recognize logical fallacies in reasoning. Gathers information from a department and selects the required tasks for training.
Key Words: analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates.

Synthesis: Builds a structure or pattern from diverse elements. Put parts together to form a whole, with emphasis on creating a new meaning or structure.
Examples: Write a company operations or process manual. Design a machine to perform a specific task. Integrates training from several sources to solve a problem. Revises a process to improve the outcome.
Key Words: categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes.

Evaluation: Make judgments about the value of ideas or materials.
Examples: Select the most effective solution. Hire the most qualified candidate. Explain and justify a new budget.
Key Words: appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports.

Affective (2)

This domain includes the manner in which we deal with things emotionally, such as feelings,
values, appreciation, enthusiasms, motivations, and attitudes. The five major categories are
listed from the simplest behavior to the most complex:

Receiving Phenomena: Awareness, willingness to hear, selected attention.
Examples: Listen to others with respect. Listen for and remember the name of newly introduced people.
Key Words: asks, chooses, describes, follows, gives, holds, identifies, locates, names, points to, selects, sits, erects, replies, uses.

Responding to Phenomena: Active participation on the part of the learners. Attends and reacts to a particular phenomenon. Learning outcomes may emphasize compliance in responding, willingness to respond, or satisfaction in responding (motivation).
Examples: Participates in class discussions. Gives a presentation. Questions new ideals, concepts, models, etc. in order to fully understand them. Knows the safety rules and practices them.
Key Words: answers, assists, aids, complies, conforms, discusses, greets, helps, labels, performs, practices, presents, reads, recites, reports, selects, tells, writes.

Valuing: The worth or value a person attaches to a particular object, phenomenon, or behavior. This ranges from simple acceptance to the more complex state of commitment. Valuing is based on the internalization of a set of specified values, while clues to these values are expressed in the learner's overt behavior and are often identifiable.
Examples: Demonstrates belief in the democratic process. Is sensitive towards individual and cultural differences (values diversity). Shows the ability to solve problems. Proposes a plan for social improvement and follows through with commitment. Informs management on matters that one feels strongly about.
Key Words: completes, demonstrates, differentiates, explains, follows, forms, initiates, invites, joins, justifies, proposes, reads, reports, selects, shares, studies, works.

Organization: Organizes values into priorities by contrasting different values, resolving conflicts between them, and creating a unique value system. The emphasis is on comparing, relating, and synthesizing values.
Examples: Recognizes the need for balance between freedom and responsible behavior. Accepts responsibility for one's behavior. Explains the role of systematic planning in solving problems. Accepts professional ethical standards. Creates a life plan in harmony with abilities, interests, and beliefs. Prioritizes time effectively to meet the needs of the organization, family, and self.
Key Words: adheres, alters, arranges, combines, compares, completes, defends, explains, formulates, generalizes, identifies, integrates, modifies, orders, organizes, prepares, relates, synthesizes.

Internalizing Values (Characterization): Has a value system that controls their behavior. The behavior is pervasive, consistent, predictable, and most importantly, characteristic of the learner. Instructional objectives are concerned with the student's general patterns of adjustment (personal, social, emotional).
Examples: Shows self-reliance when working independently. Cooperates in group activities (displays teamwork). Uses an objective approach in problem solving. Displays a professional commitment to ethical practice on a daily basis. Revises judgments and changes behavior in light of new evidence. Values people for what they are, not how they look.
Key Words: acts, discriminates, displays, influences, listens, modifies, performs, practices, proposes, qualifies, questions, revises, serves, solves, verifies.

Psychomotor (3)

The psychomotor domain includes physical movement, coordination, and use of the motor-
skill areas. Development of these skills requires practice and is measured in terms of speed,
precision, distance, procedures, or techniques in execution. The seven major categories are
listed from the simplest behavior to the most complex:

Perception: The ability to use sensory cues to guide motor activity. This ranges from sensory stimulation, through cue selection, to translation.
Examples: Detects non-verbal communication cues. Estimate where a ball will land after it is thrown and then moving to the correct location to catch the ball. Adjusts heat of stove to correct temperature by smell and taste of food. Adjusts the height of the forks on a forklift by comparing where the forks are in relation to the pallet.
Key Words: chooses, describes, detects, differentiates, distinguishes, identifies, isolates, relates, selects.

Set: Readiness to act. It includes mental, physical, and emotional sets. These three sets are dispositions that predetermine a person's response to different situations (sometimes called mindsets).
Examples: Knows and acts upon a sequence of steps in a manufacturing process. Recognizes one's abilities and limitations. Shows desire to learn a new process (motivation). NOTE: This subdivision of Psychomotor is closely related to the "Responding to Phenomena" subdivision of the Affective domain.
Key Words: begins, displays, explains, moves, proceeds, reacts, shows, states, volunteers.

Guided Response: The early stages in learning a complex skill that include imitation and trial and error. Adequacy of performance is achieved by practicing.
Examples: Performs a mathematical equation as demonstrated. Follows instructions to build a model. Responds to the hand-signals of the instructor while learning to operate a forklift.
Key Words: copies, traces, follows, reacts, reproduces, responds.

Mechanism: This is the intermediate stage in learning a complex skill. Learned responses have become habitual and the movements can be performed with some confidence and proficiency.
Examples: Use a personal computer. Repair a leaking faucet. Drive a car.
Key Words: assembles, calibrates, constructs, dismantles, displays, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches.

Complex Overt Response: The skillful performance of motor acts that involve complex movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated performance, requiring a minimum of energy. This category includes performing without hesitation, and automatic performance. For example, players often utter sounds of satisfaction or expletives as soon as they hit a tennis ball or throw a football, because they can tell by the feel of the act what the result will produce.
Examples: Maneuvers a car into a tight parallel parking spot. Operates a computer quickly and accurately. Displays competence while playing the piano.
Key Words: assembles, builds, calibrates, constructs, dismantles, displays, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches. NOTE: The Key Words are the same as for Mechanism, but will have adverbs or adjectives that indicate that the performance is quicker, better, more accurate, etc.

Adaptation: Skills are well developed and the individual can modify movement patterns to fit special requirements.
Examples: Responds effectively to unexpected experiences. Modifies instruction to meet the needs of the learners. Performs a task with a machine that it was not originally intended to do (the machine is not damaged and there is no danger in performing the new task).
Key Words: adapts, alters, changes, rearranges, reorganizes, revises, varies.

Origination: Creating new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based upon highly developed skills.
Examples: Constructs a new theory. Develops a new and comprehensive training program. Creates a new gymnastic routine.
Key Words: arranges, builds, combines, composes, constructs, creates, designs, initiates, makes, originates.

Other Psychomotor Domains


As mentioned earlier, the committee did not produce a compilation for the psychomotor
domain model, but others have. The one discussed above is by Simpson (1972). There are
two other popular versions:

Dave's:(4)

o Imitation: Observing and patterning behavior after someone else. Performance may be of low
quality. Example: Copying a work of art.
o Manipulation: Being able to perform certain actions by following instructions and practicing.
Example: Creating work on one's own, after taking lessons, or reading about it.
o Precision: Refining, becoming more exact. Few errors are apparent. Example: Working and reworking something, so it will be "just right."


o Articulation: Coordinating a series of actions, achieving harmony and internal consistency.
Example: Producing a video that involves music, drama, color, sound, etc.
o Naturalization: Having high level performance become natural, without needing to think
much about it. Examples: Michael Jordan playing basketball, Nancy Lopez hitting a golf ball,
etc.

Harrow's:(5)

o Reflex movements - Reactions that are not learned.


o Fundamental movements - Basic movements such as walking, or grasping.
o Perception - Response to stimuli such as visual, auditory, kinesthetic, or tactile discrimination.
o Physical abilities - Stamina that must be developed for further development such as strength
and agility.
o Skilled movements - Advanced learned movements as one would find in sports or acting.
o Non-discursive communication - Effective body language, such as gestures and facial
expressions.

Reference
1. Bloom, B. S. (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.
2. Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1973). Taxonomy of Educational Objectives, the Classification of Educational Goals. Handbook II: Affective Domain. New York: David McKay Co., Inc.
3. Simpson, E. J. (1972). The Classification of Educational Objectives in the Psychomotor Domain. Washington, DC: Gryphon House.
4. Dave, R. H. (1975). Developing and Writing Behavioural Objectives. (R. J. Armstrong, ed.) Educational Innovators Press.
5. Harrow, Anita (1972). A Taxonomy of the Psychomotor Domain: A Guide for Developing Behavioral Objectives. New York: David McKay.

CHARACTERISTICS AND CLASSIFICATION OF EDUCATIONAL MEASURING INSTRUMENTS

There are certain characteristics of a good measuring instrument that make it useful; otherwise it may
not serve its purpose well.
These characteristics are:

1. Validity. The validity of a test is the degree of accuracy by which it measures what it aims to
measure. For instance, if a test aims to measure proficiency in solving linear systems in algebra,
and it does measure proficiency in solving linear system in algebra, then it is valid. But if the test
measures only proficiency in solving linear equations then the test is not valid. The degree of
validity of a test is often expressed numerically as a coefficient of correlation with another test of
the same kind and of known validity. This is computed and explained in a later chapter.

Types of validity. There are generally accepted types of validity. They are:

a. Content validity. This refers to the relevance of the test items of a test to the subject matter or
situation from which they are taken. For instance, an achievement test in elementary algebra is to
be constructed. If all the items to be included in the test are all taken from elementary algebra,
then the test has a high content validity. However, if most of the items are taken from arithmetic,
then the test will have a very low content validity. This type of validity is also called “face
validity” or “logical validity”
b. Concurrent validity. This refers to the correspondence of the scores of a group in a test with the
scores of the same group in a similar test of already known validity used as a criterion. Suppose a
man constructs an intelligence test and he wants to know how valid his test is. He takes another
intelligence test of already known validity and uses this as the criterion. He gives the two tests, his test
and the criterion test, to the same group. Then he computes the coefficient of correlation between
the scores of the group in the two tests. If the coefficient of correlation between the two tests is
high, say .80, then the new test has a high concurrent validity. (The degree of correlation is
expressed numerically from -1.00 to 0 for negative correlation and from 0 to +1.00 for positive
correlation. The correlation between two tests is high if the examinees getting high scores in the first
test also get relatively high scores in the second test and those getting low scores in the first test also
get relatively low scores in the second test.)

c. Predictive validity. This refers to the degree of accuracy of how a test predicts the level of
performance in a certain activity which it intends to foretell. Example: Intelligence tests usually
predict the level of performance in activities involving intellectual ability like school work. So, if
an individual scores high in an intelligence test and also gets high grades in school work, then the
intelligence test has a high predictive validity.

d. Construct validity. This refers to the agreement of test results with certain characteristics which
the test aims to portray. Consider the following examples. If children with higher intellectual ability
score higher in an intelligence test than children with lower intellectual ability, the intelligence test has
a high construct validity. Another example: suppose that in an intelligence test for high school students,
the second year students score generally higher than the first year students, and the third year students
score generally higher than the second year students; then the said intelligence test has high construct
validity. Another example: true extroverts score higher for extroversion than true introverts in a test of
personality if the test has high construct validity.

2. Reliability. The reliability of a test is the degree of consistency of measurement that it gives.
Suppose a test is given to an individual and after the lapse of a certain length of time the same
test is again given to the same individual. If the scores in the two administrations of the test are
identical or almost identical, the test is reliable. Or, if the test is given to a group again and the
means (averages) of the scores in the two test administrations are the same or almost the same,
the test is reliable. Like validity, the degree of reliability of a test is numerically expressed as a
coefficient of correlation.

There are ways of computing the degrees of validity and reliability but they are
complicated statistical methods and they are within the scope of books in higher statistics and so
there is no intention of including them here.

Factors of reliability. There are factors that affect reliability, among which are:

a. Adequacy. Adequacy refers to the appropriate length of the test and the proper sampling of the
test content. A test is adequate if it is long enough to contain a sufficient number of
representative items of the behavior to be measured so that it is able to give a true measurement.
To make a test more reliable, make it longer and make sure it adequately samples the subject
matter covered by the test.

b. Objectivity. A test is objective if it yields the same score no matter who checks it or even if it is
checked at different times. Suppose a teacher scores a paper and the number of correct answers is 80.
Another teacher checks the same paper and the number of correct responses is also 80. After
several days he checks the same test paper and the number of correct responses is still 80. The
test is objective. To make a test objective, make the responses to the items single symbols, words
or phrases.

c. Testing condition. This refers to the conditions of the examination room. If the room is too warm,
poorly or unevenly lighted, poorly or unevenly ventilated, or noisy, the testees cannot score as well
as when the room is properly lighted, ventilated and quiet. The seats and writing edges of the testees
should also be made as comfortable as possible to ensure that they can do their best work.

d. Test administration procedures. The manner of administering a test also affects its reliability.
Explicit directions usually accompany a test and they should be followed strictly because these
procedures are standardized. Directions should be clearly understood before starting the test. The
testees may be allowed to ask questions for better understanding of the procedures before the
start of the examination. Testees are no longer expected to ask questions during the test period
because this will distract the others. Testing materials should be sufficient and available. If
possible, the testees should have two pens so that if one runs out of ink, there is an immediate
replacement.

Reliability is a factor of validity; that is, a test cannot be valid without it being reliable. However,
validity is not a factor of reliability because a test can be reliable without it being valid.

3. Usability. Usability refers to the characteristics of administrability, scorability, economy,
comparability, and utility of a test. A test is usable if it is easy to administer, easy to score,
economical, if its results can be given meaning, and if it serves its purpose (utility).

a. Administrability. There are tests that are easy to administer and there are tests that are hard to
administer. Group tests are usually easy to administer because the directions are easy to follow.
This increases the usability of such tests because they are more in demand. On the other hand, there
are tests that are quite difficult to administer on account of the complexity of their directions, and
this lessens the demand for these tests.

b. Scorability. This is another factor of usability. There are tests that are easy to score and they are
usually in demand. But there are tests, some of them personality tests that are difficult to score on
account of the different weights, some positive and some negative, given to the items and the
computations to arrive at the final scores are very complicated. This situation lessens the demand
for these tests.

c. Economy. There are tests the answers to which are written on the tests themselves, so they
cannot be used again. This makes these kinds of tests costly and limits their usability. There
are also tests that utilize separate answer sheets so that they can be used again and again. Because
these tests are cheaper, they are more in demand, enhancing their usability.

d. Comparability. This refers to the availability of norms with which the scores of testees are compared
to determine the meanings of their scores. For instance, in an intelligence test of 75 items one
obtains a score of 70. Comparing this with the norms, a score of 70 is equivalent to a percentile
rank of 95. This means that the person obtaining the score of 70 is higher in intelligence than 95 percent
of the population which the test is intended to cover.

e. Utility. A test is utile if it adequately serves the very purpose for which it is intended. If a test is
intended to measure achievement in mathematics and it does measure achievement in mathematics,
then the test has a high utility. The test is usable.

Classification of Measuring Instruments

As far as educational measurement is concerned, there are two general kinds of measuring
instruments. They are:

1. Standard test. A standard or standardized test is one for which content has been selected and
checked empirically, for which norms have been established, for which uniform methods of
administering and scoring have been developed, and which may be scored with a relatively high
degree of objectivity. (Good, 565) Some examples of standard tests are intelligence tests,
aptitude tests, personality tests and interest tests.

2. Teacher-made tests. Teacher-made tests are those made by teachers and administered to their
students to determine the achievements of the latter in the subjects they are taking, for purposes of
marking and promotion. Some examples of teacher-made tests are essay examinations and
objective types of tests such as true-false, fill-in-the-blanks, multiple choice, etc.

Standard Tests Differentiated from Teacher-Made Tests

Standard tests and teacher-made tests are very similar in function. Both are for measurement.
However, they differ in many respects. Among their differences are:

Standard Tests
(1) Standard tests are generally prepared by specialists who know very well the principles of test construction.
(2) Standard tests are prepared very carefully following accepted principles of test construction.
(3) Standard tests are given to a large portion of the population for which they are intended for the computation of norms.
(4) Standard tests are generally correlated with other tests of known validity and reliability, or with measures such as school marks, to determine their validity and reliability.
(5) Standard tests generally are highly objective.
(6) Standard tests have their norms computed for purposes of comparison and interpretation.
(7) Standard tests measure innate capacities and characteristics as well as achievement.
(8) Standard tests are intended to be used for a long period of time and for all people of the same class in the culture where they are validated.
(9) Standard tests are accompanied by manuals of instructions on how to administer and score the tests and how to interpret the results.
(10) Standard tests are generally copyrighted.

Teacher-Made Tests
(1) Teacher-made tests are made by teachers who may not know very well the principles of test construction.
(2) Teacher-made tests are often prepared hurriedly and haphazardly to be able to meet the deadline for administration.
(3) Teacher-made tests are usually given only to a class or classes for which the tests are intended. Usually, no norms are computed.
(4) Teacher-made tests are not subjected to any statistical procedures to determine their validity and reliability.
(5) Teacher-made tests may be objective or may be essay, in which case scoring is subjective.
(6) Teacher-made tests have no norms unless the teacher computes the median, mean, and other measures for comparison and interpretation.
(7) Teacher-made tests generally measure subject achievement only.
(8) Teacher-made tests are intended to be used only once or twice to measure the achievement of students in a subject matter studied during a certain period.
(9) Teacher-made tests do not have manuals of instructions for the different types of tests, which may be given orally or in writing.
(10) Teacher-made tests are not copyrighted.
Classification and Uses of Standard Tests

The more common ways of classifying standard tests are the following:
A. According to Function

1. Psychological test. This is a test that measures an individual's ability or personality as
developed by general experience. (Good, 561) The types of psychological tests are the
following:
a. Intelligence test. This is a composite test made of parts that have been shown to correlate
well with some practical measure of intellectual ability, such as success in
school. (Good, 560) This is popularly called an I.Q. test. It measures general
mental ability.
b. Aptitude test. This is a test designed to indicate a person's potential ability for
performance of a type of activity. Examples are musical aptitude tests, prognostic
tests, scholastic aptitude tests, mechanical aptitude tests and the like. (Good, 557)
This measures special ability or talent.
c. Personality test. This is a test designed to measure some aspects of an individual’s
personality.
The types of personality tests are:
(1) Rating test. A device used in evaluating products, attitudes, or other
characteristics of instructors or learners. The usual form is an evaluation chart
carrying some suggestive points for checking. (Good, 440)
(2) Personality inventory. This is a measuring device for determining an individual's
personal characteristics such as his emotional adjustment or tendencies toward
introversion; it may be arranged for self-rating or for rating by other persons. (Good, 300)
This test also measures dominance and submissiveness.
(3) Projective test. A method of measurement of an individual's personality in which the
stimulus is usually unstructured and produces responses reflecting the person's
individuality. (Good, 300) An example of this test is the Rorschach Test consisting of ink-
blots which the subject interprets, and his interpretations reveal his personality. Another
one is the Thematic Apperception Test consisting of standardized pictures which the
respondent interprets, and his interpretations will reveal his values, motives, and other
aspects of his personality.
d. Vocational and professional interest inventory. This is a test used to
determine the extent to which a person's likes and dislikes relate to a given
vocation or profession. (Good, 566) This test reveals the type of work or
career a person is interested in, whether business, teaching, nursing, etc.
2. Educational test. This is an achievement test which aims to measure a person’s
knowledge, skills, abilities, understanding and other outcomes in subjects taught in school.
(Good, 556-557) Examples are achievement tests in mathematics, English, etc.

B. According to Construction

1. Structured test. A test is said to be structured when the examinee is required to


respond within the framework or design of the test and correct responses are
expected. Examples are objective tests, whether standardized or teacher-made.
These are also called restricted tests because there are restrictions imposed.

2. Unstructured test. In this test, the examinee is free to respond in any way he
likes, thinks, feels, or has experienced and there are no incorrect answers.
Examples are projective tests. These are also called unrestricted tests because
there are no restrictions imposed.

C. According to the Number of Persons to Whom Test is Administered

1. Individual test. This test is administered to only one person at a time. Examples
are personality tests that can be given to only one person at a time

2. Group test. This is a test that can be given to more than one person at a time.
Intelligence tests are usually given to several persons at a time.

D. According to the Degree to Which Words Are Used in Test Items and in Pupil
Responses
1. Verbal test. A verbal test is of the paper-and-pencil test variety but questions may
be presented orally or in written form or objects may be presented for
identification. The answers, however, are given in words usually written but
sometimes given orally.

2. Nonverbal test. This is a test in which a minimum amount of language is used. The
test, composed mostly of symbols, may be written or given orally but the answers
are given solely in numbers, graphical representations, or three-dimensional
objects or materials. Some intelligence tests are nonverbal and they are used with
people with language difficulty.

3. Performance test. This test is also nonverbal but the pupils may be required to
use paper and pencil for responding, or the manipulation of physical objects and
materials. An example of this test is the arrangement of blocks. This is also used
with persons with language difficulty

E. According to Difficulty of Items


1. Speed test. This is a test whose items are of the same level of difficulty. Pupils
are tested on the number of items they can answer in a certain period. It is speed
and accuracy that are measured.
2. Power test. The items in this test have different degrees of difficulty and are
arranged in ascending order of difficulty, i.e., from easy to difficult.
Intelligence tests are examples of power tests.
F. According to the Arrangement of Items
1. Tests in which the arrangement of items is not important. Speed tests are of this kind
because the items are of equal difficulty. The arrangement of items has no effect
whatsoever upon performance.

2. Scaled tests. This is a test in which the items are of different difficulty and are
arranged from easy to difficult. Examples are power tests. The process of
determining the difficulty of test items and arranging them in an ascending order
of difficulty is called scaling. "A scale is a series of objective samples or products
of different difficulty or quality that have been arranged in a definite order, or
position, usually in ascending order of difficulty or quality."

G. According to the Amount to be Performed

1. Maximum-performance test. In this test, the examinee is urged to accomplish as


much as he can to show his ability, capacity, etc. Examples are intelligence,
aptitude, and achievement tests.
2. Typical performance test. This test tries to reveal what a person really is. The
examinee is urged to answer all items honestly. Examples are tests of personality,
vocational interest, emotional adjustment, etc. Time limits are not important.

Advantages of Standard Tests


1. Standard tests are generally valid and reliable
2. Standard tests are accompanied by manuals of instructions concerning their
administration and scoring and so there are no problems on how they are
administered and scored.

3. Standard tests have norms with which test results are compared and given
meaning. Hence, interpretation of test results is easy
4. Standard tests can be used again and again provided they are not given to the
same group twice. Their validity and reliability will be affected because of the
effect of practice if given again and again to the same group.
5. Standard tests provide a comprehensive coverage of the basic knowledge, skills,
abilities and other traits that are generally considered as essential.

Limitations of Standard Tests

1. Since standard tests are for general use, their contents may not fully correspond to
the expected outcomes of the instructional objectives of a particular school,
subject, or course. This is especially true with standard achievement tests. Hence,
very careful selection has to be done if standard tests are to be used for
measurement.
2. Since standard tests are very objective, they may not be able to measure the
ability to reason, explain, contrast, organize one's ideas and the like.
3. Standard tests of the right kind for a purpose may be very scarce and hard to find.

TEACHER-MADE EXAMINATIONS

Teacher-made examinations, as mentioned before, are those constructed by teachers to be
given to their students for the purpose of marking and promotion. Teacher-made examinations are
principal tools in measuring school achievement. They are grouped into the following major
classes:

1. Oral examination. These are tests in which the answers are given in spoken words. The
questions may be given in spoken words or in writing. Examples are oral recitations.
(Good, 562) Another example is the oral defense of a thesis or dissertation in graduate
studies.
2. Written examination. These are tests in which the answers are given in writing. The
questions may be given orally or in writing. Examples are essay and objective
examinations.
3. Performance examinations. These are examinations in which the responses are given
by means of overt actions. (Good, 562) Examples are calisthenics in physical education,
marching and assembling a gun in military training, planing in woodworking, making a
dress, etc. The questions may be given orally or in writing.

Creating Good Teacher-Made Tests

• Where do I begin?
— Begin with your objectives. What did you want the students to know or be able to do in each of the lessons?
Note the active verb; each learning objective must be measurable.
— Remember your taxonomy:
Domains - cognitive, psychomotor, and affective. Use these levels to determine how you ask questions.
o Cognitive Domain - Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation (Bloom (1956) identified 6 levels within the cognitive domain)
o Psychomotor Domain - Perception, Set, Guided practice, Habitual response, Complex overt response, Adaptation, Origination (the psychomotor domain emphasizes physical skills, as outlined in 7 levels by Simpson (1972))
o Affective Domain - Receiving, Responding, Valuing, Organization, Value Complex (5 levels in the affective domain, as developed by Krathwohl, Bloom and Masia (1964))

— Create a simple Table of Specifications to help you create questions at appropriate levels (a small sketch of such a table appears after this list)
• Know how you plan to use the test
— Pre-testing
— Post-testing
— Grading purposes
— Diagnostic purposes

— For placement in groups or levels


— Advisement
— Information gathering about an individual
— Information gathering about groups
• Preselect items that relate to the learning
objectives
• Plan how to grade or score (Rubric)
• Determine how this evaluation fits with others
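To illustrate the table of specifications mentioned in the list above, here is a minimal Python sketch; the topics, cognitive levels, and item counts are hypothetical and simply show how such a table keeps the planned number of items per topic and per level in view.

# Minimal sketch of a simple table of specifications: planned number of items
# per topic and per cognitive level. Topics and counts are hypothetical.

table_of_specs = {
    "Fractions":   {"Knowledge": 3, "Comprehension": 2, "Application": 2},
    "Decimals":    {"Knowledge": 2, "Comprehension": 2, "Application": 1},
    "Percentages": {"Knowledge": 1, "Comprehension": 1, "Application": 1},
}

planned_length = 15  # total number of items planned for the test

# Check that the specified items add up to the planned test length.
total = sum(sum(levels.values()) for levels in table_of_specs.values())
print(f"Items specified: {total} (planned: {planned_length})")
for topic, levels in table_of_specs.items():
    print(topic, levels)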

ITEM ANALYSIS AND VALIDATION


Introduction

The teacher normally prepares a draft of the test. Such a draft is subjected to item
analysis and validation in order to ensure that the final version of the test would be
useful and functional. First, the teacher tries out the draft test on a group of students with
characteristics similar to those of the intended test takers (try-out phase). From the try-out
group, each item will be analyzed in terms of its ability to discriminate between those
who know and those who do not know and also its level of difficulty (item analysis
phase). The item analysis will provide information that will allow the teacher to
decide whether to revise or replace an item (item revision phase). Then, finally, the
final draft of the test is subjected to validation if the intent is to make use of the test as
a standard test for the particular unit or grading period. We shall be concerned with
these concepts in this Chapter.

4.1 Item Analysis

There are two important characteristics of an item that will be of interest to the
teacher. These are: (a) item difficulty, and (b) discrimination index. We shall learn how
to measure these characteristics and apply our knowledge in making a decision about the
item in question.

The difficulty of an item, or item difficulty, is defined as the number of students who are able
to answer the item correctly divided by the total number of students. Thus:
Item difficulty = number of students with correct answer/total number of students
The item difficulty is usually expressed in percentage.
Example: What is the item difficulty index of an item if 25 students out of a class of 100 are unable
to answer it correctly?

Here, the total number of students is 100 and 75 of them answered correctly; hence, the difficulty
index is 75/100 or 75%.
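The computation above is a one-line formula; the following minimal Python sketch simply restates it.

# Minimal sketch of the item difficulty formula given above.
def item_difficulty(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

# The example above: 25 of 100 students missed the item, so 75 answered correctly.
print(item_difficulty(75, 100))   # 0.75, i.e. 75%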

One problem with this type of difficulty index is that it may not actually indicate that
the item is difficult (or easy). A student who does not know the subject matter will
naturally be unable to answer the item correctly even if the question is easy. How do we
decide on the basis of this index whether the item is too difficult or too easy? The
following arbitrary rule is often used in the literature:

Range of Difficulty Index Interpretation Action

0 – 0.25 Difficult Revise or discard

0.26 – 0.75 Right difficulty Retain

0.76 – above Easy Revise or discard

Difficult items tend to discriminate between those who know and those who do not
know the answer. Conversely, easy items cannot discriminate between these two groups
of students. We are therefore interested in deriving a measure that will tell us whether an
item can discriminate between these two groups of students. Such a measure is called an
index of discrimination.

An easy way to derive such a measure is to compare how difficult an item is with respect to those in
the upper 25% of the class and how difficult it is with respect to those in the lower 25%
of the class. If the upper 25% of the class found the item easy yet the lower 25% found it
difficult, then the item can discriminate properly between these two groups. Thus:
Index of discrimination = DU – DL

Example: obtain the index of discrimination of an item if the upper 25% of the class
had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while
the lower 25% of the class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus index of discrimination = .60 - .20 = .40.
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1)
to 1.0 (when DU = 1 and DL = 0). When the index of discrimination is equal to -1, this
means that all of the lower 25% of the students got the correct answer while all of the
upper 25% got the wrong answer. In a sense, such an item discriminates correctly between
the two groups, but the item itself is highly questionable. Why should the bright ones get
the wrong answer and the poor ones get the right answer? On the other hand, if the index
of discrimination is 1.0, this means that all of the upper 25% got the correct answer while
all of the lower 25% failed to do so. This is a perfectly discriminating item and is the ideal
item that should be included in the test. From these discussions, let us agree to discard or
revise all items that have a negative discrimination index, for although they discriminate
correctly between the upper and lower 25% of the class, the content of the item itself may
be highly dubious. As in the case of the index of difficulty, we have the following rule of
thumb:

Index Range Interpretation Action

-1.0 – -.56     Can discriminate but item is questionable     Discard

-.55 – 0.45     Non-discriminating                            Revise

0.46 – 1.0      Discriminating item                           Include

Example: Consider a multiple choice type of test of which the following data were
obtained:

Item 1          Options:   A     B     C     D
Total                      0    40    20    20
Upper 25%                  0    15     5     0
Lower 25%                  0     5    10     5

The correct response is B. Let us compute the difficulty index and the index of discrimination:
Difficulty Index = no. of students getting correct response/total
= 40/100 = 40%, within range of a “good item”
The discrimination index can similarly be computed:
DU = no. of students in upper 25% with correct response/no. of students in the upper
25%
= 15/20 = .75 or 75%
DL = no. of students in lower 25% with the correct response/no. of students in the lower
25%
= 5/20 = .25 or 25%
Discrimination Index = DU – DL = .75 - .25 = .50 or 50%.

Thus, the item also has a “good discriminating power”.

It is also instructive to note that the distracter A is not an effective distracter since this
was never selected by the students. Distracters C and D appear to have good appeal as
distracters.
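The computations in this example can be carried out directly from the tabulated response counts. The following Python sketch, using the figures above, reproduces the difficulty index, the discrimination index, and a simple check on the distracters.

# Sketch of the computations in the example above, using the tabulated
# response counts per option for the upper 25% and lower 25% groups.

upper = {"A": 0, "B": 15, "C": 5, "D": 0}   # responses of the upper 25% (20 students)
lower = {"A": 0, "B": 5, "C": 10, "D": 5}   # responses of the lower 25% (20 students)
key = "B"                                    # correct option
class_size = 100
total_correct = 40                           # whole-class count for option B

difficulty = total_correct / class_size      # 0.40
du = upper[key] / sum(upper.values())        # 0.75
dl = lower[key] / sum(lower.values())        # 0.25
discrimination = du - dl                     # 0.50

print(f"Difficulty index: {difficulty:.2f}")
print(f"Discrimination index: {discrimination:.2f}")

# A distracter chosen by nobody (option A here) is not functioning.
for option in ("A", "C", "D"):
    picks = upper[option] + lower[option]
    print(option, "was picked", picks, "times in the two groups")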

ITEM ANALYSIS

In a normal classroom situation, test papers are usually returned to students to give them
feedback about their standing in the test and their performance in the lessons covered.
Sometimes, upon or after returning the test papers, a teacher explains the answers to the
more difficult items or to the entire test to review and intensify the learning of the students.
However, while he expresses surprise over the students' inability to answer either
difficult or easy questions, he sometimes fails to consider the nature of the items on the
basis of the entire class performance. He rarely goes into the actual percentage of the class
that got an item right and into the idea that the item discriminates between the bright and
the poor students.

The Meaning of Item Analysis


Item analysis refers to the process of examining the students' responses to each item in
the test. An item that has desirable characteristics can be retained for subsequent use, while
one with undesirable characteristics is either revised or rejected. The desirability or
undesirability of an item is determined by three important criteria, namely: difficulty of the item,
discriminating power of the item, and the measures of attractiveness.

Difficulty Index
Difficulty index refers to the proportion of the number of students in the upper and
lower groups who answered an item correctly. Therefore, the difficulty index of an item
may be obtained by adding the proportions in the upper and lower groups who got the item
right and dividing the sum by 2.

Table 1 below shows the index of difficulty of an item. This table should serve the
teacher in classifying the item from the easiest to the most difficult ones.

TABLE 1

Index of Difficulty of an Item

Index Range Difficulty Level


0.00-0.20 Very difficult
0.21-0.40 Difficult
0.41-0.60 Optimum difficulty
0.61-0.80 Easy
0.81-1.00 Very easy

Another criterion that indicates the acceptability of an item is its discriminating power.
Usually, a good item properly discriminates bright students from poor ones.
To determine this, a discrimination index is computed.

The discrimination index refers to the proportion of the students in the upper who got an
item right minus the proportion of students in the lower group who got an item right.

A maximum positive discriminating power of an item is indicated by an index of 1.00
and is obtained when all of the upper group answered correctly and no one in the lower
group did. A zero discriminating power is obtained when an equal number of students in
both groups got the item right. A negative discriminating power is obtained when more
students in the lower group got the item right than in the upper group.
The index of discrimination of an item will guide the teacher in knowing which of the
item is very discriminating or questionable (see table 2 below)

Table 2
Index of Discrimination of an Item
Index Range        Discrimination Level

Below 0.10         Questionable item
0.11 – 0.20        Not discriminating
0.21 – 0.30        Moderately discriminating
0.31 – 0.40        Discriminating
0.41 – 1.00        Very discriminating
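
Since Tables 1 and 2 are simple look-ups, a teacher could label an item automatically once its two indices are known. The sketch below is an illustrative implementation of the two tables; the function names are my own.

    # Illustrative helper functions implementing Tables 1 and 2 above.

    def difficulty_level(p: float) -> str:
        """Classify a difficulty index (proportion correct) per Table 1."""
        if p <= 0.20:
            return "Very difficult"
        elif p <= 0.40:
            return "Difficult"
        elif p <= 0.60:
            return "Optimum difficulty"
        elif p <= 0.80:
            return "Easy"
        return "Very easy"

    def discrimination_level(d: float) -> str:
        """Classify a discrimination index per Table 2."""
        if d <= 0.10:
            return "Questionable item"
        elif d <= 0.20:
            return "Not discriminating"
        elif d <= 0.30:
            return "Moderately discriminating"
        elif d <= 0.40:
            return "Discriminating"
        return "Very discriminating"

    # The item analysed in the earlier example (difficulty 0.40, discrimination 0.50):
    print(difficulty_level(0.40))       # Difficult
    print(discrimination_level(0.50))   # Very discriminating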

Basic Item Analysis Statistics

The Michigan State University Measurement and Evaluation Department reports a
number of item statistics which aid in evaluating the effectiveness of an item. The first of
these is the index of difficulty, which MSU (http://www.msu.edu/dept/) defines as the
proportion of the total group who got the item wrong. “Thus a high index indicates a
difficult item and a low index indicates an easy item. Some item analysts prefer an index
of difficulty which is the proportion of the total group who got an item right. This index
may be obtained by marking the PROPORTION RIGHT option on the item analysis
header sheet. Whichever index is selected is shown as the INDEX OF DIFFICULTY on
the item analysis print-out. For classroom achievement tests, most test constructors desire
items with indices of difficulty no lower than 20 nor higher than 80, with an average
index of difficulty from 30 or 40 to a maximum of 60.

The INDEX OF DISCRIMINATION is the difference between the proportion of the
upper group who got an item right and the proportion of the lower group who got the
item right. This index is dependent upon the difficulty of an item. It may reach a
maximum value of 100 for an item with an index of difficulty of 50, that is, when 100%
of the upper group and none of the lower group answer the item correctly. For items with
indices of difficulty above or below 50, the maximum possible index of discrimination is
less than 100. The Interpreting the Index of Discrimination document contains a more
detailed discussion of the index of discrimination.” (http://www.msu.edu/dept/)
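
To see why the maximum attainable discrimination depends on difficulty, consider this small sketch. It is my own derivation, stated in proportions rather than the 0–100 scale quoted above, and it assumes equal-sized upper and lower groups: the best possible split puts as many correct answers as possible into the upper group, giving a ceiling of 2 × min(p, 1 − p).

    # Ceiling of the discrimination index (as a proportion) for a given item difficulty p,
    # assuming equal-sized upper and lower groups. Multiply by 100 for the MSU scale.
    def max_discrimination(p: float) -> float:
        """Largest discrimination index attainable when proportion p answer correctly."""
        return 2 * min(p, 1 - p)

    for p in (0.50, 0.30, 0.70, 0.90):
        print(f"difficulty {p:.2f} -> max discrimination {max_discrimination(p):.2f}")
    # difficulty 0.50 -> 1.00; 0.30 or 0.70 -> 0.60; 0.90 -> 0.20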

More Sophisticated Discrimination Index

Item discrimination refers to the ability of an item to differentiate among students on
the basis of how well they know the material being tested. Various hand-calculation
procedures have traditionally been used to compare item responses to total test scores
using high- and low-scoring groups of students. Computerized analyses provide a more
accurate assessment of the discriminating power of items because they take into account
the responses of all students rather than just the high- and low-scoring groups.

The item discrimination index provided by ScorePak is a Pearson product-moment
correlation between student responses to a particular item and total scores on all other
items on the test. This index is the equivalent of a point-biserial coefficient in this
application. It provides an estimate of the degree to which an individual item is
measuring the same thing as the rest of the items.

Because the discrimination index reflects the degree to which an item and the test as a
whole are measuring a unitary ability or attribute, values of the coefficient will tend to be
lower for tests measuring a wide range of content areas than for more homogeneous tests.
Item discrimination indices must always be interpreted in the context of the type of test
which is being analyzed. Items with low discrimination indices are often ambiguously
worded and should be examined. Items with negative indices should be examined to
determine why a negative value was obtained. For example, a negative value may
indicate that the item was mis-keyed, so that students who knew the material tended to
choose an unkeyed, but correct, response option.

Tests with high internal consistency consist of items with mostly positive
relationships with total test score. In practice, values of the discrimination index will
seldom exceed .50 because of the differing shapes of item and total score distributions.
ScorePak classifies item discrimination as “good” if the index is above .30; “fair” if it is
between .10 and .30; and “poor” if it is below .10.
A good item is one that has good discriminating ability and a sufficient level of
difficulty (neither too difficult nor too easy). In the two tables presented for the levels of
difficulty and discrimination, there is a small area of intersection where the two indices
coincide (between 0.56 and 0.67), which represents the good items in a test. (Source:
Office of Educational Assessment, University of Washington, Seattle, USA,
http://www.washington.edu/oea/services/scanning_scoring/item_analysis.html)

At the end of the item analysis report, test items are listed according to their degrees of
difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions
provide a quick overview of the test and can be used to identify items which are not
performing well and which can perhaps be improved or discarded.
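
The ScorePak index described above is, in essence, a corrected item-total correlation. The following sketch is my own illustration of that idea (it is not ScorePak's code): for each item it correlates the 0/1 item scores with the total score on all other items, then applies the good/fair/poor cut-offs quoted above. The response matrix is invented for the example.

    # Corrected item-total (point-biserial) discrimination index for each item
    # of a 0/1 scored response matrix (rows = students, columns = items).
    from math import sqrt

    def pearson(x, y):
        """Pearson product-moment correlation between two equal-length lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

    def classify(r):
        """ScorePak-style labels quoted in the text above."""
        if r > 0.30:
            return "good"
        if r >= 0.10:
            return "fair"
        return "poor"

    responses = [            # invented data: 6 students, 4 items (1 = correct)
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
    ]

    n_items = len(responses[0])
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        rest_scores = [sum(row) - row[i] for row in responses]   # total on all OTHER items
        r = pearson(item_scores, rest_scores)
        print(f"Item {i + 1}: discrimination = {r:+.2f} ({classify(r)})")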

Summary

The item analysis procedure for norm-referenced tests provides the following information:

1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative

Benefits derived from Item Analysis


1. It provides useful information for class discussion of the test.
2. It provides data which helps students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the
future.

Index of Difficulty

P = (Ru + RL) / T x 100

Where:
Ru – The number in the upper group who answered the item correctly.
RL – The number in the lower group who answered the item correctly.
T – The total number who tried the item.

Index of Item Discriminating Power

D = (Ru – RL) / (½T)

Where:
Ru – The number in the upper group who answered the item correctly.
RL – The number in the lower group who answered the item correctly.
T – The total number who tried the item.
P – The percentage who answered the item correctly (index of difficulty).

Example: Suppose 20 students (10 in the upper group and 10 in the lower group) tried an
item; 6 in the upper group and 2 in the lower group answered it correctly. The index of
difficulty is

P = (Ru + RL) / T x 100 = 8/20 x 100 = 40%

The smaller the percentage figure, the more difficult the item.

The item's discriminating power is estimated using the formula above:

D = (Ru – RL) / (½T) = (6 – 2) / 10 = 0.40

The discriminating power of an item is reported as a decimal fraction; maximum
discriminating power is indicated by an index of 1.00. Maximum discrimination is
usually found at the 50 percent level of difficulty.
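
A brief sketch of these two formulas, using the figures from the worked example, is given below; the function names are illustrative only.

    # Index of difficulty and index of discriminating power from upper/lower group counts.
    def index_of_difficulty(ru: int, rl: int, t: int) -> float:
        """P = (Ru + RL) / T x 100, where T is the total number who tried the item."""
        return (ru + rl) / t * 100

    def discriminating_power(ru: int, rl: int, t: int) -> float:
        """D = (Ru - RL) / (T / 2)."""
        return (ru - rl) / (t / 2)

    # Figures from the worked example: 20 students tried the item,
    # 6 in the upper group and 2 in the lower group answered it correctly.
    ru, rl, t = 6, 2, 20
    print(index_of_difficulty(ru, rl, t))    # 40.0 (per cent)
    print(discriminating_power(ru, rl, t))   # 0.4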

0.00 – 0.20 = Very difficult
0.21 – 0.80 = Moderately difficult
0.81 – 1.00 = Very easy

Validation

After performing the item analysis and revising the items which need revision, the
next step is to validate the instrument. The purpose of validation is to determine the
characteristics of the whole test itself, namely, the validity and reliability of the test.
Validation is the process of collecting and analyzing evidence to support the
meaningfulness and usefulness of the test.

Validity. We already discussed the concept of validity. There, we defined validity
as the extent to which a test measures what it is supposed to measure, or as referring to
the appropriateness, correctness, meaningfulness and usefulness of the specific decisions
a teacher makes based on the test results. These two definitions of validity differ in the
sense that the first refers to the test itself while the second refers to the decisions the
teacher makes based on the test.

A teacher who conducts test validation might want to gather different kinds of
evidence. There are essentially three main types of evidence that may be collected:
content-related evidence of validity, criterion-related evidence of validity and construct-
related evidence of validity. Content-related evidence of validity refers to the content and
format of the instrument. How appropriate is the content? How comprehensive? Does it
logically get at the intended variable? How adequately does the sample of items or
questions represent the content to be assessed?

Criterion-related evidence of validity refers to the relationship between scores
obtained using the instrument and scores obtained using one or more other tests (often
called the criterion). How strong is this relationship? How well do such scores estimate
present performance or predict future performance of a certain type?

Construct-related evidence of validity refers to the nature of the psychological
construct or characteristic being measured by the test. How well does a measure of the
construct explain differences in the behavior of the individuals or their performance on a
certain task?

The usual procedure for determining content validity may be described as
follows: the teacher writes out the objectives of the test based on the table of
specifications and then gives these, together with the test, to at least two (2) experts,
along with a description of the intended test takers. The experts look at the objectives,
read over the items in the test and place a check mark in front of each question or item
that they feel does not measure one or more of the objectives. They also place a check
mark in front of each objective not assessed by any item in the test. The teacher then
rewrites any item so checked and resubmits it to the experts, and/or writes new items to
cover those objectives not covered by the existing test. This continues until the experts
approve of all items and agree that all of the objectives are sufficiently covered by the
test.

In order to obtain evidence of criterion-related validity, the teacher usually
compares scores on the test in question with scores on some other independent
criterion test which presumably already has high validity. For example, if a test is
designed to measure the mathematics ability of students and it correlates highly with a
standardized mathematics achievement test (an external criterion), then we say we have
high criterion-related evidence of validity. In particular, this type of criterion-related
evidence of validity is called concurrent validity. Another type of criterion-related
evidence of validity is called predictive validity, wherein the test scores on the
instrument are correlated with scores on a later performance (criterion measure) of the
students. For example, the mathematics ability test constructed by the teacher may be
correlated with the students' later performance in a Division-wide mathematics
achievement test.
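
As an illustration of gathering this kind of evidence, the hedged sketch below correlates scores on a teacher-made test with scores on an external criterion test; the data are invented, and the Pearson helper mirrors the one used earlier for item discrimination.

    # Illustrative concurrent-validity check: correlate teacher-made test scores with an
    # external criterion (e.g., a standardized achievement test). Data are invented.
    from math import sqrt

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
        return num / den

    teacher_test = [35, 42, 50, 28, 46, 39, 31, 44]    # scores on the teacher-made test
    criterion    = [60, 72, 85, 55, 80, 66, 58, 77]    # scores on the criterion test

    r = pearson(teacher_test, criterion)
    print(f"Criterion-related (concurrent) validity coefficient: r = {r:.2f}")
    # A high positive r is taken as evidence of criterion-related validity; correlating
    # with a LATER criterion measure instead would give predictive validity.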

Apart from the use of the correlation coefficient in measuring criterion-related
validity, Gronlund suggests using the so-called expectancy table. This table is easy to
construct and consists of the test (predictor) categories listed on the left-hand side and the
criterion categories listed horizontally along the top of the chart. For example, suppose
that a mathematics achievement test is constructed and the scores are categorized as high,
average, and low. The criterion measure used is the final average grade of the students in
high school: Very Good, Good, and Needs Improvement. The two-way table lists the
number of students falling under each of the possible pairs of (test, grade), as shown
below:

                         Grade Point Average

Test Score     Very Good     Good     Needs Improvement

High               20          10             5
Average            10          25             5
Low                 1          10            14

The expectancy table shows that there were 20 students who got high test scores
and were subsequently rated Very Good in terms of their final grades; 25 students got
average scores and were subsequently rated Good in their finals; and, finally, 14 students
obtained low test scores and were later graded as Needing Improvement. The evidence
for this particular test tends to indicate that students getting high scores on it would later
be rated Very Good; students getting average scores would later be rated Good; and
students getting low scores on the test would later be graded as Needing Improvement.
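
Since an expectancy table is simply a cross-tabulation of test categories against criterion categories, it can be produced directly from paired records. The sketch below rebuilds the table above; the variable names are mine, and the paired data are constructed to match the example counts.

    # Building an expectancy table (test category x criterion category) from paired data.
    from collections import Counter

    # (test_category, grade_category) pairs; in practice these come from student records.
    pairs = (
        [("High", "Very Good")] * 20 + [("High", "Good")] * 10 + [("High", "Needs Improvement")] * 5 +
        [("Average", "Very Good")] * 10 + [("Average", "Good")] * 25 + [("Average", "Needs Improvement")] * 5 +
        [("Low", "Very Good")] * 1 + [("Low", "Good")] * 10 + [("Low", "Needs Improvement")] * 14
    )

    counts = Counter(pairs)
    rows = ["High", "Average", "Low"]
    cols = ["Very Good", "Good", "Needs Improvement"]

    print(f"{'Test Score':<12}" + "".join(f"{c:>20}" for c in cols))
    for r in rows:
        print(f"{r:<12}" + "".join(f"{counts[(r, c)]:>20}" for c in cols))
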
We will not be able to discuss the measurement of construct-related validity in
this book since the methods to be used require sophisticated statistical techniques falling
under the category of factor analysis.

Reliability

Reliability refers to the consistency of the scores obtained – how consistent they
are for each individual from one administration of an instrument to another and from one
set of items to another. We already gave the formulas for computing the reliability of a
test: for internal consistency, for instance, we could use the split-half method or the
Kuder-Richardson formulas (KR-20 or KR-21).
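
For instance, the KR-20 coefficient mentioned above can be computed from a 0/1 scored response matrix as in the short sketch below; the data and the function name are illustrative only.

    # Illustrative KR-20 computation from a 0/1 scored response matrix (rows = students).
    def kr20(responses):
        k = len(responses[0])                    # number of items
        n = len(responses)                       # number of students
        totals = [sum(row) for row in responses]
        mean_total = sum(totals) / n
        var_total = sum((t - mean_total) ** 2 for t in totals) / n   # variance of total scores
        pq_sum = 0.0
        for i in range(k):
            p = sum(row[i] for row in responses) / n   # proportion answering item i correctly
            pq_sum += p * (1 - p)
        return (k / (k - 1)) * (1 - pq_sum / var_total)

    responses = [             # invented data: 6 students, 5 items
        [1, 1, 1, 1, 0],
        [1, 1, 0, 1, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
    ]

    print(f"KR-20 reliability estimate: {kr20(responses):.2f}")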

Reliability and validity are related concepts. If an instrument is unreliable, it
cannot yield valid outcomes. As reliability improves, validity may improve (or may not).
However, if an instrument is shown scientifically to be valid, then it is almost certain that
it is also reliable. The following table shows the standard followed almost universally in
educational tests and measurement.

Reliability        Interpretation

.90 and above      Excellent reliability; at the level of the best standardized tests.

.80 – .90          Very good for a classroom test.

.70 – .80          Good for a classroom test; in the range of most classroom tests.
                   There are probably a few items which could be improved.

.60 – .70          Somewhat low. This test needs to be supplemented by other
                   measures (e.g., more tests) to determine grades. There are
                   probably some items which could be improved.

.50 – .60          Suggests need for revision of the test, unless it is quite short
                   (ten or fewer items). The test definitely needs to be supplemented
                   by other measures (e.g., more tests) for grading.

.50 or below       Questionable reliability. This test should not contribute heavily
                   to the course grade, and it needs revision.

TIPS FOR CREATING GOOD (REASONABLE) TEACHER-MADE TESTS

1. Write questions and use formats that match the developmental levels of
your students.

2. Clearly describe what you expect the students to know for the test and
test over that information.

3. Create questions that match the content covered and the objectives
identified for your content.

4. Determine frequency of tests and announce tests at least a week in advance.

5. Let older students know how you will be grading - grading scale, etc.

6. Balance tests with both objective and subjective items in order to meet
the various learning styles.

7. Include questions that require the use of both higher and lower level
thinking.

8. Stay away from the use of questions intended to “trick” students.

9. Create a detailed answer key to grade tests fairly.

10. Return graded tests within one week's time so students can benefit from
the feedback.

11. Consider the possibility of retake opportunities - if concerned about mastery.

12. Give clear directions on how to take the various parts of the test.

13. Identify the point value of the various items so students have an idea of
how much time or effort to place on the various questions.

14. Stick with positive statements rather than “Which of the following is
NOT...”

15. With multiple choice items limit choices to about 3 for elementary
students and no more than 4 for secondary students.

THE 19 COMMANDMENTS OF TESTING

1. Judge students’ performance against their own knowledge and not against their peers.

2. Use tests for improvement, for feedback to students, so they can know what their
problems are and improve accordingly (formative rather than summative).

3. Use tests also to evaluate our own teaching, so we can find out what we had not
taught well and improve our teaching accordingly.

4. Use tests along with teaching and learning.

5. Test to improve knowledge and skills and not just for judgment.

6. Train teachers to ask students to explain what they meant by a given answer.

7. Determine in advance criteria for success.

8. Ask students to give teacher feedback on the quality of the test.

9. Base grades on a number of tests, never on ONE test.

10. Use multiple testing methods to tap knowledge of a given skill (multiple-choice,
open-ended, true-false, etc.)

11. Test the skill directly and not via another skill.

12. Return the test with meaningful feedback and not just with a numerical score.

13. Plan a test in advance and not in the last minute.

14. When students fail a test, let's not rule out the possibility that the teaching was bad
or that the test was poorly constructed.

15. Familiarize the students with the testing method.

16. Make an effort to reduce students’ anxiety before a test.

17. Do not use a test as a punishment.

18. Remember to return the tests to the students in a short time after they were
administered.

19. Try to make tests fun, challenging and interesting.
