Educ 107 Midterm Course Pack
Learning Outcomes At the end of the lesson, the students are expected to:
define the following terms: validity, reliability,
practicality and efficiency, ethics, administrability,
scorability, and objectivity; and
discuss the different principles of high-quality
testing/assessing.
Time Frame 2 weeks
INTRODUCTION
Welcome to your third module, which will guide you in developing classroom
assessment tools. In this lesson, you are expected to master the different
terminologies and principles you need to know in developing various assessment tools.
ACTIVITY
A. Surf the Net
Instructions: Search for different proponents who have given updated definitions
of the following terms in relation to assessment in learning:
1. validity
2. reliability
3. practicality and efficiency
4. ethics
5. administrability
6. scorability
7. objectivity
ANALYSIS
A. Surf and annotate. Search for different articles and journals that present updated
information on the general principles of testing. Browse and read at least 3 articles.
Annotate each article.
ABSTRACTION
Students are entitled to feedback on submitted formative assessment tasks, and on
summative tasks, where appropriate. The nature, extent and timing of feedback for each
assessment task should be made clear to students in advance.
Principle 9 - Staff development policy and strategy should include assessment
All those involved in the assessment of students must be competent to undertake their
roles and responsibilities.
Types of Validity
1. Content Validity
a logical process where connections between the test items and the
outcome-related and job-related tasks are established
Subject matter experts (SMEs) review the test items. They are given the
list of content areas specified in the test blueprint, along with the test
items intended to be based on each content area. The SMEs are then
asked to indicate whether or not they agree that each item is appropriately
matched to the content area indicated.
Any items that the SMEs identify as being inadequately matched to the
test blueprint, or flawed in any other way, are either revised or
dropped from the test.
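To make the SME review step concrete, here is a minimal sketch (in Python) that tallies hypothetical reviewer agreement per item and flags weak items. The ratings and the 80% flagging threshold are invented for illustration; they are not a prescribed standard.

```python
# Sketch: tallying SME agreement on item-to-blueprint matches.
# All data and the 0.8 flagging threshold are illustrative assumptions.

# Each inner list holds one item's ratings: True = the SME agrees the item
# matches its intended content area, False = the SME disagrees.
sme_ratings = {
    "item_1": [True, True, True, True],
    "item_2": [True, False, True, False],
    "item_3": [True, True, False, True],
}

THRESHOLD = 0.8  # assumed cut-off for acceptable agreement

for item, ratings in sme_ratings.items():
    agreement = sum(ratings) / len(ratings)
    status = "keep" if agreement >= THRESHOLD else "revise or drop"
    print(f"{item}: {agreement:.0%} agreement -> {status}")
```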
2. Concurrent Validity
a statistical method using correlation, rather than a logical method
examinees who are known to be either masters or non-masters on the
content measured by the test are identified, and the test is administered to
them under realistic exam conditions.
the relationship is estimated between the examinees' known status as either
masters or non-masters and their classification as masters or non-masters
(i.e., pass or fail) based on the test.
it provides evidence that the test is classifying examinees correctly. The
stronger the correlation, the greater the concurrent validity of the test.
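Because both the known status and the test classification are dichotomous, one common way to estimate this relationship is the phi coefficient, which is simply a Pearson correlation computed on two binary variables. The sketch below assumes invented data for ten examinees.

```python
# Sketch: estimating concurrent validity as a phi coefficient, i.e. a
# Pearson correlation between two binary variables. Data are invented.
import numpy as np

# 1 = master, 0 = non-master (known status from outside the test)
known_status = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
# 1 = pass, 0 = fail (classification produced by the test)
test_result = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])

phi = np.corrcoef(known_status, test_result)[0, 1]
print(f"phi (concurrent validity estimate) = {phi:.2f}")
```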
3. Predictive Validity
like concurrent validity, it measures the relationship between examinees'
performances on the test and their status as masters or non-masters. However,
with predictive validity, it is the relationship of test scores to an
examinee's future performance as a master or non-master that is estimated.
this type of validity is especially useful for test purposes such as selection
or admissions.
4. Face Validity
determined by a review of the items and not through the use of statistical
analyses.
is not investigated through formal procedures and is not determined by
subject matter experts. Instead, anyone who looks over the test, including
examinees and other stakeholders, may develop an informal opinion as to
whether or not the test is measuring what it is supposed to measure.
The validity of a test is critical because, without sufficient validity, test scores have no
meaning. The evidence you collect and document about the validity of your test is also
your best legal defense should the exam program ever be challenged in a court of law.
While there are several ways to estimate validity, for many certification and licensure
exam programs the most important type of validity to establish is content validity.
RELIABILITY
Reliability coefficients for a test are commonly interpreted as follows:
.90 and above: Excellent reliability; at the level of the best standardized tests.
.70-.80: Good for a classroom test; in the range of most classroom tests. There are
probably a few items which could be improved.
.50-.60: Suggests a need for revision of the test, unless it is quite short (ten or fewer
items). The test definitely needs to be supplemented by other means (e.g.,
more tests) for grading.
.50 or below: Questionable reliability. This test should not contribute heavily to the
course grade, and it needs revision.
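Coefficients like those in the table above are typically computed with KR-20 or Cronbach's alpha. As a minimal sketch with invented 0/1 response data, the function below computes Cronbach's alpha from a students-by-items score matrix; for dichotomous items this is equivalent to KR-20.

```python
# Sketch: Cronbach's alpha from a students x items score matrix.
# For dichotomous (0/1) items this is equivalent to KR-20.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows = students, columns = items."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 0/1 responses for five students on four items.
data = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
print(f"alpha = {cronbach_alpha(data):.2f}")
```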
PRACTICALITY
The test must be of sufficient length to yield dependable and meaningful results
(Harris, 1977).
The number of questions and the type(s) of questions used both affect the
amount of time needed to complete the test. Nitko (2001), in Planning the Test,
provides some estimates of the time needed to complete various types of questions
for junior and senior high school students. Oosterhof (2001) gives similar
estimates but indicates that elementary students and poor readers might need more
time (Alabama Department of Education, 2001).
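As a simple illustration of such planning, the sketch below estimates total completion time from planned item counts and assumed minutes-per-item figures. The time values are placeholders, not the published estimates of Nitko (2001) or Oosterhof (2001).

```python
# Sketch: estimating test completion time from item counts.
# The minutes-per-item values are placeholder assumptions, not the
# published estimates from Nitko (2001) or Oosterhof (2001).
minutes_per_item = {"true_false": 0.5, "multiple_choice": 1.0,
                    "short_answer": 2.0, "essay": 15.0}
planned_items = {"true_false": 10, "multiple_choice": 20,
                 "short_answer": 5, "essay": 1}

total = sum(minutes_per_item[t] * n for t, n in planned_items.items())
print(f"Estimated completion time: {total:.0f} minutes")
# Per Oosterhof, elementary students and poor readers may need more time,
# so a margin could be added for them.
```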
Most of the tasks students encounter should tap the kinds of cognitive skills that have
been characterized as “higher-level”—skills that support transferable learning, rather
than emphasizing only skills that tap rote learning and the use of basic procedures.
While there is a necessary place for basic skills and procedural knowledge, it must be
balanced with attention to critical thinking and applications of knowledge to new
contexts.
Source: Criteria for high-quality assessment. Stanford, CA: Stanford Center for
Opportunity Policy in Education.
APPLICATION
A. Summary Report
2. FORMATIVE ASSESSMENT
Module No. and Title MODULE 3: DEVELOPMENT OF CLASSROOM
ASSESSMENT TOOL
Learning Outcomes At the end of the lesson, the students are expected to:
construct a table of specifications using different
formats;
examine the different rules in constructing multiple-choice
tests, matching-type tests, completion tests, and
true-or-false tests; and
construct multiple-choice tests, matching-type tests,
completion tests, and true-or-false tests.
Time Frame 1 week
INTRODUCTION
You are done with knowing the principles and criteria that you need to consider in testing
and assessing your learner’s knowledge and skills. This time, you will be acquainted with the
proper way of making a table of specifications and rules in constructing different assessment
tools that you can use in your classroom.
ACTIVITY
A. Suppose you are already teaching. What do you think should be considered and
done first before making an assessment tool? What necessary preparations may
you make in order to craft an appropriate and successful classroom assessment
tool?
ANALYSIS
A. Study the blueprint for an assessment test below. Dissect each part. Share your
opinion on how this should be used in the assessment procedure.
ABSTRACTION
Many teacher-made tests suffer from inadequate and improper planning. Teachers
often walk into the classroom to announce to the class that they are having a
test, or they construct the test haphazardly.
It is at the planning stage that such questions as the ones listed below are resolved:
(i) What is the intended function of this test? Is it to test the effectiveness of your
method, to determine the level of competence of the pupils, or to diagnose areas of
weakness before other topics are taught?
(ii) What are the specific objectives of the content area you are trying to achieve?
(iii) What content area has been taught? How much emphasis has been given to each
topic?
(iv) What type of test will be most suitable (in terms of effectiveness, cost and
practicality) to achieve the intended objectives of the content?
Defining Objectives
As a competent teacher, you should be able to develop instructional objectives that are
behavioural, precise, realistic and at an appropriate level of generality, so that they
serve as a useful guide to teaching and evaluation.
When you write your behavioural objectives, use action verbs such as define,
compare, contrast, draw, explain, describe, classify, summarize, apply, solve, express,
state, list and give. You should avoid vague and global statements involving the use of
verbs such as appreciate, understand, feel, grasp, and think.
Haberman (1964) says the teacher receives the following benefits by using behavioural
objectives:
You should determine the area of the content you want to test. It is through the content
that you will know whether the objectives have been achieved or not.
A test blueprint is a table showing the number of items that will be asked under each
topic of the content and each process objective. This is why it is often called a
Specification Table. Thus, there are two dimensions to the test blueprint: the content
and the process objectives.
As mentioned earlier, the content consists of the series of topics from which the
competence of the pupils is to be tested. These are usually listed on the left-hand side
of the table. The process objectives or mental processes are usually listed on the top
row of the table.
The process objectives are derived from the behavioural objectives stated for the
course initially. They are the various mental processes involved in achieving each
objective. Usually, there are about six of these, as listed under the cognitive domain,
viz: Knowledge, Comprehension, Application, Analysis, Synthesis and Evaluation.
The proportion of test items on each topic depends on the emphasis placed on it
during teaching and the amount of time spent on it. Also, the proportion of items on
each process objective depends on how important you consider the particular
process skill for the level of students to be tested. However, it is important that
you make the test a balanced one in terms of the content and the process
objectives you have been trying to achieve through your series of lessons.
Percentages are usually assigned to the topics of the content and to the process
objectives such that each dimension will add up to 100% (see the table below).
After this, you should decide on the type of test you want to use and this will
depend on the process objective to be measured, the content and your own skill
in constructing the different types of tests.
Determination of the Total Number of Items
At this stage, you consider the time available for the test, types of test items to be
used (essay or objective) and other factors like the age, ability level of the
students and the type of process objectives to be measured.
When this decision is made, you then proceed to determine the total number of
items for each topic and each process objective as follows:
(i) To obtain the number of items per topic, you multiply the percentage of each topic
by the total number of items to be constructed and divide by 100. You will record this
in the column in front of each topic at the extreme right of the blueprint. In the table
below, 25% was assigned to soil. The total number of items is 50, hence about 12
items for the topic (25% of 50 items = 12.5, rounded down to 12 items).
(ii) To obtain the number of items per process objective, we also multiply the
percentage of each by the total number of items for the test and divide by 100. These
will be recorded in the bottom row of the blueprint under each process objective.
In the table below:
(iii) To decide the number of items in each cell of the blueprint, you simply
multiply the total number of items in a topic by the percentage assigned to the
process objective in each row and divide by 100. This procedure is repeated
for all the cells in the blueprint. For example, to obtain the number of items
on water under knowledge, you multiply the topic's 10 items by 30 and divide
by 100, i.e., 3 items. A worked sketch of these computations appears below.
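Here is a minimal sketch of the allocation arithmetic described in (i) and (iii). The topic weights are hypothetical, with soil at 25% and water at 20% of a 50-item test so that the numbers echo the worked example; note that rounded counts may need manual adjustment so that they still sum to the intended total.

```python
# Sketch: allocating blueprint items by percentage weights.
# Topic weights are illustrative; soil at 25% mirrors the worked example.
TOTAL_ITEMS = 50
topic_weights = {"soil": 25, "water": 20, "air": 30, "plants": 25}  # %

for topic, pct in topic_weights.items():
    exact = pct * TOTAL_ITEMS / 100          # e.g. 25% of 50 = 12.5
    print(f"{topic}: {exact:.1f} -> {round(exact)} items")
# Rounded counts may total slightly more or less than 50, so adjust
# one or two topics by hand to hit the intended total.

# Cell counts: topic items x process-objective percentage / 100,
# e.g. 10 items on water with 30% under knowledge -> 3 items.
water_items, knowledge_pct = 10, 30
print("water x knowledge:", water_items * knowledge_pct / 100)
```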
CONSTRUCTION OF DIFFERENT ASSESSMENT TOOLS
Multiple-choice questions are said to be objective in two ways. First, each student
has an equal chance: he/she merely chooses the correct option from the list of
alternatives. The candidates have no opportunity to express a different attitude or
special opinion. Secondly, the judgment and personality of the marker cannot influence
the scoring in any way. Indeed, many objective tests are scored by machines. This
kind of test may be graded more quickly and objectively than the subjective or essay
type.
An example of fairly unambiguous instructions is stated below; read the instructions
carefully.
i. Candidates are advised to spend only 45 minutes on each subject and attempt
all questions.
ii. A multiple-choice answer sheet for the four subjects has been provided. Use
the appropriate section of the answer sheet for each subject.
iii. Check that the number of each question you answer tallies with the number
shaded on your answer sheet.
iv. Use an HB pencil throughout.
Basic Principles for Constructing Short-Answer Tests
Some of the principles for constructing multiple-choice tests are also relevant to
constructing short-answer tests.
i. The instructions must be clear and unambiguous. Candidates should know what to
do.
ii. Enough space must be provided for filling in gaps or writing short answers.
iii. As much as possible, the questions must be set to elicit only short answers. Do not
construct long-answer questions in a short-answer test.
iv. The test format must be consistent. Do not require fill-in-the-gap and matching
responses in the same question.
v. The questions should be related to what has been taught, what is to be taught, or
what is to be examined. Candidates must know beforehand the requirements and
demands of the test.
Basic Principles for Constructing Essay Tests
An essay or subjective type of test is considered subjective because you are able to
express your own opinions freely and interpret information in any way you like, provided
it is logical, relevant, and crucial to the topic. In the same way, your teacher is able to
evaluate the quality and quantity of your opinions and interpretations as well as the
organization and logic of your presentation. The following are the basic principles
guiding the setting of essay questions:
Source: Carroll, J. B. (1983). Psychometric Theory and Language Testing. Rowley,
MA: Newbury House.
APPLICATION
A. Construct a table of specifications for a 50-item test. Follow the format and guidelines
discussed in the lesson.
B. Construct three multiple-choice, short-answer, and essay test items each. Use
each test you constructed to analyze the basic principles of testing.
Module No. and Title MODULE 4: ADMINISTERING, ANALYZING, AND
IMPROVING TESTS
Learning Outcomes At the end of the lesson, the students are expected to:
discuss the basic concepts in reproducing and
administering test items;
identify the steps in improving test items;
perform item analysis properly and correctly; and
interpret the results of item analysis.
Time Frame 3 weeks
INTRODUCTION
Welcome to Module 4! This module will focus on guiding you in administering,
analyzing, and improving tests.
ACTIVITY
A. Revisit and revise the table of specifications you created in the last module. Improve
the table of specifications with emphasis on the assessment of higher-order thinking
skills. Consider one unit of lessons in line with your specialization as the content area.
Follow Bloom's revised taxonomy for the process objectives.
ANALYSIS
A. Highlight the revisions you made in your table of specifications. What
considerations did you have in mind when you made the revisions?
ABSTRACTION
1. Development of Test
c. Writing the test
The test should follow the format and principles of test construction.
2. Validation
The test should be given to experts in the field for validation and review of its content.
3. Editing of Test
The test should be edited based on the comments and suggestions of the experts.
5. Pilot Testing
The responses are to be analyzed to establish the construct validity and the reliability of
the test using the Rasch model. Item analysis is to follow, as sketched below.
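As a preview of that item analysis, the sketch below computes two standard indices from an invented pilot-test score matrix: the difficulty index (the proportion of examinees answering correctly) and a discrimination index (here, the point-biserial correlation between an item and the rest-of-test score).

```python
# Sketch: basic item analysis on pilot-test data (invented 0/1 scores).
# Difficulty = proportion answering correctly; discrimination here is the
# point-biserial correlation between an item and the rest-of-test score.
import numpy as np

responses = np.array([   # rows = students, columns = items
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
])

for j in range(responses.shape[1]):
    item = responses[:, j]
    rest = responses.sum(axis=1) - item      # total score excluding item j
    difficulty = item.mean()
    discrimination = np.corrcoef(item, rest)[0, 1]
    print(f"item {j + 1}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:+.2f}")
```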
Item Response Theory (IRT) is a modern test theory. It states that items on a test
have a particular difficulty attached to them, that they can be placed in order of
difficulty, and that the test taker has a fixed level of ability. This means that difficult
items are likely to be answered correctly only by high-performing students.
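A minimal sketch of the Rasch model (the one-parameter IRT model mentioned above): the probability of a correct answer depends only on the gap between the test taker's ability and the item's difficulty. The ability and difficulty values are chosen purely for illustration.

```python
# Sketch: Rasch model (1PL IRT) item characteristic curve.
# P(correct) = 1 / (1 + exp(-(theta - b))), where theta is the test
# taker's ability and b is the item's difficulty (illustrative values).
import math

def p_correct(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

for theta in (-2.0, 0.0, 2.0):    # low, average, high ability
    for b in (-1.0, 0.0, 1.0):    # easy, medium, hard items
        print(f"theta={theta:+.0f}, b={b:+.0f}: "
              f"P(correct)={p_correct(theta, b):.2f}")
```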
In IRT, the true score is defined on the construct of interest rather than on the test.
Some applications where IRT is handy include item bias analysis, a test of item
equivalence across groups; for example, an item is tested to see whether it behaves
differently for males and females.
Item Response Theory (IRT) provides an estimate of the true score that is not
based on the number of correct items. This frees us to give different people different
test items but still place people on the same scale. One particularly exciting feature of
tailored testing is the capability to give people test items that are matched (close) to
their ability level. This also has implications for test security: different people get
different tests.
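A toy sketch of that tailored-testing idea: choose the unused item whose difficulty is closest to the current ability estimate. The item bank and ability value are invented; operational adaptive tests use more sophisticated selection rules, such as maximum information.

```python
# Sketch: tailored testing -- choose the unused item whose Rasch
# difficulty is closest to the current ability estimate (invented data).
item_bank = {"q1": -1.5, "q2": -0.5, "q3": 0.0, "q4": 0.8, "q5": 1.6}
administered = {"q3"}
theta_estimate = 0.6  # current ability estimate for this examinee

candidates = {q: b for q, b in item_bank.items() if q not in administered}
next_item = min(candidates, key=lambda q: abs(candidates[q] - theta_estimate))
print("next item:", next_item)  # q4: difficulty 0.8 is closest to 0.6
```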
Differential Item Functioning (DIF) investigates the items in a test, one at a time, for
signs of interactions with sample characteristics (Tinsley, 2005).
Moreover, Differential Item Functioning (DIF) occurs when people from different groups
(commonly gender or ethnicity) with the same underlying construct (ability/skill) have a
different probability of giving a certain response on a questionnaire or test. DIF
analysis provides an indication of unexpected behavior by an item on a test.
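As a rough sketch of the DIF idea, the code below matches examinees on total score and compares an item's proportion correct across two groups within each score stratum. The data are invented, and operational DIF analyses use formal procedures such as the Mantel-Haenszel statistic.

```python
# Sketch: a crude DIF screen -- within each total-score stratum, compare
# an item's proportion correct across two groups. Data are invented;
# operational DIF analyses use methods such as Mantel-Haenszel.
from collections import defaultdict

# (group, total_score, item_correct) for one item of interest
records = [
    ("A", 5, 1), ("A", 5, 1), ("B", 5, 0), ("B", 5, 1),
    ("A", 8, 1), ("A", 8, 1), ("B", 8, 1), ("B", 8, 0),
]

by_stratum = defaultdict(lambda: defaultdict(list))
for group, score, correct in records:
    by_stratum[score][group].append(correct)

for score, groups in sorted(by_stratum.items()):
    pa = sum(groups["A"]) / len(groups["A"])
    pb = sum(groups["B"]) / len(groups["B"])
    print(f"score={score}: P(correct|A)={pa:.2f}, "
          f"P(correct|B)={pb:.2f}, gap={pa - pb:+.2f}")
```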
This item response analysis was applied by McNamara (1990) to the Occupational
English Test in Australia, and it revealed misfits in the test: some test items were
not matched to the students' abilities.
APPLICATION
Congratulations for completing your midterm!!!