Educ 107 Midterm Course Pack

Module No. and Title MODULE 3: DEVELOPMENT OF CLASSROOM ASSESSMENT TOOL

Lesson No. and Title Lesson 1: General Principles of Testing

Learning Outcomes At the end of the lesson, the students are expected to:
 define the following terms: validity, reliability, practicality and efficiency, ethics, administrability, scorability, and objectivity; and
 discuss the different principles of high-quality testing/assessing.
Time Frame 2 weeks

INTRODUCTION

Welcome to your third module, which will guide you in the development of classroom assessment tools. In this lesson, you are expected to master the different terminologies and principles you need to know in developing various assessment tools.

ACTIVITY
A. Surf the Net
Instructions: Search for different proponents who gave updated definitions of the following terms in relation to assessment in learning:
1. validity
2. reliability
3. practicality and efficiency
4. ethics
5. administrability
6. scorability
7. objectivity

ANALYSIS

A. Surf and annotate. Search for different articles and journals that present updated
information on the general principles of testing. Browse and read at least 3 articles.
Annotate each article.

ABSTRACTION

GENERAL PRINCIPLES OF ASSESSMENT


Principle 1 - Assessment should be valid
Validity ensures that assessment tasks and associated criteria effectively measure
student attainment of the intended learning outcomes at the appropriate level.
Principle 2 - Assessment should be reliable and consistent
There is a need for assessment to be reliable and this requires clear and consistent
processes for the setting, marking, grading and moderation of assignments.
Principle 3 - Information about assessment should be explicit, accessible and
transparent
Clear, accurate, consistent and timely information on assessment tasks and procedures
should be made available to students, staff and other external assessors or examiners.
Principle 4 - Assessment should be inclusive and equitable
As far as is possible without compromising academic standards, inclusive and equitable
assessment should ensure that tasks and procedures do not disadvantage any group or
individual.
Principle 5 - Assessment should be an integral part of programme design and
should relate directly to the programme aims and learning outcomes
Assessment tasks should primarily reflect the nature of the discipline or subject but
should also ensure that students have the opportunity to develop a range of generic
skills and capabilities.
Principle 6 - The amount of assessed work should be manageable
The scheduling of assignments and the amount of assessed work required should
provide a reliable and valid profile of achievement without overloading staff or students.
Principle 7 - Formative and summative assessment should be included in each
programme
Formative and summative assessment should be incorporated into programmes to
ensure that the purposes of assessment are adequately addressed. Many programmes
may also wish to include diagnostic assessment.
Principle 8 - Timely feedback that promotes learning and facilitates improvement
should be an integral part of the assessment process

Students are entitled to feedback on submitted formative assessment tasks, and on
summative tasks, where appropriate. The nature, extent and timing of feedback for each
assessment task should be made clear to students in advance.
Principle 9 - Staff development policy and strategy should include assessment
All those involved in the assessment of students must be competent to undertake their
roles and responsibilities.

CRITERIA FOR QUALITY OF TESTING AND ASSESSMENT


VALIDITY
Validity refers to whether or not the test measures what it claims to measure. In a valid test, the items are closely linked to the test's intended focus.

Types of Validity

1. Content Validity
 a logical process where connections between the test items and the
outcome-related and job-related tasks are established
 Subject matter experts (SMEs) review the test items. They are given the
list of content areas specified in the test blueprint, along with the test
items intended to be based on each content area. The SMEs are then
asked to indicate whether or not they agree that each item is appropriately
matched to the content area indicated.
 Any items that the SMEs identify as being inadequately matched to the
test blueprint, or flawed in any other way, are either revised or
dropped from the test.

2. Concurrent Validity
 a statistical method using correlation, rather than a logical method
 examinees who are known to be either masters or non-masters on the
content measured by the test are identified, and the test is administered to
them under realistic exam conditions.
 the relationship is then estimated between the examinees' known status as either masters or non-masters and their classification as masters or non-masters (i.e., pass or fail) based on the test.
 this provides evidence that the test is classifying examinees correctly. The stronger the correlation, the greater the concurrent validity of the test.
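The source describes concurrent validity as a correlation between known mastery status and test-based classification but does not show a computation. A minimal sketch follows (invented data; the phi coefficient for two dichotomous variables is an assumption for illustration, not the source's prescribed method):

from math import sqrt

# 1 = master / pass, 0 = non-master / fail (invented illustration data)
known_status = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
test_result  = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

# Build the 2x2 contingency counts.
a = sum(1 for k, t in zip(known_status, test_result) if k == 1 and t == 1)  # master passed
b = sum(1 for k, t in zip(known_status, test_result) if k == 1 and t == 0)  # master failed
c = sum(1 for k, t in zip(known_status, test_result) if k == 0 and t == 1)  # non-master passed
d = sum(1 for k, t in zip(known_status, test_result) if k == 0 and t == 0)  # non-master failed

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(f"Estimated concurrent validity (phi): {phi:.2f}")  # 0.60 with the data above

The closer phi is to 1, the more closely the test's pass/fail decisions track the examinees' known status.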

3. Predictive Validity
 like concurrent validity, it measures the relationship between examinees' performances on the test and their status as masters or non-masters. However, with predictive validity, it is the relationship of test scores to an examinee's future performance as a master or non-master that is estimated.
 this type of validity is especially useful for test purposes such as selection or admissions.

4. Face Validity
 determined by a review of the items and not through the use of statistical
analyses.
 is not investigated through formal procedures and is not determined by
subject matter experts. Instead, anyone who looks over the test, including
examinees and other stakeholders, may develop an informal opinion as to
whether or not the test is measuring what it is supposed to measure.

The validity of a test is critical because, without sufficient validity, test scores have no
meaning. The evidence you collect and document about the validity of your test is also
your best legal defense should the exam program ever be challenged in a court of law.
While there are several ways to estimate validity, for many certification and licensure
exam programs the most important type of validity to establish is content validity.

Source: Professional Testing, Inc. (2006)

RELIABILITY

A test should produce consistent results at different times. If the test conditions stay the same, different groups of students at a particular level of ability should get the same result each time (British Council, 2005). However, it is difficult to say precisely how high a reliability quotient should be before it may be regarded as satisfactory; much depends upon the decisions a teacher makes on the basis of the test results (Harris, 1977).
According to Understanding Item Analysis Report (2008), the test's reliability may be interpreted as follows:

.90 and above  Excellent reliability; at the level of the best standardized tests.

.80 - .90      Very good for a classroom test.

.70 - .80      Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.

.60 - .70      Somewhat low. The test should be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.

.50 - .60      Suggests the need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other means (e.g., more tests) for grading.

.50 or below   Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
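The table above assumes a reliability coefficient has already been computed, but the source does not prescribe a formula. A common choice for dichotomously scored classroom tests is KR-20 (an assumption for illustration); the sketch below, with invented data, is one minimal way to obtain a coefficient to read against the table:

# Rows are students, columns are items; 1 = correct, 0 = wrong (invented data).
scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
]

k = len(scores[0])                     # number of items
n = len(scores)                        # number of students
totals = [sum(row) for row in scores]  # total score per student

mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / (n - 1)  # sample variance

# Sum of p*q over items, where p = proportion correct and q = 1 - p.
pq_sum = 0.0
for j in range(k):
    p = sum(row[j] for row in scores) / n
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 reliability: {kr20:.2f}")  # read this value against the table above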

Hughes (2003), however, suggested some pointers to consider in ensuring the reliability of a test:
1. Exclude items which do not discriminate well between low-performing and high-performing students.
2. Make your items unambiguous.
3. Provide clear and explicit instructions.
4. Ensure that tests are well laid out and perfectly legible.
5. Make candidates familiar with the format and testing techniques.
6. Provide uniform and non-distracting conditions of administration.
7. Use items that permit scoring which is as objective as possible.
8. Provide a detailed scoring key.
9. Train raters.
10. Agree on acceptable responses and appropriate scores at the outset of scoring.
11. Identify candidates by number, not name.
12. Employ independent scoring.

Source: Roble, C. (2020). Module on Language Testing and Assessment. USEP.

PRACTICALITY
The test must be of sufficient length to yield dependable and meaningful results
(Harris, 1977).

The number of questions and the type(s) of questions used both affect the amount of time needed to complete the test. Nitko (in Planning the Test, 2001) provides some estimates of the time needed to complete various types of questions for junior and senior high school students. Oosterhof (2001) gives similar estimates but indicates that elementary students and poor readers might need more time (Alabama Department of Education, 2001).

True-False questions                         15-30 seconds
Multiple choice (brief recall questions)     30-60 seconds
More complex multiple-choice questions       60-90 seconds
Multiple-choice problems with calculations   2-5 minutes
Short answer (one word)                      30-60 seconds
Short answer (longer than one word)          1-4 minutes
Matching (5 premises, 6 responses)           2-4 minutes
Short essays                                 15-20 minutes
Data analyses/graphing                       15-25 minutes
Drawing models/labelling                     20-30 minutes
Extended essays                              35-50 minutes
Source: Roble, C. (2020). Module on Language Testing and Assessment. USEP.
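As a rough planning aid (the helper below is an invented illustration, not from the source), these per-item ranges can be totalled to estimate how long a planned mix of item types will take:

# Per-item time ranges in seconds, taken from the estimates above.
TIME_PER_ITEM = {
    "true_false": (15, 30),
    "mc_brief_recall": (30, 60),
    "mc_complex": (60, 90),
    "short_answer_one_word": (30, 60),
}

# A hypothetical 40-item test plan: item type -> number of items.
planned_test = {"true_false": 10, "mc_brief_recall": 20, "mc_complex": 10}

low = sum(TIME_PER_ITEM[t][0] * n for t, n in planned_test.items())
high = sum(TIME_PER_ITEM[t][1] * n for t, n in planned_test.items())
print(f"Estimated completion time: {low / 60:.1f} to {high / 60:.1f} minutes")  # 22.5 to 40.0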

CRITERIA FOR HIGH-QUALITY ASSESSMENT


1. Assessment of Higher-Order Cognitive Skills

Most of the tasks students encounter should tap the kinds of cognitive skills that have
been characterized as “higher-level”—skills that support transferable learning, rather
than emphasizing only skills that tap rote learning and the use of basic procedures.
While there is a necessary place for basic skills and procedural knowledge, it must be
balanced with attention to critical thinking and applications of knowledge to new
contexts.

2. High-Fidelity Assessment of Critical Abilities


In addition to key subject matter concepts, assessments should include the critical
abilities articulated in the standards, such as communication (speaking, reading, writing,
and listening in multi-media forms), collaboration, modeling, complex problem solving,
planning, reflection, and research. Tasks should measure these abilities directly as they
will be used in the real world, rather than through a remote proxy.

3. Standards that Are Internationally Benchmarked


The assessments should be as rigorous as those of the leading education countries, in
terms of the kind of content and tasks they present, as well as the level of performance
they expect.

4. Use of Items that Are Instructionally Sensitive and Educationally Valuable


The tasks should be designed so that the underlying concepts can be taught and
learned, rather than reflecting students’ differential access to outside-of-school
experiences (frequently associated with their socioeconomic status or cultural context)
or depending on tricky interpretations that mostly reflect test-taking skills. Preparing for and participating in the assessments should engage students in instructionally valuable activities, and results from the tests should provide instructionally useful information.

5. Assessments that Are Valid, Reliable, and Fair


In order to be truly valid for a wide range of learners, assessments should measure well
what they purport to measure, accurately evaluate students’ abilities, and do so reliably
across testing contexts and scorers. They should also be unbiased and accessible and
used in ways that support positive outcomes for students and instructional quality.
Source: Darling-Hammond, L., Herman, J., Pellegrino, J., et al. (2013). Criteria for high-quality assessment. Stanford, CA: Stanford Center for Opportunity Policy in Education.

APPLICATION

A. Summary Report

Report your learning by highlighting the criterion of quality assessment and testing that you consider the most important among the criteria listed and discussed. Explain why you consider it the most important consideration in crafting an assessment tool.

Module No. and Title MODULE 3: DEVELOPMENT OF CLASSROOM ASSESSMENT TOOL

Lesson No. and Title Lesson 2: Principles of Test Creation

Learning Outcomes At the end of the lesson, the students are expected to:
 construct a table of specifications using different formats;
 examine the different rules in constructing multiple-choice tests, matching-type tests, completion tests, and true-or-false tests; and
 construct multiple-choice tests, matching-type tests, completion tests, and true-or-false tests
Time Frame 1 week

INTRODUCTION

You have now learned the principles and criteria that you need to consider in testing and assessing your learners' knowledge and skills. This time, you will be acquainted with the proper way of making a table of specifications and the rules in constructing different assessment tools that you can use in your classroom.

ACTIVITY

A. Suppose you are already teaching. What do you think should be considered and done first before making an assessment tool? What necessary preparations can you make in order to craft an appropriate and successful classroom assessment tool?

ANALYSIS

A. Study the blueprint for an assessment test below. Dissect each part. Share your opinion on how this should be used in the assessment procedure.

ABSTRACTION

PRINCIPLES OF TEST CREATION


(Measurement and Evaluation in Education)

CONSTRUCTION OF TESTS IN THE CLASSROOM

Teacher-made tests are indispensable in evaluation as they are handy in assessing the degree of mastery of the specific units taught by the teacher. The principles behind the construction of the different categories of tests are as follows:

Planning for the Test

Many teacher-made tests suffer from inadequate and improper planning. Many teachers simply walk into the classroom and announce to the class that they are having a test, or construct the test haphazardly.

It is at the planning stage that such questions as the ones listed below are resolved:

(i) What is the intended function of this test? Is it to test the effectiveness of your method, to gauge the level of competence of the pupils, or to diagnose areas of weakness before other topics are taught?
(ii) What are the specific objectives of the content area you are trying to achieve?
(iii) What content area has been taught? How much emphasis has been given to each
topic?
(iv) What type of test will be most suitable (in terms of effectiveness, cost and
practicality) to achieve the intended objectives of the contents?

Defining Objectives

As a competent teacher, you should be able to develop instructional objectives that are
behavioral, precise, realistic and at an appropriate level of generality that will serve as a
useful guide to teaching and evaluation.

However, when you write your behavioural objectives, use action verbs such as define, compare, contrast, draw, explain, describe, classify, summarize, apply, solve, express, state, list and give. You should avoid vague and global statements involving verbs such as appreciate, understand, feel, grasp, think, etc.

It is important that we state objectives in behavioural terms so as to determine the terminal behaviour of a student after having completed a learning task. Martin Haberman (1964) says the teacher receives the following benefits from using behavioural objectives:

1. Teacher and students get clear purposes.
2. Broad content is broken down into manageable and meaningful pieces.
3. Organizing content into sequences and hierarchies is facilitated.
4. Evaluation is simplified and becomes self-evident.
5. Selection of materials is clarified. (Knowing precisely what youngsters are to do leads to control in the selection of materials, equipment and the management of resources generally.)

Specifying the Content to be covered

You should determine the area of the content you want to test. It is through the content
that you will know whether the objectives have been achieved or not.

Preparation of the Test Blueprint

A test blueprint is a table showing the number of items that will be asked under each topic of the content and each process objective. This is why it is often called a table of specifications. Thus, there are two dimensions to the test blueprint: the content and the process objectives.

As mentioned earlier, the content consists of the series of topics from which the
competence of the pupils is to be tested. These are usually listed on the left-hand side
of the table. The process objectives or mental processes are usually listed in the top row of the table.

The process objectives are derived from the behavioural objectives stated for the course initially. They are the various mental processes involved in achieving each objective. Usually, there are about six of these, as listed under the cognitive domain: Knowledge, Comprehension, Application, Analysis, Synthesis and Evaluation.

Weighting of the Content and Process Objectives

The proportion of test items on each topic depends on the emphasis placed on it during teaching and the amount of time spent. Also, the proportion of items on each process objective depends on how important you consider the particular process skill for the level of students to be tested. However, it is important that you make the test a balanced one in terms of the content and the process objectives you have been trying to achieve through your series of lessons.

Percentages are usually assigned to the topics of the content and the process
objectives such that each dimension will add up to 100%. (see the table below).

After this, you should decide on the type of test you want to use and this will
depend on the process objective to be measured, the content and your own skill
in constructing the different types of tests.
Determination of the Total Number of Items

At this stage, you consider the time available for the test, types of test items to be
used (essay or objective) and other factors like the age, ability level of the
students and the type of process objectives to be measured.

When this decision is made, you then proceed to determine the total number of
items for each topic and process objectives as follows:

(i) To obtain the number of items per topic, you multiply the percentage of each by the total number of items to be constructed and divide by 100. Record this in the column in front of each topic at the extreme right of the blueprint. In the table below, 25% was assigned to soil. The total number of items is 50, hence 12 items for the topic (25% of 50 items = 12.5, rounded to 12 items).
(ii) To obtain the number of items per process objective, we also multiply the percentage of each by the total number of items for the test and divide by 100. These will be recorded in the bottom row of the blueprint under each process objective. In the table below:

(a) the percentage assigned to comprehension is 30% of the total number of items, which is 50. Hence, there will be 15 items for this objective (30% of 50 items = 15 items).

(b) To decide the number of items in each cell of the blueprint, you simply multiply the total number of items in a topic by the percentage assigned to the process objective in each row and divide by 100. This procedure is repeated for all the cells in the blueprint. For example, to obtain the number of items on water under knowledge, you multiply 30% by 10 and divide by 100, i.e., 3.
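The blueprint arithmetic above is easy to automate. The sketch below (topic and objective weights are invented examples; each set of percentages sums to 100) reproduces the per-topic, per-objective, and per-cell calculations:

TOTAL_ITEMS = 50

topic_weights = {"Soil": 25, "Water": 20, "Air": 30, "Rocks": 25}  # percent
objective_weights = {"Knowledge": 30, "Comprehension": 30,
                     "Application": 20, "Analysis": 20}            # percent

# Items per topic and per process objective: percentage x total / 100.
items_per_topic = {t: round(TOTAL_ITEMS * w / 100) for t, w in topic_weights.items()}
items_per_objective = {o: round(TOTAL_ITEMS * w / 100) for o, w in objective_weights.items()}
print("Items per objective:", items_per_objective)

# Cell counts: items in the topic x objective percentage / 100.
# Rounded counts may not sum exactly to TOTAL_ITEMS; adjust a cell or two by hand.
for topic, n_items in items_per_topic.items():
    cells = {o: round(n_items * w / 100) for o, w in objective_weights.items()}
    print(topic, n_items, cells)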

CONSTRUCTION OF DIFFERENT ASSESSMENT TOOLS

Basic Principles for Constructing Multiple-Choice Questions

Multiple-choice questions are said to be objective in two ways. First, each student has an equal chance: he/she merely chooses the correct option from the list of alternatives, and the candidates have no opportunity to express a different attitude or special opinion. Secondly, the judgment and personality of the marker cannot influence the marking in any way. Indeed, many objective tests are scored by machines. This kind of test may be graded more quickly and objectively than the subjective or essay type.
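Since the marker's judgment plays no role, scoring reduces to comparing responses against a key, which is why machines can do it. A minimal sketch (the key and responses below are invented):

# Answer key: item number -> correct option (invented illustration).
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C"}

def score(responses: dict) -> int:
    """Count the items where the candidate's choice matches the key."""
    return sum(1 for item, key in ANSWER_KEY.items() if responses.get(item) == key)

print(score({1: "B", 2: "A", 3: "A", 4: "C"}))  # -> 3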

An example of fairly unambiguous instructions is stated below; read the instructions carefully.

i. Candidates are advised to spend only 45 minutes on each subject and attempt
all questions.
ii. A multiple-choice answer sheet for the four subjects has been provided. Use
the appropriate section of the answer sheet for each subject.
iii. Check that the number of each question you answer tallies with the number
shaded on your answer sheet.
iv. Use an HB pencil throughout.

Basic Principles for Constructing Short-Answer Tests

Some of the principles for constructing multiple-choice tests are also relevant to constructing short-answer tests.

i. The instructions must be clear and unambiguous. Candidates should know what to do.
ii. Enough space must be provided for filling in gaps or writing short answers.
iii. As much as possible, the questions must be set to elicit only short answers. Do not include long-answer questions in a short-answer test.
iv. The test format must be consistent. Do not require gap-filling and matching in the same question.
v. The questions should be related to what has been taught, what is to be taught, or what is to be examined. Candidates must know beforehand the requirements and demands of the test.

Basic Principles for Constructing Essay Tests

The essay or subjective type of test is considered subjective because you are able to express your own opinions freely and interpret information in any way you like, provided it is logical, relevant, and crucial to the topic. In the same way, your teacher is able to evaluate the quality and quantity of your opinions and interpretations as well as the organization and logic of your presentation. The following are the basic principles guiding the setting of essay questions:

i. Instructions on what to do should be clear, unambiguous and precise.
ii. Your essay questions should be in layers. The first layer tests the concept or fact, its definition and characteristics. The second layer tests the interpretation of, and inferences from, the concept, fact, or structure as applied to real-life situations. In the third layer, you may be required to construct, consolidate, design, or produce your own structure, concept, fact, scenario or issue.
iii. Essays should not merely require reproduction of facts learnt in class. Nor should they be satisfied with only the examples given in class.
iv. Some of the words that can be used in an essay type of test are: compare and contrast, criticize, critically examine, discuss, describe, outline, enumerate, define, state, relate, illustrate, explain, summarize, construct, produce, design, etc. Remember, some of these words require mere regurgitation of facts, while others require application of facts.

Source: Carroll, J. B. (1983), Psychometric Theory and Language Testing. Rowley, Mass: Newbury
House.

Additional Reading:

- PRINCIPLES OF TEST CREATION: A Self-Instructional Handbook for BYU Educators. (2003).
- ASSESSMENT TOOL TYPES. (2000)

APPLICATION

A. Construct a table of specifications for a 50-item test. Follow the format and guidelines
discussed in the lesson.

B. Construct three multiple-choice items, three short-answer items, and three essay questions. Use each test constructed to analyze the basic principles of testing.

Module No. and Title MODULE 4: ADMINISTERING, ANALYZING AND IMPROVING TESTS

Lesson No. and Title Lesson 1: Administering, analyzing, and improving teacher-made tests

Learning Outcomes At the end of the lesson, the students are expected to:
 discuss the basic concepts in reproducing and
administering test items;
 identify the steps in improving test items;
 perform item analysis properly and correctly; and
 interpret the results of item analysis
Time Frame 3 weeks

INTRODUCTION
Welcome to Module 4! This module will focus on guiding you in administering, analyzing, and improving tests.

ACTIVITY

A. Revisit and revise the table of specifications you created in the last module. Improve the table of specifications with emphasis on the assessment of higher-order thinking skills. Consider one lesson unit in line with your specialization as the content area. Use Bloom's revised taxonomy for the process objectives.

ANALYSIS

A. Highlight the revisions you made in your table of specifications. What considerations guided you when you made the revisions?

ABSTRACTION

Process of Test Creation

1. Development of the Test

a. Planning the test
The syllabus should be reviewed for the content and scope of the test, after which the competencies to be measured are identified. Then, the table of specifications is prepared.

b. Preparing the test

c. Writing the test
The test should follow the format and principles of test creation.

2. Validation

The test should be given to experts in the field for validation and review of content.

3. Editing of Test

The test should be edited based on the comments and suggestions of the experts.

4. Production of the Test

Copies of the test should be produced for pilot testing.

5. Pilot Testing

The test should be pilot tested with the target students.

6. Analysis for Construct Validity, Reliability and Item Characteristics.

The responses are to be analyzed to establish the construct validity and the reliability of the test using the Rasch model. Item analysis is to follow (see the classical item analysis sketch after this list).

7. Revision of Test Items


Flawed test items should be revised based on the results of the analysis using the Rasch model.

8. Reproduction of the Test

The revised test, together with the answer sheets, should be reproduced.
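As one classical way to perform the item analysis named in step 6 (an assumption for illustration; the source itself analyzes items with the Rasch model), each item can be given a difficulty index p (the proportion answering correctly) and a simple discrimination index D comparing upper and lower scoring groups:

# Invented data: rows = students, columns = items; 1 = correct.
scores = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0],
    [1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1],
]

n = len(scores)
ranked = sorted(scores, key=sum, reverse=True)
upper, lower = ranked[: n // 2], ranked[n // 2 :]  # top and bottom halves by total score

for j in range(len(scores[0])):
    p = sum(row[j] for row in scores) / n  # difficulty: proportion correct
    d = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / (n // 2)  # discrimination
    print(f"Item {j + 1}: difficulty p = {p:.2f}, discrimination D = {d:.2f}")

Items with very low or negative D are the kind that Hughes's first pointer in Lesson 1 says to exclude or revise.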

Item Response Theory

Item Response Theory (IRT) is a modern test theory. It holds that each item on a test has a particular difficulty attached to it, that items can be placed in order of difficulty, and that the test taker has a fixed level of ability. This means that the most difficult items are likely to be answered correctly only by high-performing students.

In IRT, the true score is defined on the construct of interest rather than on the test. One application where IRT is handy is item bias analysis, a test of item equivalence across groups; for example, an item can be tested for whether it behaves differently for males and females.

Item Response Theory (IRT) provides an estimate of the true score that is not based on the number of correct items. This frees us to give different people different test items but still place people on the same scale. One particularly exciting feature of tailored testing is the capability to give people test items that are matched (close) to their ability level. This has implications for test security: different people get different tests.

The Rasch Model for Measurement

In the Rasch model, the probability of a specified response (e.g., a right/wrong answer) is modeled as a function of person and item parameters. Specifically, in the simple Rasch model, the probability of a correct response is modeled as a logistic function of the difference between the person and item parameters. In most contexts, the parameters of the model pertain to the level of a quantitative trait possessed by a person or item. For example, in educational tests, item parameters pertain to the difficulty of items while person parameters pertain to the ability or attainment level of the people who are assessed. The higher a person's ability relative to the difficulty of an item, the higher the probability of a correct response on that item. When a person's location on the latent trait is equal to the difficulty of the item, there is by definition a 0.5 probability of a correct response on that item.
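In symbols, with person ability theta and item difficulty b, the simple Rasch model gives P(correct) = exp(theta - b) / (1 + exp(theta - b)). A minimal sketch of this function (the example values below are illustrative, not from the source):

from math import exp

def rasch_p_correct(ability: float, difficulty: float) -> float:
    """P(correct) = exp(theta - b) / (1 + exp(theta - b)), the Rasch logistic."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))

print(rasch_p_correct(1.5, 0.5))  # able person, easier item -> about 0.73
print(rasch_p_correct(0.0, 0.0))  # ability equals difficulty -> 0.5 by definition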

Differential Item Functioning

Differential Item Functioning (DIF) investigates the items in a test, one at a time, for signs of interactions with sample characteristics (Tinsley, 2005). DIF occurs when people from different groups (commonly gender or ethnicity) with the same level of the construct (ability/skill) have a different probability of giving a certain response on a questionnaire or test. DIF analysis provides an indication of unexpected behavior by an item on a test.

McNamara (1990) applied this kind of item response analysis to the Occupational English Test in Australia and found misfits in the test: some test items were not matched to the students' abilities.
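Operational DIF analyses use formal statistical procedures (e.g., Mantel-Haenszel); the sketch below (invented data) shows only the core idea of comparing an item's proportion correct across groups after matching examinees on total score:

from collections import defaultdict

# (group, total_score, item_correct) records for one item under study (invented).
records = [
    ("M", 8, 1), ("M", 8, 1), ("F", 8, 0), ("F", 8, 1),
    ("M", 5, 0), ("M", 5, 1), ("F", 5, 0), ("F", 5, 0),
]

# Group the records by total score, then by group membership.
by_score = defaultdict(lambda: defaultdict(list))
for group, total, correct in records:
    by_score[total][group].append(correct)

for total, groups in sorted(by_score.items()):
    rates = {g: sum(v) / len(v) for g, v in groups.items()}
    print(f"total score {total}: proportion correct by group = {rates}")
# Large gaps at the same total score suggest the item may function differently by group.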

APPLICATION

A. Create, Conduct, Recreate

Create a 50-item test questionnaire using your revised table of specifications. Follow the steps in preparing and administering tests. Administer the test to your target students. Analyze the test results and revise your test based on the results.

Make a checklist of the things and steps you have done.

Congratulations on completing your midterm!
