Assessment of Learning Handout
Test
■ An instrument designed to measure any quality, ability, skill or knowledge.
■ Comprised of test items of the area it is designed to measure.
Measurement
■ A process of quantifying the degree to which someone/something possesses a given trait (i.e. quality, characteristic or feature).
■ A process by which traits, characteristics and behaviors are differentiated.
Assessment
■ A process of gathering and organizing data into an interpretable form to have a basis for decision-making.
■ It is a prerequisite to evaluation. It provides the information which enables evaluation to take place.
Evaluation
■ A process of systematic analysis of both qualitative and quantitative data in order to make a sound judgment or decision.
■ It involves judgment about the desirability of changes in students.
MODES OF ASSESSMENT

Traditional
■ Description: The objective paper-and-pen test which usually assesses low-level thinking skills
■ Examples: Standardized Tests, Teacher-made Tests
■ Advantages: Scoring is objective; administration is easy
■ Disadvantages: Preparation of the instrument is time-consuming; prone to cheating

Performance
■ Description: A mode of assessment that requires actual demonstration of skills or creation of products of learning
■ Examples: Practical Tests, Oral and Aural Tests, Projects
■ Advantages: Preparation of the instrument is relatively easy
■ Disadvantages: Scoring tends to be subjective without rubrics; administration is time-consuming

Portfolio
■ Description: A process of gathering multiple indicators of student progress to support course goals
■ Examples: Working Portfolios, Show Portfolios
■ Advantages: Measures student's growth and development
■ Disadvantages: Development is time-consuming; rating tends to be subjective without rubrics
3) Validity
■ This refers to the degree to which a score-based inference is appropriate, reasonable, and useful.
4) Reliability
■ This refers to the degree of consistency when several items in a test measure the same thing, and stability when the same measures are given across time.
5) Fairness
■ Fair assessment is unbiased and provides students with opportunities to demonstrate what they have learned.
6) Positive Consequences
■ The overall quality of assessment is enhanced when it has a positive effect on student motivation and study habits. For the teachers, high-quality assessments lead to better information and decision-making about students.
B. AFFECTIVE DOMAIN

Receiving
■ Description: Willingness to receive or to attend to a particular phenomenon or stimulus
■ Illustrative verbs: acknowledge, ask, choose, follow, listen, reply, watch

C. PSYCHOMOTOR DOMAIN

Imitation
■ Description: Early stages in learning a complex skill after an indication of readiness to take a particular type of action
■ Illustrative verbs: carry out, assemble, practice, follow, repeat, sketch, move

Manipulation
■ Description: A particular skill or sequence is practiced continuously until it becomes habitual and is done with some confidence and proficiency
■ Illustrative verbs: (same as imitation) acquire, complete, conduct, improve

Precision
■ Description: A skill has been attained with proficiency and efficiency
■ Illustrative verbs: (same as imitation and manipulation) achieve, accomplish, excel, master, succeed, surpass

Articulation
■ Description: An individual can modify movement patterns to meet a particular situation
■ Illustrative verbs: adapt, change, excel, reorganize, rearrange, revise

Naturalization
■ Description: An individual responds automatically and creates new motor acts or ways of manipulation out of understandings, abilities, and skills developed
■ Illustrative verbs: arrange, combine, compose, construct, create, design
DIFFERENT TYPES OF TESTS

Purpose
■ Psychological: Aims to measure students' intelligence or mental ability to a large degree without reference to what the student has learned (e.g. Aptitude Tests)
■ Educational: Aims to measure the results of instruction and learning (e.g. Achievement Tests, Performance Tests)

Scope of Content
■ Survey: Covers a broad range of objectives; measures general achievement in certain subjects; constructed by trained professionals
■ Mastery: Covers a specific objective; measures fundamental skills and abilities; typically constructed by the teacher

Language Mode
■ Verbal: Words are used by students in attaching meaning to or responding to test items
■ Non-Verbal: Students do not use words in attaching meaning to or in responding to test items

Construction
■ Standardized: Constructed by a professional item writer; covers a broad range of content covered in a subject area; uses mainly multiple choice; items written are screened and the best items are chosen for the final instrument; can be scored by a machine; interpretation of results is usually norm-referenced
■ Informal: Constructed by a classroom teacher; covers a narrow range of content; various types of items are used; the teacher picks or writes items as needed for the test; scored manually by the teacher; interpretation is usually criterion-referenced

Manner of Administration
■ Individual: Mostly given orally or requires actual demonstration of skill; one-on-one situations, thus many opportunities for clinical observation; chance to follow up the examinee's response in order to clarify or comprehend it more clearly
■ Group: A paper-and-pen test; loss of rapport, insight and knowledge about each examinee; the same amount of time is needed to gather information from one student

Effect of Biases
■ Objective: The scorer's personal judgment does not affect the scoring; worded so that only one answer is acceptable; little or no disagreement on what is the correct answer
■ Subjective: Affected by the scorer's personal opinions and biases; several answers are possible; disagreement on what is the correct answer is possible

Time Limit and Level of Difficulty
■ Power: Consists of a series of items arranged in ascending order of difficulty; measures the student's ability to answer more and more difficult items
■ Speed: Consists of items approximately equal in difficulty; measures the student's speed or rate and accuracy in responding

Format
■ Selective: There are choices for the answer
■ Supply: There are no choices for the answer

Interpretation
■ Norm-Referenced: Result is interpreted by comparing one student's performance with other students' performance; some will really pass; there is competition for a limited percentage of high scores; typically covers a large domain of learning tasks; emphasizes discrimination among individuals in terms of level of learning; favors items of average difficulty and typically omits very easy and very hard items; interpretation requires a clearly defined group
■ Criterion-Referenced: Result is interpreted by comparing the student's performance against a predefined standard (mastery); all or none may pass; there is no competition for a limited percentage of high scores; typically focuses on a delimited domain of learning tasks; emphasizes description of what learning tasks individuals can and cannot perform; matches item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items; interpretation requires a clearly defined and delimited achievement domain
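The norm- vs. criterion-referenced contrast can be made concrete with a short sketch. The Python snippet below is illustrative only: the class scores, the function names, and the 75% mastery cutoff are assumptions, not part of the handout.

```python
def norm_referenced_rank(score, class_scores):
    """Percent of classmates scoring below the student (norm-referenced view)."""
    below = sum(1 for s in class_scores if s < score)
    return 100.0 * below / len(class_scores)

def criterion_referenced_mastery(score, total_items, cutoff=0.75):
    """Pass/fail against a predefined mastery standard (criterion-referenced view)."""
    return score / total_items >= cutoff

scores = [12, 15, 18, 20, 22, 25, 28, 30, 33, 35]  # hypothetical class scores out of 40
print(norm_referenced_rank(28, scores))            # 60.0 -- beats 60% of the class
print(criterion_referenced_mastery(28, 40))        # False -- 70% is below the 75% cutoff
```

The same raw score of 28 out of 40 outperforms 60% of the class (a respectable norm-referenced standing) yet falls short of a 75% mastery criterion, which is why the two interpretations can disagree.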
1. Selective Test
a. Multiple Choice - consists of a stem which describes the problem and 3 or more alternatives which give the suggested solutions. The incorrect alternatives are the distractors.
b. True-False or Alternative Response - consists of a declarative statement that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.
c. Matching Type - consists of two parallel columns: Column A, the column of premises from which a match is sought; Column B, the column of responses from which the selection is made.

True-False
■ Advantages: Easy to construct; can be effectively and objectively scored
■ Limitations: Can be used only when dichotomous answers represent sufficient response options

Matching Type
■ Advantages: Allows comparison of related ideas
■ Limitations: Difficult to produce a sufficient number of plausible responses
2. Supply Test
a. Short Answer - uses a direct question that can be answered by a word, phrase, a number, or a
symbol
b. Completion Test - consists of an incomplete statement
Advantages
■ Easy to construct
■ Require the student to supply the answer
Limitations
■ Generally limited to measuring recall of information
■ More likely to be scored erroneously due to a variety of responses
3. Essay Test
a. Restricted Response - limits the content of the response by restricting the scope of the topic
b. Extended Response - allows the students to select any factual information that they think is
pertinent, to organize their answers in accordance with their best judgment
Advantages
■ Measure more directly the behaviors specified by performance objectives
■ Examine students' written communication skills
Limitations
■ Provide a less adequate sampling of content
■ Less reliable scoring
■ Time-consuming to score
GENERAL SUGGESTIONS IN WRITING TESTS
SPECIFIC SUGGESTIONS
A. SUPPLY TYPE
1. Word the item/s so that the required answer is both brief and specific.
2. Do not take statements directly from textbooks to use as a basis for short answer items.
3. A direct question is generally more desirable than an incomplete statement.
4. If the item is to be expressed in numerical units, indicate type of answer wanted.
5. Blanks should be equal in length.
6. Answers should be written before the item number for easy checking.
7. When completion items are to be used, do not have too many blanks. Blanks should be at the
center of the sentence and not at the beginning.
Essay Type
8. Restrict the use of essay questions to those learning outcomes that cannot be satisfactorily
measured by objective items.
9. Formulate questions that will call forth the behavior specified in the learning outcome.
10. Phrase each question so that the pupils’ task is clearly indicated.
11. Indicate an approximate time limit for each question.
12. Avoid the use of optional questions.
B. SELECTIVE TYPE
Alternative-Response
1. Avoid broad statements.
2. Avoid trivial statements.
3. Avoid the use of negative statements especially double negatives.
4. Avoid long and complex sentences.
5. Avoid including two ideas in one sentence unless cause and effect relationship is being
measured.
6. If opinion is used, attribute it to some source unless the ability to identify opinion is being
specifically measured.
7. True statements and false statements should be approximately equal in length.
8. The number of true statements and false statements should be approximately equal.
9. Start with a false statement, since it is a common observation that the first statement in this type of test is expected to be true.
Matching Type
1. Use only homogenous materials in a single matching exercise.
2. Include an unequal number of responses and premises, and instruct the pupils that response
may be used once, more than once, or not at all.
3. Keep the list of items to be matched brief, and place the shorter responses at the right.
4. Arrange the list of responses in logical order.
5. Indicate in the directions the basis for matching the responses and premises.
6. Place all the items for one matching exercise on the same page.
Multiple Choice
1. The stem of the item should be meaningful by itself and should present a definite problem.
2. The stem should include as much of the item as possible and should be free of irrelevant information.
3. Use a negatively stated item stem only when significant learning outcome requires it.
4. Highlight negative words in the stem for emphasis.
5. All the alternatives should be grammatically consistent with the stem of the item.
6. An item should only have one correct or clearly best answer.
7. Items used to measure understanding should contain novelty, but beware of too much.
8. All distractors should be plausible.
9. Verbal association between the stem and the correct answer should be avoided.
10. The relative length of the alternatives should not provide a clue to the answer.
11. The alternatives should be arranged logically.
12. The correct answer should appear in each of the alternative positions an approximately equal number of times, but in random order.
13. Use of special alternatives such as “none of the above” or “all of the above” should be done
sparingly.
14. Do not use multiple choice items when other types are more appropriate.
15. Always have the stem and alternatives on the same page.
16. Break any of these rules when you have a good reason for doing so.
ALTERNATIVE ASSESSMENT
PORTFOLIO ASSESSMENT
Characteristics:
1. Adaptable to individualized instructional goals
2. Focus on assessment of products
3. Identify students’ strengths rather than weaknesses
4. Actively involve students in the evaluation process
5. Communicate student achievement to others
6. Time-consuming
7. Need of a scoring plan to increase reliability
TYPES

Showcase
■ A collection of students' best work

Reflective
■ Used for helping teachers, students, and family members think about various dimensions of student learning (e.g. effort, achievement, etc.)

Cumulative
■ A collection of items done over an extended period of time
■ Analyzed to verify changes in the products and processes associated with student learning

Goal-based
■ A collection of works chosen by students and teachers to match pre-established objectives

Process
■ A way of documenting the steps and processes a student has done to complete a piece of work
RUBRICS
■ Scoring guides, consisting of specific pre-established performance criteria, used in evaluating student work on performance assessments
Two Types:
1. Holistic Rubric - requires the teacher to score the overall process or product as a whole, without judging the component parts separately
2. Analytic Rubric - requires the teacher to score individual components of the product or performance first, then sum the individual scores to obtain a total score
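The difference between the two rubric types can be sketched in a few lines of Python; the criteria names and point values below are hypothetical, not taken from the handout.

```python
def holistic_score(overall_level):
    """Holistic rubric: one overall judgment of the whole product (e.g., 1-4)."""
    return overall_level

def analytic_score(component_scores):
    """Analytic rubric: score each component separately, then sum for the total."""
    return sum(component_scores.values())

# Hypothetical essay rated on three criteria, 1-4 points each
essay = {"content": 4, "organization": 3, "mechanics": 2}
print(analytic_score(essay))  # 9 -- sum of the component scores
print(holistic_score(3))      # 3 -- a single overall judgment
```

The analytic version yields diagnostic detail per criterion; the holistic version is faster but gives no component-level feedback.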
AFFECTIVE ASSESSMENTS
1. Closed-Item or Forced-Choice Instruments - ask for one specific answer
a. Checklist - measures students' preferences, hobbies, attitudes, feelings, beliefs, interests, etc. by marking a set of possible responses
b. Scales - instruments that indicate the extent or degree of one's response
1) Rating Scale - measures the degree or extent of one's attitudes, feelings, and perceptions about ideas, objects and people by marking a point along a 3- or 5-point scale
2) Semantic Differential Scale - measures the degree of one's attitudes, feelings and perceptions about ideas, objects and people by marking a point along a 5-, 7-, or 11-point scale of semantic adjectives
3) Likert Scale - measures the degree of one's agreement or disagreement with positive or negative statements about objects and people
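As a concrete sketch of Likert scoring, the snippet below sums responses and reverse-scores negatively worded statements, a standard practice so that a higher total always means a more favorable attitude. The items, scale size, and function name are illustrative assumptions, not part of the handout.

```python
def likert_score(responses, negative_items, points=5):
    """Sum Likert responses; negatively worded items are reverse-scored
    (response r becomes points + 1 - r) so a high total is always favorable."""
    total = 0
    for i, r in enumerate(responses):
        total += (points + 1 - r) if i in negative_items else r
    return total

# Four statements answered on 1 (strongly disagree) .. 5 (strongly agree);
# item index 2 is negatively worded, so its response of 1 counts as 5.
print(likert_score([5, 4, 1, 3], negative_items={2}))  # 5 + 4 + 5 + 3 = 17
```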
VALIDITY - the degree to which a test measures what it is intended to measure. It is the usefulness of the test for a given purpose. It is the most important criterion of a good examination.
FACTORS influencing the validity of tests in general
■ Appropriateness of Test - it should measure the abilities, skills and information it is supposed to measure
■ Directions - it should indicate how the learners should answer and record their answers
■ Reading Vocabulary and Sentence Structure - it should be based on the intellectual level of maturity and background experience of the learners
■ Difficulty of Items - it should have items that are not too difficult and not too easy, to be able to discriminate the bright from the slow pupils
■ Construction of Items - it should not provide clues so it will not be a test on clues, nor should it be ambiguous so it will not be a test on interpretation
■ Length of Test - it should be of sufficient length so it can measure what it is supposed to measure, and not so short that it cannot adequately measure the performance we want to measure
■ Arrangement of Items - it should have items arranged in ascending level of difficulty, starting with the easy ones so that pupils will persevere in taking the test
■ Patterns of Answers - it should not allow the creation of patterns in answering the test
■ Construct Validity - is established statistically by comparing psychological traits or factors that influence scores in a test, e.g. verbal, numerical, spatial, etc.
  - Convergent Validity - is established if the instrument relates to another similar trait other than the one it is intended to measure (e.g. a Critical Thinking Test may be correlated with a Creative Thinking Test)
  - Divergent Validity - is established if the instrument can describe only the intended trait and not other traits (e.g. a Critical Thinking Test may not be correlated with a Reading Comprehension Test)
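Convergent and divergent validity are typically checked with a correlation coefficient such as Pearson r. The sketch below uses made-up score lists: a high r with a similar trait suggests convergent validity, while a low r with a dissimilar trait would suggest divergent validity.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

critical = [10, 12, 15, 18, 20]   # Critical Thinking Test scores (hypothetical)
creative = [11, 13, 14, 19, 21]   # Creative Thinking Test scores (hypothetical)

# A similar trait should correlate highly -> evidence of convergent validity
print(round(pearson_r(critical, creative), 2))
```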
RELIABILITY - refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.

METHODS OF ESTIMATING RELIABILITY

Test-Retest
■ Type of reliability measure: Measure of stability
■ Procedure: Give a test twice to the same group, with any time interval between sets from several minutes to several years
■ Statistical measure: Pearson r

Equivalent Forms
■ Type of reliability measure: Measure of equivalence
■ Procedure: Give parallel forms of the test at the same time
■ Statistical measure: Pearson r

Test-Retest with Equivalent Forms
■ Type of reliability measure: Measure of stability and equivalence
■ Procedure: Give parallel forms of the test with increased time intervals between forms
■ Statistical measure: Pearson r

Split-Half
■ Type of reliability measure: Measure of internal consistency
■ Procedure: Give a test once; score equivalent halves of the test (e.g. odd- and even-numbered items)
■ Statistical measure: Pearson r and the Spearman-Brown Formula

Kuder-Richardson
■ Type of reliability measure: Measure of internal consistency
■ Procedure: Give the test once, then correlate the proportion/percentage of students passing and not passing each item
■ Statistical measure: Kuder-Richardson Formulas 20 and 21

Cronbach Coefficient Alpha
■ Type of reliability measure: Measure of internal consistency
■ Procedure: Give a test once, then estimate reliability by using the standard deviation per item and the standard deviation of the test
■ Statistical measure: Coefficient alpha
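As one worked example of these methods, the split-half procedure can be sketched as follows. The item data are invented, and the half-test correlation is stepped up to full length with the Spearman-Brown formula, r_full = 2r / (1 + r).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_matrix):
    """Score odd- and even-numbered items separately, correlate the halves,
    then apply the Spearman-Brown correction to estimate full-test reliability."""
    odd = [sum(row[0::2]) for row in item_matrix]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]   # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)                 # Spearman-Brown formula

# 5 examinees x 6 items (1 = correct, 0 = wrong) -- made-up data
items = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
print(round(split_half_reliability(items), 2))
```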
ITEM ANALYSIS
STEPS:
1. Score the test. Arrange the scores from highest to lowest.
2. Get the top 27% (upper group) and the bottom 27% (lower group) of the examinees.
3. Count the number of examinees in the upper group (PT) and in the lower group (PB) who got each item correct.
4. Compute the Difficulty Index of each item:

   Df = (PT + PB) / N     where N = the total number of examinees
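The four steps above can be sketched in Python; the score list and the per-item counts PT and PB are hypothetical.

```python
def difficulty_index(pt, pb, n):
    """Df = (PT + PB) / N, per the formula above."""
    return (pt + pb) / n

# Step 1: scores arranged from highest to lowest (made-up data, 10 examinees)
scores = [35, 33, 30, 28, 25, 22, 20, 18, 15, 12]

# Step 2: take the top 27% and bottom 27% of examinees
k = round(0.27 * len(scores))
upper, lower = scores[:k], scores[-k:]

# Steps 3-4 for one item: suppose 3 of the upper group and 1 of the lower
# group answered it correctly
print(difficulty_index(pt=3, pb=1, n=len(scores)))  # 0.4
```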
RATING ERRORS
■ Leniency error: Faculty tends to judge work as better than it really is.
■ Generosity error: Faculty tends to use only the high end of the scale.
■ Severity error: Faculty tends to use only the low end of the scale.
■ Central tendency error: Faculty avoids both extremes of the scale.
■ Bias: Letting other factors influence the score (e.g., handwriting, typos).
■ Halo effect: Letting a general impression of the student influence the rating of specific criteria (e.g., the student's prior work).
■ Contamination effect: Judgment is influenced by irrelevant knowledge about the student or other factors that have no bearing on performance level (e.g., student appearance).
■ Similar-to-me effect: Judging more favourably those students whom faculty see as similar to themselves (e.g., expressing similar interests or points of view).
■ First-impression effect: Judgment is based on early opinions rather than on a complete picture (e.g., the opening paragraph).
■ Contrast effect: Judging a student by comparison against other students instead of established criteria and standards.
■ Rater drift: Unintentionally redefining criteria and standards over time or across a series of scorings (e.g., getting tired and cranky and therefore more severe, or getting tired and reading more quickly/leniently to get the job done).
FOUR TYPES OF MEASUREMENT SCALES
Nominal
■ Characteristics: Groups and labels data
■ Examples: Gender (1 - male; 2 - female)

Ordinal
■ Characteristics: Ranks or orders data without equal intervals between ranks
■ Examples: Class rank (1st, 2nd, 3rd)

Interval
■ Characteristics: Has equal intervals between values but no true zero point
■ Examples: Temperature in degrees Celsius

Ratio
■ Characteristics: Has equal intervals and a true zero point
■ Examples: Height, weight
1. Normal / Bell-Shaped / Symmetrical
2. Positively Skewed - most scores are below the mean and there are extremely high scores
3. Negatively Skewed - most scores are above the mean and there are extremely low scores
4. Leptokurtic - highly peaked and the tails are more elevated above the baseline
5. Mesokurtic - moderately peaked
6. Platykurtic - flattened peak
7. Bimodal Curve - curve with 2 peaks or modes
8. Polymodal Curve - curve with 3 or more modes
9. Rectangular Distribution - there is no mode
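A positively or negatively skewed distribution can be detected numerically with a moment-based skewness statistic: a positive value signals a tail of extremely high scores, a negative value a tail of extremely low scores. The score lists below are invented for illustration.

```python
import math

def skewness(xs):
    """Standardized third moment: positive for a right (high-score) tail,
    negative for a left (low-score) tail, near zero for a symmetric curve."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

mostly_low_with_high_outlier = [10, 11, 12, 12, 13, 30]   # positively skewed
print(skewness(mostly_low_with_high_outlier) > 0)         # True: tail to the right
```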
MEASURES OF CORRELATION
Pearson r
STANDARD SCORES
■ Indicate the pupil's relative position by showing how far his raw score is above or below the average
■ Express the pupil's performance in terms of standard units from the mean
■ Represented by the normal probability curve, or what is commonly called the normal curve
■ Used to provide a common unit for comparing raw scores from different tests

PERCENTILE
■ Tells the percentage of examinees that lies below one's score

Z-SCORES
■ Tell the number of standard deviations a given raw score lies above or below the mean
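Both statistics can be computed directly from a score list, as in this illustrative sketch; the score data and function names are assumptions, and the standard deviation is computed over the whole group (population form).

```python
import math

def z_score(raw, scores):
    """Number of standard deviations the raw score lies from the group mean."""
    m = sum(scores) / len(scores)
    sd = math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))
    return (raw - m) / sd

def percentile(raw, scores):
    """Percentage of examinees that lie below the given score."""
    return 100.0 * sum(1 for s in scores if s < raw) / len(scores)

scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]  # hypothetical test scores
print(round(z_score(70, scores), 2))   # about half an SD above the mean
print(percentile(70, scores))          # 60.0 -- 60% of examinees scored below 70
```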
GRADES:
a. Could represent:
■ how a student is performing in relation to other students (norm-referenced grading)
■ the extent to which a student has mastered a particular body of knowledge (criterion-referenced grading)
■ how a student is performing in relation to a teacher's judgment of his or her potential
b. Could be for:
■ Certification that gives assurance that a student has mastered a specific content or achieved a certain level of accomplishment
■ Selection that provides a basis for identifying or grouping students for certain educational paths or programs
■ Direction that provides information for diagnosis and planning
■ Motivation that emphasizes specific material or skills to be learned and helps students to understand and improve their performance

■ Norm-Referenced Grading - grading based on relative standards, where a student's grade reflects his or her level of achievement relative to the performance of other students in the class. In this system, the grade is assigned based on the average of test scores.
■ Point or Percentage Grading System - the teacher identifies points or percentages for various tests and class activities depending on their importance. The total of these points is the basis for the grade assigned to the student.
■ Contract Grading System - each student agrees to work for a particular grade according to agreed-upon standards.
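The point or percentage system can be sketched as a weighted total; the component names and weights below are hypothetical, not prescribed by the handout.

```python
def weighted_grade(component_scores, weights):
    """Each component's percentage score is weighted by its importance;
    the weighted total is the basis for the assigned grade."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must total 100%
    return sum(component_scores[c] * w for c, w in weights.items())

weights = {"quizzes": 0.30, "projects": 0.30, "exams": 0.40}  # assumed importance
scores = {"quizzes": 85.0, "projects": 90.0, "exams": 80.0}   # percentage scores
print(round(weighted_grade(scores, weights), 1))
```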
GUIDELINES IN GRADING STUDENTS
1. Explain your grading system to the students early in the course and remind them of the grading
policies regularly.
2. Base grades on a predetermined and reasonable set of standards.
3. Base your grades on as much objective evidence as possible.
4. Base grades on the student’s attitude as well as achievement, especially at the elementary and high
school level.
5. Base grades on the student’s relative standing compared to classmates.
6. Base grades on a variety of sources.
7. As a rule, do not change grades, once computed.
8. Become familiar with the grading policy of your school and with your colleagues' standards.
9. When failing a student, closely follow school procedures.
10. Record grades on report cards and cumulative records.
11. Guard against bias in grading.
12. Keep pupils informed of their standing in the class.