
CA 2 Assessment in Learning

The document discusses key concepts in assessment of learning including: 1. Assessment involves systematically gathering data on student learning to make informed decisions. It includes measurement and evaluation. 2. Common assessment types are formative, summative, diagnostic, and placement assessments. Formative assessments provide feedback during instruction while summative assessments evaluate learning after instruction. 3. Important principles of assessment are that it should have a clear purpose, be ongoing to enhance learning, and be learner-centered to improve teaching. Assessment considers both the process and products of learning.


AGUSAN DEL SUR COLLEGE, INC.

ASSESSMENT IN LEARNING 1

CHAPTER I -INTRODUCTION TO ASSESSMENT IN LEARNING


 54 hours
 Course Map
 Basic Concepts & Principles in Assessing Learning
Introduction to  Assessment Purposes, Learning Targets, and Appropriate Methods
Assessment in  Classifications of Assessment
Learning

Development and  Planning a Written Test


Administration of  Construction of Written Tests
Tests  Establishing Test Validity and Reliability

Organization,  Planning a Written Test


Utilization, and  Construction of Written Tests
Communication of  Establishing Test Validity and Reliability
Test Results

LESSON 1 BASIC CONCEPTS AND PRINCIPLES IN ASSESSING LEARNING


What is Assessment in Learning?
- It is the process of gathering quantitative data on student learning to make informed decisions.
- It is as vital to the educational process as curriculum and instruction.
Assessment in Learning
- Is a systematic and purpose-oriented collection, analysis, and interpretation of evidence of student learning to make
informed decisions relevant to the learners.
- Assessment in learning can be characterized as:
o a process,
o based on specific objectives, and
o drawing on multiple sources of evidence.
- How is assessment in learning similar or different from the Concept of Measurement or Evaluation of Learning?
o Measurement is the process of quantifying the attributes of an object.
o Evaluation – the process of making value judgments on the information collected from measurement based on
specified criteria.
- Therefore, assessment can be considered as an umbrella term consisting of measurement and evaluation.
- However, some authors consider assessment as distinct and separate from evaluation (Huba and Freed 2000; Popham 1998).
Assessment and Testing
- The most common form of assessment is testing.
- A test is a form of assessment, but not all assessments use tests or testing.
- A test can be categorized as either:
o Selected response (e.g. matching type test) or
o Constructed response (e.g. essay test or short-answer test)
- A test can make use of an objective format (e.g. multiple choice, enumeration), which provides for more bias-free scoring.
- A test can also use a subjective format (e.g. essay), which allows for a less objective means of scoring, especially if no rubric is used.
- A Table of Specifications (TOS) – a table that maps out the essential aspects of a test.
- A teacher is expected to be competent in the design and development of classroom tests.
Assessment and Grading
- Grading is a related concept of assessment in learning
- It is the process of assigning value to the performance of a learner based on the standard
- Bases for grading learners:
o Performance in:
 Recitation
 Homework
 Seatwork
 Project
- The final grade of a learner is the summation of information from multiple sources.
- Grading is a form of evaluation that provides information on whether a learner passed or failed a subject on a particular
assessment task.
What are the different Measurement Frameworks used in Assessment?
- The two (2) most common psychometric theories that serve as a framework for assessment and measurement:
o The Classical Test Theory (CTT)

 Known as the true score theory.
 Explains that variation in the performance of the examinees on a given measure is due to variation in
their abilities.
 CTT also assumes that an examinee’s observed score in a given measurement is the sum of the
examinee’s true score and some degree of error in the measurement
 Provides an estimation of the item's difficulty based on the frequency or number of examinees who
correctly answer a particular item.
 Items with fewer examinees with correct answers are considered more difficult.
 It provides an estimation of item discrimination based on the number of examinees with a higher or
lower ability to answer a particular item.
 If an item can distinguish between examinees with higher ability (higher total test score) and lower
ability (lower total test score) then an item is considered to have a good discrimination.
 Test reliability can also be estimated using approaches from CTT (e.g. Kuder-Richardson 20,
Cronbach’s alpha). Item analysis based on CTT has been the dominant approach because of the
simplicity of calculating the statistics (e.g. item difficulty index, item discrimination index, item–total
correlation)
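The CTT statistics above (item difficulty, item discrimination, and a KR-20 reliability estimate) can be sketched in a few lines of Python. This is an illustrative sketch, not from the source: the score matrix, the function names, and the use of upper/lower halves for the discrimination groups (practice often uses the upper and lower 27%) are all assumptions for demonstration.

```python
# Illustrative sketch (not from the source): CTT item statistics for a small
# score matrix. Rows are examinees, columns are items (1 = correct, 0 = wrong).
import statistics

def item_difficulty(responses):
    """Difficulty index p: proportion of examinees answering correctly.
    A lower p means a more difficult item."""
    return sum(responses) / len(responses)

def item_discrimination(matrix, item, fraction=0.5):
    """Discrimination index D: difficulty in the upper group minus difficulty
    in the lower group, where groups are formed by total test score."""
    order = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]))
    k = max(1, int(len(matrix) * fraction))
    lower = [matrix[i][item] for i in order[:k]]
    upper = [matrix[i][item] for i in order[-k:]]
    return item_difficulty(upper) - item_difficulty(lower)

def kr20(matrix):
    """Kuder-Richardson 20 reliability estimate for dichotomous items."""
    k = len(matrix[0])
    var_total = statistics.pvariance([sum(row) for row in matrix])
    pq = sum(p * (1 - p) for p in
             (item_difficulty([row[j] for row in matrix]) for j in range(k)))
    return (k / (k - 1)) * (1 - pq / var_total)

scores = [  # four hypothetical examinees, three items
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty([row[0] for row in scores]))  # 0.75 -> a fairly easy item
print(item_discrimination(scores, 2))               # 0.5  -> positive, good discrimination
print(kr20(scores))                                 # 0.75
```

A positive D (here 0.5 for item 3) means high scorers answered the item correctly more often than low scorers, which is the behavior CTT treats as good discrimination.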
o The Item Response Theory (IRT)
 Analyses test items by estimating the probability that an examinee answers an item correctly or
incorrectly.
 It is assumed that the characteristics of an item can be estimated independently of the characteristics or
ability of the examinee and vice-versa.
 It provides significantly more information on items and tests.
 There are also different IRT Models (e.g. one-parameter model, three-parameter model)
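The one-parameter (Rasch) model mentioned above can be illustrated with a short sketch. This is not from the source; the function name and the sample ability/difficulty values are hypothetical, and real IRT work estimates these parameters from data rather than assuming them.

```python
import math

def rasch_probability(theta, b):
    """One-parameter (Rasch) model: probability that an examinee of ability
    theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# ability above difficulty pushes it higher, ability below pulls it lower.
print(rasch_probability(0.0, 0.0))             # 0.5
print(round(rasch_probability(2.0, 0.0), 3))   # 0.881
print(round(rasch_probability(-1.0, 1.0), 3))  # 0.119
```

This is what lets IRT characterize an item independently of any particular group of examinees: the difficulty parameter b is defined on the same scale as ability, not as a proportion of a sample answering correctly.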
What are the Different Types of Assessment in Learning?
- The most common types are:
o Formative Assessment
 Refers to assessment activities that provide information to both teachers and learners on how they can
improve the teaching-learning process.
 It is used at the beginning of and during instruction to assess the learner’s understanding.
o Summative Assessment
 Provides information on the quantity or quality of what students learned or achieved at the end of
instruction.
 Informs teachers of the effectiveness of their teaching strategies and how they can improve their
instruction in the future.
o Diagnostic Assessment
 Aims to detect the learning problems or difficulties of the learners.
 It can be done right after the teacher observes signs of learning problems during instruction.
 It can be also done at the beginning of the school year for a spirally-designed curriculum so that
corrective actions are applied.
o Placement Assessment
 Is usually done at the beginning of the school year to determine what the learners already know and what
their needs are, which could inform the design of instruction.
 The entrance examination given in school is an example
o Traditional Assessment
 Use of conventional strategies or tools that provide information about the learning of students.
Examples: - Multiple Choice
- Essay Test
- Paper-Pencil Test
o Authentic Assessment
 Assessment strategies or tools that allow learners to perform or create a meaningful product.
 The authenticity of the assessment task is best described in terms of degree rather than the mere presence
or absence of authenticity.
 Allows performance that most closely resembles the real world.
What are the Different Principles in Assessing Learning?
The Core Principles in Assessing Learning:
1. Assessment should have a clear purpose.
o There should be a method that matches the purpose.
o The interpretation of the data collected should be aligned with the purpose that is set.
o This assessment principle is congruent with the Outcomes-Based Education (OBE) principles of clarity of focus
and design down.
2. Assessment is NOT an end in itself.
o It serves as a means to enhance student learning.
o Collecting information about student learning whether formative or summative should lead to the decision that
will allow improvement of the learner.
3. Assessment is an ongoing, continuous, and formative process
o Series of tasks and activities over time.
o Continuous feedback.
o Congruent to OBE of expanded opportunity.
4. Assessment is Learner-Centered.
o Assessment of learners provides teachers with an understanding of how they can improve their teaching.

5. Assessment is both process and product-oriented.
o Gives equal importance to learner performance or the product and to the process learners engage in to perform or produce it.
6. Assessment must be comprehensive and holistic.
o Assessment should be conducted in multiple periods to assess learning over time.
o It is congruent with OBE of expanded opportunity.
7. Assessment requires the use of appropriate measures.
o Measures must have sound psychometric properties, including, but not limited to, validity and reliability.
8. Assessment should be as authentic as possible.
o Assessment tasks should move from the least authentic toward the most authentic tasks expected of a learner.

LESSON 2 ASSESSMENT PURPOSES, LEARNING TARGETS, AND APPROPRIATE METHODS


What Should We Assess at the End of Classroom Instruction?
 Explain the purpose of classroom assessment, and
 Formulate learning targets that match the appropriate assessment method.
What is the Purpose of Classroom Assessment?
 The teachers are expected to know the instructional goals and learning outcomes. The purpose of classroom assessment
may be classified into:
1. Assessment of Learning. Measures the knowledge and skills learners gained from instruction. It is summative.
2. Assessment for Learning. Uses classroom learning activities to inform and improve instruction. It is formative.
3. Assessment as Learning. It is formative but uses tasks, results, and feedback to help learners practice self-regulation
and make adjustments to achieve the curriculum outcome.
The Role of Classroom Assessment in the Teaching-Learning Process
Assessment is an integral part of the instructional process, and its purposes are classified into the three types above. The
following roles of classroom assessment are helpful to the teaching-learning process:
1. Formative – Teachers conduct assessments to acquire information about where the learners are.
2. Diagnostic – To identify specific learners' weaknesses and difficulties.
3. Evaluative – To measure the learner’s performance or achievement for making a judgment.
4. Facilitative – Teachers monitor, evaluate, and improve the learning strategies.
5. Motivational – Provide opportunities for assessment to be motivating.
What are Learning Targets?
Let’s define…
Goals – general statements about desired learning outcomes.
Standards – statements about what learners should know in a year or over the duration of a program (SHS) (McMillan, 2014).
Educational objectives – specific statements of learner performance at the end of an educational unit.
Bloom’s Taxonomy of Educational Objectives
Revised Educational Objectives
 Anderson and Krathwohl, 2001
Bloom’s Taxonomy of Educational Objectives
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
Cognitive Processes in the Revised Bloom’s Taxonomy (from highest to lowest)
1. Create
2. Evaluate
3. Analyse
4. Apply
5. Understand
6. Remember
Anderson and Krathwohl, 2001. In the Cognitive domain – introducing a two-dimensional model for writing learning
objectives.
1. The first dimension – The knowledge dimension includes 4 types:
o Factual – this type answers the who, where, what, when
o Conceptual – tells the concepts, generalizations, principles, theories, and model
o Procedural – answers Q that begins with “how”
o Metacognitive – questions drawn from one’s real-life experience (why? how?)
2. The second dimension – the cognitive process dimension. The verbs (cognitive processes) and nouns (knowledge types) are tabulated; keep a copy of the table in a folder for reference.
Learning Targets
 “A statement of student performance for a relatively restricted type of learning outcome that will be achieved in a
single lesson or a few days.”
 Contains “both a description of what students should know, understand, and be able to do at the end of instruction.”
 Know some criteria for judging the level of performance
 It should be congruent with the standards prescribed by the program or level.
 It should also be aligned with the instructional or learning objectives.
 It will inform learners what they should be able to demonstrate as evidence of their learning.
 Classroom instruction and assessment should be aligned.
 McMillan (2014) proposed five (5) criteria for selecting learning targets. Establish:
1. The right number of learning targets (neither too many nor too few)
2. Comprehensive learning targets (are all important types of learning included)
3. Learning targets that reflect school goals and 21st-century skills
4. Learning targets that are challenging yet feasible (best work)
5. Learning targets that are consistent with current principles of learning and motivation.
Types of Learning Targets
 Experts consider four primary targets but include a fifth – Affect – Which includes attitudes, beliefs, interests, and
values.
Description and Sample Learning Targets

Knowledge targets – Factual, conceptual, and procedural information that learners must learn in a subject or content area.
Sample: “I can explain the role of a conceptual framework in research.”
Reasoning targets – Knowledge-based thought processes that learners must learn. They involve the application of knowledge in problem-solving, decision-making, and other tasks that require mental skills.
Sample: “I can justify my research problems with the theory.”
Skills targets – Use of knowledge and reasoning to perform or demonstrate physical skills.
Sample: “I can facilitate a focus group discussion (FGD) with research participants.”
Product targets – Use of knowledge, reasoning, and skills in creating a concrete or tangible product.
Sample: “I can write a thesis proposal.”
Appropriate Methods of Assessment
 Once the learning targets are identified, appropriate assessment methods can be selected to measure student learning.
Matching Learning Targets with Paper-and-Pencil Types of Assessment
Each learning target (knowledge, reasoning, skills, and product) is matched against the selected-response formats (multiple choice, true or false, matching type) and the constructed-response formats (short answer, problem-solving, essay).

LESSON 3 DIFFERENT CLASSIFICATIONS OF ASSESSMENT


What are the Different Classifications of Assessment that Teachers Can Use?
 The different forms of assessment are classified according to purpose, form, interpretation of learning, functions,
ability, and kind of learning.
Classification – Types
Purpose – Educational; Psychological
Form – Paper-and-Pencil; Performance-based
Function – Teacher-made; Standardized
Kind of Learning – Achievement; Aptitude
Ability – Speed; Power
Interpretation of Learning – Norm-referenced; Criterion-referenced
When do we use Educational and Psychological Assessments?
1. Educational assessments are used in the school setting to track the growth of learners and grade their performance.
2. Educational assessment comes in the forms of formative and summative assessment:
1. Formative Assessment is a continuous process of gathering information (Beginning, During and After)
o Formative Assessment is to track and monitor student learning.
o It can be paper-pencil or performance-based.
o It can serve as a diagnostic tool to determine whether learners already know about the learning target.
Examples:
1. Ask learners to determine the by-product of photosynthesis. If learners cannot determine it right away, teachers can provide or recommend references.
2. If learners cannot carry out the correct procedure for dividing a three-digit number by a two-digit number, the teacher can start again from a simpler problem.
3. If a learner cannot start reciting a long poem, the teacher can have the learner begin it through singing.
 If the teacher observes that the majority or all of the learners can demonstrate the target, then he or she can conduct the summative assessment.
2. Summative Assessment – to determine and record the learners' learning.
Examples:
1. Physical Education – practice is needed before learners are able to present on stage during a program.
Psychological Assessment
 Tests and scales – are measures that determine the learners' cognitive and non-cognitive characteristics.
e.g. Those that measure ability, intelligence, and critical thinking.

Affective Measure – personality, motivation, attitude, interest and disposition
 These are processed by the school’s guidance counselor to perform interventions on the learners' academic, career,
and social and emotional development.
Why do we use Paper-Pencil and Performance-based types of assessments?
 Paper-and-pencil types of assessment are cognitive tasks that require a single correct answer.
e.g. - binary choice (true or false)
- short answer: identification, matching type, multiple choice
- the items usually pertain to a specific cognitive skill: RU, AA, EC
Other examples of this type of assessment:
o Identify the parts of a plant
o Label the parts of the microscope
o Complete the compound interest computation
o Classify the phases of matter
o Provide an appropriate verb in a sentence
o Identify the type of sentence
 Performance-based types of assessment require the learner to perform. The skills applied are usually complex and require integrated skills to arrive at a target response.
e.g. - a demonstration
- arriving at a product
- presenting information
- an essay
- reporting in front of a class
- reciting a poem
- problem-solving
- creating a word problem
Below are learning targets that need Performance-based Assessment
o Varnish a wooden cabinet
o Draw a landscape using a paintbrush tool on the computer
o Solve a word problem involving multiplication of polynomials
o Deliver a speech
o Write an essay explaining how humans and plants benefit from each other
o Mount a plant specimen on a glass slide
How do we Distinguish Teacher-made from Standardized Tests?
Standardized Test – has fixed directions for administering and scoring.
o Can be purchased with test manuals, booklets, and answer sheets.
o It was developed using a large sample of the target group, called the NORM group, which is used to compare
the results of those who take the test.
e.g.
- Intelligence Test - Critical Thinking Test
- Achievement Test - Interest Test
- Aptitude Test - Personality Test
Teacher-made Test
o Non-standardized intended for classroom assessment
e.g. - quizzes, long tests, exams
- formative and summative test
* Can a teacher-made test become a standardized test? Yes
What Information is sought from the Achievement and Aptitude Test?
Achievement Test
o Measure what learners have learned after instruction.
o A measure of what a person has learned within a given time (Yaremko et al., 1982).
o A measure of accomplished skills (Atkinson 1995; Kimball 1989), which explained the traditional and
alternative views on learner achievement.
o Examples are standardized achievement tests such as the Wide Range Achievement Test, the California
Achievement Test, and the Iowa Test of Basic Skills.
Aptitude Test
o According to Longman (2005), Aptitudes are the characteristics that influence a person’s behavior that aid
goal attainment in a particular situation.
o It refers to the degree of readiness to learn and perform (Corno et.al., 2002)
e.g.
 Ability to comprehend instruction
 Manage one’s time
 Use previously acquired knowledge appropriately.
 Make good inferences and generalizations
 Manage one’s emotion
How do we Differentiate Speed from Power Test?
 Speed Test – consists of easy items that need to be completed within a time limit.
e.g. Typing Test – type as many words as possible within a limited amount of time
 Power Test – consists of items with increasing levels of difficulty.
e.g. Tests developed by the National Council of Teachers of Mathematics
The Difference between a Norm-Reference from Criterion-Reference Test
 There are two types of tests based on how the scores are interpreted:
Norm-referenced Test – interprets results using the distribution of scores of a sample group.
 Interpretation is based on the mean and standard deviation of the sample.
 A norm is a standard based on a very large number of samples.
 The distribution of scores typically takes the shape of a bell curve, and results report the percentage of people obtaining a particular score.
 The norm serves as the basis for interpreting an individual test score.
Criterion-referenced Test has a given set of standards. The scores are compared to the given criterion.
e.g. - 50 Item Test:
- 40-50 Very High
- 30-39 High
- 20-29 Average
- 10-19 Low
- 0-9 Very Low
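The two interpretations can be contrasted in a short sketch, not from the source: a norm-referenced reading locates a score within a sample via a z-score, while a criterion-referenced reading maps the same score onto the fixed 50-item bands above. The class scores and function names here are hypothetical.

```python
import statistics

def z_score(score, sample):
    """Norm-referenced: how many standard deviations the score sits
    above or below the sample mean."""
    return (score - statistics.mean(sample)) / statistics.pstdev(sample)

def criterion_band(score):
    """Criterion-referenced: compare the score against the fixed cutoffs
    of the 50-item test bands above."""
    for cutoff, label in [(40, "Very High"), (30, "High"),
                          (20, "Average"), (10, "Low"), (0, "Very Low")]:
        if score >= cutoff:
            return label

sample = [22, 25, 28, 31, 34, 37, 40, 43]  # hypothetical class scores
print(round(z_score(43, sample), 2))  # 1.53 -> well above the class mean
print(criterion_band(43))             # Very High
```

Note the difference in what each number depends on: the z-score changes if the class changes, while the criterion band is fixed by the standard regardless of who else took the test.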

CHAPTER II - DEVELOPMENT AND ADMINISTRATION OF TESTS

LESSON 4 PLANNING A WRITTEN TEST


How Do We Make a Well-Planned Written Test?
 It is assumed that the competencies targeted for instruction are cognitive in nature and that a table of specifications will be created.
 A lesson plan is a MUST.
Why do you need to define the Test Objectives or Learning Outcomes Targeted for Assessment?
 Setting objectives for assessment is the process of establishing direction to guide both the teacher in teaching and
the students in learning.
 It is to identify the Intended Learning Outcome (ILO).
What are the objectives of Teaching?
 Follow Bloom’s Taxonomy of Learning, or its revision by the research partners Anderson and Krathwohl – ordering
objectives from lower-order thinking skills upward, or in the reverse order.

What is a Table of Specifications? TOS!


 A Table of Specifications – a blueprint, a tool used by teachers to design a test.
 It contains:
o Cognitive behavior to be measured
o Topics
o Distribution of items
o Placement of items
o Weight per test item
o Test format
Thus: ILO, Assessment, and Learning are ALIGNED.
STEPS IN DEVELOPING TOS
1. Determine the objectives of the test – these should be identified in the course syllabus. The test could take any of these formats:
o Multiple choice
o Alternative-response test
o Matching type
o Essay or open-ended test
2. Determine the coverage of the test – include only topics that were discussed and are relevant in class.
3. Calculate the weight of each topic.
4. Determine the number of items for the whole test.
5. Determine the number of items per topic.
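Steps 3-5 above are simple arithmetic: each topic's weight is the hours spent on it divided by the total hours, and its share of items is that weight times the total number of items. A sketch of that calculation (illustrative, not from the source; the function name is hypothetical, and the hours mirror the one-way TOS example below):

```python
def allocate_items(hours_per_topic, total_items):
    """Weight each topic by instructional time, then distribute test items
    in proportion to that weight (steps 3-5 of TOS development)."""
    total_hours = sum(hours_per_topic.values())
    return {topic: round(hours / total_hours * total_items)
            for topic, hours in hours_per_topic.items()}

# 5 hours of instruction in total, 50 items on the test
hours = {"Theories and Concepts": 0.5,
         "Psychoanalytic Theories": 1.5,
         "Other Topics": 3.0}
print(allocate_items(hours, 50))
# {'Theories and Concepts': 5, 'Psychoanalytic Theories': 15, 'Other Topics': 30}
```

Rounding can make the allocated items sum to slightly more or less than the intended total, so in practice the teacher adjusts a topic or two by one item after this step.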
Formats of the TOS:
1. One-Way TOS – maps out the content and objectives.

Topic: Theories and Concepts
Test Objective: Recognize important concepts in personality theories
No. of Hours Spent: 0.5
Format and Placement of Items: Multiple Choice, Item #s 1-5
No. and Percent of Items: 5 (10.0%)

Topic: Psychoanalytic Theories
No. of Hours Spent: 1.5
Format and Placement of Items: Multiple Choice, Item #s 6-20
No. and Percent of Items: 15 (30.0%)

Etc.

Total: 5 hours; 50 items (100%)
2. Two-Way TOS – reflects the content, time spent, number and percent of items, knowledge dimension (KD: Factual,
Conceptual, Procedural, Metacognitive), and level of cognitive behavior (R, U, AP, AN, E, C), together with the item
format and placement of items. It is the most common TOS format (from DepEd, 2015).

Theories and Concepts – 0.5 hours – 5 items (10.0%): Factual, items #1-3; Conceptual, items #4-5
Psychoanalytic Theories – 1.5 hours – 15 items (30.0%): Factual, items #6-7; Conceptual, items #8-11; Procedural, items #12-15; Metacognitive, items #16-18, #41, and #42
Etc.
Scoring: 1 point per item at the lower levels; 2 points per item and 3 points per item at the higher levels
Overall Total: 5 hours; 50 items (100.0%), with 20, 20, and 10 items at the three scoring levels, respectively
3. Three-Way TOS – same as the two-way TOS but with an added dimension; it takes more time and is longer to develop.
 Use of a rubric – scored 5 to 1
 Educator’s input – Prof. Mallillin, from a teacher education institution in Metro Manila:
o The TOS should be shown to students.

LESSON 5 CONSTRUCTION OF WRITTEN TEST


Guidelines for Choosing the Appropriate Test Formats:
1. It depends on your desired learning outcomes of the subject/unit/lesson being assessed (DLOs)
2. Level of thinking to be assessed.
3. It should be aligned within the course.
4. Are the test items realistic to the students?
Major Categories and Formats of Traditional Tests.
1. Selected-response tests include:
o Multiple Choice Test
o True-False Alternative Response Test
o Matching Type
2. Constructed-Response Test – supply answers to a given question or problem.
o Short Answer Tests like
 Completion – fill in the blanks
 Identification
 Enumeration
o Essay Test
o Problem Solving
Guidelines in Writing Multiple-Choice Test Items.
 Writing multiple-choice items requires content mastery, writing skills, and time.
Content
1. Write items that reflect only one specific content and cognitive processing skills.
Faulty: Which of the following is a type of statistical procedure used to test a hypothesis regarding a significant
relationship between variables, particularly in terms of the extent and direction of association?
a. ANCOVA c. Correlation
b. ANOVA d. T-test
2. Do not lift and use statements from the textbook or other learning materials as test questions.
3. Keep the vocabulary simple and understandable based on the level of learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before administering them to the learners.
Stem
1. Write the directions in the stem in a clear and understandable manner.
Faulty: Read each question and indicate your answer by shading the circle corresponding to your answer.
Good: This test consists of two parts. Part A is a reading comprehension test, and Part B is a grammar/language
test. Each question is a multiple-choice test item with five (5) options. You are to answer each question but will
not be penalized for a wrong answer or for guessing. You can go back and review your answers during the time
allotted.
2. Write stems that are consistent in form and structure, that is, present all items either in question form or in descriptive
or declarative form.
Faulty: (1) Who was the Philippine President during Martial Law?
(2) The first president of the Commonwealth of the Philippines was __________.
Good: (1) Who was the Philippine President during Martial Law?
(2) Who was the first president of the Commonwealth of the Philippines?
3. Word the stem positively and avoid double negatives, such as NOT and EXCEPT in a stem. If a negative word is
necessary, underline or capitalize the words for emphasis.
Faulty: Which of the following is not a measure of variability?
Good: Which of the following is NOT a measure of variability?
4. Refrain from making the stem too wordy or containing too much information unless the problem/question requires
the facts presented to solve the problem
Faulty: What does DNA stand for, and what is the organic chemical of complex molecular structure found in all
cells and viruses and codes genetic information for the transmission of inherited traits?
Good: As a chemical compound, what does DNA stand for?
Options
1. Provide three (3) to five (5) options per item, with only one being the correct or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving clues about the correct answer.

Faulty: What is an ecosystem?
a. It is a community of living organisms in conjunction with the non-living components of their environment
that interact as a system. These biotic and abiotic components are linked together through nutrient cycles
and energy flows.
b. It is a place on Earth’s surface where life dwells.
c. It is an area where one or more individual organisms defend against competition from other organisms.
d. It is the biotic and abiotic surroundings of an organism or population.
e. It is the largest division of the Earth’s surface filled with living organisms.
Good: What is an ecosystem?
a. It is a place on the Earth’s surface where life dwells.
b. It is the biotic and abiotic surroundings of an organism or population.
c. It is the largest division of the Earth’s surface filled with living organisms.
d. It is a large community of living and non-living organisms in a particular area.
e. It is an area where one or more individual organisms defend against competition from other organisms.
3. Place options in a logical order (e.g. alphabetical, from shortest to longest).
Faulty: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the
container decreases (i.e., “The absolute pressure exerted by a given mass of an ideal gas is inversely
proportional to the volume it occupies.”)
a. Boyle’s Law d. Avogadro’s Law
b. Charles Law e. Faraday’s Law
c. Beer-Lambert Law
Good: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the
container decreases? (i.e., “The absolute pressure exerted by a given mass of an ideal gas is inversely
proportional to the volume it occupies.”)
a. Avogadro’s Law d. Charles’ Law
b. Beer-Lambert Law e. Faraday’s Law
c. Boyle’s Law
4. Place correct responses randomly to avoid a discernible pattern of correct answers.
5. Use None-of-the-above carefully and only when there is one absolutely correct answer, such as in spelling or math
items.
Faulty: Which of the following is a nonparametric statistic?
a. ANCOVA c. T-test
b. ANOVA d. None of the above
Good: Which of the following is a nonparametric statistic?
a. ANCOVA d. Mann-Whitney U
b. ANOVA e. T-test
c. Correlation
6. Avoid All of the Above as an option, especially if it is intended to be the correct answer.
Faulty: Who among the following has become the President of the Philippine Senate?
a. Ferdinand Marcos d. Quintin Paredes
b. Manuel Quezon e. All of the above
c. Manuel Roxas
Good: Who was the first ever President of the Philippine Senate?
a. Eulogio Rodriguez d. Manuel Roxas
b. Ferdinand Marcos e. Quintin Paredes
c. Manuel Quezon
7. Make all options realistic and reasonable.
General Guidelines in Writing Matching Type Items?
1. Clearly state in the directions the basis for matching the stimuli with responses.
Faulty: Directions: Match the following.
Good: Directions: Column I is a list of countries while Column II presents the continent where these countries are
located. Write the letter of the continent corresponding to the country on the line provided in Column I.
Item #1’s instruction is less preferred as it does not detail the basis for matching the stem and the response options.
2. Ensure that the stimuli are longer and the responses are shorter.
Faulty: Match the description of the flag to its country.
A B
Bangladesh A. Green background with a red circle in the center.
Indonesia B. One red strip on top and a white strip at the bottom.
Japan C. Red background with a white five-petal flower in the center.
Singapore D. Red background with a large yellow circle in the center
Thailand E. Red background with a large yellow pointed star in the center.
F. White background with a large red circle in the center.
Good: Match the description of the flag to its country.
A B
Green background with a red circle in the center. A. Bangladesh
One red strip on top and a white strip at the bottom. B. Hongkong
Red background with a white five-petal flower in the center. C. Indonesia
Red background with a large yellow-pointed star in the center. D. Japan
White background with a red circle in the center. E. Singapore
F. Vietnam

Item #2 is a better version because the descriptions are presented in the first column while the response options are
in the second column. The stems are also longer than the options.
3. For each item, include only topics that are related to one another and share the same foundation of information.
Faulty: Match the following:
A B
1. Indonesia A. Asia
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of the country’s capital presented in
Column II.
Column I Column II
1. Indonesia A. Bandar Seri Begawan
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
E. Manila
Item #1 is considered an unacceptable item because its response options are not parallel and include different kinds
of information that can provide clues to the correct/wrong answers. On the other hand, item #2 details the basis
for matching and the response options only include related concepts.
4. Make the response options short, homogenous, and arranged in logical order.
Faulty: Match the chemical elements with their characteristics.
A B
Gold A. Au
Hydrogen B. Magnetic metal used in steel
Iron C. Hg
Potassium D. K
Sodium E. With lowest density
F. Na
Good: Match the chemical elements with their symbols.
A B
Gold A. Au
Hydrogen B. Fe
Iron C. H
Potassium D. Hg
Sodium E. K
F. Na
In Item #1, the response options are not parallel in content and length. In the improved version, the options are homogeneous (all chemical symbols), short, and arranged alphabetically.
5. Include response options that are reasonable and realistic and similar in length and grammatical form.
Faulty: Match the subjects with their course description
A B
History A. Studies the production and distribution of goods/services
Political Science B. Study of politics and power
Psychology C. Study of Society
Sociology D. Understands the role of mental functions in social behavior
E. Uses narratives to examine and analyze past events
Good: Match the subjects with their course description.
A B
1. Study of living things A. Biology
2. Study of mind and behavior B. History
3. Study of politics and power C. Political Science
4. Study of recorded events in the past D. Psychology
5. Study of Society E. Sociology
F. Zoology
Item #1 is less preferred because the response options are not consistent in terms of their length and grammatical
form.
6. Provide more response options than the number of stimuli.
Faulty: Match the following fractions with their corresponding decimal equivalents:
A B
1/4 A. 0.25
5/4 B. 0.28
7/25 C. 0.90
9/10 D. 1.25
Good: Match the following fractions with their corresponding decimal equivalents:
A B
1/4 A. 0.09
5/4 B. 0.25
7/25 C. 0.28
9/10 D. 0.90
E. 1.25
Item #1 is considered inferior to item #2 because it includes the same number of response options as that of the
stimuli, thus making it more prone to guessing.
General Guidelines in Writing True or False Items
 True or false items are best used when a learner’s ability to judge or evaluate is one of the desired learning outcomes
of the course.
Variation
1. T-F Correction or Modified True-or-False Question. In this format, the statement is presented with a keyword
or phrase that is underlined, and the learner has to supply the correct word or phrase.
e.g. Multiple-Choice Test is authentic.
2. Yes-No Variation. In this format, the learner has to choose yes or no, rather than true or false.
e.g. The following are kinds of tests. Circle Yes if it is an authentic test and No if not.
Multiple Choice Test Yes No
Debates Yes No
End-of-the-Term Project Yes No
True or False Test Yes No
3. A-B Variation. In this format, the learner has to choose A or B, rather than true or false.
e.g. Indicate which of the following are traditional or authentic tests by circling A if it is a traditional test and
B if it is authentic.
Traditional Authentic
Multiple Choice Test A B
Debates A B
End-of-the-Term Project A B
True or False Test A B
 Because true or false test items are prone to guessing, as learners are asked to choose between two options, utmost
care should be exercised in writing true or false items.
1. Include statements that are completely true or completely false.
Faulty: The presidential system of government, where the president is only the head of state or government, is
adopted by the United States, Chile, Panama, and South Korea.
Good: The presidential system, where the president is only the head of state or government, is adopted by Chile.
Item #1 is of poor quality because, while the description is right, not all of the countries listed are correct. While
South Korea has a presidential system of government, it also has a prime minister who governs alongside
the president.
2. Use simple and easy-to-understand statements.
Faulty: Education is a continuous process of higher adjustment for human beings who have evolved physically and
mentally, which is free and conscious of God, as manifested in nature around the intellectual emotional,
and humanity of man.
Good: Education is the process of facilitating learning or the acquisition of knowledge, skills, values, beliefs, and
habits.
Item #1 is somewhat confusing, especially for younger learners because there are many ideas in one statement.
3. Refrain from using negatives – especially double negatives.
Faulty: There is nothing illegal about buying goods through the internet.
Good: It is legal to buy things or goods through the internet.
Double negatives are sometimes confusing and could result in wrong answers, not because the learner does not
know the answer but because of how the test items are presented.
4. Avoid using absolutes such as “always” and “never”.
Faulty: The news and information posted on the CNN website is always accurate.
Good: The news and information posted on the CNN website is usually accurate.
Absolute words such as "always" and "never" restrict possibilities and make a statement claim to be true 100 percent of the
time. They are also a hint for a "false" answer.
5. Express a single idea in each test item.
Faulty: If an object is accelerating, a net force must be acting on it, and the acceleration of an object is directly
proportional to the net force applied to the object.
Good: If an object is accelerating, a net force must be acting on it.
Item #1 combines two ideas in a single statement, so a learner cannot indicate which of the two ideas he or she is judging.
6. Avoid the use of unfamiliar words or vocabulary.
Faulty: Esprit de corps among soldiers is important in the face of hardships and opposition in fighting the terrorists.
Good: Military morale is important in the face of hardships and opposition in fighting the terrorists.
Students may have a difficult time understanding the statement, especially if the word “esprit de corps” has not
been discussed in class. Using unfamiliar words would likely lead to guessing.
7. Avoid lifting statements from the textbook and other learning materials.
General Guidelines for Writing Short-Answer Test Items
 The following are the general guidelines for writing good fill-in-the-blank or Completion Test Items.
1. Omit only significant words from the statement.
Faulty: Every atom has a central ______ called a nucleus.
Good: Every atom has a central core called a(n) ______.
In item #1, the word "core" is not a significant word to omit. The item is also prone to many and varied interpretations,
resulting in many possible answers.
2. Do not omit too many words from the statement such that the intended meaning is lost.
Faulty: ______ is to Spain as ______ is to the United States and as ______ is to Germany.
Good: Madrid is to Spain as ______ is to France.
Item #1 is prone to many and varied answers. For example, a student may answer based on the capitals
of these countries or on the continents where they are located. Item #2 is preferred because it is more specific and
requires only one correct answer.
3. Avoid obvious clues to the correct response.
Faulty: Ferdinand Marcos declared martial law in 1972. Who was the president during the period?
Good: The president during the martial law years was ______.
Item #1 already gives a clue that Ferdinand Marcos was the president during this time because only the president
of a country can declare martial law.
4. Be sure that there is only one correct response.
Faulty: The government should start using renewable energy sources for generating electricity, such as ______.
Good: The government should start using renewable resources of energy by using turbines called ______.
Item #1 has many possible answers because the statement is very general (e.g., wind, solar, biomass, geothermal,
and hydroelectric). Item #2 is more specific and only requires one correct answer (i.e. wind).
5. Avoid grammatical clues to the correct response.
Faulty: A subatomic particle with a negative electric charge is called an ______.
Good: A subatomic particle with a negative electric charge is called a(n) ______.
The word "an" in item #1 provides a clue that the correct answer starts with a vowel.
6. If possible, put the blank at the end of a statement rather than at the beginning.
Faulty: ______ is the basic building block of matter.
Good: The basic building block of matter is ______.
In item #1, learners may need to read the sentence until the end before they can recognize the problem and then re-
read it again, and then answer the question. On the other hand, in item #2, learners can already identify the context
of the problem by reading through the sentence only once and without having to go back and re-read the sentence.
General Guidelines in Writing Essay Test
 Teachers generally choose and employ essay items over other forms of assessment.
 They are the most preferred form of assessment to measure learners' higher-order thinking skills:
o Understanding the subject matter content
o Ability to reason with their knowledge of the subject
o Problem-solving and decision-making skills
 There are two types of essay test:
1. Extended-Response Essay – allows a much longer, less restricted response.
2. Restricted-Response Essay – more focused; limits the form and scope of the response.
 The following are the general guidelines for constructing good essay questions.
1. Clearly define the intended learning outcome to be assessed by the essay test.
2. Refrain from using tests for intended learning outcomes that are better assessed by other kinds of assessment.
3. Clearly define and situate the task within a problem.
4. Present tasks that are fair, reasonable, and realistic to the students.
5. Be specific in the prompts about time allotment and criteria for grading the response.
General Guidelines in Problem-Solving Test Items
 Problem-solving test items are used to measure the learner’s ability to solve problems that require quantitative
knowledge and competencies and/or critical thinking skills.
 There are different variations of the quantitative problem-solving:
1. One answer choice – this type of question contains four or five options, and students are required to choose the
best answer.
e.g. What is the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96?
a. 68 c. 72 e. 76
b. 69 d. 74
The correct answer is A (68).
2. All possible answer choices – This type of question has four or five options, and students are required to choose
all of the options that are correct.
e.g. Consider the following score distribution: 12,14, 14, 17, 24, 27, 28, 30. Which of the following is/are the
correct measure/s of central tendency? Indicate all possible answers.
a. Mean = 20 d. Median = 17
b. Mean = 22 e. Mode = 14
c. Median = 16
3. Type-in answer – This type of question does not provide options to choose from. Instead, the learners are asked
to supply the correct answer. The teacher should inform the learners at the start how their answers will be rated.
For example, the teacher may require just the correct answer or may require learners to present the step-by-step
procedures for coming up with their answers. On the other hand, for non-mathematical problem solving, such
as a case study, the teacher may present a rubric on how their answers will be rated.
e.g. Compute the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96. Indicate your answer
in the blank provided.
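As an illustrative sketch, the answer key for the type-in mean item above can be checked with a few lines of Python (the language choice is ours; any calculator or spreadsheet works equally well):

```python
# Mean of the score distribution in the type-in item above.
scores = [32, 44, 56, 69, 75, 77, 95, 96]

answer = sum(scores) / len(scores)  # 544 divided by 8 scores
print(answer)  # 68.0
```

This also confirms option A (68) as the key of the one-answer-choice item shown earlier, since both items use the same distribution.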

LESSON 6 ESTABLISHING TEST VALIDITY AND RELIABILITY


What is Test Reliability?
 Reliability is the consistency of a measure under three conditions:
1. When the same person is retested.
2. When retested with the same (or an equivalent) measure.
3. Similarity of responses across items that measure the same characteristic.
 There are different factors that affect the reliability of a measure:
o The number of items in a test – the more items a test has, the higher its reliability tends to be.
o Individual differences of participants – fatigue, lack of concentration, perseverance, innate ability.
o External environment – includes room temperature, noise level, exposure to material, and quality of
instruction.
Different Ways to Establish Test Reliability
1. Test-Retest
How it is done: Administer the test at one time to a group of examinees, then administer it again at another time to the same group. Test-retest is applicable for tests that measure stable variables, such as aptitude and psychomotor measures (e.g., a typing test, tasks in physical education).
Statistics used: Correlation refers to a statistical procedure where a linear relationship is expected between two variables. You may use the Pearson Product-Moment Correlation (Pearson r) because test data are usually on an interval scale.
2. Parallel Forms
How it is done: Applicable if there are two versions of the test. This is usually done when the test is repeatedly used for different groups, such as entrance examinations and licensure examinations; different versions of the test are given to different groups of examinees.
Statistics used: Correlate the test results for the first form and the second form. Significant and positive correlation coefficients are expected.
3. Split-Half
How it is done: Administer the test to a group of examinees, then split the items into halves, usually using the odd-even technique: get the sum of the points in the odd-numbered items and correlate it with the sum of the points in the even-numbered items.
Statistics used: The correlation coefficient obtained using Pearson r, adjusted with the Spearman-Brown formula, should be significant and positive for the test to have internal-consistency reliability.
4. Test of Internal Consistency Using the Kuder-Richardson and Cronbach's Alpha Methods
How it is done: This technique works well when the assessment tool has a large number of items. It is also applicable for scales and inventories (e.g., a Likert scale from "strongly agree" to "strongly disagree").
Statistics used: Cronbach's alpha or the Kuder-Richardson formulas determine the internal consistency of the items. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.
5. Inter-Rater Reliability
How it is done: Applicable when the assessment requires the use of multiple raters.
Statistics used: Kendall's coefficient of concordance determines whether the ratings provided by multiple raters agree with each other.
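The split-half procedure with the Spearman-Brown adjustment can be sketched in Python. The odd-half and even-half scores below are hypothetical, invented only to illustrate the computation:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def spearman_brown(r_half):
    """Step up the half-test correlation to full-test reliability."""
    return 2 * r_half / (1 + r_half)

# Hypothetical sums of odd-numbered and even-numbered items for 5 examinees.
odd_half = [10, 8, 6, 9, 5]
even_half = [9, 7, 7, 10, 4]

r = pearson_r(odd_half, even_half)   # correlation of the two halves
reliability = spearman_brown(r)      # estimated full-test reliability
```

The Spearman-Brown step matters because correlating half-tests underestimates the reliability of the full-length test.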
 The basis of the statistical analysis used to determine reliability is the linear relationship (correlation) between two sets of scores.
1. Linear relationship – demonstrated when you have two measured variables, such as two sets of scores
on a test taken at two different times by the same participants.
o When plotted, the points tend to form a straight line, and the two sets of scores are said to be correlated.
2. Computation of the Pearson r correlation
o The index of the linear relationship is called a Correlation Coefficient.
o When the points in a scatterplot tend to fall along the linear line, the correlation is said to be strong.
o When the two variables rise and fall together (directly proportional), the correlation coefficient has a positive
value; when the relationship is inverse, the coefficient is negative.
e.g. A teacher gave a 20-item spelling test of two-syllable words on Monday and again on Tuesday. Using the
Pearson r:
r = [N(∑XY) − (∑X)(∑Y)] / √{[N(∑X²) − (∑X)²][N(∑Y²) − (∑Y)²]}
Monday Test (X) Tuesday Test (Y) X² Y² XY
10 20 100 400 200
9 15 81 225 135
6 12 36 144 72
10 18 100 324 180
12 19 144 361 228
4 8 16 64 32
5 7 25 49 35
7 10 49 100 70
8 13 64 169 104
16 17 256 289 272
∑X = 87 ∑Y = 139 ∑X² = 871 ∑Y² = 2,125 ∑XY = 1,328
∑X – Add all the X scores (Monday scores)
∑Y – Add all the Y scores (Tuesday scores)
X² – Square each X score (Monday scores)
Y² – Square each Y score (Tuesday scores)
XY – Multiply each X score by its Y score
∑X² – Add all the squared values of X
∑Y² – Add all the squared values of Y
∑XY – Add all the products of X and Y
Substitute the values in the formula:
r = [10(1328) − (87)(139)] / √{[10(871) − (87)²][10(2125) − (139)²]}
r = 0.80
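The substituted totals (∑X = 87, ∑Y = 139, ∑XY = 1,328 with N = 10) correspond to ten score pairs; a Python sketch of the same computation:

```python
import math

# Monday (X) and Tuesday (Y) spelling-test scores for ten pupils,
# consistent with the totals substituted into the formula above.
x = [10, 9, 6, 10, 12, 4, 5, 7, 8, 16]
y = [20, 15, 12, 18, 19, 8, 7, 10, 13, 17]

n = len(x)
sx, sy = sum(x), sum(y)                     # 87, 139
sxy = sum(a * b for a, b in zip(x, y))      # 1328
sx2 = sum(a * a for a in x)                 # 871
sy2 = sum(b * b for b in y)                 # 2125

r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))  # 0.8
```

A coefficient of 0.80 falls in the "very strong relationship" band of the guide below, indicating consistent Monday-to-Tuesday scores.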
3. Difference Between a Positive and a Negative Correlation:
o Positive correlation – the higher the scores in X, the higher the scores in Y.
o Negative correlation – the higher the scores in X, the lower the scores in Y.
o When the same test is administered twice to the same group of participants, a positive correlation usually
indicates the reliability or consistency of the scores.
4. Determining the strength of a Correlation
o The strength of the correlation also indicates the strength of the reliability of the test. This is indicated by
the value of the Correlation Coefficient. The closer the value to 1.00 or -1.00, the stronger the correlation.
Below is the guide:
0.80 – 1.00 Very strong relationship
0.60 – 0.79 Strong relationship
0.40 – 0.59 Substantial / Marked relationship
0.20 – 0.39 Weak relationship
0.00 – 0.19 Negligible relationship
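The verbal guide above can be wrapped in a small helper function (the function name is ours, for illustration only):

```python
def describe_relationship(r):
    """Return the verbal label for the magnitude of r, per the guide above."""
    strength = abs(r)  # the guide applies to the size of r, sign aside
    if strength >= 0.80:
        return "Very strong relationship"
    if strength >= 0.60:
        return "Strong relationship"
    if strength >= 0.40:
        return "Substantial / Marked relationship"
    if strength >= 0.20:
        return "Weak relationship"
    return "Negligible relationship"

print(describe_relationship(0.80))  # Very strong relationship
```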
5. Determining the significance of the correlation.
o The correlation obtained between two variables may be due to chance.
o In order to determine whether the correlation is free of such error, it is tested for significance.
o Another statistical analysis used to determine the internal consistency of a test is CRONBACH'S
Alpha.
Student Item 1 Item 2 Item 3 Item 4 Item 5 Total for each case (X) Score − Mean (Score − Mean)²
A 5 5 4 4 1 19 2.8 7.84
B 3 4 3 3 2 15 −1.2 1.44
C 2 5 3 3 3 16 −0.2 0.04
D 1 4 2 3 3 13 −3.2 10.24
E 3 3 4 4 4 18 1.8 3.24
Mean of the totals: X̄ = 16.2; ∑(Score − Mean)² = 22.8
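The table stops at ∑(Score − Mean)² = 22.8. A sketch of how Cronbach's alpha would be completed from these five responses, using sample variances (one common convention; the source does not finish the computation):

```python
def sample_variance(values):
    """Variance with the N - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Responses of students A-E to items 1-5 (rows = students), from the table.
responses = [
    [5, 5, 4, 4, 1],   # A, total 19
    [3, 4, 3, 3, 2],   # B, total 15
    [2, 5, 3, 3, 3],   # C, total 16
    [1, 4, 2, 3, 3],   # D, total 13
    [3, 3, 4, 4, 4],   # E, total 18
]

k = len(responses[0])                                 # number of items
items = list(zip(*responses))                         # columns = items
item_vars = sum(sample_variance(i) for i in items)    # sum of item variances
totals = [sum(row) for row in responses]
total_var = sample_variance(totals)                   # 22.8 / 4 = 5.7

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
```

With these toy data, alpha comes out to only about 0.11, well below the 0.60 benchmark cited earlier, illustrating items that do not hang together.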
Test Validity
o A measure is valid when it measures what it is supposed to measure.
o If a quarterly exam is valid, then the contents should directly measure the objectives of the curriculum.
Content Validity – Definition: the items represent the domain being measured. Procedure: the items are compared with the objectives of the program; the items need to directly measure the objectives (for achievement tests) or the definition (for scales). A reviewer conducts the checking.
Face Validity – Definition: the test is presented well, free of errors, and administered well. Procedure: the test items and layout are reviewed and tried out on a small group of respondents; a manual for administration can be made as a guide for the test administrator.
Predictive Validity – Definition: the measure should predict a future criterion (e.g., an entrance exam predicting the grades of the students after the first semester). Procedure: a correlation coefficient is obtained where the X-variable is used as the predictor and the Y-variable as the criterion.
Construct Validity – Definition: the components or factors of the test should contain items that are strongly correlated. Procedure: the Pearson r can be used to correlate the items for each factor; a technique called factor analysis can determine which items are highly correlated enough to form a factor.
Concurrent Validity – Definition: two or more measures of the same characteristic are present for each examinee. Procedure: the scores on the measures should be correlated.
Convergent Validity – Definition: components or factors of a test are hypothesized to have a positive correlation. Procedure: correlation is done for the factors of the test.
Divergent Validity – Definition: components or factors of a test are hypothesized to have a negative correlation (e.g., scores on tests of intrinsic and extrinsic motivation). Procedure: correlation is done for the factors of the test.

CHAPTER III – ORGANIZATION, UTILIZATION, AND COMMUNICATION OF TEST RESULTS


LESSON 7 ORGANIZATION OF TEST DATA USING TABLES AND GRAPH
How Do We Organize Test Data and Interpret Test Score Distribution?
 Test data are better appreciated and communicated if they are arranged, organized, and presented in a clear and
concise manner.
 Good presentation requires designing a table that can be read easily and quickly.
 Tables and graphs are common tools that help readers better understand the test results.
 Consider a group of raw scores administered to 100 college students.
Table 7.1. Scores of 100 College Students in a Final Examination
53 30 21 42 33 41 42 45 32 58
36 51 42 49 64 46 57 35 45 51
57 38 49 54 61 36 53 48 52 49
41 58 42 43 49 51 42 50 62 60
33 43 37 57 35 33 50 42 62 49
75 66 78 52 58 45 53 40 60 33
46 45 79 33 46 43 47 37 33 64
37 36 36 46 41 43 42 47 56 62
50 53 49 39 52 52 50 37 53 40
34 43 43 57 48 43 42 42 65 35
How Do We Organize and Present Ungrouped Data Through Tables?
 Simple list of raw data.
 Raw scores are easy to get because these are scores that are obtained from administering a test.
Table 7.2. Frequency Distribution of Test Scores
Test Scores (X) Frequency (f) Percent Cumulative Percent
21.00 1 1.0 1.0
30.00 1 1.0 2.0
32.00 1 1.0 3.0
33.00 6 6.0 9.0
34.00 1 1.0 10.0
35.00 3 3.0 13.0
36.00 4 4.0 17.0
37.00 4 4.0 21.0
38.00 1 1.0 22.0
39.00 1 1.0 23.0
40.00 2 2.0 25.0
41.00 3 3.0 28.0
42.00 9 9.0 37.0
43.00 7 7.0 44.0
45.00 4 4.0 48.0
46.00 4 4.0 52.0
47.00 2 2.0 54.0
48.00 2 2.0 56.0
49.00 6 6.0 62.0
50.00 4 4.0 66.0
51.00 3 3.0 69.0
52.00 4 4.0 73.0
53.00 5 5.0 78.0
54.00 1 1.0 79.0
56.00 1 1.0 80.0
57.00 4 4.0 84.0
58.00 3 3.0 87.0
60.00 2 2.0 89.0
61.00 1 1.0 90.0
62.00 3 3.0 93.0
64.00 2 2.0 95.0
65.00 1 1.0 96.0
66.00 1 1.0 97.0
75.00 1 1.0 98.0
78.00 1 1.0 99.0
79.00 1 1.0 100.0
Total 100 100.0
o The listing of scores can be in descending or ascending order.
o There is no grouping of scores, only a record of the frequency of each single test score.
o One can distinguish the highest and lowest scores and the corresponding frequency for each score.
o The cumulative percentage in the last column expresses the cumulative frequency as a percentage.
o In the 6th row, the test score of 35 has a corresponding cumulative percentage of 13. This means that 13
percent of the class obtained a score of 35 or lower.
o Conversely, one can say that 87 percent of the scores are above 35.
Table 7.3. Frequency Distribution of Grouped Test Scores
Class Interval Midpoint (X) f Cumulative Frequency (cf) Cumulative Percentage
75-79 77 3 100 100
70-74 72 0 97 97
65-69 67 2 97 97
60-64 62 8 95 95
55-59 57 8 87 87
50-54 52 17 79 79
45-49 47 18 62 62
40-44 42 21 44 44
35-39 37 13 23 23
30-34 32 9 10 10
25-29 27 0 1 1
20-24 22 1 1 1
Total (N) 100
o Apparently, the data presented in tables 7.1 and 7.2 have been condensed as a result of the grouping of
scores.
o Table 7.3 illustrates a grouped frequency distribution of test scores.
o Let us consider the cumulative percentage in the 5th row of the Class Interval of 55-59, which is 87: we say
that 87 percent of the students got a score below 60.
o In table 7.3 the second column enters the midpoint of the test score in each class interval.
o To compute the size of the class interval:
i = (H − L) / C
where i = size of the class interval
H = highest test score
L = lowest test score
C = desired number of classes
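Applying the formula to the scores in Table 7.1 (highest 79, lowest 21) with the 12 classes used in Table 7.3, and rounding up to a whole interval size (a common convention, assumed here):

```python
import math

H, L, C = 79, 21, 12          # highest score, lowest score, number of classes
i = math.ceil((H - L) / C)    # (79 - 21) / 12 = 4.83..., rounded up
print(i)  # 5
```

An interval size of 5 matches the class intervals (20-24, 25-29, ...) of Table 7.3.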
o Transmutation Table for the Grading System. If the total number of items is 100, then the passing mark is 50%.
Graphical Presentation of Test Data
1. Histogram – a histogram is a type of graph appropriate for quantitative data such as test scores.
2. Frequency Polygon – is a visual representation of a distribution.
3. Cumulative Frequency Polygon – is essentially a line graph drawn on graph paper by plotting actual lower or upper
limits of the class intervals on the X-axis and the respective cumulative frequencies of these class intervals on the Y-axis.
4. Bar Graph
4.1. Vertical Bar Graph – can be defined as a graphical representation of data, quantities, or numbers using bars or
strips.
4.2. Horizontal Bar Graph – is a graph in the form of rectangular bars.
5. Pie Graph – a pie chart shows proportions as sectors of a circle; it is easy to construct using an ordinary protractor.
Which Graph is the Best?
 No one can give a definite answer to this question.
o The histogram is the easiest to construct in many cases of quantitative data, but it may not be appealing if you want to
compare the performance of two or more groups.
o A bar graph works well with qualitative data and when you want to compare the performance of subgroups of
examinees.
o Frequency and percentage polygons are useful for treating quantitative data.
o Cumulative frequency and percentage polygons are valuable for determining the percentage of the
distribution that falls below or above a given point.
WHAT ARE THE VARIATIONS IN THE SHAPES OF FREQUENCY DISTRIBUTION?


 Researchers and scientists have found that empirical data, when recorded, tend to fit certain shapes of frequency
distribution.
What is Skewness?
Skewness is a measure of the distortion or asymmetry of a distribution. Skewness is
demonstrated on a bell curve when data points are not distributed symmetrically on the left and right sides of the median.
WHAT IS KURTOSIS?
 Another way of differentiating frequency distributions is by how flat or how high and peaked the distribution is – a
property known as KURTOSIS (see Fig. 7.14).
 Platykurtic – a broad or flat distribution (curve X).
 Mesokurtic – an intermediate distribution, such as the normal distribution (curve Y).
 Leptokurtic – a narrow, steep (slim) distribution (curve Z).
LESSON 8. ANALYSIS, INTERPRETATION, AND USE OF TEST DATA
What are the different measures to analyze, interpret, and use appropriate test results?
 To analyze test scores, we use the measures of central tendency, variability, position, and co-variability.
 What are the measures of central tendency?
o Measures of central tendency refer to the central location or point of convergence of a set of values.
o Test scores have a tendency to converge at a central value.
o This value is the average of the set of scores.
o A measure of central tendency gives a single value that represents a given set of scores.
o Three commonly used measures of central tendency:
 Mean – the most preferred measure for use with test scores, known as the arithmetic mean.
Formula: x̄ = ∑X / N
where x̄ = the mean
∑X = the sum of all the scores
N = the number of scores in the set
 Instead of the traditional long-hand computation, you may use:
 Calculator
 Excel
 Statistical software – SPSS
Table 8.1 Scores of 100 College Students in a Final Examination
53 30 21 42 33 41 42 45 32 58
36 51 42 49 64 46 57 35 45 51
57 38 49 54 61 36 53 48 52 49
41 58 42 43 49 51 42 50 62 60
33 43 37 57 35 33 50 42 62 49
75 66 78 52 58 45 53 40 60 33
46 45 79 33 46 43 47 37 33 64
37 36 36 46 41 43 42 47 56 62
50 53 49 39 52 52 50 37 53 40
34 43 43 57 48 43 42 42 65 35
For grouped data:
x̄ = ∑fX / N
where X = the midpoint of the class interval
f = frequency of each class interval
N = total frequency
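Using the midpoints and frequencies of Table 8.2, the grouped mean works out as a sketch in Python:

```python
# Midpoints (X) and frequencies (f) from Table 8.2.
midpoints   = [77, 72, 67, 62, 57, 52, 47, 42, 37, 32, 27, 22]
frequencies = [ 3,  0,  2,  8,  8, 17, 18, 21, 13,  9,  0,  1]

N = sum(frequencies)                                         # 100
sum_fx = sum(x * f for x, f in zip(midpoints, frequencies))  # 4720
mean = sum_fx / N
print(mean)  # 47.2
```

This reproduces ∑fX = 4,720 from the table, giving a grouped mean of 47.2.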
Median – the value that divides the ranked scores into halves, or the middle value.
Table 8.2 Frequency Distribution of Grouped Test Scores

Cumulative Cumulative
Class Interval Midpoint (X) f X, f
Frequency (cf) Percentage
75-79 77 3 231 100 100
70-74 72 0 0 97 97
65-69 67 2 134 97 97
60-64 62 8 496 95 95
55-59 57 8 456 87 87
50-54 52 17 884 79 79
45-49 47 18 846 62 62
40-44 42 21 882 44 44
35-39 37 13 481 23 23
30-34 32 9 288 10 10
25-29 27 0 0 1 1
20-24 22 1 22 1 1
Total (N) 100 ∑X, f = 4720
Formula:
Mdn = LL + i × [ (N/2 − cf below the median class) / f of the median class ]
where LL = exact lower limit of the median class
i = size of the class interval
N = total number of scores
cf = cumulative frequency below the median class
f = frequency of the median class

Applying the formula:


1. You need a column for cumulative frequency. This is now shown on the 5th column
for data in Table 8.2
2. Determine N/2, which is one-half of the number of scores.
3. Find the class interval containing the 50th score. In this case, where there are 100 scores, the
50th score is in the class interval 45-49. This class interval of 45-49 becomes the
median class. We marked lines in the table to indicate where the median class is
located for easy reference when computing the median value.
4. Find the exact limits of the median class. In this case, class 44.5 – 49.5. The lower
limit then is 44.5.
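Carrying the steps through for Table 8.2 (median class 45-49, exact lower limit 44.5, cumulative frequency below the median class 44, median-class frequency 18, interval size 5), a quick check in Python:

```python
# Grouped median for the data in Table 8.2.
N  = 100    # total number of scores
LL = 44.5   # exact lower limit of the median class (45-49)
cf = 44     # cumulative frequency below the median class
f  = 18     # frequency of the median class
i  = 5      # size of the class interval

median = LL + i * ((N / 2 - cf) / f)
print(round(median, 2))  # 46.17
```

The grouped median of about 46.17 sits inside the median class, as it should.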
Mode – the easiest measure of central tendency to obtain.
 Considering the test data in Table 8.2, it can be seen that the highest frequency (21) occurs
in the class interval 40-44. The rough estimate of the mode is the midpoint of this interval, 42.
When are the mean, median, and mode appropriately used?
1. Scale of Measurement
a. Four levels of measurement:
 Nominal – the number is used for labelling or identification purposes only.
e.g. student ID number; coding sex as Female/Male
 Ordinal level – when the values can be ranked.
e.g. academic awards, percentile ranks, ranked raw scores in a university entrance or division-wide exam
 Interval level of measurement – carries the properties of the nominal and ordinal scales, with equal intervals
between values but no true zero, e.g., temperature readings.
e.g. A student who gets a score of 120 on a reading-ability test is NOT twice as good as one who gets 60 on the same
test.
 Ratio scale – the highest level of measurement. It carries the properties of the nominal, ordinal,
and interval scales, plus a true zero point.
e.g. the number of words typed per minute, or words spelled correctly, in a typing class
b. Which is the most commonly used measure of central tendency? The MEAN, because it is appropriate for interval and
ratio variables.
c. The median is preferred in some cases – especially when the distribution is skewed, such as when the test is difficult or when the students are not prepared for
the test.
How do measures of central tendency reflect skewness?
 In a perfectly symmetrical distribution, the mean, median, and mode have the same value.
 In a positively skewed distribution, the mean has the largest value, pulled up by extremely high scores, and the median lies between the mean and the mode.
 On the other hand, in a negatively skewed distribution, the mean has the smallest value, as influenced by the extremely low scores, and the median
still lies between the mode and the mean.
What are the measures of dispersion?
 One important group of descriptive statistics in the area of assessment is the measures of dispersion.
 They indicate variability, spread, or scatter.
 Measures of variability estimate how compressed or spread out the scores are, which contributes to the
flatness or peakedness of the distribution.
 There are several indices of variability, and the most commonly used in the area of measurement are the following:
1. Range – the difference between the highest and lowest score
e.g. Range = Highest – Lowest scores
= 96 – 40
= 56
2. Variance and Standard Deviation:
o The most widely used measures of dispersion.
o Considered the most accurate representation of the spread of the scores.
Class A Class B Class C
22 16 12
18 15 12
16 15 12
14 14 12
12 12 12
11 11 12
9 11 12
7 9 12
6 9 12
5 8 12
o The three classes have different score distributions; however, each has a mean of 12. The average squared deviation from the mean is the VARIANCE.
o Getting the square root of the variance gives the STANDARD DEVIATION.
Population variance:
σ² = ∑(x − µ)² / N
where σ² = population variance
µ = population mean
x = a score in the distribution
Population standard deviation (the square root of the variance):
σ = √[ ∑(x − µ)² / N ]
where σ = population standard deviation
Sample standard deviation:
s = √[ ∑(x − x̄)² / (N − 1) ]
where s = sample standard deviation
x = raw score
x̄ = mean score
N = number of scores in the distribution
Class A Class B
x (x-𝒙̄ ) (𝒙 − 𝒙̄ )𝟐 x (x-𝒙̄ ) (𝒙 − 𝒙̄ )𝟐
22 22-12 100 16 16-12 16
18 18-12 36 15 15-12 9
16 16-12 16 15 15-12 9
14 14-12 4 14 14-12 4
12 12-12 0 12 12-12 0
11 11-12 1 11 11-12 1
9 9-12 9 11 11-12 1
7 7-12 25 9 9-12 9
6 6-12 36 9 9-12 9
5 5-12 49 8 8-12 16
𝒙̄ = 12 ∑(𝑥 − 𝑥̄ )2 = 276 𝒙̄ = 12 ∑(𝑥 − 𝑥̄ )2 = 74
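The deviation sums in the tables above can be checked with Python's statistics module (`stdev` uses the N − 1 formula shown earlier):

```python
from statistics import mean, stdev

# Class A and Class B scores from the deviation tables above.
class_a = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]
class_b = [16, 15, 15, 14, 12, 11, 11, 9, 9, 8]

# Both classes have the same mean but different spreads.
print(mean(class_a), mean(class_b))   # 12 12
print(round(stdev(class_a), 2))       # 5.54  (sqrt(276 / 9))
print(round(stdev(class_b), 2))       # 2.87  (sqrt(74 / 9))
```

Class A's larger standard deviation confirms that its scores are more dispersed than Class B's, even with identical means.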
An alternative (raw-score) formula:
SD = √{ [∑x² − (∑x)² / N] / N }
where ∑x² = sum of the squares of the raw scores
(∑x)² = square of the sum of all the raw scores
N = number of examinees
How does standard deviation relate to skewness?
 The skewness of a distribution can be judged graphically from the shape of the score distribution, or computed numerically with the coefficient of skewness.
Fig. 8.5 Homogenous Test Score Distributions in Different Skewness
 Coefficient of skewness formula:
SK = 3(x̄ − Mdn) / SD
where SK = skewness
x̄ = mean
Mdn = median
SD = standard deviation
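A short sketch of the coefficient of skewness on an illustrative score set (the scores are assumed for the example, not taken from the text). A positive SK indicates a right-skewed distribution (mean pulled above the median); a negative SK indicates a left-skewed one:

```python
# Coefficient of skewness: SK = 3(mean - median) / SD.
import statistics

scores = [40, 55, 60, 62, 65, 68, 70, 75, 96]  # hypothetical test scores
mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)  # sample standard deviation

sk = 3 * (mean - median) / sd
print(round(sk, 2))  # slightly positive: the high score of 96 pulls the mean up
```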
What are the measures of position?
Three families of statistics describe a score distribution:
A. Measures of Central Tendency
B. Measures of Dispersion
C. Measures of Position
The common measures of position are:
a. Quartile – divides the distribution into four equal parts; also used as a measure of spread through the interquartile range (IQR)
 The IQR is the difference between the third and first quartiles: IQR = Q3 − Q1
 One-half of the IQR gives the Semi-Interquartile Range or Quartile Deviation: Q = (Q3 − Q1) / 2
b. Decile – divides the distribution into 10 equal parts.
c. Percentile – divides the distribution into 100 equal parts.
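A sketch of the quartile-based measures on the Class A scores used earlier. Note that quartile conventions differ between textbooks and software; this uses Python's `statistics.quantiles` with the "inclusive" method, so hand computations with another convention may differ slightly:

```python
# Interquartile range (IQR) and semi-interquartile range (Q).
import statistics

scores = [5, 6, 7, 9, 11, 12, 14, 16, 18, 22]  # Class A scores, sorted
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1       # IQR = Q3 - Q1
semi_iqr = iqr / 2  # the quartile deviation, Q
print(q1, q3, iqr, semi_iqr)
```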
What is the Coefficient of Variation as a Measure of Relative Dispersion?
 Absolute dispersion – expressed in the original units of the scores
 Coefficient of Variation – a measure of relative dispersion; it is dimensionless or "unit free"
- the ratio of the standard deviation to the mean of the distribution
CV = (σ / µ) × 100, expressed as a percentage value
If the mean score in Math is 40 with a standard deviation of 10, then the Coefficient of Variation (CV) is
CV = (10 / 40) × 100
= 0.25 × 100
= 25%
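The worked example above can be written as a one-line helper; because the CV is unit free, it lets us compare the spread of scores measured on different scales:

```python
# Coefficient of variation: SD expressed as a percentage of the mean.
def coefficient_of_variation(sd, mean):
    return sd / mean * 100

print(coefficient_of_variation(10, 40))  # 25.0, matching the Math example
```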
How is standard deviation applied in a normal distribution?
 Standard deviation – the most useful measure of variability in measurement and research.
o Normal distribution – a symmetrical, bell-shaped distribution represented by the normal curve.
Figure 8.6 The Normal Curve
Figure 8.7 The Areas under the Normal Curve
1. The mean, median, and mode are all equal.
2. The curve is symmetrical. As such, the value in a specific area on the left is equal to the value of its
corresponding area on the right.
3. The curve changes from concave to convex and approaches the X-axis, but the tails do not touch the horizontal
axis.
4. The total area under the curve is equal to 1.
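These area properties can be checked numerically. A sketch using the cumulative normal distribution, Φ(z) = (1 + erf(z/√2)) / 2, shows the familiar percentages of scores falling within 1, 2, and 3 standard deviations of the mean:

```python
# Areas under the normal curve via the cumulative distribution function.
import math

def phi(z):
    """Area under the normal curve to the left of z."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

for k in (1, 2, 3):
    area = phi(k) - phi(-k)
    print(f"within ±{k} SD: {area:.4f}")  # about 0.6827, 0.9545, 0.9973
```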
What are Normal Scores?
 A raw score can be interpreted through its placement relative to the mean and the variability of the distribution.
 A raw score can be converted to a Z-score.
 Z-score – the most useful score for expressing a raw score in relation to the mean and standard deviation.
Z = (x − x̄) / s
where x − x̄ = deviation score
A negative Z means the raw score x is below the average; a positive Z means it is above the average.
A Z-score is called a standard score because it is expressed in standard deviation units.
Figure 8.8 A comparison of Score Distributions with Different Means and Standard Deviations
Figure 8.9 Different Raw Scores in one Z-score Distribution
Ze = (86 − 80) / 3 = 2;  Zp = (90 − 95) / 2 = −2.5
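The two z-scores above can be reproduced directly. The raw score of 86 (with mean 80, SD 3) sits 2 standard deviations above its mean, while the raw score of 90 (with mean 95, SD 2) sits 2.5 standard deviations below its mean:

```python
# Z-scores for the worked example above.
def z_score(x, mean, sd):
    return (x - mean) / sd

print(z_score(86, 80, 3))  # 2.0
print(z_score(90, 95, 2))  # -2.5
```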
 T-Score
o A transformed standard score: the Z-score scale is shifted so that its mean becomes 50, and each Z-score (in standard-deviation units) is multiplied by 10.
T-score = 50 + 10z
e.g. A Z-score of -2 is equivalent to a T-score of 30;
T-score = 50 + 10 (-2)
= 50-20
= 30
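The T-score transformation is a one-liner; it removes the negative values and decimals that make raw z-scores awkward to report:

```python
# T-scores rescale z-scores to a mean of 50 and SD of 10: T = 50 + 10z.
def t_score(z):
    return 50 + 10 * z

print(t_score(-2))   # 30, the worked example above
print(t_score(0))    # 50, an exactly average score
print(t_score(2.5))  # 75
```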
 Stanine Score – "Standard Nine"
o Raw scores are arranged from lowest to highest and divided into nine units according to fixed percentages of scores.
What are the Measures of Co-variability?
 Measures of co-variability – tell us, to a certain extent, the relationship between two tests or two factors.
LESSON 9. GRADING AND REPORTING OF TEST RESULTS
How should you assign grades and communicate their meaning?
 Grading and reporting learners' test performance is a complex task. It requires specific knowledge, skills, and experience.
 Grading and reporting are fundamental elements in the teaching and learning process.
What are the purposes of Grading and Reporting Learners’ Test Performance?
 Communicate the level of learning of the learners in specific course content
 Give feedback on which specific topics learners have mastered
 Serve as motivation for learners to study and do better
 Give parents information about their children's achievement
What are the different methods of scoring test performance?
1. Number Right Scoring (NR). The test score is the sum of the scores for correct responses.
2. Negative Marking (NM). Assigns positive values to correct answers while penalizing learners for incorrect responses.
 Both NR and NM methods of scoring multiple-choice tests are prone to guessing, which affects test validity and reliability.
 Other scoring methods were introduced:
o Partial Credit Scoring – attempts to capture a learner's degree of knowledge with respect to each response option given.
o Multiple Answer Scoring – allows learners to give multiple answers for each item.
o Retrospective Correction for Guessing – omitted or unanswered items are considered incorrect; the correction for guessing is applied afterward.
o Standard Setting – standards based on norm-referenced assessment are derived from the test performance of a certain group of learners, while standards from criterion-referenced assessments are preset from the very start by the teacher or the school in general.
o Holistic Scoring – gives a single, overall assessment score for an essay, writing composition, or other performance-type assessment as a whole.
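The difference between NR and NM scoring can be sketched on one answer sheet. The answer key, the responses, and the 0.25-point penalty below are illustrative assumptions; blanks earn zero under both methods here:

```python
# Number Right (NR) vs. Negative Marking (NM) on the same answer sheet.
key       = ["A", "C", "B", "D", "A", "B", "C", "D", "A", "B"]
responses = ["A", "C", "D", "D", "A", "B", "A", None, "A", "C"]  # None = blank

correct = sum(r == k for r, k in zip(responses, key))
wrong = sum(r is not None and r != k for r, k in zip(responses, key))

nr_score = correct                 # NR: one point per correct answer
nm_score = correct - 0.25 * wrong  # NM: hypothetical penalty per wrong answer
print(nr_score, nm_score)          # 6 and 5.25
```

The same learner gets 6 under NR but 5.25 under NM; the penalty is what discourages blind guessing.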
The following is an example of a rubric for an oral presentation:

Rating / Grade     Characteristics
A (Exemplary)      Very organized. Has a clear opening statement that catches the audience's interest. The content of the report is comprehensive and demonstrates substance and depth. Delivery is very clear and understandable. Uses slides/multimedia equipment effortlessly to enhance the presentation.
B (Satisfactory)   Mostly organized. Has an opening statement relevant to the topic. Covers the important topics. Has an appropriate pace and no distracting mannerisms. Looks at slides only to keep on track.
C (Emerging)       Has an opening statement relevant to the topic but does not give an outline of the speech; is somewhat disorganized. Lacks content and depth in the discussion of the topic. Delivery is fast and unclear; some items are not covered well. Relies heavily on slides and notes and makes little eye contact.
D (Unacceptable)   Has no opening statement regarding the focus of the presentation. Does not give adequate coverage of the topic. Often hard to understand, with a voice that is too soft or too loud and a pace that is too quick or too slow. Just reads the slides; slides have too much text.
ANALYTIC SCORING:
 Involves assessing each aspect of a performance task separately, such as:
o Essay writing
o Oral presentation
o Class debate
o Research paper
 Grades are given by averaging the ratings across criteria
 Advantages:
o Reliability
o Provides information about strengths and weaknesses
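A minimal sketch of analytic scoring: each criterion is rated separately (here on a hypothetical 1-4 scale) and the ratings are averaged into one grade. The criteria names and scores are assumptions for illustration:

```python
# Analytic scoring: rate each aspect separately, then average.
criterion_scores = {
    "content": 4,
    "organization": 3,
    "delivery": 3,
    "visual aids": 2,
}

average = sum(criterion_scores.values()) / len(criterion_scores)
print(average)  # 3.0
```

Keeping the per-criterion ratings alongside the average is what gives analytic scoring its diagnostic value: the learner can see that "visual aids" dragged the grade down.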
Rubric for a Final Research Paper

Rating scale, applied to every criterion below:
Expert (4) – at least indicators a to c are satisfied
Proficient (3) – any two of the given indicators are satisfied
Apprentice (2) – any one of the given indicators is satisfied
Novice (1) – none of the given indicators is satisfied

Criteria / Indicators:
1. Introduction
a. Clearly identifies and discusses the research focus/purpose
b. Research focus is clearly grounded in previous research / theoretically relevant literature
c. The significance of the study is clearly identified (and how it adds to previous research)
d. Others, please specify
2. Method – provides accurate and thorough information on the following:
a. Research method, design, and context
b. Data sources, collection procedure, and tools
c. Data analysis
d. Others, please specify
3. Results
a. Results are clearly explained on a comprehensive level and are well-organized
b. Tables/figures clearly and concisely convey the data
c. Statistical analyses use appropriate tests and are accurately interpreted
d. Others, please specify
4. Conclusions, Discussions, and Recommendations
a. Interpretations/analyses of results are thoughtful and insightful; are clearly informed by the study's results; and thoroughly address how they supported, refuted, and/or informed the hypotheses/propositions
b. Discussions on how the study relates to and/or enhances the present scholarship in this area are adequate
c. Suggestions for further research in this area are insightful and thoughtful
d. Others, please specify
5. Documentation and Quality of Sources
a. Cites all data obtained from other sources
b. APA style is accurately used in both text and references
c. Sources are all scholarly and clearly relate to the research
d. Others, please specify
6. Spelling and Grammar
a. No error in spelling
b. No error in grammar
c. No error in the use of punctuation marks
d. Others, please specify
7. Manuscript Format
a. Title page has proper APA formatting
b. Correct headings and subheadings are used consistently
c. Proper margins are observed
d. Others, please specify
Final Grade: ______
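The rating rule in the research-paper rubric is the same for every criterion: count how many of indicators a to c are satisfied and map that count to a level. A sketch of that mapping:

```python
# Map the number of satisfied indicators (a-c) to the rubric's rating levels.
def rate_criterion(indicators_satisfied):
    levels = {
        3: "Expert (4)",      # at least a to c satisfied
        2: "Proficient (3)",  # any two satisfied
        1: "Apprentice (2)",  # any one satisfied
        0: "Novice (1)",      # none satisfied
    }
    return levels[indicators_satisfied]

print(rate_criterion(3))  # Expert (4)
print(rate_criterion(1))  # Apprentice (2)
```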
 Primary Trait Scoring
o Focuses on only one aspect or trait of the performance
o Its narrow focus can be an advantage or a disadvantage, depending on the purpose
o Needs a detailed scoring guide
 Multiple-Trait Scoring
o Similar to analytic scoring in that it focuses on specific features, such as:
o The ability to present arguments clearly
o The organization of one's thoughts
o Correct grammar, punctuation, and spelling
What are the different types of test scores?
 Grading methods communicate the teacher's evaluative appraisal of learners' level of achievement or performance in a test or task.
 Test scores can take the form of:
o Raw score – the number of items answered correctly on a test
- may be useful if everyone knows the test coverage
o Percentage score – interpreted as the percent of the content, skills, or knowledge of which learners have a solid grasp
 Most appropriate for teacher-made tests or criterion-referenced tests
 Suitable in subjects wherein a standard has been set
o Criterion-Referenced Grading System
 Test scores are based on performance on specified learning goals
 Premised on the assumption that a learner's performance is independent of the performance of the other learners in the group/class
Types of Criterion-Referenced Scores or Grades:
 Pass or Fail Grade
o Needs a standard or cut-off score
o Appropriate for comprehensive or licensure exams because there is no limit to
the number of examinees who pass or fail
o Advantages:
 It takes the pressure off learners to get a high letter or numerical grade
 It gives learners a clear-cut idea of their strengths/weaknesses
 It allows learners to focus on true understanding
 Letter Grade – one of the most commonly used grading systems.
o A, B, C, D, E or five-level grading scale
o A – highest level; E or F – lowest grade

o While letter grades are easy to use, their exact meaning is not always clear to parents, learners, and other stakeholders.
 Plus (+) and Minus (-) letter Grade
(+)/(-) Letter Grades Interpretation
A+ Excellent
A Superior
A- Very Good
B+ Good
B Very Satisfactory
B- High Average
C+ Average
C Fair
C- Pass
D Conditional
E/F Failed
 Categorical Grades
Exceeding    Meeting        Approaching   Emerging      Not Meeting
Standards    Standards      Standards     Standards     Standards
Advanced     Intermediate   Basic         Novice        Below Basic
Exemplary    Accomplished   Developing    Beginning     Inadequate
Expert       Proficient     Competent     Apprentice    Novice
Master       Distinguished  Proficient    Intermediate  Novice
o Norm-Referenced Grading System
 In grading, learners’ test scores are compared with those of peers.
 Norm-referenced grading allows teachers to:
 Compare learners’ test performance with that of other learners
 Compare learners’ performance in one test (subtest) with another test.
 Compare learners’ performance in one form of the test with another form of the test
administered at an earlier date.
Types of Norm-Referenced Scores
 Developmental Score – transformed from raw scores; reflects the average performance at a given age or grade level.
o Grade-Equivalent Score
 Describes the test performance of a learner in terms of a grade level and the months since the beginning of the school year; a decimal point separates the grade and the month. A grade equivalent of 7.5 means the learner performs like an average Grade 7 learner taking the test at the end of the fifth month of the school year.
o Age-Equivalent Score – a learner's score of 11-5 means that his age equivalent is 11 years and 5 months.
 Percentile Rank – if a learner obtained a percentile rank of 75 in a standardized achievement test, the learner scored equal to or higher than 75% of the examinees.
 Stanine Score – expresses test results in nine equal steps

Description      Stanine   Percentile Rank
Very High        9         96 and above
Above Average    8         90-95
                 7         77-89
Average          6         60-76
                 5         40-59
                 4         23-39
Below Average    3         11-22
                 2         4-10
Very Low         1         3 and below
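The conversion table above can be expressed as a small lookup function, walking the cutoffs from the top:

```python
# Percentile rank -> stanine, following the conversion table above.
def stanine(percentile_rank):
    cutoffs = [(96, 9), (90, 8), (77, 7), (60, 6),
               (40, 5), (23, 4), (11, 3), (4, 2)]
    for lowest_pr, s in cutoffs:
        if percentile_rank >= lowest_pr:
            return s
    return 1  # percentile rank 3 and below

print(stanine(75))  # 6 (percentile ranks 60-76 fall in stanine 6)
print(stanine(96))  # 9
print(stanine(3))   # 1
```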
 Standard Score – they are raw scores that are converted into a common scale of
measurement that provides a meaningful description of the individual score.
o Z-Score
o T-Score
T = 50 + 10z
A T-score of 50 is considered average
What are the General Guidelines for Grading Tests or Performance Tasks?
1. Stick to the purpose of the assessment.
o Determine the purpose of the test. Formative, summative, diagnostic
2. Be guided by the learning outcomes. Learners should know what is included in the test.
3. Develop grading criteria – it saves time in the grading process
4. Inform the learner what scoring methods are to be used.
5. Decide on what type of test scores to use.
What are the General Guidelines for Grading Essay Tests?
- Scoring essay responses can be made more rigorous by developing a scoring scheme.
1. Identify the criteria for rating the essay
2. Determine the type of rubric to use
3. Prepare the rubric
Point Values   Sample Performance Benchmarks
1              Needs Improvement / Beginning / Novice / Inadequate
2              Satisfactory / Developing / Apprentice / Developing
3              Good / Accomplished / Proficient / Proficient
4              Exemplary / Exceptional / Distinguished / Skilled
4. Evaluate essays anonymously.
5. Score one essay question at a time.
6. Be conscious of your own biases when evaluating a paper.
7. Review initial scores and comments before giving the final rating.
8. Get two or more raters for each essay.
9. Write comments.
What is the new grading system of the Philippine K to 12 Program?
The components are Written Work, Performance Task, and Quarterly Assessment. The components are weighted differently for Grades 1-10 and for Senior High School; the Senior High School weights are shown below:

Component               Core        Immersion / Research /    All Other Subjects /
                        Subjects    Business Simulation /     Immersion / Research /
                                    Exhibit / Performance     Exhibit / Performance
Written Work            30%         40%                       20%
Performance Task        50%         40%                       60%
Quarterly Assessment    20%         20%                       20%
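Combining the components into a quarterly grade is a weighted average. The sketch below uses the core-subject weights (Written Work 30%, Performance Task 50%, Quarterly Assessment 20%); the learner's component scores are hypothetical:

```python
# Quarterly grade as a weighted average of the three components.
weights = {"written_work": 0.30, "performance_task": 0.50,
           "quarterly_assessment": 0.20}
scores = {"written_work": 85, "performance_task": 90,
          "quarterly_assessment": 80}  # each already on a 0-100 scale

grade = sum(weights[c] * scores[c] for c in weights)
print(round(grade, 2))  # 86.5
```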
How should test results be communicated to different stakeholders?
Test results should be communicated regularly, every grading period, to:
 Learners
 Parents
 Other stakeholders
Thank You.
God Bless Us All!
Ma'am Inday Joven, 2023!