
QUALITIES OF

EVALUATION
INSTRUMENTS
Mrs. Shiji Thomas
Professor
Caritas College of Nursing
Characteristics/Qualities of evaluation procedures

Essential qualities:
• Validity
• Reliability
• Objectivity
• Usability

Other qualities:
• Relevance
• Equilibrium
• Discrimination
1. Validity
• The extent to which the test really measures what it is intended to measure
• It refers to the appropriateness of the interpretations made from test scores and other evaluation results, with regard to a particular use
• Validity is always concerned with the specific use of the results and the soundness of our proposed interpretations
• Validity is relative and always specific to a particular test
• To be valid, the measuring instrument should be reliable and relevant
• High reliability is a necessary, but not sufficient, condition for high validity
• Validity of a test is the relevance of the test to its objective
• Validity pertains to the results of a test and not to the instrument itself
• Validity is always specific to some particular use; it is not a general quality of a test
Nature of validity
• Validity refers to the appropriateness of the interpretation of the results of a test or evaluation instrument for a given group of individuals, and not to the instrument itself
• Validity is a matter of degree (high validity, moderate validity and low validity); it does not exist on an all-or-none basis
• Validity is always specific to some particular use or interpretation
• Validity is a unitary concept
Approaches to test validation

Content-related evidence
Procedure: Compare the test tasks to the test specifications describing the task domain under consideration
Meaning: How well the sample of test tasks represents the domain of tasks to be measured

Criterion-related evidence
Procedure: Compare test scores with another measure of performance obtained at a later date (for prediction) or with another measure of performance obtained concurrently (for estimating the present status)
Meaning: How well test performance predicts future performance or estimates current performance on some valued measure other than the test itself (called a criterion)

Construct-related evidence
Procedure: Establish the meaning of the scores on the test by controlling the development of the test, evaluating the relationships of the scores with other relevant measures, and experimentally determining what factors influence test performance
Meaning: How well the test performance can be interpreted as a meaningful measure of some characteristic or quality
Content-related evidence
• Content validation is a process of determining the extent to which a set of test tasks provides a relevant and representative sample of the domain of tasks about which interpretations of test scores are made
Content validation in the testing of classroom achievement

Classroom instruction
Determines which intended learning outcomes (objectives) are to be achieved by pupils
↓
Achievement domain
Specifies and delimits a set of instructionally relevant learning tasks to be measured by a test
↓
Achievement test
Provides a set of relevant test items designed to measure a representative sample of the tasks in the achievement domain
Content validation and test development
• Identifying the learning outcomes to be measured
• Preparing a test plan that specifies the sample of items to be used
• Constructing a test that closely fits the set of test specifications
Table of specifications
• The content of a course or curriculum may be broadly defined to include both subject matter content and instructional objectives
• The former is concerned with the topics to be learned and the latter with the types of performance pupils are expected to demonstrate (e.g., knows, comprehends, applies)
Table of specifications showing the relative emphasis in percent to be given to the content areas and instructional objectives

Content area    Knows concepts    Comprehends concepts    Applies concepts    Total
Plants                8                    4                      4             16
Animals              10                    5                      5             20
Weather              12                    8                      8             28
Earth                12                    4                      2             18
Sky                   8                    4                      6             18
TOTAL                50                   25                     25            100
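Once a test length is chosen, the percentage weights in such a table convert directly into item counts per cell. A minimal Python sketch, assuming a 50-item test (the weights are the percentages from the table above; the test length is an assumption for illustration):

# Minimal sketch: turning the percentage weights of a table of
# specifications into item counts for an assumed 50-item test.

weights = {  # (content area, objective) -> percent of emphasis
    ("Plants", "Knows"): 8, ("Plants", "Comprehends"): 4, ("Plants", "Applies"): 4,
    ("Animals", "Knows"): 10, ("Animals", "Comprehends"): 5, ("Animals", "Applies"): 5,
    ("Weather", "Knows"): 12, ("Weather", "Comprehends"): 8, ("Weather", "Applies"): 8,
    ("Earth", "Knows"): 12, ("Earth", "Comprehends"): 4, ("Earth", "Applies"): 2,
    ("Sky", "Knows"): 8, ("Sky", "Comprehends"): 4, ("Sky", "Applies"): 6,
}
assert sum(weights.values()) == 100  # the emphases must cover the whole test

test_length = 50
items = {cell: round(pct * test_length / 100) for cell, pct in weights.items()}
print(items[("Weather", "Knows")])  # 12% of 50 items -> 6 items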
Criterion-related evidence
• Defined as the process of determining the extent to which test performance is related to some other valued measure of performance
• The second measure of performance (the criterion) may be obtained at some future date (when interested in predicting future performance) or concurrently (when interested in estimating present performance)
Predictive validation study
September 17: Scholastic aptitude scores (test performance)
December 10: Achievement test scores (criterion performance)

Concurrent validation study
September 17: Scholastic aptitude scores (test performance)
September 17: Achievement test scores (criterion performance)
• The key element in both types of criterion-related study is the degree of relationship between the two sets of measures: 1. the test scores and 2. the criterion to be predicted
• The relationship is expressed by means of a correlation coefficient or an expectancy table
• A correlation coefficient (r) indicates the degree of relationship between two sets of measures
• 1.00 = perfect positive correlation
• .00 = no relationship
• -1.00 = perfect negative correlation
• When a correlation coefficient is used to express the degree of relationship between a set of test scores and some criterion measure, it is called a validity coefficient
• Validity coefficients must be judged on a relative basis, the larger coefficients being favored
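To illustrate, a validity coefficient is simply a Pearson r computed between the test scores (predictor) and the criterion scores. A minimal Python sketch; the two score lists are invented for illustration:

# Minimal sketch: computing a validity coefficient as a Pearson r
# between test scores (predictor) and criterion scores.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

aptitude = [55, 62, 70, 48, 66, 59, 74, 51]     # test performance (predictor)
achievement = [60, 65, 72, 50, 70, 58, 78, 54]  # criterion performance

print(f"validity coefficient r = {pearson_r(aptitude, achievement):.2f}")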
• An expectancy table is a simple and practical means of expressing criterion-related evidence of validity
• It is a twofold chart with the test scores (the predictor) arranged in categories down the left side of the table and the measure to be predicted (the criterion) arranged in categories across the top of the table
• For each category of scores on the predictor, the table indicates the percentage of individuals who fall within each category of the criterion
Expectancy table showing the relation between scholastic aptitude scores and course grades for 30 students in science

Grouped scholastic aptitude    Percentage in each score category receiving each grade
scores (stanines)                 E     D     C     B     A
Above average (7, 8, 9)           -     -    14    43    43
Average (4, 5, 6)                 -    19    37    25    19
Below average (1, 2, 3)          57    29    14     -     -
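The percentages in such a table can be tallied directly from paired records of each student's predictor category and criterion grade. A minimal Python sketch; the records below are invented and are not the 30 students above:

# Minimal sketch: building an expectancy table from paired
# (predictor category, criterion grade) records.
from collections import Counter, defaultdict

records = [  # (stanine band, course grade) for each student; invented data
    ("above average", "B"), ("above average", "A"), ("average", "C"),
    ("average", "B"), ("average", "D"), ("below average", "E"),
    ("below average", "D"), ("above average", "A"), ("average", "C"),
]

by_band = defaultdict(Counter)
for band, grade in records:
    by_band[band][grade] += 1

grades = ["E", "D", "C", "B", "A"]
for band, counts in by_band.items():
    total = sum(counts.values())
    # percentage of this predictor band receiving each grade
    row = {g: round(100 * counts[g] / total) for g in grades if counts[g]}
    print(band, row)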
Construct-related evidence/validity
• The construct-related category of evidence focuses on test performance as a basis for inferring the possession of certain psychological characteristics
• A construct is a psychological quality that we assume exists in order to explain some aspect of behavior
• E.g., mathematical reasoning, intelligence, creativity, honesty, anxiety, etc.
• Construct validation may be defined as a process of determining the extent to which test performance can be interpreted in terms of one or more psychological constructs
Process of construct validation
• Identifying and describing, by means of a theoretical framework, the
meaning of the construct to be measured
• Deriving hypotheses regarding test performance from the theory
underlying the construct and
• Verifying the hypotheses by logical and empirical means
Factors influencing validity
• Unclear directions
• Reading vocabulary and sentence structure too difficult
• Inappropriate level of difficulty of the test items
• Poorly constructed test items
• Ambiguity
• Test items inappropriate for the outcomes being measured
• Inadequate time limits
• Test too short
• Improper arrangement of items
• Identifiable patterns of answers
2. RELIABILITY
• Reliability refers to the consistency of measurement, that is, how consistent test scores or other evaluation results are from one measurement to another
• Reliability of test scores is typically reported by means of a reliability coefficient or the standard error of measurement

Reliability coefficient
• A correlation coefficient that indicates the degree of relationship between two sets of measures obtained from the same procedure
• We may administer the same test twice to a group, with a time interval in between (test-retest method); administer two equivalent forms of the test in close succession (equivalent-forms method); administer two equivalent forms of the test with a time interval in between (test-retest with equivalent forms method); or administer the test once and compute the consistency of responses within the test (internal-consistency method)
Methods of estimating reliability
• Test-retest method: the stability of test scores over a given period of time
• Equivalent-forms method: the consistency of test scores over different forms of the test (that is, different samples of items)
• Test-retest with equivalent forms: the consistency of test scores over both a time interval and different forms of the test
• Internal-consistency method: the consistency of test scores over different parts of the test
Test-retest method
• Requires administering the same form of the test to the same group after some time interval
• The length of the time interval should fit the type of interpretation to be made from the results
• Test-retest reliability coefficients are influenced both by errors within the measurement procedure and by the day-to-day stability of the students' responses
• Longer time periods between testings will result in lower reliability coefficients, due to greater changes in the students
• The report would read: "the stability of test scores obtained with the same form over a three-month period was .90"
Equivalent-forms method
• Two equivalent forms of a test (also called alternate forms or parallel forms) are administered to the same group during the same testing session
• The test forms are equivalent in the sense that they are constructed independently but built to measure the same abilities
• A high reliability coefficient indicates the adequacy of the test sample
• A high reliability coefficient would indicate that the two forms are measuring the same thing
Test-retest method with equivalent forms
• Combination of the previous two methods
• Two different forms of the same test are administered with time intervening
• This is the most demanding estimate of reliability, since it takes into account all possible sources of variation
• The reliability coefficient reflects errors within the testing procedure, consistency over different samples of items, and the day-to-day stability of the students' responses
Internal-consistency methods
• Require only a single administration of a test
• The split-half method involves scoring the odd items and the even items separately and correlating the two sets of scores
• This correlation coefficient indicates the degree to which the two arbitrarily selected halves of the test provide the same results
• The reliability coefficient for the total test is determined by applying the Spearman-Brown prophecy formula
Spearman-Brown prophecy formula
• Reliability of total test = (2 × reliability of ½ test) / (1 + reliability of ½ test)
• E.g., if we obtained a correlation coefficient of .60 for the two halves of a test, the reliability of the total test is computed as
• (2 × .60) / (1 + .60) = .75
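A minimal Python sketch of the split-half procedure with the Spearman-Brown correction; the 0/1 item responses are invented for illustration:

# Minimal sketch: split-half reliability with the Spearman-Brown
# correction. item_scores holds one row of 0/1 item responses per student.
from statistics import correlation  # Pearson r (Python 3.10+)

def split_half_reliability(item_scores):
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = correlation(odd_totals, even_totals)          # half-test correlation
    return (2 * r_half) / (1 + r_half)  # Spearman-Brown prophecy formula

item_scores = [  # invented data: 5 students, 8 items
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
]
print(f"split-half reliability = {split_half_reliability(item_scores):.2f}")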
Kuder-Richardson formula
• Used to estimate the reliability of test scores from a single administration of the test
• It requires three types of information:
1. The number of items in the test
2. The mean and
3. The standard deviation

Kuder-Richardson formula contd..
• Reliability estimate (KR-21) = 1 − M(K − M) / (K × s²)
• K = number of items in the test
• M = the mean of the test scores and s = SD of the test scores
• The reliability coefficients for classroom tests typically range between .60 and .80
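A minimal Python sketch of the KR-21 estimate exactly as given above; the total scores and the 50-item test length are invented for illustration:

# Minimal sketch of KR-21 = 1 - M(K - M) / (K * s^2).

def kr21(scores, k):
    """KR-21 reliability estimate; k = number of items in the test."""
    n = len(scores)
    m = sum(scores) / n                           # M: mean of the test scores
    var = sum((x - m) ** 2 for x in scores) / n   # s^2: variance of the scores
    return 1 - (m * (k - m)) / (k * var)

scores = [30, 45, 25, 48, 38, 20, 44, 34]  # invented total scores, 50-item test
print(f"KR-21 = {kr21(scores, k=50):.2f}")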
Factors that lower the reliability of test scores
• The test is based on too few items
• The range of scores is too limited
• Testing conditions are inadequate
• Scoring is subjective
Standard error of measurement
• An especially useful way of expressing test reliability, because it indicates the amount of error to allow for when interpreting individual test scores
• SEM = s × √(1 − r)
• s = standard deviation of the test scores
• r = reliability coefficient
• E.g., SEM = 4.5 × √(1 − .61) = 2.8, approximately 3
• The SEM shows how many points we must add to, and subtract from, an individual's test score in order to obtain "reasonable limits" for estimating that individual's true score
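A minimal Python sketch computing the SEM and the resulting "reasonable limits" band, using the figures from the example above (s = 4.5, r = .61); the observed score is invented for illustration:

# Minimal sketch: SEM = s * sqrt(1 - r), and the band of
# "reasonable limits" around one observed score.

def sem(s, r):
    """Standard error of measurement from SD and reliability coefficient."""
    return s * (1 - r) ** 0.5

s, r = 4.5, 0.61
e = sem(s, r)       # about 2.8, roughly 3 points
observed = 35       # invented observed score
print(f"SEM = {e:.1f}")
print(f"reasonable limits: {observed - e:.1f} to {observed + e:.1f}")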
3. OBJECTIVITY
• It means that an individual's score is the same, or essentially the same, regardless of who is doing the scoring
• Objectivity has two aspects: objectivity of the test and objectivity of scoring
• Objectivity of the test is defined as the extent to which a student's score is based on his actual answer or performance on the test and not on the opinion of the examiner
• Usually achieved by having precise questions and a predetermined scoring scheme
4. COMPREHENSIVENESS
• The test should have adequate sample of major lesson objectives to
provide a valid measure of student achievement
5. DISCRIMINATION
• The test should be constructed in such a manner that it will detect or measure small differences in achievement
• Essential if the test is used to rank students on the basis of individual achievement or to assign grades
• Can be determined by item analysis
6. USABILITY
• In selecting tests and other evaluation instruments, practical considerations should not be neglected
• Consider the expertise of teachers in measurement, the time available, the cost of testing, ease of interpretation, etc.
7. RELEVANCE
• The test should contain only relevant tasks
8. EQUILIBRIUM
• A balanced assessment sets targets in all domains of learning and all domains of intelligence
Miscellaneous qualities
• Fairness
• Administrability
• Scorability
• Practicality and efficiency