Objectives

The document outlines the essential characteristics of assessment, focusing on validity, reliability, and usability. It emphasizes that validity pertains to the appropriateness of interpretations of assessment results, while reliability refers to the consistency of those results across different contexts. Usability addresses the practicality of the assessment process, ensuring it is economical, easy to administer, and produces interpretable results.

1 VALIDITY, RELIABILITY AND USABILITY

2 Essential assessment characteristics


Validity

Reliability

Usability

3 Validity and reliability


Validity
adequacy and appropriateness of the interpretations and uses of assessment
results

E.g.
If the results are to be used as a measure of students’ reading skills
our interpretations are to be based on evidence that the scores actually reflect
reading skills
and not impacted by irrelevant factors, such as vocabulary or linguistic complexity

4 Validity and reliability


Reliability
the consistency of assessment results

E.g.
we get similar scores when the same assessment procedure is used with the same
students on two different occasions
a high degree of reliability from one occasion to another

We get similar scores when different teachers independently rate student
performances on the same assessment task
a high degree of reliability from one rater to another

5 Validity and reliability


Reliability
we are concerned with consistency of the results
rather than with appropriateness of the interpretations made from the results
(which is validity).

Reliability (consistency) of measurement is needed to obtain valid results, but we
can have reliability without validity
6 Usability
Refers to the practicality of the procedure
Concerns practicality rather than the other qualities (validity and reliability)

Assessment procedure should

Be economical in terms of time and money
Be easily administered
Be easily scored
Produce results that can be accurately interpreted

7 Nature of validity
Validity
The appropriateness of the interpretation and use of the results

A matter of degree
it does not exist on an all-or-none basis (we speak of high or low validity)

Specific to some particular use or interpretation for a specific population of test
takers
No assessment is valid for all purposes
When indicating computational skill
a mathematics test may have a high degree of validity for 3rd and 4th
graders but a low degree of validity for 2nd and 5th graders
A reading test
may have high validity for skimming and scanning and low validity for
inferencing

Necessary to consider the specific interpretation or use to be made of the results

8 Major considerations in assessment validation
Content
The assessment content and specifications from which it was derived

Construct
The nature of the characteristics being measured

Assessment-criterion relationships
The relation of the assessment results to other measures

Consequences
The consequences of the uses and interpretations of the results

9 Content
How an individual performs on a domain of tasks that the assessment is supposed
to represent

E.g. knowledge of 200 words
we select 20 words and generalize it to the knowledge of 200

the extent to which our 20-word test constituted a representative sample of the
200 words

the goal in the consideration of content validation
to determine if a set of assessment tasks
provides a relevant and representative sample of the domain of tasks
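
One straightforward way to draw such a sample, as a minimal sketch in Python (the word list and the use of simple random sampling are illustrative assumptions, not the only way to build a representative test):

# Drawing a 20-word test sample from a 200-word vocabulary domain.
import random

domain = [f"word_{i}" for i in range(1, 201)]  # the full 200-word domain (hypothetical)
test_items = random.sample(domain, 20)         # a simple random sample of 20 words
print(sorted(test_items))

Performance on these 20 items is then generalized to the full 200-word domain; the sample must represent that domain for the generalization to be valid.
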
10 Content
The definition of the domain to be assessed
derive from the identification of goals and objectives

The assessment begins with a content area that reflects the goals and objectives

Steps
Specifying the domain of instructionally relevant tasks
Specifying the emphasis according to the priority of goals and objectives
Constructing or selecting a representative set of assessment tasks

From what has been taught
to what is to be measured
to what should be emphasized in the assessment
to a representative sample of relevant tasks

11 Content
Assessment development to enhance validity

Table of specifications

Subject-matter content (topics to be learned)

Instructional objectives (types of performance)



12 Content
Assessment development to enhance validity
The percentages in the table
indicate the relative degree of emphasis that each content area and each
instructional objective is to be given in the test
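
For illustration, a hypothetical table of specifications for a short mathematics unit test (the topics, objective headings, and percentages are all made up):

                Knows terms   Applies concepts   Total
Fractions           10%             20%           30%
Decimals            10%             20%           30%
Percentages         15%             25%           40%
Total               35%             65%          100%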


13 Content
Table of specifications


The specifications should be in harmony with what was taught
The weights assigned in the table reflect the emphasis that was given during
instruction

The more closely the questions match the specified sample
the more valid the measure of student learning

It can be used in selecting tests that publishers prepare
How well do they match with our table of specifications?

14 Construct
Is the test actually measuring the construct it claims to measure?

A construct is an individual characteristic or an abstract theoretical concept
assumed to exist to explain some aspect of behavior
Reading comprehension, inferencing, speaking proficiency, intelligence,
creativity, anxiety, mathematical reasoning, etc.

These are called constructs because they are theoretical constructions that are used
to explain performance on an assessment
15 Construct
Construct validation
the process of determining if the performance on an assessment can be
interpreted in terms of a construct(s)

Two questions are important in construct validation

Does the assessment adequately represent the intended construct? (construct
underrepresentation)
Problem-solving task turning into a memorization task

Is performance influenced by factors that are irrelevant to the construct?
(construct-irrelevant variance)
A mathematics test influenced by reading demands

16 Methods used in construct validation


Defining the domain (area) of tasks to be measured (also in content validation)

Analyzing the response process required by the assessment tasks
Thinking aloud or interviewing (to check on mental process)

Comparing the scores of known groups
A prediction of differences for a particular test or assessment can be checked
against groups that are known to differ and the results used as a partial support
for construct validation (e.g. mathematics majors vs English majors)
The test should be able to distinguish them

Comparing scores before and after a particular learning experience or experimental
treatment
Do scores increase after instruction?

Comparing scores with other similar measures (also an assessment-criterion
consideration)
E.g. high correlation between like tests and lower correlation between unlike tests

17 Assessment-criterion considerations
When test scores are to be used
to predict future performance
to estimate current performance on some valued measure other than the test
itself (called a criterion)

Concerned with evaluating the relationship between the test and the criterion

18 Assessment-criterion considerations
For example, can ALES scores indicate success at exams in master's programs?

The degree of relationship can be described by statistically correlating the two sets
of scores
The resulting correlation coefficient provides a numerical summary of the degree
of relationship between the two sets of scores

Scatter plots and expectancy tables can also be used.
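
A minimal sketch of this computation in Python (the score lists are hypothetical, purely for illustration):

# Pearson correlation between test scores and a criterion measure.
from statistics import correlation  # available in Python 3.10+

test_scores = [62, 75, 58, 90, 81, 70, 66, 85]        # hypothetical test scores
criterion = [2.4, 3.1, 2.2, 3.8, 3.5, 2.9, 2.6, 3.6]  # hypothetical criterion (e.g. GPA)

r = correlation(test_scores, criterion)  # Pearson r
print(f"Test-criterion correlation: r = {r:.2f}")

The closer the coefficient is to 1.00, the stronger the evidence that the test scores predict the criterion.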

19 Example on Excel

20 Interpretation
[table interpreting ranges of correlation coefficients, beginning with .90]

21 Consideration of consequences
Assessments are intended to contribute to improved learning, but do they?

What impact do assessments have on teaching?


What are the possibly negative, unintended consequences of a particular use of
assessment results?

High importance attached to test results leads teachers to focus narrowly on
what is on the test while ignoring important parts of the curriculum not covered by
the test

E.g. Changing the construct of teaching from problem-solving to memorization
ability because of a high-stakes test

An example: college professors prepare for the YDS for several years and end up
passing the exam without being able to speak English

22 Factors influencing validity


Factors in the test or assessment itself
Unclear directions
Difficult language
Ambiguity
Inadequate time limit (construct-irrelevant variance)
Overemphasis of easy-to-assess aspects and disregard of difficult-to-assess aspects
(construct underrepresentation)
Poorly-constructed test items (e.g. providing clues)
Test too short (i.e. may not be representative)
Improper arrangement of test items (e.g. most difficult ones first)
Identifiable pattern of answers (T, F, T, F, T, F, T, F)

23 Factors influencing validity


Factors in administration and scoring
Insufficient time
Unfair aid to students
Cheating
Unreliable scoring
Failing to follow directions
Adverse physical and psychological conditions

Factors in student responses (like motivation, fear, anxiety)

24 Reliability
The consistency of measurement
how consistent test scores or results are from one assessment to another

The more consistent the assessment results are from one measurement to another
the fewer errors there will be

Consequently, the greater the reliability

25 Reliability
An estimate of reliability refers to a particular type of consistency
Different periods of time
Different samples of tasks
Different raters

Low reliability means low validity
But high reliability does not mean high validity
26 Determining reliability with correlation methods
Consistency
over a period of time
over different forms of assessment
within the assessment itself
different raters

27 Test-retest method
The same assessment
administered twice to the same group of students
with a given time interval between the two (a measure of stability)
The interval should be neither too long nor too short for the purpose

The longer the interval between the first and second assessments
the more the results are influenced by changes in the student characteristic being
measured, and the smaller the reliability coefficient will be

28 Test-retest method
Stability is important when results are used for several years
like English test scores, but not as important for a unit test

The test-retest method is not very relevant for teacher-constructed classroom tests
Not desirable to readminister the same assessment

In choosing standardized tests, stability is an important criterion
29 Equivalent(parallel)-forms method
Uses two different but equivalent forms of an assessment

Two different tests are prepared based on the same set of specifications
Administered to the same group of students in a short period of time
The resulting assessment scores are correlated

It does not tell anything about long-term stability



30 Split-half method
The assessment is administered to a group of students in the usual manner and
then is divided in half for scoring purposes

E.g. to score the even-numbered and the odd-numbered tasks separately

This produces two scores for each student
When correlated, provides a measure of internal consistency

To estimate the reliability of the full-length assessment, the Spearman-Brown
formula is applied
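
The correction estimates the full-test reliability from the half-test correlation r as 2r / (1 + r). A minimal sketch in Python (the response data are hypothetical; 1 = correct, 0 = incorrect):

# Split-half reliability with the Spearman-Brown correction.
from statistics import correlation  # available in Python 3.10+

# Each row is one student's responses on a 10-item test (made-up data).
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 0, 1],
]

odd_scores = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
even_scores = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...

r_half = correlation(odd_scores, even_scores)  # correlation of the two halves
r_full = (2 * r_half) / (1 + r_half)           # Spearman-Brown correction
print(f"Split-half r = {r_half:.2f}, corrected full-test reliability = {r_full:.2f}")
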
31 Interrater consistency
When student work is judgmentally scored
whether the same scores are assigned by another judge

Consistency can be evaluated with correlation
the scores assigned by one judge with those assigned by another judge

To achieve acceptable levels of interrater consistency
Agreed-on scoring rubrics
Training of raters to use those rubrics with examples of student work
32 Writing rubric


40 Examples

41 Reliability methods

42 Standard error of measurement


The amount of variation in the scores is directly related to the reliability of
the assessment procedures
Low reliability is indicated by large variations in the student's assessment results
High reliability is indicated by little variation from one assessment to another

To estimate the amount of variation to be expected in the scores

Standard error of measurement

The standard error of measurement is the standard deviation of the errors of
measurement

When the standard error of measurement is small, the confidence band is narrow
(indicating high reliability)
Greater confidence that the obtained score is near the true score

A teacher who is aware of the standard error of measurement realizes that it is
impossible to be dogmatic in interpreting minor differences in assessment scores
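
A common estimate multiplies the standard deviation of the scores by the square root of one minus the reliability, SEM = SD * sqrt(1 - r), and uses it to build a confidence band around an observed score. A minimal sketch in Python (all numbers hypothetical):

# Standard error of measurement and a confidence band around a score.
import math

sd = 10.0           # standard deviation of the test scores (hypothetical)
reliability = 0.91  # reliability estimate, e.g. from the split-half method (hypothetical)

sem = sd * math.sqrt(1 - reliability)  # SEM = SD * sqrt(1 - r)
observed = 75                          # one student's obtained score

# Roughly 68% of the time, the true score falls within 1 SEM of the observed score.
low, high = observed - sem, observed + sem
print(f"SEM = {sem:.1f}; confidence band: {low:.1f} to {high:.1f}")

The smaller the SEM, the narrower the band and the more confidently small score differences can be interpreted.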

43 Standard error of measurement

44 Factors influencing reliability measures


Number of assessment tasks
The larger the number of assessment tasks (e.g. questions) on an assessment, the
higher its reliability will be (see the sketch after this list)

Spread of scores
The larger the spread of scores, the higher the estimate of reliability
Individuals stay in the same relative position in a group from one assessment to
another

Objectivity
Degree to which equally competent scorers obtain the same results
Objectivity can be increased by careful phrasing of the questions and by a
standard set of rules for scoring
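
The effect of length on reliability can be quantified with the general Spearman-Brown formula: lengthening a test by a factor k is predicted to change its reliability r to kr / (1 + (k - 1)r). A minimal sketch (numbers hypothetical):

# Predicted reliability when a test is lengthened by a factor k.
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

r = 0.60             # hypothetical reliability of the current test
for k in (1, 2, 3):  # same length, doubled, tripled
    print(f"k = {k}: predicted reliability = {spearman_brown(r, k):.2f}")

Doubling a test with reliability .60 predicts about .75, illustrating why more tasks generally mean higher reliability.
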
45 Usability
Ease of administration
Easy directions? Complicated directions? Requires expertise to implement?

Time required for administration
Allot as much time as is needed to obtain valid and reliable scores, and no more

Ease of interpretation and application
If results are misinterpreted, they are of no use and may even be harmful to some
individuals or groups

Availability of equivalent forms or comparable forms
Can also be useful in measuring development

Cost of testing
To save money, one should not prefer tests with lower validity and reliability
estimates

