Objectives
E.g.
If the results are to be used as a measure of students’ reading skills
our interpretations should be based on evidence that the scores actually reflect
reading skills
and are not affected by irrelevant factors, such as vocabulary or linguistic complexity
We get similar scores when different teachers independently rate student
performances on the same assessment task
a high degree of reliability from one rater to another
3/20/2025
6 Assessment procedure should
Be economical in terms of time and money
Be easily administered
Be easily scored
Produce results that can be accurately interpreted
7 Nature of validity
Validity
The appropriateness of the interpretation and use of the results
A matter of degree
it does not exist on an all-or-none basis (high validity, low validity)
Specific to some particular use or interpretation for a specific population of test
takers
No assessment is valid for all purposes
When indicating computational skill
the mathematics test may have a high degree of validity for 3rd and 4th
graders but a low degree of validity for 2nd and 5th graders
A reading test
may have high validity for skimming and scanning and low validity for
inferencing
Necessary to consider the specific interpretation or use to be made of the results
8 Major considerations in assessment validation
Content
The assessment content and specifications from which it was derived
Construct
The nature of the characteristics being measured
Assessment-criterion relationships
The relation of the assessment results to other measures
Consequences
The consequences of the uses and interpretations of the results
9 Content
How an individual performs on a domain of tasks that the assessment is supposed
to represent
E.g. knowledge of 200 words
we select 20 words and generalize performance on them to knowledge of all 200
11 Content
Assessment development to enhance validity
Table of specifications
Subject-matter content (topics to be learned)
Instructional objectives (types of performance)
12 Content
Assessment development to enhance validity
The percentages in the table indicate
the relative degree of emphasis that each content area and each instructional
objective is to be given in the test
13 Content
Table of specifications
The specifications should be in harmony with what was taught
The weights assigned in the table reflect the emphasis that was given during
instruction
The more closely the questions match the specified sample
the more valid the measure of student learning
It can be used in selecting tests that publishers prepare
How well do they match with our table of specifications?
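The weighting logic of a table of specifications can be sketched in code. The content areas, objectives, and percentages below are invented for illustration, not taken from any actual test plan; the point is only that cell weights sum to 100% and translate into item counts.

```python
# Sketch of a table of specifications: illustrative weights only.
# Rows are content areas, columns are instructional objectives;
# each cell holds the percentage of test emphasis.

spec = {
    "Vocabulary": {"Knows terms": 10, "Comprehends": 10, "Applies": 5},
    "Grammar":    {"Knows terms": 10, "Comprehends": 15, "Applies": 10},
    "Reading":    {"Knows terms": 5,  "Comprehends": 20, "Applies": 15},
}

total_items = 40  # planned test length (hypothetical)

# The percentages across all cells should sum to 100.
assert sum(sum(row.values()) for row in spec.values()) == 100

# Translate each cell's weight into a number of items.
for area, objectives in spec.items():
    for objective, pct in objectives.items():
        n_items = round(total_items * pct / 100)
        print(f"{area:12s} {objective:12s} {pct:3d}% -> {n_items} items")
```

A published test can be checked against the same grid: the closer its item counts come to these cell targets, the better it matches the planned emphasis.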
14 Construct
Is the test actually measuring the construct it claims it is measuring?
A construct is an individual characteristic or an abstract theoretical concept
assumed to exist to explain some aspect of behavior
Reading comprehension, inferencing, speaking proficiency, intelligence,
creativity, anxiety, mathematical reasoning, etc.
These are called constructs because they are theoretical constructions that are used
to explain performance on an assessment
15 Construct
Construct validation
the process of determining if the performance on an assessment can be
interpreted in terms of a construct(s)
Two questions are important in construct validations
Does the assessment adequately represent the intended construct? (construct
underrepresentation)
Problem-solving task turning into a memorization task
Is performance influenced by factors that are irrelevant to the construct?
(construct-irrelevant variance)
A mathematics test influenced by reading demands
4
3/20/2025
17 Assessment-criterion considerations
When test scores are to be used
to predict future performance
to estimate current performance on some valued measure other than the test
itself (called a criterion)
Concerned with evaluating the relationship between the test and the criterion
18 Assessment-criterion considerations
For example, can ALES scores indicate success at exams in master’s programs?
The degree of relationship can be described by statistically correlating the two sets of
scores
The resulting correlation coefficient provides a numerical summary of the degree
of relationship between the two sets of scores
Scatter plots and expectancy tables can also be used.
19 Example on Excel
20 Interpretation
21 Consideration of consequences
Assessments are intended to contribute to improved learning, but do they?
What impact do assessments have on teaching?
What are the possibly negative, unintended consequences of a particular use of
assessment results?
High importance attached to test results can lead teachers to focus narrowly on
what is on the test while ignoring important parts of the curriculum not covered by
the test
E.g. Changing the construct of teaching from problem-solving to memorization
ability because of a high-stakes test
An example: college professors prepare for the YDS for several years and end up
passing the exam but not speaking English
24 Reliability
The consistency of measurement
how consistent test scores or results are from one assessment to another
The more consistent the assessment results are from one measurement to another
the fewer errors there will be
Consequently, the greater the reliability
25 Reliability
An estimate of reliability refers to a particular type of consistency
Different periods of time
Different samples of tasks
Different raters
Low reliability means low validity
But high reliability does not mean high validity
26 Determining reliability with correlation methods
Consistency
over a period of time
over different forms of assessment
within the assessment itself
different raters
27 Test-retest method
The same assessment
administered twice to the same group of students
with a given time interval between the two (a measure of stability)
The interval should be neither too long nor too short for the purpose
The longer the interval between the first and second assessments
the more the scores are influenced by changes in the student characteristic being
measured, and the smaller the reliability coefficient will be
28 Test-retest method
Stability is important when results are used for several years
like English test scores, but not as important for a unit test
The test-retest method is not very relevant for teacher-constructed classroom tests
Not desirable to readminister the same assessment
In choosing standardized tests, stability is an important criterion
29 Equivalent (parallel)-forms method
Uses two different but equivalent forms of an assessment
Two different tests are prepared based on the same set of specifications
Administered to the same group of students in a short period of time
The resulting assessment scores are correlated
It does not tell anything about long-term stability
30 Split-half method
The assessment is administered to a group of students in the usual manner and
then is divided in half for scoring purposes
E.g. to score the even-numbered and the odd-numbered tasks separately
This produces two scores for each student
When correlated, provides a measure of internal consistency
To estimate the reliability of the full-length assessment, the Spearman-Brown
formula is applied
31 Interrater consistency
When student work is judgmentally scored
whether the same scores are assigned by another judge
Consistency can be evaluated with correlation
the scores assigned by one judge with those assigned by another judge
To achieve acceptable levels of interrater consistency
Agreed-on scoring rubrics
Training of raters to use those rubrics with examples of student work
32 Writing rubric
40 Examples
41 Reliability methods
To estimate the amount of variation to be expected in the scores
Standard error of measurement
The standard error of measurement is the standard deviation of the errors of
measurement
When the standard error of measurement is small, the confidence band is narrow
(indicating high reliability)
Greater confidence that the obtained score is near the true score
A teacher who is aware of the standard error of measurement realizes that it is
impossible to be dogmatic in interpreting minor differences in assessment scores
Tests with lower validity and reliability estimates should not be preferred merely
to save money