
The Science of

Psychological
Measurement: Of
Tests and Testing
Prepared and Presented By:

Ms. Giselle C. Honrado


Content

● Three-Tier System of Psychological Tests


● Types of Tests
● Assumptions About Psychological Testing and Assessment
● What’s a “Good Test”?
● Norms
Three-Tier System of Psychological Tests
(Test User Qualifications)

Level A
● Tests that can be administered, scored, and interpreted by a
responsible non-psychologist
● Users should have carefully read the manual and be familiar with the
overall purpose of testing
● Educational achievement or proficiency tests fall under this category

Level B
● Require some technical knowledge of test construction
● Require appropriate advanced coursework in psychology and related courses
● Example: aptitude tests and adjustment inventories applicable to normal
populations

Level C
● Require an advanced degree in psychology or a licensed psychologist
● Require advanced training or supervised experience with the particular test
● Example: projective tests, individual mental tests
Types of Tests

1. Standardized and Non-Standardized


● Standardized Tests are those
instruments that have prescribed
directions for administration, scoring,
and interpretation.
● Non-Standardized Tests are exemplified
by teacher-made tests either for
formative or summative evaluation of
student performance.
Types of Tests

2. Norm-referenced and Criterion-referenced Tests

● Norm-referenced Tests are those
instruments whose score interpretation is
based on the performance of a particular
group.
● Criterion-referenced Tests are measures
whose criteria for passing or failing have
been decided beforehand. The individual’s
score is interpreted in absolute terms.
Types of Tests
3. Objective and Subjective Tests
● Objective Tests are those instruments that do
not involve any personal sharing. They are usually
used to measure achievement (e.g., multiple-choice
and true-or-false tests) and provide consistency
in administration to ensure freedom from the
examiner's own beliefs and biases.
● Subjective Tests are measures that are sensitive
to rater and examiner beliefs. They employ
open-ended questions, which have more than
one correct answer or way of expressing the
correct answer (essay questions).
Types of Tests
4. Power and Speed Tests
● Power Tests limit perfect scores by
including difficult test items that few
individuals can answer correctly. These
tests measure how well test takers can
perform given items of varying difficulty
regardless of time or speed of response.
● Speed Tests use limited testing time to
prevent perfect scores. These tests have
easy questions but include too many items
to answer in the allotted time.
Types of Tests
5. Verbal and Non-Verbal Tests
● Verbal Tests are measures that involve
words.
● Non-Verbal Tests are instruments that make
little or no use of words; instead they use
drawings, patterns, or diagrams.
Types of Tests
6. Spiral and Cyclical Tests
● Spiral Tests are measures that have a progressive level of
difficulty.
○ A type of intelligence assessment in which the
focused themes are distributed throughout the
test and become increasingly difficult as the test
progresses. (Reference: APA Dictionary of Psychology)
● Cyclical Tests are tests that have several sections
which are spiral in nature.
○ Tests with several sections that are spiral and whose
items become progressively more difficult as one
moves through the test.
○ Nearly all intelligence testing sections are spiral
tests and the intelligence tests themselves are
cyclical. (Reference: tests.com)
Assumptions About Psychological
Testing and Assessment

Read your reference, Psychological Testing by
Cohen and Swerdlik, pp. 115-121,
and create CONCEPT MAPS.


What’s a “Good Test”?
Two Criteria for Psychometric Soundness of Tests:

● Reliability
○ involves the consistency of the measuring tool: the precision with
which the test measures and the extent to which error is present
in measurements.
○ yields the same numerical measurement every time it measures
the same thing under the same conditions.
What’s a “Good Test”?
Two Criteria for Psychometric Soundness of Tests:

● Validity
○ Test measures what it purports to measure.
○ Questions related to validity:
■ Do the items adequately sample the range of areas that must be sampled to
adequately measure the construct?
■ How do individual items contribute to or detract from the test’s validity?
■ What do these scores really tell us about the targeted construct? How are
high scores on the test related to test takers’ behavior? How are low scores
on the test related to test takers’ behavior? How do scores on this test
relate to scores on other tests purporting to measure the same construct?
How do scores on this test relate to scores on other tests purporting to
measure opposite types of constructs?
What’s a “Good Test”?
Other Considerations

A good test is one that:

● trained examiners can administer, score, and interpret with a minimum of
difficulty.
● yields actionable results that will ultimately benefit individual test
takers or society at large (*Everyday Psychometrics)
● contains adequate norms (normative data); norms provide a standard with
which the results of measurement can be compared
Everyday Psychometrics

Putting Tests to the Test


1. Why Use This Particular Instrument or Method?
● What is the objective of using a test and how well does the test under
consideration meet that objective?
● Who is this test designed for use with (age of test takers? reading level? etc.)
and how appropriate is it for the targeted test takers?
● How is what the test measures defined?
● What type of data will be generated from using this test, and what other
types of data will it be necessary to generate if this test is used? Do alternate
forms of this test exist?
Everyday Psychometrics

Putting Tests to the Test


2. Are There Any Published Guidelines for the Use of This Test?

● Published guidelines cover the use of tests, measurement techniques,
and measurement tools.
● Example: child custody decision
○ (1) the assessment of parenting capacity, (2) the assessment of psychological and
developmental needs of the child, and (3) the assessment of the goodness of fit
between the parent’s capacity and the child’s needs.
○ an educated opinion about who should be awarded custody can be arrived at only
after evaluating (1) the parents (or others seeking custody), (2) the child, and (3) the
goodness of fit between the needs and capacity of each of the parties.
Everyday Psychometrics

Putting Tests to the Test


3. Is This Instrument Reliable?

● Determining whether a particular instrument is reliable starts with a careful


reading of the test’s manual and of published research on the test, test
reviews, and related sources. But it doesn't end there.
● Measuring reliability is not always a straightforward matter.
○ Example: test-retest reliability (consistency of perception)
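The consistency idea behind test-retest reliability can be sketched numerically. This is an illustrative example, not from the text: the Pearson correlation between scores from two administrations of the same (made-up) test serves as the reliability estimate.

```python
# Illustrative sketch: test-retest reliability is commonly estimated as the
# Pearson correlation between two administrations. Scores are invented.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

time1 = [12, 15, 11, 18, 14, 16, 13]   # first administration
time2 = [13, 14, 11, 17, 15, 16, 12]   # retest of the same examinees

print(round(pearson_r(time1, time2), 2))  # → 0.93
```

A coefficient near 1.0 would suggest scores are stable across administrations; interpreting a real coefficient also requires the manual and published research, as the slide notes.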
Everyday Psychometrics

Putting Tests to the Test


4. Is This Instrument Valid?

● Determining whether a particular instrument is valid starts with a careful


reading of the test’s manual as well as published research on the test, test
reviews, and related sources.
● Questions related to the validity of a test can be complex and colored more
in shades of gray than in black or white.
● What starts as research to determine the validity of an individual
instrument for a particular objective may end with research as to which
combination of instruments will best achieve the objective.
Everyday Psychometrics

Putting Tests to the Test


5. Is This Instrument Cost-Effective?

● Group measures of intelligence can be administered quickly and address
testing needs more efficiently than individually administered tests.
● It could be said that group tests had greater utility than individual tests.
Everyday Psychometrics

Putting Tests to the Test


6. What Inferences May Reasonably Be Made from This Test Score, and How
Generalizable Are the Findings?

● Intimately related to considerations regarding the inferences that can be


made are those regarding the generalizability of the findings. (Norms -
population of people used to help develop a test)
● Other factors that may affect the generalizability of test findings:
○ How items on a test are worded
○ How a test was administered (explicit directions for testing conditions and
test administration procedures)
○ Culture (taken into account in terms of administration, scoring, and
interpretation of any test)
You are now aware of the types of questions
experts ask when evaluating tests.

What’s a good test?

● Seems to be a simple question but it doesn’t necessarily have simple answers


Norms

Norm-Referenced Testing and Assessment

● Method of evaluation and a way of deriving meaning from test scores by


evaluating an individual test taker’s score and comparing it to scores of a
group of test takers.
○ The meaning of an individual test score is understood relative to other scores on
the same test.
● Common Goal of NRT: To yield information on a test taker's standing or
ranking relative to some comparison group of test takers.
Norms
Norm

● Refers to behavior that is usual, average, normal, standard, expected, or typical.

Norms

● Psychometrics context: the test performance data of a particular group of test


takers that are designed for use as a reference when evaluating or interpreting
individual test scores.

Normative Sample

● Group of people whose performance on a particular test is analyzed for reference


in evaluating the performance of individual test takers.
Norms
Norming

● Refers to the process of deriving norms


● May be modified to describe a particular type of norm derivation
○ Race Norming - controversial practice of norming on the basis of race or
ethnic background

User Norms or Program Norms

● “Consist of descriptive statistics based on a group of test takers in a given period


of time rather than norms obtained by formal sampling methods” (Nelson, 1994)
Norms: Sampling to Develop Norms

Standardization or Test Standardization

● The process of administering a test (scoring and interpreting) to a


representative sample of test takers for the purpose of establishing norms
● Standardized: it has clearly specified procedures for administration and
scoring, typically including normative data.
Norms: Sampling to Develop Norms
Question #1: How is sampling done in the process of developing a test?

Sampling Methods / Techniques:

● Stratified Sampling
● Stratified-Random Sampling
● Purposive Sampling
● Incidental or Convenience Sampling

Other Techniques:

● Clustered sampling
● Systematic Sampling
● Quota Sampling
● Snowball
Sampling Techniques
● Probability or Random
○ Simple Random Sampling
○ Systematic Sampling
○ Stratified Sampling
○ Cluster Sampling
● Non-Probability or Non-Random
○ Convenience Sampling
○ Quota Sampling
○ Panel Sampling
○ Judgmental Sampling
○ Snowball Sampling
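As a rough illustration of one listed technique, the sketch below performs stratified-random sampling: a population of hypothetical examinee IDs is split into strata (here, school grades), and each stratum is randomly sampled in proportion to its size.

```python
# Hypothetical stratified-random sampling sketch for a norming study.
# All examinee IDs and stratum sizes are made up for illustration.
import random

random.seed(0)

population = {
    "Grade 4": [f"G4-{i}" for i in range(200)],
    "Grade 5": [f"G5-{i}" for i in range(300)],
    "Grade 6": [f"G6-{i}" for i in range(500)],
}

def stratified_sample(strata, total_n):
    """Draw a random sample from each stratum, proportional to its size."""
    pop_size = sum(len(members) for members in strata.values())
    sample = []
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)  # proportional allocation
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, total_n=100)
print(len(sample))  # 20 + 30 + 50 = 100
```

Proportional allocation keeps the sample's grade mix matched to the population's, which is the point of stratifying before drawing norms.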
Norms: Sampling to Develop Norms
Question #2: What are the steps to take in developing norms for a standardized test?

1. Sampling
2. Test Developer administers the test according to the standard set of instructions
3. Test Developer describes the recommended setting for giving the test
4. Establish a standard set of instructions and conditions under which the test is
given
5. Collect and analyze the data
Norms: Sampling to Develop Norms
Question #2: What are the steps to take in developing norms for a standardized test?

6. Summarize the data using descriptive statistics (measures of central tendency and
measures of variability)

7. Provide a precise description of the standardization sample itself

8. Develop norms with data derived from a group of people who are presumed to be
representative of the people who will take the test in the future.

● “Provide information to support recommended interpretations of the


results, including the nature of the content, norms or comparison groups,
and other technical evidence” (Code of Fair Testing Practices in Education,
2004)
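Step 6 above can be sketched in a few lines of code. The raw scores are invented; the point is simply computing the measures of central tendency and variability for a standardization sample.

```python
# Sketch of summarizing standardization-sample data (step 6) with
# measures of central tendency and variability. Scores are made up.
from statistics import mean, median, pstdev

raw_scores = [38, 42, 45, 47, 50, 50, 52, 55, 58, 63]  # hypothetical sample

print("Mean:", mean(raw_scores))            # central tendency
print("Median:", median(raw_scores))        # central tendency
print("SD:", round(pstdev(raw_scores), 2))  # variability
```

These summary statistics (here mean 50, SD about 7.1) are what later anchor derived scores such as percentiles and standard scores.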
Types of Norms
Percentiles

● A ranking that conveys


information about the relative
position of a score within a
distribution of scores
● An expression of the percentage
of people whose score on a test
or measure falls below a
particular raw score
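The percentile definition above can be expressed as a short computation. The norm-group scores are hypothetical, and the convention used here (counting scores strictly below the raw score) is only one of several in use.

```python
# Minimal sketch of a percentile rank: the percentage of norm-group
# scores falling below a given raw score. Norm data are invented.

def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores strictly below the given raw score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

norm_group = [41, 45, 48, 50, 52, 55, 57, 60, 64, 70]  # hypothetical norms
print(percentile_rank(57, norm_group))  # 6 of 10 scores fall below 57 → 60.0
```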
Types of Norms
Age Norms

● Also known as age-equivalent


scores
● Indicates the average performance
of different samples of test takers
who were at various ages at the
time the test was administered.
Types of Norms
Grade Norms

● Designed to indicate the average


test performance of test takers in
a given school grade
● Developed by administering the
test to representative samples of
children over a range of
consecutive grade levels
● Disadvantage: useful only with
respect to years and months of
schooling completed
Types of Norms
National Norms

● Derived from a normative


sample that was nationally
representative of the
population at the time the
norming study was conducted.
Types of Norms
National Anchor Norms

● An equivalency table for scores on


two nationally standardized tests
designed to measure the same
thing
● Provide some stability to test
scores by anchoring them to other
test scores
● Use the equipercentile method: the
equivalency of scores on different
tests is calculated with reference
to corresponding percentile scores
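A toy version of the equipercentile idea might look like the following. Both score distributions are made up, and real anchor-norm tables use smoothed percentile distributions; this only illustrates matching scores across tests by percentile rank.

```python
# Hedged sketch of equipercentile equating: a score on Test A is matched
# to the Test B score with the closest percentile rank in its own
# distribution. Both distributions below are invented.

def percentile_rank(score, scores):
    return 100 * sum(1 for s in scores if s <= score) / len(scores)

def equipercentile_equivalent(score_a, dist_a, dist_b):
    """Find the Test B score whose percentile rank best matches score_a's."""
    target = percentile_rank(score_a, dist_a)
    return min(sorted(set(dist_b)),
               key=lambda b: abs(percentile_rank(b, dist_b) - target))

test_a = [10, 12, 14, 15, 16, 18, 20, 22, 24, 25]
test_b = [40, 44, 47, 50, 53, 55, 58, 61, 65, 70]

print(equipercentile_equivalent(16, test_a, test_b))  # 16 on A ≈ 53 on B
```

A score of 16 sits at the 50th percentile of Test A, so it is anchored to the Test B score at the same percentile.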
Types of Norms
Subgroup Norms

● A normative sample can be
segmented by any of the criteria
initially used in selecting subjects
(e.g., age, socioeconomic status,
geographic region, race); the
segments yield more narrowly
defined subgroup norms.
Types of Norms
Local Norms

● Provide normative information


with respect to the local
population’s performance on
some test
● Test users may wish to evaluate
scores on the basis of reference
groups drawn from a specific
geographic or institutional setting
Types of Norms

Standard Score Norms

● Express an individual's distance
from the mean in terms of the
standard deviation of the
distribution. A standard score is a
derived score that has a fixed
mean and a fixed standard
deviation.
● Z-scores, T-scores, Stanine,
Deviation IQ
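The standard scores named above can be sketched as simple linear transformations of a z-score. The normative mean and SD used here (50 and 10 on the raw-score scale) are assumed values for illustration.

```python
# Sketch of standard-score conversions: each derived score is a linear
# transformation of z onto a scale with a fixed mean and SD.
# The raw-score norm parameters below are assumptions, not real norms.

RAW_MEAN, RAW_SD = 50, 10  # hypothetical normative mean and SD

def z_score(raw):
    """Distance from the mean in standard-deviation units (mean 0, SD 1)."""
    return (raw - RAW_MEAN) / RAW_SD

def t_score(raw):
    """T-score: fixed mean 50, fixed SD 10."""
    return 50 + 10 * z_score(raw)

def deviation_iq(raw):
    """Deviation IQ: fixed mean 100, fixed SD 15."""
    return 100 + 15 * z_score(raw)

raw = 65
print(z_score(raw), t_score(raw), deviation_iq(raw))  # 1.5 65.0 122.5
```

A raw score 1.5 SDs above the mean lands at z = 1.5, T = 65, and deviation IQ = 122.5; the same relative standing, just re-expressed on different fixed scales.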
Fixed Reference Group Scoring
Systems

● The distribution of scores obtained


on the test from one group of test
takers (fixed reference group) is
used as the basis for the
calculation of test scores for future
administrations of the test.
Norm-Referenced vs. Criterion-
Referenced Evaluation

● Norm-referenced evaluation -
evaluate test score in relation to
other scores on the same test.
● A type of standardized test that
compares students’ performances
to one another.
Norm-Referenced vs. Criterion-
Referenced Evaluation

● Criterion - a standard on which a


judgment or decision may be based
● Criterion-referenced evaluation -
method of evaluation and a way of
deriving meaning from test scores by
evaluating an individual’s score with
reference to a set standard.
Norm-Referenced vs. Criterion-
Referenced Evaluation

● Criterion-referenced evaluation -
another name is domain- or content-
referenced testing and assessment
● See footnote page 136
Norm-Referenced vs. Criterion-
Referenced Evaluation
● Although acknowledging that content-referenced interpretations can be referred to as
criterion-referenced interpretations, the 1974 edition of the Standards for Educational and
Psychological Testing also noted a technical distinction between interpretations so
designated: “Content-referenced interpretations are those where the score is directly
interpreted in terms of performance at each point on the achievement continuum being
measured. Criterion-referenced interpretations are those where the score is directly
interpreted in terms of performance at any given point on the continuum of an external
variable. An external criterion variable might be grade averages or levels of job performance”
(p. 19; footnote in original omitted).
Norm-Referenced vs. Criterion-Referenced
Evaluation
● Norm-referenced interpretation - a usual area
of focus is how an individual performed
relative to other people who took the test.
● Criterion-referenced interpretation - usual
area of focus is the test taker’s performance:
can or cannot do; has or has not learned; does
or does not meet specified criteria
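The contrast can be made concrete with a toy example (all numbers invented): the same raw score is interpreted once relative to a norm group and once against a preset cut score.

```python
# Toy contrast of the two interpretations: the same raw score read
# relative to a norm group vs. against a preset passing standard.
# Norm-group scores and the cut score are hypothetical.

norm_group = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]  # hypothetical scores
cut_score = 75                                          # preset criterion

def norm_referenced(score):
    """Standing relative to other test takers (percentile rank)."""
    below = sum(1 for s in norm_group if s < score)
    return f"percentile rank {100 * below // len(norm_group)}"

def criterion_referenced(score):
    """Standing relative to the fixed standard, in absolute terms."""
    return "meets criterion" if score >= cut_score else "does not meet criterion"

score = 72
print(norm_referenced(score))       # → percentile rank 60
print(criterion_referenced(score))  # → does not meet criterion
```

The same score of 72 looks respectable norm-referenced (above 60% of the group) yet fails the criterion-referenced standard, which is exactly the distinction the slides draw.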
Culturally Informed Assessment

● Specify and test what about the social and
cultural world matters to avoid making
inferences based on group labels
associated with ethnicity or race.
● Used broadly to refer to assessments that
incorporate students' cultures; four distinct
terms are commonly used when
discussing culturally informed
assessments:
○ Culturally Sensitive assessments
recognize that cultural differences
and similarities between students
exist without assigning them a value,
reflecting a broad range of cultural
identities and practices.
○ Culturally Relevant assessments take
students' everyday lived cultural experiences
and link them to intended learning,
representing the cultural values and
differences of the test-taking population.
○ Culturally Responsive assessments provide
students flexibility during assessments so that
students have opportunities to bring their own
cultural references and fluencies into
demonstrations of achievement.
○ Culturally Sustaining assessments include
students as part of the design process,
becoming a demonstration of students'
heritage and community cultural practices.
○ Reference: Malick, S. (2020). Using culturally responsive practices to foster
learning during school closures: Challenges and opportunities for equity.
[Blog]. Regional Educational Laboratory Mid-Atlantic.
https://ies.ed.gov/ncee/edlabs/regions/midatlantic/app/Blog/Post/1031
END OF THE
CHAPTER

Get ready for a


Long Quiz.

Thank you!
