Of Tests and Testing
Psychological Measurement: Of Tests and Testing
Prepared and Presented By:
What’s a “Good Test”?
Two Criteria for Psychometric Soundness of Tests:
● Reliability
○ involves the consistency of the measuring tool: the precision with
which the test measures and the extent to which error is present
in measurements.
○ yields the same numerical measurement every time it measures
the same thing under the same conditions.
● Validity
○ The test measures what it purports to measure.
○ Questions related to validity:
■ Do the items adequately sample the range of areas that must be sampled to
adequately measure the construct?
■ How do individual items contribute to or detract from the test’s validity?
■ What do these scores really tell us about the targeted construct? How are
high scores on the test related to test takers’ behavior? How are low scores
on the test related to test takers’ behavior? How do scores on this test
relate to scores on other tests purporting to measure the same construct?
How do scores on this test relate to scores on other tests purporting to
measure opposite types of constructs?
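The consistency described under Reliability can be illustrated with a small sketch. One common way to quantify it (an assumption here, since the slides name no specific method) is a test-retest correlation: the same people take the same test twice, and the two sets of scores are correlated. The score lists below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical scores for five test takers on two administrations
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 14, 19, 20]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value close to 1 suggests the test ranks people consistently across administrations; how high is "high enough" depends on the purpose of the test.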
Other Considerations
Norms
● Refer to the behavior that is usual, average, normal, standard, expected, or typical.
Normative Sample
Norms: Sampling to Develop Norms
Question #1: How is sampling done in the process of developing a test?
● Stratified Sampling
● Stratified-Random Sampling
● Purposive Sampling
● Incidental or Convenience Sampling
Other Techniques:
● Cluster Sampling
● Systematic Sampling
● Quota Sampling
● Snowball Sampling
Sampling Techniques
● Probability or Random
○ Simple Random Sampling
○ Systematic Sampling
○ Stratified Sampling
○ Cluster Sampling
● Non-Probability or Non-Random
○ Convenience Sampling
○ Quota Sampling
○ Panel Sampling
○ Judgmental Sampling
○ Snowball Sampling
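Three of the probability techniques listed above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the population, the stratum labels, the 10% sampling fraction, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Every member has an equal chance of selection."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, n)


def systematic_sample(population, step, start=0):
    """Take every `step`-th member, beginning at index `start`."""
    return population[start::step]


def stratified_random_sample(population, stratum_of, fraction, seed=0):
    """Randomly sample the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 grade-school and 40 high-school students
population = [("grade_school", i) for i in range(60)] + [
    ("high_school", i) for i in range(40)
]

random_sample = simple_random_sample(population, 10)
every_tenth = systematic_sample(population, 10)
stratified = stratified_random_sample(population, lambda p: p[0], 0.10)
print(len(random_sample), len(every_tenth), len(stratified))
```

The stratified version keeps the sample's proportions equal to the population's (here 6 grade-school and 4 high-school students), which simple random sampling does not guarantee.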
Norms: Sampling to Develop Norms
Question #2: What are the steps to take in developing norms for a standardized test?
1. Obtain a sample (sampling).
2. The test developer administers the test according to the standard set of instructions.
3. The test developer describes the recommended setting for giving the test.
4. Establish a standard set of instructions and conditions under which the test is
given.
5. Collect and analyze the data.
6. Summarize the data using descriptive statistics (measures of central tendency and
measures of variability)
8. Develop norms with data derived from a group of people who are presumed to be
representative of the people who will take the test in the future.
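The descriptive-statistics step above can be sketched with Python's standard `statistics` module. The scores are a hypothetical normative sample, and `summarize` is an illustrative helper name, not part of any testing standard.

```python
import statistics

# Hypothetical raw scores from a small normative sample
normative_scores = [82, 90, 75, 90, 68, 85, 77, 93, 88, 72]


def summarize(scores):
    """Return measures of central tendency and variability for a sample."""
    return {
        "n": len(scores),
        # Measures of central tendency
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        # Measures of variability
        "range": max(scores) - min(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation (n - 1)
    }


summary = summarize(normative_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Whether to use the sample (`stdev`) or population (`pstdev`) standard deviation is a choice the test developer would state in the norms documentation; the sample form is assumed here.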
Norm-Referenced vs. Criterion-Referenced Evaluation
● Norm-referenced evaluation - evaluates a test score in relation to
other scores on the same test.
● A type of standardized test that compares students’ performances
to one another.
● Criterion-referenced evaluation - also called domain- or
content-referenced testing and assessment
● See footnote page 136
● Although acknowledging that content-referenced interpretations can be referred to as
criterion-referenced interpretations, the 1974 edition of the Standards for Educational and
Psychological Testing also noted a technical distinction between interpretations so
designated: “Content-referenced interpretations are those where the score is directly
interpreted in terms of performance at each point on the achievement continuum being
measured. Criterion-referenced interpretations are those where the score is directly
interpreted in terms of performance at any given point on the continuum of an external
variable. An external criterion variable might be grade averages or levels of job performance”
(p. 19; footnote in original omitted).
● Norm-referenced interpretation - a usual area
of focus is how an individual performed
relative to other people who took the test.
● Criterion-referenced interpretation - usual
area of focus is the test taker’s performance:
can or cannot do; has or has not learned; does
or does not meet specified criteria
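The two interpretations above can be contrasted on a single score. In this sketch the norm group, the test taker's score of 88, and the cut score of 80 are all hypothetical; percentile rank is computed as the percentage of norm-group scores falling below the score, which is one of several definitions in use.

```python
import statistics

# Hypothetical norm group and a hypothetical mastery cut score
norm_group_scores = [82, 90, 75, 90, 68, 85, 77, 93, 88, 72]
CUT_SCORE = 80


def z_score(score, norm_scores):
    """Norm-referenced: distance from the group mean in SD units."""
    mean = statistics.mean(norm_scores)
    sd = statistics.stdev(norm_scores)
    return (score - mean) / sd


def percentile_rank(score, norm_scores):
    """Norm-referenced: percentage of norm-group scores below this score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced: does the score meet the specified standard?"""
    return score >= cut_score


score = 88
print(f"z = {z_score(score, norm_group_scores):.2f}")
print(f"percentile rank = {percentile_rank(score, norm_group_scores)}")
print(f"meets criterion = {meets_criterion(score, CUT_SCORE)}")
```

The first two results change if the norm group changes; the third depends only on the cut score, which is why the same raw score can be interpreted differently under the two approaches.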
Culturally
● specify and test what about the social and
cultural world matters to avoid making
inferences based on group labels
associated with ethnicity or race.
● used broadly to refer to assessments that
incorporate students' cultures, four distinct
Thank you!