PSYCHOLOGICAL ASSESSMENT CHAPTER 3-6 (Summary)
Measures of Variability
Variability is an indication of how scores
in a distribution are scattered or
dispersed.
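The measures most often reported are the range, variance, and standard deviation. A minimal Python sketch with hypothetical scores (the data are illustrative, not from the source):

```python
import statistics

scores = [10, 12, 15, 15, 18, 20, 25]  # hypothetical test scores

score_range = max(scores) - min(scores)   # distance between the two extreme scores
variance = statistics.pvariance(scores)   # mean squared deviation from the mean
std_dev = statistics.pstdev(scores)       # square root of the variance

print(f"range={score_range}, variance={variance:.2f}, sd={std_dev:.2f}")
```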
Meta-Analysis
- techniques used to statistically
combine information across studies
- Effect size: estimates derived from meta-
analysis
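One common effect size is Cohen's d, the standardized difference between two group means, which meta-analyses often aggregate across studies. A minimal sketch with hypothetical group data (not from the source):

```python
import statistics

# Hypothetical scores for two groups (e.g., treatment vs. control)
group_a = [22, 25, 27, 30, 31]
group_b = [18, 20, 21, 24, 26]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Pooled standard deviation, then Cohen's d = (mean difference) / pooled SD
pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
cohens_d = (mean_a - mean_b) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")
```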
Assumptions about Psychological Testing and Assessment
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test-related behavior
4. Tests and other measurement techniques have strengths and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society

Trait- any distinguishable, relatively enduring way in which one individual varies from another.

States- relatively less enduring ways in which individuals vary. Psychological traits exist as constructs inferred from overt behavior.

Norms- used as a frame of reference for interpreting test scores.

Normative sample or norm group- group of people presumed to be representative of the universe of people.

Norming- the process of deriving norms.

User Norms/ Program Norms- consist of descriptive statistics based on a group of test-takers in a given period rather than norms obtained by formal sampling methods.

Standardization- the process of administering a test to a representative sample of test-takers to establish norms.

Criterion-Referenced (Domain-Referenced): a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score with reference to a set standard.

Norm-Referenced: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score and comparing it to the scores of a group of testtakers on the same test.

SAMPLING TO DEVELOP NORMS
Population- a set of individuals with at least one common, observable characteristic.

Sample- a portion of the population deemed representative of the whole.

Sampling- the process of selecting the portion of the universe deemed to be representative of the whole population.

Types of sampling:
1. Stratified sampling- researchers divide subjects into subgroups called strata based on characteristics that they share.

2. Convenience/ Incidental sampling- a non-probability sampling method where units are selected for inclusion in the sample because they are the easiest for the researcher to access.

3. Purposive sampling- a non-probability sampling method in which researchers rely on their own judgment when choosing members of the population to participate in their surveys.

4. Snowball sampling- a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances.

5. Clustered sampling- researchers divide a population into smaller groups known as clusters. They then randomly select among these clusters to form a sample.
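As an illustration of the mechanics, here is a minimal sketch of stratified sampling; the strata, names, and sizes are hypothetical:

```python
import random

random.seed(0)

# Hypothetical population grouped into strata (e.g., by year level)
strata = {
    "freshman":  [f"F{i}" for i in range(100)],
    "sophomore": [f"S{i}" for i in range(60)],
    "junior":    [f"J{i}" for i in range(40)],
}

# Draw from each stratum in proportion to its share of the population
total = sum(len(members) for members in strata.values())
sample_size = 20
sample = []
for name, members in strata.items():
    k = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample), sample[:5])
```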
Types of Norms
1. Percentiles- an expression of the percentage of people whose score on a test or measure falls below a
particular raw score. Ex. if your score is at the 75th percentile on a math test, your score is better than the
scores of 75% of those who took the exam (see the sketch after this list).
- Percentage Correct: an expression of the number of items answered correctly, divided by the total number
of items and multiplied by 100.
2. Age norms- also known as age-equivalent scores; a kind of developmental norm.
3. Grade norms- indicate the average test performance of testtakers in a given school grade; also a kind of
developmental norm.
o Developmental norms- norms developed based on any trait, ability, skill, or other characteristic that is
presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or
stage of life.
4. National norms- norms derived from a normative sample that was nationally representative of the population.
5. National anchor norms- provide some stability to test scores by anchoring them to other test scores.
- Equipercentile method- the equivalency of scores on different tests is calculated with reference to
corresponding percentile scores.
6. Subgroup norms- norms for any defined subgroup of the normative sample, e.g., by age, socioeconomic
status, geographic region, or race.
7. Local norms- provide normative information with respect to the local population's performance on some test.
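A minimal sketch of the percentile-rank and percentage-correct computations described in item 1, using a hypothetical norm group:

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in the distribution that fall below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def percentage_correct(num_correct, num_items):
    """Items answered correctly, divided by total items, times 100."""
    return 100 * num_correct / num_items

norm_group = [55, 60, 62, 70, 75, 75, 80, 85, 90, 95]  # hypothetical scores
print(percentile_rank(80, norm_group))   # 60.0 -> better than 60% of the group
print(percentage_correct(45, 50))        # 90.0
```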
Fixed Reference Group Scoring System- The distribution of scores obtained on the test from one group of test-
takers (fixed reference group) is used as the basis for the calculation of test scores for future administrations of
the test.
3. Internal Consistency/ Inter-item Consistency- refers to the degree of correlation among all the items on a
scale.
- It is useful in assessing the homogeneity of a test.
- Homogeneity- refers to the extent to which items in a scale measure a single trait.
- Heterogeneity- the degree to which a test measures different factors.
Methods:
4. Split-Half Reliability- obtained by correlating two pairs of scores obtained from equivalent halves of a single
test administered once.
• odd-even reliability
• Spearman-Brown formula
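The Spearman-Brown formula estimates full-test reliability from the half-test correlation: r_SB = 2r / (1 + r). A minimal sketch of an odd-even split, with hypothetical half-test scores and a hand-rolled Pearson correlation:

```python
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical half-test totals: one pair of values per testtaker
odd_half  = [10, 12, 14, 9, 15, 11]
even_half = [11, 11, 15, 8, 14, 12]

r_halves = pearson_r(odd_half, even_half)
# Spearman-Brown correction: estimated reliability of the full-length test
r_sb = 2 * r_halves / (1 + r_halves)
print(f"half-test r = {r_halves:.2f}, Spearman-Brown corrected = {r_sb:.2f}")
```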
Other methods of estimating Internal Consistency:
➢ Kuder-Richardson formulas (KR-20/ KR-21)
- By G. Frederic Kuder and M.W. Richardson
- KR-20- highly homogenous & dichotomous items
- KR-21- a simplified version of KR-20 that assumes all items have the same difficulty
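A minimal sketch of the KR-20 computation, KR-20 = (k / (k - 1)) * (1 - sum(pq) / total-score variance), on a hypothetical matrix of dichotomous responses:

```python
# Rows = testtakers, columns = dichotomous items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
]

n_items = len(responses[0])
n_people = len(responses)

# p = proportion passing each item; q = 1 - p
p = [sum(row[j] for row in responses) / n_people for j in range(n_items)]
pq_sum = sum(pj * (1 - pj) for pj in p)

# Population variance of the total scores
totals = [sum(row) for row in responses]
mean_total = sum(totals) / n_people
var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

kr20 = (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
print(f"KR-20 = {kr20:.2f}")
```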
Criterion Contamination- a situation in which a response measure (the criterion) is influenced by factors
that are not related to the concept being measured.
- also applies to a criterion measure that has been based, at least in part, on predictor measures.
o Concurrent Validity- is an index of the degree to which a test score is related to some criterion measure
obtained at the same time.
Predictive Validity- is an index of the degree to which a test score predicts some criterion measure.
- Related terms: base rate, false positive, false negative, hit rate, miss rate
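These terms can be made concrete with a small classification sketch; the scores, cutoff, and criterion outcomes below are hypothetical, and note that textbooks vary in exactly how they define the hit rate:

```python
# Hypothetical (test_score, succeeded_on_criterion) pairs
data = [(80, True), (75, True), (60, False), (85, True),
        (55, False), (70, False), (90, True), (65, True)]

cutoff = 72  # hypothetical selection cutoff on the test

# Hit = correct classification (true positive or true negative); some texts
# define hit rate more narrowly as correct identification of successes only.
hits = sum(1 for score, ok in data if (score >= cutoff) == ok)
false_pos = sum(1 for score, ok in data if score >= cutoff and not ok)
false_neg = sum(1 for score, ok in data if score < cutoff and ok)

base_rate = sum(ok for _, ok in data) / len(data)  # proportion succeeding overall
hit_rate = hits / len(data)
miss_rate = 1 - hit_rate

print(f"base rate={base_rate:.2f}, hit rate={hit_rate:.2f}, "
      f"miss rate={miss_rate:.2f}, FP={false_pos}, FN={false_neg}")
```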
Incremental Validity- the degree to which an additional predictor explains something about the criterion
measure that is not explained by predictors already in use.
Construct Validity- judgement about the appropriateness of inferences drawn from test scores regarding
individual standings on a variable called a construct.
• construct: informed, scientific idea developed or hypothesized to describe or explain behavior.
• “umbrella validity”
Evidence of Construct Validity
1. Evidence of Homogeneity- refers to how uniform a test is in measuring a single concept.
o Statistical tools used to test homogeneity: Pearson r,
Spearman rho, coefficient alpha
2. Evidence of changes with age- some constructs are expected to change over time.
3. Evidence of pretest- posttest changes
- Evidence that test scores change as a result of some experience between a pretest and posttest.
4. Evidence from distinct groups
- Scores on the test vary in a predictable way as a function of membership in some group.
5. Convergent validity- if scores on the test correlate highly with scores on other tests that measure the same construct.
6. Discriminant validity- tests whether constructs that are not supposed to be related are actually unrelated.
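A minimal sketch of checking convergent and discriminant evidence with correlations; the three hypothetical measures stand in for a new anxiety scale, an established anxiety scale, and an unrelated spatial-ability test:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores for the same testtakers on three measures
new_anxiety  = [12, 18, 25, 30, 22, 15, 28, 20]
old_anxiety  = [14, 17, 27, 29, 20, 13, 30, 21]   # established measure, same construct
spatial_test = [40, 55, 38, 60, 52, 45, 41, 58]   # construct that should be unrelated

# Convergent evidence: the two anxiety measures should correlate highly
print(f"convergent r = {correlation(new_anxiety, old_anxiety):.2f}")

# Discriminant evidence: anxiety and spatial ability should correlate weakly
print(f"discriminant r = {correlation(new_anxiety, spatial_test):.2f}")
```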
Factor Analysis
- In psychometric research, it is a data reduction method in which several sets of scores and correlations between
them are analyzed.
- Its purpose is to identify the factor or factors in common between test scores on subscales within a particular
test, or the factors in common between scores on a series of tests.
o Exploratory Factor Analysis- estimating, or extracting factors; deciding how many factors to retain; and
rotating factors to an interpretable orientation.
o Confirmatory Factor Analysis- researchers test the degree to which a hypothetical model fits the actual
data.
o Factor Loading- conveys information about the extent to which the factor determines the test score or
scores.
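A minimal sketch of exploratory factor analysis with scikit-learn (an assumed tooling choice, not named in the source); the data are simulated from two latent factors so the extracted loadings have a known structure:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical scores: 100 testtakers x 6 subscales, built from two latent factors
latent = rng.normal(size=(100, 2))
loadings_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                          [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
scores = latent @ loadings_true.T + rng.normal(scale=0.3, size=(100, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(scores)

# Estimated factor loadings: how strongly each factor determines each subscale
print(np.round(fa.components_, 2))
```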
Validity Coefficient- is a correlation coefficient that provides a measure of the relationship between test scores
and scores on the criterion measure.
- Typically, the Pearson correlation coefficient is used to quantify the relationship between the two measures.
How high should the coefficient of validity be?
• There are no rules for determining the minimum acceptable size of a validity coefficient.
• Validity coefficients need to be large enough to enable the test user to make accurate decisions within
the unique context in which a test is being used.
Test Bias
• It is a factor inherent in a test that systematically prevents accurate, impartial measurement.
• Implies systematic variation
• A test is said to be biased if some portion of its variance stems from some factor/s that are irrelevant to
performance on the criterion measure.
TYPES OF RATING ERROR
Generosity error- the tendency of the rater to be lenient in rating.
Central tendency error- the tendency of the rater to be reluctant to give extreme scores, so the ratings fall
in the middle of the rating continuum.
Severity error- the tendency of the rater to be too strict in rating, so the ratings fall at the low end of the rating
continuum.
Halo effect- the tendency of the rater to give the ratee higher ratings than deserved because the rater
fails to discriminate among independent aspects of the ratee's behavior.
Test Fairness
• The extent to which a test is used in an impartial, just, and equitable way.
• Society strives for fairness in test use by means of legislation, judicial decisions, and administrative
regulations.