
PSYCHOLOGICAL ASSESSMENT CHAPTER 3-6 (summary)

Chapter 3-4: Statistics Refresher


Measurement- the act of assigning numbers or symbols to characteristics of things according to rules.

Scale- a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned.

Discrete scale- contains numeric data that have a finite number of possible values and can only be whole numbers.

Continuous scale- numerical data that can theoretically be measured in infinitely small units (with decimals).

Error- the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Scales of Measurement (N.O.I.R)
1. Nominal Scales- involve classification or categorization based on one or more distinguishing characteristics.
Property: Identity
Ex. Gender, Name, Religion, ID number
2. Ordinal Scales- involve classification and rank ordering on some characteristic.
Properties: Identity and Magnitude
Ex. Birth order, Rank in class
3. Interval Scales- each unit on the scale is equal to any other unit on the scale; no true zero.
Properties: Identity, Magnitude and Equal Interval
4. Ratio Scales- all mathematical operations can be performed; there is a true zero.
Properties: Identity, Magnitude, Equal Interval and Absolute Zero
Ex. height, weight, Kelvin

Describing Data

Distribution- a set of test scores arrayed for recording or study.

Raw Score- a straightforward, unmodified accounting of performance that is usually numerical.
Ex. Number of items responded to correctly on an achievement test

Frequency Distribution- all scores are listed alongside the number of times each score occurred.
- Scores might be listed in tabular or graphic form.
1. Simple frequency distribution
2. Grouped frequency distribution

Graph- a diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.
3 types:
1. Histogram- a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
2. Bar graph- numbers indicative of
frequency appear on the y-axis, and
reference to some categorization appears
on the x-axis.
3. Frequency Polygon- a continuous line
connecting the points where test scores or
class intervals meet frequencies.
Measures of Central Tendency
- A statistic that indicates the average or midmost score between the extreme scores in a distribution.
Mean- arithmetic mean or the average of the given scores in a distribution.
Median- middle score in a distribution.
Mode- most frequently occurring score in a distribution.

Normal Curve
- "Laplace-Gaussian Curve"- Karl Friedrich Gauss
- Karl Pearson was credited as the first to refer to the curve as the "normal curve"
Characteristics:
• bell-shaped
• asymptotic (both tails approach but never meet the x-axis)
• symmetrical
• mean = median = mode
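The three measures of central tendency can be computed directly with Python's standard library; the score set below is hypothetical, used only for illustration.

```python
from statistics import mean, median, mode

scores = [85, 90, 75, 90, 80, 95, 90]  # hypothetical test scores

avg = mean(scores)      # arithmetic average of the scores
mid = median(scores)    # middle score of the ordered distribution: 90
top = mode(scores)      # most frequently occurring score: 90
```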

Measures of Variability
Variability is an indication of how scores in a distribution are scattered or dispersed.
1. Range- the difference between the highest score and the lowest score.
2. Interquartile Range- equal to the difference between Q3 and Q1.
- There are four quarters and three quartiles in a distribution.
3. Average Deviation- the average of the absolute deviations from a central point.
4. Standard Deviation- equal to the square root of the variance.
Variance- equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
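A minimal sketch of the four measures of variability, using a hypothetical score set; note that `statistics.quantiles` implements one common quartile convention among several in use.

```python
from statistics import mean, pvariance, pstdev, quantiles

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical test scores

rng = max(scores) - min(scores)      # range: highest minus lowest = 7

q1, _, q3 = quantiles(scores, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1                        # interquartile range: Q3 - Q1

m = mean(scores)
avg_dev = mean(abs(x - m) for x in scores)  # average absolute deviation

var = pvariance(scores)              # mean squared deviation from the mean
sd = pstdev(scores)                  # standard deviation: sqrt(variance)
```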

Skewness
(the extent to which symmetry is absent)
positive skew- most of the scores fall at the low end of the distribution
negative skew- most of the scores fall at the high end of the distribution

z Scores- result from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
- Mean= 0 SD= 1
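The z-score conversion can be sketched as follows; the score set is hypothetical, and the last line shows the linear transformation to a T score (mean 50, SD 10).

```python
from statistics import mean, pstdev

def z_score(raw, scores):
    # Number of SD units the raw score lies above (+) or below (-) the mean.
    return (raw - mean(scores)) / pstdev(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical distribution: mean 5, SD 2
z = z_score(9, scores)             # (9 - 5) / 2 = 2.0
t = 50 + 10 * z                    # linear transformation to a T score: 70.0
```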
T Scores- can also be called the "fifty plus or minus ten" scale.
- Mean= 50 SD= 10, devised by W.A. McCall
- It was named the T score in honor of his professor E.L. Thorndike.
• IQ scores: mean= 100 SD= 15
• A scores (GRE and SAT): mean= 500 SD= 100
• Stanine: mean= 5 SD= approx. 2

Linear transformation: direct numerical relationship to the original score. (normal distribution)
Nonlinear transformation: used when the data presented are not normally distributed yet comparisons with normal distributions need to be made.

Correlation Coefficient (r)
• A number that provides an index of the strength of the relationship between two variables.
• Positive correlation- two variables simultaneously increase or simultaneously decrease. (direct relationship)
• Negative correlation- one variable increases while the other decreases. (inverse relationship)
• Correlation does not imply causation.
• r ranges from -1 to +1.

Pearson r
• Pearson correlation coefficient/ Pearson product-moment coefficient of correlation.
- Karl Pearson
- Statistical tool used when the relationship between the variables is linear and when the two variables being correlated are continuous.

Coefficient of determination (r²)- an indication of how much variance is shared by the X- and Y-variables in a correlation.
Ex. r= 0.60 → r²= 36%

Spearman Rho
- rank-order correlation coefficient/ rank-difference correlation coefficient
- Charles Spearman
- Used when the sample size is small and when both sets of measurements are ordinal (rank-order).

Graphic Representations of Correlation
Scatterplot- simple graphing of the coordinate points for values of the x-variable and the y-variable.
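Both correlation coefficients can be sketched with small helper functions. The paired scores are hypothetical, and the rank helper ignores ties for simplicity.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    # Product-moment correlation: covariance over the product of the SDs.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def spearman_rho(x, y):
    # Rank-order correlation: Pearson r computed on the ranks (ties ignored).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]        # hypothetical paired observations
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)        # approximately 1.0: perfect direct relationship
r_squared = r ** 2         # coefficient of determination (shared variance)
```

Squaring r gives the coefficient of determination, matching the example above: r = 0.60 yields r² = 0.36, i.e. 36% shared variance.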

Meta-Analysis
- techniques used to statistically
combine information across studies
- Effect size: estimates derived from meta-
analysis
Assumptions about Psychological Testing and Assessment
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test-related behavior
4. Tests and other measurement techniques have strengths and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society

Trait- any distinguishable, relatively enduring way in which one individual varies from another.

States- relatively less enduring. Psychological traits exist as constructs inferred from overt behavior.

Norms- used as a frame of reference for interpreting test scores.

Normative sample or norm group- group of people presumed to be representative of the universe of people.

Norming- process of deriving norms.

User Norms/ Program Norms- consist of descriptive statistics based on a group of testtakers in a given period rather than norms obtained by formal sampling methods.

Standardization- the process of administering a test to a representative sample of testtakers to establish norms.

Criterion-Referenced (Domain-Referenced): a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score with reference to a set standard.

Norm-Referenced: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score and comparing it to the scores of a group of testtakers on the same test.

SAMPLING TO DEVELOP NORMS
Population- set of individuals with at least one common, observable characteristic.

Sample- a portion representative of the whole population.

Sampling- the process of selecting the portion of the universe deemed to be representative of the whole population.

Types of sampling:
1. Stratified sampling- researchers divide subjects into subgroups called strata based on characteristics that they share.
2. Convenience/ Incidental sampling- a non-probability sampling method where units are selected for inclusion in the sample because they are the easiest for the researcher to access.
3. Purposive sampling- a non-probability sampling method in which researchers rely on their own judgment when choosing members of the population to participate in their surveys.
4. Snowball sampling- a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances.
5. Cluster sampling- researchers divide a population into smaller groups known as clusters, then randomly select among these clusters to form a sample.
Types of Norms
1. Percentiles- an expression of the percentage of people whose score on a test or measure falls below a particular raw score. Ex. a percentile of 75 on a math test means your score is better than that of 75% of those who took the exam.
- Percentage Correct: an expression of the number of items answered correctly, multiplied by 100 and divided by the total number of items.
2. Age norms- also known as age-equivalent scores; a kind of developmental norm.
3. Grade norms- indicate the average test performance at each school grade; also a kind of developmental norm.
o Developmental norms- developed based on any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
4. National norms- derived from a standardization sample that is nationally representative.
5. National anchor norms- provide some stability to test scores by anchoring them to other test scores.
- Equipercentile method- equivalency of scores on different tests is calculated with reference to corresponding percentile scores.
6. Subgroup norms- Ex. age, socio-economic status, geographic region, race
7. Local norms- provide normative information with respect to the local population's performance on a test.
Fixed Reference Group Scoring System- The distribution of scores obtained on the test from one group of test-
takers (fixed reference group) is used as the basis for the calculation of test scores for future administrations of
the test.
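The percentile and percentage-correct computations above can be sketched as follows, using hypothetical scores.

```python
def percentile_rank(raw, scores):
    # Percentage of scores in the distribution falling below `raw`.
    below = sum(1 for s in scores if s < raw)
    return 100 * below / len(scores)

def percentage_correct(num_correct, num_items):
    # Items answered correctly, multiplied by 100, divided by total items.
    return 100 * num_correct / num_items

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical scores
pr = percentile_rank(90, scores)    # 70.0: better than 70% of testtakers
pc = percentage_correct(45, 50)     # 90.0
```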

CHAPTER 5-6 SUMMARY

Reliability refers to the consistency in measurement.


Reliability Estimates:
1. Test-Retest Reliability
2. Parallel-Forms and Alternate-Forms Reliability
3. Internal Consistency
4. Inter-scorer Reliability
Reliability Coefficient- is an index of reliability, a proportion that indicates the ratio between the true score
variance on a test and the total variance.
Measurement Error- It includes all of the factors associated with the process of measuring some variable, other
than the variable being measured. X= T + E
Classical Test Theory/ True Score Theory- assumes that the observed score is always equal to true score plus
error.
Types of error:
1. Random error: a source of error in measuring a targeted variable caused by unpredictable fluctuations
and inconsistencies of other variables in the measurement process.
2. Systematic error: a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
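The X = T + E model can be illustrated with a small simulation (all parameters are hypothetical): adding random error to fixed true scores and taking the ratio of true-score variance to total observed variance recovers the reliability coefficient defined above.

```python
import random
from statistics import pvariance

random.seed(0)  # reproducible illustration

# Hypothetical simulation of X = T + E: each observed score is a fixed
# true score plus random (unsystematic) error.
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Reliability coefficient: ratio of true-score variance to total variance.
reliability = pvariance(true_scores) / pvariance(observed)
# expected near 15**2 / (15**2 + 5**2) = 0.90
```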
Sources of Error Variance:
• Test Construction- item sampling/ content sampling
• Test Administration
- test environment
- Testtaker variables
- Examiner-related variables
• Test Scoring and Interpretation- objectivity/ subjectivity
• Other sources: sampling error, methodological error
Reliability Estimates
1. Test-Retest Reliability- obtained by correlating a pair of scores from the same people on two different
administrations of the test.
- As the time interval between administrations of the same test increases, the correlation between the scores
obtained on each testing decreases.
- Passage of time can be a source of variance.
- Coefficient of stability- refers to the estimate of test-retest reliability when the interval between testings is greater than six months.
2. Parallel/ Alternate Forms Reliability- obtained by correlating scores on various forms of a test.
Coefficient of equivalence- the degree of the relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient.
- Parallel Form: two forms of the same test and the means and variances of observed test scores are
equal.
- Alternate Form: different versions of a test that have been constructed so as to be parallel.

3. Internal Consistency/ Inter-item Consistency- refers to the degree of correlation among all the items on a
scale.
- It is useful in assessing the homogeneity of a test.
- Homogeneity- refers to the extent to which items in a scale measure a single trait.
- Heterogeneity- the degree to which a test measures different factors.
Methods:
4. Split-Half Reliability- obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
• odd-even reliability
• Spearman-Brown formula
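The Spearman-Brown correction applied to a split-half correlation can be sketched as follows; the half-test correlation of .70 is a hypothetical value.

```python
def spearman_brown(half_r, n=2):
    # Estimated reliability when test length is multiplied by n;
    # n = 2 projects a half-test correlation to the full-length test.
    return n * half_r / (1 + (n - 1) * half_r)

full_test_r = spearman_brown(0.70)   # 1.4 / 1.7, about 0.82
```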
Other methods of estimating Internal Consistency:
➢ Kuder-Richardson formulas (KR-20/ KR-21)
- By G. Frederic Kuder and M.W. Richardson
- KR-20- for highly homogeneous tests with dichotomous items
- KR-21- a simplified version of KR-20, assuming all items have the same difficulty

➢ Coefficient Alpha/ Cronbach Alpha


- Developed by Cronbach (1951)
- The mean of all possible split-half correlations corrected by the Spearman-Brown formula.
- Used on tests containing nondichotomous items.
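Coefficient alpha can be sketched from its defining formula. The 4 × 3 response matrix below is hypothetical, and because these items are dichotomous the result coincides numerically with KR-20.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    # responses: one row per testtaker, one column per item.
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    k = len(responses[0])
    item_vars = sum(pvariance(col) for col in zip(*responses))
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 4 testtakers x 3 dichotomous items.
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
alpha = cronbach_alpha(data)   # 0.75
```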
➢ Average Proportional Distance (APD)
- focuses on the degree of difference that exists between item scores
5. Inter-Scorer Reliability- is the degree of agreement or consistency between two or more scorers with regard
to a particular measure.
- Often used in coding nonverbal behavior
➢ Coefficient of inter-scorer reliability- the coefficient of correlation that expresses the degree of agreement between scorers.

Considerations on the nature of the test


1. Homogeneity versus heterogeneity of test items-
2. Dynamic versus static characteristics
•Dynamic characteristic- trait, state, or ability presumed to be ever-changing as a function of
situational and cognitive experiences.
• Static characteristic- trait, state, or ability presumed to be relatively unchanging as a function of situational and cognitive experiences.
3. Restriction or inflation of range- restriction of range happens when part of the data is missing, making the relationship seem weaker, while inflation of range occurs when extreme values make the relationship seem stronger than it actually is.

4. Speed tests versus power tests


• Power test- has a time limit long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker can obtain a perfect score.
• Speed test- a type of test used to calculate the number of problems or tasks the participant can solve
or perform in a predesignated block of time.
5. Criterion-referenced tests- a method of evaluation and a way of deriving meaning from
test scores by evaluating an individual's test score with
reference to a set standard.
Domain Sampling Theory- focuses on how accurately a particular measure assesses the domain it intends to measure. It seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
• Generalizability Theory- a modification of domain sampling theory developed by Cronbach and colleagues (1972)
- holds the idea that scores vary because of the different conditions during testing.
- “Universe score” replaces “true score”
•Generalizability study- it examines how generalizable scores from a particular test are if the test is
administered in different situations.
• Coefficients of generalizability- represents the influence of particular facets on the test scores. They
are similar to the reliability coefficients in the true score model.
Item Response Theory (IRT)- also called latent-trait theory.
• It refers to the probability that a person with X ability will be able to perform at a level of Y.
Standard Error of Measurement
• It provides an estimate of the amount of error inherent in an observed score or measurement.
• The higher the reliability of a test, the lower the SEM (inverse relationship)
• Confidence Interval- a range or band of test scores that is likely to contain the true score.
Standard Error of Difference
• A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
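Both standard errors follow directly from their formulas; the SD, reliability, and observed score below are hypothetical values for illustration.

```python
import math

def sem(sd, reliability):
    # Standard error of measurement: SD * sqrt(1 - reliability coefficient).
    # Higher reliability -> lower SEM (the inverse relationship noted above).
    return sd * math.sqrt(1 - reliability)

def sed(sem_1, sem_2):
    # Standard error of the difference between two scores.
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

s = sem(15, 0.91)                            # 15 * sqrt(0.09) = 4.5
low, high = 100 - 1.96 * s, 100 + 1.96 * s   # ~95% confidence interval
```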
CHAPTER 6: VALIDITY
VALIDITY- refers to how well a test measures what it purports to measure in a particular context.
Validation- process of gathering and evaluating evidence about validity.
Validation studies- are used in research to compare the accuracy of a measure with a gold standard measure to
identify and eliminate bias.
Local validation studies- when the test user alters a tool in some way, for example, the format, instructions,
language, or content for a particular population of testtakers.
Types of Validity
Face Validity- a judgement concerning how relevant test items appear to be.
Lack of face validity may result in a lack of confidence in the perceived effectiveness of the test and a decrease in the testtakers' cooperation and motivation.
Content Validity- a judgement of how adequately a test samples behavior representative of the universe of
behavior that the test was designed to sample.
oTable of Specifications/ Test blueprint- a plan regarding the types of information to be covered by the
items, the number of items tapping each area of coverage, the organization of items in the test.
Criterion-Related Validity- judgement of how adequately a test score can be used to infer an individual's most
probable standing on some measure of a criterion.
Criterion- is the standard against which a test or a test score is evaluated.
Characteristics of a criterion:
1. Relevant 2. Valid 3. Uncontaminated

Criterion Contamination- a situation in which a response measure (the criterion) is influenced by factors
that are not related to the concept being measured.
- also applies to a criterion measure that has been based, at least in part, on predictor measures.
o Concurrent Validity- is an index of the degree to which a test score is related to some criterion measure
obtained at the same time.
Predictive Validity- is an index of the degree to which a test score predicts some criterion measure.
- base rate - false positive -false negative -hit rate - miss rate
Incremental Validity- the degree to which an additional predictor explains something about the criterion
measure that is not explained by predictors already in use.
Construct Validity- judgement about the appropriateness of inferences drawn from test scores regarding
individual standings on a variable called a construct.
• construct: informed, scientific idea developed or hypothesized to describe or explain behavior.
• “umbrella validity”
Evidence of Construct Validity
1. Evidence of Homogeneity- refers to how uniform a test is in measuring a single concept.
o Statistical tools used to test homogeneity: Pearson R,
Spearman Rho, Coefficient Alpha
2. Evidence of changes with age- some constructs are expected to change over time.
3. Evidence of pretest- posttest changes
- Evidence that test scores change as a result of some experience between a pretest and posttest.
4. Evidence from distinct groups
- Scores on the test vary in a predictable way as a function of membership in some group.
5. Convergent validity- scores on the test correlate with scores on other tests that measure the same construct.
6. Discriminant validity- tests whether constructs that are not supposed to be related are actually unrelated.
Factor Analysis
- In psychometric research, it is a data reduction method in which several sets of scores and correlations between
them are analyzed.
- Its purpose is to identify factor or factors in common between test scores on subscales within a particular test,
or the factors in common between scores on a series of tests.
o Exploratory Factor Analysis- estimating, or extracting factors; deciding how many factors to retain; and
rotating factors to an interpretable orientation.
o Confirmatory Factor Analysis- researchers test the degree to which a hypothetical model fits the actual data.
o Factor Loading- conveys information about the extent to which the factor determines the test score or scores.
Validity Coefficient- is a correlation coefficient that provides a measure of the relationship between test scores
and scores on the criterion measure.
- Typically, Pearson correlation coefficient is used to determine the validity between two measures.
How high should the coefficient of validity be?
• There are no rules for determining the minimum acceptable size of a validity coefficient.
• Validity coefficients need to be large enough to enable the test user to make accurate decisions within
the unique context in which a test is being used.
Test Bias
• It is a factor inherent in a test that systematically prevents accurate, impartial measurement.
• Implies systematic variation
• A charge of bias can be justified if some portion of the test's variance stems from a factor or factors irrelevant to performance on the criterion measure.
TYPES OF RATING ERROR
Generosity error- the tendency of the rater to be lenient in rating.
Central tendency error- is the tendency of the rater to be reluctant in giving extreme scores, so the rating falls
in the middle of the rating continuum.
Severity error- is the tendency of the rater to be too strict in rating so the rating falls in the low end of the rating
continuum.
Halo effect- is the tendency of the rater to give higher scores more than what the ratee deserves because he
fails to discriminate between independent aspects of the ratee's behavior.
Test Fairness
• The extent to which a test is used in an impartial, just, and equitable way.
• Society strives for fairness in test use by means of legislation, judicial decisions, and administrative
regulations.
