Uploaded by Beverly Sanico

PSYCHOLOGICAL ASSESSMENT CHAPTER 3-6 (summary)

Chapter 3-4: Statistics Refresher


Measurement- the act of assigning numbers or symbols to characteristics of things according to rules.

Scale- a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned.

Discrete scale- contains numeric data that have a finite number of possible values and can only be whole numbers.

Continuous scale- numerical data that can theoretically be measured in infinitely small units (with decimals).

Error- the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Scales of Measurement (N.O.I.R)
1. Nominal Scales- involve classification or categorization based on one or more distinguishing characteristics.
Property: Identity
Ex. gender, name, religion, ID number
2. Ordinal Scales- involve classification and rank ordering on some characteristic.
Properties: Identity and Magnitude
Ex. birth order, rank in class
3. Interval Scales- each unit on the scale is equal to any other unit on the scale; no true zero.
Properties: Identity, Magnitude and Equal Interval
4. Ratio Scales- all mathematical operations can be performed; there is a true (absolute) zero.
Properties: Identity, Magnitude, Equal Interval and Absolute Zero
Ex. height, weight, Kelvin

Describing Data

Distribution- a set of test scores arrayed for recording or study.

Raw Score- a straightforward, unmodified accounting of performance that is usually numerical.
Ex. number of items responded to correctly on an achievement test

Frequency Distribution- all scores are listed alongside the number of times each score occurred; scores might be listed in tabular or graphic form.
1. Simple frequency distribution
2. Grouped frequency distribution

Graph- a diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.
3 types:
1. Histogram- a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
2. Bar graph- numbers indicative of frequency appear on the y-axis, and reference to some categorization appears on the x-axis.
3. Frequency Polygon- a continuous line connecting the points where test scores or class intervals meet frequencies.
Measures of Central Tendency
- A statistic that indicates the average or midmost score between the extreme scores in a distribution.
Mean- the arithmetic average of the given scores in a distribution.
Median- the middle score in a distribution.
Mode- the most frequently occurring score in a distribution.

Normal Curve
- "Laplace-Gaussian Curve"- Karl Friedrich Gauss
- Karl Pearson was credited as the first to refer to the curve as the "normal curve"
Characteristics:
• bell-shaped
• asymptotic (both tails approach but never meet the x-axis)
• symmetrical
• mean = median = mode
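As a quick sketch, the three measures of central tendency can be computed with Python's standard library (the scores below are hypothetical, chosen only for illustration):

```python
from statistics import mean, median, mode

# Hypothetical test scores, for illustration only.
scores = [85, 90, 75, 90, 80, 95, 90, 70]

print(mean(scores))    # arithmetic mean: 84.375
print(median(scores))  # middle score of the ordered distribution: 87.5
print(mode(scores))    # most frequently occurring score: 90
```

Note that the mean, median, and mode differ here, which is exactly what happens when a distribution is not perfectly normal.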

Measures of Variability
Variability is an indication of how scores in a distribution are scattered or dispersed.

1. Range- the difference between the highest score and the lowest score.
2. Interquartile Range- equal to the difference between Q3 and Q1.
- There are four quarters and three quartiles in a distribution.
3. Average Deviation- the average of the absolute deviations from a central point.
4. Standard Deviation- equal to the square root of the variance.

Variance- equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
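A minimal sketch of these variability measures, using a hypothetical set of scores (note that quartile conventions differ between textbooks; the "inclusive" method is one common choice):

```python
from statistics import pvariance, pstdev, quantiles

scores = [70, 75, 80, 85, 90, 90, 90, 95]  # hypothetical scores, sorted

# Range: highest score minus lowest score.
rng = max(scores) - min(scores)

# Interquartile range: Q3 - Q1 (quartile conventions vary between texts;
# the "inclusive" method is used here).
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Average deviation: mean of the absolute deviations from the mean.
m = sum(scores) / len(scores)
avg_dev = sum(abs(x - m) for x in scores) / len(scores)

# Variance: mean of the squared deviations; SD: its square root.
var = pvariance(scores)
sd = pstdev(scores)

print(rng, iqr, avg_dev, var, round(sd, 2))  # 25 11.25 7.03125 65.234375 8.08
```

The population formulas (`pvariance`, `pstdev`) match the definition above: variance as the mean of squared deviations, SD as its square root.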

z Scores- result from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
- Mean= 0, SD= 1

Skewness (the extent to which symmetry is absent)
- positive skew: most of the scores fall in the low end of the distribution
- negative skew: most of the scores fall in the high end of the distribution

T Scores- can also be called the fifty-plus-or-minus-ten scale.
- Mean= 50, SD= 10; devised by W.A. McCall
- It was named the T score in honor of his professor E.L. Thorndike

• IQ scores: mean= 100, SD= 15
• A scores (GRE and SAT): mean= 500, SD= 100
• Stanine: mean= 5, SD= approx. 2

Linear transformation: has a direct numerical relationship to the original score (normal distribution).
Nonlinear transformation: used when the data presented are not normally distributed yet comparisons with normal distributions need to be made.

Pearson r
• Pearson correlation coefficient/ Pearson product-moment coefficient of correlation.
- Karl Pearson
- Statistical tool used when the relationship between the variables is linear and when the two variables being correlated are continuous.

Coefficient of determination- an indication of how much variance is shared by the X- and Y-variables in a correlation.
Ex. r= 0.60 → r² = 0.36 (36% shared variance)

Spearman Rho
- rank-order correlation coefficient/ rank-difference correlation coefficient
- Charles Spearman
- used when the sample size is small and when both sets of measurements are ordinal (rank-order)
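Each of these standard-score scales is a linear transformation of the z score. A minimal sketch, with all score values hypothetical:

```python
# Converting a raw score to a z score and then to other standard-score
# scales. Values are hypothetical: raw score 65, test mean 50, SD 10.
raw, mean_, sd = 65, 50.0, 10.0

z = (raw - mean_) / sd        # z score: mean 0, SD 1
t = 50 + 10 * z               # T score: mean 50, SD 10
iq_style = 100 + 15 * z       # deviation-IQ-style scale: mean 100, SD 15
stanine = 5 + 2 * z           # stanine-style scale: mean 5, SD approx. 2

print(z, t, iq_style, stanine)  # 1.5 65.0 122.5 8.0
```

Because each conversion has the form a + b·z, these are all linear transformations; a nonlinear transformation (e.g., normalizing a skewed distribution) cannot be written this way.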

Correlation Coefficient (r)
• It is a number that provides an index of the strength of the relationship between two variables.
• Positive correlation- two variables simultaneously increase or simultaneously decrease (direct relationship).
• Negative correlation- one variable increases while the other decreases (inverse relationship).
• It does not imply causation.
• (r) ranges from -1 to +1

Graphic Representations of Correlation
Scatterplot- a simple graphing of the coordinate points for values of the X-variable and the Y-variable.
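The correlation statistics above can be computed by hand. The sketch below implements Pearson r (and its square, the coefficient of determination) plus a simple no-ties version of Spearman rho; the paired scores are hypothetical:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    # Pearson product-moment correlation.
    mx, my = mean(x), mean(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (
        n * pstdev(x) * pstdev(y))

def spearman_rho(x, y):
    # Spearman rho: Pearson r applied to the ranks of the scores
    # (simple version that assumes no tied scores).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]          # hypothetical paired scores
y = [10, 8, 25, 20, 30]

r = pearson_r(x, y)
print(round(r, 2))                    # strength of the relationship: 0.87
print(round(r * r, 2))                # coefficient of determination: 0.75
print(round(spearman_rho(x, y), 2))   # rho computed on the ranks: 0.8
```

This also illustrates the coefficient-of-determination example in the text: an r of 0.60 squares to 0.36, i.e., 36% shared variance.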

Meta-Analysis
- techniques used to statistically
combine information across studies
- Effect size: estimates derived from meta-
analysis
Assumptions about Psychological Testing and Assessment
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test-related behavior
4. Tests and other measurement techniques have strengths and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society

Trait- any distinguishable, relatively enduring way in which one individual varies from another.
States- relatively less enduring.
Psychological traits exist as constructs inferred from overt behavior.

Norms- used as a frame of reference for interpreting test scores.
Normative sample or norm group- a group of people presumed to be representative of the universe of people.
Norming- the process of deriving norms.

User Norms/ Program Norms- consist of descriptive statistics based on a group of test-takers in a given period rather than norms obtained by formal sampling methods.

Standardization- the process of administering a test to a representative sample of test-takers to establish norms.

Criterion-Referenced (Domain-Referenced)- a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score with reference to a set standard.

Norm-Referenced- a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's test score and comparing it to the scores of a group of testtakers on the same test.

SAMPLING TO DEVELOP NORMS
Population- a set of individuals with at least one common, observable characteristic.
Sample- a portion representative of the whole population.
Sampling- the process of selecting the portion of the universe deemed to be representative of the whole population.

Types of sampling:
1. Stratified sampling- researchers divide subjects into subgroups called strata based on characteristics that they share.
2. Convenience/ Incidental sampling- a non-probability sampling method where units are selected for inclusion in the sample because they are the easiest for the researcher to access.
3. Purposive sampling- a non-probability sampling method in which researchers rely on their own judgment when choosing members of the population to participate in their surveys.
4. Snowball sampling- a non-probability sampling technique where existing study subjects recruit future subjects from among their acquaintances.
5. Cluster sampling- researchers divide a population into smaller groups known as clusters and then randomly select among these clusters to form a sample.
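As one illustration of these sampling designs, a proportionate stratified sample can be sketched as follows (the population data are hypothetical, invented for the example):

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population of 100 students, stratified by year level.
population = [{"id": i, "year": y} for i, y in enumerate(
    ["1st"] * 50 + ["2nd"] * 30 + ["3rd"] * 20)]

def stratified_sample(pop, key, frac):
    # Divide subjects into strata on the characteristic they share,
    # then draw a random sample from each stratum in proportion to its size.
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for group in strata.values():
        sample.extend(random.sample(group, round(len(group) * frac)))
    return sample

sample = stratified_sample(population, "year", 0.10)
print(len(sample))  # 5 + 3 + 2 = 10, i.e., 10% of every stratum
```

By sampling every stratum at the same rate, the sample preserves the population's proportions, which is the point of stratification.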
Types of Norms
1. Percentiles- an expression of the percentage of people whose score on a test or measure falls below a particular raw score. Ex. A percentile rank of 75 on your math test means your score is better than the scores of 75% of those who took the exam.
- Percentage correct: an expression of the number of items answered correctly, multiplied by 100 and divided by the total number of items.
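A percentile rank of this kind can be computed as the percentage of scores in the distribution that fall below a given raw score (the distribution below is hypothetical):

```python
# Hypothetical distribution of raw scores on a test.
scores = [55, 60, 65, 70, 72, 75, 78, 80, 85, 90]

def percentile_rank(dist, raw):
    # Percentage of scores in the distribution falling below the raw score.
    below = sum(1 for s in dist if s < raw)
    return 100 * below / len(dist)

print(percentile_rank(scores, 80))  # 70.0: better than 70% of the group
```

Contrast this with percentage correct, which depends only on the test's items, not on how the rest of the group performed.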
2. Age norms- also known as age-equivalent scores
3. Grade norms
o Developmental norms (which include age and grade norms)- developed based on any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
4. National norms
5. National anchor norms- provide some stability to test scores by anchoring them to other test scores.
- Equipercentile method- the equivalency of scores on different tests is calculated with reference to corresponding percentile scores.
6. Subgroup norms- Ex. age, socioeconomic status, geographic region, race
7. Local norms
Fixed Reference Group Scoring System- the distribution of scores obtained on the test from one group of test-takers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test.

CHAPTER 5-6 SUMMARY

Reliability refers to the consistency in measurement.


Reliability Estimates:
1. Test-Retest Reliability
2. Parallel-Forms and Alternate-Forms Reliability
3. Internal Consistency
4. Inter-scorer Reliability
Reliability Coefficient- is an index of reliability, a proportion that indicates the ratio between the true score
variance on a test and the total variance.
Measurement Error- It includes all of the factors associated with the process of measuring some variable,
other than the variable being measured. X= T + E
Classical Test Theory/ True Score Theory- assumes that the observed score is always equal to true score plus
error.
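The X = T + E model can be illustrated with a small simulation. Everything below is hypothetical (simulated scores), but it sketches why the reliability coefficient equals the ratio of true-score variance to total variance:

```python
import random
from statistics import pvariance

random.seed(1)  # fixed seed for a reproducible illustration

# Simulate 1,000 hypothetical examinees: observed X = true T + error E.
true_scores = [random.gauss(100, 15) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability coefficient: true-score variance over total observed variance.
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.9
```

Because the simulated error is random and independent of the true scores, the estimate hovers near the theoretical value of 0.90; in practice the true scores are unobservable and reliability must be estimated indirectly, which is what the methods below do.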

Types of error:
1. Random error: a source of error in measuring a targeted variable caused by unpredictable fluctuations
and inconsistencies of other variables in the measurement process.
2. Systematic error: a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Sources of Error Variance:
• Test Construction- item sampling/ content sampling
• Test Administration
- test environment
- Testtaker variables
- Examiner-related variables
• Test Scoring and Interpretation- objectivity/ subjectivity
• Other sources: sampling error, methodological error
Reliability Estimates
1. Test-Retest Reliability- obtained by correlating a pair of scores from the same people on two different
administrations of the test.
- As the time interval between administrations of the same test increases, the correlation between the scores
obtained on each testing decreases.
- Passage of time can be a source of variance.
- Coefficient of stability- refers to the estimate of test-retest reliability when the interval between testing is
greater than six
months.
2. Parallel/ Alternate Forms Reliability- obtained by correlating scores on various forms of a test.
Coefficient of equivalence- the degree of the relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms method.
- Parallel Form: two forms of the same test and the means and variances of observed test scores are
equal.
- Alternate Form: different versions of a test that have been constructed so as to be parallel.

3. Internal Consistency/ Inter-item Consistency- refers to the degree of correlation among all the items on a
scale.
- It is useful in assessing the homogeneity of a test.
- Homogeneity- refers to the extent to which items in a scale measure a single trait.
- Heterogeneity- the degree to which a test measures different factors.
Methods:
4. Split-Half Reliability- obtained by correlating two pairs of scores from equivalent halves of a single test administered once.
• odd-even reliability
• Spearman-Brown formula
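A minimal sketch of the odd-even split and the Spearman-Brown correction, using hypothetical dichotomous item responses (the tiny data set makes the estimate unstable; it is only meant to show the mechanics):

```python
from statistics import mean, pstdev

# Hypothetical item responses (rows = testtakers, 1 = correct) on a
# 6-item test administered once.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
]

# Odd-even split: score the odd-numbered and even-numbered items separately.
odd = [sum(r[0::2]) for r in responses]
even = [sum(r[1::2]) for r in responses]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (
        len(x) * pstdev(x) * pstdev(y))

r_half = pearson_r(odd, even)
# Spearman-Brown: estimate full-length reliability from the half-test r.
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))  # 0.35 0.52
```

The correction is needed because each half is only half as long as the real test, and shorter tests tend to be less reliable.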
Other methods of estimating Internal Consistency:
 Kuder-Richardson formulas (KR-20/ KR-21)
- By G. Frederic Kuder and M.W. Richardson
- KR-20- for highly homogeneous tests with dichotomous items
- KR-21- a simplified version of KR-20 that assumes all items have the same difficulty

 Coefficient Alpha/ Cronbach Alpha


- Developed by Cronbach (1951)
- The mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
- Used on tests containing nondichotomous items.
 Average Proportional Distance (APD)
- focuses on the degree of difference that exists between item scores
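KR-20 and coefficient alpha share the same general form; for dichotomous items the two coincide, because a 0/1 item's variance is exactly p·q. A sketch with hypothetical response data:

```python
from statistics import pvariance

# Hypothetical dichotomous item responses (rows = testtakers, 1 = correct).
responses = [
    [1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
k = len(responses[0])                      # number of items
totals = [sum(row) for row in responses]   # total score per testtaker
var_total = pvariance(totals)

# KR-20: for dichotomous items, each item's variance is p * q.
p = [sum(row[i] for row in responses) / len(responses) for i in range(k)]
kr20 = (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Coefficient alpha: same form but with general item variances, so it
# also works for nondichotomous (e.g., Likert-type) items.
item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var_total)

print(round(kr20, 3), round(alpha, 3))  # identical for dichotomous items
```

With real data, low values like the one this tiny example produces would signal either heterogeneous items or simply too few items and testtakers for a stable estimate.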
5. Inter-Scorer Reliability- is the degree of agreement or consistency between two or more scorers with regard
to a particular measure.
- Often used in coding nonverbal behavior
 Coefficient of inter-scorer reliability- the degree of agreement between scorers, expressed as a correlation coefficient.

Considerations on the nature of the test


1. Homogeneity versus heterogeneity of test items-
2. Dynamic versus static characteristics
•Dynamic characteristic- trait, state, or ability presumed to be ever-changing as a function of
situational and cognitive experiences.
• Static characteristic- trait, state, or ability presumed to be relatively unchanging as a function of
situational and cognitive experiences.
3. Restriction or inflation of range- restriction of range happens when you miss out on some of the data, making the relationship seem weaker, while inflation of range occurs when extreme values make the relationship seem stronger than it actually is.

4. Speed tests versus power tests


• Power test- there is a long time limit enough to allow testtakers to attempt all items and some items
are so difficult that no testtaker can obtain a perfect score.
• Speed test- a type of test used to calculate the number of problems or tasks the participant can solve
or perform in a predesignated block of time.
5. Criterion-referenced tests- a method of evaluation and a way of deriving meaning from
test scores by evaluating an individual's test score with
reference to a set standard.
Domain Sampling Theory- focuses on how accurately a particular measure assesses the domain it intends to measure. It seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
• Generalizability Theory- one of its modifications, developed by Cronbach and colleagues (1972)
- holds the idea that scores vary because of the different conditions during testing.
- “Universe score” replaces “true score”
•Generalizability study- it examines how generalizable scores from a particular test are if the test is
administered in different situations.
• Coefficients of generalizability- represents the influence of particular facets on the test scores. They
are similar to the reliability coefficients in the true score model.
Item Response Theory- also called IRT models/ latent-trait theory
•It refers to the probability that a person with X ability will be able to perform at a level of Y.
Standard Error of Measurement
• It provides an estimate of the amount of error inherent in an observed score or measurement.
• The higher the reliability of a test, the lower the SEM (inverse relationship)
• Confidence Interval- a range or band of test scores that is likely to
contain the true score.
Standard Error of Difference
• A statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant.
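Both quantities can be computed directly from a test's SD and reliability. A sketch with hypothetical values (SD of 15, reliabilities of .90 and .85):

```python
import math

# Standard error of measurement (SEM) for a hypothetical test with
# SD = 15 and reliability r_xx = .90.
sd, r_xx = 15.0, 0.90
sem = sd * math.sqrt(1 - r_xx)

# 95% confidence interval around a hypothetical observed score of 100.
observed = 100
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem

# Standard error of the difference between scores from two tests with
# reliabilities r1 and r2 expressed on the same scale.
r1, r2 = 0.90, 0.85
sed = sd * math.sqrt(2 - r1 - r2)

print(round(sem, 2), round(lo, 1), round(hi, 1), round(sed, 2))
# 4.74 90.7 109.3 7.5
```

The inverse relationship noted above is visible in the formula: as r_xx approaches 1, the SEM shrinks toward 0, and the confidence band around the observed score narrows.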

CHAPTER 6: VALIDITY
VALIDITY- refers to how well a test measures what it purports to measure in a particular context.
Validation- process of gathering and evaluating evidence about validity.
Validation studies- are used in research to compare the accuracy of a measure with a gold standard measure
to identify and eliminate bias.
Local validation studies- when the test user alters a tool in some way, for example, the format, instructions,
language, or content for a particular population of testtakers.
Types of Validity
Face Validity- a judgement concerning how relevant test items appear to be.
Lack of face validity: may lead to a lack of confidence in the perceived effectiveness of the test and a decrease in the testtakers' cooperation and motivation.
Content Validity- a judgement of how adequately a test samples behavior representative of the universe of
behavior that the test was designed to sample.
o Table of Specifications/ Test blueprint- a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, and the organization of the items in the test.
Criterion-Related Validity- judgement of how adequately a test score can be used to infer an individual's most
probable standing on some measure of a criterion.
Criterion- is the standard against which a test or a test score is evaluated.
Characteristics of a criterion:
1. Relevant 2. Valid 3. Uncontaminated

Criterion Contamination- a situation in which a response measure (the criterion) is influenced by factors that are not related to the concept being measured.
- Also applies to a criterion measure that has been based, at least in part, on predictor measures.
o Concurrent Validity- is an index of the degree to which a test score is related to some criterion measure
obtained at the same time.
o Predictive Validity- is an index of the degree to which a test score predicts some criterion measure.
- base rate - false positive - false negative - hit rate - miss rate
Incremental Validity- the degree to which an additional predictor explains something about the criterion
measure that is not explained by predictors already in use.
Construct Validity- judgement about the appropriateness of inferences drawn from test scores regarding
individual standings on a variable called a construct.
• construct: informed, scientific idea developed or hypothesized to describe or explain behavior.
• “umbrella validity”

Evidence of Construct Validity


1. Evidence of Homogeneity- refers to how uniform a test is in measuring a single concept.
o Statistical tools used to test homogeneity: Pearson R,
Spearman Rho, Coefficient Alpha
2. Evidence of changes with age- some constructs are expected to change over time.
3. Evidence of pretest- posttest changes
- Evidence that test scores change as a result of some experience between a pretest and posttest.
4. Evidence from distinct groups
- Scores on the test vary in a predictable way as a function of membership in some group.
5. Convergent validity- scores on the test correlate with scores on other tests that measure the same construct.
6. Discriminant validity- tests whether constructs that are not supposed to be related are actually unrelated.
Factor Analysis
- In psychometric research, it is a data reduction method in which several sets of scores and correlations
between them are analyzed.
- Its purpose is to identify factor or factors in common between test scores on subscales within a particular
test, or the factors in common between scores on a series of tests.
o Exploratory Factor Analysis- estimating, or extracting factors; deciding how many factors to retain;
and rotating factors to an interpretable orientation.
o Confirmatory Factor Analysis- researchers test the degree to which a hypothetical model fits the actual data.
o Factor Loading- conveys information about the extent to which the factor determines the test score or scores.
Validity Coefficient- is a correlation coefficient that provides a measure of the relationship between test scores
and scores on the criterion measure.
- Typically, Pearson correlation coefficient is used to determine the validity between two measures.
How high should the coefficient of validity be?
• There are no rules for determining the minimum acceptable size of a validity coefficient.
• Validity coefficients need to be large enough to enable the test user to make accurate decisions
within the unique context in which a test is being used.
Test Bias
• A factor inherent in a test that systematically prevents accurate, impartial measurement.
• Implies systematic variation.
• A charge of bias can be justified if some portion of a test's variance stems from some factor(s) that are irrelevant to performance on the criterion measure.
TYPES OF RATING ERROR
Generosity error- the tendency of the rater to be lenient in rating.
Central tendency error- is the tendency of the rater to be reluctant in giving extreme scores, so the rating falls
in the middle of the rating continuum.
Severity error- is the tendency of the rater to be too strict in rating so the rating falls in the low end of the
rating continuum.
Halo effect- is the tendency of the rater to give higher scores more than what the ratee deserves because he
fails to discriminate between independent aspects of the ratee's behavior.
Test Fairness
• The extent to which a test is used in an impartial, just, and equitable way.
• Society strives for fairness in test use by means of legislation, judicial decisions, and administrative
regulations.
