Psychological Assessment (Finals)


All fields of human endeavour use measurement in some form, and each field has its own set of measuring tools and measuring units. For example, if you're recently engaged or thinking about becoming engaged, you may have learned about a unit of measure called the carat. If you've been shopping for a computer, you may have learned something about a unit of measurement called a byte.

As a student of psychological measurement, you need a working familiarity with some of the commonly used units of measure in psychology as well as knowledge of some of the many measuring tools employed. In the pages that follow, you will gain that knowledge as well as an acquaintance with the history of measurement in psychology and an understanding of its theoretical basis.

Testing and Assessment
The roots of contemporary psychological testing and assessment can be found in early twentieth-century France. In 1905, Alfred Binet and a colleague published a test designed to help place Paris schoolchildren in appropriate classes. During World War II, the military would depend even more on psychological tests to screen recruits for service.

Following the war, more and more tests purporting to measure an ever-widening array of psychological variables were developed and used. There were tests to measure not only intelligence but also personality, brain functioning, performance at work, and many other aspects of psychological and social functioning.

We define psychological assessment as the gathering and integration of psychology-related data for the purpose of making a psychological evaluation that is accomplished through the use of tools such as tests, interviews, case studies, behavioural observation, and specially designed apparatuses and measurement procedures. We define psychological testing as the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

Varieties of assessment
The term assessment may be modified in a seemingly endless number of ways, each such modification referring to a particular variety or area of assessment. Also intuitively obvious, the term educational assessment refers to, broadly speaking, the use of tests and other tools to evaluate abilities and skills relevant to success or failure in a school or pre-school context.

For the record, the term retrospective assessment may be defined as the use of evaluative tools to draw conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment. Psychological assessment by means of smartphones also serves as an example of an approach to assessment called ecological momentary assessment (EMA). EMA refers to the "in the moment" evaluation of specific problems and related cognitive and behavioral variables at the very time and place that they occur.

The process of assessment
In general, the process of assessment begins with a
referral for assessment from a source such as a teacher,
school psychologist, counselor, or judge. Other assessors view the process of assessment as more of a collaboration between the assessor and the assessed. In that approach, therapeutic self-discovery and new understandings are encouraged throughout the assessment process.

Another approach to assessment that seems to have picked up momentum in recent years, most notably in educational settings, is referred to as dynamic assessment (Poehner & van Compernolle, 2011). The term dynamic may suggest that a psychodynamic or psychoanalytic approach to assessment is being applied. However, that is not the case. As used in the present context, dynamic is used to describe the interactive, changing, or varying nature of the assessment. In general, dynamic assessment refers to an interactive approach to psychological assessment that usually follows a model of (1) evaluation, (2) intervention of some sort, and (3) evaluation.

Psychological Testing and Assessment Defined
The world's receptivity to Binet's test in the early twentieth century spawned not only more tests but more test developers, more test publishers, more test users, and the emergence of what, logically enough, has become known as a testing enterprise. "Testing" was the term used to refer to everything from the administration of a test (as in "Testing in progress") to the interpretation of a test score ("The testing indicated that . . .").

The OSS model—using an innovative variety of evaluative tools along with data from the evaluations of highly trained assessors—would later inspire what is now referred to as the assessment center approach to personnel evaluation (Bray, 1982). Society at large is best served by a clear definition of and differentiation between these two terms as well as related terms such as psychological test user and psychological assessor.

In its broadest sense, then, we can define an interview as a method of gathering information through direct communication involving reciprocal exchange. In some
instances, what is called a panel interview (also
referred to as a board interview) is employed. Here, more than one interviewer participates in the assessment. Motivational interviewing may be defined as a therapeutic dialogue that combines person-centered listening skills, such as openness and empathy, with the use of cognition-altering techniques designed to positively affect motivation and effect therapeutic change.

The Portfolio
Students and professionals in many different fields of endeavor, ranging from art to architecture, keep files of their work products. These work products—whether retained on paper, canvas, film, video, audio, or some other medium—constitute what is called a portfolio. As samples of one's ability and accomplishment, a portfolio may be used as a tool of evaluation.

The Tools of Psychological Assessment
The Test
A test may be defined simply as a measuring device or procedure. When the word test is prefaced with a modifier, it refers to a device or procedure designed to measure a variable related to that modifier. In a like manner, the term psychological test refers to a device or procedure designed to measure variables related to psychology (such as intelligence, personality, aptitude, interests, attitudes, or values).

The term format pertains to the form, plan, structure, arrangement, and layout of test items as well as to related considerations such as time limits. Format is also used to refer to the form in which a test is administered: computerized, pencil-and-paper, or some other form.

In testing and assessment, we may formally define score as a code or summary statement, usually but not necessarily numerical in nature, that reflects an evaluation of performance on a test, task, interview, or some other sample of behavior. Scoring is the process of assigning such evaluative codes or statements to performance on tests, tasks, interviews, or other behaviour samples. In the world of psychological assessment, many different types of scores exist.

Scores themselves can be described and categorized in many different ways. For example, one type of score is the cut score. A cut score (also referred to as a cutoff score or simply a cutoff) is a reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications.

Case History Data
Case history data refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee. Case history data may include files or excerpts from files maintained at institutions and agencies such as schools, hospitals, employers, religious institutions, and criminal justice agencies.

Behavioral Observation
If you want to know how someone behaves in a particular situation, observe his or her behaviour in that situation. Such "down-home" wisdom underlies at least one approach to evaluation. Behavioral observation, as it is employed by assessment professionals, may be defined as monitoring the actions of others or oneself by
visual or electronic means while recording quantitative
and/or qualitative information regarding those actions. This variety of behavioral observation is referred to as naturalistic observation.

Role Play
Role play may be defined as acting an improvised or partially improvised part in a simulated situation. A role-play test is a tool of assessment wherein assessees are directed to act as if they were in a particular situation. Assessees may then be evaluated with regard to their expressed thoughts, behaviors, abilities, and other variables. (Note that role-play is hyphenated when used as an adjective or a verb but not as a noun.) Role play is useful in evaluating various skills.

The Interview
In everyday conversation, the word interview conjures images of face-to-face talk. But the interview as a tool of psychological assessment typically involves more than talk. If the interview is conducted face-to-face, then the interviewer is probably taking note of not only the content of what is said but also the way it is being said. More specifically, the interviewer is taking note of both verbal and nonverbal behavior. Nonverbal behavior may include the interviewee's "body language," movements, and facial expressions in response to the interviewer, the extent of eye contact, apparent willingness to cooperate, and general reaction to the demands of the interview.

The test user
Psychological tests and assessment methodologies are
used by a wide range of professionals, including
clinicians, counselors, school psychologists, human resources personnel, consumer psychologists, experimental psychologists, and social psychologists.

The test taker
We have all been test takers. However, we have not all approached tests in the same way.

Society at large
The societal need for "organizing" and "systematizing" has historically manifested itself in such varied questions as "Who is a witch?," "Who is schizophrenic?," and "Who is qualified?" The specific questions asked have shifted with societal concerns.

Other parties
Beyond the four primary parties we have focused on here, let's briefly make note of others who may participate in varied ways in the testing and assessment enterprise. Organizations, companies, and governmental agencies sponsor the development of tests for various reasons, such as to certify personnel.

Computers as Tools
We have already made reference to the role computers play in contemporary assessment in the context of generating simulations. They may also help in the measurement of variables that in the past were quite difficult to quantify. As test administrators, computers do much more than replace the "equipment" that was so widely used in the past (a number 2 pencil). Computers can serve as test administrators (online or off) and as highly efficient test scorers. Within seconds they can derive not only test scores but patterns of test scores. Scoring may be done on-site (local processing) or conducted at some central location (central processing).

The acronym CAPA refers to the term computer-assisted psychological assessment. By the way, here the word assisted typically refers to the assistance computers provide to the test user, not the testtaker. Another acronym you may come across is CAT, this for computer adaptive testing. The adaptive in this term is a reference to the computer's ability to tailor the test to the testtaker's ability or test-taking pattern.
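The adaptive logic behind CAT can be illustrated with a toy sketch. Real CAT systems select items using item response theory; this simplified version (all difficulty levels and responses are made-up for illustration) just steps the difficulty up after a correct answer and down after an incorrect one:

```python
# Toy sketch of computer adaptive testing (CAT): the difficulty of the
# next item is chosen from the testtaker's running performance.
# Real systems use item response theory; this is only the core idea.

def adaptive_difficulties(responses, start=5, low=1, high=10):
    """Return the difficulty level presented at each step.

    responses: list of booleans, True = correct answer.
    """
    level, presented = start, []
    for correct in responses:
        presented.append(level)
        # Move up after a correct answer, down after an incorrect one,
        # staying inside the item bank's difficulty range.
        level = min(high, level + 1) if correct else max(low, level - 1)
    return presented

# A testtaker who answers the first three items correctly, then misses two,
# is presented with items of difficulty 5, 6, 7, 8, and then 7:
print(adaptive_difficulties([True, True, True, False, False]))
```

The point of the sketch is that two testtakers answering the same number of items can see entirely different item sequences, which is what "tailoring the test to the testtaker" means in practice.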

Other Tools
The next time you have occasion to stream a video, fire up that Blu-ray player, or even break out an old DVD, take a moment to consider the role that video can play in assessment. In fact, specially created videos are widely used in training and evaluation contexts. Many items that you may not readily associate with psychological assessment may be pressed into service for just that purpose. In general, there has been no shortage of innovation on the part of psychologists in devising measurement tools, or adapting existing tools, for use in psychological assessment.

In What Types of Settings Are Assessments Conducted, and Why?

Educational settings
You are probably no stranger to the many types of tests administered in the classroom. As mandated by law, tests are administered early in school life to help identify children who may have special needs. In addition to school ability tests, another type of test commonly given in schools is an achievement test, which evaluates accomplishment or the degree of learning that has taken place. You know from your own experience that a diagnosis may be defined as a description or conclusion reached on the basis of evidence and opinion.
Who Are the Parties?
Parties in the assessment enterprise include developers and publishers of tests, users of tests, and people who are evaluated by means of tests. Additionally, we may consider society at large as a party to the assessment enterprise.

The test developer
Test developers and publishers create tests or other methods of assessment. The American Psychological Association (APA) has estimated that more than 20,000 new psychological tests are developed each year.

Clinical settings
Tests and many other tools of assessment are widely used in clinical settings such as public, private, and military hospitals, inpatient and outpatient clinics, private-practice consulting rooms, schools, and other institutions.

Counseling settings
Assessment in a counseling context may occur in environments as diverse as schools, prisons, and governmental or privately owned institutions.
How Are Assessments Conducted?
If a need exists to measure a particular variable, a way to measure that variable will be devised. As Figure 1–5 just begins to illustrate, the ways in which measurements can be taken are limited only by imagination. Keep in mind that this figure illustrates only a small sample of the many methods used in psychological testing and assessment. The photos are not designed to illustrate the most typical kinds of assessment procedures. With reference to testing and assessment, protocol typically refers to the form or sheet or booklet on which a testtaker's responses are entered. In this context, rapport may be defined as a working relationship between the examiner and the examinee.

Assessment of people with disabilities
People with disabilities are assessed for exactly the same reasons people with no disabilities are assessed: to obtain employment, to earn a professional credential, to be screened for psychopathology, and so forth.

General Types of Psychological Tests (according to variable measured)

Ability Tests
- Assess what a person can do
- Include intelligence tests, achievement tests, and aptitude tests
- Best conditions are provided to elicit a person's full capacity or maximum performance
- There are right and wrong answers
- Objective of motivation: for the examinee to do his best

Tests of Typical Performance
- Assess what a person usually does
- Include personality tests and interest/attitude/values inventories
- Typical performance can still manifest itself even in conditions not deemed as best
- There are no right or wrong answers
- Objective of motivation: for the examinee to answer questions honestly

Specific Types of Psychological Tests

Intelligence Test
- Measures general potential
- Assumption: fewer assumptions about specific prior learning experiences
- Validation process: content validity and construct validity
- Examples: WAIS, WISC, CFIT, RPM

Aptitude Test
- Measures an individual's potential for learning a specific task, ability, or skill
- Assumption: no assumptions about specific prior learning experiences
- Validation process: content validity and predictive validity
- Examples: DAT, SAT

Achievement Test
- Provides a measure of the amount, rate, and level of learning, success, or accomplishment, and of strengths/weaknesses in a particular subject or task
- Assumption: assumes prior relatively standardized educational learning experiences
- Validation process: content validity
- Example: National Achievement Test

Personality Test
- Measures traits, qualities, attitudes, or behaviors that determine a person's individuality
- Can measure overt or covert dispositions and levels of adjustment as well
- Can be measured idiographically (unique characteristics) or nomothetically (common characteristics)
- Has three construction strategies, namely: theory-guided inventories, factor-analytically derived inventories, and criterion-keyed inventories
- Examples: NEO PI, 16PF, MBTI, MMPI

Interest Inventory
- Measures an individual's preference for certain activities or topics and thereby helps determine occupational choice or career decisions
- Measures the direction and strength of interest
- Assumption: interests, though unstable, have a certain stability, or else they cannot be measured
- Stability is said to start at 17 years old
- Broad lines of interest are more stable, while specific lines of interest are more unstable and can change a lot
- Example: CII

Attitude Inventory
- Direct observation of how a person behaves in relation to certain things
- Attitude questionnaires or scales (Bogardus Social Distance Scale, 1925)
- Reliabilities are good but not as high as those of tests of ability
- Attitude measures have not generally correlated very highly with actual behavior; specific behaviors, however, can be predicted from measures of attitude toward the specific behavior

Values Inventory
- Purports to measure generalized and dominant interests
- Validity is extremely difficult to determine by statistical methods
- The only observable criterion is overt behavior
- Employed less frequently than interest inventories in vocational counseling and career decision-making

Diagnostic Test
- Can uncover and focus attention on weaknesses of individuals for remedial purposes

Norm-Referenced Test
- Raw scores are converted to standard scores

Criterion-Referenced Test
- Raw scores are referenced to specific cut-off scores
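The difference between the two referencing models above can be sketched in a few lines of Python. The raw scores and the cut score below are made-up values, not from any real instrument:

```python
# Norm-referenced vs. criterion-referenced interpretation of raw scores.
# All numbers here are hypothetical, for illustration only.
from statistics import mean, stdev

raw_scores = [42, 55, 61, 48, 70, 53]

# Norm-referenced: each raw score becomes a standard (z) score that
# locates the examinee relative to the norm group's mean and spread.
m, s = mean(raw_scores), stdev(raw_scores)
z_scores = [round((x - m) / s, 2) for x in raw_scores]

# Criterion-referenced: each raw score is compared against a fixed
# cut score; only the criterion matters, not the rest of the group.
CUT_SCORE = 50
classifications = ["pass" if x >= CUT_SCORE else "fail" for x in raw_scores]

print(z_scores)
print(classifications)
```

The same raw score can therefore read very differently under the two models: a score just under the cut fails the criterion even if it sits near the middle of the norm group.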

Power Test
- Requires an examinee to exhibit the extent or depth of his understanding or skill
- Contains items with varying levels of difficulty

Speed Test
- Requires the examinee to complete as many items as possible
- Contains items of uniform and generally simple level of difficulty

Creativity Test
- Assesses an individual's ability to produce new/original ideas, insights, or artistic creations that are accepted as being of social, aesthetic, or scientific value
- Can assess the person's capacity to find unusual or unexpected solutions for vaguely defined problems

Neuropsychological Test
- Measures cognitive, sensory, perceptual, and motor performance to determine the extent, locus, and behavioral consequences of brain damage; given to persons with known or suspected brain dysfunction
- Example: Bender-Gestalt II

Objective Test
- Standardized test
- Administered individually or in groups
- Objectively scored
- There is a limited number of responses
- Uses norms
- There is a high level of reliability and validity
- Examples: personality inventories, group intelligence tests

Projective Test
- Test with ambiguous stimuli which measures wishes, intrapsychic conflicts, dreams, and unconscious motives
- Allows the examinee to respond to vague stimuli with their own impressions
- Assumption is that the examinee will project his unconscious needs, motives, and conflicts onto the neutral stimulus
- Administered individually and scored subjectively
- Has five types/techniques: completion technique, expressive technique, association technique, construction technique, and choice or ordering technique
- With low levels of reliability and validity
- Examples: Rorschach Inkblot Test, TAT, HTP, SSCT, DAP

Basic Principles in the Use of Psychological Tests
1. Tests are samples of behavior.
2. Tests do not reveal traits or capacities directly.
3. Psychological maladjustments selectively and differentially affect test scores.
4. The psychometric and projective approaches, although distinguishable, are mutually complementary.

Psychological Tests are used in the following settings:

Educational Settings
- Basis for admission and placement in an academic institution
- Identify developmental problems or exceptionalities for which a student may need special assistance
- Assist students in educational or vocational planning
- Intelligence tests and achievement tests are used from an early age; from kindergarten on, tests are used for placement and advancement
- Educational institutions have to make admissions and advancement decisions regarding students (e.g., SAT, GRE, subject placement tests)
- Used to assess students for special education programs; also used in diagnosing learning difficulties

Clinical Settings
- Tests of psychological adjustment and tests which can classify and/or diagnose patients are used extensively
- Psychologists generally use a number of objective and projective personality tests
- Neuropsychological tests, which examine basic mental functions, also fall into this category; perceptual tests are used in detecting and diagnosing brain damage
- For diagnosis and treatment planning

Counseling Settings
- Counseling in schools, prisons, and government or private institutions

Geriatric Settings
- Assessment for the aged
Business Settings (Personnel Testing)
- Tests are used to assess: training needs, workers' performance in training, success in training programs, management development, leadership training, and selection
- For example, the Myers-Briggs Type Indicator is used extensively to assess managerial potential; type testing is used to hopefully match the right person with the job they are most suited for
- Selection of employees; classification of individuals to positions suited for them; basis for promotion

Military Settings
- For proper selection of military recruits and placement in military duties

Government and Organizational Credentialing
- For promotional purposes, licensing, certification, or general credentialing of professionals

Courts
- Evaluate the mental health of people charged with a crime
- Investigating malingering cases in courts
- Making child custody/annulment/divorce decisions

Academic Research Settings

Uses of Psychological Tests
Classification
- Assigning a person to one category rather than the other
a. Placement - refers to sorting of persons into different programs appropriate to their needs/skills (example: a university mathematics placement exam is given to students to determine if they should enroll in calculus, in algebra, or in a remedial course)
b. Screening - refers to quick and simple tests/procedures to identify persons who might have special characteristics or needs (example: identifying children with exceptional thinking, where the top 10% will be singled out for more comprehensive testing)
c. Certification - determining whether a person has at least the minimum proficiency in some discipline/activity (example: the right to practice medicine after passing the medical board exam; the right to drive a car)
d. Selection - example: provision of an opportunity to attend a university, or to gain employment in a company or in the government

Aptitude Testing
a. Low selection ratio
b. Low success ratio

Diagnosis and Treatment Planning
- Diagnosis conveys information about strengths, weaknesses, etiology, and best choices for treatment (example: IQ tests are absolutely essential in diagnosing intellectual disability)

Self-Knowledge
- Psychological tests also supply a potent source of self-knowledge; in some cases, the feedback a person receives from psychological tests is so self-affirming that it can change the entire course of a person's life

Program Evaluation
- Another use of psychological tests is the systematic evaluation of educational and social programs (programs designed to provide services which improve social conditions and community life)
a. Diagnostic Evaluation - refers to evaluation conducted before instruction
b. Formative Evaluation - refers to evaluation conducted during instruction
c. Summative Evaluation - refers to evaluation conducted at the end of a unit or a specified period of time

Research
- Psychological tests also play a major role in both the applied and the theoretical branches of behavioral research

Steps in (Clinical) Psychological Assessment
1. Deciding what is being assessed
2. Determining the goals of assessment
3. Selecting standards for making decisions
4. Collecting assessment data
5. Making decisions and judgments
6. Communicating results

Approaches in Psychological Assessment
1. Nomothetic Approach - characterized by efforts to learn how a limited number of personality traits can be applied to all people
2. Idiographic Approach - characterized by efforts to learn about each individual's unique constellation of personality traits, with no attempt to characterize each person according to any particular set of traits

Making Inferences and Decisions in Psychological Testing and Assessment
1. Base Rate - an index, usually expressed as a proportion, of the extent to which a particular trait, behavior, characteristic, or attribute exists in a population
2. Hit Rate - the proportion of people a test or other measurement procedure accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute
3. Miss Rate - the proportion of people a test or other measurement procedure fails to identify accurately with respect to the possession or exhibition of a trait, behavior, characteristic, or attribute; a "miss" in this context is an inaccurate classification or prediction and can be classified as:
a. False Positive (Type I error) - an inaccurate prediction or classification indicating that a testtaker did possess a trait or other attribute being measured when in reality the testtaker did not
b. False Negative (Type II error) - an inaccurate prediction or classification indicating that a testtaker did not possess a trait or other attribute being measured when in reality the testtaker did

Cross-Cultural Testing
1. Parameters where cultures vary
- Language
- Test content
- Education
- Speed (tempo of life)
2. Culture-Free Tests
- An attempt to eliminate culture so that nature can be isolated
- Impossible to develop, because the influence of culture on an individual is evident from birth
- The interaction between nature and nurture is cumulative, not relative
3. Culture-Fair Tests
- Developed because of the non-success of culture-free tests; nurture is not removed, but parameters are common and fair to all
- Can be done using three approaches: fair to all cultures, fair to some cultures, fair only to one culture
4. Culture Loadings
- The extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings associated with a culture

Psychological Testing vs. Psychological Assessment

Objective
- Testing: typically, to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
- Assessment: typically, to answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.

Process
- Testing: may be individual or group in nature. After test administration, the tester will typically add up "the number of correct answers or the number of certain types of responses . . . with little if any regard for the how or mechanics of such content."
- Assessment: typically individualized. In contrast to testing, assessment more typically focuses on how an individual processes rather than simply the results of that processing.

Role of Evaluator
- Testing: the tester is not the key to the process; practically speaking, one tester may be substituted for another tester without appreciably affecting the evaluation.
- Assessment: the assessor is the key to the process of selecting tests and/or other tools of evaluation, as well as in drawing conclusions from the entire evaluation.

Skill of Evaluator
- Testing: typically requires technician-like skills in terms of administering and scoring a test as well as in interpreting a test result.
- Assessment: typically requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.

Outcome
- Testing: typically yields a test score or series of test scores.
- Assessment: typically entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on a referral question.

Duration
- Testing: typically brief.
- Assessment: longer, lasting from a few hours to a few days or more.

Focus
- Testing: how one person or group compares with others (nomothetic).
- Assessment: the uniqueness of a given individual, group, or situation (idiographic).

Sources of Data
- Testing: one person, the testtaker only.
- Assessment: often collateral sources, such as relatives or teachers, as well as historical and cultural data, are used in addition to the subject of the assessment.

Qualification for Use
- Testing: knowledge of tests and testing procedures.
- Assessment: knowledge of testing and other assessment methods as well as of the specialty area assessed (psychiatric disorders, job requirements, etc.).

Cost
- Testing: inexpensive, especially when group testing is done.
- Assessment: very expensive; requires intensive use of highly qualified professionals.

OUTCOME / WEIGHT
1. Apply technical concepts, basic principles, and topics of psychometrics and psychological assessment. (20%)
2. Describe the process, research methods, and statistics used in test development and standardization. (20%)
3. Identify the importance, benefits, and limitations of psychological assessment. (10%)
4. Identify, assess, and evaluate the methods and tools of psychological assessment relative to the specific purpose and context: school, industry, and community. (20%)
5. Evaluate the administration and scoring procedures of intelligence and objective personality testing and other alternative forms of tests. (15%)
6. Apply ethical considerations and standards in the various dimensions of psychological assessment. (15%)
TOTAL: 100% (150 items)

Brief History
• Ancient Roots
• Individual Differences
• Early Experimental Psychologists
• Intelligence Testing
• World War I
• Personality Testing
• Psychological Testing in the Philippines
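The decision indices defined earlier (base rate, hit rate, miss rate, false positives and false negatives) can be computed from a simple cross-tabulation. A sketch with made-up screening data, where `actual` marks who truly possesses the trait and `flagged` marks whom the test identified:

```python
# Decision indices from a hypothetical screening of 10 people.
# Definitions follow these notes: a "hit" is any accurate classification,
# and a "miss" is any inaccurate one (false positive or false negative).

actual  = [True, True, True, False, False, False, False, False, False, False]
flagged = [True, True, False, True, False, False, False, False, False, False]

n = len(actual)
true_pos  = sum(a and f for a, f in zip(actual, flagged))
true_neg  = sum(not a and not f for a, f in zip(actual, flagged))
false_pos = sum(not a and f for a, f in zip(actual, flagged))  # Type I error
false_neg = sum(a and not f for a, f in zip(actual, flagged))  # Type II error

base_rate = sum(actual) / n            # proportion possessing the trait
hit_rate  = (true_pos + true_neg) / n  # accurate classifications
miss_rate = (false_pos + false_neg) / n

print(base_rate, hit_rate, miss_rate)  # prints 0.3 0.8 0.2
```

Note how the base rate constrains the other indices: when a trait is rare, a test can achieve a high hit rate simply by flagging no one, which is why all four cells of the cross-tabulation matter, not the hit rate alone.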
Clinical Differences between Projective Tests and Psychometric (Objective) Tests

Definiteness of Task
- Projective: allows variation in responses and recall; a more individualized response pattern.
- Psychometric: subjects are judged on very much the same basis.

Response Choice vs. Constructed Response
- Projective: the subject gives whatever response seems fitting within the range allowed by the test directions.
- Psychometric: can be more objectively scored and does not depend on fluency or expressive skills.

Analysis of Results
- Projective: the examiner watches the subject at work, guided only by a general direction.
- Psychometric: formal scoring plays a large part; results are measured against standard norms.

Emphasis on Critical Validation
- Projective: the tester is satisfied with comparing the impression based on one procedure with the impression gained from another.
- Psychometric: the tester accompanies every numerical score with a warning regarding the error of measurement, and every prediction with an index that shows how likely it is to come true.

ANCIENT ROOTS
• Chinese Civilization
• Greek Civilization
• European Universities

Early Beginning of Testing
Chinese Civilization
- Emperor Kao-tsu: first examination for service
- T'ang dynasty: systematized the test content
- Ming dynasty: open to all males; provincial and national levels (based on test performance); start of the Organized Testing Movement

INDIVIDUAL DIFFERENCES
• Charles Darwin
• Francis Galton

EXPERIMENTAL PSYCHOLOGISTS
• Johann Friedrich Herbart
• Ernst Heinrich Weber
• Gustav Theodor Fechner
• Wilhelm Wundt
• Edward Titchener
• Guy Montrose Whipple
• Louis Leon Thurstone
INTELLIGENCE TESTING
• Jean Esquirol
• Edouard Seguin
• James McKeen Cattell
• Lewis M. Terman

Mentally Retarded Individuals (Intelligence Test)
Jean Esquirol
- distinguishing between idiots
- hierarchy of retardation
Edouard Seguin
- focus on sensory discrimination and motor control
- treatment for retarded individuals
James McKeen Cattell
- stressed quantification, ranking and rating
- first psychologist to teach the statistical analysis of experimental results
- began the Psychological Review with Baldwin
- helped found the American Association of University Professors (AAUP)
- organized The Psychological Corporation
- trained more graduate students in psychology
- used the term "mental test"
Lewis Terman
- an influential psychologist known for his version of the Stanford-Binet intelligence test and for his longitudinal study of giftedness. His work added important contributions to the understanding of how intelligence influences life success, health, and outcomes.

INTELLIGENCE TESTING
Alfred Binet
- developer of the first truly psychological test of mental ability
- believed that assessing cognitive functions such as memory, attention, imagination and comprehension would provide a more appropriate measure of intelligence
- Binet turned to tests of cognitive ability when he found significant differences between his children and adult subjects

Alfred Binet (SB5 IQ Classification)
176-225 : Profoundly Gifted
161-175 : Extremely Gifted
145-160 : Very Gifted
130-144 : Gifted
120-129 : Superior
110-119 : High Average
90-109 : Average
80-89 : Low Average

INTELLIGENCE TESTING
• Charles Spearman
- English psychologist known for his work in statistics, as a pioneer of factor analysis, and for Spearman's rank correlation coefficient. He also did seminal work on models of human intelligence, including his theory that disparate cognitive test scores reflect a single general intelligence factor, coining the term g factor.
• Louis Thurstone
- American psychologist who was instrumental in the development of psychometrics, the science that measures mental functions, and who developed statistical techniques for multiple-factor analysis of performance on psychological tests.
• David Wechsler
- best known for his intelligence tests. He was one of the most influential advocates of the role of nonintellective factors in testing, emphasizing that factors other than intellectual ability are involved in intelligent behavior.
• Raymond Cattell
- best known for his 16-factor personality model, for developing the concept of fluid versus crystallized intelligence, and for his work with factor and multivariate analysis.
• Guilford
- American psychologist best remembered for his psychometric study of human intelligence, including the distinction between convergent and divergent production.
• Sternberg
- the Triarchic Theory of Intelligence (Three Forms of Intelligence) was formulated by Robert Sternberg, a prominent figure in research on human intelligence. The theory was among the first to go against the psychometric approach to intelligence and take a more cognitive approach, which places it in the category of cognitive-contextual theories.
• Howard Gardner
- Theory of Multiple Intelligences; defined "intelligence" in terms of: a. species' characteristics; b. individual differences; c. fit execution of an assignment.

THEORIES OF INTELLIGENCE
I. Interactionism
II. Factor-Analytic Theories of Intelligence
III. Information Processing Approach

INTERACTIONISM
- refers to the complex concept by which heredity and environment are presumed to interact and influence the development of one's intelligence
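The SB5 classification bands listed above map cleanly onto a lookup table. The sketch below is illustrative only (the function name and return convention are mine, not part of the SB5 materials); scores outside the listed bands return None.

```python
# Hypothetical helper: map an IQ score to the SB5 classification
# bands listed in the notes above. Illustration only.
SB5_BANDS = [
    (176, 225, "Profoundly Gifted"),
    (161, 175, "Extremely Gifted"),
    (145, 160, "Very Gifted"),
    (130, 144, "Gifted"),
    (120, 129, "Superior"),
    (110, 119, "High Average"),
    (90, 109, "Average"),
    (80, 89, "Low Average"),
]

def classify_sb5(iq):
    """Return the SB5 label for an IQ score, or None if out of range."""
    for low, high, label in SB5_BANDS:
        if low <= iq <= high:
            return label
    return None  # e.g., scores below 80 fall outside the listed bands
```

Encoding the bands as data rather than chained if-statements keeps the code in step with the table and makes the cutoffs easy to audit.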
FACTOR ANALYTIC THEORIES OF INTELLIGENCE
A. Two-Factor Theory of Intelligence
- also called the g-factor theory
- Charles Spearman – pioneer of factor analysis
- postulates the existence of a general intellectual ability factor, g, that is partially tapped by all other intellectual ability factors
- s – specific components; g – used for predicting overall intelligence
- correlated abilities are presumed to have a greater value of g
B. Guilford's Theory of Intelligence
- believed that there is no single factor to point to in intelligence
C. Thurstone's Theory of Intelligence
- Verbal Meaning
- Rote Memory
- Perceptual Speed
- Word Fluency
- Reasoning
- Spatial Relations
- Number Facility
D. Gardner's Theory of Multiple Intelligence
E. Cattell's Gc and Gf Intelligence
• Gc / Crystallized Intelligence – acquired skills and knowledge that are dependent on exposure to a particular culture and on informal and formal education
• Gf / Fluid Intelligence – nonverbal, culture-free, and independent of specific instruction
F. Horn's Gv and Gq Model
• Vulnerable abilities – intelligence that declines with age and tends not to return to preinjury levels following brain damage (e.g., Gv)
• Maintained abilities – intelligence that tends not to decline with age and may return to preinjury levels following brain damage (e.g., Gq)
Gv – Visual Processing
Glr – Long-Term Storage and Retrieval
Ga – Auditory Processing
Gsm – Short-Term Memory
Gq – Quantitative Processing
Grw – Reading and Writing
Gs – Speed of Processing
G. Carroll's Three-Stratum Theory of Cognitive Abilities
• proposed a hierarchical model – all abilities listed in each stratum are subsumed/incorporated in the strata above it
- 3rd stratum = g
- 2nd stratum = Gf (fluid); Gc (crystallized); Y (memory and learning); V (broad visual perception); U (auditory); R (retrieval); S (cognitive speediness); T (processing/decision speed)
- 1st stratum = components of each of the 2nd-stratum abilities
H. McGrew-Flanagan CHC Model
- integration of Cattell's, Horn's, and Carroll's theories
- suggests 10 broad-stratum abilities
- has no provision for g but assumes its existence
- holds that g lacks practical use in psycho-educational assessment
- (Gf) Fluid, (Gc) Crystallized, (Gq) Quantitative, (Ga) Auditory, (Gv) Visual, (Gsm) Short-Term Memory, (Glr) Long-Term Storage and Retrieval, (Grw) Reading and Writing, (Gs) Processing Speed, (Gt) Reaction Time

INFORMATION PROCESSING APPROACH
- an approach to studying cognition in a computer-like fashion of encoding, retention, and retrieval
A. Simultaneous vs. Successive Processing
B. PASS Model
C. Triarchic Theory of Intelligence

• Simultaneous Processing – also called parallel processing
• information is integrated all at one time; integrated and synthesized
• e.g., art appreciation
• Successive Processing – also called sequential processing
• information is individually processed in sequence
• logical and analytical
• e.g., memorizing telephone numbers

PASS MODEL
• an extension of the simultaneous and successive approach
• includes planning and attention as additional factors of intelligence
• Planning – strategy development for problem solving
• Attention – receptivity to information

TRIARCHIC THEORY OF INTELLIGENCE
• proposed by Sternberg
• conceptualizes intelligence as having three components:
• Metacomponents – planning, monitoring, and evaluating
• Performance components – carry out the instructions of the metacomponents
• Knowledge-acquisition components – learning how to do a specific task
• Successful Intelligence – gauges the extent to which one effectively adapts to, shapes, and selects environments in a way that conforms to both personal and societal standards of success

WORLD WAR I
• Robert Yerkes
• Arthur Otis
• Robert Woodworth
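Spearman's claim that positively correlated abilities share a common g can be illustrated with a toy calculation. The sketch below is not Spearman's historical method (he worked with tetrad differences); it extracts the first principal component of a correlation matrix, and the matrix values and subtest names are invented purely for illustration.

```python
import numpy as np

# Invented correlation matrix for four hypothetical subtests
# (vocabulary, arithmetic, memory, spatial) -- illustration only.
R = np.array([
    [1.00, 0.70, 0.60, 0.50],
    [0.70, 1.00, 0.55, 0.45],
    [0.60, 0.55, 1.00, 0.40],
    [0.50, 0.45, 0.40, 1.00],
])

# Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(R)

# Loadings on the first (largest) component approximate "g" loadings.
g = vecs[:, -1] * np.sqrt(vals[-1])
g = g * np.sign(g.sum())          # eigenvector sign is arbitrary; fix it

share = vals[-1] / R.shape[0]     # proportion of total variance on "g"
print(np.round(g, 2), round(share, 2))
```

When every subtest correlates positively with every other, all loadings come out positive and a single component absorbs most of the shared variance — the pattern Spearman read as evidence for a general factor.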
PERSONALITY TESTING
• Herman Rorschach
• Murray & Morgan - early 1940's
• Raymond Cattell
• McCrae & Costa

IN THE PHILIPPINES
Psychological Testing in the Philippines
- Philippine psych testing movement: the Royal Decree of December 20, 1863 put in place a system of assessment for admission to the civil service
- Manuel Carreon – dissertation on Philippine studies in mental measurement
- Sinforoso Padilla – Philippine Self-Administering Test of Mental Ability
- Ordoñez – reported that psychological tests were being used with the inmates in Bilibid
- Research, Evaluation and Guidance Division, Bureau of Public Schools – came out with a battery of tests
- Lazo, Vasquez-de Jesus and Edralin-Tiglao – research on tests used in clinical and industrial settings in Manila
- Carlota – research on trends in psychological testing in education, training and research institutions
- Virgilio Enriquez – Panukat ng Ugali't Pagkatao
- Annadaisy Carlota – Panukat ng Pagkataong Pilipino

ETHICAL STANDARDS IN PSYCHOLOGICAL ASSESSMENT

A. Ethics
1. Ethics Defined
- the moral framework that guides and inspires the professional
- an agreed-on set of morals, values, professional conduct and standards accepted by a community, group, or culture
- a social, religious, or civil code of behavior considered correct, especially that of a particular group, profession, or individual
2. Professional Ethics
- the core of every discipline
- addresses professional conduct and ethical behavior, issues of confidentiality, ethical principles and professional codes of ethics, and ethical decision-making
- provides a mechanism for professional accountability
- serves as a catalyst for improving practice
- safeguards our clients
3. All professional ethics have similarities and dissimilarities, but all focus on:
- protecting clients
- the professional's scope of competency
- doing no harm by acting responsibly and avoiding exploitation
- protecting confidentiality and privacy
- maintaining the integrity of the profession
4. Functions and Purposes of Ethical Codes
- identify values for members of the organization to strive for as they perform their duties
- set boundaries for both appropriate and inappropriate behavior
- provide guidelines for practitioners facing difficult situations encountered in the course of work performance
- communicate a framework for defining and monitoring relationship boundaries of all types
- provide guidelines for day-to-day decision-making by all professionals, along with the staff and volunteers in the organization
- protect the integrity and reputation of the profession and/or individual members of an organization and the organization itself
- establish high standards of ethical and professional conduct within the culture of the organization
- protect the health and safety of clients, while promoting the quality of services provided to them
- enhance public safety
5. Limitations of Ethical Codes
- codes can lack clarity
- a code can conflict with another code, personal values, organizational practice, or local laws and regulations
- codes are usually reactive rather than proactive
- a code may not be adaptable to another cultural setting
6. Ethical Values
- basic beliefs that an individual holds to be true
- the bases on which an individual makes a decision regarding good or bad, right or wrong, most important or least important
- cultural: guiding social behavior
- organizational: guiding business or other professional behavior
7. Universal Ethical Values
- Autonomy: enhance freedom of personal identity
- Obedience: obey legal and ethically permissible directives
- Conscientious Refusal: disobey illegal or unethical directives
- Beneficence: help others
- Gratitude: "giving back," or passing good along to others
- Competence: be knowledgeable and skilled
- Justice: be fair, distribute by merit
- Stewardship: use resources judiciously
- Honesty and Candor: tell the truth
- Fidelity: don't break promises
- Loyalty: don't abandon
- Diligence: work hard
- Discretion: respect confidentiality and privacy
- Self-improvement: be the best that you can be
- Non-maleficence: don't hurt anyone
- Restitution: make amends to persons injured
- Self-interest: protect yourself
8. Law and Ethics
- law presents minimum standards of behavior in a professional field
- ethics provides the ideal for use in decision-making

B. Common Ethical Issues and Debates
1. When to break confidentiality?
2. Release of psychological reports to the public
3. The golden rule in assessing and diagnosing public figures
4. Multiple relationships
5. Acceptance of gifts
6. Dehumanization
7. Divided loyalties
8. Labelling and the self-fulfilling prophecy

C. Psychological Association of the Philippines (PAP) Ethical Principles
1. Respect for the Dignity of Persons and Peoples
- respect for the unique worth and inherent dignity of all human beings
- respect for the diversity among persons and peoples
- respect for the customs and beliefs of cultures
2. Competent Caring for the Well-Being of Persons and Peoples
- maximizing benefits, minimizing potential harm, and offsetting or correcting harm
- application of knowledge and skills that are appropriate for the nature of a situation as well as the social and cultural context
- adequate self-knowledge of how one's values, experiences, culture, and social context might influence one's actions and interpretations
- active concern for the well-being of individuals, families, groups, and communities
- taking care to do no harm to individuals, families, groups, and communities
- developing and maintaining competence
3. Integrity
- integrity is based on honesty, and on truthful, open and accurate communications
- maximizing impartiality and minimizing biases
- includes recognizing, monitoring, and managing potential biases, multiple relationships, and other conflicts of interest that could result in harm and exploitation of persons and peoples
- avoiding incomplete disclosure of information unless complete disclosure is culturally inappropriate, violates confidentiality, or carries the potential to do serious harm to individuals, families, groups, or communities
- not exploiting persons or peoples for personal, professional, or financial gain
- complete openness and disclosure of information must be balanced with other ethical considerations, including the need to protect the safety or confidentiality of persons and peoples, and the need to respect cultural expectations
- avoiding conflicts of interest and declaring them when they cannot be avoided or are inappropriate to avoid
4. Professional and Scientific Responsibilities to Society
- we shall undertake continuing education and training to ensure our services continue to be relevant and applicable
- generate research

D. Roles of a Psychometrician
1. Administering and scoring objective personality tests and structured personality tests, excluding projective tests and other higher-level psychological tests;
2. Interpreting the results of these tests and preparing a written report on these results; and
3. Conducting preparatory intake interviews of clients for psychological intervention sessions.
4. All assessment reports prepared by the psychometrician shall always bear the signature of the supervising psychologist, who shall take full responsibility for the integrity of the report.

E. Ethical Standards in Psychological Assessment
1. Responsibilities of Test Publishers
- the publisher is expected to release tests of high quality
- the publisher is expected to market products in a responsible manner
- the publisher restricts distribution of tests only to persons with the proper qualifications
2. Publication and Marketing Issues
- the most important guideline is to guard against premature release of a test
- test authors should strive for a balanced presentation of their instruments and refrain from one-sided presentation of information
3. Competence of Test Purchasers
4. Responsibilities of Test Users
- best interest of clients
- informed consent
> must be presented in a clear and understandable manner to both the student and parent
> the reason for the test administration
> the tests and evaluation procedures to be used
> how assessment scores will be used
> who will have access to the results
> written informed consent must be obtained from the student's parents, guardian or the student (if he or she has already reached 'legal' age)
- human relations
- avoiding harassment
- duty to warn
- confidentiality
- expertise of test users
- obsolete tests and the standard of care
- consideration of individual differences
5. Appropriate Assessment Tool Selection
- Criteria for Test Selection
> it must be relevant to the problem
> appropriate for the patient/client
> familiar to the examiner
> adaptable to the time available
> valid and reliable
- Need for Battery Testing
> no single test proves to yield a diagnosis in all cases, or to be in all cases correct in the diagnosis it indicates
> psychological maladjustment, whether mild or severe, may encroach on any or several of the functions tapped by the tests, leaving other functions absolutely or relatively unimpaired
- What Test Users Should Do
> first define the purpose for testing and the population to be tested; then select a test for that purpose and that population based on a thorough review of the available information and materials
> investigate potentially useful sources of information, in addition to test scores, to corroborate the information provided by tests
> read the materials provided by test developers and avoid using tests for which unclear or incomplete information is provided
> become familiar with how and when the test was developed and tried out
> read independent evaluations of a test and of possible alternative measures; look for the evidence required to support the claims of test developers
> examine specimen sets, disclosed tests or samples of questions, directions, answer sheets, manuals, and score reports before selecting a test
> ascertain whether the test content and norm group(s) or comparison group(s) are appropriate for the intended test takers
> select and use only those tests for which the skills needed to administer the test and interpret scores correctly are available
6. Test Administration, Scoring and Interpretation
- Basic Principles
> to ensure fair testing, the tester must become thoroughly familiar with the test. Even a simple test usually presents one or more stumbling blocks which can be anticipated if the tester studies the manual in advance, or even takes time to take the test himself before administering it
> the tester must maintain an impartial and scientific attitude. Testers must be keenly interested in the persons they test, and desire to see them do well. It is the duty of the tester to obtain from each subject the best record he can produce
> establishing and maintaining rapport is necessary if the subject is to do well; that is, the subject must feel that he wants to cooperate with the tester. Poor rapport is evident in inattention during directions, giving up before time is up, restlessness, or finding fault with the test
> in individual testing, where each question is given orally, unintended help can be given by facial expression or words of encouragement. The person taking the test is always concerned to know how well he is doing and watches the examiner for indications of his success. The examiner must maintain a completely unrevealing expression while at the same time silently assuring the subject of his interest in what he says or does
> in individual testing, the tester observes the subject's performance with care. He notes the time to complete each task and any errors, and he watches for any unusual method of approaching the task. Observation and note-taking must be done in a subtle and unobtrusive manner so as not to directly or indirectly affect the subject's performance of the task
- General Procedures/Guidelines
> Conditions of testing
- Physical Condition. The physical condition of where the test is given may affect the
test scores. If the ventilation and lighting are poor, the subject will be handicapped.
- Condition of the Person. The state of the person affects the results; if the test is given when he is fatigued, when his mind is preoccupied with other problems, or when he is emotionally disturbed, the results will not be a fair sample of his behavior.
- Test Condition. The testing condition can often be improved by spacing the tests to avoid cumulative fatigue. Test questionnaires, answer sheets and other testing materials needed must always be in good condition so as not to hinder good performance.
- Condition of the Day. The time of day may influence scores but is rarely important. Alert subjects are more likely to give their best than subjects who are tired and dispirited. Equally good results can be produced at any hour, however, if the subjects want to do well.
> Control of the group
- group tests are given only to reasonable and cooperative subjects who are expected to do as the tester requests. Group testing, then, is a venue for a problem in command.
- directions should be given simply, clearly and singly. The subjects must have a chance to ask questions whenever necessary, but the examiner attempts to anticipate all reasonable questions by giving full directions.
- effective control may be combined with good rapport if the examiner is friendly and avoids an antagonistic, overbearing or fault-finding attitude.
- the goal of the tester is to obtain useful information about people; that is, to elicit good information from the results of the test. There is no value in adhering rigidly to a testing schedule if the schedule will not give true information. Common sense is the only safe guide in exceptional situations.
> directions to the subject
- the most important responsibility of the test administrator is giving directions.
- it is imperative that the tester gives directions exactly as provided in the manual. If the tester understands the importance of this responsibility, it is simple to follow the printed directions, reading them word for word, adding nothing and changing nothing.
> judgments left to the examiner
- the competent examiner must possess a high degree of judgment, intelligence, sensitivity to the reactions of others, and professionalism, as well as knowledge of scientific methods and experience in the use of psychometric techniques.
- no degree of mechanical perfection of the tests themselves can ever take the place of the good judgment and psychological insight of the examiner.
> guessing
- it is against the rules for the tester to give supplementary advice; he must retreat to such formulas as "Use your judgment."
- the person taking the test is usually wise to guess freely. (But the tester is not to give his group an advantage by telling them this trade secret.)
- from the point of view of the tester, the tendency to guess is an unstandardized aspect of the testing situation which interferes with accurate measurement.
- the systematic advantage of the guesser is eliminated if the test manual directs everyone to guess, but guessing introduces large chances of error. Statistical comparisons of "do not guess" and "do guess" instructions show that with the latter, the test has slightly less predictive value.
- the most widely accepted practice now is to educate students that wild guessing is to their disadvantage, but to encourage them to respond when they can make an informed judgment as to the most reasonable answer, even if they are uncertain.
- the motivation most helpful to valid testing is a desire on the part of the subject that the score be valid. Ideally the subject becomes a partner in testing himself. The subject must place himself on a scale, and unless he cares about the result he cannot be measured accurately.
- the desirability of preparing the subject for the test by appropriate advance information is increasingly recognized. This information increases the person's confidence and reduces the test anxiety that they might otherwise have.
- Scoring
> hand scoring
> machine scoring
7. Responsible Report Writing and Communication of Test Results
- What Is a Psychological Report?
> an abstract of a sample of behavior of a patient or a client derived from the results of psychological tests
> a very brief sample of one's behavior
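One standard way to remove the guesser's systematic advantage — a technique the notes above allude to but do not spell out — is the classic correction for guessing (formula scoring): subtract a fraction of the wrong answers from the rights, R − W/(k−1), where k is the number of options per item. A minimal sketch, with an invented score as input:

```python
def corrected_score(num_right, num_wrong, options_per_item):
    """Classic correction for guessing: R - W/(k-1).

    Omitted items are neither rewarded nor penalized. With k options,
    blind guessing yields on average one right answer per (k-1) wrong
    ones, so the expected net gain from pure guessing is zero.
    """
    return num_right - num_wrong / (options_per_item - 1)

# 40 right and 10 wrong on 5-option items -> 40 - 10/4 = 37.5
print(corrected_score(40, 10, 5))
```

This is why "do guess" instructions plus formula scoring neutralize the guesser's edge on average, even though (as the notes observe) guessing still adds random error to individual scores.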
- Criteria for a Good Psychological Report
> Individualized – written specifically for the client
> Directly and adequately answers a referral question
> Clear – written in language that can be easily understood
> Meaningful – perceived by the reader as clear and is understood by the reader
> Synthesized – details are formed into broader concepts about the specific person
> Delivered on time
- Principles of Value in Writing an Individualized Psychological Report
> avoid mentioning general characteristics, which could describe almost anyone, unless their particular importance in the given case is made clear
> describe the particular attributes of the individual fully, using terms as distinctive as possible
> a simple listing of characteristics is not helpful; tell how they are related and organized in the personality
> information should be organized developmentally with respect to the timeline of the individual's life
> many of the problems of poor reports, such as vague generalizations, overqualification, clinging to the immediate data, stating the obvious and describing stereotypes, are understandable but undesirable reactions to uncertainty
> validate statements with actual behavioral responses
> avoid, if possible, the use of qualifiers such as "It appears" or "tends to," for these convey the psychologist's uncertainties or indecision
> avoid using technical terms; present them in layman's language
- Levels of Psychological Interpretation
> Level I
- there is a minimal amount of any sort of interpretation
- there is minimal concern with intervening processes
- data are primarily treated in a sampling or correlational way
- there is no concern with underlying constructs
- found in large-scale selection testing
- characteristic of psychometric approaches
> Level II
- Descriptive generalizations – from the particular behaviors observed, we generalize to more inclusive, although still largely behavioral and descriptive, categories. Thus, they note, a clinician might observe instances of slow bodily movements and excessive delays in answering questions and from this infer that the patient is "retarded motorically." With the further discovery that the patient eats and sleeps poorly, cries easily, reports a constant sense of futility and discouragement, and shows characteristic test behaviors, the generalization is broadened to "depressed."
- Hypothetical constructs – the assumption of an inner state which goes logically beyond description of visible behavior. Such constructs imply causal conditions and related personality traits and behaviors, and allow prediction of future events. It is this movement from description to construction which is the essence of clinical interpretation.
> Level III
- the effort is to develop a coherent and inclusive theory of the individual life, or a "working image" of the patient. In terms of a general theoretical orientation, the clinician attempts a full-scale exploration of the individual's personality, psychosocial situation, and developmental history.
- Sources of Error in Psychological Interpretation
> Information Overload
- too much material, making the clinician overwhelmed
- studies have shown that clinical judges typically use less information than is available to them
- the need is to gather an optimal, rather than maximal, amount of information of a sort digestible by the particular clinician
- obviously, familiarity with the tests involved, the type of patient, the referral questions and the like figure in deciding how much of what kind of material is collected and how extensively it can be interpreted
> Schematization
- all humans have a limited capacity to process information and to form concepts
- consequently, the resulting picture of the individual is schematized and simplified, perhaps catering to one or a few salient, dramatic and often pathological characteristics
- the resulting interpretations are too organized and consistent, and the person emerges as a two-dimensional creature
- the clinical interpreter has to be able to tolerate complexity and deal at one time with more data than he can comfortably handle
> Insufficient internal evidence for interpretation
- ideally, interpretations should emerge as evidence converges from many sources, such as different responses and scores on the same tests, responses on different tests, self-report, observation, etc.
- particularly for interpretations at higher levels, supportive evidence is required
- results from a lack of tests or a lack of responses
- information between you and the client
> Insufficient external verification of interpretation
- too often, clinicians interpret assessment material and report on patients without further checking on the accuracy of their statements
- information between you and the relevant others
- verify statements made by patients
> Overinterpretation
- "wild analysis"
- the temptation to over-interpret assessment material in pursuit of a dramatic or encompassing formulation
- deep interpretations, seeking unconscious motives and nuclear conflicts, or those which attempt a genetic reconstruction of the personality, are always to be made cautiously and only on the basis of convincing evidence
- interpreting symbols in terms of fixed meanings is a cheap and usually inaccurate attempt at psychoanalytic interpretation
- at all times, the skillful clinician should be able to indicate the relationship between the interpreted hypothetical variable and its referents in overt behavior
> Lack of Individualization
- it is perfectly possible to make correct statements which are entirely worthless because they could apply as well to anyone under most conditions
- "Aunt Fanny syndrome" / "P.T. Barnum effect"
- state what makes the person unique (e.g., both patients are anxious – how does this one patient manifest his anxiety?)
> Lack of Integration
- human personality is organized and integrated, usually in a hierarchical system
- it is of central importance to understand which facets of the personality are most central and which are peripheral, which needs subserve others, and how defensive, coping and ego functions are organized, if understanding of the personality is to be achieved
- over-cautiousness, insufficient knowledge or the lack of a theoretical framework are sometimes revealed in contradictory interpretations made side by side
- on the face of it, someone cannot be called both domineering and submissive
> Overpathologizing
- always highlighting the negative rather than the positive aspects of behavior
- emphasizes the weaknesses rather than the strengths of a person
- a balance between the positive and the negative must be the goal
- the sandwich method (positive-negative-positive) is a recommended approach
> Over-"psychologizing"
- giving an interpretation when there is none (e.g., scratching of the hands read as anxiety when the person is simply itchy)
- avoid generalized interpretations of overt behaviors
- probe into the meanings/motivations behind observed behaviors
- Essential Parts of a Psychological Report
> Industrial setting
- identifying information
- tests administered
- test results
- skills and abilities
- personality profile
- summary/recommendation
> Clinical setting
- personal information
- referral question
- tests administered
- behavioral observation (test and interview)
- test results and interpretation
- summary formulation
- diagnostic impression
- recommendation
F. Rights of Test Takers
1. Be treated with courtesy, respect, and impartiality, regardless of your age, disability, ethnicity, gender, national origin, religion, sexual orientation or other personal characteristics
2. Be tested with measures that meet professional standards and that are appropriate, given the manner in which the test results will be used
3. Receive information regarding their test results
4. Least stigmatizing label
5. Informed consent
6. Privacy and confidentiality

--- results profile ---

--- psychological evaluation ---
chapter iv: statistics refresher

A. Scales of Measurement

1. Primary Scales of Measurement
a. Nominal: a non-parametric measure, also called a categorical variable; simple classification. We only need to distinguish one item from another.
Example: Sex (Male and Female); Nationality (Filipino, Japanese, Korean); Color (Blue, Red and Yellow)
b. Ordinal: a non-parametric scale wherein cases are ranked or ordered; the values represent position in a group, where the order matters but not the difference between the values.
Example: 1st, 2nd, 3rd, 4th and 5th; pain threshold on a scale of 1–10, 10 being the highest
c. Interval: a parametric scale that uses equal intervals of measurement, where the difference between two values is meaningful. Moreover, the values have a fixed unit and magnitude.
Example: Speed of a car (70 kph); Temperature (Fahrenheit and Celsius only)
d. Ratio: a parametric scale similar to interval but including a true zero point, so that relative proportions on the scale make sense.
Example: Height and Weight

2. Comparative Scales of Measurement
a. Paired Comparison: a comparative technique in which a respondent is presented with two objects at a time and asked to select one object according to some criterion. The data obtained are ordinal in nature.
Example: Pairing the different brands of cold drink with one another, please put a check mark in the box corresponding to your preference. (A check under a column means that brand was preferred over the brand in that row.)

Brand      Coke   Pepsi   Sprite   Limca
Coke        –
Pepsi       ✓       –       ✓
Sprite      ✓               –
Limca       ✓       ✓       ✓        –
No. of
Times       3       1       2        0
Preferred

b. Rank Order: respondents are presented with several items simultaneously and asked to rank them in order of priority. This is an ordinal scale that describes the favoured and unfavoured objects but does not reveal the distance between the objects; the resultant data are ordinal. This yields a better result when comparisons are required between the given objects. The major disadvantage of this technique is that only ordinal data can be generated.
Example: Rank the following brands of cold drinks. Find the brand you like most and assign it a number 1. Then find the second most preferred brand and assign it a number 2. Continue this procedure until you have ranked all the brands of cold drinks in order of preference. Also remember that no two brands should receive the same rank order.

Brand     Rank
Coke       1
Pepsi      3
Sprite     2
Limca      4

c. Constant Sum: respondents are asked to allocate a constant sum of units such as points, rupees or chips among a set of stimulus objects with respect to some criterion. For example, you may wish to determine how important the attributes of price, fragrance, packaging, cleaning power and lather of a detergent are to consumers. Respondents might be asked to divide a constant sum to indicate the relative importance of the attributes. The advantage of this technique is that it saves time. However, its main disadvantages are that the respondent may allocate more or fewer points than those specified, and that respondents might be confused.
Example: For the attributes of the detergent below, please allocate 100 points among the attributes so that your allocation reflects the relative importance you attach to each attribute. The more points an attribute receives, the more important the attribute is. If an attribute is not at all important, assign it zero points. If an attribute is twice as important as some other attribute, it should receive twice as many points.

Attribute         Number of Points
Price                   50
Fragrance               05
Packaging               10
Cleaning Power          30
Lather                  05
Total Points           100

d. Q-Sort Technique: This is a comparative scale that uses a rank order procedure to sort objects based on similarity with respect to some criterion. The important characteristic of this methodology is that it is more important to make comparisons among the different responses of a respondent than the responses between different respondents. Therefore, it is a comparative method of scaling rather than an absolute rating scale. In this method the respondent is given a large number of statements describing the characteristics of a product, or a large number of brands of a product.
Example: The bag given to you contains pictures of 90 magazines. Please choose the 10 magazines you prefer most, 20 magazines you like, 30 magazines toward which you are neutral (neither like nor dislike), 20 magazines you dislike and the 10 magazines you prefer least.

Prefer Most    Like    Neutral    Dislike    Prefer Least
   (10)        (20)      (30)       (20)         (10)

3. Non-Comparative Scales of Measurement
a. Continuous Rating Scales: the respondents rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other.
Example: How would you rate the TV advertisement as a guide for buying?
Strongly Agree 10 9 8 7 6 5 4 3 2 1 Strongly Disagree
b. Itemized Rating Scale: a scale having numbers or brief descriptions associated with each category. The categories are ordered in terms of scale position, and the respondents are required to select one of the limited number of categories that best describes the product, brand, company or product attribute being rated. Itemized rating scales are widely used in marketing research. They can take graphic, verbal or numerical form.
c. Likert Scale: the respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative towards the attitudinal object. Respondents generally choose from five alternatives (say strongly agree, agree, neither agree nor disagree, disagree, strongly disagree). A Likert scale may include a number of items or statements. A disadvantage of the Likert scale is that it takes longer to complete than other itemized rating scales because respondents have to read each statement. Despite this disadvantage, the scale has several advantages: it is easy to construct, administer and use.
Example: I believe that ecological questions are the most important issues facing human beings today.

        1               2          3         4           5
Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

d. Semantic Differential Scale: This is a seven-point rating scale with end points associated with bipolar labels (such as good and bad, complex and simple) that have semantic meaning. It can be used to find whether a respondent has a positive or negative attitude towards an object. It has been widely used in comparing brand and company images. It has also been used to develop advertising and promotion strategies and in new product development studies.
Example: Please indicate your attitude towards work using the scale below:
Attitude towards work
Boring      _:_:_:_:_:_:_  Interesting
Unnecessary _:_:_:_:_:_:_  Necessary
e. Staple Scale: The staple scale was originally developed to measure the direction and intensity of an attitude simultaneously. Modern versions of the staple scale place a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives. The modified staple scale places a single adjective in the center of an even number of numerical values.
Example: Select a plus number for words that you think describe the personnel banking of a bank accurately. The more accurately you think the word describes the bank, the larger the plus number you should choose. Select a minus number for words you think do not describe the bank accurately. The less accurately you think the word describes the bank, the larger the minus number you should choose.

        +3                       +3
        +2                       +2
        +1                       +1
Friendly Personnel     Competitive Loan Rate
        -1                       -1
        -2                       -2
        -3                       -3

B. Descriptive Statistics
1. Frequency Distributions – a distribution of scores by the frequency with which they occur
2. Measures of Central Tendency – a statistic that indicates the average or midmost score between the extreme scores in a distribution
a. Mean – formula: X̄ = ΣX / N (for an ungrouped distribution); X̄ = Σ(fX) / N (for a grouped distribution, where f is the frequency of each score)
b. Median – the middle score in a distribution
c. Mode – the most frequently occurring score in a distribution
***Appropriate use of each central tendency measure according to the type of data being used:

Type of Data                      Measure
Nominal Data                      Mode
Ordinal Data                      Median
Interval/Ratio Data (Normal)      Mean
Interval/Ratio Data (Skewed)      Median
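As a quick sketch of the central tendency measures above (using only Python's standard library; the score lists are made up for illustration), note how an outlier pulls the mean upward while the median stays put, which is why the median is recommended for skewed interval/ratio data:

```python
import statistics

# Hypothetical test scores; 30 is an outlier that skews the distribution
scores = [2, 3, 3, 5, 7, 8, 30]

mean = statistics.mean(scores)      # ungrouped mean: sum of X / N
median = statistics.median(scores)  # middle score of the sorted distribution
mode = statistics.mode(scores)      # most frequently occurring score

print(mean)    # 8.285714... (pulled upward by the outlier)
print(median)  # 5
print(mode)    # 3

# Grouped (frequency) distribution mean: sum of f*X / N
freq = {10: 2, 20: 5, 30: 3}  # score: frequency
grouped_mean = sum(f * x for x, f in freq.items()) / sum(freq.values())
print(grouped_mean)  # 21.0
```

With a roughly symmetric distribution the three measures would nearly coincide; here the gap between the mean (≈8.29) and the median (5) is itself a sign of positive skew.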
3. Measures of Variability
– a statistic that describes the amount of variation in a distribution
a. Range – the difference between the highest and the lowest scores
b. Interquartile Range – the difference between Q1 and Q3
c. Semi-Interquartile Range – the interquartile range divided by 2
d. Standard Deviation – the square root of the averaged squared deviations about the mean

4. Measures of Location
a. Percentiles – an expression of the percentage of people whose score on a test or measure falls below a particular raw score
Formula: Percentile = (number of scores below the given score ÷ total number of scores) × 100
b. Quartiles – one of the three dividing points between the four quarters of a distribution, typically labelled Q1, Q2 and Q3
c. Deciles – the dividing points that divide a distribution into 10 equal parts

5. Skewness
- a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean
a. Positive skew
- relatively few scores fall at the positive end
- reflects a very difficult type of test
b. Negative skew
- relatively few scores fall at the negative end
- reflects a very easy type of test

6. Kurtosis
- the sharpness of the peak of a frequency-distribution curve

C. The Normal Curve and Standard Scores
1. T scores – Mean of 50, SD of 10 (Formula: z × 10 + 50)
2. Stanines – Mean of 5, SD of 2 (Formula: z × 2 + 5)
3. z scores – Mean of 0, SD of 1 (Formula: (X − X̄) ÷ SD)
4. Sten – Mean of 5.5, SD of 2 (Formula: z × 2 + 5.5)
5. IQ scores – Mean of 100, SD of 15
6. A scores – Mean of 500, SD of 100

D. Inferential Statistics
1. Parametric vs. Non-Parametric Tests

                 Parametric Test                  Non-Parametric Test
Requirements     Normal distribution              Normal distribution not required
                 Homogeneous variance             Homogeneous variance not required
                 Interval or ratio data           Nominal or ordinal data
Common           Pearson's Correlation            Spearman's Correlation
Statistical      Independent-measures t-test      Mann-Whitney U test
Tools            One-way, independent-measures    Kruskal-Wallis H test
                 ANOVA
                 Paired t-test                    Wilcoxon Signed-Rank test
                 One-way, repeated-measures       Friedman's test
                 ANOVA
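The standard-score conversions and the percentile formula above can be sketched as follows (the raw score, test mean and SD are made-up values; the IQ line assumes the usual deviation-IQ transformation with mean 100 and SD 15):

```python
def z_score(x, mean, sd):
    # z score: mean 0, SD 1
    return (x - mean) / sd

# Linear transformations of z, per the list above
def t_score(z):       # mean 50, SD 10
    return z * 10 + 50

def stanine(z):       # mean 5, SD 2
    return z * 2 + 5

def sten(z):          # mean 5.5, SD 2
    return z * 2 + 5.5

def deviation_iq(z):  # mean 100, SD 15 (assumed deviation-IQ formula)
    return z * 15 + 100

z = z_score(65, mean=50, sd=10)  # raw score of 65 on a test with mean 50, SD 10
print(z, t_score(z), stanine(z), sten(z), deviation_iq(z))  # 1.5 65.0 8.0 8.5 122.5

# Percentile = (number of scores below the given score / total number of scores) x 100
scores = [40, 45, 50, 55, 60, 65, 70]
percentile = sum(s < 65 for s in scores) / len(scores) * 100
print(round(percentile, 1))  # 71.4
```

Because every standard score is just a linear rescaling of z, converting between any two of them only requires passing through z first.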
2. Measures of Correlation
a. Pearson's Product Moment Correlation – parametric test for interval data
b. Spearman Rho's Correlation – non-parametric test for ordinal data
c. Kendall's Coefficient of Concordance – non-parametric test for ordinal data
d. Phi Coefficient – non-parametric test for dichotomous nominal data
e. Lambda – non-parametric test for 2 groups (dependent and independent variable) of nominal data
***Correlation Ranges:
1.00 : Perfect relationship
0.75 – 0.99 : Very strong relationship
0.50 – 0.74 : Strong relationship
0.25 – 0.49 : Weak relationship
0.01 – 0.24 : Very weak relationship
0.00 : No relationship

3. Measures of Prediction
a. Biserial Correlation – predictive test for artificially dichotomized and categorical data as criterion with continuous data as predictor
b. Point-Biserial Correlation – predictive test for genuinely dichotomized and categorical data as criterion with continuous data as predictors
c. Tetrachoric Correlation – predictive test for dichotomous data, with categorical data as criterion and categorical data as predictors
d. Simple Linear Regression – a predictive test which involves one criterion that is continuous in nature with only one predictor that is continuous
e. Multiple Linear Regression – a predictive test which involves one criterion that is continuous in nature with more than one continuous predictor
f. Ordinal Regression – a predictive test which involves a criterion that is ordinal in nature with more than one predictor that is continuous in nature

4. Chi-Square Test
a. Goodness of Fit – used to measure differences; involves nominal data and only one variable with 2 or more categories
b. Test of Independence – used to measure correlation; involves nominal data and two variables with two or more categories

5. Comparison of Two Groups
a. Paired t-test – a parametric test for paired groups with normal distribution
b. Unpaired t-test – a parametric test for unpaired groups with normal distribution
c. Wilcoxon Signed-Rank Test – a non-parametric test for paired groups with non-normal distribution
d. Mann-Whitney U test – a non-parametric test for unpaired groups with non-normal distribution

6. Comparison of Three or More Groups
a. Repeated-measures ANOVA – a parametric test for matched groups with normal distribution
b. One-Way/Two-Way ANOVA – a parametric test for unmatched groups with normal distribution
c. Friedman F test – a non-parametric test for matched groups with non-normal distribution
d. Kruskal-Wallis H test – a non-parametric test for unmatched groups with non-normal distribution

7. Factor Analysis

chapter v: psychometric properties of a good test

Reliability
- the stability or consistency of the measurement
1. Goals of Reliability
a. estimate errors in psychological measurement
b. devise techniques to improve testing so errors are reduced

2. Sources of Measurement Error

Source of Error             Type of Test Prone to           Appropriate Measures
                            Each Error Source               Used to Estimate Error
Inter-Scorer Differences    Tests scored with a             Scorer reliability
                            degree of subjectivity
Time Sampling Error         Tests of relatively             Test-Retest Reliability (rtt),
                            stable traits or behavior       a.k.a. Stability Coefficient
Content Sampling Error      Tests for which consistency     Alternate-form reliability (a.k.a.
                            of results, as a whole,         coefficient of equivalence) or
                            is required                     split-half reliability (a.k.a.
                                                            coefficient of internal consistency)
Inter-Item Inconsistency    Tests that require              Split-half reliability or more
                            inter-item consistency          stringent internal consistency
                                                            measures, such as KR-20 or
                                                            Cronbach Alpha
Inter-Item Inconsistency    Tests that require              Internal consistency measures
and Content Heterogeneity   inter-item consistency          and additional evidence
combined                    and homogeneity                 of homogeneity
Time and Content            Tests that require stability    Delayed alternate-form
Sampling Error combined     and consistency of results,     reliability
                            as a whole

3. Types of Reliability
A. Test-Retest Reliability
- compares the scores of individuals who have been measured twice by the instrument
- not applicable for tests involving reasoning and ingenuity
- a longer interval will result in a lower correlation coefficient, while a shorter interval will result in a higher correlation
- the ideal time interval for test-retest reliability is 2-4 weeks
- source of error variance is time sampling
- utilizes Pearson r or Spearman rho
B. Parallel-Forms/Alternate-Forms Reliability
- the same persons are tested with one form on the first occasion and with another, equivalent form on the second
- the administration of the second, equivalent form either takes place immediately or fairly soon
- the two forms should be truly parallel: independently constructed tests designed to meet the same specifications, containing the same number of items, with items expressed in the same form, covering the same type of content, with the same range of difficulty, and with the same instructions, time limits, illustrative examples, format and all other aspects of the test
- has the most universal applicability
- for immediate alternate forms, the source of error variance is content sampling
- for delayed alternate forms, the sources of error variance are time sampling and content sampling
- utilizes Pearson r or Spearman rho
C. Split-Half Reliability
- two scores are obtained for each person by dividing the test into equivalent halves (odd-even split or top-bottom split)
- the reliability of the test is directly related to the length of the test
- the source of error variance is content sampling
- utilizes the Spearman-Brown Formula
D. Other Measures of Internal Consistency/Inter-Item Reliability – source of error variance is content sampling and content heterogeneity
> KR-20 – for dichotomous items with varying levels of difficulty
> KR-21 – for dichotomous items with a uniform level of difficulty
> Cronbach Alpha/Coefficient Alpha – for non-dichotomous items (Likert or other multiple choice)
> Average Proportional Distance – focuses on the degree of difference that exists between item scores
E. Inter-Rater/Inter-Observer Reliability
- degree of agreement between raters on a measure
- source of error variance is inter-scorer differences
- often utilizes Cohen's Kappa statistic

4. Reliability Ranges
1 : perfect reliability (may indicate redundancy and homogeneity)
≥ 0.9 : excellent reliability (minimum acceptability for tests used for clinical diagnoses)
≥ 0.8 < 0.9 : good reliability
≥ 0.7 < 0.8 : acceptable reliability (minimum acceptability for psychometric tests)
≥ 0.6 < 0.7 : questionable reliability (but still acceptable for research purposes)
≥ 0.5 < 0.6 : poor reliability
< 0.5 : unacceptable reliability
0 : no reliability

5. Standard Error of Measurement
- an index of the amount of inconsistency or the amount of expected error in an individual's score
- the higher the reliability of the test, the lower the SEM
> Error – the long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
> Trait Error – sources of error that reside within the individual taking the test (e.g., "I didn't study enough," "I felt bad about missing that blind date," "I forgot to set the alarm" – excuses)
> Method Error – sources of error that reside in the testing situation (such as lousy test instructions, a too-warm room, or missing pages)
> Confidence Interval – a range or band of test scores that is likely to contain the true score
> Standard Error of the Difference – a statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant

6. Factors Affecting Test Reliability
a. Test Format
b. Test Difficulty
c. Test Objectivity
d. Test Administration
e. Test Scoring
f. Test Economy
g. Test Adequacy

7. What to do about low reliability?
- increase the number of items
- use factor analysis and item analysis
- use the correction-for-attenuation formula – a formula used to determine the exact correlation between two variables if the test is deemed affected by error

Validity
- a judgment or estimate of how well a test measures what it purports to measure in a particular context

1. Types of Validity
a. Face Validity
- the least stringent type of validity; whether a test looks valid to test users, examiners and examinees
Examples:
✓ An IQ test containing items which measure memory, mathematical ability, verbal reasoning and abstract reasoning has good face validity.
✓ An IQ test containing items which measure depression and anxiety has bad face validity.
✓ Inkblot tests have low face validity because test takers question whether the test really measures personality.
✓ A self-esteem rating scale which has items like "I know I can do what other people can do." and "I usually feel that I would fail on a task." has good face validity.
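Several of the reliability statistics above (split-half reliability stepped up with the Spearman-Brown formula, Cronbach's alpha, and the standard error of measurement) can be sketched in plain Python. The 5-examinee, 4-item response matrix is invented for illustration, and population (divide-by-N) variances are used throughout, which leaves alpha unchanged:

```python
def pearson_r(x, y):
    # Pearson product-moment correlation
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def variance(v):
    # Population variance (divide by N)
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

# Hypothetical item responses: rows = examinees, columns = 4 Likert-type items
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]

# Split-half: correlate the odd-item and even-item half scores...
odd = [row[0] + row[2] for row in data]
even = [row[1] + row[3] for row in data]
r_half = pearson_r(odd, even)
# ...then step the half-test correlation up to full test length (Spearman-Brown)
r_sb = 2 * r_half / (1 + r_half)

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals)
k = len(data[0])
item_vars = [variance([row[i] for row in data]) for i in range(k)]
totals = [sum(row) for row in data]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

# Standard error of measurement: SEM = SD * sqrt(1 - reliability);
# the higher the reliability, the lower the SEM
sem = variance(totals) ** 0.5 * (1 - alpha) ** 0.5

# 95% confidence interval around an observed total score of, say, 16
ci = (16 - 1.96 * sem, 16 + 1.96 * sem)

print(round(r_half, 2), round(r_sb, 2))  # 0.88 0.94
print(round(alpha, 2))                   # 0.94
print(round(sem, 2))                     # 0.98
```

On the reliability ranges listed above, a coefficient of about 0.94 would fall in the "excellent" band, and the resulting SEM is correspondingly small.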
b. Content Validity
Definitions and concepts
✓ whether the test covers the behavior domain to be measured, which is built through the choice of appropriate content areas, questions, tasks and items
✓ concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes
✓ content validation is not done by statistical analysis but by the inspection of items; a panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification
✓ considers the adequacy of representation of the conceptual domain the test is designed to cover
✓ if the test items adequately represent the domain of possible items for a variable, then the test has adequate content validity
✓ determination of content validity is often made by expert judgment
c. Criterion-Related Validity
What is a criterion?
✓ the standard against which a test or a test score is evaluated
✓ a criterion can be a test score, psychiatric diagnosis, training cost, index of absenteeism, or amount of time
✓ Characteristics of a criterion:
• Relevant
• Valid and Reliable
• Uncontaminated: criterion contamination occurs if the criterion is based on the predictor measures – the predictor becomes part of what is supposed to be the criterion
Criterion-Related Validity Defined:
✓ indicates the test's effectiveness in estimating an individual's behavior in a particular situation
✓ tells how well a test corresponds with a particular criterion
✓ a judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest
Types of Criterion-Related Validity:
✓ Concurrent Validity – the extent to which test scores may be used to estimate an individual's present standing on a criterion
✓ Predictive Validity – the extent to which scores on a test can predict future behavior or scores on another test taken in the future
✓ Incremental Validity – related to predictive validity; defined as the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
d. Construct Validity
What is a construct?
✓ an informed scientific idea developed or hypothesized to describe or explain a behavior; something built by mental synthesis
✓ unobservable, presupposed traits; something the researcher thinks to have either a high or low correlation with other variables
Construct Validity defined
✓ a test designed to measure a construct must estimate the existence of an inferred, underlying characteristic based on a limited sample of behavior
✓ established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
✓ a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct
✓ required when no criterion or universe of content is accepted as entirely adequate to define the quality being measured
✓ assembling evidence about what a test means
✓ a series of statistical analyses showing that one variable is a separate, distinct variable
✓ a test has good construct validity if there is an existing psychological theory which can support what the test items are measuring
✓ establishing construct validity involves both logical analysis and empirical data (Example: in measuring aggression, you have to check all past research and theories to see how other researchers measured that variable/construct)
✓ construct validity is like proving a theory through evidence and statistical analysis
Evidences of Construct Validity
✓ Test is homogeneous, measuring a single construct.
• Subtest scores are correlated with the total test score.
• Coefficient alpha may be used as homogeneity evidence.
• Spearman rho can be used to correlate an item with another item.
• Pearson or point-biserial can be used to correlate an item with the total test score (item-total correlation).
✓ Test scores increase or decrease as a function of age, passage of time, or experimental manipulation.
• Some variables/constructs are expected to change with age.
✓ Pretest-posttest differences
• A difference in scores between the pretest and posttest of a defined construct after careful manipulation would provide validity evidence.
✓ Test scores differ between groups.
• Also called the method of contrasted groups.
• A t-test can be used to test the difference between groups.
✓ Test scores correlate with scores on other tests in accordance with what is predicted.
• Discriminant Validation
> Convergent Validity – a test correlates highly with other variables with which it should correlate (example: Extraversion, which is highly correlated with sociability)
> Divergent Validity – a test does not correlate significantly with variables from which it should differ (example: Optimism, which is negatively correlated with Pessimism)
• Factor Analysis – a statistical technique for analyzing the interrelationships of behavior data
> Principal Components Analysis – a method of data reduction
> Common Factor Analysis – items do not make a factor; the factor should predict scores on the items; classified into two (Exploratory Factor Analysis for summarizing data and Confirmatory Factor Analysis for generalization of factors)
• Cross-Validation – revalidation of the test against a criterion based on another group different from the original group on which the test was validated
> Validity Shrinkage – decrease in validity after cross-validation
> Co-validation – validation of more than one test on the same group
> Co-norming – norming more than one test on the same group

2. Test Bias
- a factor inherent in a test that systematically prevents accurate, impartial measurement
✓ Rating Error
– a judgment resulting from the intentional or unintentional misuse of rating scales
• Severity Error/Strictness Error – a less-than-accurate rating or error in evaluation due to the rater's tendency to be overly critical
• Leniency Error/Generosity Error – a rating error that occurs as a result of a rater's tendency to be too forgiving and insufficiently critical
• Central Tendency Error – a type of rating error wherein the rater exhibits a general reluctance to issue ratings at either a positive or negative extreme, so that all or most ratings cluster in the middle of the rating continuum
✓ Proximity Error – a rating error committed due to the proximity/similarity of the traits being rated
✓ Primacy Effect – the "first impression" affects the rating
✓ Contrast Effect – the prior subject of assessment affects the latter subject of assessment
✓ Recency Effect – the tendency to rate a person based on recent recollections about that person
✓ Halo Effect – a type of rating error wherein the rater views the object of the rating with extreme favour and tends to bestow ratings inflated in a positive direction
✓ Impression Management
✓ Acquiescence
✓ Non-acquiescence
✓ Faking-Good
✓ Faking-Bad

3. Test Fairness
– the extent to which a test is used in an impartial, just and equitable way

4. Factors Influencing Test Validity
a. Appropriateness of the test
b. Directions/Instructions
c. Reading Comprehension Level
d. Item Difficulty

C. Norms
– designed as a reference for evaluating or interpreting individual test scores
1. Basic Concepts
a. Norm – behavior that is usual or typical for members of a group
b. Norms – reference scores against which an individual's scores are compared
c. Norming – the process of establishing test norms
d. Norman – a test developer who will use the norms
2. Establishing Norms
a. Target Population
b. Normative Sample
c. Norm Group
- Size
- Geographical Location
- Socioeconomic Level
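Norm-referenced interpretation, as described above, compares an individual's raw score against the scores of the normative sample. A minimal sketch (the normative scores are invented) converts a raw score into a percentile rank within the norm group:

```python
# Scores from a hypothetical normative sample (the norm group)
norm_sample = [35, 40, 42, 45, 47, 50, 52, 55, 58, 60]

def percentile_rank(raw, norms):
    # Percentage of the normative sample scoring below the raw score
    return sum(s < raw for s in norms) / len(norms) * 100

print(percentile_rank(53, norm_sample))  # 70.0 -> examinee outscored 70% of the norm group
```

The same raw score would earn a different percentile rank against a different norm group, which is why the size and representativeness of the normative sample matter.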
3. Types of Norms
a. Developmental Norms
– Mental Age
* Basal Age
* Ceiling Age
* Partial Credits
– Intelligence Quotient
– Grade Equivalent Norms
– Ordinal Scales

chapter vi: test development

A. Standardization
1. When to decide to standardize a test?
a. No test exists for a particular purpose.
b. The existing tests for a certain purpose are not adequate for one reason or another.
2. Basic Premises of Standardization
- the independent variable is the individual being tested
- the dependent variable is his behavior
- behavior = person × situation
- in psychological testing, we make sure that it is the person factor that will 'stand out' and the situation factor is controlled
- control of extraneous variables = standardization
3. What should be standardized?
a. Test Conditions
– there should be uniformity in the testing conditions
– physical condition
– motivational condition
b. Test Administration Procedure
– there should be uniformity in the instructions and the administration proper. Test administration includes carefully following standard procedures so that the test is used in the manner specified by the test developers. The test administrator should ensure that test takers work within conditions that maximize the opportunity for optimum performance. As appropriate, test takers, parents, and organizations should be involved in the various aspects of the testing process.
– Sensitivity to Disabilities: try to help the disabled examinee overcome his disadvantage, such as by increasing voice volume, or refer to other available tests
– Desirable Procedures of Group Testing: take care with time, clarity, physical conditions (illumination, temperature, humidity, writing surface and noise), and guessing
c. Scoring
– there should be a consistent mechanism and procedure in scoring. Accurate measurement necessitates adequate procedures for scoring the responses of test takers. Scoring procedures should be audited as necessary to ensure consistency and accuracy of application.
d. Interpretation
– there should be common interpretations among similar results. Many factors can impact the valid and useful interpretation of test scores. These can be grouped into several categories, including psychometric, test taker, and contextual, as well as others.
i. Psychometric Factors:
- factors such as the reliability, norms, standard error of measurement, and validity of the instrument are important when interpreting test results. Responsible test use considers these basic concepts and how each impacts the scores and hence the interpretation of the test results.
ii. Test Taker Factors:
- factors such as the test taker's group membership, and how that membership may impact the results of the test, are critical in the interpretation of test results. Specifically, the test user should evaluate how the test taker's gender, age, ethnicity, race, socioeconomic status, marital status, and so forth impact the individual's results.
iii. Contextual Factors:
- the relationship of the test to the instructional program, opportunity to learn, quality of the educational program, work and home environment, and other factors that would assist in understanding the test results are useful in interpreting test results. For example, if the test does not align with curriculum standards and how those standards are taught in the classroom, the test results may not provide useful information.
4. Tasks of test developers to ensure uniformity of procedures in test administration:
– prepare a test manual containing the following:
i. Materials needed (test booklets & answer sheets)
ii. Time limits
iii. Oral instructions
iv. Demonstrations/examples
v. Ways of handling queries of examinees
5. Tasks of examiners/test users/psychometricians
– ensure that test user qualifications are strictly met (training in the selection, administration, scoring and interpretation of tests, as well as the required license)
– advance preparations:
i. Familiarity with the test/s
ii. Familiarity with the testing procedure
iii. Familiarity with the instructions
iv. Preparation of test materials
v. Orient proctors (for group testing)
6. Standardization sample
– a random sample of the test takers used to evaluate the performance of others
– considered a representative sample if the sample consists of individuals that are similar to the group to be tested

> Objectivity
1. Time-Limit Tasks – every examinee gets the same amount of time for a given task
2. Work-Limit Tasks – every examinee has to perform the same amount of work
3. Issue of Guessing

> Stages in Test Development
1. Test Conceptualization
– in creating a test plan, specify the following:
– Objective of the Test
– Clear definition of the variables/constructs to be measured
– Target Population/Clientele
– Test Constraints and Conditions
– Content Specifications (Topics, Skills, Abilities)
– Scaling Method
✓ Comparative scaling
✓ Non-comparative scaling
– Test Format
✓ Stimulus (Interrogative, Declarative, Blanks, etc.)
✓ Mechanism of Response (Structured vs. Free)
✓ Multiple Choice
- more answer options (4-5) reduce the chance of guessing that an item is correct
- many items can aid in student comparison, reduce ambiguity and increase reliability
- easy to score
- measures narrow facets of performance
- reading time increases with more options
- transparent clues (e.g., verb tenses or letter use, "a" or "an") may encourage guessing
- it is difficult to write four or five reasonable choices that reflect the student's misunderstanding of the learning objective; this may be a difficult task, especially when constructing a true statement
2. Test Construction
– be mindful of the following test construction guidelines:
– Deal with only one central thought in each item
– Be precise
– Be brief
– Avoid awkward wordings or dangling constructs
– Avoid irrelevant information
– Present items in positive language
– Avoid double negatives
– Avoid terms like "all" and "none"
3. Test Tryout
4. Item Analysis (Factor Analysis for Typical-Performance Tests)
5. Test Revision

> Item Analysis
– measures and evaluates the quality and appropriateness of test questions
– how well the items measure the ability/trait
1. Classical Test Theory
– these analyses are the easiest and the most widely used form of analyses
– often called the "true-score model," which involves the true-score formula: X = T + e
– assumes that a person's test score (X) is comprised of their "true score" (T) plus some measurement error (e)
– employs the following statistics:
a. Item difficulty
– the proportion of examinees who got the item correct
– the higher the item mean, the easier the item is for the group; the lower the item mean, the more difficult the item is for the group
– Formula: p = number of examinees who answered the item correctly ÷ total number of examinees
- takes more time to write questions
- test takers can get some correct
answers by guessing – 0.00-0.20 : Very Difficult : Unacceptable
✓ True or False – 0.21-0.40 : Difficult : Acceptable
- ideally a true/false question should – 0.41-0.60 : Moderate : Highly Acceptable
be constructed so that an incorrect – 0.61-0.80 : Easy : Acceptable
response indicates something about – 0.81-1.00 : Very Easy : Unacceptable
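The item difficulty index and its cutoffs can be sketched in code. This is a minimal illustration, not from the source; the function names are hypothetical, and the category labels follow the table above.

```python
# Item difficulty (p): proportion of examinees who answered the item correctly.

def item_difficulty(responses):
    """responses: list of 1 (correct) / 0 (incorrect) for a single item."""
    return sum(responses) / len(responses)

def difficulty_label(p):
    """Map a difficulty index onto the cutoff table above."""
    if p <= 0.20:
        return "Very Difficult"   # Unacceptable
    if p <= 0.40:
        return "Difficult"        # Acceptable
    if p <= 0.60:
        return "Moderate"         # Highly Acceptable
    if p <= 0.80:
        return "Easy"             # Acceptable
    return "Very Easy"            # Unacceptable

# Hypothetical data: 10 examinees, 7 answered correctly -> p = 0.70
item = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(item_difficulty(item))   # 0.7
print(difficulty_label(0.7))   # Easy
```

Note that a *higher* p means an *easier* item, which is why the label runs opposite to the index.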
b. Item discrimination
– a measure of how well an item is able to
distinguish between examinees who are
knowledgeable and those who are not
– how well each item is related to the trait being measured
– the discrimination index ranges from
-1.00 to +1.00
– the closer the index is to +1, the more effectively
the item distinguishes between the two groups of
examinees
– the acceptable index is 0.30 and above
– Formula: D = (U − L) / n, where U and L are the
numbers of examinees in the upper and lower scoring
groups who answered the item correctly, and n is the
number of examinees in each group
– 0.40-above : Very Good Item : Highly Acceptable
– 0.30-0.39 : Good Item : Acceptable
– 0.20-0.29 : Reasonably Good Item : For Revision
– 0.10-0.19 : Difficult Item : Unacceptable
– Below 0.10 : Very Difficult Item : Unacceptable
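The discrimination index can be sketched as follows. This is an illustration, not from the source; it assumes equal-sized upper and lower scoring groups (a common convention is the top and bottom 27% of total scores), and the names are hypothetical.

```python
# Discrimination index D = (U - L) / n for equal-sized upper/lower groups.

def discrimination_index(upper_correct, lower_correct, group_size):
    """upper_correct / lower_correct: number of examinees in each
    group who answered the item correctly; group_size: n per group."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical data: 10 examinees per group; 9 of the upper group and
# 4 of the lower group got the item right -> D = (9 - 4) / 10 = 0.50
d = discrimination_index(9, 4, 10)
print(d)  # 0.5 -> Very Good Item (0.40 and above)
```

A negative D would mean low scorers outperform high scorers on the item, which usually signals a flawed item or a miskeyed answer.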
c. Item reliability index
- the higher the index, the greater the test’s
internal consistency
d. Item validity index
- the higher the index, the greater the test’s
criterion-related validity
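One common formulation in classical test theory texts (an assumption here, since the source gives no formula) defines the item-reliability index as the item's standard deviation multiplied by its item-total correlation, and the item-validity index as the item's standard deviation multiplied by its item-criterion correlation. For a dichotomously scored item, the standard deviation is sqrt(p(1 − p)).

```python
import math

def item_index(p, correlation):
    """Item-reliability index (pass the item-total correlation) or
    item-validity index (pass the item-criterion correlation).
    p: item difficulty for a 0/1-scored item."""
    s = math.sqrt(p * (1 - p))  # SD of a dichotomous item
    return s * correlation

# p = 0.5 gives the maximum item SD (0.5); with an item-total
# correlation of 0.60 the index is 0.5 * 0.60 = 0.30
print(round(item_index(0.5, 0.60), 2))  # 0.3
```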
e. Distractor Analysis
– all of the incorrect options, or distractors,
should be equally distracting
– preferably, each distractor should be selected
by a greater proportion of the lower scorers than
of the top group
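A distractor analysis can be sketched by tallying how often each incorrect option is chosen by top versus bottom scorers. The data and variable names below are made up for illustration; good distractors draw roughly equal shares, each chosen more often by the lower group.

```python
from collections import Counter

key = "B"  # correct option for this hypothetical item
top_choices = ["B", "B", "A", "B", "B", "C", "B", "B", "B", "B"]
bottom_choices = ["A", "C", "B", "D", "A", "C", "B", "D", "A", "C"]

def distractor_counts(choices, key):
    """Tally selections of the incorrect options only."""
    return Counter(c for c in choices if c != key)

print(distractor_counts(top_choices, key))     # Counter({'A': 1, 'C': 1})
print(distractor_counts(bottom_choices, key))  # Counter({'A': 3, 'C': 3, 'D': 2})
```

Here each distractor is chosen more often by the bottom group than the top group, as the guideline above prefers; a distractor chosen by no one, or mostly by top scorers, would flag the item for revision.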
f. Overall Evaluation of Test Items
DIFFICULTY LEVEL               | DISCRIMINATION POWER           | ITEM EVALUATION
Acceptable                     | Highly Acceptable              | Very Good Item
Highly Acceptable / Acceptable | Acceptable                     | Good Item
Highly Acceptable / Acceptable | Unacceptable                   | Revise the Item
Unacceptable                   | Highly Acceptable / Acceptable | Discard the Item
Unacceptable                   | Unacceptable                   | Discard the Item
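The evaluation table can be expressed as a small decision rule. This is a sketch under the assumption that the difficulty and discrimination categories have already been assigned from the earlier cutoff tables; the function name is hypothetical.

```python
def evaluate_item(difficulty, discrimination):
    """difficulty / discrimination: 'Highly Acceptable',
    'Acceptable', or 'Unacceptable' (from the cutoff tables)."""
    if difficulty == "Unacceptable":
        return "Discard the Item"       # regardless of discrimination
    if discrimination == "Unacceptable":
        return "Revise the Item"        # difficulty OK, discrimination not
    if discrimination == "Highly Acceptable":
        return "Very Good Item"
    return "Good Item"

print(evaluate_item("Acceptable", "Highly Acceptable"))    # Very Good Item
print(evaluate_item("Highly Acceptable", "Unacceptable"))  # Revise the Item
print(evaluate_item("Unacceptable", "Acceptable"))         # Discard the Item
```

Note the asymmetry: an item with acceptable difficulty but poor discrimination is worth revising, while an item with unacceptable difficulty is discarded even if it discriminates well.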
2. Item Response Theory (Latent Trait Theory)
– sometimes referred to as “modern psychometrics”
– latent trait models aim to look beyond the observed
test score at the underlying traits that produce the
test performance
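As a minimal illustration of a latent trait model (chosen here as an example, since the source names no specific model), the one-parameter logistic (Rasch) model gives the probability that a person with ability theta answers an item of difficulty b correctly as P = 1 / (1 + exp(−(theta − b))).

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch (1PL) model.
    theta: person ability; b: item difficulty (same latent scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# it rises toward 1 as ability exceeds difficulty.
print(rasch_p(0.0, 0.0))  # 0.5
```

Unlike the classical indices above, which depend on the particular sample tested, these model parameters are intended to be invariant across samples.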